BULK INSERT statement

BULK INSERT is a SQL statement that:

reads data from various sources,
maps data locations and data types,
performs optional lightweight data transformations,
then loads the data to a table using INSERT statements

In some cases, the data will be pre-sorted by internal FeatureBase partition before being ingested by multiple concurrent routines.

expr

Before you begin

Prepare your data source:

file(s)
URL(s)
inline blob

The data source must be in one of the following formats:

Table of contents

Before you begin
BNF diagrams
Syntax
Arguments
MAP clause
TRANSFORM clause
FROM clause
WITH clause
Additional information
- INPUT additional
Examples
BULK INSERT examples
- TRANSFORM with operators and expressions
- BULK INSERT with STREAM

BNF diagrams

expr

Column list

expr

Run SHOW TABLE <tablename> to learn the structure of a destination table

Map list

expr

Transform clause

expr

With clause

expr expr

Syntax

BULK INSERT
  INTO
    <table-name> [(<column-list>)]
  MAP
    ({$.<column-name> | <map-id>} <data-type>,...)
  [TRANSFORM
    (expr,...)]
  FROM
    ['<URL>' | '<string-literal>']
  WITH
    [
      [BATCHSIZE <integer-value>]
      [ROWSLIMIT <integer-value>]
      [INPUT ['INLINE' | 'STREAM' | 'URL']]
      [FORMAT
        ['CSV' [HEADER_ROW] [CSV_EMPTY_STRING_AS_NULL] [CSV_NULL_AS_NULL] [NULL_AS_EMPTY_SET]] |
        ['NDJSON' [ALLOW_MISSING_VALUES] [NULL_AS_EMPTY_SET]] |
        ['PARQUET' [NULL_AS_EMPTY_SET]] |
        ['ORC' [NULL_AS_EMPTY_SET]]
      ]
      ...
    ]

Arguments

Argument	Description	Required?	Additional information
`INSERT`	INSERT a row of data identified by an `_id` value not in the table UPDATE values in a row where the `_id` already exists	Yes	Columns not defined in the `<column-list>` are not updated
`<table-name>`	Name of target table	Yes
`<column-list>`	Columns belonging to `<table-name>`, starting with the `_id` column	Optional	Assumes all columns in `<table-name>` are to be updated if left blank

MAP clause

Argument	Description	Required?	Additional information
`MAP`	Comma-separated list defining each column in the `<column-list>` as a positive integer and data type	Yes
`$.<column-name>`	`MAP` values in `NDJSON` data source to the target table `<column-name>`		MAP clause with NDJSON data source
`<map-id>`	Corresponds to the ordinal position of each column	Yes	`_id` column is numbered `0`
`<data-type>`	The data type assigned to each column in the `CREATE TABLE` statement	Yes	Data types

TRANSFORM clause

Argument	Description	Required?	Additional information
`TRANSFORM`	Prefix each column integer from the `MAP` clause with the `@` symbol	Optional
`expr`	Any valid function or operator used to manipulate values identified by `@<map-id>`	Optional

FROM clause

Argument	Description	Required?	Additional information
`FROM`	A single or multi-line string literal that specifies the data source defined in the `WITH...INPUT` and `WITH...SOURCE` clauses	Yes
`'<URL>'`	One or more valid URLs that link to valid data sources	Required for `INPUT 'URL'`	Define multiple URLs as string literals
`<string-literal>`	CSV or NDJSON records as a string literal.	Required for `INPUT 'INLINE'`	* Not supported for `FORMAT 'PARQUET'` * String literals

WITH clause

Argument	Description	Required?	Additional information
`WITH`	Pass one or more statement level options	Optional

Ingest arguments

Argument	Description	Default	Additional information
`BATCHSIZE`	Specify the batch size of the BULK commit	`1000`	If batching not available on client, use `STREAM` instead
`ROWSLIMIT`	Limit the number of rows processed in a batch.
`CONCURRENCY`	Number of concurrent workers to ingest the data after it has been presorted	`8`	Can be used with `CSV` and `NDJSON` which can presort. Not supported for `PARQUET` and `ORC`

INPUT arguments

Argument	Description	Required?	Additional information
`INPUT`	Input values must match those used in the `FROM` clause
`'INLINE'`	Used for data included directly from the `FROM` clause with contents of the literal read as though they were in a file.	Required for `FROM <string-literal>'` Not supported for `PARQUET` and `ORC`	INPUT additional
`'STREAM'`	`STREAM` supports a streaming payload using an HTTP multipart POST.	Optional	BULK INSERT with STREAM
`FORMAT`	Supported data source formats are: * CSV * `NDJSON` * `ORC` * `PARQUET`	Optional	`INPUT 'INLINE'` does not support `PARQUET` or `ORC`

FORMAT arguments

The following arguments are optional

Argument	Data source	Description	Additional information
`ALLOW_MISSING_VALUES`	`NDJSON`	Output a `NULL` value from the `MAP` clause if the path expression fails	NDJSON value assignment
`CSV_EMPTY_STRING_AS_NULL`	`CSV`	Assign `""` value as `null`	CSV value assignment
`CSV_NULL_AS_NULL`	`CSV`	Assign `NULL` value as `null`	CSV value assignment
`HEADER_ROW`	`CSV`	Ignore the header row
`NULL_AS_EMPTY_SET`	All	`MAP` any `NULL` values from the data source to a `SET` column without error	SET and SETQ data types

Additional information

Any variables are evaluated during execution for each row

INPUT additional

String literals

'Single quotes surround all string literals'
'A single quote that''s contained in a string literal is escaped with a second single quote'
'A string literal containing ""double-quotes"" can be escaped using a second double-quote'
Multi-line string literals are prefixed with x:

x'<first line>
<second line>'

The closing quote of a multi-line string literal must follow the final value

CSV data source value assignment

Literal Value	Target Data Type	Result	Additional information
`,,` or `,"",`	All data types unless stated	`NULL`
`,,` or `,"",`	STRING	`''` (empty string)	`NULL` when `WITH` `'CSV_EMPTY_STRING_AS_NULL'` is defined
`,,` or `,"",`	SET and SETQ	`NULL`	Empty set `[]` if `WITH 'NULL AS EMPTY SET'` is defined
`,NULL,`	All data types unless stated	`'NULL'` (string literal)	`NULL` when `WITH 'CSV_NULL_AS_NULL'` is defined

NDJSON data source value assignment

Literal Value	Target Data Type	Result	Additional information
`""`	STRING	`''` (empty string)
`""`	STRINGSET and STRINGSETQ	`['']` (set with empty string member)
`null`	All data types unless stated	`NULL`	Not supported for
`[]`	SET and SETQ	`[]` (empty set)
Value Missing ()	All data types unless stated	`NULL`	This will only occur if using `ALLOW_MISSING_VALUES`
Value Missing ()	SET and SETQ	`NULL`	Empty set `[]` if `FORMAT` includes `'NULL_AS_EMPTY_SET'` and `'ALLOW_MISSING_VALUES'`

Examples

`BULK INSERT` examples

The following SQL examples demonstrate BULK INSERT using different clauses and data sources.

Statement	Target table	FROM clause	TRANSFORM clause	WITH clause
BULK INSERT using CSV data source	csv-target	URL		* `BATCHSIZE` * `FORMAT` * `INPUT` * `HEADER_ROW`
BULK INSERT using ndJSON data source	ndjson-target	URL	Includes CAST function	* `BATCHSIZE` * `FORMAT` * `INPUT` * `ALLOW_MISSING_VALUES`
BULK INSERT using ORC data source	orc-target	URL		* `FORMAT` * `INPUT`
BULK INSERT using Apache Parquet data source	parquet-target	URL		* `FORMAT` * `INPUT`
BULK INSERT using inline CSV	tuple-set-target	Inline CSV	Combines column values using TUPLE function	* `FORMAT` * `INPUT`

TRANSFORM with operators and expressions

BULK INSERT INTO doctest (_id, zip-code, state-name)
MAP (
  0 id,
  1 INTEGER,
  2 STRING)
TRANSFORM (
    @0 + 10, -- offset the new _id value by 10
    @1,      -- pass through unchanged
    CASE     -- clean up state names
        WHEN @2 = 'Texas' then 'TX' end
        WHEN @2 = 'California' then 'CA' end
        ELSE @2
    end
)

BULK INSERT with STREAM

Using input STREAM requires making an HTTP multipart form POST request. The argument to FROM may be '*' and all files attached to the request will be streamed in and ingested together. The fbsql CLI tool implements the multipart POST and you would use it as follows:

\file myfile.csv
\file myfile2.csv
\file myfile3.csv

bulk replace
  into insert_test (_id, int1, string1, timestamp1)
  map (0 id, 1 int, 2 string)
  transform (@0, @1, @2, current_timestamp)
  from
    '*'
  with
    format 'CSV'
    input 'STREAM';

BULK INSERT statement

Before you begin

BNF diagrams

Column list

Map list

Transform clause

With clause

Syntax

Arguments

MAP clause

TRANSFORM clause

FROM clause

WITH clause

Ingest arguments

INPUT arguments

FORMAT arguments

Additional information

INPUT additional

String literals

CSV data source value assignment

NDJSON data source value assignment

Examples

BULK INSERT examples

TRANSFORM with operators and expressions

BULK INSERT with STREAM

`BULK INSERT` examples