
BULK INSERT statement

BULK INSERT is a SQL statement that:

  • reads data from various sources,
  • maps data locations and data types,
  • performs optional lightweight data transformations,
  • then loads the data into a table using INSERT statements

In some cases, the data will be pre-sorted by internal FeatureBase partition before being ingested by multiple concurrent routines.


Sources can include:

  • file(s)
  • URL(s)
  • inline blob

Supported formats include:

  • CSV
  • PARQUET
  • NDJSON

BNF diagrams

(BNF diagram: BULK INSERT statement)

Column list

(BNF diagram: column list)

Map list

(BNF diagram: map list)

Transform clause

(BNF diagram: transform clause)

With clause

(BNF diagrams: with clause)

Syntax

BULK INSERT
  INTO
    table_name [(column_name,...)]
  MAP (position type_name,...)
  [TRANSFORM (expr,...)]
  FROM
    ['path/file_name' | 'URL' | x'records']
  WITH
    [
      [BATCHSIZE integer_literal]
      [ROWSLIMIT integer_literal]
      [CONCURRENCY integer_literal]
      [INPUT ['FILE' | 'INLINE' | 'STREAM' | 'URL']]
      [FORMAT
        ['CSV' [HEADER_ROW] [CSV_EMPTY_STRING_AS_NULL] [CSV_NULL_AS_NULL] [NULL_AS_EMPTY_SET]] |
        ['NDJSON' [ALLOW_MISSING_VALUES] [NULL_AS_EMPTY_SET]] |
        ['PARQUET' [NULL_AS_EMPTY_SET]]
      ]
      ...
    ]
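
As a minimal illustration of the grammar, the following sketch loads two inline CSV records into a hypothetical table named demo with an _id and one string column:

BULK INSERT
  INTO demo (_id, name)
  MAP (0 ID, 1 STRING)  -- position 0 feeds _id, position 1 feeds name
  FROM x'1,first
2,second'
  WITH
    FORMAT 'CSV'
    INPUT 'INLINE';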

Arguments

| Argument | Description | Required? | Further information |
|----------|-------------|-----------|---------------------|
| INSERT | Inserts new records if the _id does not exist, else updates the record with the values passed. Values are not updated for missing columns. | Yes | REPLACE can be used but has the same functionality |
| table_name | Name of target table | Yes | |
| column_name | Valid columns belonging to table_name. The first column must be the defined _id column. | Optional | The system builds a column list from the existing columns in table_name if columns are not specified. |
| MAP | Specifies how source data is mapped from its location and what data type to output. Values from the MAP clause are inserted into the columns specified in the column list. | Yes | MAP examples |
| position | Ordinal position of the value in the source. | | |
| type_name | Data type of the value in the source. | | Data types |
| TRANSFORM expr | One or more SQL expressions with dependencies on the column list and the MAP clause | Optional | Transform additional |
| FROM | A single- or multi-line string literal that specifies the source of data, interpreted based on the INPUT option. | Yes | |
| 'path/file_name' | Valid path and file name for the data source. | Optional | Not available for FeatureBase Cloud. |
| 'URL' | Valid URL(s) for data sources. | Optional | Multiple URLs may be passed separated by whitespace. If newlines are used, prepend x to the quote, as in x'URL<newline>URL' |
| x'records' | CSV or NDJSON records as a string literal. | Required for INLINE | Not supported for FORMAT 'PARQUET' |
| WITH | Pass one or more statement-level options. | Optional | |
| BATCHSIZE | Specify the batch size of the BULK commit. Defaults to 1000. | Optional | Can be used with STREAM to batch records as they are streamed to the server when batching is not available on the client |
| ROWSLIMIT | Limit the number of rows processed in a batch. | Optional | |
| INPUT | Input values must match those used in the FROM clause | | |
| 'INLINE' | Used for data included directly in the FROM clause, with the contents of the literal read as though they were in a file. | Required for FROM x'records' | Not supported for FORMAT 'PARQUET'. See INLINE quotation marks. |
| 'STREAM' | Supports a streaming payload using an HTTP multipart POST. | Optional | BULK INSERT with STREAM |
| FORMAT | Set the format of the source data to 'CSV', 'NDJSON', or 'PARQUET' | Optional | 'PARQUET' does not support INPUT 'INLINE' |
| CONCURRENCY | Number of concurrent workers that ingest the data after it has been presorted. Defaults to 8. | Optional | Currently applies only to CSV and NDJSON; PARQUET does not yet presort. |
| NULL_AS_EMPTY_SET | Coerces all NULL values resulting from the MAP clause into [] (empty sets) for all target columns with SET data types | Optional | |
| HEADER_ROW | CSV argument that skips the header row in the source CSV file. | Optional | |
| CSV_EMPTY_STRING_AS_NULL | CSV argument that assigns "" values as null | Optional | |
| CSV_NULL_AS_NULL | CSV argument that assigns NULL values as null | Optional | |
| ALLOW_MISSING_VALUES | NDJSON argument that outputs a NULL value from the MAP clause if the path expression fails. | Optional | |
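
The statement-level options combine in a single WITH clause. As a sketch, reusing the insert_test table from the examples below with illustrative option values:

bulk insert
    into insert_test (_id, int1, string1)
    map (0 id, 1 int, 2 string)
from
    '/dev/queries/insert_test.csv'
with
    format 'CSV' header_row  -- skip the header row in the source file
    input 'FILE'
    batchsize 5000           -- commit every 5,000 rows instead of the default 1,000
    rowslimit 100000         -- process at most 100,000 rows
    concurrency 4;           -- 4 ingest workers instead of the default 8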

Additional information

Transform additional

The TRANSFORM clause must:

  • include SQL expressions that match those in the column list, and
  • specify data transformations using the ordinal positions defined in the MAP clause

Any variables are evaluated during execution for each row.

TRANSFORM with TUPLE() function

When the TUPLE() function is used in a TRANSFORM clause, the following values are returned:

| Source data type | Returns |
|------------------|---------|
| IDSETQ | TIMESTAMP and associated integer value |
| STRINGSETQ | TIMESTAMP and associated string value |

CSV Value Assignment

There are special assignments for certain literal values when inserting CSV data.

| Literal Value | Target Data Type | Resultant | Further information |
|---------------|------------------|-----------|---------------------|
| ,, or ,"", | All unless explicitly listed | NULL | |
| ,, or ,"", | string | '' (empty string) | If CSV_EMPTY_STRING_AS_NULL is used, the resultant becomes NULL |
| ,, or ,"", | stringset, idset, stringsetq, idsetq | NULL | If NULL_AS_EMPTY_SET is used, the resultant becomes [] (empty set) |
| ,NULL, | All unless explicitly listed | 'NULL' (string literal) | If CSV_NULL_AS_NULL is used, the resultant becomes NULL |
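
For example, this sketch (hypothetical table csv_demo) inserts two rows whose second field is empty. Without the modifier, string1 would receive '' for both rows; with CSV_EMPTY_STRING_AS_NULL, both rows receive NULL:

bulk insert
    into csv_demo (_id, string1)
    map (0 id, 1 string)
from
    x'1,
2,""'
with
    format 'CSV' csv_empty_string_as_null  -- string1 becomes NULL instead of ''
    input 'INLINE';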

NDJSON Value Assignment

There are special assignments for certain literal values when inserting NDJSON data.

| Literal Value | Target Data Type | Resultant | Further information |
|---------------|------------------|-----------|---------------------|
| "" | string | '' (empty string) | |
| "" | stringset, stringsetq | [''] (set with empty string member) | |
| null | All unless explicitly listed | NULL | |
| [] | stringset, idset, stringsetq, idsetq | [] (empty set) | |
| Value missing | All unless explicitly listed | NULL | Only occurs when ALLOW_MISSING_VALUES is used |
| Value missing | stringset, idset, stringsetq, idsetq | NULL | If NULL_AS_EMPTY_SET is used, the resultant becomes [] (empty set). Only occurs when ALLOW_MISSING_VALUES is used |
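
A sketch of the missing-value cases, assuming a hypothetical table ndjson_demo with a stringset column (the string MAP expressions are JsonPath, per the MAP examples below):

bulk insert
    into ndjson_demo (_id, stringset1)
    map ('$.id' id, '$.tags' stringset)
from
    x'{"id": 1, "tags": ["a", "b"]}
{"id": 2}'
with
    format 'NDJSON' allow_missing_values null_as_empty_set
    input 'INLINE';
-- The second record has no "tags" path: ALLOW_MISSING_VALUES maps it to NULL,
-- and NULL_AS_EMPTY_SET then coerces that NULL into [] for the SET column.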

Examples

MAP examples

| Input type | MAP expression for value in source column | Example | Additional information |
|------------|--------------------------------------------|---------|-------------------------|
| CSV | Integer offset | BULK INSERT CSV example | |
| NDJSON | String | BULK INSERT NDJSON example | JsonPath expression for the NDJSON value |
| PARQUET | A string label that exactly matches the column name in the schema within the Parquet file. | BULK INSERT PARQUET example | |
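
A sketch of the PARQUET case, assuming a file /dev/queries/events.parquet whose schema contains columns named id and event (the file, table, and column names are hypothetical); each MAP label must match its schema column name exactly:

bulk insert
    into parquet_demo (_id, string1)
    map ('id' id, 'event' string)  -- labels match the Parquet schema column names
from
    '/dev/queries/events.parquet'
with
    format 'PARQUET'
    input 'FILE';  -- PARQUET does not support INPUT 'INLINE'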

TRANSFORM examples

Map clause:

MAP (0 id, 1 int, 4 string)

This MAP clause defines the variables @0, @1, and @2.

TRANSFORM clause:

TRANSFORM (
    @0 + 10, -- offset the new _id value by 10
    @1,      -- pass through unchanged
    CASE     -- clean up state names
        WHEN @2 = 'Texas' THEN 'TX'
        WHEN @2 = 'California' THEN 'CA'
        ELSE @2
    END
)
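
Assembled into a complete statement with inline data (hypothetical table state_demo); each source row carries five fields so that position 4 exists:

bulk insert
    into state_demo (_id, int1, string1)
    map (0 id, 1 int, 4 string)
    transform (
        @0 + 10, -- offset the new _id value by 10
        @1,      -- pass through unchanged
        case     -- clean up state names
            when @2 = 'Texas' then 'TX'
            when @2 = 'California' then 'CA'
            else @2
        end
    )
from
    x'1,100,a,b,Texas
2,200,c,d,Ohio'
with
    format 'CSV'
    input 'INLINE';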

FROM examples

Using INLINE argument

The contents of an inline string literal are treated as a file and read line-by-line.

Single line string literal

'this is a single-line string literal'

Multi-line string literal

Multi-line literals must be prepended with x:

x'this
is
a
multi-line
string
literal'

Using INLINE with quotation marks

Quotation marks in the FROM clause must be escaped before the BULK INSERT statement is run, even when CSV values are quoted.

| Incorrect | Correct |
|-----------|---------|
| FeatureBase's speed | FeatureBase''s speed |
| ""Time is money." – Benjamin Franklin." | """Time is money."" – Benjamin Franklin." |

BULK INSERT reading from a CSV file

bulk replace
    into insert_test (_id, int1, string1, timestamp1)
    map (0 id, 1 int, 2 string)
    transform (@0, @1, @2, current_timestamp)
from
    '/dev/queries/insert_test.csv'
with
    format 'CSV'
    input 'FILE';
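
Reading from URLs works the same way. A sketch with two hypothetical source URLs, separated by whitespace:

bulk replace
    into insert_test (_id, int1, string1, timestamp1)
    map (0 id, 1 int, 2 string)
    transform (@0, @1, @2, current_timestamp)
from
    'https://example.com/part1.csv https://example.com/part2.csv'
with
    format 'CSV'
    input 'URL';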

BULK INSERT with STREAM

Using INPUT 'STREAM' requires making an HTTP multipart form POST request. The argument to FROM may be '*', in which case all files attached to the request are streamed in and ingested together. The fbsql CLI tool implements the multipart POST; you would use it as follows:

\file myfile.csv
\file myfile2.csv
\file myfile3.csv

bulk replace
  into insert_test (_id, int1, string1, timestamp1)
  map (0 id, 1 int, 2 string)
  transform (@0, @1, @2, current_timestamp)
  from
    '*'
  with
    format 'CSV'
    input 'STREAM';

This would ingest all three files in a single request.

BULK INSERT using TRANSFORM with TUPLE() function

BULK INSERT statement with TUPLE()

BULK INSERT INTO tuple_demo (
    _id,
    time_col,
    stringset_col,
    idset_col,
    stringsetq_col,
    idsetq_col
  )
  MAP (
    0 STRING,
    1 TIMESTAMP,
    2 STRINGSET,
    3 IDSET
  )
  TRANSFORM (
    @0,
    @1,
    @2,
    @3,
    TUPLE(@1,@2),
    TUPLE(@1,@3)
  )
  FROM x'
    A,2013-07-15T01:18:46Z,stringset1,1
    B,2014-07-15T01:18:46Z,stringset2,2
  '
  WITH
    BATCHSIZE 10000
    FORMAT 'CSV'
    INPUT 'INLINE';
