
BULK INSERT statement

BULK INSERT is a SQL statement that:

  • reads data from various sources,
  • maps data locations and data types,
  • performs optional lightweight data transformations,
  • then loads the data into a table using INSERT statements.

In some cases, the data will be pre-sorted by internal FeatureBase partition before being ingested by multiple concurrent routines.


Before you begin

Prepare your data source:

  • file(s)
  • URL(s)
  • inline blob

The data source must be in one of the following formats:

  • CSV
  • NDJSON
  • Apache ORC
  • Apache Parquet


BNF diagrams

Syntax diagrams are provided for the statement as a whole and for the column list, map list, transform clause, and with clause; the equivalent text syntax appears under Syntax below.

Run SHOW TABLE <tablename> to learn the structure of a destination table.

Syntax

BULK INSERT
  INTO
    <table-name> [(<column-list>)]
  MAP
    ({$.<column-name> | <map-id>} <data-type>,...)
  [TRANSFORM
    (expr,...)]
  FROM
    ['<URL>' | '<string-literal>']
  WITH
    [
      [BATCHSIZE <integer-value>]
      [ROWSLIMIT <integer-value>]
      [INPUT ['INLINE' | 'STREAM' | 'URL']]
      [FORMAT
        ['CSV' [HEADER_ROW] [CSV_EMPTY_STRING_AS_NULL] [CSV_NULL_AS_NULL] [NULL_AS_EMPTY_SET]] |
        ['NDJSON' [ALLOW_MISSING_VALUES] [NULL_AS_EMPTY_SET]] |
        ['PARQUET' [NULL_AS_EMPTY_SET]] |
        ['ORC' [NULL_AS_EMPTY_SET]]
      ]
      ...
    ]
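
To make the template concrete, here is a minimal sketch that loads a single inline CSV record. The table name segments and its columns are assumptions for illustration, not part of the reference:

BULK INSERT
  INTO segments (_id, qty)
  MAP (0 ID, 1 INT)
  FROM '1,100'  -- one CSV record: _id 1, qty 100
  WITH
    FORMAT 'CSV'
    INPUT 'INLINE';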

Arguments

| Argument | Description | Required? | Additional information |
|----------|-------------|-----------|------------------------|
| INSERT | INSERTs a row of data identified by an _id value not in the table, or UPDATEs the values in a row where the _id already exists | Yes | Columns not defined in the <column-list> are not updated |
| <table-name> | Name of target table | Yes | |
| <column-list> | Columns belonging to <table-name>, starting with the _id column | Optional | Assumes all columns in <table-name> are to be updated if left blank |

MAP clause

| Argument | Description | Required? | Additional information |
|----------|-------------|-----------|------------------------|
| MAP | Comma-separated list that defines each column in the <column-list> by its ordinal position (or NDJSON path) and data type | Yes | |
| $.<column-name> | Maps values in an NDJSON data source to the target table <column-name> | | MAP clause with NDJSON data source |
| <map-id> | Corresponds to the ordinal position of each column | Yes | _id column is numbered 0 |
| <data-type> | The data type assigned to each column in the CREATE TABLE statement | Yes | Data types |
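
For an NDJSON source, each MAP entry can use a JSON path expression instead of an ordinal position. A hedged sketch (the table, columns, and the quoting of the path expressions are assumptions):

BULK INSERT
  INTO people (_id, name, age)
  MAP ('$._id' ID, '$.name' STRING, '$.age' INT)  -- path expressions in place of ordinals
  FROM '{"_id": 1, "name": "Ann", "age": 40}'
  WITH
    FORMAT 'NDJSON'
    INPUT 'INLINE';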

TRANSFORM clause

| Argument | Description | Required? | Additional information |
|----------|-------------|-----------|------------------------|
| TRANSFORM | Comma-separated list of expressions, one per column, where @<map-id> references the corresponding value from the MAP clause | Optional | |
| expr | Any valid function or operator used to manipulate values identified by @<map-id> | Optional | |

FROM clause

| Argument | Description | Required? | Additional information |
|----------|-------------|-----------|------------------------|
| FROM | A single- or multi-line string literal that specifies the data source defined by the WITH...INPUT clause | Yes | |
| '<URL>' | One or more valid URLs that link to valid data sources | Required for INPUT 'URL' | Define multiple URLs as string literals |
| '<string-literal>' | CSV or NDJSON records as a string literal | Required for INPUT 'INLINE' | Not supported for FORMAT 'PARQUET'; see String literals |
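
Because multiple URLs are defined as string literals, a multi-line literal can carry them. A sketch that assumes one placeholder URL per line and a hypothetical table:

BULK INSERT
  INTO csv_target (_id, name)
  MAP (0 ID, 1 STRING)
  FROM x'https://example.com/part1.csv
https://example.com/part2.csv'  -- one URL per line
  WITH
    FORMAT 'CSV' HEADER_ROW
    INPUT 'URL';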

WITH clause

| Argument | Description | Required? | Additional information |
|----------|-------------|-----------|------------------------|
| WITH | Pass one or more statement-level options | Optional | |

Ingest arguments

| Argument | Description | Default | Additional information |
|----------|-------------|---------|------------------------|
| BATCHSIZE | Specify the batch size of the BULK commit | 1000 | If batching is not available on the client, use STREAM instead |
| ROWSLIMIT | Limit the number of rows processed in a batch | | |
| CONCURRENCY | Number of concurrent workers that ingest the data after it has been presorted | 8 | Can be used with CSV and NDJSON, which can be presorted. Not supported for PARQUET and ORC |
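
These options slot into the WITH clause alongside FORMAT and INPUT. A sketch with arbitrary values, where the URL and table are placeholders and the CONCURRENCY placement assumes the same option-value pattern as BATCHSIZE:

BULK INSERT
  INTO csv_target (_id, name)
  MAP (0 ID, 1 STRING)
  FROM 'https://example.com/people.csv'
  WITH
    BATCHSIZE 100000   -- commit in batches of 100,000 rows
    CONCURRENCY 4      -- four concurrent ingest workers (CSV data can be presorted)
    FORMAT 'CSV'
    INPUT 'URL';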

INPUT arguments

| Argument | Description | Required? | Additional information |
|----------|-------------|-----------|------------------------|
| INPUT | Input values must match those used in the FROM clause | | |
| 'INLINE' | Used for data included directly in the FROM clause, with the contents of the literal read as though they were in a file | Required for FROM '<string-literal>' | Not supported for PARQUET and ORC. See INPUT additional |
| 'STREAM' | Supports a streaming payload using an HTTP multipart POST | Optional | BULK INSERT with STREAM |
| 'URL' | Used when the FROM clause supplies one or more URLs | Required for FROM '<URL>' | |
| FORMAT | Supported data source formats are CSV, NDJSON, ORC, and PARQUET | Optional | INPUT 'INLINE' does not support PARQUET or ORC |

FORMAT arguments

The following arguments are optional.

| Argument | Data source | Description | Additional information |
|----------|-------------|-------------|------------------------|
| ALLOW_MISSING_VALUES | NDJSON | Output a NULL value from the MAP clause if the path expression fails | NDJSON value assignment |
| CSV_EMPTY_STRING_AS_NULL | CSV | Assign "" values as null | CSV value assignment |
| CSV_NULL_AS_NULL | CSV | Assign NULL values as null | CSV value assignment |
| HEADER_ROW | CSV | Ignore the header row | |
| NULL_AS_EMPTY_SET | All | MAP any NULL values from the data source to a SET column without error | SET and SETQ data types |
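
As an illustration of the CSV flags, the sketch below (hypothetical table) ingests one record with an empty field and one with the literal NULL; with both flags set, name is stored as null in both rows:

BULK INSERT
  INTO csv_target (_id, name)
  MAP (0 ID, 1 STRING)
  FROM x'1,
2,NULL'
  WITH
    FORMAT 'CSV' CSV_EMPTY_STRING_AS_NULL CSV_NULL_AS_NULL
    INPUT 'INLINE';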

Additional information

Any variables are evaluated during execution for each row.

INPUT additional

String literals

  • 'Single quotes surround all string literals'
  • 'A single quote that''s contained in a string literal is escaped with a second single quote'
  • 'A string literal containing ""double-quotes"" can be escaped using a second double-quote'
  • Multi-line string literals are prefixed with x:
x'<first line>
<second line>'

The closing quote of a multi-line string literal must follow the final value
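
Putting these rules together, a sketch of a multi-line inline CSV payload (table and values are illustrative); note that the closing quote directly follows the final value:

BULK INSERT
  INTO states (_id, abbr)
  MAP (0 ID, 1 STRING)
  FROM x'1,TX
2,CA
3,NY'
  WITH
    FORMAT 'CSV'
    INPUT 'INLINE';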

CSV data source value assignment

| Literal value | Target data type | Result | Additional information |
|---------------|------------------|--------|------------------------|
| ,, or ,"", | All data types unless stated | NULL | |
| ,, or ,"", | STRING | '' (empty string) | NULL when WITH CSV_EMPTY_STRING_AS_NULL is defined |
| ,, or ,"", | SET and SETQ | NULL | Empty set [] when WITH NULL_AS_EMPTY_SET is defined |
| ,NULL, | All data types unless stated | 'NULL' (string literal) | NULL when WITH CSV_NULL_AS_NULL is defined |

NDJSON data source value assignment

| Literal value | Target data type | Result | Additional information |
|---------------|------------------|--------|------------------------|
| "" | STRING | '' (empty string) | |
| "" | STRINGSET and STRINGSETQ | [''] (set with empty string member) | |
| null | All data types unless stated | NULL | Not supported for |
| [] | SET and SETQ | [] (empty set) | |
| Value missing | All data types unless stated | NULL | Occurs only when ALLOW_MISSING_VALUES is defined |
| Value missing | SET and SETQ | NULL | Empty set [] when FORMAT includes NULL_AS_EMPTY_SET and ALLOW_MISSING_VALUES |
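
To illustrate the missing-value rows (table and column names are assumptions), the second record below omits age; with ALLOW_MISSING_VALUES the failed path expression yields NULL rather than an error:

BULK INSERT
  INTO people (_id, name, age)
  MAP ('$._id' ID, '$.name' STRING, '$.age' INT)
  FROM x'{"_id": 1, "name": "Ann", "age": 40}
{"_id": 2, "name": "Bob"}'  -- no age field: stored as NULL
  WITH
    FORMAT 'NDJSON' ALLOW_MISSING_VALUES
    INPUT 'INLINE';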

Examples

BULK INSERT examples

The following SQL examples demonstrate BULK INSERT using different clauses and data sources.

| Statement | Target table | FROM clause | TRANSFORM clause | WITH clause |
|-----------|--------------|-------------|------------------|-------------|
| BULK INSERT using CSV data source | csv-target | URL | | BATCHSIZE, FORMAT, INPUT, HEADER_ROW |
| BULK INSERT using NDJSON data source | ndjson-target | URL | Includes CAST function | BATCHSIZE, FORMAT, INPUT, ALLOW_MISSING_VALUES |
| BULK INSERT using ORC data source | orc-target | URL | | FORMAT, INPUT |
| BULK INSERT using Apache Parquet data source | parquet-target | URL | | FORMAT, INPUT |
| BULK INSERT using inline CSV | tuple-set-target | Inline CSV | Combines column values using TUPLE function | FORMAT, INPUT |

TRANSFORM with operators and expressions

BULK INSERT INTO doctest (_id, zip-code, state-name)
MAP (
  0 ID,
  1 INT,
  2 STRING)
TRANSFORM (
    @0 + 10, -- offset the new _id value by 10
    @1,      -- pass through unchanged
    CASE     -- clean up state names
        WHEN @2 = 'Texas' THEN 'TX'
        WHEN @2 = 'California' THEN 'CA'
        ELSE @2
    END
)
FROM x'1,78701,Texas
2,90210,California' -- illustrative inline rows so the statement runs end to end
WITH
  FORMAT 'CSV'
  INPUT 'INLINE';

BULK INSERT with STREAM

Using INPUT 'STREAM' requires an HTTP multipart form POST request. The argument to FROM may be '*', in which case all files attached to the request are streamed in and ingested together. The fbsql CLI tool implements the multipart POST; use it as follows:

\file myfile.csv
\file myfile2.csv
\file myfile3.csv

bulk replace
  into insert_test (_id, int1, string1, timestamp1)
  map (0 id, 1 int, 2 string)
  transform (@0, @1, @2, current_timestamp)
  from
    '*'
  with
    format 'CSV'
    input 'STREAM';