BULK INSERT statement
BULK INSERT is a SQL statement that:
- reads data from various sources,
- maps data locations and data types,
- performs optional lightweight data transformations,
- then loads the data into a table using INSERT statements.
In some cases, the data will be pre-sorted by internal FeatureBase partition before being ingested by multiple concurrent routines.
Sources can include:
- file(s)
- URL(s)
- inline blob
Supported formats include:
- CSV
- PARQUET
- NDJSON
Syntax
BULK INSERT
INTO
table_name [(column_name,...)]
MAP (position type_name,...)
[TRANSFORM (expr,...)]
FROM
['path/file_name' | 'URL' | x'records']
WITH
[
[BATCHSIZE integer_literal]
[ROWSLIMIT integer_literal]
[CONCURRENCY integer_literal]
[INPUT ['FILE' | 'INLINE' | 'STREAM' | 'URL']]
[FORMAT
['CSV' [HEADER_ROW] [CSV_EMPTY_STRING_AS_NULL] [CSV_NULL_AS_NULL] [NULL_AS_EMPTY_SET]] |
['NDJSON' [ALLOW_MISSING_VALUES] [NULL_AS_EMPTY_SET]] |
['PARQUET' [NULL_AS_EMPTY_SET]]
]
...
]
Arguments
Argument | Description | Required? | Further information |
---|---|---|---|
INSERT | Inserts new records if the _id does not exist, otherwise updates the record with the values passed. Values are not updated for missing columns. | Yes | REPLACE may be used instead; the functionality is identical. |
table_name | Name of target table | Yes | |
column_name | Valid columns belonging to table_name. The first column must be the defined _id column. | Optional | System builds a column list from existing columns in table_name if columns are not specified. |
MAP | Specifies how source data is mapped from its location and what datatype to output as. Values from the MAP clause are inserted to columns specified in the column_list . | Yes | Map examples |
position | Ordinal position of the value in the source. | | |
type_name | Data type of the value in the source. | | Data types |
TRANSFORM expr | One or more SQL expressions with dependencies on column_list and the MAP clause | Optional | Transform additional |
FROM | A single or multi-line string literal that specifies the source of data and is interpreted based on the INPUT option. | Yes | |
'path/file_name' | Valid path and file name for data source. | Optional | Not available for FeatureBase Cloud. |
'URL' | Valid URL(s) for data sources. | Optional | Multiple URLs may be passed, separated by whitespace. If newlines separate the URLs, prefix the literal with x, as in x'URL<newline>URL' |
x'records' | CSV or NDJSON records as a string literal. | Required for INLINE | Not supported for FORMAT 'PARQUET' |
WITH | Pass one or more statement level options. | Optional | |
BATCHSIZE | Specify the batch size of the BULK commit. Defaults to 1000. | Optional | Can be used with STREAM to batch records as they are streamed to the server when batching is not available on the client. |
ROWSLIMIT | Limit the number of rows processed in a batch. | Optional | |
INPUT | Input value must match the source type used in the FROM clause. | Optional | |
'INLINE' | Used for data included directly in the FROM clause; the contents of the literal are read as though they were in a file. | Required for FROM x'records'. Not supported for FORMAT 'PARQUET'. | INLINE quotation marks |
'STREAM' | STREAM supports a streaming payload using an HTTP multipart POST. | Optional | BULK INSERT with STREAM |
FORMAT | Set the format of the source data to 'CSV', 'NDJSON' or 'PARQUET'. | Optional | 'PARQUET' does not support INPUT 'INLINE'. |
CONCURRENCY | Number of concurrent workers to ingest the data after it has been presorted. Default 8 . | Optional | Only applies to CSV and NDJSON currently as PARQUET does not yet presort. |
NULL_AS_EMPTY_SET | Coerces all NULL values resulting from the MAP clause into [] (empty sets) for all target columns with SET data types. | Optional | |
HEADER_ROW | CSV argument that skips the header row in the source CSV file. | Optional | |
CSV_EMPTY_STRING_AS_NULL | CSV argument that assigns the empty string value "" as NULL. | Optional | |
CSV_NULL_AS_NULL | CSV argument that assigns the literal value NULL as NULL. | Optional | |
ALLOW_MISSING_VALUES | NDJSON argument that outputs a NULL value from the MAP clause if the path expression fails. | Optional |
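For example, several of these options can be combined when loading from URL sources. A minimal sketch, assuming an illustrative table url_demo and two hypothetical CSV files hosted at example.com:
bulk insert
into url_demo (_id, int1, string1)
map (0 id, 1 int, 2 string)
from
x'https://example.com/part1.csv
https://example.com/part2.csv'
with
format 'CSV'
input 'URL'
batchsize 20000
concurrency 4;
The x prefix allows the two URLs to be separated by a newline; BATCHSIZE and CONCURRENCY tune the commit size and the number of concurrent ingest workers.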
Additional information
Transform additional
The TRANSFORM clause must include:
- SQL expressions that match those in the column_list, and
- data transformations specified using ordinal positions defined in the MAP clause
Any variables are evaluated during execution for each row.
TRANSFORM with TUPLE() function
When the TUPLE() function is used in a TRANSFORM clause, the following values are returned:
Source data type | Returns |
---|---|
IDSETQ | TIMESTAMP and associated integer value |
STRINGSETQ | TIMESTAMP and associated string value |
CSV Value Assignment
There are special assignments for certain literal values when inserting CSV data.
Literal Value | Target Data Type | Resultant | Further information |
---|---|---|---|
,, or ,"", | All unless explicitly listed | NULL | |
,, or ,"", | string | '' (empty string) | if CSV_EMPTY_STRING_AS_NULL is used, the resultant becomes NULL |
,, or ,"", | stringset idset stringsetq idsetq | NULL | if NULL_AS_EMPTY_SET is used, the resultant becomes [] (empty set) |
,NULL, | All unless explicitly listed | 'NULL' (string literal) | if CSV_NULL_AS_NULL is used, the resultant becomes NULL |
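A minimal sketch showing these CSV options together, using an inline source (the table and column names are illustrative):
bulk insert
into csv_null_demo (_id, string1, stringset1)
map (0 id, 1 string, 2 stringset)
from x'1,,
2,NULL,'
with
format 'CSV' csv_empty_string_as_null csv_null_as_null null_as_empty_set
input 'INLINE';
With these options, string1 loads as NULL in both rows (rather than '' and the string 'NULL'), and stringset1 loads as [] rather than NULL in both rows.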
NDJSON Value Assignment
There are special assignments for certain literal values when inserting NDJSON data.
Literal Value | Target Data Type | Resultant | Further information |
---|---|---|---|
"" | string | '' (empty string) | |
"" | stringset stringsetq | [''] (set with empty string member) | |
null | All unless explicitly listed | NULL | |
[] | stringset idset stringsetq idsetq | [] (empty set) | |
Value missing | All unless explicitly listed | NULL | This will only occur if using ALLOW_MISSING_VALUES |
Value missing | stringset idset stringsetq idsetq | NULL | if NULL_AS_EMPTY_SET is used, the resultant becomes [] (empty set). This will only occur if using ALLOW_MISSING_VALUES |
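A minimal sketch of this behavior, assuming JsonPath-style MAP expressions (see MAP examples below) and illustrative names:
bulk insert
into ndjson_null_demo (_id, string1, stringset1)
map ('$.id' id, '$.name' string, '$.tags' stringset)
from x'{"id": 1, "name": "", "tags": []}
{"id": 2, "name": null}'
with
format 'NDJSON' allow_missing_values null_as_empty_set
input 'INLINE';
Row 1 loads name as '' and tags as []. In row 2, name loads as NULL and the missing tags value becomes [] because ALLOW_MISSING_VALUES and NULL_AS_EMPTY_SET are both set.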
Examples
MAP examples
Input type | MAP expression for value in source column | Example | Additional information |
---|---|---|---|
CSV | Integer offset | BULK INSERT CSV example | |
NDJSON | String | BULK INSERT NDJSON example | JsonPath expression for the NDJSON value |
PARQUET | A string label that precisely matches the column name in the schema within the parquet file. | BULK INSERT PARQUET example |
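For PARQUET, the MAP labels must match the schema column names exactly. A minimal sketch (the file URL and the schema columns id, amount and name are hypothetical):
bulk insert
into parquet_demo (_id, int1, string1)
map ('id' id, 'amount' int, 'name' string)
from 'https://example.com/data.parquet'
with
format 'PARQUET'
input 'URL';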
TRANSFORM examples
Map clause | TRANSFORM clause |
---|---|
MAP (0 id, 1 int, 2 string, 3 timestamp, 4 idset) | Variables: @0, @1, @2, @3 and @4 |
TRANSFORM (
@0 + 10, -- offset the new _id value by 10
@1, -- pass through unchanged
TUPLE(@3,@4), -- create tuple to load time quantum fields
CASE -- clean up state names
WHEN @2 = 'Texas' THEN 'TX'
WHEN @2 = 'California' THEN 'CA'
ELSE @2
END
)
FROM examples
Using INLINE argument
The contents of an inline string literal are treated as a file and read line-by-line.
Single line string literal
'this is a single-line string literal'
Multi-line string literal (prepend with x)
x'this
is
a
multi-line
string
literal'
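In context, a multi-line literal supplies the rows of an inline CSV source. A minimal sketch with an illustrative table:
bulk insert
into inline_demo (_id, string1)
map (0 id, 1 string)
from x'1,first
2,second
3,third'
with
format 'CSV'
input 'INLINE';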
Using INLINE with quotation marks
Quotation marks in the FROM clause must be escaped before the BULK statement is run, even when CSV values are quoted.
Incorrect | Correct |
---|---|
FeatureBase's speed | FeatureBase''s speed |
""Time is money." – Benjamin Franklin." | """Time is money."" – Benjamin Franklin." |
BULK INSERT reading from a CSV file
bulk replace
into insert_test (_id, int1, string1, timestamp1)
map (0 id, 1 int, 2 string)
transform (@0, @1, @2, current_timestamp)
from
'/dev/queries/insert_test.csv'
with
format 'CSV'
input 'FILE';
BULK INSERT with STREAM
Using INPUT 'STREAM' requires making an HTTP multipart form POST request. The argument to FROM may be '*', in which case all files attached to the request are streamed in and ingested together. The fbsql CLI tool implements the multipart POST; use it as follows:
\file myfile.csv
\file myfile2.csv
\file myfile3.csv
bulk replace
into insert_test (_id, int1, string1, timestamp1)
map (0 id, 1 int, 2 string)
transform (@0, @1, @2, current_timestamp)
from
'*'
with
format 'CSV'
input 'STREAM';
This would ingest all three files in a single request.
BULK INSERT using TRANSFORM with TUPLE() function
BULK INSERT statement with TUPLE()
BULK INSERT INTO tuple_demo(
_id,
time_col,
stringset_col,
idset_col,
stringsetq_col,
idsetq_col
)
MAP (
0 STRING,
1 TIMESTAMP,
2 STRINGSET,
3 IDSET
)
TRANSFORM(
@0,
@1,
@2,
@3,
TUPLE(@1,@2),
TUPLE(@1,@3)
)
FROM x'
A,2013-07-15T01:18:46Z,stringset1, 1
B,2014-07-15T01:18:46Z,stringset2, 2
'
WITH
BATCHSIZE 10000
FORMAT 'CSV'
INPUT 'INLINE';
Further information
- BULK INSERT using CSV file
- BULK INSERT using NDJSON data source
- BULK INSERT using PARQUET data source