Number of records to read before indexing them as a batch. Larger values improve throughput but use more memory. Recommended: 1,048,576
1
--concurrency
int
Number of concurrent sources and indexing routines to launch. Not supported for SQL ingestion or with --auto-generate.
1
When ingesting multiple CSV files
--featurebase-hosts
string
Comma-separated list of host:port pairs for the FeatureBase default bind points.
[localhost:10101]
--index
string
Name of the target FeatureBase index.
Yes
--string-array-separator
string
Character used to separate values in a string array.
,
--use-shard-transactional-endpoint
Use an alternate import endpoint that ingests data for all fields in a shard in a single atomic request. Provides better consistency with no negative performance impact.
--auto-generate
Automatically generate IDs. Intended for testing. Cannot be used with --concurrency.
When neither --id-field nor --primary-key-fields is defined
--external-generate
Allocate _id using the FeatureBase ID allocator. Supports --offset-mode. Requires --auto-generate
--id-alloc-key-prefix
string
Prefix for ID allocator keys when using --external-generate. Each concurrent ingester requires a different value.
ingest
--id-field
string
Field containing positive integers that uniquely identify each record. Use instead of --primary-key-fields.
If neither --auto-generate nor --primary-key-fields is defined
--primary-key-fields
string
Values in the listed fields are converted to strings and used as the unique record _id. A single key field is not added to the target index as a field. With multiple key fields, the values are concatenated using / and the fields are added to the target index. Use instead of --id-field.
[]
If neither --auto-generate nor --id-field is defined
--offset-mode
Use offset-mode-based autogenerated IDs. Requires --auto-generate and --external-generate.
When ingesting from an offset-based data source
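The --primary-key-fields description says that, with multiple key fields, the values are concatenated using / to form the record _id. The POSIX shell sketch below models only that joining step, so you can predict what a composite _id looks like. The helper name `pk_to_id` and the sample values are hypothetical, and this is not the ingester's own implementation (it does not model string conversion or any escaping of / inside values).

```shell
# Hypothetical helper: model how a composite _id is formed when several
# --primary-key-fields are listed. Each argument is one key value, given
# in the same order as the key fields.
pk_to_id() {
    # Run in a subshell so the IFS change does not leak to the caller.
    # "$*" joins all arguments with the first character of IFS.
    (IFS=/; printf '%s\n' "$*")
}

# Two hypothetical key values, e.g. a region and an asset tag.
pk_to_id US A-1001   # prints US/A-1001
# A single key value has nothing to join.
pk_to_id A-1001      # prints A-1001
```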
Batch processing flags
flag
data type
Description
Default
Required
--batch-size
int
Number of records to read before indexing them as a batch. Larger values improve throughput but use more memory. Recommended: 1,048,576
1
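To make the --batch-size trade-off concrete, the sketch below groups input lines into fixed-size batches and reports each flush, including the final partial batch. It is a hypothetical model in POSIX shell (the function name `batch_lines` and its output format are assumptions, not ingester behaviour); it only illustrates that a larger batch holds more records before each flush, which is where the extra memory use described above comes from.

```shell
# Hypothetical model of --batch-size: treat each input line as one record
# and "flush" whenever the batch reaches the configured size.
batch_lines() {
    size=$1     # stands in for --batch-size
    count=0     # records currently held in the batch
    batch=1     # number of the batch being filled
    while IFS= read -r record; do
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "batch $batch: $count record(s)"
            batch=$((batch + 1))
            count=0
        fi
    done
    # Flush whatever remains once the input is exhausted.
    if [ "$count" -gt 0 ]; then
        echo "batch $batch: $count record(s)"
    fi
}

# 5 records with a batch size of 2 are flushed as batches of 2, 2 and 1.
printf 'r1\nr2\nr3\nr4\nr5\n' | batch_lines 2
```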
Error handling flags
flag
data type
Description
Default
Required
--allow-decimal-out-of-range
Allow ingest to continue when it encounters out-of-range decimal values in Decimal fields.
false
--allow-int-out-of-range
Allow ingest to continue when it encounters out-of-range integer values in Int fields.
false
--allow-timestamp-out-of-range
Allow ingest to continue when it encounters out-of-range timestamps in Timestamp fields.
false
--batch-max-staleness
duration
Maximum time the oldest record in a batch can wait before the batch is flushed. Setting this flag may result in timeouts while waiting for the source.
--commit-timeout
duration
A commit informs the data source that the current batch of records has been ingested. --commit-timeout is the maximum time the commit process can run before it is cancelled. May not work for CSV ingestion.
--skip-bad-rows
int
Fail the ingest process if n rows cannot be processed.
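Reading the --batch-size and --batch-max-staleness descriptions together, a batch is presumably flushed when either limit is reached first: the batch is full, or its oldest record has waited too long. The sketch below models that decision with whole seconds. The function name `should_flush` and the integer-second arithmetic are assumptions for illustration; the real flag takes a duration value.

```shell
# Hypothetical flush check combining --batch-size and --batch-max-staleness.
# Times are whole seconds (e.g. from `date +%s`), a simplification of the
# duration value the real flag accepts.
#   $1 = records in the current batch
#   $2 = batch size limit (--batch-size)
#   $3 = arrival time of the oldest record in the batch
#   $4 = current time
#   $5 = maximum staleness in seconds (--batch-max-staleness)
# Returns 0 to flush, 1 to keep filling the batch.
should_flush() {
    count=$1 size=$2 oldest=$3 now=$4 max_stale=$5
    # Size limit reached.
    [ "$count" -ge "$size" ] && return 0
    # Non-empty batch whose oldest record has waited long enough.
    [ "$count" -gt 0 ] && [ $((now - oldest)) -ge "$max_stale" ] && return 0
    return 1
}

# 10 records buffered, limit 1000, oldest record 45 s old, 30 s staleness.
if should_flush 10 1000 100 145 30; then echo flush; else echo wait; fi   # prints flush
```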
Missing values and empty strings are handled the same way.
Field data type
Expected behaviour
"ID"
Error if the field is used as the --id-field. Otherwise, do not update the value in the index.
"DateInt"
Raise an error during ingestion: the timestamp must have a valid value.
"Timestamp"
Raise an error during ingestion: the input is not a valid time.
"RecordTime"
Do not update the value in the index.
"Int"
Do not update the value in the index.
"Decimal"
Do not update the value in the index.
"String"
Error if the field is listed in --primary-key-fields. Otherwise, do not update the value in the index.
"Bool"
Do not update the value in the index.
"StringArray"
Do not update the value in the index.
"IDArray"
Do not update the value in the index.
"ForeignKey"
Do not update the value in the index.
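The missing-value table can be restated as a small lookup, which is handy when checking a schema before an ingest. The sketch below is a hypothetical restatement of the table in POSIX shell; the function name `missing_value_action` and the role names `id`, `pk` and `value` are assumptions, not ingester options.

```shell
# Hypothetical lookup restating the missing-value table:
#   $1 = field data type, e.g. Int
#   $2 = role of the field: "id" (--id-field), "pk" (--primary-key-fields)
#        or "value" (any other field)
# Prints "error" if a missing or empty value fails the ingest, or
# "unchanged" if the value in the index is simply not updated.
missing_value_action() {
    case $1 in
        ID)        if [ "$2" = id ]; then echo error; else echo unchanged; fi ;;
        String)    if [ "$2" = pk ]; then echo error; else echo unchanged; fi ;;
        DateInt|Timestamp) echo error ;;
        RecordTime|Int|Decimal|Bool|StringArray|IDArray|ForeignKey) echo unchanged ;;
        *)         echo unknown ;;
    esac
}

missing_value_action String pk     # prints error
missing_value_action Int value     # prints unchanged
```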
Examples
Ingest data from a SQL Server database
The SELECT statement queries the SQL Server table, then the ingester:
converts the data to Roaring Bitmap format
imports the records to the my_data FeatureBase index.
molecula-consumer-sql \
  --connection-string 'server=sqldb.myserver.com;userid=mysqlusername;password=secret;database=mydbname' \
  --featurebase-hosts 10.0.0.1:10101 \
  --batch-size 1000000 \
  --driver=mssql \
  --index=my_data \
  --id-field=id \
  --row-expr 'SELECT TOP 10 tableID AS id__ID, zipcode AS zipcode__String FROM <table_name>'
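In the --row-expr query above, each column alias (id__ID, zipcode__String) appears to combine a FeatureBase field name with one of the field data types from the missing-value table, separated by a double underscore. Assuming that name__Type convention, the sketch below splits an alias at its last double underscore. The helper name `split_alias` is hypothetical, and it ignores any additional type arguments the ingester may accept.

```shell
# Hypothetical helper: split a --row-expr column alias of the form
# name__Type into its field name and data type.
split_alias() {
    alias_name=$1
    case $alias_name in
        *__*) ;;
        *) echo "no __Type suffix in '$alias_name'" >&2; return 1 ;;
    esac
    field=${alias_name%__*}   # remove shortest trailing "__..." -> field name
    ftype=${alias_name##*__}  # remove longest leading "...__" -> data type
    printf 'field=%s type=%s\n' "$field" "$ftype"
}

split_alias id__ID            # prints field=id type=ID
split_alias zipcode__String   # prints field=zipcode type=String
```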
Ingest data from a MySQL assets table
The SELECT statement queries the MySQL assets table, then the ingester:
converts the records to Roaring Bitmap format
imports the records to the asset_list FeatureBase index.
molecula-consumer-sql \
  --driver mysql \
  --connection-string 'username:password@(127.0.0.1:3306)/dbname' \
  --featurebase-hosts localhost:10101 \
  --batch-size 10000 \
  --index=asset_list \
  --primary-key-fields 'asset_tag' \
  --row-expr 'SELECT asset_tag as asset_tag__String, weight as weight__Int, warehouse as warehouse__String FROM assets'
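Because the --row-expr query contains spaces and commas, it must reach the ingester as a single quoted argument. One way to check the quoting before running an ingest is a dry run: collect the arguments with `set --` and print one per line. This sketch reuses the flags from the assets example and does not execute molecula-consumer-sql.

```shell
# Dry run: collect the arguments from the assets example and print them one
# per line; each bracketed line is exactly one argument the ingester receives.
set -- \
    --driver mysql \
    --connection-string 'username:password@(127.0.0.1:3306)/dbname' \
    --featurebase-hosts localhost:10101 \
    --batch-size 10000 \
    --index=asset_list \
    --primary-key-fields 'asset_tag' \
    --row-expr 'SELECT asset_tag as asset_tag__String, weight as weight__Int, warehouse as warehouse__String FROM assets'

echo "molecula-consumer-sql would receive $# arguments:"
printf '  [%s]\n' "$@"
# To run the ingest instead, replace the two lines above with:
#   molecula-consumer-sql "$@"
```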
Join Assets and Events tables into a single FeatureBase index
The SELECT statement queries the MySQL events and assets tables to:
join the tables on assets.asset_tag = events.asset_tag
return event data along with the weight of the matching asset
create a locale field from the first three characters of the events.pk field.
The ingester then:
converts the records to Roaring Bitmap format
imports the records to the events_plus_weight FeatureBase index.
molecula-consumer-sql \
  --driver mysql \
  --connection-string 'username:password@(127.0.0.1:3306)/dbname' \
  --featurebase-hosts localhost:10101 \
  --batch-size 10000 \
  --index=events_plus_weight \
  --primary-key-fields 'pk' \
  --row-expr 'SELECT events.pk as pk__String, events.asset_tag as asset_tag__String, assets.weight as weight__Int, SUBSTRING(events.pk, 1, 3) as locale__String FROM events INNER JOIN assets on assets.asset_tag = events.asset_tag'
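SUBSTRING(events.pk, 1, 3) uses 1-based positions, so it returns the first three characters of pk, or the whole value if pk is shorter. To predict locale values outside the database, for example when spot-checking the index, the sketch below reproduces that with cut, whose positions are also 1-based and inclusive. The helper name `locale_of` and the sample keys are hypothetical, and the sketch assumes ASCII keys, since some cut implementations count bytes rather than characters.

```shell
# Hypothetical helper: reproduce SUBSTRING(events.pk, 1, 3) for one pk value.
# cut -c1-3 selects positions 1 through 3 (1-based, inclusive), matching
# SQL's SUBSTRING(str, 1, 3) for ASCII input.
locale_of() {
    printf '%s\n' "$1" | cut -c1-3
}

locale_of USA-000123   # prints USA
locale_of UK           # prints UK (shorter keys are returned whole)
```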