fbsql loader TOML configuration
The fbsql loader command relies on an appropriately formatted TOML configuration file that contains:
- The FeatureBase target table to insert data into
- Connection settings for an Apache Impala, Apache Kafka, or PostgreSQL data source
- An optional series of key/value pairs that correspond to target table columns
Before you begin
- Learn about TOML format
- Learn about Apache Impala
- Learn about Apache Kafka Confluent Consumer
- Learn about PostgreSQL
- Learn about “docopt” notation standards used in this guide
- Learn about fbsql
- Create a destination table
TOML configuration syntax
```toml
# Kafka keys
hosts = ["<address:port>",...]
group = "<kafka-confluent-group>"
topics = "<kafka-confluent-topics>"

# Impala and PostgreSQL connection keys
driver = "<datasource-type>"
connection-string = "<datasource-type>://<datasource-connection-string>"

# Data keys
table = "<target-table>"
query = "<select-from-impala-or-postgresql-data-source>"

# Ingest batching keys
batch-size = <integer-value>
batch-max-staleness = "<integer-value><time-unit>"
timeout = "<integer-value><time-unit>"

# Optional target table keys
[[fields]]
name = "<target-table-column>"
source-type = "<target-table-column-data-type>"
source-column = "<target-table-column>"
[primary-key = "true"]
[source-path = ["<kafka-json-parent-key>", "<json-child-key>"]]
```
Kafka keys
Key | Description | Required | Additional information |
---|---|---|---|
hosts | One or more Kafka Confluent consumer hosts. Use [] for multiple hosts | Kafka | Confluent Hosts documentation |
group | Kafka consumer group | Kafka | Confluent Hosts documentation |
topics | One or more Kafka topics | Kafka | Confluent Hosts documentation |
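For illustration, a minimal Kafka connection sketch might look like the following; the broker address, group, and topic names are hypothetical placeholders, not defaults:

```toml
# Hypothetical Kafka connection: one local broker, one topic
hosts = ["localhost:9092"]
group = "fbsql-loader"
topics = "events"
```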
Impala and PostgreSQL connection keys
Key | Description | Required | Additional information |
---|---|---|---|
driver | Driver required for data source | Impala or PostgreSQL | |
connection-string | Quoted connection string that includes the data source type | Impala or PostgreSQL | Data source connection strings |
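As a sketch, a PostgreSQL source might be configured as follows. The driver name and URI shown are assumptions based on common PostgreSQL connection-string conventions; consult the data source connection strings reference for exact values:

```toml
# Hypothetical PostgreSQL source connection
driver = "postgres"
connection-string = "postgres://user:password@localhost:5432/mydb"
```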
Data keys
Key | Description | Required | Additional information |
---|---|---|---|
table | Double-quoted target table to insert data into | Yes | CREATE TABLE statement |
query | Valid SQL query to SELECT data from the data source for insertion into the target table | Impala or PostgreSQL | |
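For example, the data keys for copying two columns from a hypothetical source table into a FeatureBase table named customer_segments might look like this (both table and column names are placeholders):

```toml
# Hypothetical data keys: target table plus source query
table = "customer_segments"
query = "SELECT id, segment FROM source_customers"
```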
Ingest batching keys
Data is collected into batches before importing to FeatureBase. Default values are used if batching keys are not supplied.
Key | Description | Required | Default | Additional information |
---|---|---|---|---|
batch-size | Integer value representing the maximum number of records collected in a batch before it is imported | Yes | 1 | Batch keys |
batch-max-staleness | Maximum length of time the oldest record in a batch can exist before the batch is flushed | Kafka | | Batch keys |
timeout | Time to wait before batch is flushed | Kafka | "1s" | Batch keys |
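As an illustrative sketch, a batching configuration might combine all three keys as follows; the values shown are arbitrary examples, not recommendations:

```toml
# Flush after 100000 records, 5 minutes of staleness, or a 30 second wait
batch-size = 100000
batch-max-staleness = "5m"
timeout = "30s"
```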
Optional target table keys
Run SHOW CREATE TABLE <tablename> to output the column names and data types required for [[fields]] key-values.
FeatureBase will supply values from the specified table key if [[fields]] key/values are not supplied.
Key | Description | Required | Additional information |
---|---|---|---|
name | Target column name | Yes | |
source-type | Target column data type | Yes | FeatureBase data types; Configuring Record Time for Time Quantum fields |
source-path | Nested JSON object parent and child keys | Kafka | Defaults to name value when not supplied |
source-column | Source column name | Optional | When omitted, the order of [[fields]] key-values is correlated to the columns in <target-table> |
primary-key | Set to "true" for the FeatureBase _id column | Only for _id column | Omit for other columns |
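A sketch of a [[fields]] configuration for a hypothetical table with an _id column and a segment string column might look like this; the column names are placeholders:

```toml
# Hypothetical mapping of source columns to target columns
[[fields]]
name = "_id"
source-type = "string"
source-column = "id"
primary-key = "true"

[[fields]]
name = "segment"
source-type = "string"
source-column = "segment"
```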
source-type
Specifies the FeatureBase column type the incoming data will be formatted as. For example, if a Kafka message contains "foo":"6", the configuration for foo should contain source-type = "string" even if the foo column in FeatureBase is an int type. If a source-type is not provided, it defaults to the FeatureBase field's type.
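Continuing the foo example above, the corresponding field mapping might be written as:

```toml
# "foo":"6" arrives as a JSON string, so source-type is "string"
# even though the target foo column is an int type
[[fields]]
name = "foo"
source-type = "string"
```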
Configuring Record Time for Time Quantum fields
To load data into set type columns with the TIMEQUANTUM option, the loader configuration should define the mapped fields with source-type set to StringSetQ or IDSetQ. An optional timestamp field can also be defined in the loader configuration to specify a record time for these time quantum set fields; this special timestamp field must be defined with source-type set to recordTime. The loader automatically applies the timestamp value from the record time field to all time quantum set fields found in the record. If the optional record time field is not configured, or data for that field is not available, time quantum fields are loaded without a record time and are visible to all time ranges without restriction.
Additional information
Impala and PostgreSQL connection strings
Batch keys
- The batch-size value directly affects import speed and resource usage
- batch-max-staleness values may result in timeouts while waiting for a data source
- timeout can be set to "0s" to disable the timeout
Batch key time-unit
Batch keys that require <integer-value><time-unit> can use one or more of the following combinations, in descending order.
Time unit | Declaration | Example |
---|---|---|
hour | h | 24h30m |
minute | m | 30m45s |
second | s | 45s10ms |
millisecond | ms | 10ms22us |
microsecond | us | 22us28ns |
nanosecond | ns | 28ns |
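For example, a staleness window of one and a half minutes combines two units in descending order:

```toml
batch-max-staleness = "1m30s"
```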