Data modeling low-cardinality data
In a traditional database, low cardinality data is typically normalized into separate tables and keys are used to maintain the relationships.
This FAQ guides you through:
- selecting the best unique identifier for each row of data
- CSV structure
- SQL statements to create a target table and read data from the CSV file
- querying the data
Table of contents
Before you begin
Sample data
Species | Vertebrae | Captivity |
---|---|---|
Manatee | Yes | 3 |
Sea Horse | Yes | 956 |
Koala | Yes | 19 |
Starfish | No | 20 |
Which unique identifier should I choose?
The first column in every FeatureBase table is the unique identifier _id
and can be assigned one of two data types:
A
CREATE TABLE
statement with_id STRING
will increase disk requirements and query duration.
The choice of unique identifier helps reduce the number of rows to be ingested, which then increases the speed of your queries.
In this example, Vertebrae
is the best choice for unique identifier because it reduces four rows to two:
Vertebrae | Species |
---|---|
yes | Manatee, Sea Horse, Koala |
no | Starfish |