Skip to main content Link Menu Expand (external link) Document Search Copy Copied

Data modeling low-cardinality data

In a traditional database, low cardinality data is typically normalized into separate tables and keys are used to maintain the relationships.

This FAQ guides you through:

  • selecting the best unique identifier for each row of data
  • CSV structure
  • SQL statements to create a target table and read data from the CSV file
  • querying the data
Table of contents

Before you begin

Sample data

Species Vertebrae Captivity
Manatee Yes 3
Sea Horse Yes 956
Koala Yes 19
Starfish No 20

Which unique identifier should I choose?

The first column in every FeatureBase table is the unique identifier _id and can be assigned one of two data types:

A CREATE TABLE statement with _id STRING will increase disk requirements and query duration.

The choice of unique identifier helps reduce the number of rows to be ingested, which then increases the speed of your queries.

In this example, Vertebrae is the best choice for unique identifier because it reduces four rows to two:

Vertebrae Species
yes Manatee, Sea Horse, Koala
no Starfish

Next step