FeatureBase converts data to base-2 (binary) in two types of bitmap:
- Equality-encoded bitmaps for non-integer values, or
- Bit-sliced bitmaps, which slice integer values into one bitmap for each power of two
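The two encodings can be illustrated with a minimal sketch. It assumes each bitmap is held in a plain Python integer and that records are numbered from 0. The function names and sample data are hypothetical, and the bit-slice helper only handles non-negative integers. It is not FeatureBase's implementation.

```python
from collections import defaultdict

def equality_encode(values):
    """One bitmap per distinct value; bit i is set when record i holds that value."""
    bitmaps = defaultdict(int)
    for record_id, value in enumerate(values):
        bitmaps[value] |= 1 << record_id
    return dict(bitmaps)

def bit_slice_encode(values, bit_depth):
    """One bitmap per power of two; bit i of slice k is set when record i's value contains 2**k."""
    slices = [0] * bit_depth
    for record_id, value in enumerate(values):
        for k in range(bit_depth):
            if (value >> k) & 1:
                slices[k] |= 1 << record_id
    return slices

# Hypothetical sample data: four records
colors = ["red", "blue", "red", "green"]
color_bitmaps = equality_encode(colors)
print(bin(color_bitmaps["red"]))    # 0b101: records 0 and 2 are red

ages = [5, 3, 6, 1]
age_slices = bit_slice_encode(ages, bit_depth=3)
print([bin(s) for s in age_slices])  # slices for 2**0, 2**1, 2**2
```

Each distinct string gets its own bitmap, while the integers share three bitmaps regardless of how many distinct ages appear.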
Table of contents
- Before you begin
- Why use bitmaps for data storage?
- What are the drawbacks of bitmaps?
- What bitmaps are created for my data?
- How does FeatureBase store bitmaps?
- Are column names converted to bitmaps?
- Further information
Bitmaps make updates and queries faster because the data is encoded as
Bitmap updates in FeatureBase are faster for two reasons.
- FeatureBase can directly update a value encoded as a standard bitmap without needing to traverse other values in the structure
- Updates to bit-slice bitmaps mean flipping or adding bits rather than altering the entire value
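As a rough sketch of these two update paths, the example below again models each bitmap as a Python integer. The helper names and the starting data are hypothetical and are not part of any FeatureBase API.

```python
def set_value(bitmaps, record_id, old_value, new_value):
    """Move one record between equality-encoded bitmaps by touching two bits."""
    mask = 1 << record_id
    bitmaps[old_value] = bitmaps.get(old_value, 0) & ~mask  # clear the old bit
    bitmaps[new_value] = bitmaps.get(new_value, 0) | mask   # set the new bit

def update_sliced(slices, record_id, new_value):
    """Rewrite one record's integer by setting or clearing its bit in each slice."""
    mask = 1 << record_id
    for k in range(len(slices)):
        if (new_value >> k) & 1:
            slices[k] |= mask
        else:
            slices[k] &= ~mask

# Hypothetical starting state: records 0 and 2 are red, record 1 is blue
color_bitmaps = {"red": 0b0101, "blue": 0b0010}
set_value(color_bitmaps, 2, "red", "blue")

# Hypothetical starting state: ages 5, 3, 6, 1 sliced into three bitmaps
age_slices = [0b1011, 0b0110, 0b0101]
update_sliced(age_slices, 3, 6)  # record 3 changes from 1 to 6
```

Neither update reads or rewrites any other record: only the bits at the affected record position change.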
There are two limitations to every data query:
- Latency, where the structure and encoding cause delays returning results
- Concurrency, where queries slow down because multiple users are accessing the same data at the same time
FeatureBase addresses these limitations as follows:
- Lower latency queries mean data is accessed for shorter times, which reduces the number of connections and concurrency issues
- Equality-encoded bitmaps mean Boolean queries such as OR are substantially faster because data relationships are represented as 1 (they exist) or 0 (they don’t exist)
- SELECT specific values: queries on equality-encoded and bit-slice data can directly and sequentially access specified values without needing to traverse all other values in a database table
- Integer values are bit-sliced into individual bitmaps for each power of two. This means range queries can combine the specific bitmaps instead of working with integers in a traditional row/column format
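The sketch below shows why these queries reduce to bitwise operations. It assumes bitmaps are Python integers, values are non-negative, and the data is hypothetical. The range routine is a textbook most-significant-bit-first comparison over bit slices, not necessarily the algorithm FeatureBase uses.

```python
def record_ids(bitmap):
    """List the record positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def records_greater_than(slices, threshold, all_records):
    """Return a bitmap of records whose bit-sliced value exceeds threshold."""
    if threshold >> len(slices):
        return 0  # threshold is larger than any value the slices can hold
    greater = 0
    equal = all_records  # records that match the threshold on all bits seen so far
    for k in reversed(range(len(slices))):
        if (threshold >> k) & 1:
            equal &= slices[k]             # must also have this bit to stay equal
        else:
            greater |= equal & slices[k]   # has a bit the threshold lacks
            equal &= ~slices[k]
    return greater

# Hypothetical data: four records with colors red, blue, red, green
color_bitmaps = {"red": 0b0101, "blue": 0b0010, "green": 0b1000}
red_or_blue = color_bitmaps["red"] | color_bitmaps["blue"]
red_and_blue = color_bitmaps["red"] & color_bitmaps["blue"]

# Hypothetical data: ages 5, 3, 6, 1 sliced into bitmaps for 2**0, 2**1, 2**2
age_slices = [0b1011, 0b0110, 0b0101]
older_than_3 = records_greater_than(age_slices, 3, all_records=0b1111)

print(record_ids(red_or_blue))   # [0, 1, 2]
print(record_ids(older_than_3))  # [0, 2]
```

The range query touches three bitmaps in total, however many records there are, and never reassembles an individual integer.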
Bitmaps have two main issues:
- low-cardinality data duplication
- data storage overheads
FeatureBase overcomes low-cardinality issues with four unique data types suitable for
Encoding data as base-2 equality-encoded or bit-slice bitmaps makes queries faster but incurs storage overheads because the number of bitmaps scales:
- with the number of values, and
- with the cardinality of those values
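To make the scaling concrete, the following sketch counts bitmaps and uncompressed bits for a hypothetical column. It assumes one bit per record in every bitmap and no compression. The numbers it produces are illustrative arithmetic only and are not FeatureBase measurements.

```python
def equality_bitmap_count(values):
    """Equality encoding needs one bitmap per distinct value (the cardinality)."""
    return len(set(values))

def bit_slice_count(values):
    """Bit slicing needs one bitmap per binary digit of the largest non-negative value."""
    return max(max(values).bit_length(), 1)

def uncompressed_bits(num_records, num_bitmaps):
    """Every bitmap carries one bit per record before any compression."""
    return num_records * num_bitmaps

# Hypothetical column: 10,000 records holding 100 distinct integers (0 to 99)
values = [i % 100 for i in range(10_000)]

eq_bitmaps = equality_bitmap_count(values)
bsi_bitmaps = bit_slice_count(values)
eq_bits = uncompressed_bits(len(values), eq_bitmaps)
bsi_bits = uncompressed_bits(len(values), bsi_bitmaps)
print(eq_bitmaps, eq_bits)    # 100 1000000
print(bsi_bitmaps, bsi_bits)  # 7 70000
```

Raising the cardinality multiplies the equality-encoded total directly, while the bit-slice total grows only with the number of binary digits in the largest value.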
For example, the average storage overheads for a 10,000 value dataset will be as follows:
|Dataset saved as|Average storage overhead (KB)|
|---|---|
|Row and column based structure|20480 - 30720|
|Equality-encoded bitmaps, bit-slice bitmaps| |
FeatureBase overcomes this issue by compressing all bitmap data using Roaring Bitmap Format, based on Roaring Bitmaps.
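The sketch below illustrates the general idea behind standard Roaring Bitmaps: bit positions are grouped by their upper 16 bits, and each group is stored either as a short sorted array or as a fixed-size bitmap, depending on how many positions it holds. The 4096 threshold comes from the published Roaring design. Whether Roaring Bitmap Format uses the same layout is an assumption here, and the function name is hypothetical.

```python
ARRAY_LIMIT = 4096  # standard Roaring threshold; assumed, not confirmed for RBF

def choose_containers(positions):
    """Group set-bit positions by their high 16 bits and pick a container type per group."""
    chunks = {}
    for p in positions:
        chunks.setdefault(p >> 16, set()).add(p & 0xFFFF)
    return {
        key: ("array" if len(lows) <= ARRAY_LIMIT else "bitmap", len(lows))
        for key, lows in chunks.items()
    }

# Hypothetical bitmap: a few scattered bits plus one dense run of 5,000 bits
positions = [1, 5, 70_000] + list(range(131_072, 131_072 + 5_000))
containers = choose_containers(positions)
print(containers)
```

Sparse regions cost a couple of bytes per set bit, and dense regions cost a fixed amount per group, which is how this style of compression limits the overhead of holding many mostly-empty bitmaps.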
Data is converted to bitmaps based on the destination data type:
|FeatureBase data type
|Date and time
|Low cardinality keyed to date/time values
At a high level, FeatureBase bitmaps are stored as shards, made up of:
- a Roaring Bitmap Format (RBF) data file
- a Write Ahead Log (WAL) file
FeatureBase Cloud stores shards on disk in the
Column names are saved to disk in the Roaring Bitmap Format data file.
- Learn about equality-encoded bitmaps
- Learn about bit-sliced bitmaps
- Learn about data modeling for FeatureBase
- Learn about importing data to FeatureBase