FeatureBase converts data to base-2 (binary) values and stores it in one of two types of bitmap:
- Equality-encoded bitmaps for non-integer values, or
- Bit-sliced bitmaps, which slice integer values into one bitmap for each power of two.
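To make the two encodings concrete, here is a minimal Python sketch. It uses Python integers as uncompressed bitmaps, where bit *i* represents record *i*. The function names `equality_encode`, `bit_slice_encode` and `bit_slice_decode` are hypothetical and are not part of any FeatureBase API. The sketch also assumes non-negative integers for bit-slicing and ignores compression.

```python
def equality_encode(values):
    """Return {value: bitmap}; bit i of a value's bitmap is set when record i holds that value."""
    bitmaps = {}
    for record_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << record_id)
    return bitmaps


def bit_slice_encode(values):
    """Return one bitmap per power of two (slice k represents 2**k).

    Bit i of slice k is set when bit k of record i's value is 1.
    Assumes every value is a non-negative integer.
    """
    depth = max(values, default=0).bit_length()
    slices = [0] * depth
    for record_id, value in enumerate(values):
        for k in range(depth):
            if (value >> k) & 1:
                slices[k] |= 1 << record_id
    return slices


def bit_slice_decode(slices, record_id):
    """Rebuild one record's integer by summing the powers of two whose slice has its bit set."""
    return sum(1 << k for k, bitmap in enumerate(slices) if (bitmap >> record_id) & 1)


colors = ["red", "blue", "red"]
ages = [5, 3, 6]

print(equality_encode(colors))                   # {'red': 5, 'blue': 2}, i.e. 0b101 and 0b010
print([bin(s) for s in bit_slice_encode(ages)])  # ['0b11', '0b110', '0b101']
```

Equality encoding produces one bitmap per distinct value. Bit-slicing produces one bitmap per binary digit of the largest value, which is why it suits wide integer ranges.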
Table of contents
- Before you begin
- Why use bitmaps for data storage?
- What are the drawbacks of bitmaps?
- What bitmaps are created for my data?
- How does FeatureBase store bitmaps?
- Are column names converted to bitmaps?
- Further information
Bitmaps make updates and queries faster because the data is encoded as base-2 (binary) values.
Bitmap updates in FeatureBase are faster for two reasons:
| Bitmap type | Update description |
|---|---|
| Equality-encoded | FeatureBase can directly update a value encoded as a standard bitmap without traversing the other values in the structure. |
| Bit-slice | Updating a bit-slice bitmap means flipping or adding individual bits rather than rewriting the entire value. |
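The following sketch illustrates why both update paths are cheap. It again uses Python integers as uncompressed bitmaps. The helpers `update_equality` and `update_bit_sliced` are hypothetical illustrations, not FeatureBase functions. An equality-encoded update touches only the old value's bitmap and the new value's bitmap. A bit-sliced update flips only the slices whose bit differs between the old and new value.

```python
def update_equality(bitmaps, record_id, old_value, new_value):
    """Move one record from old_value's bitmap to new_value's bitmap; no other bitmap is read."""
    mask = 1 << record_id
    bitmaps[old_value] &= ~mask
    bitmaps[new_value] = bitmaps.get(new_value, 0) | mask


def update_bit_sliced(slices, record_id, new_value):
    """Set one record's bits to new_value and return how many slices were actually flipped."""
    mask = 1 << record_id
    # Add empty slices if the new value needs more binary digits than the current depth.
    while len(slices) < new_value.bit_length():
        slices.append(0)
    flipped = 0
    for k in range(len(slices)):
        wanted = (new_value >> k) & 1
        current = (slices[k] >> record_id) & 1
        if wanted != current:
            slices[k] ^= mask  # flip only this record's bit in slice k
            flipped += 1
    return flipped


colors = {"red": 0b101, "blue": 0b010}  # records 0 and 2 are red, record 1 is blue
update_equality(colors, 0, "red", "blue")
print(colors)                           # {'red': 4, 'blue': 3}

ages = [0b011, 0b110, 0b101]            # bit slices for the values 5, 3, 6
print(update_bit_sliced(ages, 1, 7))    # 1: only slice 2 changes when 3 becomes 7
```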
Every data query is subject to two limitations:
- Latency, where the data structure and encoding delay the return of results
- Concurrency, where queries slow down because multiple users access the same data at the same time
FeatureBase addresses these limitations as follows:
| Query type | Limitation | How FeatureBase addresses it |
|---|---|---|
| Multiple | Concurrency | Lower-latency queries access data for shorter periods, which reduces the number of simultaneous connections and concurrency issues. |
| Boolean queries | Latency | Equality-encoded bitmaps let Boolean queries run as bitwise operations (AND, OR, NOT) on whole bitmaps. |
| SELECT specific values | Latency | Queries on equality-encoded and bit-slice data can directly and sequentially access the specified values without traversing every other value in a database table. |
| Range queries | Latency | Integer values are bit-sliced into an individual bitmap for each power of two, so range queries can combine those specific bitmaps instead of working with integers in a traditional row/column format. |
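The next sketch shows how a Boolean query and a range query can be answered purely with bitwise operations over bitmaps. It uses the classic bit-sliced comparison technique, which walks the slices from the most significant power of two down. This is a general illustration, and FeatureBase's internal algorithm may differ. The `existence` bitmap (records that have a value at all) and the function names are assumptions made for this example.

```python
def record_ids(bitmap):
    """List the record IDs whose bit is set in a bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def less_or_equal(slices, existence, threshold):
    """Return a bitmap of records whose bit-sliced value is <= threshold (threshold >= 0)."""
    if threshold.bit_length() > len(slices):
        return existence  # threshold exceeds every value the slices can hold
    less = 0           # records already known to be strictly smaller
    equal = existence  # records whose higher bits so far match the threshold
    for k in reversed(range(len(slices))):
        if (threshold >> k) & 1:
            # Threshold has a 1 here: records with a 0 in this slice are strictly smaller.
            less |= equal & ~slices[k]
            equal &= slices[k]
        else:
            # Threshold has a 0 here: records with a 1 in this slice are larger, so drop them.
            equal &= ~slices[k]
    return less | equal


age_slices = [0b011, 0b110, 0b101]  # bit slices for ages 5, 3, 6 (records 0, 1, 2)
all_records = 0b111
red = 0b101                         # equality-encoded bitmap: records 0 and 2 are red

at_most_5 = less_or_equal(age_slices, all_records, 5)
print(record_ids(at_most_5))        # [0, 1]
print(record_ids(red & at_most_5))  # [0]: red AND age <= 5
```

The range query touches each slice once, so its cost depends on the number of slices rather than on how many distinct values fall inside the range.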
Bitmaps have two main issues:
- low-cardinality data duplication
- data storage overheads
FeatureBase overcomes low-cardinality issues with four unique data types suitable for
Encoding data as base-2 equality-encoded or bit-slice bitmaps makes queries faster but incurs storage overheads, because the number of bitmaps scales with:
- the number of values, and
- the cardinality of those values.
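As a rough, back-of-the-envelope illustration of that scaling, the uncompressed size of each encoding is the number of bitmaps multiplied by one bit per record. The sketch below assumes uncompressed bitmaps and ignores compression and any auxiliary bitmaps FeatureBase may keep, so its numbers are not FeatureBase storage figures. The example cardinality and maximum value are invented for illustration, and the function names are hypothetical.

```python
def equality_encoded_bytes(num_records, cardinality):
    """One bitmap per distinct value, one bit per record in each bitmap (rounded up to bytes)."""
    bits = num_records * cardinality
    return (bits + 7) // 8


def bit_sliced_bytes(num_records, max_value):
    """One bitmap per power of two needed for max_value, one bit per record in each."""
    bits = num_records * max_value.bit_length()
    return (bits + 7) // 8


# Hypothetical 10,000-record dataset
print(equality_encoded_bytes(10_000, 50))   # 62500 bytes for 50 distinct values
print(bit_sliced_bytes(10_000, 1_000_000))  # 25000 bytes: 1,000,000 needs 20 bit slices
```

Equality-encoded storage grows linearly with cardinality, while bit-sliced storage grows only with the logarithm of the largest value.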
For example, the average storage overheads for a 10,000-value dataset are as follows:
| Database | Dataset saved as | Average storage overhead (KB) |
|---|---|---|
| RDBMS | Row- and column-based structure | 20480 - 30720 |
| FeatureBase | Equality-encoded bitmaps and bit-slice bitmaps | |
FeatureBase overcomes this issue by compressing all bitmap data with the Roaring Bitmap Format (RBF), which is based on Roaring Bitmaps.
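Roaring Bitmaps compress a bitmap by splitting 32-bit positions into chunks keyed by their high 16 bits, then choosing a container for each chunk. Sparse chunks (up to 4,096 entries) use a sorted array of 16-bit values, and dense chunks use a fixed 65,536-bit bitmap. The sketch below shows only that container choice. It omits Roaring's run containers and does not describe RBF's actual on-disk layout. The function names are hypothetical.

```python
ARRAY_MAX = 4096  # Roaring cutoff: chunks with more values than this use a bitmap container


def roaring_containers(values):
    """Group 32-bit positions by their high 16 bits and pick an array or bitmap container per chunk."""
    chunks = {}
    for v in sorted(set(values)):
        chunks.setdefault(v >> 16, []).append(v & 0xFFFF)
    containers = {}
    for high, lows in chunks.items():
        if len(lows) <= ARRAY_MAX:
            containers[high] = ("array", lows)  # 2 bytes per stored value
        else:
            bits = 0
            for low in lows:
                bits |= 1 << low
            containers[high] = ("bitmap", bits)  # always 65,536 bits = 8 KiB
    return containers


def container_bytes(containers):
    """Approximate payload size, ignoring per-container headers."""
    return sum(2 * len(data) if kind == "array" else 8192 for kind, data in containers.values())


sparse = roaring_containers([1, 2, 70_000])
print(sparse)                               # {0: ('array', [1, 2]), 1: ('array', [4464])}
print(container_bytes(sparse))              # 6
dense = roaring_containers(range(5000))
print(dense[0][0], container_bytes(dense))  # bitmap 8192
```

The 4,096 cutoff is where the two representations cost the same (4,096 × 2 bytes = 8 KiB). Each chunk therefore uses whichever form is smaller.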
Data is converted to bitmaps based on the destination data type:
| User data | FeatureBase data type | Bitmap type |
|---|---|---|
| Unsigned integer | ID | Equality-encoded bitmaps |
| Date and time | Timestamp | Bit-sliced |
| Low cardinality | Set | Equality-encoded bitmaps |
| Low cardinality keyed to date/time values | SetQ | Equality-encoded bitmaps |
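One way to picture the Timestamp row is to convert each date and time to an integer and then bit-slice that integer like any other. For illustration only, the sketch below assumes whole seconds since the Unix epoch. FeatureBase's own epoch, time unit and internal representation may differ, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def timestamp_to_int(ts):
    """Assumed encoding for illustration: whole seconds since 1970-01-01 UTC."""
    return int(ts.timestamp())


def bit_slice_encode(values):
    """One bitmap per power of two; bit i of slice k is set when bit k of value i is 1."""
    depth = max(values, default=0).bit_length()
    return [
        sum(1 << record_id for record_id, v in enumerate(values) if (v >> k) & 1)
        for k in range(depth)
    ]


events = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc),
]
seconds = [timestamp_to_int(ts) for ts in events]
slices = bit_slice_encode(seconds)
print(seconds[0])   # 1704067200
print(len(slices))  # 31
```

Under this assumed encoding, 31 slices are enough for any second before 19 January 2038 (2^31 seconds after the epoch).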
At a high level, FeatureBase stores bitmaps as shards, each made up of:
- a Roaring Bitmap Format (RBF) data file
- a Write Ahead Log (WAL) file
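Each shard holds a fixed-width range of record IDs, so finding the shard for a record is a matter of integer division. The sketch below assumes a shard width of 2^20 (1,048,576) records, which we understand to be FeatureBase's default but which you should confirm for your version. The helper names are hypothetical, and the sketch does not model the RBF and WAL files themselves.

```python
from collections import defaultdict

SHARD_WIDTH = 1 << 20  # assumption: records per shard; verify for your FeatureBase version


def shard_for(record_id, shard_width=SHARD_WIDTH):
    """Return (shard number, bit position inside that shard) for a record ID."""
    return divmod(record_id, shard_width)


def group_by_shard(record_ids, shard_width=SHARD_WIDTH):
    """Bucket record IDs by shard (illustrative only)."""
    shards = defaultdict(list)
    for record_id in record_ids:
        shard, offset = shard_for(record_id, shard_width)
        shards[shard].append(offset)
    return dict(shards)


print(shard_for(5))                               # (0, 5)
print(shard_for(3_000_000))                       # (2, 902848)
print(group_by_shard([1, 1_048_576, 1_048_577]))  # {0: [1], 1: [0, 1]}
```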
FeatureBase stores shards on disk in the following directories:
| FeatureBase product | Directory | Further information |
|---|---|---|
| Community | | FeatureBase Community data directory |
Column names are saved to disk in the Roaring Bitmap Format data file.
- Learn about equality-encoded bitmaps
- Learn about bit-sliced bitmaps
- Learn about importing data to FeatureBase