How does FeatureBase encode integer values as bitmaps?
Bit-sliced data creates a single bitmap for each power of 2. This approach means FeatureBase can represent any number in the 64-bit range (10^19) using only 64 bitmaps.
Table of contents
Before you begin
What data types are converted to bit-sliced bitmaps?
User data mapped to the following data types is converted to bit-sliced bitmaps:
User data | FeatureBase data type | Additional information |
---|---|---|
Floating point | Decimal | A bitmap is added for the decimal point |
Signed Integer | Integer | A bitmap is added for the sign |
Date and time | Timestamp |
How does FeatureBase bit-slice integer data?
The following table represents the historical names for FeatureBase and downloads for each.
ID | historical_name | downloads |
---|---|---|
1 | Pilosa | 10000 |
2 | Molecula | 18524 |
3 | FeatureBase | 50000 |
By bit-slicing, the downloads
data can be encoded:
- as 3 bits
- in 16 bitmaps
- with column names saved to disk
Step 1 - convert values to base-2
Using historical_name
as the id
means downloads
integer values can be represented as individual base-2 values
| id | downloads |
| ------ | -------------- |
| Pilosa | 10011100010000 |
| id | downloads |
| -------- | --------------- |
| Molecula | 100100001011100 |
| id | downloads |
| ----------- | ---------------- |
| FeatureBase | 1100001101010000 |
Step 2 - Bit slicing the integer values
Bit slicing adds a column for each power of 2.
| id | 32768 | 16384 | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| ------ | ----- | ----- | ---- | ---- | ---- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pilosa | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| id | 32768 | 16384 | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | |
| -------- | ----- | ----- | ---- | ---- | ---- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Molecula | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| id | 32768 | 16384 | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| ----------- | ----- | ----- | ---- | ---- | ---- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FeatureBase | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Step 3 - Save bitslice bitmaps
The bit-slice columns can now be saved as individual bitmaps. For example:
| id | 32768 |
| ----------- | ----- |
| Pilosa | 0 |
| Molecula | 0 |
| FeatureBase | 1 |
Bitmap storage overheads
Encoding data as base-2 equality-encoded or bit-slice bitmaps makes queries faster but incurs storage overheads because the number of bitmaps scale:
- with the number of values, and
- the cardinality of those values
For example, the average storage overheads for a 10,000 value dataset will be as follows:
Database | Dataset saved as | Average storage overhead (KB) |
---|---|---|
RDBMS | Row and column based structure | 20480 - 30720 |
FeatureBase | * equality-encoded bitmaps * Bit-slice bitmaps | 1280000 |
FeatureBase overcomes this issue by compressing all bitmap data using Roaring Bitmap Format, based on Roaring Bitmaps.