Skip to main content Link Menu Expand (external link) Document Search Copy Copied

How does FeatureBase encode integer values as bitmaps?

Bit-sliced data creates a single bitmap for each power of 2. This approach means FeatureBase can represent any number in the 64-bit range (10^19) using only 64 bitmaps.

Table of contents

Before you begin

What data types are converted to bit-sliced bitmaps?

User data mapped to the following data types is converted to bit-sliced bitmaps:

User data FeatureBase data type Additional information
Floating point Decimal A bitmap is added for the decimal point
Signed Integer Integer A bitmap is added for the sign
Date and time Timestamp  

How does FeatureBase bit-slice integer data?

The following table represents the historical names for FeatureBase and downloads for each.

ID historical_name downloads
1 Pilosa 10000
2 Molecula 18524
3 FeatureBase 50000

By bit-slicing, the downloads data can be encoded:

  • as 3 bits
  • in 16 bitmaps
  • with column names saved to disk

Step 1 - convert values to base-2

Using historical_name as the id means downloads integer values can be represented as individual base-2 values

| id     | downloads      |
| ------ | -------------- |
| Pilosa | 10011100010000 |
| id       | downloads       |
| -------- | --------------- |
| Molecula | 100100001011100 |
| id          | downloads        |
| ----------- | ---------------- |
| FeatureBase | 1100001101010000 |

Step 2 - Bit slicing the integer values

Bit slicing adds a column for each power of 2.

| id     | 32768 | 16384 | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 | 64  | 32  | 16  | 8   | 4   | 2   | 1   |
| ------ | ----- | ----- | ---- | ---- | ---- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pilosa | 0     | 0     | 1    | 0    | 0    | 1    | 1   | 1   | 0   | 0   | 0   | 1   | 0   | 0   | 0   | 0   |
| id       | 32768 | 16384 | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 | 64  | 32  | 16  | 8   | 4   | 2   | 1   |     |
| -------- | ----- | ----- | ---- | ---- | ---- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Molecula | 0     | 0     | 1    | 0    | 0    | 1    | 0   | 0   | 0   | 0   | 1   | 0   | 1   | 1   | 1   | 0   | 0   |
| id          | 32768 | 16384 | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 | 64  | 32  | 16  | 8   | 4   | 2   | 1   |
| ----------- | ----- | ----- | ---- | ---- | ---- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FeatureBase | 1     | 1     | 0    | 0    | 0    | 0    | 1   | 1   | 0   | 1   | 0   | 1   | 0   | 0   | 0   | 0   |

Step 3 - Save bitslice bitmaps

The bit-slice columns can now be saved as individual bitmaps. For example:

| id          | 32768 |
| ----------- | ----- |
| Pilosa      | 0     |
| Molecula    | 0     |
| FeatureBase | 1     |

Bitmap storage overheads

Encoding data as base-2 equality-encoded or bit-slice bitmaps makes queries faster but incurs storage overheads because the number of bitmaps scale:

  • with the number of values, and
  • the cardinality of those values

For example, the average storage overheads for a 10,000 value dataset will be as follows:

Database Dataset saved as Average storage overhead (KB)
RDBMS Row and column based structure 20480 - 30720
FeatureBase * equality-encoded bitmaps
* Bit-slice bitmaps
1280000

FeatureBase overcomes this issue by compressing all bitmap data using Roaring Bitmap Format, based on Roaring Bitmaps.

Further information