How does FeatureBase reduce storage overheads?
Encoding data as base-2 equality-encoded or bit-slice bitmaps makes queries faster but incurs storage overheads because the number of bitmaps scale:
- with the number of values, and
- the cardinality of those values
For example, the average storage overheads for a 10,000 value dataset will be as follows:
Database | Dataset saved as | Average storage overhead (KB) |
---|---|---|
RDBMS | Row and column based structure | 20480 - 30720 |
FeatureBase | * equality-encoded bitmaps * Bit-slice bitmaps | 1280000 |
FeatureBase overcomes this issue by compressing all bitmap data using Roaring Bitmap Format, based on Roaring Bitmaps.
Table of contents
Before you begin
- Learn about Roaring Bitmaps
- Learn about FeatureBase bitmaps
- Learn about equality-encoded bitmaps
- Learn about bit-sliced bitmaps
What kind of database is Roaring Bitmap Format?
At the highest level, RBF consists of two data files, saved to disk, also known as a shard.
Data file | Consisting of | Additional information |
---|---|---|
RBF | The RBF data file contains a sequential collection of Roaring Bitmap friendly 8kb page files | RBF data file |
Write Ahead Log (WAL) | Sequential snapshot of transactions batched into Roaring Bitmap containers of 2^16 integers (65536 bits) | When transactions finish, the checkpointing process: * updates each database page with changes from the WAL file * truncates the WAL file when updates are complete |
What is the RBF data file?
Roaring Bitmap Format (RBF) is a collection of b-trees serialized to disk that contain a page hierarchy used to store:
- RBF metadata
- a b-tree structure that stores data in Roaring Bitmap containers.
RBF metadata pages
The following page types contain RBF metadata:
Page number | Page type | Description | Additional information |
---|---|---|---|
0 | Meta | The Meta page keeps track of b-tree information including: * total pages * current position in the Write Ahead Log * Root record page number * Freelist page number | * Freelist reference * Meta page example |
1 | Root record | Stores a list of bitmaps * defined as combination of field and view * and root page number for that b-tree | Root record example |
B-tree structure and data pages
The following page types contain the b-tree structure and data:
Page type | Description | Additional information |
---|---|---|
Branch | Branch pages make up the general “tree” structure of the b-tree, and consists of pointers to branch and leaf nodes | Pointers to additional branch pages are created when an RBF database is expanded. |
Leaf | Leaf pages contain one or Roaring Bitmap containers of data in a specified format. | Leaf page additional |
Bitmap | A bitmap page contains 8KB (65536 bytes) of Roaring Bitmap container data and is referenced by a Leaf Bitmap page. | Bitmap page additional |
Additional information
Benefits and features of RBF
RBF has the following benefits and features:
- full ACID (Atomicity, Consistency, Isolation, Durability) transaction support which allows incremental updates within the RBF file
- 8KB page files can accommodate the largest possible Roaring Bitmap container
Leaf page additional
Leaf pages relate directly to Roaring Bitmap format which:
- splits bitmaps into containers that are 65,536 bits wide
- chooses one of three encoding types for the container based on which will give the best compression:
Type | Data storage | Use case | Additional information |
---|---|---|---|
Array | A Leaf Array page contains cells which correspond to a Roaring Bitmap container | Sparse containers with few bits set | An array of 16-bit integers is kept to tell which of the 65,535 bits are set |
Run Length Encoded (RLE) | A Leaf RLE page contains cells which correspond to a Roaring Bitmap container | Containers where most or all bits are set | |
Bitmap pointer | Uncompressed, always 8KB (65,536 bits) | The cell contains a reference to a separate RBF page because Roaring Bitmap containers are exactly 8KB |
Leaf/Roaring Bitmap Bitwise operations
Bitwise operations (AND, OR, XOR, DIFFERENCE, NOT, etc.) are defined on each possible pair of encodings which means:
- decode-operate-encode processes are not required
- all operations take place on the data in the format it’s already in
- compression benefits double as performance benefits
- it takes fewer processor instructions to skip over sections of containers where no bits or all bits are set
Bitmap page additional
Bitmaps are defined as a combination of field and view.
Bitmap pages have no space for metadata so that is stored higher in the heirarchy
Examples
Meta page
Pgno: 0
Type: meta
PageN: 5
WALID: 8
Root Record Pgno: 1
Freelist Pgno: 2
Root record page
Pgno: 1
Type: root record
Next: 0
Records: n=2
[0]: name="~_exists;standard<" pgno=3
[1]: name="~fld1;standard<" pgno=4
Branch page
Pgno: 5
Type: branch
Cells: n=9
[0]: key=16 flags=0 pgno=47
[1]: key=32 flags=0 pgno=159
[2]: key=48 flags=0 pgno=80
[3]: key=64 flags=0 pgno=180
[4]: key=80 flags=0 pgno=48
[5]: key=96 flags=0 pgno=136
[6]: key=112 flags=0 pgno=49
[7]: key=128 flags=0 pgno=103
[8]: key=144 flags=0 pgno=181
Leaf array page
Pgno: 23
Type: leaf
Cells: n=2
[0]: key=48 type=array values=[2 27 37 42 44]
[1]: key=51 type=array values=[1315 1316 1319]
Leaf bitmap page
Pgno: 47
Type: leaf
Cells: n=1
[0]: key=16 type=bitmap-ptr pgno=217