Skip to main content Link Menu Expand (external link) Document Search Copy Copied

How do I query the billion row database?

The sample database is provided for users and organizations evaluating FeatureBase at scale. It contains one billion rows with attributes commonly used for segmentation queries.

Learn how fast the FeatureBase database is by running queries against the cseg (customer segmentation) and skills tables

Table of contents

Before you begin

Table statements

Run SHOW CREATE TABLE statements to view the table structure:

SHOW CREATE TABLE cseg;
SHOW CREATE TABLE skills;

FeatureBase IDSET and STRINGSET data types are present in both tables:

SQL queries

The following queries demonstrate sub-second latency in the FeatureBase database

Data Exploration 

Prove there are over one billion rows in the database with these queries:

Query Result
SELECT COUNT(*) FROM cseg; 1000000000
SELECT COUNT(*) FROM skills; 25000

Output the top 10 rows in each table:

SELECT TOP(10) * FROM cseg;
SELECT TOP(10) * FROM skills;

Aggregate queries

These queries aggregate the values in the income column of the cseg table.

Query Result
SELECT SUM(income) FROM cseg; 100775986981472
SELECT SUM(income) FROM cseg where income > 5000; 100700159226528
SELECT AVG(income) FROM cseg; 100775.9869

Complex segmentation 

The following query, based on the aggregate queries above, has the following conditions:

  • income is greater-than 5000
  • age is 45
  • skills column contain either of MS Office or Excel
Query Result Additional information
SELECT SUM(income) FROM cseg WHERE income > 5000 AND age = 45 AND (SETCONTAINSANY(skills,['Ms Office','Excel'])); 32177307009 SETCONTAINSANY function

Grouping with Complex Conditions

SELECT hobbies, COUNT(*) as cnt
FROM cseg
WITH (flatten(hobbies))
GROUP BY hobbies
HAVING COUNT(*) > 200000000
ORDER BY cnt DESC;
SELECT education, SUM(income)
FROM cseg
WITH (FLATTEN(education))
WHERE age=18
GROUP BY education;

Next step

Delete the database or tables as soon as testing finishes to reduce costs.