Documentation

Row()

The Row() query returns a set of record IDs / keys for records that have been filtered in some way. You can think of Row() as serving a similar function to the WHERE clause in SQL.

There are two different call definitions for Row(). One for Set, Mutex, and Time fields and another for Int, Decimal, and Timestamp fields. For more information on field types in FeatureBase, go here.

Call Definition

Set, Mutex, and Time Fields

Row(FIELD = FIELD_VALUE, from = TIMESTAMP, to = TIMESTAMP)

Note: Time fields are similar to Set fields except there is one or more date and time associated with each relationship between a record and field value. For that reason, you can only use the from and to arguments on Time fields.

Mandatory Arguments

  • FIELD : the name of a field in the index being queried
  • FIELD_VALUE : a field value within a Set, Mutex or Time field, FIELD

Optional Arguments

  • from : the start date and time when querying Time fields (TIMESTAMP formatted like '2006-01-02T00:00:00Z'). TIMESTAMP here is an inclusive value - i.e. records with relationships made on or after this time will be included in the result – for Time fields only.
  • to: the end date and time when querying Time fields (TIMESTAMP formatted like '2006-01-02T00:00:00Z'). TIMESTAMP here is an exclusive value - i.e. records with relationships made before this time (but not on this time) will be included in the result – for Time fields only.

Returns

  • list of record IDs or record keys

Int, Decimal, Timestamp Fields

Row(VALUE OPERATOR FIELD OPERATOR VALUE)

Mandatory Arguments

  • FIELD : the name of a field in the index being queried
  • VALUE : a integer, decimal, or timestamp value - consistent with the field type of FIELD (e.g. 1, 10.2, ‘2006-01-02T05:01:00Z’) – the first VALUE is optional
  • OPERATOR : >, <, <=, >=, ==, or != – the first OPERATOR is optional

Optional Arguments

Returns

  • list of record IDs or record keys

Examples

Raw Event Data:

Each row below represents a purchase event by a customer in some system. There is an event ID (_id), customer ID (customer_id), brand that customer purchased from (brand), and the time the purchase was made (time).

 _id | brand  | customer_id |         time
-----+--------+-------------+----------------------
 1   | brand1 | 0           | 2021-01-02T02:45:00Z
 2   | brand2 | 0           | 2021-01-05T08:30:00Z
 3   | brand3 | 1           | 2019-07-24T11:15:00Z
 4   | brand1 | 1           | 2020-09-12T12:30:00Z
 5   | brand1 | 2           | 2021-07-07T18:25:00Z
 6   | brand3 | 2           | 2021-08-06T16:15:00Z
 7   | brand1 | 4           | 2020-10-01T20:45:00Z
 8   | brand4 | 4           | 2021-10-01T20:45:00Z
 9   | brand4 | 5           | 2022-01-13T11:00:00Z
 10  | brand1 | 0           | 2022-03-15T02:55:00Z

Raw Customer Data:

Each row represents a customer. Note the _id column in this data corresponse to a customer_id column in the data above.

_id | age
----+-----
 0  | 23
 1  | 31
 2  | 28
 3  | 19
 4  | 25
 5  | 40

The raw data above can be ingested to FeatureBase in at least three ways.

  • An events index and a seperate users index. In this case, the two indexes would look like raw data looks above. One index would be built around purchase events (i.e. each record is a different purchase event) and one index would be built around users (i.e. each record is a different user).
  • A users index where the time associated with purchase relationships (customer / brand) doesn’t matter. Here each record would be a user and we’d use a Set field to track which users have bought from which brands. Note we lose the time information when using a Set field. This was done in the customer_set index below.
  • A users index where the time associated with each relationship (customer / brand) does matter. Here each record would be a user and a Time field to track which users have bought from which brands. This was done in the customer_time index below.

FeatureBase Data

Index: customer_set (non-keyed index) 
 _id | age (Int) | has_purchased (Set) | last_purchase (Timestamp)
-----+-----------+---------------------+---------------------------
 0   |    23     | ["brand1","brand2"] | 2021-01-05T08:30:00Z
 1   |    31     | ["brand1","brand3"] | 2020-09-12T12:30:00Z
 2   |    28     | ["brand1","brand3"] | 2021-08-06T16:15:00Z
 3   |    19     | []                  | null
 4   |    25     | ["brand1","brand4"] | 2021-10-01T20:45:00Z
 5   |    40     | ["brand4"]          | 2022-01-13T11:00:00Z

Index: customer_time (non-keyed index)
 _id | age (Int) | has_purchased (Time)
-----+-----------+---------------------
 0   | 23        | ["brand1","brand2"]
 1   | 31        | ["brand1","brand3"]
 2   | 28        | ["brand1","brand3"]
 3   | 19        | []
 4   | 25        | ["brand1","brand4"]
 5   | 40        | ["brand4"]

In both indexes above, a customer table was create with a customer ID as the record ID. Some information about the customer (e.g. age) was included. In the first index (customer_set), a Set field is used to store purchase history. Here we’re tacking the brands each customer has purchased from but there is no time associated with the relationship. In the second index (customer_time), a Time field is used to store purchase history. This means there is at least one time associated with each relationship that can be used for range queries.


Example 1

What customers have purchased from brand1?

Query

[customer_set]Row(has_purchased=brand1)

Tabular Response

 _id
-----
 0
 1
 2
 4

HTTP Response

{
  "results": [
    {
      "columns": [
        0,
        1,
        2,
        4
      ]
    }
  ]
}

Explanation

Customers 0, 1, 2, and 4 are the only customer that had a purchase event with brand1.


Example 2

Which customers have purchased from brand3 during 2021?

[customer_time]Row(has_purchased=brand3, from='2021-01-01', to='2022-01-01')

Tabular Response

 _id
-----
 2

HTTP Response

{
  "results": [
    {
      "columns": [
        2
      ]
    }
  ]
}

Explanation

Note the primary index is customer_time. In that index, the has_purchased field is a Time field. This allows us to use the from / to arguments to search for relationship made during a given time window. This is not possible in the customer_set index which uses a Set field for the has_purchased field. Looking at the raw events data, only customer 2 purchased from brand 3 during 2021.


Example 3

What customer are between the age of 20 and 30?

[customer_set]Row(20 < age < 30)

Tabular Response

 _id
-----
 0
 2
 4

HTTP Response

{
  "results": [
    {
      "columns": [
        0,
        2,
        4
      ]
    }
  ]
}

Explanation

Customer 0, 2, and 4 are between the ages of 20 and 30.


Example 4

What customers made their last purchase in after the start 2022?

[customer_set]Row(last_purchase > "2022-01-01T00:00:00Z")

Tabular Response

 _id
-----
 5

HTTP Response

{
  "results": [
    {
      "columns": [
        5
      ]
    }
  ]
}

Explanation

Only customer 5’s last purchase was made after the start of 2022.

Something missing or incorrect?
Help improve this article or join us on Discord!

© 2022 Molecula Corp. (DBA FeatureBase). All rights reserved.