Skip to content
Center

Indexes

3forge provides indexes for improved SELECT ... WHERE query performance. The below page provides information on implementation, usage, and generation of indexes in 3forge.

Overview

Indexes are data structures that improve the speed of SELECT ... WHERE queries.

Without an index, a SELECT ... WHERE query needs to check and iterate through each row of data. While this is fine for small datasets, this can quickly become inefficient.

Columns can be indexed such that relevant rows in a WHERE clause can be quickly looked up and retrieved. The type of index to use depends on your use case.

Index Data Structure

3forge supports three different index data structures to consider:

  1. HASH

    • Backed by Java hashmap
    • The fastest and most versatile indexing approach, but is only useful for queries that directly use equality (=, ==) as data is unsorted
  2. SORT

    • Backed by Java treemap
    • This has additional overhead to a HASH index for both inserting and querying but can be used with ordinality comparisons (<,>) as well as equals
  3. SERIES

    • Backed by a sorted array
    • Special purpose and highly optimized index useful for data inserted in ascending order
    • Entries are quickly added to the end of the array and queries evaluated using a binary lookup

    Warning

    If data is out of order the array is forced to do a memcopy which can be very expensive.

Key Constraints

Constraints can be added to an index to determine the outcome of a key collision. Three different types of constraints are supported:

  1. NONE

    • Default
    • No restriction
    • Multiple rows can have the same key identifier
  2. UNIQUE

    • Prevents duplicate keys; no two rows can have the same key
    • Does not allow updating or inserting with the same key
    • Attempting to insert or update a row with a given, unique key will fail
  3. PRIMARY

    • Keys are unique in that there can only be one row per key
    • Insertions matching a given key will overwrite or update the existing row corresponding to the key
    • There can be at most one PRIMARY index

    • Primary indexes can also be automatically generated on a particular column using AUTOGEN (see here for detailed examples), where two options are available:

      1. RAND:

        • A random UID is assigned to the column with a unique value for each row
      2. INC :

        • An auto-incrementing UID is assigned to the column with a unique value for each row, starting at 0 for the first row, 1 for the second row, and etc.
        • This option is only supported for INT and LONGcolumns

CREATE INDEX

Single indexes

1
2
3
CREATE INDEX [IF NOT EXISTS] idx_name
    ON tbl_name(col_name [HASH|SORT|SERIES] [,col_name [HASH|SORT|SERIES] ...])
    [USE  CONSTRAINT="[NONE|UNIQUE|PRIMARY]"]
  • idx_name: Name of the index, each index's name must be unique for the table.
  • tbl_name: Name of the table to add the index to.
  • col_name: Name of the column to put the index on.

Example

The command below will create a primary index on the column Name:

CREATE INDEX MyIndex ON MyTable(Name HASH) USE CONSTRAINT="PRIMARY"

Joint Indexes

Joint indexes are indexes that span across multiple columns in a table. The syntax is:

CREATE INDEX myIndex ON MyTable(A <Type>,B <Type>,C <Type>,....);

Joint indexes are helpful when multiple columns participate in your query's WHERE clause. It will enable the query optimizer to find out the best path to return the results by reducing the search space.

Note

Order matters. The two statements below will produce different indexes:

CREATE INDEX myIndex ON MyTable(A HASH,B HASH);
CREATE INDEX myIndex ON MyTable(B HASH,A HASH);

If you create an index on column A and column B, it first creates the index on column A and for every value in column A, it creates the second index on column B.

This means it is important to put high cardinality columns first. If A is a binary value (e.g. BUY or SELL) but B is a higher cardinality value (e.g. asset symbol), then B should be put first. This is because you are more likely to reduce the search space more in the initial lookup.

Example 1

Consider the table orders:

1
2
3
4
5
CREATE PUBLIC TABLE ORDERS(orderId int, Symbol string, Region string, Quantity int, Price double);

List regionList = new list("Asia","North America","South America","Europe","Africa");

for(int i=0;i<1000000;i++){ INSERT INTO ORDERS VALUES(rand(10000), "sym"+rand(1000),(string) (regionList.get(rand(5))), 500+rand(1000), rand()*200); }

In this table, Symbol has a cardinality of 500 and Region of 5.

For optimal joint indexing, it is advisable to first create an index on Symbol then Region, e.g:

CREATE INDEX jointIndex ON orders(Symbol HASH, Region HASH);

Now let's run some concrete queries to see how this joint index helps increase the query speed.

  • Case 1: full usage of the joint index

    SELECT * FROM orders WHERE Symbol=="sym500" AND Region=="North America"; //use both part of the jointIndex
    
  • Case 2: partial usage of the joint index

    SELECT * FROM orders WHERE Symbol=="sym500"; //use only "Symbol" part of the jointIndex
    
  • Case 3: no usage of the joint index

    1
    2
    3
    SELECT * FROM orders WHERE Region=="North America"; //not using index, do a hard forward scan
    
    SELECT * FROM orders WHERE Region=="North America" AND Symbol=="sym500"; //not using index, do a hard forward scan
    

Example 2

This example will add an index to the table MyTable's Account and Price column.

Queries that use the Account column or the Account and Price column will be faster (queries using only the Price column will not sped up by this index).

1
2
3
CREATE INDEX MyIndex ON MyTable(Account HASH, Price SORT)

CREATE INDEX MyIndex ON MyTable(Name HASH) USE CONSTRAINT="PRIMARY"

This is an ideal index if we expect to be running queries with the following format:

SELECT ... FROM MyIndex WHERE Account==*somevalue* and Price\[ \<, ==, \> \] *somevalue*

DROP INDEX

Removes the specified index from the given table, as created by CREATE INDEX.

DROP INDEX [IF EXISTS] index_name ON tbl_name [,index_name ON tbl_name...]

Example

This example will drop the index named MyIndex from the MyTable table.

Drop index MyIndex on MyTable

AUTOGEN

Column Types Supported Using AUTOGEN

The table below outlines table column types that are supported using AUTOGEN options where Y denotes supported and N denotes unsupported.

ColumnType\Option INC RAND
INT Y Y
LONG Y Y
STRING N Y
UUID N Y
FLOAT N Y
DOUBLE N Y
Anything else N N

AUTOGEN="RAND"

This example demonstrates how to use AUTOGEN to create automatically generated random primary indexes on a string column.

Take the below table:

CREATE PUBLIC table MarketOrders(Id String, Symbol String, Quantity Int);

This command will create a primary index on the column Id String using the RAND option:

CREATE INDEX RandomIndex ON MarketOrders(Id HASH) USE CONSTRAINT="PRIMARY" AUTOGEN="RAND"

On inserting data into this table, the randomly generated indexes will look similar to the following:

INSERT INTO MarketOrders(Symbol, Quantity) values("AAPL",20),("IBM",30),("MSFT",50);
SELECT Id FROM MarketOrders;
Id
String
FdZXX4BWEN
NcbSCeVusm
IOVQJkLDbX
...

AUTOGEN="INC"

This example demonstrates how to use AUTOGEN to create automatically generated auto-incrementing primary indexes on an integer column.

Take the below table:

CREATE PUBLIC table MarketOrders2(Id Int, Symbol String, Quantity Int);

This command will create a primary index on the column Id Integer using the INC option:

CREATE INDEX RandomIndex2 ON MarketOrders2(Id HASH) USE CONSTRAINT="PRIMARY" AUTOGEN="INC"

On inserting data into this table, the autogenerated indexes will look as follows:

INSERT INTO MarketOrders2(Symbol, Quantity) values("AAPL",20),("IBM",30),("MSFT",50);
SELECT Id FROM MarketOrders2;
Id
Integer
0
1
2
...