Indexes¶
3forge provides indexes for improved SELECT ... WHERE query performance. The below page provides information on implementation, usage, and generation of indexes in 3forge.
Overview¶
Indexes are data structures that improve the speed of SELECT ... WHERE queries.
Without an index, a SELECT ... WHERE query needs to check and iterate through each row of data. While this is fine for small datasets, this can quickly become inefficient.
Columns can be indexed such that relevant rows in a WHERE clause can be quickly looked up and retrieved. The type of index to use depends on your use case.
Index Data Structure¶
3forge supports three different index data structures to consider:
-
HASH- Backed by Java hashmap
- The fastest and most versatile indexing approach, but is only useful for queries that directly use equality (
=,==) as data is unsorted
-
SORT- Backed by Java treemap
- This has additional overhead to a
HASHindex for both inserting and querying but can be used with ordinality comparisons (<,>) as well as equals
-
SERIES- Backed by a sorted array
- Special purpose and highly optimized index useful for data inserted in ascending order
- Entries are quickly added to the end of the array and queries evaluated using a binary lookup
Warning
If data is out of order the array is forced to do a memcopy which can be very expensive.
Key Constraints¶
Constraints can be added to an index to determine the outcome of a key collision. Three different types of constraints are supported:
-
NONE- Default
- No restriction
- Multiple rows can have the same key identifier
-
UNIQUE- Prevents duplicate keys; no two rows can have the same key
- Does not allow updating or inserting with the same key
- Attempting to insert or update a row with a given, unique key will fail
-
PRIMARY- Keys are unique in that there can only be one row per key
- Insertions matching a given key will overwrite or update the existing row corresponding to the key
-
There can be at most one
PRIMARYindex -
Primary indexes can also be automatically generated on a particular column using
AUTOGEN(see here for detailed examples), where two options are available:-
RAND:- A random UID is assigned to the column with a unique value for each row
-
INC:- An auto-incrementing UID is assigned to the column with a unique value for each row, starting at 0 for the first row, 1 for the second row, and etc.
- This option is only supported for
INTandLONGcolumns
-
CREATE INDEX¶
Single indexes¶
idx_name: Name of the index, each index's name must be unique for the table.tbl_name: Name of the table to add the index to.col_name: Name of the column to put the index on.
Example¶
-
The command below will create a primary index on the column
Name:
Joint Indexes¶
Joint indexes are indexes that span across multiple columns in a table. The syntax is:
Joint indexes are helpful when multiple columns participate in your query's WHERE clause. It will enable the query optimizer to find out the best path to return the results by reducing the search space.
Note
Order matters. The two statements below will produce different indexes:
If you create an index on column A and column B, it first creates the index on column A and for every value in column A, it creates the second index on column B.
This means it is important to put high cardinality columns first. If A is a binary value (e.g. BUY or SELL) but B is a higher cardinality value (e.g. asset symbol), then B should be put first. This is because you are more likely to reduce the search space more in the initial lookup.
Example 1¶
-
Consider the table orders:
In this table,
Symbolhas a cardinality of 500 andRegionof 5.For optimal joint indexing, it is advisable to first create an index on Symbol then Region, e.g:
Now let's run some concrete queries to see how this joint index helps increase the query speed.
-
Case 1: full usage of the joint index
-
Case 2: partial usage of the joint index
-
Case 3: no usage of the joint index
-
Example 2¶
-
This example will add an index to the table
MyTable'sAccountandPricecolumn.Queries that use the
Accountcolumn or theAccountandPricecolumn will be faster (queries using only thePricecolumn will not sped up by this index).This is an ideal index if we expect to be running queries with the following format:
DROP INDEX¶
Removes the specified index from the given table, as created by CREATE INDEX.
Example¶
This example will drop the index named MyIndex from the MyTable table.
AUTOGEN¶
Column Types Supported Using AUTOGEN¶
The table below outlines table column types that are supported using AUTOGEN options where Y denotes supported and N denotes unsupported.
ColumnType\Option |
INC |
RAND |
|---|---|---|
| INT | Y | Y |
| LONG | Y | Y |
| STRING | N | Y |
| UUID | N | Y |
| FLOAT | N | Y |
| DOUBLE | N | Y |
| Anything else | N | N |
AUTOGEN="RAND"¶
-
This example demonstrates how to use
AUTOGENto create automatically generated random primary indexes on a string column.Take the below table:
This command will create a primary index on the column
Id Stringusing theRANDoption:
On inserting data into this table, the randomly generated indexes will look similar to the following:
| Id String |
|---|
| FdZXX4BWEN |
| NcbSCeVusm |
| IOVQJkLDbX |
| ... |
AUTOGEN="INC"¶
-
This example demonstrates how to use
AUTOGENto create automatically generated auto-incrementing primary indexes on an integer column.Take the below table:
This command will create a primary index on the column
Id Integerusing theINCoption:
On inserting data into this table, the autogenerated indexes will look as follows:
| Id Integer |
|---|
| 0 |
| 1 |
| 2 |
| ... |