Indexing in Relational Databases

Indexing will increase database selection or read-access performance. A book which has an index allows rapid access to a particular subject area within that book. Indexing a database table provides rapid location of specific items of data or rows within that table. Note that indexes are used by the database engine itself to optimise on data access speed. When indexes are not used or are not matched by SQL statements submitted to that database then a full table scan is exectued. A full table scan will read all the data in a table to find a specific row or set of rows. This extremely inefficient.

Number of Indexes per Table

A very important fact to note about indexing is as follows. Whenever a table is inserted into, updated or deleted from all indexes plus the table must be updated. Thus if one places ten indexes onto a single table then every change to that table requires an effective change to a single table file and ten index files. This will probably degrade performance substantially since one insert actually requires eleven inserts as far as the database is concerned. So be frugal with your indexing and concious of the potential ill as well as the good effects produced.

Table Columns to be Indexed

  1. Try to always create indexes on columns with fixed values. Use integers if possible or fixed length strings. Try very hard not to use variable length strings (VARCHARs) and dates. If possible create automatically generated integer columns to represent other more complex data types as index columns.

  2. Composite keys are indexes made up of multiple columns. Minimise on the number of columns in a composite key. Composite keys are sometimes better substituted for with automatically generated integers. Also examine the data. It may sometimes be better to exclude some of the lower-level or less relevant columns from the index since at that level there may not be much data. Thus a composite index comprised of five columns could be reduced to the first three columns based on a limtied number of rows traversed as a result of ignoring the last two columns. Look at your data carefully when constructing indexes. The more columns you add to a composite index the slower the search will be since there is a more complex requirement for that search. Most database experts recommend a maximum of three columns for composite keys.

Types of Indexes

In general indexes are stored in a separate file or table to that of the data table. Thus searches are performed through a small set of columns (indexed columns only) and then the data is ultimately retrieved from the table itself by the database using the index. The objectives are as follows.

  1. Primary keys or indexes are generally used to uniquely identify a record. The original intention of primary and foreign keys was that of the representation and enforcement of referential integrity between related tables. Primary keys are now used as the the most important index for a table and do not necessarily have referring foreign keys in other tables. Primary keys are usually attached to the data space of the table itself but not always. This is database product specific.

  2. Non primary indexes are indexes constructed from one or more table columns. Non primary indexes may or may not include the primary index column or columns. The purpose of non primary indexes is to improve database access performance. These indexes will generally be created as a separate file within the database. Thus when searching these indexes a small number of columns is loaded into memory for searching. These indexes will have virtual memory addresses into the data space of the table, thus allowing rapid access between index space and table space. Note that the memory or cache space required for an index is much smaller than that of a data table since the row length of an index table is smaller, atleast it should be. Thus in the case of page swapping in and out of memory or cache and disk, indexes load many more pages into memory at once since they are smaller than data tables and thus indexes can be traversed much more rapidly that data table spaces. Different databases handle this in different ways.

  3. There are different types of indexes available in different databases. These different indexes are applicable under specific circumstances.

    1. Indexes can be unique or non-unique. Unique indexes are generally used as primary keys (not always) and non-unqiue indexes contain duplications which are generally used for database access.

    2. Clustered indexes are generally data table space forms of hashing or btree algorithms. A clustered index clusters or groups the actual data table rows of the table where the actual data rows reside in the leaf pages of the index.

    3. A non-clustered index sets the leaf pages of the index as pointers to the data pages containing the rows in the table.

    4. Indexes can be ordered as ascending or descending. Descending indexes are rare and are usually application specific.

    5. The most common type of indexing algorithm is the binary search algorithm or BTREE. There are various forms or BTREE searching algorithms. Basically a binary search traverses a tree of items facilitating rapid location of a specific item within a table.

    6. Hashing and ISAM algorithms are also used. Generally these index construction algorithms are much faster but less versatile and are thus more applicable to static data, ie. data which is infrequently changed. A hashing algorithm will create integer values specific for particular areas or physical locations within the data within a table. Hash indexes are most efficiently created from integer columns. Use of field data types such as variable length strings and dates in hash indexes is generally very unwise do the nature of the calculation of a hash value.

    7. The simplest form of indexing is a heap structure. Generally a heap structured index is not really an index and generally does not really aid in performance other than seperating index and table spaces. For frequently changing tables where changes are not made always as an append to the end of the table and rapid selection is not required, then heap structured indexes should be vaguely considered.

Some databases actually allow null values into index attributes. This can be dangerous. However, if the database engine will always ignore null value containing index rows then serious improvements in efficiency can be gained. Generally use of null values in indexes can increase the volume of resultant application code required to interpret those null valued attributes. Use of null valued attributes generally saves storage space. Saving of storage space will always increase efficiency. Most database engines will not allow null values into indexed attributes, especially unique primary attributes.

What SQL does with Indexes

In general an SQL statement will attempt to match the structure of itself to an index. If no index is matched then a full table scan will be executed. A table scan is extremely inefficient for anything but the smallest of tables. Obviously if a table is read sequentially in physical order then an index is not required. A table does not always need an index. A primary key is an index. Therefore it follows that a primary index is not always required. Remember that primary keys were originally used to enforce and represent referential integrity between related tables.

What to Index

Use indexes where frequent queries are performed.

  1. Columns use in joins.

  2. Columns used in where clauses.

  3. Columns use in order-by clauses. Note that the order-by clause is generally executed on the subset retrieved by the where clause, not the entire data table. Therefore the order-by clause should never include the columns contained in the where clause. The only case where the order-by clause will include columns contained in the where clause is the case of the where clause not matching any index in the database.

  4. The group-by clause can be enhanced by indexing when the range of values being grouped is small in relation to the number of rows in the table selected.

What not to Index

Indexes will degrade performance of inserts, updates and deletes, sometimes substantially.

  1. Tables with a small number of rows.

  2. Static tables.

  3. Columns with a wide range of values.

  4. Tables changed frequently and with a low amount of data access.

  5. Columns not used in data access query select statements.