CREATE INDEX index_name ON CLUSTER cluster_name
Columns are not specified since the columns are taken from the CREATE CLUSTER command which created the named cluster.
CREATE TABLE cluster_table (column_list) CLUSTER cluster_name (cluster column(s))
Only sizing and storage parameters can be altered plus no columns can be added to or removed from a cluster. Extent allocation and deallocation can be performed in Oracle.
A cluster can be used when several tables store a row that is of the same datatype and size in the same location. This reduces storage requirements and can speed up data access. Performance degradation can occur with inserts, updates and deletes. Clusters should only be used with tables with expected joins and few inserts, updates and deletes.
Clusters store shared data values in the same physical blocks, ie. cluster key values. Clusters can speed access to frequently joined tables. Clustering of a single table forces the key values for that table into a single set of blocks. Single table clustering also uses a structure which further improves access speed. Note that Oracle has a type of cluster which includes the data with the index. In the case of an Oracle7 the cluster is a separate file to that of the data table. Also Oracle7 has a type of cluster called a hashing cluster. A hashing cluster does not have an index except that the record location is recorded in a hash value where retrieval of the record requires recalculation of that hash value.
An index organised table is similar to a cluster. A cluster is intended to join multiple tables and allow efficient access to that composite. An index organised tables provides the benefits of clustering to individual tables as opposed to joined table clusters.
Oracle will by default store a single cluster key entry and it's associated rows into each data block of the cluster data segment. Generally cluster key columns should allows their respective groups of rows to fill a single block. Thus too few rows wastes space and too many rows causes overflow. Try to keep cluster keys non-specific such that they cover a significant number of rows. Thus too few columns in a cluster key causes too much uniqueness and thus too few rows in a block. Keys which are too general in nature such as MALE or FEMALE can cause too much searching since there are too many clsuter keys assigned to a single block.
Separate clsuters and their respective cluster indexes into separate tablespaces if possible. This will allow for simultaneous access of both index and data areas.
Hash clusters will improve read-access performance. Non-hash indexes use index column values and their respective ROWIDs to then select rows from the data space. With a hash cluster table rows will be retrieved by application of hashing function to find a disk address or position relative to an absolute position, ie. a database block. Hash values are effectively a distribution of numeric values. It is extremely unwise and thus inefficient to create hashing algorithms based on anything other than numeric integer values due to longer calculations and a lot more of them when not using numbers. In general the hashing function contains a formula for the physical disk location of a specific block containing the data required. Hash clsuters do not require indexes.
Note that hash clusters can also be created for individual tables as opposed to clusters in general which require a multi-table join.
CREATE CLUSTER table_name (id INTEGER) SIZE 1M SINGLE TABLE HASHKEYS 500;
When locating a row not using a hash function and using an index atleast two I/O's are required. With a cluster only a hash caluclation function and a single I/O are required since the hashing function calculates the physical disk address of the data block for a row. Thus retrieval from hash clsuters is very efficient.
Hashing works best for exact match searches and not range searches, ie. equality = works best, < and > give extremely poor performance. This is because a search using a hash value always finds an exact block. Searching using indexes finds blocks relative to other blocks in multiple reads and thus index searching is best for range scans across binary trees. Searching for an exact match (equality) in a binary tree will still involve searching through the binary tree hierarchy based on something being in one branch or another, ie. range scans constanly asking whether a value is < or > a value. Note that hash clusters should only really be used for static or read-only data. When hash clsuters are updated the hashing values can not be recalculated and thus serious overflow can result. Obviously never perform full scans on hash clusters.