Oracle Row Source Join Methods


Types of Joins

The optimiser will determine the most efficient method of SQL statement execution. The optimal plan is passed on to the row source generator. The row source generator produces a SQL statement execution plan, a series of row sources. Each row source processes a set of rows and ultimately a row set is generated. Row sources are joined using different types of joining methods.

Nested Loop Join

  1. Read 1st row from 1st row source. This is the driving table (outer loop).
  2. Test the 2nd row source for matches with the 1st row source.
  3. Write all matches to the result set and find the next row from the 1st row source.

The driving table or outer loop must be small and dependant table or inner loop must be indexed with high cardinality, ie. records are found in the inner table precisely by exact block hits. The more records found in the inner table the slower the query. A nested loop join produces rows before completion.

Sort Merge Join

  1. Sort 1st row source.
  2. Sort 2nd row source.
  3. Merge the 1st and 2nd row sources.
  4. Rows are placed into the resulting row set as they are found.

A sort merge join is used for large row sources (to be joined) and lack of indexes or high cardinality indexes (finds too many records for each database hit). Obviously a sort merge join only works for equijoins (WHERE x = y) and not range scans (WHERE x >= y). This is because merging is performed after sorting. Note that sort merge is much slower than nested loop since nested loop accesses a small number of blocks and sort merge accesses a large number of blocks increasing I/O activity. Also a sort merge is likely to require temporary sorting space either in the SORT_AREA_SIZE memory buffer or in a temporary segment if row sources are large enough. This will also affect I/O and thus performance.

Cluster Join

A cluster join is much like a nested loop join in that specific high cardinality indexes are used to find exact blocks and thus minimise I/O and have no sorting. A cluster index is used to access exact block matches in the inner loop based on index values found in the outer loop.

Hash Join

  1. Builds a hash table in memory using the HASH_AREA_SIZE parameter for all join column values (those in the WHERE clause) from the 2nd row source.
  2. Scroll through each join column in the 1st row source and search into the in memory hash table for matching joined column values. Use the ORDERED hint to make the 1st row source the driving row source after the hash table for the second row source has been built in memory.

Note that if the HASH_AREA_SIZE (2nd row source) and DB_BLOCK_BUFFERS (1st) are sized appropriately that this type of join can be executed entirely in memory and thus increase performance dramtically.

Index Join

In this case the optimiser can generate a number of hash joins between multiple indexes in a join. An index join will only occur if all query constraints (the WHERE clause) are contained in available indexes. Note that all indexes must be available either as properly sequenced composite indexes or even individual column bitmap indexes.