Tuning Oracle I/O Usage


In general differing types of required database functionality requires differing types of tuning. OLTP databases need a quick response time for generally small transactions and batch performing databases require high throughput of large transactions. A number of factors are important with respect to tuning of I/O with Oracle.

Block Size

Large block size is beneficial for sequential reads and multimedia storage. Note that the most efficient method of multimedia storage is to store a reference only in the database and store multimedia objects outside the database. Text searching through large text objects is not effectively stored outside the database since usually extensive searching facilities are required. Small block size is beneficial for small transactional random reads and writes since only what is required is read at once. Large block size coupled with small transactions produces wasted I/O by reading blocks where only a small part of each block read contains the required data. Also when large blocks are accessed there is a higher chance of block contention since larger blocks potentially contain more rows where it is more likely than different requests will be satisfied by the same block. 8K is probably the most effective all-round block size for Oracle. Note that in Oracle 2K and 4K block size do not allow reading of more than a single block at once, ie. DB_FILE_MULTIBLOCK_READ_COUNT and SORT_MULTIBLOCK_READ_COUNT. Large block size requires less overhead but is totally inappropriate for index accessed data in an OLTP environment.

Problem Detection

Problems can occur at both the operating system and the Oracle level. It is best to separate heavy disk usage files from Oracle database files, this includes even the Oracle installation. Assuming an external RAID array it may be best to place the operating system plus the Oracle installation software on a disk internal to the server and then place Oracle database files onto the RAID array.

Operating System

In NT use the performance monitor and in UNIX use either sar -d or the iostat command.

Oracle

Oracle I/O statistics can be found as shown below.

V$FILESTAT

select	 substr(v$datafile.name,1,35) "Name"
	,v$filestat.phyrds "Reads"
	,v$filestat.phywrts "Writes"
	,v$filestat.phyblkrd "Blocks Read"
	,v$filestat.phyblkwrt "Blocks Written"
	,v$filestat.readtim "Read Time"
	,v$filestat.writetim "Write Time"
from v$datafile, v$filestat
where v$datafile.file#=v$filestat.file#
order by 1;

Name                                    Reads    Writes Blocks Read Blocks Written Read Time
----------------------------------- --------- --------- ----------- -------------- ---------
C:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF      1446         6        2644              6         0
C:\ORACLE\ORADATA\ORCL\USERS01.DBF          0         0           0              0         0
D:\ORACLE\ORADATA\ORCL\INDX01.DBF           0         0           0              0         0
D:\ORACLE\ORADATA\ORCL\RBS01.DBF           13         2          13              2         0

V$SYSTEM_EVENT

select	 substr(event,1,28) "Event"
	,total_waits "Waits"
	,total_timeouts "Timeouts"
	,time_waited "Wait Time"
	,average_wait "Avg Wait"
from v$system_event
order by 1;

Event                            Waits  Timeouts Wait Time  Avg Wait
---------------------------- --------- --------- --------- ---------
Null event                           1         1         0         0
SQL*Net break/reset to clien        24         0         0         0
SQL*Net message from client        190         0         0         0
SQL*Net message to client          191         0         0         0
SQL*Net more data to client          7         0         0         0
control file parallel write        956         0         0         0
control file sequential read       199         0         0         0
db file parallel write               1         0         0         0
db file scattered read             116         0         0         0
db file sequential read           1359         0         0         0
db file single write                 4         0         0         0
direct path write                    8         0         0         0
file identify                       20         0         0         0
file open                           53         0         0         0
instance state change                1         0         0         0
latch free                           4         4         0         0
log file parallel write             24         0         0         0
log file sequential read             5         0         0         0
log file single write                5         0         0         0
log file sync                        2         0         0         0
pmon timer                      274422    274421         0         0
process startup                      7         1         0         0
rdbms ipc message                 2907      2850         0         0
rdbms ipc reply                      7         0         0         0
refresh controlfile command         24         1         0         0
reliable message                     1         0         0         0
smon timer                          15        10         0         0
sort segment request                 1         1         0         0

V$SESSION_EVENT

select	 substr(v$session.username,1,10) "Username"
	,substr(v$session_event.event,1,29) "Event"
	,v$session_event.total_waits "# of Waits"
	,v$session_event.total_timeouts "Timeouts"
	,v$session_event.time_waited "Time Waited"
	,v$session_event.average_wait "Avg Wait"
	,v$session_event.max_wait "Max Wait"
from v$session,v$session_event
where v$session.sid=v$session_event.sid
order by 1,2;

Username   Event                         # of Waits  Timeouts Time Waited  Avg Wait  Max Wait
---------- ----------------------------- ---------- --------- ----------- --------- ---------
SYSTEM     SQL*Net break/reset to client         24         0           0         0         0
SYSTEM     SQL*Net message from client          161         0           0         0         0
SYSTEM     SQL*Net message to client            162         0           0         0         0
SYSTEM     SQL*Net more data to client            7         0           0         0         0
SYSTEM     control file sequential read          87         0           0         0         0
SYSTEM     db file sequential read              356         0           0         0         0
SYSTEM     direct path write                      8         0           0         0         0
SYSTEM     file open                              5         0           0         0         0
SYSTEM     latch free                             1         1           0         0         0
SYSTEM     log file sync                          1         0           0         0         0
SYSTEM     refresh controlfile command           23         1           0         0         0
SYSTEM     sort segment request                   1         1           0         0         0
           control file parallel write            9         0           0         0         0
           control file parallel write          955         0           0         0         0
           control file sequential read          23         0           0         0         0
           control file sequential read          52         0           0         0         0
           control file sequential read          21         0           0         0         0
           db file parallel write                 1         0           0         0         0
           db file scattered read               110         0           0         0         0
           db file sequential read                6         0           0         0         0
           db file sequential read                6         0           0         0         0
           db file sequential read                4         0           0         0         0
           db file sequential read              680         0           0         0         0
           db file single write                   4         0           0         0         0
           file identify                          6         0           0         0         0
           file identify                          6         0           0         0         0
           file identify                          5         0           0         0         0
           file open                             10         0           0         0         0
           file open                              1         0           0         0         0
           file open                              8         0           0         0         0
           file open                              6         0           0         0         0
           file open                             11         0           0         0         0
           latch free                             2         2           0         0         0
           log file parallel write               24         0           0         0         0
           log file sequential read               5         0           0         0         0
           log file single write                  5         0           0         0         0
           pmon timer                        274432    274431           0         0         0
           rdbms ipc message                    965       959           0         0         0
           rdbms ipc message                    985       959           0         0         0
           rdbms ipc message                      5         4           0         0         0
           rdbms ipc message                    983       959           0         0         0
           smon timer                            15        10           0         0         0

How to Tune I/O

There are a number of approaches to tuning I/O. These involve striping of files across multiple disks and distribution of I/O, prevention of dynamic datafile space allocation, sort tuning, efficient checkpointing, database buffer writer and log buffer writer tuning.

Striping and I/O Distribution

In general all CPU activity waits for the completion of I/O operations. Striping of files across multiple disks either at the operating system or Oracle level can sometimes help to improve performance. Striping implies placing files onto different disks where files separated are those which contain I/O activity at exactly the time; there can be a lot of overhead generated by disk head movement whilst operating on different files at the same time. As with any type of tuning the types of operations determine largely how a system can best be tuned. With respect to Oracle transaction size, transaction size between specific periods of time or the spread of transaction sizes over time can determine the correct course for tuning. Obviously small and large transactions use resources at differing rates, thus different approaches to tuning are required.

Typically it is sensible to separate datafiles from redo log files since both are active at the same time for any database change operation. Also when multiplexing redo logs it is advisable to place each redo log file group onto a separate disk. Note that redo log file group multiplexing is very efficient since the different groups are written in parallel. Placing of archive logs on a disk separated from all redo logs is another possiblity thus preventing I/I retention between redo logs and archives during copying of redo logs to archive logs. It can also be advisable to place rollback segment datafiles on another separate disk aswell since rollback is generated during database change operations aswell, even though rollback will not see as much activity as redo logs. In the past it was also recommended to separate datafiles containing tables from their indexes but generally indexes are hit just before the table is. However, complex joins may negate this fact and make it advantageous to place datafiles containing tables and datafiles containing indexes on separate disks. Note that with the advent of RAID arrays and efficient operating system level striping of files separation of Oracle datafiles and log files on separate disks becomes less important.

It is important to place all files unrelated to Oracle on separate disks to that of Oracle and its datafiles. It is probably also a good idea to place the Oracle installation software itself on a separate disk, probably the primary disk or partition.

Another possible form of striping is at the Oracle level rather than the operating system level. This can involve the striping of large tables in multiple datafiles by placing the different datafiles onto separate disks. One form of this type of striping would simple be creation of multiple datafiles in a single tablespace where the different datafiles reside on different disks, this is beneficial to large amounts of random access to retrieve small amounts of data. A second form of this type of striping would be partitioning of tables across multiple tablespaces where the tablespace datafiles can be placed onto multiple disks. This type of striping can be achieved by creation of multiple files of the same size and then setting the initial parameter in the table storage clause to the size of the datafile plus the minextents parameter to the size of the initial parametes multiplied by the number of datafiles. This will probably only work well with static data because once data is added probably only the first datafile declared for the tablespace will extend automatically, or whichever datafile is found first which can be automatically extended. In short this method is a little ridiculous when a RAID array can do the same thing, with mirroring, and probably a lot more efficiently. If you can afford Oracle you can afford a RAID array.

Dynamic Space Allocation

When a datafile or rollback segment runs out of space Oracle will dynamically add more space to that segment. This is called dynamic extension by the process of Oracle adding new extents to a segment. This can adversely affect resursive calls are used to not only execute the SQL statements required but also to create the new extents. There is a statistic in the V$SYSSTAT view called recursive calls. Other things also cause recursive calls, ie. data cache misses (search the disk), trigger firings (go off and do something else before proceeding), DDL, SQL statements in any type of block and primary to foreign key referential integrity constraint checking. None of these types of five things can be avoided, for instance blocked SQL transactions have serious performance advantages.

select name,value from v$sysstat where name='recursive calls';

NAME                                                                 VALUE
---------------------------------------------------------------- ---------
recursive calls                                                       8313

The obvious solution to reducing the automatic allocation of dynamic extents is to increase the size of the extents for an object or for all tablespaces. High database insertion activity is more likely to cause high automatic extent allocation. Larger extents can store more blocks and thus many blocks read sequentially will be in fewer extents and less I/O is required. The down-side to larger extents is that they may not be physically stored contiguously. Rollback segments tend to do a lot of dynamic extension since it is best to decrease their size after use to the optimal size. It is simply dangerous to automatically rollback segments, run-away transactions can use up disk space completely and crash the database. One solution for rollback segment dynamic extension is to use multiple different sized rollback segments and allocate each transaction a specific rollback segment using the SET TRANSACTION command. However, any unused, specialised rollback segments should either be created privately for specific transactions or held offline unless in use. Obviously SET TRANSACTION can cause contention because multiple users executing the same transaction will be using the same rollback segment (CRUNCH !!!). Create rollback segments generally for the average transaction size and perhaps some specialised offline rollback segments for specific, infrequent, single-user uses.

Tuning Sorts

It is faster for sorting to be performed in memory. When memory is insufficient then sorting will be performed on disk either in a temporary tablespace or whatever tablespace is designated for the user as that user's temporary tablespace. Temporary tablespaces are specifically designed for disk sorts, they should be used.

select name,value from v$sysstat where name in ('sorts (memory)','sorts (disk)');

NAME                                                                 VALUE
---------------------------------------------------------------- ---------
sorts (memory)                                                         124
sorts (disk)                                                             0

SORT_AREA_SIZE determines the amount of memory allocated to memory sorting. Most Texts recommend setting SORT_AREA_RETAINED_SIZE equal to SORT_AREA_SIZE. This means that all of the memory will be retained for sorting after query completion. Doing this is really only appropriate for large batch operations and not for OLTP databases. If small transactions are frequent and large transaction infrequent it is better to sort to disk for the large transactions. If the two parameters are kept the same small transactions will have far too much memory to use and large transactions will simply push all small transactions to disk when they occur. Generally set SORT_AREA_RETAINED_SIZE to about 10% of SORT_AREA_SIZE. For large sorts set the SORT_MULTIBLOCK_READ_COUNT parameter higher and thus more sorting transfer is achieved between memory and disk, ie. sorting is done in a few larger chunks rather many smaller chunks, ie. less I/O. An index can be created with the NOSORT option, this allows index creation on table rows correctly physically ordered to be a copy of the order in the table.

Checkpoints

A checkpoint forces the writing of dirty buffers (changed data) from the data buffer cache to the disk. The more frequently checkpoints are performed the less recovery would be required at instance failure. However, frequent checkpointing could possibly affect performance by requiring more I/O. Minimum checkpointing can be implemented by setting LOG_CHECKPOINT_INTERVAL to be larger than a single redo logs (in blocks) and set LOG_CHECKPOINT_TIMEOUT = 0. Do not have differing redo log sizes. Unless you are manually controlling when logs are used by forcing switches make all redo logs the same size. The larger a redo log is the longer it takes to write to the archives, however small redo logs are not effective for a very active system since they will be filled, archived and switched too rapidly. FAST_START_IO_TARGET will limit number of operations required for recovery by writing dirty data buffer cache to disk when the FAST_START_IO_TARGET limit is exceeded. FAST_START_IO_TARGET will also aid recoverability and not performance.

The DBWn and LGWR Processes

The LGWR process will being writing the log buffer to disk when the log buffer is 1/3 full or a commit command is issued. A small log buffer causes too many I/Os and a large log buffer can delay writing to disk. Set the CHECKPOINT_PROCESS to TRUE to enable the CKPT process therebye freeing the LGWR process of the checkpointing responsibilities.

Create multiple database writer processes (writing dirty database buffer cache entries to disk) by setting the DB_WRITER_PROCESSES parameter from between 1 and 10 producing processes DBW0 through to DBW9. Note that DBWR_IO_SLAVES cannot be executed concurrently with multiple DBWn processes. With a single buffer pool database buffer cache is divided up among multiple DBWn processes by LRU latches, one latch per LRU list (list of latches). Load should be spread amongst all DBWn processes by setting DB_BLOCK_LRU_LATCHES equal to or a multiple of CPUs. Thus a 4 CPU system with 2 DBWn processes you could have 4 LRU latches. In the case of multiple buffer pools (DEFAULT, KEEP and RECYCLE buffers) set LRU latches equal to or a multiple of DBWn processes. Spread the load, place an equal number of latches in each pool out of the total which is a multiple of the DBWn processes.