Tuning Oracle CPU Usage


The Operating System

CPU usage can be tuned at peak, average and idle workloads. Obviously the biggest problems occur at peak workloads thus the most attention should be paid to peak usage times. Peak times occur at different times depending on the type of system. Batch operations should be exectued at low usage times with respect to interactive systems. Tuning of client-server type systems is much less involved than that of internet based databases because internet systems involve potentially astronomical amounts of users.

If I/O wait time and idle time are both near zero then CPU utilisation is not effective at the operating system level. This means that disk access is fast and that the CPU is not idle. Since the CPU is very much faster than I/O this indicates a problem with processor use because the CPU should be waiting for I/O, definitely not the other way around.

Swapping of pages in and out of memory can cause high CPU utilisation. This is generally because memory allocation is too low. It can also be caused by what Oracle calls spinning recursive processes such as very large transactions causing too much rollback and temporary segment usage. Too much network usage can cause CPU overload when many small messages are sent repeatedly. Solve this by batching of network usage such as with stored procedures. Too many processes continually switching in a round-robin type queue can cause problems. Continual recreation and breaking of connections is a waste of resources.

It is important to note that when assessing process' memory usage that memory usage for each process is made up of process size, shared memory, heap and executable stack space. Thus each actual process uses much less memory than is really expected. Problems with performance stemming from the operating system involve processes, memory and scheduling. Consequently I/O bandwidth, CPU speed and swap space are important.

Oracle CPU Usage Tuning

In Oracle over-usage of CPU is caused by many things. The most common is poorly built SQL (missing indexes can also cause problems even though too many and too large indexes can cause the same problem), SQL statement reparsing or reloading and re-execution, poorly designed database structures (this includes rollback segments) and waiting for lock releases (contention for latches); usually a sign of poor database structure design.

Checking for Re-Parsing of SQL Statements

select substr(name,1,32) "Statistic", value from v$sysstat
where name in ('parse time cpu','parse time elapsed', 'parse count (hard)', 'parse count (total)');

Statistic                            VALUE
-------------------------------- ---------
parse time cpu                           0
parse time elapsed                       0
parse count (total)                    387
parse count (hard)                     133

Frequently Reparsed Statements

Increase the SESSION_CACHED_CURSORS parameter to lower re-parsing. Also use bind variables in SQL statements rather than hard-wired literal values. A SQL statement with differing text will be reparsed. SQL statements in stored procedures utilising bind variables are the most efficient since the text of the SQL statement lokks the same and thus the statement will not reparsed. Remember that the process of reparsing is the process of regeneration of an optimised execution plan, not the actual execution of that execution plan. See How to Find Problem Queries.

select substr(sql_text,1,64) "SQL Text",parse_calls,executions from v$sqlarea order by parse_calls;

SQL Text                                                         PARSE_CALLS EXECUTIONS
---------------------------------------------------------------- ----------- ----------
ALTER DATABASE MOUNT EXCLUSIVE                                             0          1
ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN' NLS_TERRITORY= 'AMERI           0          1
alter database open                                                        0          1
select ctime, mtime, stime from obj$ where obj# = :1                       0         52
select charsetid, charsetform from col$  where obj# = :1 and col           0         45
select blevel, leafcnt, distkey, lblkkey, dblkkey, clufac,                 0         29
select line#, sql_text from bootstrap$ where obj# != :1                    0          1
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1),            0         18
update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,           0          8
select value$ from sys.props$ where name = :1                              0         18
select name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,sc           0         28
ALTER SESSION SET NLS_LANGUAGE= 'AMERICAN' NLS_TERRITORY= 'AMERI           1          1
SELECT DECODE('A','A','1','2') FROM DUAL                                   1          1
select * from v$sysstat where name in ('parse time cpu','parse t           1          1
commit                                                                     1          1
SELECT USER FROM DUAL                                                      1          1
select TOTAL from SYS.ID_GENS$                                             1          1
select location_name from loc$                                             1          1
select local_tran_id, global_tran_fmt, global_oracle_id, global_           1          1
.
.
.

Consistency of Data and Rollback

Consistency with data is a problem. Mixing of numerous small insert transactions with a large selection can a lot of rolling back of the insert statements to maintain the consistency of the select statement. This is kind of odd. However, this is the way Oracle works, rollback is maintained and subject for use within the scope of a transaction, ie. prior a commit or rollback statement. Thus do not mix batch processing or reporting with an OLTP system. Even a replicated database would cause this problem since all SQL is duplicated. Thus replication is not really an effective method for reporting off an internet database for instance. It would be better to execute reports after hours at low usage times or on a separated database such as a periodically read only standby database. In fact if using a standby database it is probably best for an internet database to create two standby databases, one for immediate potential activation and the second for 24 hour reporting. Too few rollback segments can result in snapshot too old errors due to resuse of not yet commited transactions. This problem can also be solved by committing more often, ie. having fewer SQL statements in each transaction. There can also be a case of the database writer process being unable to keep up with the number of dirty buffers produced. Resolve this problem by increasing the number of DBWn processes or increasing the size of the database buffer block cache.

Locking

Processes can contend for latches. Spin counts can also be set for Oracle processes where a process will wake-up periodically. The V$SYSTEM_EVENT view will show how long processes await latches.

select substr(event,1,32) "EVENT",Total_waits,total_timeouts,time_waited,average_wait from v$system_event order by event;

EVENT                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
-------------------------------- ----------- -------------- ----------- ------------
Null event                                 1              1           0            0
SQL*Net break/reset to client              2              0           0            0
SQL*Net message from client               63              0           0            0
SQL*Net message to client                 64              0           0            0
control file parallel write              150              0           0            0
control file sequential read              67              0           0            0
db file parallel write                     1              0           0            0
db file scattered read                   105              0           0            0
db file sequential read                 1208              0           0            0
db file single write                       4              0           0            0
file identify                             20              0           0            0
file open                                 48              0           0            0
instance state change                      1              0           0            0
latch free                                 2              0           0            0
log file parallel write                   12              0           0            0
log file sequential read                   5              0           0            0
log file single write                      5              0           0            0
log file sync                              1              0           0            0
pmon timer                             43687          43686           0            0
process startup                            3              1           0            0
rdbms ipc message                        444            422           0            0

EVENT                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
-------------------------------- ----------- -------------- ----------- ------------
rdbms ipc reply                            7              0           0            0
refresh controlfile command                1              0           0            0
reliable message                           1              0           0            0
smon timer                                 5              2           0            0

The queries below may show some sensible clues as to the source of latch misses.

select count(*),parent_name from v$latch_misses group by parent_name;
select substr(parent_name,1,32) "Parent",sum(nwfail_count) "NWFail",sum(sleep_count) "Sleep",sum(wtr_slp_count) "WTRSLP",sum(longhold_count) "LongHold" from v$latch_misses group by parent_name;
select * from parent_name from v$latch_misses;

Hardware and Software Architecture

Multiple tiered and clustering architectures can help to save and utilise CPU resources more effectively.