Wait Event Analysis Example


There are several types of wait events
Only a few wait events that matter normally Add up the "shadow process" event times.
This gives us the total amount of time waited by all the users



Questions or comments ? Send a note to Kyle Hailey

A typical waits event section of a report.txt looks like


SQLDBA> Rem System wide wait events.
SQLDBA> select  n1.event "Event Name",
     2>         n1.event_count "Count",
     3>         n1.time_waited "Total Time",
     4>         (n1.time_waited/n1.event_count) "Average Time"
     5>    from stats$event n1
     6>    where n1.event_count > 0
     7>    order by n1.time_waited desc;
Event Name                  Count        Total Time   Average Time
--------------------------- ------------ ------------ ------------
client message                   1770692     26676762 15.065726846
rdbms ipc message                   7626       782385 102.59441385
pmon timer                           535       158481 296.22616822
smon timer                             7       146587        20941
db file sequential read            83662        79166 .94625995075
db file scattered read              5224        26120            5
db file parallel write              6982         6848 .98080779146
inactive session                      60         6060          101
latch free                         15653         4442 .28377946719
enqueue                               28         2268           81
log file sync                       3638         2245 .61709730621
log file parallel write             3892         2004 .51490236382
rdbms ipc reply                       19         1813 95.421052632
control file parallel write          843          850 1.0083036773
log file space/switch                  5          236         47.2
db file single write                 726          210 .28925619835
control file sequential rea         1196           22 .01839464883
library cache pin                     24           10 .41666666667
buffer busy waits                     28            6 .21428571429
library cache load lock                2            5          2.5
write complete waits                   5            5            1
log file sequential read               1            2            2
log file single write                  2            0            0
23 rows selected.
SQLDBA>
(To see and interpretation of all the wait events possible see: wait events )

We can immediately simplify this list by eliminating the wait events that are not related to users



Event Name                  Count        Total Time   Average Time
--------------------------- ------------ ------------ ------------
client message                   1770692     26676762 15.065726846
rdbms ipc message                   7626       782385 102.59441385
pmon timer                           535       158481 296.22616822
smon timer                             7       146587        20941
db file sequential read            83662        79166 .94625995075
db file scattered read              5224        26120            5
db file parallel write              6982         6848 .98080779146
inactive session                      60         6060          101
latch free                         15653         4442 .28377946719
enqueue                               28         2268           81
log file sync                       3638         2245 .61709730621
log file parallel write             3892         2004 .51490236382
rdbms ipc reply                       19         1813 95.421052632
control file parallel write          843          850 1.0083036773
log file space/switch                  5          236         47.2
db file single write                 726          210 .28925619835
control file sequential rea         1196           22 .01839464883
library cache pin                     24           10 .41666666667
buffer busy waits                     28            6 .21428571429
library cache load lock                2            5          2.5
write complete waits                   5            5            1
log file sequential read               1            2            2
log file single write                  2            0            0


The above output gives a profile of the "bottleknecks" in the database. We can elemenate most of the lines because they are usually irrelevant. For example:

To see a list of wait events broken down by type see: wait events by type

This leaves us with:

Event Name                  Count        Total Time   Average Time
--------------------------- ------------ ------------ ------------
db file sequential read            83662        79166 .94625995075
db file scattered read              5224        26120            5
latch free                         15653         4442 .28377946719
enqueue                               28         2268           81
log file sync                       3638         2245 .61709730621
rdbms ipc reply                       19         1813 95.421052632
log file space/switch                  5          236         47.2
library cache pin                     24           10 .41666666667
buffer busy waits                     28            6 .21428571429
library cache load lock                2            5          2.5
write complete waits                   5            5            1

The TIME_WAITED is measured in 100ths of seconds.

The basic easily tunable events are:

So by far the biggest waits from the example above are on and the sum of every time any users ever waited for this event during the period of bstat/estat is

This leaves us with:


Event Name                  Count        Total Time   Average Time
--------------------------- ------------ ------------ ------------
db file sequential read            83662        79166 .94625995075
db file scattered read              5224        26120            5

Now the question, so is 17.5 minutes significant? Well thats harder to decide.
Probably the easiest thing to do is look at the statistic in the statistic section of the bstat/estat report . Compare the ratio of the CPU time to the WAIT time.

for the lovers of generic rules:

Another way to look at the relative importance of the wait events is to look at how many users were connected during the bstat/estat to figure out the average time waited by users., First one needs to know over what period of time the bstat/estat was run. This is given at the bottom of the report.txt ... time period of stats
Second, one needs to know how many users were working (thus possible waiting for these event). The file report.txt doesn't really help here. It does give how many users logged on during the period, but not how many were connected. You can run the following for the number of users connected:

OK, say you know over what period the bstat/estat was taken and you know roughly how many users were connected, then you can get some idea of whether the wait time is important. For example:

The importance of the time waited is relative to the time we spent working. So in order to determine the importance of the wait events we need to know how much time the users spent active or working (and not idle). There are two ways to determine this

The first method I think is probably the best and easiest to use. The stat "CPU used by this session" is the sum of time spent by all Oracle processes using the CPU measured in 100ths of a second. So: So assuming user active time = wait_time+cpu_time, then the amount waited So for generic rules lovers given the above formula, if the result is: