Friday, February 3, 2012

Do RUNSTATS at the right point in time

Updating statistics about table and index cardinalities and the distribution of values in there to give the optimizer the right information, so that it can determine the best possible access path to the data is one of the BAU tasks of database maintenance. In WPS installations the usual recommendation is to do it once a week during off-peak hours, as performing runstats means to read all the data, which represents a certain impact that you might want to avoid during peak hours.

However there might be a pattern, where such an approach is quite counter-productive. In a recent case runstats was done weekly on a Sunday night. The processing workload consisted of primarily microflows and some (short lived) macroflows. The observed problem was processing slowdowns due to a major amount of deadlocks (up to 10 per minute) during the peak times of the day.

Checking the explains for the deadlocked statements showed, that the database's optimizer didn't take the best possible index. There was an index, that covered all the columns in the where-clause, but this index was not taken by the optimizer, but the primary key index instead.

A few SELECT COUNT statements on the involved tables showed a low and changing cardinality (just a few dozen rows). Together with the information about the short life-time of the macroflows and the daily processing pattern where almost everything happens during typical office hours, the idea of performing runstats on a Sunday night probably lead to collecting statistics on empty tables.
And when the optimizer believes (due to the statistics it has) that the tables are empty, it might take an access path that involves e.g. more locking than necessary.

Now the suggestion was to perform runstats during a typical peak time, where the involved BPEDB tables contained a few dozen rows of data.

Tataaa - deadlocks were gone.

And the explain on the SQL statements in question showed that exactly the access path was taken that theoretically would be the best one.

Conclusion:
Sometimes runstats should be done during peak times - especially when the workload pattern is such that you could expect no or almost no process instance data being available at those off-peak hours where you typically would do database maintenance.