a
o
s
a
s
But first, you need to understand the path
elected by the DB2 optimizer. That information is obtained via the DB2 EXPLAIN statement
(EXPLAIN output for a statement can be examined in text form by querying EXPLAIN tables,
or in a graphical form using the Optimization
Service Center for DB2 for z/OS or the Visual
Explain functionality of DB2 for LUW). Once
you have the EXPLAIN output, look at how
the query’s result set is generated, and then
consider alternative access path possibilities.
To properly assess these access path
lternatives, you’ll need information from
the DB2 catalog about the tables accessed
by the query and the indexes defined on
those tables. Keep in mind that this catalog
data is what the optimizer uses to make
access path decisions. If the catalog statistics are inaccurate, get them up to date
with RUNSTATS. That simple step could lead
to a new access path that might dramatically improve query performance.
When you’ve looked at the access path
elected by DB2 for your target query, ask
yourself this question: Should DB2 have
selected a different access path for the query?
If a path other than the one selected by DB2
looks better to you, check to see if the tables
targeted by the query—and their associated
indexes—are well organized. If the catalog
statistics are up to date and DB2 knows that a
particular index is poorly organized, DB2 may
decide not to use it. In that case, reorganizing
the index might lead to its selection as part of
a better-performing query access path.
A more likely explanation for a difference
f opinion between you and DB2 is that you
know something that DB2 doesn’t. Perhaps
you know that data values in a certain column
are not evenly distributed. Without column
data-value distribution information in the
catalog, DB2 will assume an even distribution
of values, which may lead the optimizer to a
sub-optimal access path.
If a significant data value skew exists for
column referenced in a query predicate,
make sure that DB2 knows this, preferably by
generating column-value histogram statistics
using RUNSTATS. (Histogram statistics, available
in DB2 for LUW for quite some time, can be
generated with DB2 for z/OS V9 RUNSTATS. If
you are using DB2 for z/OS V8, the next best
thing is the FREQVAL option of RUNSTATS.) Note
that advice for enriching DB2 for z/OS catalog
statistics for the benefit of a particular query
can be obtained from the Optimization Service
Center for DB2 for z/OS—a free version of
which can be downloaded from ibm.com/
db2/zos/downloads/ osc.html.
a
n
When you and DB2 agree
If you and DB2 agree on the best available
access path but that path does not deliver
satisfactory performance, you need to
provide DB2 with a new access path choice.
There are several ways to do this, including
adjusting your indexes, using a materialized query table, and table re-clustering.
Index-related actions can reduce the
umber of pages that DB2 has to examine
to generate the query’s result set. You might
define an index on a predicate-referenced
column that is not currently indexed. You
might define an index on a set of columns
that is already indexed, but with a different
column order specified to increase the
number of index key columns on which DB2
can perform predicate-related value matches.
(For example: given a compound predicate
of the form COL_A > y AND COL_B = x, and an
existing index on COL_A | COL_B, a new index
on COL_B | COL_A would enable DB2 to match
on both columns of the index key instead of
just matching on the first key column.) You
might add a column to an existing index to
get index-only access for a portion of the
query’s processing.
Another way to provide a new and better
ccess path for a query is through the use
of DB2’s materialized query table (MQT)
functionality. An MQT is defined by way of a
SELECT statement—usually a result set that
would otherwise have to be dynamically built
when a query executes. Here are three great
things about an MQT:
1. DB2 can automatically rewrite a query to
take advantage of an MQT.
q
i
p
2. Because the result set in the MQT is
already there, the run time for a query
that is rewritten by DB2 to access the
MQT can be reduced dramatically,
saving time that would otherwise be
spent dynamically building that result
set at query execution time.
3. An MQT can be indexed to provide an even
greater boost to query performance.
Table re-clustering is another means of
roviding DB2 with a better access path by
reducing the number of pages that DB2 has
to examine in executing the query. Data clustering can have an especially large impact on
data warehouse query performance, because
rows are often retrieved in large numbers. If
a table was originally defined to be clustered
on an account number column, and it subsequently turned out that users were retrieving
rows from the table on a date-range basis, it
might be worth considering a change in the
table’s clustering sequence to get retrieved
rows into close proximity to each other in the
table. Also, the performance of join operations may be improved significantly if joined
tables are clustered in join-column sequence
(e.g., by CUSTOMER_ID when the join predicate is TABLE_A.CUSTOMER_ID = TABLE_B.
CUSTOMER_ID).
If you do decide to re-cluster a table, keep
n mind that DB2 for z/OS V9 makes this
much easier through the new CLUSTER and
NOT CLUSTER options of the ALTER INDEX statement. If re-clustering is on the table, make
sure that you give some thought to exploiting
clustering-related enhancements delivered in
the last couple of DB2 versions, such as multidimensional clustering for DB2 for LUW, and
the ability to partition a table on one key and
cluster within partitions on another key in a
DB2 for z/OS environment.
Here’s hoping that your data warehouse
uery-tuning efforts will deliver breakthrough
results. Business intelligence is as hot an application area as you can find these days, and
getting into it (if you’re not already there) will
give you all kinds of opportunities to deliver
real value to your organization. Tune on!