Making Humongous Doable
Need to manage a gigantic database?
Let DB2 do the heavy lifting.
Robert Catterall (r catterall@catterallconsulting.com) is president of Catterall
Consulting, a provider of DB2 consulting and training services.
“Really big,” in the context of a data-serving
system, used to mean a few hundred transactions per second, a terabyte or more of data,
and a thousand or more database objects (tables
and indexes). Today, a system with these characteristics would be thought of as “large,” but
the bar for “huge” has been raised considerably:
more than a thousand transactions per second,
multiple terabytes of data, and tens of thousands
of database objects.
Organizations around the world must deal
with such huge systems in a cost-effective
manner. The IBM DB2 development group has
consistently delivered features and functions
that address this need for ever-more efficient
support of ever-larger databases. You could write
a book on all that stuff, but I have about 1,200
more words at my disposal, so I’ll focus on two
of my favorites: a new and game-changing scale-out solution for DB2 on the AIX/Power platform,
and a DB2 for z/OS feature that is great for large
databases but overlooked by a lot of DB2 people.
DB2 pureScale: Advanced shared-data clustering
You can read the specifics on DB2 pureScale in
the article “What is DB2 pureScale?” in this
issue. Here, I want to talk about why it’s so
important. Vertical scalability, also known as
scale-up, is a DB2 strength. Give a DB2 server
more and/or faster engines, and you’ll get better
throughput. Sounds simple, but making good
on that proposition—once you get past a few
processors—requires advanced engineering.
That said, sometimes a really big database system is best supported with a scale-out
(multi-node) configuration. In such cases, it’s
important to go with a scale-out solution that
provides a good match for the requirements of
the target application. For large data warehouse
systems, the shared-nothing multi-node architecture implemented via the DB2 for Linux, UNIX,
and Windows (LUW) Database Partitioning Feature
(DPF) makes lots of sense. For online transaction processing (OLTP) applications, on the
other hand, a shared-data cluster (multiple servers
with concurrent read/write access to a database
on shared disk) is likely to be a better fit. For years,
the only available DB2-based shared-data system
was DB2 for z/OS data sharing on a mainframe
Parallel Sysplex. That changed with the announcement of IBM DB2 pureScale, a shared-data solution for DB2 on the IBM AIX/Power platform.
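To make the difference between the two scale-out styles concrete, here is a minimal Python sketch. It is an illustration only, not DB2 code: the function names, the SHA-256 hash, and the simple modulo mapping are all assumptions made for the example, and DB2's actual distribution mechanism is not shown here. The point is simply that in a shared-nothing system a given row lives on exactly one partition, while in a shared-data cluster any server can read or write any row.

```python
import hashlib
from typing import List


def owning_partition(distribution_key: str, partition_count: int) -> int:
    """Shared-nothing: map a row's key to the single partition that owns it.

    Hypothetical helper for illustration; the hash and modulo here are
    assumptions, not the algorithm DB2 uses.
    """
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def eligible_servers(distribution_key: str, server_count: int,
                     shared_data: bool) -> List[int]:
    """Return the servers that can directly read or write the given row."""
    if shared_data:
        # Shared-data: every server has read/write access to the one database.
        return list(range(server_count))
    # Shared-nothing: only the owning partition holds the row.
    return [owning_partition(distribution_key, server_count)]


if __name__ == "__main__":
    key = "order-1001"  # made-up key for the example
    print("shared-nothing:", eligible_servers(key, 4, shared_data=False))
    print("shared-data:   ", eligible_servers(key, 4, shared_data=True))
```

In this toy model, a warehouse query that scans everything keeps all the partitions busy in parallel, which is why shared-nothing suits that workload. An OLTP transaction touching a single row, by contrast, can land on any server in the shared-data case, provided the servers coordinate so they do not corrupt each other's changes.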
DB2 pureScale, announced in October 2009,
trumps the shared-data cluster competition in
the UNIX market. Here’s the deal: if you’re going
to give multiple data servers read/write access
to one database, you have a couple of choices
when it comes to keeping the different servers
from trashing the consistency of said database.
One option is to have a node directly communicate with all the other nodes regarding data
rows that it’s changing and data that it has
cached locally. Alternatively, you can go with a
centralized mechanism by which a data server
node posts global lock and global buffer pool
information to structures residing in devices
that provide a shared-memory resource to