instances of DB2, and the databases under
each instance’s control. If the transaction load
was heavy at the time the partial site failure
occurred, it can take several minutes to restart
the instances and the databases affected. With
HADR, a failover occurs and the standby database takes over, replacing the failed primary
database in seconds. Furthermore, you can
automatically redirect any clients that were
using the primary database to the standby
database either by using Automatic Client
Reroute or by adding retry logic in each application that interacts with the database.
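The application-side retry option can be sketched in a few lines. The example below is a minimal illustration, not DB2 driver code: `connect` stands in for whatever connection call your application uses, and the server names and the `ConnectionFailed` exception are hypothetical.

```python
import time


class ConnectionFailed(Exception):
    """Hypothetical error raised when a database server cannot be reached."""


def connect_with_retry(connect, servers, attempts_per_server=3, delay_seconds=0.0):
    """Try each server in order, retrying a few times before moving on.

    `connect` is a hypothetical callable that takes a server name and
    either returns a connection object or raises ConnectionFailed.
    Returns a (server, connection) tuple for the first server that answers.
    """
    last_error = None
    for server in servers:
        for _ in range(attempts_per_server):
            try:
                return server, connect(server)
            except ConnectionFailed as exc:
                last_error = exc
                time.sleep(delay_seconds)  # pause before the next attempt
    raise ConnectionFailed(f"no server reachable: {last_error}")


def fake_connect(server):
    """Stand-in for a real driver call: the primary is down, the standby is up."""
    if server == "primary.example.com":
        raise ConnectionFailed(server)
    return f"connection to {server}"


if __name__ == "__main__":
    server, conn = connect_with_retry(
        fake_connect, ["primary.example.com", "standby.example.com"]
    )
    print(server, conn)
```

Listing the primary first and the standby second means the application reaches the standby only after the primary has refused every attempt. A real application would use a nonzero delay and would also need to rerun any transaction that was in flight when the connection dropped.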
Because HADR uses TCP/IP to communicate between the primary and the standby
database, each database can reside in a
different location. For example, the primary
database might be located at one data center
in one city, while the standby database might
be located at another data center in another
city. If a complete site failure occurs at the
primary site, data availability is maintained by
having the remote standby database take over
as the primary database.
After the emergency is over and the failed
original primary server/database has been
repaired, it can rejoin the HADR pair as the
standby database—provided both copies of the
database can be made consistent. And once the
original primary database is reintegrated into
the HADR pair as the standby database, you
can switch the roles so that the original primary
database once again functions as the primary
database. This is known as a failback operation.
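The sequence of failover, reintegration, and failback can be pictured as a small state model. The sketch below only tracks which site holds which role. It issues no DB2 commands, and the class, method, and site names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HadrPair:
    """Toy model of which site currently holds each HADR role."""

    primary: str
    standby: Optional[str]  # None while the failed server is offline

    def failover(self):
        """The standby takes over; the pair runs without a standby for now."""
        if self.standby is None:
            raise RuntimeError("no standby available to take over")
        self.primary, self.standby = self.standby, None

    def reintegrate(self, repaired_site):
        """The repaired original primary rejoins the pair as the standby."""
        if self.standby is not None:
            raise RuntimeError("the pair already has a standby")
        self.standby = repaired_site

    def switch_roles(self):
        """Swap primary and standby; after a failover this is the failback."""
        if self.standby is None:
            raise RuntimeError("cannot switch roles without a standby")
        self.primary, self.standby = self.standby, self.primary


if __name__ == "__main__":
    pair = HadrPair(primary="site_a", standby="site_b")
    pair.failover()             # site_b is now the primary
    pair.reintegrate("site_a")  # site_a returns as the standby
    pair.switch_roles()         # failback: site_a is the primary again
    print(pair)
```

The model leaves out the consistency check that the text mentions: in practice, reintegration succeeds only if both copies of the database can be made consistent.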
In a perfect world, mission-critical databases would never be subjected to issues like hardware, network, or software malfunctions, nor would they be exposed to natural disasters like fire and flood. But in reality, any database can be impacted by one of these events
and the consequences can vary—downtime
for recovery, loss of critical data, or the need
to completely rebuild the entire database
infrastructure. If your database environment consists of single-partition DB2 databases, you can minimize the consequences
of such events by taking advantage of an
IBM DB2 for Linux, UNIX, and Windows
feature known as High Availability Disaster
Recovery (HADR).
A step-by-step guide to minimizing the impact
of failures on your database environment
Reducing
Downtime with
HADR
Roger E. Sanders
(roger_e_sanders@yahoo.com)
is a consultant corporate systems
engineer at EMC Corporation.
He is the author of 18 books on
DB2 for Linux, UNIX, and
Windows and teaches classes
at many DB2 conferences. He is
currently working on a new
book that outlines how to write
technical magazine articles and
books and get them published.
Special thanks to Dale McInnis,
senior technical staff member
(STSM) and DB2 availability
architect at the IBM Toronto
Lab, for reviewing the material
presented in this article.
What is HADR?
HADR is a DB2 database replication feature
that is designed to minimize the impact on
a database system when a partial site failure
(a hardware, network, or software malfunction) or a
complete site failure (fire, flood, and so forth)
occurs. HADR protects against data loss by
replicating data changes from a source database, called the primary, to a target database,
called the standby. Synchronization with the
standby database occurs by rolling forward
transaction log records that were generated
for the primary database and have been
shipped to the standby database (see sidebar,
“What gets replicated?”).
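The idea of keeping a standby in sync by replaying shipped log records can be illustrated with a toy model. This is a conceptual sketch only: the `Database` class, its fields, and the `(sequence number, key, value)` record layout are assumptions made for illustration and do not reflect DB2's actual log format or shipping protocol.

```python
class Database:
    """Toy database that records every change in an append-only log."""

    def __init__(self):
        self.rows = {}        # current data
        self.log = []         # list of (sequence_number, key, value) records
        self.applied_seq = 0  # highest log record applied so far

    def write(self, key, value):
        """Apply a change locally and generate a log record for it."""
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        self.rows[key] = value
        self.applied_seq = seq
        return seq


def ship_and_replay(primary, standby):
    """Send the log records the standby has not seen and roll them forward."""
    shipped = 0
    for seq, key, value in primary.log:
        if seq > standby.applied_seq:
            standby.log.append((seq, key, value))
            standby.rows[key] = value  # replay the change on the standby
            standby.applied_seq = seq
            shipped += 1
    return shipped


if __name__ == "__main__":
    primary, standby = Database(), Database()
    primary.write("account_1", 100)
    primary.write("account_2", 250)
    ship_and_replay(primary, standby)
    primary.write("account_1", 75)
    ship_and_replay(primary, standby)
    print(standby.rows == primary.rows)
```

Because the standby tracks the last record it applied, each call ships only the new records, and calling it again with nothing new changes nothing.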
Without HADR, a partial site failure
requires restarting a server, one or more