DB2 - Problem description
| Problem IC71654 | Status: Closed |
TAKEOVER HADR COMMAND HANGS UP ON STANDBY WHEN A TRAP HAS BEEN PREVIOUSLY SUSTAINED IN PRIMARY DATABASE | |
| product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
| Problem description: | |
The hang problem occurs if a takeover is issued on an HADR
Standby when the HADR Primary has previously sustained a trap.
On the HADR Standby: the takeover command will hang, and other
commands such as 'db2stop force' will either hang or not work.
On the HADR Primary: clients will be unable to connect.
If the HADR Primary has previously sustained a trap, you will
be able to see:
1) ADM14012C or ADM14013C messages in the administration
notification log ({instance_name}.nfy)
AND
2) A suspended db2agent in 'db2pd -EDUs' output.
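The two conditions above can be checked mechanically. The following is a minimal sketch, not an IBM-provided tool: it assumes you have already captured the notification log contents and the 'db2pd -EDUs' output as text, and it assumes a suspended agent shows up as the word "suspended" on the same line as the db2agent name (the real column layout may differ). All function names are hypothetical.

```python
import re
from typing import List

# Message IDs named in this APAR as evidence of a sustained trap.
TRAP_MESSAGE_IDS = ("ADM14012C", "ADM14013C")


def notify_log_has_trap(notify_text: str) -> bool:
    """Return True if the notification log text mentions either message ID."""
    return any(msg_id in notify_text for msg_id in TRAP_MESSAGE_IDS)


def suspended_agent_lines(db2pd_edus_text: str) -> List[str]:
    """Return lines that mention a db2agent together with 'suspended'.

    Assumption: the suspended state is visible as the word 'suspended'
    on the same line as the EDU name.
    """
    pattern = re.compile(r"db2agent.*suspended", re.IGNORECASE)
    return [
        line.strip()
        for line in db2pd_edus_text.splitlines()
        if pattern.search(line)
    ]


def primary_sustained_trap(notify_text: str, db2pd_edus_text: str) -> bool:
    """Both conditions must hold, matching the 'AND' in the description."""
    return notify_log_has_trap(notify_text) and bool(
        suspended_agent_lines(db2pd_edus_text)
    )
```

If primary_sustained_trap returns True on the primary, a takeover issued on the standby is at risk of hanging as described in this APAR.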
Even with the fix for APAR IC69960 applied, the takeover
command still hangs under the conditions above.
When this happens, db2diag.log on the primary contains Severe
messages such as ADM14013C, which indicate that db2agents have
been suspended on the primary, as in the following example:
2010-09-27-14.35.38.415495+540 I1781400A564 LEVEL: Severe
PID : 1577038 TID : 11054 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-367 APPID:
10.219.61.1.64526.100927053458
AUTHID : DB2INST1
EDUID : 11054 EDUNAME: db2agent (TESTDB) 0
FUNCTION: DB2 UDB, RAS/PD component,
pdResilienceIsSafeToSustain, probe:800
DATA #1 : String, 37 bytes
Trap Sustainability Criteria Checking
DATA #2 : Hex integer, 8 bytes
0x0000000000021000
DATA #3 : Boolean, 1 bytes
true
...
2010-09-27-14.35.38.625896+540 E1813735A941 LEVEL: Severe
PID : 1577038 TID : 11054 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-367 APPID:
10.219.61.1.64526.100927053458
AUTHID : DB2INST1
EDUID : 11054 EDUNAME: db2agent (TESTDB) 0
(suspended) 0
FUNCTION: DB2 UDB, DRDA Application Server,
sqljsTrapResilience, probe:800
MESSAGE : ADM14013C The following type of critical error occurred: "Trap".
This error occurred because one or more threads that are associated
with the current DB2 instance have been suspended, but the instance
process is still running. First Occurrence Data Capture (FODC) was
invoked in the following mode: "Automatic". FODC diagnostic
information is located in the following directory:
"/var/log/db2/FODC_Trap_2010-09-27-14.35.38.031284/".
For more information on sustained traps, see:
* Enhanced resilience to errors and traps reduces outages
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/c0054512.html | |
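The ADM14013C record above names the FODC directory that holds the trap diagnostics. As a hedged illustration, the sketch below pulls that quoted path out of a record whose message text has been wrapped across lines. The function name is hypothetical, and the sketch assumes the path itself is not split across lines (a wrapped path would pick up a stray space).

```python
import re
from typing import Optional


def fodc_directory(record_text: str) -> Optional[str]:
    """Return the quoted FODC directory from an ADM14013C record, or None."""
    if "ADM14013C" not in record_text:
        return None
    # Collapse line wraps so the sentence before the path matches as one string.
    flat = " ".join(record_text.split())
    match = re.search(r'located in the following directory:\s*"([^"]+)"', flat)
    return match.group(1) if match else None
```

The returned directory is the one to collect when following the sustained-trap recovery documentation referenced in this record.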
| Problem Summary: | |
****************************************************************
* USERS AFFECTED:
* All
****************************************************************
* PROBLEM DESCRIPTION:
* "takeover hadr" command hangs up when a trap has been
* sustained.
****************************************************************
* RECOMMENDATION:
* Upgrade to db2 Version 9.7 FixPak 4
**************************************************************** | |
| Local Fix: | |
If db2_kill is issued on the HADR primary system to break the HADR connection, the hanging takeover hadr command should end with an error. For more information on recovering from sustained traps, see:
* Recovering from sustained traps
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.trb.doc/doc/t0055494.html | |
| available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
| Solution | |
Problem was first fixed in Version 9.7 Fix Pack 4 | |
| Workaround | |
not known / see Local fix | |
| Timestamps | |
Date - problem reported : | 04.10.2010 |
Date - problem closed : | 09.05.2011 |
Date - last modified : | 09.05.2011 |
| Problem solved at the following versions (IBM BugInfos) | |
9.7. | |
| Problem solved according to the fixlist(s) of the following version(s) | |
| 9.7.0.4 |
|