DB2 - Problem description
| Problem IC71654 | Status: Closed |
TAKEOVER HADR COMMAND HANGS UP ON STANDBY WHEN A TRAP HAS BEEN PREVIOUSLY SUSTAINED IN PRIMARY DATABASE | |
| Product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
| Problem description: | |
The hang problem occurs if a takeover is issued on an HADR
Standby when the HADR Primary has previously sustained a trap.
On the HADR Standby: the takeover command will hang, and other
commands such as 'db2stop force' will either hang or not work.
On the HADR Primary: clients will be unable to connect.
If the HADR Primary has previously sustained a trap, you will
be able to see:
1) ADM14012C or ADM14013C messages in the administration
notification log ({instance_name}.nfy)
AND
2) A suspended db2agent in 'db2pd -EDUs' output.
Even after the fix for APAR IC69960 has been applied, the takeover
command still hangs under the conditions above.
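The two conditions above can be checked mechanically. The following is a minimal sketch, not an IBM tool: it assumes the notification log and the 'db2pd -EDUs' output have already been read into strings, and it assumes a suspended agent is labeled with the text "(suspended)", as it is in the db2diag.log EDUNAME field quoted in this report. All function names are hypothetical.

```python
# Message IDs named in this APAR as evidence of a sustained trap.
TRAP_MESSAGE_IDS = ("ADM14012C", "ADM14013C")


def notify_log_has_sustained_trap(notify_log_text: str) -> bool:
    """Return True if the notification log text mentions either message ID."""
    return any(msg_id in notify_log_text for msg_id in TRAP_MESSAGE_IDS)


def has_suspended_agent(db2pd_edus_text: str) -> bool:
    """Return True if any line names a db2agent and carries '(suspended)'.

    Assumption: suspended EDUs are labeled '(suspended)'; verify this
    against real 'db2pd -EDUs' output before relying on it.
    """
    return any(
        "db2agent" in line and "(suspended)" in line
        for line in db2pd_edus_text.splitlines()
    )


def primary_matches_apar_conditions(notify_log_text: str, db2pd_edus_text: str) -> bool:
    """Both conditions must hold, because the APAR joins them with AND."""
    return (
        notify_log_has_sustained_trap(notify_log_text)
        and has_suspended_agent(db2pd_edus_text)
    )
```

If both checks return true on the primary, a takeover issued on the standby is likely to hit the hang described here.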
When the takeover fails under these conditions, db2diag.log on the
primary contains Severe messages such as ADM14013C, which indicate
that db2agents were suspended on the primary, as shown below.
2010-09-27-14.35.38.415495+540 I1781400A564 LEVEL: Severe
PID : 1577038 TID : 11054 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-367 APPID: 10.219.61.1.64526.100927053458
AUTHID : DB2INST1
EDUID : 11054 EDUNAME: db2agent (TESTDB) 0
FUNCTION: DB2 UDB, RAS/PD component, pdResilienceIsSafeToSustain, probe:800
DATA #1 : String, 37 bytes
Trap Sustainability Criteria Checking
DATA #2 : Hex integer, 8 bytes
0x0000000000021000
DATA #3 : Boolean, 1 bytes
true
...
2010-09-27-14.35.38.625896+540 E1813735A941 LEVEL: Severe
PID : 1577038 TID : 11054 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-367 APPID: 10.219.61.1.64526.100927053458
AUTHID : DB2INST1
EDUID : 11054 EDUNAME: db2agent (TESTDB) 0
(suspended) 0
FUNCTION: DB2 UDB, DRDA Application Server, sqljsTrapResilience, probe:800
MESSAGE : ADM14013C The following type of critical error occurred: "Trap".
This error occurred because one or more threads that are associated
with the current DB2 instance have been suspended, but the instance
process is still running. First Occurrence Data Capture (FODC) was
invoked in the following mode: "Automatic". FODC diagnostic
information is located in the following directory:
"/var/log/db2/FODC_Trap_2010-09-27-14.35.38.031284/".
For more information on sustained traps, see:
* Enhanced resilience to errors and traps reduces outages
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/c0054512.html | |
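When triaging a record like the one above, the two most useful fields are the ADM message ID and the FODC directory. The sketch below pulls both out of a db2diag.log record held in a string. It is an illustration only: the function names and regular expressions are assumptions made for this example, and the directory pattern assumes a quoted absolute UNIX path containing "FODC_Trap_", as in the excerpt.

```python
import re
from typing import List, Optional

# Assumption: the FODC directory is a double-quoted absolute UNIX path
# that contains "FODC_Trap_", as in the excerpt in this report.
FODC_DIR_PATTERN = re.compile(r'"(/[^"]*FODC_Trap_[^"]*)"')

# Assumption: ADM message IDs have the form ADM + 5 digits + one letter.
ADM_ID_PATTERN = re.compile(r"\bADM\d{5}[A-Z]\b")


def extract_fodc_directory(record_text: str) -> Optional[str]:
    """Return the first quoted FODC trap directory, or None if absent."""
    match = FODC_DIR_PATTERN.search(record_text)
    return match.group(1) if match else None


def extract_adm_ids(record_text: str) -> List[str]:
    """Return the distinct ADM message IDs found in the record, sorted."""
    return sorted(set(ADM_ID_PATTERN.findall(record_text)))
```

Because both patterns search the whole record text, they tolerate the line wrapping seen in the excerpt, as long as the quoted path itself is not split across lines.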
| Problem summary: | |
****************************************************************
* USERS AFFECTED:
* All
****************************************************************
* PROBLEM DESCRIPTION:
* "takeover hadr" command hangs up when a trap has been
* sustained.
****************************************************************
* RECOMMENDATION:
* Upgrade to DB2 Version 9.7 FixPak 4
**************************************************************** | |
| Local fix: | |
If db2_kill is issued on the HADR primary system to break the HADR connection, the takeover hadr command should end with errors.
For more information on recovering from sustained traps, see:
* Recovering from sustained traps
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.trb.doc/doc/t0055494.html | |
| Available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
| Solution | |
Problem was first fixed in Version 9.7 Fix Pack 4 | |
| Workaround | |
None known / see Local fix | |
| Additional data | |
Date problem reported: | 2010-10-04 |
Date problem closed: | 2011-05-09 |
Date of last change: | 2011-05-09 |
| Problem fixed as of the following versions (IBM BugInfos) | |
9.7. | |
| Problem fixed according to the FixList in version | |
| 9.7.0.4 |
|