DB2 - Problem description
Problem IC71427 | Status: Closed |
ON LARGE DPF SYSTEMS WITH MANY NODES, DB2STOP CAN TAKE A LONG TIME TO COMPLETE (TOO LONG IN NODE RECOVERY) | |
Product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
Problem description: | |
In a DPF environment with many nodes (for example, more than 90), db2stop can hit the default START_STOP_TIME timeout of 10 minutes. When that happens, DB2 issues a kill to all nodes under the covers.

Symptoms:

1.) db2diag.log contains entries like the following:

2010-08-19-03.20.45.041184-300 I109662A299        LEVEL: Event
PID     : 5910764              TID  : 1           PROC : db2stop2
INSTANCE: XXXXXX               NODE : 000
EDUID   : 1
FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:240
DATA #1 : String, 26 bytes
Stop phase is in progress.

2010-08-19-03.20.45.041799-300 I109962A314        LEVEL: Event
PID     : 5910764              TID  : 1           PROC : db2stop2
INSTANCE: XXXXXX               NODE : 000
EDUID   : 1
FUNCTION: DB2 UDB, base sys utilities, DB2StopMain, probe:250
DATA #1 : String, 41 bytes
Requesting system controller termination.

followed by many occurrences of these messages:

2010-08-19-03.21.20.554971-300 I115791A390        LEVEL: Error
PID     : 389258               TID  : 772         PROC : db2sysc 0
INSTANCE: XXXXXX               NODE : 000
EDUID   : 772                  EDUNAME: db2fcms 0
FUNCTION: DB2 UDB, fast comm manager, sqkfSendConduit::ValidateConnectedLinks, probe:100
RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED
          "FCM connection closed"

2010-08-19-03.21.39.244694-300 I116936A362        LEVEL: Error
PID     : 389258               TID  : 772         PROC : db2sysc 0
INSTANCE: XXXXXX               NODE : 000
EDUID   : 772                  EDUNAME: db2fcms 0
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, probe:25
MESSAGE : Link info: node 14; type 4; state 5; session 0;activated 1

2010-08-19-03.21.39.244867-300 I117299A390        LEVEL: Error
PID     : 389258               TID  : 772         PROC : db2sysc 0
INSTANCE: XXXXXX               NODE : 000
EDUID   : 772                  EDUNAME: db2fcms 0
FUNCTION: DB2 UDB, fast comm manager, sqkfSendConduit::ValidateConnectedLinks, probe:100
RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED
          "FCM connection closed"

2010-08-19-03.21.50.657934-300 I118444A362        LEVEL: Error
PID     : 389258               TID  : 772         PROC : db2sysc 0
INSTANCE: XXXXXX               NODE : 000
EDUID   : 772                  EDUNAME: db2fcms 0
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, probe:25
MESSAGE : Link info: node 15; type 4; state 5; session 0;activated 1

......

2.) The stack trace shows the following pattern:

<StackTrace>
-------Frame------ ------Function + Offset------
0x09000000001174D4 __fd_poll + 0x98
0x09000000000A9AE0 poll + 0xC
0x09000000000A8968 res_nsend + 0xDA4
0x0900000000100D0C res_nquery + 0x130
0x0900000000100370 res_nquerydomain + 0x180
0x090000000010063C res_nsearch + 0x228
0x09000000000B3D1C res_search + 0xA8
0x0900000000106088 ho_byname2 + 0x13C
0x09000000001210E0 ho_byname2 + 0x1AC
0x09000000000A6550 gethostbyname2 + 0x190
0x09000000000A9E98 getaddrinfo2 + 0x384
0x09000000000AB2C4 getaddrinfo + 0x36C
0x0900000009515220 sqloPdbTcpIpGetAddrInfo + 0x13C
0x090000000C2DACE0 sqloPdbTcpIpResolveHostName + 0x1C4
0x090000000C2DB030 sqloPdbTcpIpResolveHostName@glue557 + 0x7C
0x09000000086A7324 sqloReadDb2nodes + 0x8C4
0x090000000836A978 RefreshDb2nodesCache__19sqkfFastCommManagerFv + 0x210
0x090000000834FF48 RefreshNodesInfo__15sqkfSendConduitFP14sqkfDataTargetiPb + 0x74
0x090000000928A560 CheckForFailoverConnectRetry__15sqkfSendConduitFsPi + 0x328
0x090000000927F640 HandleConnectLostEvent__15sqkfSendConduitFUl + 0x160
0x090000000927D6D0 RunEDU__15sqkfSendConduitFv + 0x264
0x0900000008A46E2C EDUDriver__9sqzEDUObjFv + 0xF8
0x0900000008A4C778 sqloEDUEntry + 0x278
</StackTrace>
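Symptom 1 can be checked mechanically rather than by eye. The following is a minimal sketch, not an IBM tool: it assumes the db2diag.log content has already been read into a string, and the names `split_records` and `summarize_fcm_errors` are illustrative. It counts SQLKF_CONN_CLOSED records and tallies the node numbers reported by sqkfTcpLink::closeConn. The sample text is taken from the records quoted above.

```python
import re
from collections import Counter

# Timestamp that opens each db2diag.log record,
# e.g. 2010-08-19-03.21.20.554971-300
RECORD_START = re.compile(
    r"\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3}"
)
# Node number inside a "Link info" message
LINK_NODE = re.compile(r"Link info: node (\d+);")


def split_records(text):
    """Split raw log text into records, one per leading timestamp."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]


def summarize_fcm_errors(text):
    """Return (count of SQLKF_CONN_CLOSED records, Counter of closed-link nodes)."""
    conn_closed = 0
    closed_nodes = Counter()
    for record in split_records(text):
        if "SQLKF_CONN_CLOSED" in record:
            conn_closed += 1
        if "sqkfTcpLink::closeConn" in record:
            match = LINK_NODE.search(record)
            if match:
                closed_nodes[int(match.group(1))] += 1
    return conn_closed, closed_nodes


# Abbreviated copies of the records quoted in this APAR
SAMPLE = """\
2010-08-19-03.21.20.554971-300 I115791A390 LEVEL: Error
FUNCTION: DB2 UDB, fast comm manager, sqkfSendConduit::ValidateConnectedLinks, probe:100
RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED "FCM connection closed"
2010-08-19-03.21.39.244694-300 I116936A362 LEVEL: Error
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, probe:25
MESSAGE : Link info: node 14; type 4; state 5; session 0;activated 1
2010-08-19-03.21.39.244867-300 I117299A390 LEVEL: Error
FUNCTION: DB2 UDB, fast comm manager, sqkfSendConduit::ValidateConnectedLinks, probe:100
RETCODE : ZRC=0x8159006B=-2124873621=SQLKF_CONN_CLOSED "FCM connection closed"
2010-08-19-03.21.50.657934-300 I118444A362 LEVEL: Error
FUNCTION: DB2 UDB, fast comm manager, sqkfTcpLink::closeConn, probe:25
MESSAGE : Link info: node 15; type 4; state 5; session 0;activated 1
"""

if __name__ == "__main__":
    closed, nodes = summarize_fcm_errors(SAMPLE)
    print("SQLKF_CONN_CLOSED records:", closed)
    print("closeConn nodes:", sorted(nodes))
```

On a real system, the same function could be applied to the text of the instance's db2diag.log for the time window between the "Stop phase is in progress" event and the end of db2stop.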
Problem summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* Users of large DPF environments                              *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* If there are many nodes, e.g. more than 90 nodes, in a DPF   *
* environment, then it is possible for db2stop to hit the      *
* default START_STOP_TIME timeout of 10 minutes, which would   *
* cause DB2 to issue a kill underneath to all nodes.           *
****************************************************************
* RECOMMENDATION:                                              *
* Update to Version 9.7 Fix Pack 4 or higher                   *
****************************************************************
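Read from the bottom up, the stack trace in the problem description shows an FCM sender EDU reaching getaddrinfo through sqloReadDb2nodes and then waiting in the resolver (res_nsend, poll). The APAR does not state a root cause, so the following is only a diagnostic sketch under that reading: it times a name lookup for each host listed in db2nodes.cfg. It assumes the Linux/UNIX file layout (partition number, hostname, logical port, optional netname); `parse_db2nodes` and `time_lookups` are hypothetical helper names, not DB2 interfaces.

```python
import socket
import time


def parse_db2nodes(text):
    """Return the unique hostnames (second field) in file order.

    Assumes Linux/UNIX db2nodes.cfg lines of the form:
        <partition-number> <hostname> <logical-port> [<netname>]
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1] not in hosts:
            hosts.append(fields[1])
    return hosts


def time_lookups(hosts, resolver=socket.getaddrinfo):
    """Resolve each host once; return (host, resolved_ok, seconds) tuples."""
    results = []
    for host in hosts:
        start = time.monotonic()
        try:
            resolver(host, None)
            ok = True
        except OSError:  # socket.gaierror is a subclass of OSError
            ok = False
        results.append((host, ok, time.monotonic() - start))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python check_lookups.py /path/to/db2nodes.cfg
    with open(sys.argv[1]) as cfg:
        for host, ok, seconds in time_lookups(parse_db2nodes(cfg.read())):
            status = "ok" if ok else "FAILED"
            print(f"{host}: {status} in {seconds:.3f}s")
```

The file normally lives in the sqllib directory under the instance owner's home. Lookups that take seconds rather than milliseconds would be worth investigating before a planned db2stop, although installing Version 9.7 Fix Pack 4 or higher remains the documented fix.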
Local-Fix: | |
Available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
Solution | |
Problem was first fixed in Version 9.7 Fix Pack 4 | |
Workaround | |
None known / see Local-Fix | |
Bug tracking | |
Predecessor : APAR is sysrouted TO one or more of the following: IC71429 IC71431 Successor : | |
Further data | |
Date problem reported (DD.MM.YYYY): | 23.09.2010 |
Date problem closed (DD.MM.YYYY): | 02.05.2011 |
Date of last change (DD.MM.YYYY): | 02.05.2011 |
Problem fixed as of the following versions (IBM BugInfos) | |
9.7.FP4 | |
Problem fixed according to the FixList in version | |
9.7.0.4 |