DB2 - Problem description
Problem IT18186 | Status: Closed
"POSSIBLE MEMORY CORRUPTION DETECTED" CAUSING INSTANCE TO CRASH
Product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
The scenario is as follows (cat node = 0, coor node = 1, remote nodes = all other nodes in the system):
1) The user issues DROP TABLE on the coor node.
2) The coor node sends an RPC to the cat node to perform the DROP.
3) The cat node sends an RPC to the remote nodes to perform invalidation (sqlrkrpc_nl).
4) The subagents at the remote nodes wait behind the var lock, which is held by an INSERT.
5) The user presses Ctrl-C on the coor node to interrupt the DROP TABLE.
6) The interrupt drives a statement savepoint rollback at the coor node (sqlrr_coor_rbsvpt).
7) Because the coor node knows it sent an RPC to the cat node, it asks the cat node to perform a savepoint rollback.
8) The cat node is interrupted in sqlrkrpc_nl while waiting for the remote nodes to reply.
9) The cat node performs a statement savepoint rollback (sqlrr_sub_rrsvpt).

At this point, both the coor and cat nodes have rolled back because of the interrupt. However, the subagents at the remote nodes are left waiting for the var lock, believing they still need to carry out the invalidation. These remote subagents were NOT interrupted as part of the Ctrl-C because DB2 did not drive secondary rollback/interrupt for APM-typed RPC requests. This results in an instance crash.

You will see db2diag.log entries like the following:

2015-09-11-17.22.28.006301-240 I123409802E2041     LEVEL: Severe
PID     : 21335                TID : 46923113294144  PROC : db2sysc 4
INSTANCE: db2inst1             NODE : 004            DB   : SAMPLE
APPHDL  : 1-33639              APPID: 10.60.83.96.64331.150911202928
AUTHID  : USER1
EDUID   : 164366               EDUNAME: db2agntp (SAMPLE) 4
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : Possible memory corruption detected.
DATA #1 : ZRC, PD_TYPE_ZRC, 4 bytes
0x820F0002
DATA #2 : Corrupt block address, PD_TYPE_CORRUPT_BLK_PTR, 8 bytes
0x00002aab7d1ce1e0
DATA #3 : Block header, PD_TYPE_BLK_HEADER, 24 bytes
0x00002AAB7D1CE1C8 : A7D1 B7AA 0200 B0FA 87E2 3233 3E05 C1FC    ..........23>...
0x00002AAB7D1CE1D8 : A7D1 B7AA 0200 B0FA                        ........
DATA #4 : Data header, PD_TYPE_BLK_DATA_HEAD, 48 bytes
0x00002AAB7D1CE1E0 : 0000 0000 0000 0000 C006 B2A9 AB2A 0000    .............*..
0x00002AAB7D1CE1F0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x00002AAB7D1CE200 : 0000 0000 0000 0000 A7D1 B7AA 0200 B0FA    ................
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00002AAAAB82D9E6 pdLog + 0x398
  [1] 0x00002AAAAD3DC3CA /home/home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x291F3CA
  [2] 0x00002AAAAE158649 sqlofmblkEx + 0x91B
  [3] 0x00002AAAAB848775 _Z11sqlofmblkExPKcmP13SQLO_MEM_POOLPv + 0x9
  [4] 0x00002AAAAD69871D _Z23sqlra_cache_del_dep_varP8sqlrr_cbPP18sqlra_list_dep_vari + 0x107
  [5] 0x00002AAAAD69ADAD _Z20sqlra_inval_obj_hardP8sqlrr_cbP23sqlra_anchor_dependencyP23sqlra_cached_dependencyjiPtj + 0x665
  [6] 0x00002AAAAD69C08B _Z21sqlra_inval_vars_hardP8sqlrr_cbhPhjS1_sjiPtj + 0x3A9
  [7] 0x00002AAAAD69C676 _Z22sqlra_event_inval_hardP8sqlrr_cbP20sqlr_rpc_apm_request + 0x9C
  [8] 0x00002AAAAD684F8D _Z19sqlra_execute_eventP8sqlrr_cbP20sqlr_rpc_apm_request + 0x31D
  [9] 0x00002AAAAD6B520C _Z16sqlrk_apm_routerP8sqlrr_cbP16sqlkdRqstRplyFmtjPP15SQLR_RPCMESSAGE + 0x216

2015-09-11-17.22.28.097169-240 E123411844E977      LEVEL: Critical
PID     : 21335                TID : 46923113294144  PROC : db2sysc 4
INSTANCE: db2inst1             NODE : 004            DB   : SAMPLE
APPHDL  : 1-33639              APPID: 10.60.83.96.64331.150911202928
AUTHID  : USER1
EDUID   : 164366               EDUNAME: db2agntp (SAMPLE) 4
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred: "Panic". The instance may have been shutdown as a result. "Automatic" FODC (First Occurrence Data Capture) has been invoked and diagnostic information has been recorded in directory "/home/tst/apm/dbs/db2inst1/db2dump/FODC_Panic_2015-09-11-17.22.28.051370_0004/".
          Please look in this directory for detailed evidence about what happened and contact IBM support if necessary to diagnose the problem.

The stack looks like this:
0x00002AAAB045B1EF ossDumpStackTraceEx + 0x01ef
0x00002AAAB0455FAE _ZN11OSSTrapFile6dumpExEmiP7siginfoPvm + 0x00cc
0x00002AAAAD3A09BF sqlo_trce + 0x03fb
0x00002AAAAD3E9B7F sqloEDUCodeTrapHandler + 0x02db
0x00002AAAAD3DB87C sqloCrashOnCriticalMemoryValidationFailure + 0x0020
0x00002AAAAD3E0F83 _ZN13SQLO_MEM_POOL32diagnoseMemoryCorruptionAndCrashEmPKcb + 0x02c3
0x00002AAAAE158649 sqlofmblkEx + 0x091b
0x00002AAAAB848775 _Z11sqlofmblkExPKcmP13SQLO_MEM_POOLPv + 0x0009
0x00002AAAAD69871D _Z23sqlra_cache_del_dep_varP8sqlrr_cbPP18sqlra_list_dep_vari + 0x0107
0x00002AAAAD69ADAD _Z20sqlra_inval_obj_hardP8sqlrr_cbP23sqlra_anchor_dependencyP23sqlra_cached_dependencyjiPtj + 0x0665
0x00002AAAAD69C08B _Z21sqlra_inval_vars_hardP8sqlrr_cbhPhjS1_sjiPtj + 0x03a9
0x00002AAAAD69C676 _Z22sqlra_event_inval_hardP8sqlrr_cbP20sqlr_rpc_apm_request + 0x009c
0x00002AAAAD684F8D _Z19sqlra_execute_eventP8sqlrr_cbP20sqlr_rpc_apm_request + 0x031d
0x00002AAAAD6B520C _Z16sqlrk_apm_routerP8sqlrr_cbP16sqlkdRqstRplyFmtjPP15SQLR_RPCMESSAGE + 0x0216
0x00002AAAAD5C59B8 _Z16sqlrr_rpc_routerP8sqlrr_cb + 0x06d0
0x00002AAAAD5C6E40 _Z21sqlrr_subagent_routerP8sqeAgentP12SQLE_DB2RA_T + 0x0d82
0x00002AAAAC7642EF _Z20sqleSubRequestRouterP8sqeAgentPjS1_ + 0x0679
0x00002AAAAC764D5E _Z21sqleProcessSubRequestP8sqeAgent + 0x00a6
0x00002AAAAB96A8A3 _ZN8sqeAgent6RunEDUEv + 0x0649
0x00002AAAAC068BD0 _ZN9sqzEDUObj9EDUDriverEv + 0x00a6
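The interrupt scenario in the problem description can be replayed as a small state model, which makes it easy to see which participants are still waiting after the Ctrl-C. This is a minimal Python sketch, not DB2 code: the Participant and RpcType types, the state strings, and the interrupt_apm_rpcs switch are hypothetical names introduced for illustration. Only the node roles, the order of steps 1-9, and the rule that secondary interrupt/rollback was not driven for APM-typed requests come from this APAR; treating interrupt_apm_rpcs=True as the fixed behaviour is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class RpcType(Enum):
    REGULAR = "regular"
    APM = "apm"  # the invalidation RPC in this APAR is APM-typed


@dataclass
class Participant:
    # Hypothetical model of one agent taking part in the DROP TABLE.
    role: str   # "coor", "cat" or "remote"
    node: int
    state: str
    rpc_type: RpcType = RpcType.REGULAR


def drop_table_then_ctrl_c(remote_nodes, interrupt_apm_rpcs):
    """Replay steps 1-9 of the scenario and return every participant."""
    # Steps 1-3: DROP TABLE on coor node 1, RPC to cat node 0, which fans out
    # the invalidation RPC (sqlrkrpc_nl) to the remote nodes.
    coor = Participant("coor", 1, "waiting_on_cat")
    cat = Participant("cat", 0, "waiting_on_remotes")
    # Step 4: each remote subagent blocks behind the var lock held by an INSERT.
    remotes = [Participant("remote", n, "waiting_on_var_lock", RpcType.APM)
               for n in remote_nodes]

    # Steps 5-7: Ctrl-C drives statement savepoint rollback on the coor node
    # (sqlrr_coor_rbsvpt), which then asks the cat node to roll back too.
    coor.state = "rolled_back"
    # Steps 8-9: the cat node is interrupted in sqlrkrpc_nl and rolls back.
    cat.state = "rolled_back"

    # Secondary interrupt/rollback for the remote subagents. Before the fix
    # it was not driven for APM-typed requests (modelled by the flag).
    for sa in remotes:
        if sa.rpc_type is not RpcType.APM or interrupt_apm_rpcs:
            sa.state = "rolled_back"
    return [coor, cat] + remotes


def orphaned(participants):
    """Nodes whose subagent still waits to invalidate after the rollback."""
    return [p.node for p in participants if p.state == "waiting_on_var_lock"]


if __name__ == "__main__":
    print("before fix:", orphaned(drop_table_then_ctrl_c([2, 3, 4], False)))
    print("modelled fix:", orphaned(drop_table_then_ctrl_c([2, 3, 4], True)))
```

Running it prints `before fix: [2, 3, 4]` and `modelled fix: []`: without the secondary interrupt only the remote subagents remain queued behind the var lock, which is consistent with the crash being reported by a db2agntp subagent on NODE 004 rather than on the coor or cat node.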
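To check whether a db2diag.log contains this signature, a script can split the file into records and look for the Severe "Possible memory corruption detected" entry that has sqlra_event_inval_hard on its call stack. The sketch below is a heuristic written against the sample entries in this APAR, not an IBM tool; the function names and the two matching strings are assumptions, and a match only suggests IT18186, it does not confirm it.

```python
import re
import sys

# A db2diag.log record starts with a timestamp, a record id and LEVEL, e.g.
# "2015-09-11-17.22.28.006301-240 I123409802E2041 LEVEL: Severe"
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+)\s+\S+\s+LEVEL:\s*(\w+)",
    re.MULTILINE,
)
NODE_FIELD = re.compile(r"NODE\s*:\s*(\d+)")

# Assumed signature, copied from the sample entries in this APAR.
SIGNATURE_MESSAGE = "Possible memory corruption detected"
SIGNATURE_FRAME = "sqlra_event_inval_hard"


def split_records(text):
    """Yield (timestamp, level, record_text) for each record in the log text."""
    starts = list(RECORD_START.finditer(text))
    for i, m in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        yield m.group(1), m.group(2), text[m.start():end]


def find_it18186_candidates(text):
    """Return (timestamp, level, node) for records matching the signature."""
    hits = []
    for ts, level, record in split_records(text):
        if SIGNATURE_MESSAGE in record and SIGNATURE_FRAME in record:
            node = NODE_FIELD.search(record)
            hits.append((ts, level, node.group(1) if node else None))
    return hits


# Abbreviated copy of the two entries shown above.
SAMPLE = """\
2015-09-11-17.22.28.006301-240 I123409802E2041     LEVEL: Severe
PID     : 21335                TID : 46923113294144  PROC : db2sysc 4
INSTANCE: db2inst1             NODE : 004            DB   : SAMPLE
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : Possible memory corruption detected.
CALLSTCK:
  [7] 0x00002AAAAD69C676 _Z22sqlra_event_inval_hardP8sqlrr_cbP20sqlr_rpc_apm_request + 0x9C

2015-09-11-17.22.28.097169-240 E123411844E977      LEVEL: Critical
INSTANCE: db2inst1             NODE : 004            DB   : SAMPLE
MESSAGE : ADM14001C An unexpected and critical error has occurred: "Panic".
"""

if __name__ == "__main__":
    # Scan a log file given on the command line, or fall back to the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            log_text = f.read()
    else:
        log_text = SAMPLE
    for hit in find_it18186_candidates(log_text):
        print(hit)
```

Only the Severe record is reported for the sample, because the Critical ADM14001C record that follows it carries the "Panic" message rather than the memory-corruption message; the FODC_Panic directory named in that record is where the full diagnostics should be looked for.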
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 10.5 Fix Pack 9 or higher                     *
****************************************************************
Local Fix:
Do not interrupt DROP TABLE with Ctrl-C.
Solution | |
First fixed in Db2 10.5 Fix Pack 9 | |
Workaround | |
Not known; see Local Fix.
Timestamps | |
Date - problem reported : 01.12.2016
Date - problem closed   : 29.09.2017
Date - last modified    : 16.10.2017
Problem solved at the following versions (IBM BugInfos)
9.0.
Problem solved according to the fixlist(s) of the following version(s)