DB2 - Problem description

Problem IT18186 Status: Closed

"POSSIBLE MEMORY CORRUPTION DETECTED" CAUSING INSTANCE TO CRASH

product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
The scenario is as follows:

cat node     = 0 (catalog node)
coor node    = 1 (coordinator node)
remote nodes = the other nodes in the system

1) The user issues DROP TABLE on the coor node.
2) The coor node sends an RPC to the cat node to perform the DROP.
3) The cat node sends RPCs to the remote nodes to perform the invalidation (sqlrkrpc_nl).
4) The subagents on the remote nodes wait behind the var lock (held by an INSERT).
5) The user presses Ctrl-C on the coor node to interrupt the DROP TABLE.
6) The interrupt drives a statement savepoint rollback on the coor node (sqlrr_coor_rbsvpt).
7) Because the coor node knows it sent an RPC to the cat node, it asks the cat node to roll back the savepoint as well.
8) The cat node is interrupted in sqlrkrpc_nl while waiting for the remote nodes to reply.
9) The cat node performs a statement savepoint rollback (sqlrr_sub_rrsvpt).
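
The sequence above can be reduced to a toy model of which nodes end up rolled back. The Python sketch below is not DB2 code. The class and function names, the remote node numbers 2-4, and the `interrupt_apm_requests` switch are all hypothetical. The APAR states only that DB2 did not drive a secondary rollback/interrupt for APM-typed RPC requests. Modelling the fix as "also interrupt APM-typed requests" is an assumption.

```python
from dataclasses import dataclass

CAT_NODE = 0   # catalog node, as in the scenario
COOR_NODE = 1  # coordinator node, as in the scenario


@dataclass
class RemoteSubagent:
    node: int
    rpc_kind: str                          # "APM" for the invalidation request
    state: str = "waiting_for_var_lock"    # step 4: blocked behind the INSERT


def fan_out_invalidation(remote_nodes):
    """Step 3: the cat node sends an APM-typed invalidation RPC to each remote node."""
    return [RemoteSubagent(node=n, rpc_kind="APM") for n in remote_nodes]


def rollback_after_interrupt(subagents, interrupt_apm_requests):
    """Steps 5-9 plus secondary interrupt; returns (rolled-back nodes, stranded nodes)."""
    rolled_back = {COOR_NODE, CAT_NODE}    # steps 6-9: savepoint rollbacks
    for sa in subagents:
        # A remote subagent is interrupted only if its request type is covered
        # (hypothetical switch standing in for APM-typed request handling).
        if sa.rpc_kind != "APM" or interrupt_apm_requests:
            sa.state = "interrupted"
            rolled_back.add(sa.node)
    stranded = [sa.node for sa in subagents if sa.state == "waiting_for_var_lock"]
    return rolled_back, stranded


if __name__ == "__main__":
    for flag in (False, True):
        _, stranded = rollback_after_interrupt(fan_out_invalidation([2, 3, 4]), flag)
        print(f"interrupt_apm_requests={flag}: stranded remote nodes {stranded}")
```

With the switch off, every remote subagent is left waiting for the var lock while the coor and cat nodes have already rolled back. This matches the state the APAR describes. The model stops there and does not attempt to reproduce the memory corruption itself.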
 
At this point, both the coor and cat nodes have rolled back because of
the interrupt. The subagents on the remote nodes, however, are left
waiting for the var lock, believing they still need to carry out the
invalidation. These remote subagents were NOT interrupted by the
Ctrl-C because DB2 did not drive a secondary rollback/interrupt for
APM-typed RPC requests. The result is an instance crash, with a
db2diag.log entry like the following:
 
2015-09-11-17.22.28.006301-240 I123409802E2041     LEVEL: Severe
PID     : 21335                TID  : 46923113294144PROC : db2sysc 4
INSTANCE: db2inst1             NODE : 004          DB   : SAMPLE
APPHDL  : 1-33639              APPID: 10.60.83.96.64331.150911202928
AUTHID  : USER1
EDUID   : 164366               EDUNAME: db2agntp (SAMPLE) 4
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : Possible memory corruption detected.
DATA #1 : ZRC, PD_TYPE_ZRC, 4 bytes
0x820F0002
DATA #2 : Corrupt block address, PD_TYPE_CORRUPT_BLK_PTR, 8 bytes
0x00002aab7d1ce1e0
DATA #3 : Block header, PD_TYPE_BLK_HEADER, 24 bytes
0x00002AAB7D1CE1C8 : A7D1 B7AA 0200 B0FA 87E2 3233 3E05 C1FC    ..........23>...
0x00002AAB7D1CE1D8 : A7D1 B7AA 0200 B0FA                        ........
DATA #4 : Data header, PD_TYPE_BLK_DATA_HEAD, 48 bytes
0x00002AAB7D1CE1E0 : 0000 0000 0000 0000 C006 B2A9 AB2A 0000    .............*..
0x00002AAB7D1CE1F0 : 0000 0000 0000 0000 0000 0000 0000 0000    ................
0x00002AAB7D1CE200 : 0000 0000 0000 0000 A7D1 B7AA 0200 B0FA    ................
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00002AAAAB82D9E6 pdLog + 0x398
  [1] 0x00002AAAAD3DC3CA /home/home/db2inst1/sqllib/lib64/libdb2e.so.1 + 0x291F3CA
  [2] 0x00002AAAAE158649 sqlofmblkEx + 0x91B
  [3] 0x00002AAAAB848775 _Z11sqlofmblkExPKcmP13SQLO_MEM_POOLPv + 0x9
  [4] 0x00002AAAAD69871D _Z23sqlra_cache_del_dep_varP8sqlrr_cbPP18sqlra_list_dep_vari + 0x107
  [5] 0x00002AAAAD69ADAD _Z20sqlra_inval_obj_hardP8sqlrr_cbP23sqlra_anchor_dependencyP23sqlra_cached_dependencyjiPtj + 0x665
  [6] 0x00002AAAAD69C08B _Z21sqlra_inval_vars_hardP8sqlrr_cbhPhjS1_sjiPtj + 0x3A9
  [7] 0x00002AAAAD69C676 _Z22sqlra_event_inval_hardP8sqlrr_cbP20sqlr_rpc_apm_request + 0x9C
  [8] 0x00002AAAAD684F8D _Z19sqlra_execute_eventP8sqlrr_cbP20sqlr_rpc_apm_request + 0x31D
  [9] 0x00002AAAAD6B520C _Z16sqlrk_apm_routerP8sqlrr_cbP16sqlkdRqstRplyFmtjPP15SQLR_RPCMESSAGE + 0x216
 
2015-09-11-17.22.28.097169-240 E123411844E977      LEVEL: Critical
PID     : 21335                TID  : 46923113294144PROC : db2sysc 4
INSTANCE: db2inst1             NODE : 004          DB   : SAMPLE
APPHDL  : 1-33639              APPID: 10.60.83.96.64331.150911202928
AUTHID  : USER1
EDUID   : 164366               EDUNAME: db2agntp (SAMPLE) 4
FUNCTION: DB2 UDB, SQO Memory Management, sqloDiagnoseFreeBlockFailure, probe:10
MESSAGE : ADM14001C  An unexpected and critical error has occurred: "Panic".
          The instance may have been shutdown as a result. "Automatic" FODC
          (First Occurrence Data Capture) has been invoked and diagnostic
          information has been recorded in directory
          "/home/tst/apm/dbs/db2inst1/db2dump/FODC_Panic_2015-09-11-17.22.28.051370_0004/".
          Please look in this directory for detailed evidence about what
          happened and contact IBM support if necessary to diagnose the problem.
 
The stack looks like this: 
 
0x00002AAAB045B1EF ossDumpStackTraceEx + 0x01ef
0x00002AAAB0455FAE _ZN11OSSTrapFile6dumpExEmiP7siginfoPvm + 0x00cc
0x00002AAAAD3A09BF sqlo_trce + 0x03fb
0x00002AAAAD3E9B7F sqloEDUCodeTrapHandler + 0x02db
0x00002AAAAD3DB87C sqloCrashOnCriticalMemoryValidationFailure + 0x0020
0x00002AAAAD3E0F83 _ZN13SQLO_MEM_POOL32diagnoseMemoryCorruptionAndCrashEmPKcb + 0x02c3
0x00002AAAAE158649 sqlofmblkEx + 0x091b
0x00002AAAAB848775 _Z11sqlofmblkExPKcmP13SQLO_MEM_POOLPv + 0x0009
0x00002AAAAD69871D _Z23sqlra_cache_del_dep_varP8sqlrr_cbPP18sqlra_list_dep_vari + 0x0107
0x00002AAAAD69ADAD _Z20sqlra_inval_obj_hardP8sqlrr_cbP23sqlra_anchor_dependencyP23sqlra_cached_dependencyjiPtj + 0x0665
0x00002AAAAD69C08B _Z21sqlra_inval_vars_hardP8sqlrr_cbhPhjS1_sjiPtj + 0x03a9
0x00002AAAAD69C676 _Z22sqlra_event_inval_hardP8sqlrr_cbP20sqlr_rpc_apm_request + 0x009c
0x00002AAAAD684F8D _Z19sqlra_execute_eventP8sqlrr_cbP20sqlr_rpc_apm_request + 0x031d
0x00002AAAAD6B520C _Z16sqlrk_apm_routerP8sqlrr_cbP16sqlkdRqstRplyFmtjPP15SQLR_RPCMESSAGE + 0x0216
0x00002AAAAD5C59B8 _Z16sqlrr_rpc_routerP8sqlrr_cb + 0x06d0
0x00002AAAAD5C6E40 _Z21sqlrr_subagent_routerP8sqeAgentP12SQLE_DB2RA_T + 0x0d82
0x00002AAAAC7642EF _Z20sqleSubRequestRouterP8sqeAgentPjS1_ + 0x0679
0x00002AAAAC764D5E _Z21sqleProcessSubRequestP8sqeAgent + 0x00a6
0x00002AAAAB96A8A3 _ZN8sqeAgent6RunEDUEv + 0x0649
0x00002AAAAC068BD0 _ZN9sqzEDUObj9EDUDriverEv + 0x00a6
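
Records like the Severe entry above can be searched for in an existing db2diag.log. The match uses its message text together with the invalidation frame (sqlra_inval_obj_hard) from its call stack. The following Python sketch is a minimal example, not an IBM-documented check, and rests on several assumptions:
- Each record begins with a timestamp line in the format shown above.
- These two strings are a useful signature for this problem.
- The names `split_records` and `matching_timestamps` are hypothetical.

```python
import re

# Assumed record start, modelled on "2015-09-11-17.22.28.006301-240 I123... LEVEL: ..."
RECORD_START = re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+\s")

# Assumed signature: the Severe message plus the invalidation frame in its call stack.
SIGNATURE = ("Possible memory corruption detected", "sqlra_inval_obj_hard")


def split_records(text):
    """Split db2diag.log text into records, each starting at a timestamp line."""
    records, current = [], []
    for line in text.splitlines():
        if RECORD_START.match(line) and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def matching_timestamps(text, signature=SIGNATURE):
    """Return the leading timestamp of every record that contains all signature strings."""
    hits = []
    for rec in split_records(text):
        if all(s in rec for s in signature):
            hits.append(rec.split(None, 1)[0])
    return hits
```

Applied to the two entries shown above, only the Severe record matches. The Critical ADM14001C record carries neither string. Reading the file with `errors="replace"` is a reasonable precaution, because diagnostic logs can contain non-UTF-8 bytes.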
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* ALL                                                          * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* See Error Description                                        * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to Db2 10.5 Fix Pack 9 or higher                     * 
****************************************************************
Local Fix:
Do not interrupt a running DROP TABLE statement with Ctrl-C.
Solution
First fixed in Db2 10.5 Fix Pack 9
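
For the 10.5 stream, the recommendation can be checked mechanically. This Python sketch assumes the installed level is already available as a four-part string such as 10.5.0.9, meaning Db2 10.5 Fix Pack 9. How that string is obtained is left open, and the function names are hypothetical. Levels from other releases are rejected rather than guessed, because this APAR names only the 10.5 fix level.

```python
# Fix level named in this APAR for the 10.5 stream: Db2 10.5 Fix Pack 9.
FIXED_IN_10_5 = (10, 5, 0, 9)


def parse_level(level):
    """Turn a level string such as '10.5.0.9' into a tuple of ints."""
    return tuple(int(part) for part in level.split("."))


def has_it18186_fix_10_5(level):
    """Return True if a 10.5 level is at or above 10.5 Fix Pack 9.

    Raises ValueError for levels outside the 10.5 stream.
    """
    parsed = parse_level(level)
    if parsed[:2] != (10, 5):
        raise ValueError(f"not a 10.5 level: {level}")
    return parsed >= FIXED_IN_10_5
```

Comparing integer tuples rather than raw strings avoids the lexical trap where "10.5.0.10" would sort before "10.5.0.9".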
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 01.12.2016
Date  - problem closed      : 29.09.2017
Date  - last modified       : 16.10.2017
Problem solved at the following versions (IBM BugInfos)
9.0.
Problem solved according to the fixlist(s) of the following version(s)