DB2 - Problem description
| Problem IC92924 | Status: Closed |
DB2 HANGS WHEN STMM RESIZING BUFFERPOOL AFTER "SORT LIST SERVICES PROGRAMMING ERROR" | |
| product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
| Problem description: | |
DB2 might hang with the following symptoms:
1. There is heavy contention on the following latch:
SQLO_LT_sqeDBMgr__dbMgrLatch
2. STMM shows a stack similar to the following:
0x090000001520BD88 sqloWaitEDUWaitPost + 0x218
0x0900000016D68120 sqlbRemInvalidPagesFromBufferPool__FP15SQLB_BufferPoolUiN32P12SQLB_GLOBALS + 0x334
0x0900000016D64DA0 sqlbDecreaseBufferpoolSize__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0x554
0x0900000016D63FE4 sqlbResizeBufferPool__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0x1E8
0x09000000165EBBA4 sqlbAlterAutomaticBufferPool__FUiiP8sqeAgent + 0x654
0x0900000015F3B058 sqlrlStmmAlterBufferPool__FP8sqeAgentPciT3P14db2UCinterfaceP5sqlca + 0x414
0x0900000015F88784 stmmAlterBufferPool__FP8sqeAgentPciT3 + 0x1F4
0x0900000015F85924 stmmResizeRecord__FP21stmmCostBenefitRecordP16sqeLocalDatabase + 0xEB4
0x0900000015F838B8 stmmDecreaseEntriesAndRemoveFromList__FP16sqeLocalDatabasePP21stmmCostBenefitRecordPUi + 0xBE0
0x0900000015F820F0 stmmResizeEntriesAndRemoveFromList__FPP21stmmCostBenefitRecordP16sqeLocalDatabase + 0xCC
0x0900000015F81E94 stmmTuneMemory__FPP21stmmCostBenefitRecordP16sqeLocalDatabase + 0x11C
0x0900000015566824 stmmMemoryTunerMain + 0x488
0x09000000153AA25C sqleIndCoordProcessRequest__FP8sqeAgent + 0x198
0x09000000150F1B94 RunEDU__8sqeAgentFv + 0x16C
0x09000000150EE418 EDUDriver__9sqzEDUObjFv + 0xF4
0x09000000150E51CC sqloEDUEntry + 0x264
There are no entries in the STMM logs showing this activity.
3. The last agent trying to deactivate the database is on the
following stack:
0x090000001520BD88 sqloWaitEDUWaitPost + 0x218
0x09000000150D75C4 sqloWaitEDUWaitPost@glue113 + 0x78
0x09000000150D6FBC TermDbConnect__16sqeLocalDatabaseFP8sqeAgentP5sqlcai + 0x620
0x09000000150D282C AppStopUsing__14sqeApplicationFP8sqeAgentUcP5sqlca + 0xD88
0x09000000153E40F4 sqlesrspWrp__FP14db2UCinterface + 0xA8
0x09000000153E4368 sqleUCagentConnectReset + 0xF8
0x0900000015429340 @63@sqljsCleanup__FP8sqeAgentP14db2UCconHandle + 0x910
0x090000001542A2F0 @63@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x330
0x0900000015429D1C sqljsDrdaAsDriver__FP18SQLCC_INITSTRUCT_T + 0x100
0x09000000150F1D1C RunEDU__8sqeAgentFv + 0x2F4
0x09000000150EE418 EDUDriver__9sqzEDUObjFv + 0xF4
0x09000000150E51CC sqloEDUEntry + 0x264
4. At some point, db2diag.log contains entries like the
following, identified by the message "Sort Error. Failed sanity
check before unfixing page, fixount is 0, aborting sort":
FUNCTION: DB2 UDB, sort/list services, sqlsSanityCheckPageAlreadyUnfixed, probe:4099
MESSAGE : ZRC=0x82130001=-2112684031=SQLS_NONSEVERE_PE
"Sort List Services programming error."
DIA8532C An internal processing error has occurred.
DATA #1 : String, 81 bytes
Sort Error. Failed sanity check before unfixing page, fixount is 0, aborting sort
DATA #2 : Fix control block, PD_TYPE_SQLB_FIX_CB, 168 bytes
accessMethod: SQLB_POOL_RELATIVE
fixMode: 2 SQLBOLD/SQLBOLDS
buffptr: 0x0000000000000000
bpdPtr: 0x0770000024973170
dmDebugHdl: 0
objectPageNum: 2280
empDiskPageNum: 4294967295
unfixFlags: 6 SQLB_UFIX_PURGE_MODE | SQLB_UFIX_DEFERRED_MODE
dirtyState: SQLBCLEAN
fixInfoFlags:
regEDUid: 0
Pagekey: {pool:1;obj:2;type:128} PPNum:2280
And there is a matching trap file with a stack like:
pthread_kill + 0x88
sqloDumpEDU + 0x34
sqldDumpContext__FP9sqeBsuEduiN42PCcPvT2 + 0xC4
sqldDumpContext__FP9sqeBsuEduiN42PCcPvT2@glue5AE + 0x98
sqlrr_dump_ffdc__FP8sqlrr_cbiT2 + 0x388
sqlzeDumpFFDC__FP8sqeAgentUiP5sqlcai + 0x30
sqlzeDumpFFDC__FP8sqeAgentUiP5sqlcai@glue534 + 0x80
sqlzeMapZrc__FP8sqeAgentUiUlT2P5sqlcaiPC12sqlzeContextb + 0x1F8
sqlrrMapZrc__FP8sqlrr_cbUiUli@glue3C3 + 0x80
sqlriclo__FP8sqlrr_cbP9sqlri_taoi + 0xA8
sqlriclo__FP8sqlrr_cbP9sqlri_taoi@glueBA6 + 0x78
sqlricjp__FP8sqlrr_cbP12sqlri_opparmilT4 + 0x30
sqlricls_simple__FP8sqlrr_cbil + 0x170
sqlrr_process_close_request__FP8sqlrr_cbiN32 + 0x18C
sqlrr_close__FP14db2UCinterfaceP15db2UCCursorInfo + 0x304
sqljs_ddm_clsqry__FP14db2UCinterfaceP13sqljDDMObject + 0x760
sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x180
.sqljsParse.fdpr.clone.0__FP13sqljsDrdaAsCbP14db2UCinterfaceP8sqeAgentb + 0x6DC
@63@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x2D4
@63@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xB4
@63@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x2D0
This might also show up with entries similar to:
FUNCTION: DB2 UDB, sort/list services, sqlsSanityCheckPageAlreadyUnfixed, probe:4099
MESSAGE : ZRC=0x82130001=-2112684031=SQLS_NONSEVERE_PE
"Sort List Services programming error."
DIA8532C An internal processing error has occurred.
DATA #1 : String, 81 bytes
Sort Error. Failed sanity check before unfixing page, fixount is 0, aborting sort
DATA #2 : Fix control block, PD_TYPE_SQLB_FIX_CB, 168 bytes
accessMethod: SQLB_POOL_RELATIVE
fixMode: 2 SQLBOLD/SQLBOLDS
buffptr: 0x0000000000000000
bpdPtr: 0x07700000ca4af6a0
dmDebugHdl: 0
objectPageNum: 4
empDiskPageNum: 4294967295
unfixFlags: 2 SQLB_UFIX_PURGE_MODE
dirtyState: SQLBCLEAN
fixInfoFlags:
regEDUid: 0
Pagekey: {pool:1;obj:2;type:128} PPNum:4
....
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x0900000012A79A84 pdLog + 0xF4
[1] 0x090000001410A168 pdLog@glue421 + 0x12C
[2] 0x09000000133A8E6C sqlsmergerec__FP8sqeAgentP10SQLS_SLDESP10SQLS_MBUFSUl + 0x260
[3] 0x09000000128E0BA0 sqlsfetc__FP8sqeAgentP8SQLD_CCBiP10SQLD_DPREDPP10SQLD_VALUEP8SQLZ_RIDPc + 0x438
[4] 0x0900000012A5F288 sqlriPrefetchRIDs__FP8sqlrr_cbP8sqlri_lfl + 0x260
[5] 0x09000000128E1BD0 sqlriListFetch__FP8sqlrr_cb + 0x4C
[6] 0x090000001293B16C sqlriNljnPiped__FP8sqlrr_cb + 0x26C
[7] 0x090000001293C624 sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x30
[8] 0x0900000012B6F024 sqlrr_process_fetch_request__FP14db2UCinterface + 0x1C0
[9] 0x09000000129A68D0 sqlrr_fetch__FP14db2UCinterfaceP15db2UCCursorInfo + 0x38C
The hang is caused by the sort problem documented in point 4
above. After the sort failure, STMM hangs on the affected page,
making it impossible to deactivate the database or to accept
new connections. | |
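The symptoms above can be checked mechanically. The following is a minimal sketch, not an IBM tool: it assumes the db2diag.log contents and a stack dump are already available as plain text, and the helper names (`has_sort_error`, `stmm_stuck_in_resize`) are hypothetical. It only looks for the strings quoted in this report.

```python
import re

# Signature strings copied from the db2diag.log entries and STMM stack in this report.
SORT_FUNCTION = "sqlsSanityCheckPageAlreadyUnfixed"
SORT_ZRC = "ZRC=0x82130001"
STMM_FRAMES = ("stmmMemoryTunerMain", "sqlbRemInvalidPagesFromBufferPool")


def _squash(text):
    """Drop all whitespace so symbols wrapped across lines still match."""
    return re.sub(r"\s+", "", text)


def has_sort_error(diag_text):
    """Coarse check: True if both the function name and the ZRC appear in the text.

    This looks at the text as a whole; it does not verify that both strings
    belong to the same db2diag.log record.
    """
    flat = _squash(diag_text)
    return SORT_FUNCTION in flat and SORT_ZRC in flat


def stmm_stuck_in_resize(stack_text):
    """True if a stack dump contains both STMM frames listed in STMM_FRAMES."""
    flat = _squash(stack_text)
    return all(frame in flat for frame in STMM_FRAMES)
```

A match from both helpers is only a hint that this APAR applies; the latch contention and the deactivation stack described in points 1 and 3 still need to be confirmed separately.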
| Problem Summary: | |
USERS AFFECTED: ALL
PROBLEM DESCRIPTION: See Error Description
RECOMMENDATION: Upgrade to DB2 Version 9.7 Fix Pack 9. | |
| Local Fix: | |
Explicitly activate the database with "db2 activate database...", which avoids hangs when the last agent disconnects. Note that if you hit this issue, STMM will still be stuck even after the database has been activated explicitly, and you might need to restart the database for STMM to continue working. | |
| available fix packs: | |
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows | |
| Solution | |
Problem was first fixed in DB2 Version 9.7 Fix Pack 9. | |
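To check whether an installed 9.7 level already contains the fix, a comparison along these lines may help. This is a sketch under stated assumptions: the level is written as four dot-separated numbers in the same form as the "9.7.0.9" fix list entry in this report, and the function name `includes_fix` is hypothetical. Releases other than 9.7 are deliberately not handled.

```python
def includes_fix(level):
    """Return True if a 9.7 level string such as "9.7.0.9" is at Fix Pack 9 or later.

    Assumption: the string has the form version.release.modification.fixpack.
    Levels outside 9.7 raise ValueError because this report only covers 9.7.
    """
    parts = tuple(int(p) for p in level.strip().split("."))
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers, e.g. 9.7.0.9")
    if parts[:2] != (9, 7):
        raise ValueError("only 9.7 levels are covered by this sketch")
    # Compare (modification, fixpack) numerically so that 10 sorts after 9.
    return parts[2:] >= (0, 9)
```

For example, `includes_fix("9.7.0.8")` is false and `includes_fix("9.7.0.9")` is true.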
| Workaround | |
not known / see Local fix | |
| Timestamps | |
Date - problem reported : 07.06.2013
Date - problem closed : 17.12.2013
Date - last modified : 17.12.2013 | |
| Problem solved at the following versions (IBM BugInfos) | |
9.0., 9.7. | |
| Problem solved according to the fixlist(s) of the following version(s) | |
| 9.7.0.9 |