DB2 - Problem description
Problem IT19768 | Status: Closed |
DB2 MIGHT HANG DURING A MANUAL 'UPDATE DATABASE CFG' TO DECREASE THE DATABASE MEMORY CONFIG SETTING | |
product: | |
DB2 FOR LUW / DB2FORLUW / A50 - DB2 | |
Problem description: | |
This is a very rare race condition that results in a deadlatch/loop between two Db2 EDUs.

The first EDU is servicing a manual database memory decrease. To satisfy this request it tries to reduce the size of a buffer pool. The size reduction can only complete when none of the pages that will be discarded are still pinned (pinned typically means that another agent is actively using the page). When such pages are seen, the agent waits until they are no longer pinned. This wait is what we see in the waiter stack:

  0x0900000000112014 thread_wait + 0x94
  0x09000000212FD520 sqloWaitEDUWaitPost + 0x248
  0x090000001E26A080 sqlbResizeBufferPool__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0xA20
  0x090000001E26E564 sqlbResizeBufferPool__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0x246C
  0x090000001E26F128 sqlbResizeBufferPool__FP15SQLB_BufferPoolP21SQLB_BP_UC_ALTER_INFOP12SQLB_GLOBALS + 0x3030
  0x090000001FF0D640 sqlbAlterAutomaticBufferPool__FPcUiiP8sqeAgent + 0xAE8
  0x0900000022909EBC sqlrlStmmAlterBufferPool__FP8sqeAgentPciT3P14db2UCinterfaceP5sqlca + 0x448
  0x090000002290AAB8 sqlbScaleAutoBPsByFactor__FP8sqeAgentP10sqlf_dbcfdPcdPUlb + 0x818
  0x0900000022906204 stmmScaleAutosOnDBMemDecrease__FUlT1P16sqeLocalDatabase + 0x16F4
  0x0900000022C154E4 sqlf_dynamic_db_update__FP16sqeLocalDatabaseP11db2CfgParamP10sqlf_dbcfdP5sqlca + 0xB604
  0x0900000022C1516C sqlf_dynamic_db_update__FP16sqeLocalDatabaseP11db2CfgParamP10sqlf_dbcfdP5sqlca + 0xB28C

In the latch information of the stack file, we can see that it is also holding a latch that protects the database configuration from concurrent updates:

  <LatchInformation>
  Holding Latch type: (SQLO_LT_SQLE_KRCB__cfg_change_latch) - Address: (0x78000000045a3d8), Line: 4608, File: sqlf_db_op.C HoldCount: 1
  </LatchInformation>

The session that holds the pinned page has to wait as well, because it is in a code path where it wants to dynamically adjust the sort heap. It loops endlessly in stmmUpdateDBConfig, as seen in this stack:

  0x0900000000564910 _p_nsleep + 0x10
  0x0900000000039364 nsleep + 0xE4
  0x090000000015E310 nanosleep + 0x190
  0x0900000004204008 ossSleep + 0xA8
  0x090000002067236C sqlorest + 0x188
  0x0900000022CCD1C8 sqlfUpdateDbCfg__FP16sqeLocalDatabaseP6db2CfgUiiT4P5sqlcas + 0x1C50
  0x0900000022C49E04 sqlfUpdateDbCfg__FP16sqeLocalDatabaseP6db2CfgUiiT4P5sqlcas + 0x4C1C
  0x0900000022C5D904 sqlfDispatchDbCfgUpdate__FP16sqeLocalDatabaseP6db2CfgiT3P5sqlca + 0x110
  0x09000000228FCF14 stmmUpdateDBConfig__FP16sqeLocalDatabaseUlPclbN25 + 0x1F8
  0x09000000228FCCC4 stmmUpdateSortHeap__FP16sqeLocalDatabaseUibT3 + 0x30
  0x0900000020184E6C stmmReactToSHeapThresOverflowOrMinSortHeap__FP16sqeLocalDatabasePb + 0x4F4
  0x090000002320BDF4 sqlri_hsjnFlushSinglePartition__FP8sqlrr_cbP11sqlri_hsjnolP20sqlri_hsjnTupleBlock + 0x960

This is because it needs to obtain the latch held by the first session. It assumes that the STMM EDU holds this latch and goes into a wait loop for STMM to release it. However, because of the manual database_memory decrease, STMM has been paused until that decrease completes. So this loop is infinite, the pinned page is never released, and the first session never makes progress either. | |
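The circular wait described above can be sketched as a small waits-for graph. This is only an illustrative model under stated assumptions, not Db2 code: the labels RESIZER, SORTER and STMM and the helper find_cycle are hypothetical names introduced here, and each party is assumed to wait on exactly one other party.

```python
from typing import Dict, List, Optional

# Hypothetical labels for the parties in the description (not Db2 identifiers).
RESIZER = "EDU 1: manual database_memory decrease, holds cfg_change_latch"
SORTER = "EDU 2: hash join adjusting sort heap, holds the pinned page"
STMM = "STMM EDU: paused until the manual decrease completes"

# What actually blocks whom: EDU 1 waits for the page to be unpinned,
# EDU 2 waits for the latch that EDU 1 is holding.
ACTUAL_WAITS: Dict[str, str] = {
    RESIZER: SORTER,
    SORTER: RESIZER,
}

# What EDU 2 assumes: that STMM holds the latch. STMM in turn is paused
# behind EDU 1, so the assumed chain closes into a cycle as well.
ASSUMED_WAITS: Dict[str, str] = {
    SORTER: STMM,
    STMM: RESIZER,
    RESIZER: SORTER,
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow waits-for edges from start; return the cycle reached, or None."""
    position: Dict[str, int] = {}
    path: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a party that is not waiting
    return path[position[node]:]


if __name__ == "__main__":
    for name, graph in (("actual", ACTUAL_WAITS), ("assumed", ASSUMED_WAITS)):
        print(name, "cycle:", find_cycle(graph, SORTER))
```

In both views every party on the path is waiting for another party on the same path, which is why neither session makes progress until one of them is interrupted.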
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 10.5 Fix Pack 9 or higher                     *
**************************************************************** | |
Local Fix: | |
Solution | |
First fixed in Db2 10.5 Fix Pack 9 | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 20.03.2017 |
Date - problem closed : | 29.09.2017 |
Date - last modified : | 29.09.2017 |
Problem solved at the following versions (IBM BugInfos) | |
9.0. | |
Problem solved according to the fixlist(s) of the following version(s) |