DB2 - Problem description
| Problem IC71743 | Status: Closed |
DB2 HANGS DUE TO INDIRECT DEAD LATCH BETWEEN THE AGENT EXECUTING A QUERY AND AN INDEPENDENT COORDINATOR AGENT SPAWNED | |
| product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
| Problem description: | |
When running out of package cache memory space, DB2 will spawn
an independent coordinator agent to clean up package cache This
might cause an indirect dead latch between the agent executing a
query and the spawned independent coordinator agent. The
scenario of the dead latch situation can be illustrated as
below.
1) the agent executing a query spawns an independent coordinator
agent to clean up package cache and is waiting for the new
agent's post back while itself is holding a latch.
For example, you might see the agent 8244 waiting for the post
had the stacks similar to below:
semtimedop + 0x000a
sqloWaitEDUWaitPost + 0x019d
sqeIcoordCB18WaitForSanityCheckEP5sqlca + 0x003d
sqeIcoordCB26DispatchIndependentDBAgent+ 0x0281
ABPIcoordAgent16spawnIcoordAgent + 0x00cb
ABPDispatcher10spawnAgent + 0x006f
ABPDispatcher23activateTaskProContexts + 0x0165
abpActivateTaskProContextsPK8sqeAgent + 0x0069
sqlra_cache_del_var + 0x0a26
sqlra_csm_drop_var+ 0x01a8
sqlra_csm_clean_lru + 0x0e03
sqlra_cache_space_mgmtP8sqlrr_cbm + 0x00ec
sqlra_cache_reserve_memoryP8sqlrr_cbm + 0x02d8
sqlra_rollup_dyn_stmtPK11sqlmon_stmt + 0x04d6
sqlmon_conn8stmt_end + 0x0320
sqlmon_acb14agent_stmt_endEjbP8sqlrr_cb + 0x0668
sqlrr_rds_common_postP14db2UCinterfaceiil + 0x1350
sqlrr_prepareP14db2UCinterfaceP16db2UCprepareInfo + 0x02af
and db2pd -latches show one of the latches it's holding was an
RDS latch for the monitor:
Address Holder Waiter Filename LOC
LatchType HoldCount
0x0000000200DBBB68 8244 25 Unknown
681 SQLO_LT_sqlmon_conn__rds_latch 1
2) On the other hand, the newly spawned independent coordinator
agent is waiting for a database latch being hold by a third
agent, so it never got a chance to post back to the agent which
has spawned it.
For example, you might see the spawned independent agent 8399
had the stacks similar to below.
sqloSpinLockConflict + 0x0240
sqeLocalDatabase29ShouldWeStartBackgroundAgents + 0x042a
sqlm_a_initP8sqeAgent + 0x023a
sqeApplication20InitEngineComponents+ 0x06c9
sqeApplication13AppStartUsing + 0x04f2
sqleSubAgentStartUsingP8sqeAgentP16SQLE_CLIENT_INFO + 0x035a
sqeApplication22AppSecondaryStartUsing + 0x0202
sqleIndCoordProcessRequestP8sqeAgent + 0x059f
sqleIndCoordProcessRequestP8sqeAgent + 0x02d8
sqleIndCoordProcessRequestP8sqeAgent + 0x0084
sqeAgent6RunEDUEv + 0x0381
and db2pd -latches show it's waiting for the database latch hold
by the third agent 25:
Address Holder Waiter Filename LOC
LatchType HoldCount
0x00000002004D7F48 25 8339 Unknown
395 SQLO_LT_sqeLocalDatabase__dblatch 1
3) The third agent is in turn waiting for the latch being hold
by the original agent executing the query as in 1).
For example, you might see the third agent 25 had the stacks
similar to below.
sqloSpinLockConflict + 0x0240
sqloxult_trackP11sqlo_xlatch14SQLO_LT_VALUES + 0x005f
sqm_collect_db_bp_data + 0x03b0
sqm_snap_dbase + 0x013e
sqlmonssagnt + 0x0466
sqlmonssbackendP12SQLE_DB2RA_T + 0x0500
sqlesrvrP14db2UCinterface + 0x05bd
sqleMappingFnServerP5sqldaP5sqlca + 0x049d
sqlerKnownProcedure + 0x0270
sqlerCallDLP14db2UCinterfaceP9UCstpInfo + 0x0492
and db2pd -latches show it's waiting for the RDS latch hold by
the original agent 8244.
Address Holder Waiter Filename LOC
LatchType HoldCount
0x0000000200DBBB68 8244 25 Unknown
681 SQLO_LT_sqlmon_conn__rds_latch 1
As a result, the dead latch occurs. Recycling the instance will
solve the dead latch.
This issue doesn't occur on DB2 version prior to DB2 UDB V9.7
for LUW. | |
| Problem Summary: | |
**************************************************************** * USERS AFFECTED: * * The issue doesn't occur on DB2 versions prior to DB2 UDB * * V9.7 for LUW * **************************************************************** * PROBLEM DESCRIPTION: * * See above Error Description * **************************************************************** * RECOMMENDATION: * * Upgrade to DB2 Version 9.7 Fix Pack 4 * **************************************************************** | |
| Local Fix: | |
1) turn off the default statement switch (DFT_MON_STMT) in the Database Manager Configuration. Also need to make sure there is no application turn on the database manager statement switch. For example, db2 Governor tool will turn on the statement switch. If db2 Governor is used, it need to be disabled as well. 2) increasing the package cache size(PCKCACHESZ) in the Database Configuration may also reduce the chance of getting the dead latch | |
| available fix packs: | |
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows | |
| Solution | |
Problem was first fixed in Version 9.7 Fix Pack 4 | |
| Workaround | |
See above Local Fix | |
| Timestamps | |
Date - problem reported : Date - problem closed : Date - last modified : | 07.10.2010 02.05.2011 02.05.2011 |
| Problem solved at the following versions (IBM BugInfos) | |
9.7.FP4 | |
| Problem solved according to the fixlist(s) of the following version(s) | |
| 9.7.0.4 |
|