DB2 - Problem description

Problem IT24027 Status: Closed

DB2 LOG REPLAY (RECOVERY, RFWD, HADR) MIGHT HANG WITH DB2REDOM IN
SQLPRGETFREEQE->WAIT AND DB2REDOWS IN SQLPRFINDQUEUE->WAIT.

product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
Under rare conditions, typically when a long sequence
(thousands) of single-record transactions without a commit has
to be replayed, Db2 log replay might hang with all EDUs ending
up in a wait state. The affected log replay scenarios are:
- crash recovery
- rollforward
- HADR replication

In case of crash recovery, "db2pd -recovery" and "list
utilities" will indicate an ongoing recovery, but "completed
work" will not move forward. Stacks from EDUs involved in
recovery will show the recovery master (db2redom) in:

sqloWaitInterrupt
sqloWaitEDUWaitPost
sqlprGetFreeQE
sqlpPRecReadLog
sqlpParallelRecovery

and all recovery workers (db2redow) in:

sqloWaitInterrupt
sqloWaitEDUWaitPost
sqlprFindQueue
sqlpPRecProcLog
sqlpParallelRecovery
sqleSubCoordProcessRequest

The same EDUs will be involved in the remaining scenarios
(rollforward and HADR).
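
Once stack traces have been collected as text, the signature described
here can be checked mechanically. The following is a minimal sketch, not
an IBM tool: it assumes one plain-text file per EDU containing the
function names as listed, and the helper name is_replay_hang and the
file layout are hypothetical. Real stack files may need different
matching.

```shell
#!/bin/sh
# Hypothetical helper: check collected stack text for the hang signature.
# Usage: is_replay_hang MASTER_STACK_FILE WORKER_STACK_FILE...
# Assumes each file holds one EDU's stack as plain text (an assumption).
is_replay_hang() {
    master=$1
    shift
    # At least one worker stack is needed to make a judgement.
    [ "$#" -ge 1 ] || return 2
    # The recovery master (db2redom) must be waiting in sqlprGetFreeQE.
    grep -q 'sqloWaitEDUWaitPost' "$master" || return 1
    grep -q 'sqlprGetFreeQE' "$master" || return 1
    # Every recovery worker (db2redow) must be waiting in sqlprFindQueue.
    for worker in "$@"; do
        grep -q 'sqloWaitEDUWaitPost' "$worker" || return 1
        grep -q 'sqlprFindQueue' "$worker" || return 1
    done
    return 0
}
```

The function returns 0 only when the master and all supplied workers
show the waiting frames, 1 when any stack differs, and 2 when no worker
file was given. A single worker still making progress is enough to rule
out this particular hang.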


The condition leading to the hang is very likely to cause the
recovery master to grow the transaction table, which triggers
a message from db2redom in db2diag.log similar to this one:

2018-02-01-12.00.00.850000+060 I179497F539          LEVEL: Info
PID     : 5092                 TID : 4488           PROC : db2syscs
INSTANCE: DB2                  NODE : 000           DB   : SAMPLE
APPHDL  : 0-7                  APPID: *LOCAL.DB2.180201115810
AUTHID  : db2inst1             HOSTNAME: db2host
EDUID   : 4488                 EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlptintMore, probe:701
DATA #1 : 
Current usable transaction entries are 14463 on log stream 0.
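
To check whether the transaction table was grown before a suspected
hang, the diagnostic log can be searched for this message. The following
is a minimal sketch, assuming the function name and probe number appear
together on one line of the actual db2diag.log and that the log path is
passed as an argument. The helper name count_tt_growth is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count sqlptintMore probe:701 records in a log file.
# Usage: count_tt_growth /path/to/db2diag.log
count_tt_growth() {
    # Refuse to guess when the file cannot be read.
    [ -r "$1" ] || return 2
    # grep -c prints the number of matching lines (0 when there are none);
    # '|| true' keeps a zero-match result from being treated as a failure.
    grep -c 'sqlptintMore, probe:701' "$1" || true
}
```

A non-zero count shortly before replay stopped progressing is consistent
with the condition described here, but it is not proof of the hang on
its own.
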
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* All                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to DB2 version 10.5 Fix Pack 10 or higher.           *
****************************************************************
Local Fix:
The problem is related to the internal logic of work
parallelization during recovery, which depends on the number of
recovery worker EDUs (db2redow). By default, their number is
calculated based on the number of CPUs. If a hang like this
occurs, one can try to force Db2 to use a higher number of
recovery workers using DB2BPVARS:
$ echo "PREC_NUM_AGENTS=64" > db2bpvars.cfg
$ db2set DB2BPVARS=$(pwd)/db2bpvars.cfg
and see if that allows recovery to complete. The setting
requires an instance restart to take effect and should be
cleared once the problem is fixed.
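
Applying the workaround and removing it later can be sketched as below.
The PREC_NUM_AGENTS line and the db2set DB2BPVARS assignment come from
this Local Fix; clearing a registry variable by assigning it an empty
value with db2set, and restarting with db2stop and db2start, are the
usual commands but are only printed here, not executed. The function
names are hypothetical, and the default of 64 workers is simply the
value from the example.

```shell
#!/bin/sh
# Hypothetical helpers around the workaround. They only write the config
# file and print Db2 commands, so nothing in the instance is changed.
write_prec_cfg() {
    # $1 = number of recovery workers, $2 = target config file
    workers=${1:-64}
    cfg=${2:-db2bpvars.cfg}
    echo "PREC_NUM_AGENTS=${workers}" > "$cfg"
}

print_apply_steps() {
    # $1 = absolute path of the config file written above
    echo "db2set DB2BPVARS=$1"
    echo "db2stop"
    echo "db2start"
}

print_clear_steps() {
    # An empty value removes the registry variable again.
    echo "db2set DB2BPVARS="
    echo "db2stop"
    echo "db2start"
}
```

How the instance is actually restarted depends on the environment, so
the printed steps should be reviewed before they are run.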
Solution
Workaround
not known / see Local fix
BUG-Tracking
forerunner  : 
follow-up : IT24028 
Timestamps
Date  - problem reported    : 12.02.2018
Date  - problem closed      : 16.07.2018
Date  - last modified       : 16.07.2018
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)