
DB2 - Problem description

Problem IT33851 Status: Closed

TSA INITIATED TAKEOVER IN AN AUTOMATED HADR ENVIRONMENT MAY FAIL DUE TO
PEER WINDOW HAVING EXPIRED.

product:
DB2 FOR LUW / DB2FORLUW / B50 - DB2
Problem description:
In a TSA automated HADR environment, the standby database may
not be able to take over as the new primary after a failure on
the old primary host because the peer window has expired. In
this case, the following db2diag.log entries will be observed:

2020-05-21-09.19.47.704063-420 I6042A435            LEVEL:
Warning
PID     : 13933                TID : 4395211155728  PROC :
db2sysc 0
INSTANCE: seeluser             NODE : 000           DB   :
HADRDB
HOSTNAME: svlxtorf.svl.ibm.com
EDUID   : 62                   EDUNAME: db2hadrs.0.0 (HADRDB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery,
hdrEduAcceptEvent, probe:20202
MESSAGE : Peer window ends. Peer window expired.
2020-05-21-09.19.47.704162-420 E6478A470            LEVEL: Event
PID     : 13933                TID : 4395211155728  PROC :
db2sysc 0
INSTANCE: seeluser             NODE : 000           DB   :
HADRDB
HOSTNAME: svlxtorf.svl.ibm.com
EDUID   : 62                   EDUNAME: db2hadrs.0.0 (HADRDB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery,
hdrSetHdrState, probe:10000
CHANGE  : HADR state set to HDR_S_REM_CATCHUP_PENDING (was
HDR_S_DISCONN_PEER), connId=4
2020-05-21-09.19.52.006067-420 I6949A392            LEVEL:
Warning
PID     : 27581                TID : 2199453935008  PROC :
db2gcf
INSTANCE: seeluser             NODE : 000
HOSTNAME: svlxtorf.svl.ibm.com
FUNCTION: DB2 Common, Generic Control Facility, gcf_start,
probe:928
DATA #1 : String, 18 bytes
Current HADR state
DATA #2 : String, 6 bytes
HADRDB
DATA #3 : unsigned integer, 8 bytes
2
2020-05-21-09.19.52.019589-420 E7342A399            LEVEL: Info
PID     : 27581                TID : 2199453935008  PROC :
db2gcf
INSTANCE: seeluser             NODE : 000
HOSTNAME: svlxtorf.svl.ibm.com
FUNCTION: DB2 UDB, high avail services, sqlhaCreateFlagRG,
probe:535
MESSAGE : IBM.Test flag resource has been created
DATA #1 : String, 49 bytes
db2_HADRDB_ClusterInitiatedMove_seeluser_seeluser
2020-05-21-09.19.52.019623-420 I7742A384            LEVEL:
Warning
PID     : 27581                TID : 2199453935008  PROC :
db2gcf
INSTANCE: seeluser             NODE : 000
HOSTNAME: svlxtorf.svl.ibm.com
FUNCTION: DB2 Common, Generic Control Facility, gcf_start,
probe:957
DATA #1 : String, 48 bytes
Initiating cluster driven HADR takeover request.
DATA #2 : String, 6 bytes
HADRDB
2020-05-21-09.19.52.023487-420 E8127A514            LEVEL: Event
PID     : 13933                TID : 4376282261776  PROC :
db2sysc 0
INSTANCE: seeluser             NODE : 000           DB   :
HADRDB
APPHDL  : 0-22                 APPID:
*LOCAL.seeluser.200521161952
AUTHID  : SEELUSER             HOSTNAME: svlxtorf.svl.ibm.com
EDUID   : 80                   EDUNAME: db2agent (HADRDB) 0
FUNCTION: DB2 UDB, base sys utilities,
sqeDBMgr::StartUsingLocalDatabase, probe:13
START   : Received TAKEOVER HADR command.
2020-05-21-09.19.52.024417-420 I8642A870            LEVEL:
Warning
PID     : 13933                TID : 4376282261776  PROC :
db2sysc 0
INSTANCE: seeluser             NODE : 000           DB   :
HADRDB
APPHDL  : 0-22                 APPID:
*LOCAL.seeluser.200521161952
AUTHID  : SEELUSER             HOSTNAME: svlxtorf.svl.ibm.com
EDUID   : 80                   EDUNAME: db2agent (HADRDB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery,
hdrValidateTakeoverRequest, probe:52050
MESSAGE :
ZRC=0x8280001D=-2105540579=HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED
          "Forced takeover rejected as standby is in the wrong
state or peer window has expired"
DATA #1 : 
HADR standby not ready for takeover.
   Current HADR state: HDR_S_REM_CATCHUP_PENDING
   Light scan status : Inactive
   Peer Window End   : 1590077986
   Current Time      : 1590077991
2020-05-21-09.19.52.024459-420 I9513A713            LEVEL: Error
PID     : 13933                TID : 4376282261776  PROC :
db2sysc 0
INSTANCE: seeluser             NODE : 000           DB   :
HADRDB
APPHDL  : 0-22                 APPID:
*LOCAL.seeluser.200521161952
AUTHID  : SEELUSER             HOSTNAME: svlxtorf.svl.ibm.com
EDUID   : 80                   EDUNAME: db2agent (HADRDB) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery,
hdrRequestTakeover, probe:39999
MESSAGE :
ZRC=0x8280001D=-2105540579=HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED
          "Forced takeover rejected as standby is in the wrong
state or peer window has expired"
DATA #1 : String, 36 bytes
HADR takeover pre-validation failed

This can happen when the RSCT grace period is enabled in the
communication group settings. This setting specifies the grace
period that is used when heartbeats are no longer received.
Enabling it delays host failure detection, which can cause the
peer window to expire before the takeover command is received
on the standby.
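
The two epoch values in the hdrValidateTakeoverRequest record make the
timing visible. The following is a minimal sketch of that arithmetic,
using the values copied from the log above. The function name
peer_window_margin is hypothetical and is not part of Db2 or TSA.

```shell
#!/bin/sh
# Hypothetical helper: seconds left in the peer window at a given time.
# A negative result means the window had already expired.
peer_window_margin() {
    peer_window_end=$1
    current_time=$2
    echo $(( peer_window_end - current_time ))
}

# "Peer Window End" and "Current Time" from the log record above.
margin=$(peer_window_margin 1590077986 1590077991)
if [ "$margin" -lt 0 ]; then
    echo "Peer window expired $(( 0 - margin )) seconds before the takeover request"
else
    echo "Peer window still open for $margin seconds"
fi
```

With the logged values the margin is negative, which matches the
HDR_ZRC_NOT_TAKEOVER_CANDIDATE_FORCED rejection.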
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* all                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 11.5.5.0 or higher                            *
****************************************************************
Local Fix:
Setting the RSCT grace period to 0 (disabling it) allows
automation to acknowledge the host failure sooner, which reduces
the likelihood of encountering this error. In addition, it is
recommended that HADR_PEER_WINDOW be set to at least 120 seconds
for automated HADR environments.
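
One way to check the peer window recommendation is sketched below. It
assumes that "db2 get db cfg" prints a line containing
"(HADR_PEER_WINDOW) = <seconds>". The function name check_peer_window is
hypothetical, and the database name HADRDB is taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads "db2 get db cfg" output on stdin and reports
# whether HADR_PEER_WINDOW meets a minimum (default: 120 seconds).
check_peer_window() {
    minimum=${1:-120}
    # Extract the numeric value from the "(HADR_PEER_WINDOW) = N" line.
    value=$(sed -n 's/.*(HADR_PEER_WINDOW) *= *\([0-9][0-9]*\).*/\1/p' | head -n 1)
    if [ -z "$value" ]; then
        echo "HADR_PEER_WINDOW not found in input"
        return 2
    fi
    if [ "$value" -ge "$minimum" ]; then
        echo "OK: HADR_PEER_WINDOW=$value"
    else
        echo "LOW: HADR_PEER_WINDOW=$value (recommended >= $minimum)"
        return 1
    fi
}

# Intended use on the database host, as the instance owner:
#   db2 get db cfg for HADRDB | check_peer_window
# To raise the value:
#   db2 update db cfg for HADRDB using HADR_PEER_WINDOW 120
```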

As root, verify the communication group settings of the domain
via the lscomg command.

Ex:

$ lscomg
Name Sensitivity Period Priority Broadcast SourceRouting NIMPathName NIMParameters Grace        MediaType UseForNodeMembership
CG1  4           4      1        Yes       Yes                                     -1 (Default) 1 (IP)    1
CG2  4           4      1        Yes       Yes                                     -1 (Default) 1 (IP)    1


If "Grace" is set to anything other than 0, set it to 0 via the
chcomg command for every communication group:

e.g.
$ chcomg -g 0 CG1
$ chcomg -g 0 CG2
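
For domains with many communication groups, the per-group commands can
be generated from the lscomg listing. This is only a sketch: it assumes
that lscomg prints the header on the first line and each group on its
own line, and the function name grace_disable_commands is hypothetical.
It prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper: reads lscomg output on stdin (header on the first
# line, one communication group per following line) and prints one
# "chcomg -g 0" command per group. Nothing is executed here.
grace_disable_commands() {
    awk 'NR > 1 && NF > 0 { print "chcomg -g 0 " $1 }'
}

# Intended use, as root on a cluster node:
#   lscomg | grace_disable_commands        # review the commands first
#   lscomg | grace_disable_commands | sh   # then apply them
```

Printing the commands first leaves room to review them before anything
in the domain is changed.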

Once disabled, the lscomg output should look as follows:

$ lscomg
Name Sensitivity Period Priority Broadcast SourceRouting NIMPathName NIMParameters Grace        MediaType UseForNodeMembership
CG2  4           4      1        Yes       Yes                                     0 (Disabled) 1 (IP)    1
CG1  4           4      1        Yes       Yes                                     0 (Disabled) 1 (IP)    1
Solution
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 10.08.2020
Date  - problem closed      : 20.11.2020
Date  - last modified       : 20.11.2020