DB2 - Problem description
| Problem IC65580 | Status: Closed |
HADRV97_MONITOR NOT REPORTING CORRECT HADR STATE AFTER TAKEOVER WHILE TSAMP IN MANUAL MODE | |
| product: | |
DB2 FOR LUW / DB2FORLUW / 970 - DB2 | |
| Problem description: | |
Starting with the primary in Peer state:
Initial State
$ db2pd -hadr -db hadrdb
Role    State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Primary Peer   Sync      0                 0
$ db2pd -hadr -db hadrdb
HADR Information:
Role    State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Standby Peer   Sync      0                 0
root:# lssam -g db2_db2inst1_db2inst1_HADRDB-rg
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
        '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                |- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                '- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2
To re-create the problem, place TSAMP in manual mode:
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
        '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                |- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                '- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2
Then issue 'db2 takeover hadr on database hadrdb' on node "samp2" and list the IBM.Test resources:
root:# lsrsrc -Ab IBM.Test
Resource Persistent and Dynamic Attributes for IBM.Test
resource 1:
        Name              = "db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1"
        ResourceType      = 0
        AggregateResource = "0x3fff 0xffff 0x00000000 0x00000000 0x00000000 0x00000000"
        ForceOpState      = 0
        TimeToStart       = 0
        TimeToStop        = 0
        WriteToSyslog     = 0
        MoveTime          = 0
        MoveFail          = 0
        ForceMoveState    = 0
        ActivePeerDomain  = "hadrdom"
        NodeNameList      = {"samp1"}
        OpState           = 2
        ConfigChanged     = 0
        ChangedAttributes = {}
        MoveState         = [0,{}]
        OpQuorumState     = 0
Check the HADR roles after the 'db2 takeover hadr ...' command has
completed successfully:
$ db2pd -hadr -db hadrdb
Role    State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Standby Peer   Sync      0                 0
$ db2pd -hadr -db hadrdb
HADR Information:
Role    State  SyncMode  HeartBeatsMissed  LogGapRunAvg (bytes)
Primary Peer   Sync      0                 0
But the 'lssam' output does not change:
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Automation=Manual Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
        '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                |- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                '- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2
This shows that the Online/Offline states of the HADR database
resource do not match the Primary/Standby roles reported by the
'db2pd -hadr ...' command.
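The comparison above can be scripted. The following is a minimal sketch, not part of the APAR: the function names are hypothetical, and it assumes that 'lssam' prints each constituent resource on one line without colour codes and that 'db2pd -hadr' prints the role as the first word of the data row, as in the listings above.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not DB2 or TSAMP tools.

# Print the HADR role (Primary or Standby) found in 'db2pd -hadr' output on stdin.
hadr_role() {
    awk '$1 == "Primary" || $1 == "Standby" { print $1; exit }'
}

# Print the node whose constituent IBM.Application resource is Online,
# reading 'lssam' output on stdin. $1 is the resource name.
# Assumes one line per constituent: "|- Online IBM.Application:<rs>:<node>".
lssam_online_node() {
    awk -v rs="$1" '
        $2 == "Online" {
            n = split($3, part, ":")
            if (n == 3 && part[1] == "IBM.Application" && part[2] == rs) { print part[3] }
        }'
}

# Succeed if the role seen on a node agrees with the lssam state:
# the Primary must be on the Online node, the Standby must not be.
# $1 = role on the node, $2 = node name, $3 = node that lssam shows Online
roles_match_lssam() {
    case $1 in
        Primary) [ "$2" = "$3" ] ;;
        Standby) [ "$2" != "$3" ] ;;
        *) return 2 ;;
    esac
}

# Example, run on node samp2 with the names used in this APAR:
#   role=$(db2pd -hadr -db hadrdb | hadr_role)
#   online=$(lssam -g db2_db2inst1_db2inst1_HADRDB-rg |
#            lssam_online_node db2_db2inst1_db2inst1_HADRDB-rs)
#   roles_match_lssam "$role" samp2 "$online" || echo "lssam does not match HADR role"
```

With the post-takeover listings shown above (Primary on samp2, Online on samp1), the final check would report a mismatch.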
The mismatch exists because TSA is running in manual mode, so
the temporary TSA IBM.Test resource remains set:
root:# lsrsrc -Ab IBM.Test
Resource Persistent and Dynamic Attributes for IBM.Test
resource 1:
        Name              = "db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1"
        ResourceType      = 0
        AggregateResource = "0x3fff 0xffff 0x00000000 0x00000000 0x00000000 0x00000000"
        ForceOpState      = 0
        TimeToStart       = 0
        TimeToStop        = 0
        WriteToSyslog     = 0
        MoveTime          = 0
        MoveFail          = 0
        ForceMoveState    = 0
        ActivePeerDomain  = "hadrdom"
        NodeNameList      = {"samp1"}
        OpState           = 2
        ConfigChanged     = 0
        ChangedAttributes = {}
        MoveState         = [0,{}]
        OpQuorumState     = 0
Once TSA is switched back to auto mode, the resource state will
be updated correctly and no action is needed. | |
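To see whether such a temporary resource is still set, the 'lsrsrc -Ab IBM.Test' output can be filtered for its name. This is a rough sketch under assumptions: the helper name is hypothetical, and the name pattern (db2_..._UserInitiatedMove_...) is inferred only from the single example in this APAR.

```shell
#!/bin/sh
# Sketch only: leftover_move_flags is a hypothetical helper name.
# Print the names of IBM.Test resources that look like DB2 user-initiated
# move flags, reading 'lsrsrc -Ab IBM.Test' output on stdin.
# The pattern is an assumption based on the one resource name shown here.
leftover_move_flags() {
    sed -n 's/.*"\(db2_[^"]*_UserInitiatedMove_[^"]*\)".*/\1/p'
}

# Example:
#   lsrsrc -Ab IBM.Test | leftover_move_flags
```

An empty result would suggest that no such resource remains.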
| Problem Summary: | |
****************************************************************
* USERS AFFECTED:
* ALL
****************************************************************
* PROBLEM DESCRIPTION:
* See Problem Description above.
****************************************************************
* RECOMMENDATION:
* Upgrade to DB2 Version 9.7 Fix Pack 2
**************************************************************** | |
| Local Fix: | |
If you wish to stay in manual mode, the mismatch can be resolved
by removing the IBM.Test resource:
rmrsrc -s "Name = 'db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1'" IBM.Test
After the HADR monitor next runs for the associated HADR
database, 'lssam' reflects the true Online/Offline state based
on the location of the primary/standby roles:
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
                |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp1
                '- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:samp2
        '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs
                |- Offline IBM.ServiceIP:db2ip_9_42_153_214-rs:samp1
                '- Online IBM.ServiceIP:db2ip_9_42_153_214-rs:samp2 | |
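Because the selection string of the local fix nests single quotes inside double quotes, it is easy to mistype. The sketch below only builds and prints the 'rmrsrc' command for review and does not run it. The helper name is hypothetical, and the restriction to letters, digits and underscores is an assumption based on the resource name shown in this APAR.

```shell
#!/bin/sh
# Sketch only: rmrsrc_command is a hypothetical helper; it prints the
# command and never executes it.
# $1 = name of the IBM.Test resource to remove
rmrsrc_command() {
    case $1 in
        # Reject empty names and any character that could break the quoting.
        ''|*[!A-Za-z0-9_]*)
            echo "unexpected resource name: $1" >&2
            return 1 ;;
    esac
    printf 'rmrsrc -s "Name = '\''%s'\''" IBM.Test\n' "$1"
}

# Example:
#   rmrsrc_command db2_HADRDB_samp2_UserInitiatedMove_db2inst1_db2inst1
```

The printed line can be checked against the command given in the local fix before it is run as root.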
| available fix packs: | |
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows | |
| Solution | |
First fixed in DB2 Version 9.7 Fix Pack 2 | |
| Workaround | |
not known / see Local fix | |
| Timestamps | |
Date - problem reported : 14.01.2010
Date - problem closed   : 21.03.2011
Date - last modified    : 08.03.2012 | |
| Problem solved at the following versions (IBM BugInfos) | |
9.7.FP2 | |
| Problem solved according to the fixlist(s) of the following version(s) | |
| 9.7.0.2 |
|