DB2 - Problem description
| Problem IC91816 | Status: Closed |
TSA AUTOMATED HADR DATABASE DOES NOT FAILOVER AFTER UNPLUGGING PUBLIC NETWORK CABLE FROM THE PRIMARY SERVER | |
| product: | |
DB2 FOR LUW / DB2FORLUW / A10 - DB2 | |
| Problem description: | |
In a TSA-MP managed HADR environment, if the public network
cable is unplugged from the HADR primary server, the HADR
database does not fail over to the standby server.
The following example shows the behavior:
- lssam output prior to unplugging the network cable:
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Nominal=Online
    |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
        |- Online IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
        '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
    '- Online IBM.ServiceIP:db2ip_10_10_3_111-rs
        |- Online IBM.ServiceIP:db2ip_10_10_3_111-rs:node01
        '- Offline IBM.ServiceIP:db2ip_10_10_3_111-rs:node02
Online IBM.ResourceGroup:db2_db2inst1_node01_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_node01_0-rs
        '- Online IBM.Application:db2_db2inst1_hostA_0-rs:node01
Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_node02_0-rs
        '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ
    |- Online IBM.PeerNode:node01:node01
    '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_db2inst1_node01_0-rg_group-equ
    '- Online IBM.PeerNode:node01:node01
Online IBM.Equivalency:db2_db2inst1_node02_0-rg_group-equ
    '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_private_network_0
    |- Online IBM.NetworkInterface:en1:node01
    '- Online IBM.NetworkInterface:en1:node02
Online IBM.Equivalency:db2_public_network_0
    |- Online IBM.NetworkInterface:en2:node02
    '- Online IBM.NetworkInterface:en2:node01
- lssam output after the network cable is unplugged:
Pending Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HADRDB-rg Request=Lock Nominal=Online
    |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs Control=StartInhibitedBecauseSuspended
        |- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node01
        '- Offline IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs:node02
    '- Online IBM.ServiceIP:db2ip_10_10_3_111-rs
        |- Online IBM.ServiceIP:db2ip_10_10_3_111-rs:node01
        '- Offline IBM.ServiceIP:db2ip_10_10_3_111-rs:node02
Failed offline IBM.ResourceGroup:db2_db2inst1_node01_0-rg Binding=Sacrificed Nominal=Online
    '- Offline IBM.Application:db2_db2inst1_node01_0-rs
        '- Offline IBM.Application:db2_db2inst1_hostA_0-rs:node01
Online IBM.ResourceGroup:db2_db2inst1_node02_0-rg Nominal=Online
    '- Online IBM.Application:db2_db2inst1_node02_0-rs
        '- Online IBM.Application:db2_db2inst1_node02_0-rs:node02
Online IBM.Equivalency:db2_db2inst1_db2inst1_HADRDB-rg_group-equ
    |- Online IBM.PeerNode:node01:node01
    '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_db2inst1_node01_0-rg_group-equ
    '- Online IBM.PeerNode:node01:node01
Online IBM.Equivalency:db2_db2inst1_node02_0-rg_group-equ
    '- Online IBM.PeerNode:node02:node02
Online IBM.Equivalency:db2_private_network_0
    |- Online IBM.NetworkInterface:en1:node01
    '- Online IBM.NetworkInterface:en1:node02
Online IBM.Equivalency:db2_public_network_0
    |- Online IBM.NetworkInterface:en2:node02
    '- Offline IBM.NetworkInterface:en2:node01
As the lssam output above shows, HADR is stopped (the resource
is "Offline") on the original primary (node01), but node02 does
not take over the primary HADR role: the HADR resource for
node02 is also "Offline".
In addition, the virtual IP address (IBM.ServiceIP resource)
remains bound to the original primary server (node01).
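The symptom described above (no node holding the HADR resource "Online") can be checked mechanically. The following is a minimal sketch, not part of the APAR: the function name online_nodes is hypothetical, and it assumes the lssam output has been captured as plain text, without terminal color codes, with each resource on a single line as lssam prints it.

```shell
#!/bin/sh
# Sketch only: online_nodes is a hypothetical helper, not a TSA or DB2 command.
# Reads lssam-style text on stdin and prints the node names on which the
# per-node entry of the given resource is reported as "Online".
# $1 = resource name without the node suffix.
online_nodes() {
    awk -v res="$1" '
        {
            for (i = 2; i <= NF; i++) {
                # A per-node entry is the resource name followed by a colon
                # and the node name; the state is the token just before it.
                if (index($i, res ":") == 1 && $(i - 1) == "Online" &&
                    (i < 3 || $(i - 2) != "Pending")) {
                    n = split($i, parts, ":")
                    print parts[n]
                }
            }
        }'
}

# Example use on a cluster node (assumes plain-text lssam output):
#   lssam | online_nodes "IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs"
```

With the output taken before the cable pull, the helper prints node01; with the output taken afterwards it prints nothing, which corresponds to the stuck state in which neither node holds the primary role.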
----------------------------------------------------------------
In the above scenario, where the public network cable is
unplugged, TSA does not bring the IBM.ServiceIP resource offline
on the primary node (node01). An additional dependency is needed
from the HADR resource to the public network equivalency so that
the HADR failover process is initiated when the public network
cable is pulled. With this dependency in place, the HADR resource
can fail over successfully from the primary to the standby in
that event. | |
| Problem Summary: | |
****************************************************************
* USERS AFFECTED:
* ALL
****************************************************************
* PROBLEM DESCRIPTION:
* See Error Description
****************************************************************
* RECOMMENDATION:
* Upgrade to DB2 Version 10.1 Fix Pack 3.
**************************************************************** | |
| Local Fix: | |
Check whether a dependency exists from the HADR resource to the
public network equivalency by issuing the 'lsrel -Ab' command as
the DB2 instance owner. If the dependency exists, it is displayed
as follows:
Managed Relationship 1:
Class:Resource:Node[Source] = IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs
Class:Resource:Node[Target] = {IBM.Equivalency:db2_public_network_0}
Relationship = DependsOn
Conditional = NoCondition
Name = db2_db2inst1_db2inst1_HADRDB-rs_DependsOn_db2_public_network_0-rel
ActivePeerDomain = hadr_dom
ConfigValidity =
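The check against the 'lsrel -Ab' output can also be scripted. This is a rough sketch under stated assumptions, not an IBM-supplied tool: the function name has_dependency is hypothetical, the input is expected in the layout shown above with each attribute on one line, and resource names are matched as plain substrings.

```shell
#!/bin/sh
# Sketch only: has_dependency is a hypothetical helper, not part of TSA or DB2.
# Reads "lsrel -Ab" style text on stdin and returns 0 if one relationship
# block has the given source, the given target, and Relationship = DependsOn.
# $1 = source resource, $2 = target equivalency.
has_dependency() {
    awk -v src="$1" -v tgt="$2" '
        # A block counts only when all three attributes matched inside it.
        function check() { if (s && t && d) found = 1 }
        /^[ \t]*Managed Relationship [0-9]+/ { check(); s = t = d = 0; next }
        /\[Source\]/ && index($0, src) { s = 1 }
        /\[Target\]/ && index($0, tgt) { t = 1 }
        /Relationship[ \t]*=[ \t]*DependsOn[ \t]*$/ { d = 1 }
        END { check(); exit (found ? 0 : 1) }'
}

# Example use as the DB2 instance owner:
#   lsrel -Ab | has_dependency \
#       "IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs" \
#       "IBM.Equivalency:db2_public_network_0" && echo "dependency exists"
```

A non-zero return status means that no matching relationship was found in the text that was read.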
If this dependency does not exist, then create it as follows:
1) Bring the cluster into maintenance mode by running the
"db2haicu -disable" command as the DB2 instance owner.
2) As root from either node, run the following:
"export CT_MANAGEMENT_SCOPE=2"
"mkrel -p dependson -S IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs -G IBM.Equivalency:db2_public_network_0 db2_db2inst1_db2inst1_HADRDB-rs_DependsOn_db2_public_network_0-rel"
3) Verify that the dependency now exists by running the
"lsrel -Ab" command.
4) Once the dependency has been verified, exit cluster
maintenance mode by running the "db2haicu" command as the DB2
instance owner. | |
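Because the mkrel call in step 2 is long and easy to mistype, it can help to generate it from the two resource names. The sketch below only prints the command for review and executes nothing. The helper names rel_name and mkrel_command are hypothetical, and the relationship naming pattern is an assumption inferred from the example name shown in the lsrel output, not a documented rule.

```shell
#!/bin/sh
# Sketch only: rel_name and mkrel_command are hypothetical helpers.
# Resource names are the ones used in this APAR; adjust for other clusters.
HADR_RS="IBM.Application:db2_db2inst1_db2inst1_HADRDB-rs"
PUBLIC_EQU="IBM.Equivalency:db2_public_network_0"

# Build the relationship name as <source>_DependsOn_<equivalency>-rel,
# dropping the class prefix (everything up to the first colon) from each.
# Assumption: this mirrors the name shown in the example lsrel output.
rel_name() {
    printf '%s_DependsOn_%s-rel\n' "${1#*:}" "${2#*:}"
}

# Print the mkrel command line from step 2; nothing is executed here.
mkrel_command() {
    printf 'mkrel -p dependson -S %s -G %s %s\n' \
        "$1" "$2" "$(rel_name "$1" "$2")"
}

# Example use: review the output, then run it as root with
# CT_MANAGEMENT_SCOPE=2 exported, between steps 1 and 3.
#   mkrel_command "$HADR_RS" "$PUBLIC_EQU"
```

Printing rather than executing keeps the change under operator control while the cluster is in maintenance mode.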
| available fix packs: | |
DB2 Version 10.1 Fix Pack 3 for Linux, UNIX, and Windows | |
| Solution | |
First fixed in Version 10.1 Fix Pack 3. | |
| Workaround | |
not known / see Local fix | |
| BUG-Tracking | |
forerunner : APAR is sysrouted TO one or more of the following: IC94057 IC94071 IC95313 follow-up : | |
| Timestamps | |
| Date - problem reported : | 22.04.2013 |
| Date - problem closed : | 22.10.2013 |
| Date - last modified : | 22.10.2013 |
| Problem solved at the following versions (IBM BugInfos) | |
| Problem solved according to the fixlist(s) of the following version(s) | |
| 10.1.0.3 |
|
| 10.1.0.3 |
|