
DB2 - Problem description

Problem IT22123 Status: Closed

PURESCALE MEMBER HITS AN FODC AND RESTARTS AFTER PRIMARY CF FAILOVER

product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
##  Problem description
A DB2 pureScale member hits an FODC and restarts after a primary CF failover.

##  Diagnostic information
During a CF failover scenario, a standby CF takes over the
primary role.

A member receives a RECONSTRUCT notification:

2017-08-18-17.30.48.391876+480 I2841217A1267        LEVEL: Event
PID     : 5112136              TID : 1              PROC : db2rocme 1 [db2inst1]
INSTANCE: db2inst1                NODE : 001
HOSTNAME: hosta
EDUID   : 1                    EDUNAME: db2rocme 1 [db2inst1]
FUNCTION: DB2 UDB, high avail services, rocmCommandRetryUntilFailure, probe:109
MESSAGE : Sending ROCM notification.
DATA #1 : ROCM Notification, PD_TYPE_ROCM_NOTIFICATION, 368 bytes
notification->version: 1111
notification->eventType: RECONSTRUCT
notification->actor->actorType: PRIMARY
notification->actor->actorID: 900
notification->actor->underlyingActorID: 129

During reconstruction, the member is supposed to close its
connections to the restarted CF.
In this case, however, the reconstruction times out after 100
seconds:

2017-08-18-17.32.28.026664+480 E2844960A2834        LEVEL: Severe
PID     : 5112136              TID : 1              PROC : db2rocme 1 [db2inst1]
INSTANCE: db2inst1                NODE : 001
HOSTNAME: hostA
EDUID   : 1                    EDUNAME: db2rocme 1 [db2inst1]
FUNCTION: DB2 UDB, high avail services, rocmCommandRetryUntilFailure, probe:709
MESSAGE : ZRC=0x82000199=-2113928807=SQLZ_RC_HA_NOTIFICATION_RETRY
          "The given whitelist transition is valid, but can not be completed at the present time."
DATA #1 : Codepath, 8 bytes
1:2:3:4:20:32:47:48:57:58
DATA #2 : ROCM Actor, PD_TYPE_ROCM_ACTOR, 304 bytes
actor->actorType: DB2
actor->actorID: 1
actor->instName: db2inst1
actor->hostname: NOT_POPULATED
actor->options: NONE
DATA #3 : Database Partition Number, PD_TYPE_NODE, 2 bytes
0
DATA #4 : signed integer, 4 bytes
0
DATA #5 : ROCM Notification, PD_TYPE_ROCM_NOTIFICATION, 368 bytes
notification->version: 1111
notification->eventType: RECONSTRUCT
notification->actor->actorType: PRIMARY
notification->actor->actorID: 900

As a result, DB2 member 1 hits an FODC.
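
The 100-second figure can be cross-checked against the two db2diag.log timestamps quoted above. The following is a minimal Python sketch, not part of the original report. The helper name `parse_db2diag_ts` is hypothetical, and the sketch assumes the trailing `+480` is a time zone offset that is identical in both entries and can therefore be ignored.

```python
from datetime import datetime

def parse_db2diag_ts(ts: str) -> datetime:
    """Parse the leading timestamp of a db2diag.log entry header.

    Assumption: the first 26 characters have the form
    YYYY-MM-DD-HH.MM.SS.ffffff and the suffix (here "+480") is a
    time zone offset shared by both entries, so it is dropped.
    """
    return datetime.strptime(ts[:26], "%Y-%m-%d-%H.%M.%S.%f")

# Timestamps copied from the two log entries in this report.
notification_sent = parse_db2diag_ts("2017-08-18-17.30.48.391876+480")
retry_failed = parse_db2diag_ts("2017-08-18-17.32.28.026664+480")

elapsed = (retry_failed - notification_sent).total_seconds()
print(f"{elapsed:.3f} seconds between notification and failure")
```

For these two entries the difference is 99.634788 seconds, which matches the timeout described in the report.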

Inside the FODC directory, the member's connection pool state is
dumped.
The issue can be identified by an Active List Size greater than 0,
which means that a connection was not cleaned up:

db2pd.1.cfpool.txt:

Shared connection pool information for CF[128]
...
Active List Size:       1
...

CF Pool Entries (Legend: U A S D S W R R = Used, Async, Sent, Dedicated, SLS, WAR, RLS, RAR):
----------------------------
  #  CF Netname  Mbr DevName  List  Use Cnt    Flag  U A S D S W R R  PID       EDU     Mem Address         Function             Probe
---  ----------  -----------  ----  ---------  ----  ---------------  --------  ------  ------------------  -------------------  -----
  0  neta        hca3            3  129387159  0001  Y N N N N N N N  15663144  363125  0x0a00020696dc0080  sqlbgbCastout         2299
  1  netb        hca0           -1  139269382  0000  N N N N N N N N  15663144  574572  0x0a00020696a60080  sqlpLLMSetLockState    340
  2  netc        hca1           -1  140648402  0000  N N N N N N N N  15663144  814362  0x0a00020696e80080  sqlpLLMSetLockState    340
  3  netd        hca0           -1  153163535  0000  N N N N N N N N  15663144  701532  0x0a000204fb510080  sqlpLLMSetLockState    340
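
The check described here (a nonzero Active List Size, then the pool entry whose Used flag is Y) could be scripted roughly as follows. This is an illustrative Python sketch only. The function names are hypothetical, and it assumes each pool entry sits on a single line of 19 whitespace-separated fields in the column order of the excerpt; the real db2pd layout may differ.

```python
import re

# Sample text modelled on the db2pd.1.cfpool.txt excerpt in this report,
# with each pool entry on one line (an assumption about the raw file).
SAMPLE = """\
Shared connection pool information for CF[128]
Active List Size:       1
  0 neta hca3  3 129387159 0001 Y N N N N N N N 15663144 363125 0x0a00020696dc0080 sqlbgbCastout 2299
  1 netb hca0 -1 139269382 0000 N N N N N N N N 15663144 574572 0x0a00020696a60080 sqlpLLMSetLockState 340
"""

def active_list_size(text):
    """Return the first 'Active List Size' value, or None if absent."""
    match = re.search(r"Active List Size:\s*(\d+)", text)
    return int(match.group(1)) if match else None

def used_entries(text):
    """Return pool entries whose Used flag (first of the 8 flags) is Y."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: #, netname, devname, list, use count, flag,
        # 8 single-letter flags, PID, EDU, address, function, probe.
        if len(fields) == 19 and fields[0].isdigit() and fields[6] == "Y":
            entries.append({
                "netname": fields[1],
                "edu": int(fields[15]),
                "function": fields[17],
                "probe": int(fields[18]),
            })
    return entries

if active_list_size(SAMPLE):
    for entry in used_entries(SAMPLE):
        print(entry)
```

With the sample above, the only entry reported is the one for EDU 363125 in sqlbgbCastout, which is the EDU whose stack is examined next.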

From the db2pd output above, the EDU holding the connection is 363125,
inside function sqlbgbCastout.
The following is the stack of that EDU:


-------Frame------ ------Function + Offset------
0x0900000000112F14 thread_wait + 0x94
0x090000000C91D930 sqloWaitEDUWaitPost + 0x254
0x090000000D4FFB30 SAL_WaitForPrimaryArrival__14@82@SAL_CA_KEYFCUiCUl + 0x290
0x090000000D4390C0 SAL_ReadSA__13SAL_SA_HANDLEFCP10PsPageNameCPUlCUlCUiT3 + 0x19A8
0x090000000C1B9968 SAL_RefreshGlobalCastoutDrainState__14SAL_GBP_HANDLEFv + 0x19C
0x090000000C1BA7D8 SAL_ReadForCastoutMult__14SAL_GBP_HANDLEFR35SAL_CASTOUT_CLASS_AND_PROGRESS_INFOCP18SAL_CASTOUT_COOKIECUlCPP9SQLB_PAGEP13PsCastoutNameCPCUlRC10PsPageNameCUsCPUiN29CUiT3 + 0xD68
0x090000000C1B8DEC sqlbgbCastout__FP12SQLB_CLNR_CBb + 0x2978
0x090000000CCD3A04 sqlbClnrGatherPages__FP12SQLB_CLNR_CB + 0x24
0x090000000CCD26EC sqlbClnrEntryPoint__FP12sqbPgClnrEdu + 0x904
0x090000000BB8558C RunEDU__12sqbPgClnrEduFv + 0x40
0x090000000C4F7DA8 EDUDriver__9sqzEDUObjFv + 0x3F4
0x090000000C1892D4 sqloEDUEntry + 0x3A0


The member is automatically restarted after the FODC and returns to
normal state.
No further action is required from the DBA.
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* DB2 pureScale                                                *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Update to next release if it contains the fix.               *
****************************************************************
Local Fix:
Solution
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 23.08.2017
Date  - problem closed      : 18.07.2018
Date  - last modified       : 18.07.2018
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)