DB2 - Problem Description

Problem IT29073 Status: Closed

ENDLESS ITERATION OF DB2 CLEANUP AND KILL PROCESSES MAKE DB2 PURESCALE
CLUSTER HANG

Product:
DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
On the AIX operating system, a process can become stuck in the
"EXITING" state in the kernel.
In this state, it cannot be killed with a kill signal.

If the db2sysc process cannot be terminated by the SIGKILL
signal, the db2rocm CLEANUP and KILL processes are interrupted
by the SIGALRM signal (Time expired).
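
The pattern described above (send SIGKILL, wait for the process to
disappear, give up when an alarm fires) can be sketched in Python as
follows. This is only an illustration of the general Unix mechanism,
not Db2's implementation: the names kill_with_timeout,
process_exists and CleanupTimeout are hypothetical, and the
`exists` parameter exists only so that a stuck process can be
simulated. It assumes a Unix system and must run in the main thread.

```python
import os
import signal
import time


class CleanupTimeout(Exception):
    """Raised by the SIGALRM handler when the time limit expires."""


def _on_alarm(signum, frame):
    raise CleanupTimeout("SIGALRM: time expired")


def process_exists(pid):
    """Return True if the kernel still has an entry for pid."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def kill_with_timeout(pid, timeout_s, exists=process_exists, poll_s=0.05):
    """Send SIGKILL to pid; return True if it vanished, False on timeout.

    timeout_s must be a whole number of seconds (signal.alarm takes an int).
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)
    try:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            return True  # already gone
        # A process stuck in an unkillable kernel state never leaves
        # this loop on its own; only the alarm ends the wait.
        while exists(pid):
            time.sleep(poll_s)
        return True
    except CleanupTimeout:
        return False
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

A caller that simply retries whenever kill_with_timeout returns
False would loop indefinitely against a process that can never be
killed, which is the behaviour this APAR describes for the TSA
CLEANUP task.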

  In this situation, the TSA CLEANUP task is reissued repeatedly
until the system is rebooted, and the member is not started on
another host in restart light mode.

  In the meantime, all applications hang while waiting for
database objects that have not been cleaned up by member crash
recovery during restart light.

  In this situation, messages similar to the following are logged
in db2diag.log.

2019-05-05-20.00.56.369398+540 I58987522A827        LEVEL: Event
PID     : 19136798             TID : 1              PROC :
db2rocm 0 [db2inst1]
INSTANCE: db2inst1             NODE : 000
HOSTNAME: member00
EDUID   : 1                    EDUNAME: db2rocm 0 [db2inst1]
FUNCTION: DB2 UDB, oper system services, sqlossig, probe:10
MESSAGE : Sending SIGKILL to the following process id
DATA #1 : signed integer, 4 bytes
-11337922
CALLSTCK: (Static functions may not be resolved correctly, as
they are resolved to the nearest symbol)
  [0] 0x090000000E0D5FE0 sqlossig + 0xA0
  [1] 0x00000001000203C0
sqlhaKillProcesses__FP18SQLHA_PROCESS_INFOUlbT2T3 + 0x8E0
  [2] 0x00000001000144DC sqlhaDB2KillNode + 0xE3C
  [3] 0x000000010000C120 rocmDB2Cleanup + 0x10A0
  [4] 0x0000000100004080 main + 0x1820
  [5] 0x00000001000002F8 __start + 0x70

2019-05-05-20.03.26.369026+540 I58998026A1507       LEVEL:
Warning
PID     : 19136798             TID : 1              PROC :
db2rocm 0 [db2inst1]
INSTANCE: db2inst1             NODE : 000
HOSTNAME: member00
EDUID   : 1                    EDUNAME: db2rocm 0 [db2inst1]
FUNCTION: DB2 UDB, high avail services,
rocmSignalsForTimeoutOffline, probe:411
MESSAGE : Received signal during CLEANUP - exiting with return
code 12.
DATA #1 : String, 7 bytes
SIGALRM
DATA #2 : ROCM Action, PD_TYPE_ROCM_ACTION, 2103568 bytes
action->version: 1
action->actor->actorType: DB2
action->actor->actorID: 0
action->actor->instName: db2inst1
action->actor->hostname: NOT_POPULATED
action->actor->options: NONE
action->command: CLEANUP
DATA #3 : PGRP File Contents, PD_TYPE_SQLO_PGRP_FILE_CONTENTS,
3224 bytes
pgrpFile->iPgrpFileVersion : 2225
pgrpFile->iPgrpId : 11337922
pgrpFile->iWdogPgrpId : 12517570
pgrpFile->iSubPgrpId : NOT_INITIALIZED
pgrpFile->iIndex : 0
pgrpFile->iNumber : 0
pgrpFile->iMonitorOverride : 0
pgrpFile->crashCounter : 0
pgrpFile->firstCrashTimeSeconds : 1970-01-01 09:00:00.000000
pgrpFile->monitorTimeoutCounter : 0
pgrpFile->firstMonitorTimeoutSeconds : 1970-01-01
09:00:00.000000
pgrpFile->lastMonitorTimeoutSeconds : 1970-01-01 09:00:00.000000
pgrpFile->hostname : member00
pgrpFile->iNumHCAs : 0
CALLSTCK: (Static functions may not be resolved correctly, as
they are resolved to the nearest symbol)
  [0] 0x0000000100006EB4 rocmSignalsForTimeoutOffline + 0xAF4
  [1] 0x0000000000000000 ?unknown + 0x0

2019-05-05-20.03.26.623617+540 I59000696A890        LEVEL: Event
PID     : 46924020             TID : 1              PROC :
db2rocme 0 [db2inst1]
INSTANCE: db2inst1             NODE : 000
HOSTNAME: member00
EDUID   : 1                    EDUNAME: db2rocme 0 [db2inst1]
FUNCTION: DB2 UDB, oper system services, sqlossig, probe:10
MESSAGE : Sending SIGKILL to the following process id
DATA #1 : signed integer, 4 bytes
-11337922
CALLSTCK: (Static functions may not be resolved correctly, as
they are resolved to the nearest symbol)
  [0] 0x090000000E0D5FE0 sqlossig + 0xA0
  [1] 0x00000001001002C0
sqlhaKillProcesses__FP18SQLHA_PROCESS_INFOUlbT2T3 + 0x8E0
  [2] 0x00000001000FC6CC sqlhaDB2KillNode + 0xE4C
  [3] 0x000000010000FAD8 rocmDB2Notify + 0x2F8
  [4] 0x000000010010322C rocmCommandRetryUntilFailure + 0x162C
  [5] 0x0000000100003F00 main + 0x16A0
  [6] 0x00000001000002F8 __start + 0x70

2019-05-05-20.03.56.620065+540 I59003951A1646       LEVEL:
Warning
PID     : 46924020             TID : 1              PROC :
db2rocme 0 [db2inst1]
INSTANCE: db2inst1               NODE : 000
HOSTNAME: member00
EDUID   : 1                    EDUNAME: db2rocme 0 [db2inst1]
FUNCTION: DB2 UDB, high avail services,
rocmSignalsForTimeoutOffline, probe:426
MESSAGE : Received signal during KILL event - exiting with
return code 13.
DATA #1 : String, 7 bytes
SIGALRM
DATA #2 : ROCM Action, PD_TYPE_ROCM_ACTION, 2103568 bytes
action->version: 1
action->actor->actorType: DB2
action->actor->actorID: 0
action->actor->instName: db2inst1
action->actor->hostname: NOT_POPULATED
action->actor->options: NONE
action->command: NOTIFY
action->notification->version: 1111
action->notification->eventType: KILL
action->notification->actor->actorType: DB2
action->notification->actor->actorID: 0
action->notification->actor->instName: db2inst1
action->notification->actor->hostname: member01
action->notification->actor->options: NONE
action->notification->sequenceNumber: 214 (0x00000000000000d6)
action->notification->eventWhitelistFlags: NONE
action->notification->bNotifSent: false
action->notification->retryNum: 0
action->notification->eventWhitelistFlagsToChange: 0
action->notification->options: FORCE
DATA #3 : PGRP File Contents, PD_TYPE_SQLO_PGRP_FILE_CONTENTS,
3224 bytes
Object not dumped: Address: 0x0000000000000000 Size: 3224
Reason: Address is NULL
CALLSTCK: (Static functions may not be resolved correctly, as
they are resolved to the nearest symbol)
  [0] 0x00000001000084EC rocmSignalsForTimeoutOffline + 0xA2C
  [1] 0x0000000000000000 ?unknown + 0x0
...
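
To check a db2diag.log for the timeout pattern shown above, a small
scan along the following lines may help. It is a minimal sketch, not
an IBM tool: the function name find_rocm_timeouts is hypothetical,
and the layout assumptions (records separated by blank lines, long
fields wrapped onto the next line) are taken only from the excerpt
above and may not hold for every log.

```python
import re

# Matches the ROCM timeout message and the signal name in DATA #1,
# after the record has been flattened onto a single line.
TIMEOUT_RE = re.compile(
    r"Received signal during (CLEANUP|KILL event) - exiting with "
    r"return code (\d+)\."
    r".*?DATA #1 : String, \d+ bytes (\w+)"
)
# db2diag.log timestamp as it appears in the excerpt,
# e.g. 2019-05-05-20.03.26.369026+540
TIMESTAMP_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+"
)


def find_rocm_timeouts(diag_text):
    """Return one dict per record that reports a ROCM timeout signal."""
    hits = []
    for record in re.split(r"\n\s*\n", diag_text):
        flat = " ".join(record.split())  # undo line wrapping
        match = TIMEOUT_RE.search(flat)
        if not match:
            continue
        stamp = TIMESTAMP_RE.match(flat)
        hits.append({
            "timestamp": stamp.group(0) if stamp else None,
            "event": match.group(1),
            "return_code": int(match.group(2)),
            "signal": match.group(3),
        })
    return hits
```

Alternating CLEANUP and KILL entries that both report SIGALRM for the
same host, repeating over time, would be consistent with the loop
described in this APAR.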


If one of the event recorders is formatted with the db2fdump
command, the following message indicates that the process is
stuck in the EXITING state:

7445    Event sequence number: 0      Time:
2019-05-05-13.03.26.350648433
        sqlhaVerifyProcessExists (3.115.49.0.748)
        PID:            TID:                      EDUID:
APPHDL:

        Data1        (PD_TYPE_SQLHA_ER_PDINFO,80) SQLHA Event
Recorder header data (struct sqlhaErPdInfo):
          m_pTimeStamp: N/A
          m_LogDestination: 0
          m_PdFlags: 1
          m_FunctionId: 462946353 (sqlhaVerifyProcessExists)
          m_ErrorCode: 0 = 0
          m_Probe: 748
          m_Level: 4

        Data2        (PD_TYPE_MESSAGE,46) Message String:
        Process is in EXITING state - returning ONLINE

        Data3        (PD_TYPE_PROCESS_ID,4) Process ID:
        11337922

        Data4        (PD_TYPE_STRING,9) String:
        db2sysc 0

        Data5        (PD_TYPE_UINT,8) unsigned integer:
        0

        Data6        (PD_TYPE_MESSAGE,39) Message String:
        Setting ROCM_ACTION_FLAGS_DUMP_HA_EVENT
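
The formatted event recorder text can be searched for this message
in a similar way. The sketch below is an illustration only: the
function name find_exiting_processes is hypothetical, it works on
text that has already been formatted, and it assumes the Data2 /
Data3 / Data4 layout shown in the excerpt above.

```python
import re

# After whitespace is collapsed, the EXITING message is followed by
# the "Process ID:" field and then the "String:" field that names
# the process.
EXITING_RE = re.compile(
    r"Process is in EXITING state - returning ONLINE"
    r".*?Process ID: (\d+)"
    r".*?String: (.+?) Data\d"
)


def find_exiting_processes(formatted_text):
    """Return (pid, process_name) pairs reported as stuck in EXITING."""
    flat = " ".join(formatted_text.split())  # undo wrapping and indentation
    return [
        (int(m.group(1)), m.group(2))
        for m in EXITING_RE.finditer(flat)
    ]
```

In the excerpt above this yields PID 11337922 for "db2sysc 0", the
same number that appears (negated) as the SIGKILL target and as
pgrpFile->iPgrpId in the db2diag.log messages.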
Problem summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 10.5 Fix Pack 11 or higher                    *
****************************************************************
Local-Fix:
Reboot the host on which the unkillable processes exist and for
which db2diag.log shows the messages described above.
Solution
Workaround
None known / see Local-Fix
Bug tracking
Predecessor :
Successor   : IT29259
Additional data
Date problem reported : 2019-05-09
Date problem closed   : 2020-02-25
Date of last change   : 2020-02-25
Problem fixed as of the following versions (IBM BugInfos)
Problem fixed according to the FixList in version