
DB2 - Problem description

Problem IT22532 Status: Closed

FAILURE ON MEMBER-CF COMMUNICATION ONCE ONE OF REDUNDANT SWITCHES IS
FAILED.

product:
DB2 FOR LUW / DB2FORLUW / B10 - DB2
Problem description:
The issue starts with the first switch failure. Based on the
system and RSCT logs, Db2 detects the adapter going down and
coming back up when the first switch fails, as expected.
However, when RSCT detects the adapter coming up and issues a
callback to Db2, the following entries are logged in the
db2diag.log:
 
2017-09-05-09.27.46.880674+120 I1572284E687          LEVEL: Event
PID     : 15687 TID : 139701730141952 PROC : db2sysc 3
INSTANCE: db2inst1             NODE : 003
HOSTNAME: node03
EDUID   : 24                   EDUNAME: db2clstrRscMon 3
FUNCTION: DB2 UDB, high avail services, rocmHCAMonitorCallback, probe:1727
MESSAGE : HCA callback data: Member, adapter, online, numOnline, attrCount,
          attr[0] value
DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
3
DATA #2 : String, 9 bytes
eth1-mlx0
DATA #3 : Boolean, 1 bytes
false
DATA #4 : signed integer, 8 bytes
1
DATA #5 : signed integer, 4 bytes
1
DATA #6 : signed integer, 4 bytes
1
 
2017-09-05-09.27.46.885922+120 I1572972E1910         LEVEL: Severe
PID     : 15687 TID : 139701730141952 PROC : db2sysc 3
INSTANCE: db2inst1             NODE : 003
HOSTNAME: node03
EDUID   : 24                   EDUNAME: db2clstrRscMon 3
FUNCTION: DB2 UDB, oper system services, sqloAtForkPrepareHandler, probe:100
DATA #1 : Codepath, 8 bytes
3:19
MESSAGE : Cannot invoke fork() within the engine, this thread will be suspended
          now for further investigation.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x00007F0EECEAF96D sqloAtForkPrepareHandler + 0x51D
  [1] 0x00007F0EE4B5BF82 __libc_fork + 0x52
  [2] 0x00007F0EE4B0AF9C _IO_proc_open + 0xBC
  [3] 0x00007F0EE4B0B22C popen + 0x5C
  [4] 0x00007F0EECDE7A1C _Z39sqloConfigureRoutesForMultipleRoCELinuxv + 0x54C
  [5] 0x00007F0EB97BCBFB rocmHCAMonitorCallback + 0x8AB
  [6] 0x00007F0EB355989B /lib64/libct_mc.so + 0x2D89B
  [7] 0x00007F0EB354BFB7 /lib64/libct_mc.so + 0x1FFB7
  [8] 0x00007F0EB354B885 /lib64/libct_mc.so + 0x1F885
  [9] 0x00007F0EB354B271 /lib64/libct_mc.so + 0x1F271
  [10] 0x00007F0EB354B0C7 /lib64/libct_mc.so + 0x1F0C7
  [11] 0x00007F0EB354AC33 /lib64/libct_mc.so + 0x1EC33
  [12] 0x00007F0EB354A67F /lib64/libct_mc.so + 0x1E67F
  [13] 0x00007F0EB353EDF9 /lib64/libct_mc.so + 0x12DF9
  [14] 0x00007F0EB353E54C /lib64/libct_mc.so + 0x1254C
  [15] 0x00007F0EB353DDAA mc_dispatch_1 + 0x2E6
  [16] 0x00007F0EB97C0F79 _Z51rocmMemberHCAMonitorStartSessionRegisterAndDispatchP16ROCM_HCA_MONITOR + 0x369
  [17] 0x00007F0EB97C09D7 rocmMemberHCAMonitor + 0x37
  [18] 0x00000000004211E7 _ZN26sqeMemberAdapterMonitorEdu6RunEDUEv + 0x107
  [19] 0x00007F0EEE8FDC96 _ZN9sqzEDUObj9EDUDriverEv + 0x116
  [20] 0x00007F0EECEB6358 sqloEDUEntry + 0x578
  [21] 0x00007F0EF471DDC5 /lib64/libpthread.so.0 + 0x7DC5
  [22] 0x00007F0EE4B94CED clone + 0x6D
 
 
 
This fork() error suspends the thread, so Db2 cannot proceed
with marking the adapter links up and never marks these
links as Online.
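
The call stack shows popen() reaching fork(), at which point a
function named sqloAtForkPrepareHandler runs before the fork
proceeds. Assuming this is a standard fork "prepare" hook (an
inference from the function name and stack position, not something
the problem description states), the mechanism can be sketched with
Python's analogous os.register_at_fork. The handler and helper names
below are hypothetical, and this handler only records the call,
whereas the Db2 handler suspends the calling thread.

```python
import os

# PIDs of the processes in which the "before fork" hook ran.
calls = []


def prepare_handler():
    # Runs in the parent immediately before os.fork() creates the child.
    # Illustration only: this records the call instead of suspending.
    calls.append(os.getpid())


# Register the hook; Unix only, available since Python 3.7.
os.register_at_fork(before=prepare_handler)


def fork_once():
    """Fork a child that exits at once and return its wait status."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: leave without running cleanup handlers
    _, status = os.waitpid(pid, 0)
    return status


status = fork_once()
print(calls, status)
```

Any code path that forks, including one reached indirectly through a
library call, triggers the registered hook first, which would explain
why a popen() call inside the callback ends in the handler.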
When the second switch is lost, the following entry is logged:
 
2017-09-05-09.43.23.514282+120 I7981597E486          LEVEL: Warning
PID     : 15687 TID : 139701671421696 PROC : db2sysc 3
INSTANCE: db2inst1             NODE : 003            DB   : SAMPLE
HOSTNAME: node03
EDUID   : 586                  EDUNAME: db2XInot SCA 1-0 (SAMPLE) 3
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SAL_GBP_HANDLE::SAL_ResetXiConnection, probe:894
DATA #1 : <preformatted>
All links are monitored offline.
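
To check whether a system has hit the same sequence, the two message
texts quoted above can be searched for in db2diag.log. The following
is a minimal sketch, not an IBM tool: the function names and the
labels are hypothetical, and it assumes each record begins with a
timestamp in the format shown in the excerpts.

```python
import re

# A record is assumed to start with a timestamp such as
# 2017-09-05-09.27.46.880674+120 at the beginning of a line.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3}\b",
    re.MULTILINE,
)

# Message texts taken from the log excerpts; the labels are made up.
SIGNATURES = {
    "fork_suspended": "Cannot invoke fork() within the engine",
    "links_offline": "All links are monitored offline",
}


def split_records(text):
    """Split log text into records, one per leading timestamp."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    starts.append(len(text))
    return [text[a:b] for a, b in zip(starts, starts[1:])]


def find_signatures(text):
    """Return (timestamp, label) pairs for records matching a signature."""
    hits = []
    for record in split_records(text):
        # Collapse whitespace so wrapped messages still match.
        flat = " ".join(record.split())
        timestamp = flat.split(" ", 1)[0]
        for label, needle in SIGNATURES.items():
            if needle in flat:
                hits.append((timestamp, label))
    return hits


SAMPLE = """2017-09-05-09.27.46.885922+120 I1572972E1910         LEVEL: Severe
FUNCTION: DB2 UDB, oper system services, sqloAtForkPrepareHandler, probe:100
MESSAGE : Cannot invoke fork() within the engine, this thread will be suspended
          now for further investigation.
2017-09-05-09.43.23.514282+120 I7981597E486          LEVEL: Warning
DATA #1 : <preformatted>
All links are monitored offline.
"""

for timestamp, label in find_signatures(SAMPLE):
    print(timestamp, label)
```

A "fork_suspended" hit followed later by a "links_offline" hit on the
same member matches the order of events described here: the first
switch failure, then the loss of the second switch.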
Problem Summary:
**************************************************************** 
* USERS AFFECTED:                                              * 
* pureScale                                                    * 
**************************************************************** 
* PROBLEM DESCRIPTION:                                         * 
* See Error Description                                        * 
**************************************************************** 
* RECOMMENDATION:                                              * 
* Upgrade to Db2 Version 11.1 Mod2 Fix Pack2 iFix001           * 
****************************************************************
Local Fix:
available fix packs:
Db2 Version 11.1 Mod2 Fix Pack2 iFix001 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod2 Fix Pack2 iFix002 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod 3 Fix Pack 3 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod3 Fix Pack3 iFix001 for Linux, UNIX, and Windows
Db2 Version 11.1 Mod3 Fix Pack3 iFix002 for Linux, UNIX, and Windows

Solution
First fixed in Db2 Version 11.1 Mod2 Fix Pack2 iFix001
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 25.09.2017
Date  - problem closed      : 09.10.2017
Date  - last modified       : 11.10.2017