DB2 - Problem description
Problem IT22532 | Status: Closed
FAILURE ON MEMBER-CF COMMUNICATION ONCE ONE OF REDUNDANT SWITCHES IS FAILED.
Product:
DB2 FOR LUW / DB2FORLUW / B10 - DB2
Problem description:
The issue starts with the first switch failure. Based on the system and RSCT logs, Db2 detects that the adapter goes down and comes back up when the first switch fails, as expected. However, when RSCT detects that the adapter is back up and issues a callback to Db2, the following entries are logged in db2diag.log:

2017-09-05-09.27.46.880674+120 I1572284E687 LEVEL: Event
PID : 15687 TID : 139701730141952 PROC : db2sysc 3
INSTANCE: db2inst1 NODE : 003
HOSTNAME: node03
EDUID : 24 EDUNAME: db2clstrRscMon 3
FUNCTION: DB2 UDB, high avail services, rocmHCAMonitorCallback, probe:1727
MESSAGE : HCA callback data: Member, adapter, online, numOnline, attrCount, attr[0] value
DATA #1 : Database Partition Number, PD_TYPE_NODE, 2 bytes
3
DATA #2 : String, 9 bytes
eth1-mlx0
DATA #3 : Boolean, 1 bytes
false
DATA #4 : signed integer, 8 bytes
1
DATA #5 : signed integer, 4 bytes
1
DATA #6 : signed integer, 4 bytes
1

2017-09-05-09.27.46.885922+120 I1572972E1910 LEVEL: Severe
PID : 15687 TID : 139701730141952 PROC : db2sysc 3
INSTANCE: db2inst1 NODE : 003
HOSTNAME: node03
EDUID : 24 EDUNAME: db2clstrRscMon 3
FUNCTION: DB2 UDB, oper system services, sqloAtForkPrepareHandler, probe:100
DATA #1 : Codepath, 8 bytes
3:19
MESSAGE : Cannot invoke fork() within the engine, this thread will be suspended now for further investigation.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00007F0EECEAF96D sqloAtForkPrepareHandler + 0x51D
[1] 0x00007F0EE4B5BF82 __libc_fork + 0x52
[2] 0x00007F0EE4B0AF9C _IO_proc_open + 0xBC
[3] 0x00007F0EE4B0B22C popen + 0x5C
[4] 0x00007F0EECDE7A1C _Z39sqloConfigureRoutesForMultipleRoCELinuxv + 0x54C
[5] 0x00007F0EB97BCBFB rocmHCAMonitorCallback + 0x8AB
[6] 0x00007F0EB355989B /lib64/libct_mc.so + 0x2D89B
[7] 0x00007F0EB354BFB7 /lib64/libct_mc.so + 0x1FFB7
[8] 0x00007F0EB354B885 /lib64/libct_mc.so + 0x1F885
[9] 0x00007F0EB354B271 /lib64/libct_mc.so + 0x1F271
[10] 0x00007F0EB354B0C7 /lib64/libct_mc.so + 0x1F0C7
[11] 0x00007F0EB354AC33 /lib64/libct_mc.so + 0x1EC33
[12] 0x00007F0EB354A67F /lib64/libct_mc.so + 0x1E67F
[13] 0x00007F0EB353EDF9 /lib64/libct_mc.so + 0x12DF9
[14] 0x00007F0EB353E54C /lib64/libct_mc.so + 0x1254C
[15] 0x00007F0EB353DDAA mc_dispatch_1 + 0x2E6
[16] 0x00007F0EB97C0F79 _Z51rocmMemberHCAMonitorStartSessionRegisterAndDispatchP16ROCM_HCA_MONITOR + 0x369
[17] 0x00007F0EB97C09D7 rocmMemberHCAMonitor + 0x37
[18] 0x00000000004211E7 _ZN26sqeMemberAdapterMonitorEdu6RunEDUEv + 0x107
[19] 0x00007F0EEE8FDC96 _ZN9sqzEDUObj9EDUDriverEv + 0x116
[20] 0x00007F0EECEB6358 sqloEDUEntry + 0x578
[21] 0x00007F0EF471DDC5 /lib64/libpthread.so.0 + 0x7DC5
[22] 0x00007F0EE4B94CED clone + 0x6D

The call stack shows that the fork() is triggered by a popen() call in sqloConfigureRoutesForMultipleRoCELinux, which is invoked from rocmHCAMonitorCallback. This fork() error suspends the thread, so Db2 cannot proceed with marking the adapter links up and never marks these links as Online. When the second switch is lost, the following entry is logged:

2017-09-05-09.43.23.514282+120 I7981597E486 LEVEL: Warning
PID : 15687 TID : 139701671421696 PROC : db2sysc 3
INSTANCE: db2inst1 NODE : 003 DB : SAMPLE
HOSTNAME: node03
EDUID : 586 EDUNAME: db2XInot SCA 1-0 (SAMPLE) 3
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SAL_GBP_HANDLE::SAL_ResetXiConnection, probe:894
DATA #1 : <preformatted>
All links are monitored offline.
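As a rough aid for checking whether a member has hit the same pattern, the sketch below scans db2diag.log text for the two messages quoted in this description: the suspended fork() and the later "All links are monitored offline" warning. The function name find_apar_signatures, the record-start regular expression, and the shortened sample text are illustrative assumptions, not part of any Db2 tooling; the record layout is inferred only from the entries shown in this APAR.

```python
import re

# Assumed record start: a timestamp such as 2017-09-05-09.27.46.885922+120
# at the beginning of a line, as seen in the entries quoted in this APAR.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+", re.MULTILINE
)

# Message fragments copied verbatim from the APAR text.
SIGNATURES = {
    "fork_suspended": "Cannot invoke fork() within the engine",
    "links_offline": "All links are monitored offline",
}


def find_apar_signatures(log_text):
    """Return (timestamp, signature_name) pairs for records that match."""
    starts = [m.start() for m in RECORD_START.finditer(log_text)]
    hits = []
    # Each record runs from its timestamp to the next timestamp (or end of text).
    for begin, end in zip(starts, starts[1:] + [len(log_text)]):
        record = log_text[begin:end]
        timestamp = record.split(None, 1)[0]
        for name, fragment in SIGNATURES.items():
            if fragment in record:
                hits.append((timestamp, name))
    return hits


# Shortened sample built from the entries quoted in this APAR.
sample = """\
2017-09-05-09.27.46.885922+120 I1572972E1910 LEVEL: Severe
FUNCTION: DB2 UDB, oper system services, sqloAtForkPrepareHandler, probe:100
MESSAGE : Cannot invoke fork() within the engine, this thread will be suspended now for further investigation.
2017-09-05-09.43.23.514282+120 I7981597E486 LEVEL: Warning
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SAL_GBP_HANDLE::SAL_ResetXiConnection, probe:894
DATA #1 : <preformatted>
All links are monitored offline.
"""

for timestamp, name in find_apar_signatures(sample):
    print(timestamp, name)
```

A "fork_suspended" hit followed later by a "links_offline" hit on the same member would be consistent with the sequence described here, although matching text alone does not confirm the APAR.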
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* PureScale                                                    *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 Version 11.1 Mod2 Fix Pack2 iFix001           *
****************************************************************
Local Fix:
Available fix packs:
Db2 Version 11.1 Mod2 Fix Pack2 iFix001 for Linux, UNIX, and Windows
Solution
First fixed in Db2 Version 11.1 Mod2 Fix Pack2 iFix001
Workaround
not known / see Local fix
Timestamps
Date - problem reported : 25.09.2017
Date - problem closed : 09.10.2017
Date - last modified : 11.10.2017
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)