DB2 - Problem description
Problem IT29277 | Status: Closed
IN PURESCALE, WHILE USING I/O DRAWER, DB2 MEMBER MAY GO DOWN WHEN THERE IS A PROBLEM WITH A ROCE PORT
Product:
DB2 FOR LUW / DB2FORLUW / B10 - DB2
Problem description:
When a RoCE port that is configured for high availability (HA) encounters a problem, one of the members may go down. In this case, the db2diag.log shows entries such as the following:

2018-09-07-05.27.43.138008+540 I2379A709 LEVEL: Severe
PID : 15597810 TID : 139862 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
APPHDL : 0-10933 APPID: *N0.db2inst1.180905095247
AUTHID : DB2IRS HOSTNAME: host21
EDUID : 139862 EDUNAME: db2agent (DUIT) 0
FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
DATA #1 : xport_send: dat_ep_post_rdma_write of the MCB failed: 0x80040000. EP: 0x1111177d0
DATA #1 : If a CF return code is displayed above and you wish to get more information then please run the following command:
...

2018-09-07-05.27.43.152685+540 I8731A746 LEVEL: Error
PID : 15597810 TID : 102875 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
HOSTNAME: host21
EDUID : 102875 EDUNAME: db2XInot GBP 2-0 (DUIT) 0
FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
DATA #1 : link_status_write: do_dequeue for link status Buffer FAILED dest Address: 0x111b86f68 RKEY = 0x4ee00 len = 4, src Address: 0x 121146ac LKEY = 0x36700 len = 4 status = 0x80090020, ep = 0x12114c50
DATA #1 : If a CF return code is displayed above and you wish to get more information then please run the following command: db2diag -cfrc
...

2018-09-07-05.27.43.154096+540 I10195A6128 LEVEL: Event
PID : 15597810 TID : 102875 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
HOSTNAME: host21
EDUID : 102875 EDUNAME: db2XInot GBP 2-0 (DUIT) 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SAL_GBP_HANDLE::SAL_CheckXiLink, probe:204
MESSAGE : CA RC= 2148073504
DATA #1 : String, 59 bytes
Detected broken XI connection, attempt reset operation now.
DATA #2 : Codepath, 8 bytes
7:15
DATA #3 : unsigned integer, 8 bytes
1
DATA #4 : SAL CF Index, PD_TYPE_SAL_CF_INDEX, 8 bytes
2
DATA #5 : SAL CF Node Number, PD_TYPE_SAL_CF_NODE_NUM, 2 bytes
129
DATA #6 : String, 49 bytes
current xi cf-server/member-devname/adapter-index
DATA #7 : SAL CF Server Name, PD_TYPE_SAL_CF_SERVER_NAME, 13 bytes
host22-en1
DATA #8 : SAL Member Device Name, PD_TYPE_SAL_MEMBER_DEVICE_NAME, 4 bytes
hca0
DATA #9 : Connection pool link adapter number, PD_TYPE_SAL_ADAPTER_NUMBER, 8 bytes
0
...

2018-09-07-05.27.43.156303+540 I17603A738 LEVEL: Error
PID : 15597810 TID : 101309 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
HOSTNAME: host21
EDUID : 101309 EDUNAME: db2LLMn2 (DUIT) 0
FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
DATA #1 : link_status_write: do_dequeue for link status Buffer FAILED dest Address: 0x111b882e8 RKEY = 0x10500 len = 4, src Address: 0x 185ac29c LKEY = 0x16800 len = 4 status = 0x80090020, ep = 0x185bd5d0
DATA #1 : If a CF return code is displayed above and you wish to get more information then please run the following command: db2diag -cfrc
...

2018-09-07-05.27.43.161216+540 I21396A630 LEVEL: Error
PID : 15597810 TID : 101309 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
HOSTNAME: host21
EDUID : 101309 EDUNAME: db2LLMn2 (DUIT) 0
FUNCTION: DB2 UDB, RAS/PD component, pdLogCaPrintf, probe:876
DATA #1 : notify_disconnect(close): dat_ep_disconnect failed: 0x80030000, EP: 0x1185bd5d0 Token: 0x1a000
DATA #1 : If a CF return code is displayed above and you wish to get more information then please run the following command: db2diag -cfrc
...

2018-09-07-05.27.43.167388+540 I30439A4907 LEVEL: Event
PID : 15597810 TID : 102106 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
HOSTNAME: host21
EDUID : 102106 EDUNAME: db2XInot SCA 2-0 (DUIT) 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SAL_GBP_HANDLE::SAL_CheckXiLink, probe:204
MESSAGE : CA RC= 2148073504
DATA #1 : String, 59 bytes
Detected broken XI connection, attempt reset operation now.
...

2018-09-07-05.27.43.185042+540 E53804A4857 LEVEL: Error
PID : 15597810 TID : 139862 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : DUIT
APPHDL : 0-10933 APPID: *N0.db2inst1.180905095247
AUTHID : DB2IRS HOSTNAME: host21
EDUID : 139862 EDUNAME: db2agent (DUIT) 0
FUNCTION: DB2 UDB, Shared Data Structure Abstraction Layer for CF, SAL_MANAGEMENT_PORT_HANDLE::SAL_ManagementQueryKillConnection, probe:12678
MESSAGE : ECF=0x94C6004D=-1798963123
DATA #1 : CF RC, PD_TYPE_SD_CF_RC, 4 bytes
2147876941

The stack file shows the following call stack:

-------Frame------ ------Function + Offset------
0x090000000057FF14 pthread_kill + 0xD4
0x090000000057F764 _p_raise + 0x44
0x0900000000039E68 raise + 0x48
0x0900000000056864 abort + 0xC4
0x0900000004A59CF8 sqloExitEDU + 0x298
0x0900000004ABE0DC sqle_panic__Fi + 0x71C
0x090000000534DC54 SAL_ResetXiConnection__14SAL_GBP_HANDLEFR17SAL_XI_RECONN_EDU + 0x3D54
0x090000000B4C985C SAL_CheckXiLink__14SAL_GBP_HANDLEFR17SAL_XI_RECONN_EDU + 0xC9C
0x090000000B4C9CF4 RunEDU__17SAL_XI_RECONN_EDUFv + 0x34
0x0900000004B5EFA0 EDUDriver__9sqzEDUObjFv + 0x2E0
0x0900000004A53694 sqloEDUEntry + 0x374
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* ALL                                                          *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Db2 11.1 Mod 4 Fixpack 5 or higher                *
****************************************************************
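The recommendation is expressed as a Db2 level, so a small check can compare an installed level against 11.1.4.5 (Version 11.1, Mod 4, Fixpack 5). This Python sketch uses hypothetical helper names and assumes the level text contains a token such as "DB2 v11.1.4.5", the form usually shown among the informational tokens printed by db2level; verify that against your own output. Levels outside the 11.1 stream return None, because this summary only names the 11.1 fix pack, so other releases should be checked against their own fix lists.

```python
import re

# Minimum fixed level named in the RECOMMENDATION: Db2 11.1 Mod 4 Fixpack 5.
FIXED_LEVEL = (11, 1, 4, 5)

# Assumption: the level appears as "v<version>.<release>.<mod>.<fixpack>".
LEVEL_TOKEN = re.compile(r"[vV](\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_level(text):
    """Extract (version, release, mod, fixpack) from text like 'DB2 v11.1.4.5'."""
    match = LEVEL_TOKEN.search(text)
    if match is None:
        raise ValueError(f"no Db2 level found in {text!r}")
    return tuple(int(part) for part in match.groups())


def includes_fix(text):
    """True/False for an 11.1 level; None for other releases (check their fix lists)."""
    level = parse_level(text)
    if level[:2] != FIXED_LEVEL[:2]:
        return None
    # Tuples compare element by element, e.g. (11, 1, 4, 4) < (11, 1, 4, 5).
    return level >= FIXED_LEVEL
```

Tuple comparison keeps the check simple and avoids the trap of comparing version strings lexically, where "11.1.4.10" would sort before "11.1.4.5".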
Local Fix: | |
Solution | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported: 28.05.2019
Date - problem closed: 16.01.2020
Date - last modified: 16.01.2020
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)