Informix - Problem description
Problem IT45632 | Status: Closed |
NEW ER NIF CONNECTION OVER SMX MIGHT HANG IN 'CONNECTING' STATE WITH UNRELIABLE NETWORK | |
product: | |
INFORMIX SERVER / 5725A3900 / E10 - | |
Problem description: | |
An ER connection to a remote server might remain in state 'Connecting' indefinitely in "cdr list server" output, and the connection is never fully established.

The cause is delayed detection of a no longer viable SMX pipe underneath the ER NIF connection that is being established. The non-viability of an SMX pipe (due to packet loss or other network errors) might be detected at quite different points in time on the two connected ER nodes. The first node to detect it tears down the ER NIF connection (over SMX) and might soon afterwards initiate a new one, either over a remaining viable SMX pipe or over a new one.

The initial exchange determines whether SMX can and should be used between the two servers: a specific SMX message is sent to the other side, which replies OK or not OK (to use SMX). With some bad luck, the bad SMX pipe that triggered all this, and that has already been dismantled on this side, might still look good on the other side. The other side might then place its reply on this pipe, and only at that point detect that the pipe is not viable. The net result is that the reply never arrives.

The typical NIF send thread stack for this situation, on the initiating side, is:

Stack for thread: 1257 CDRNsT3
 base: 0x0000000059ac8000
 len: 69632
 pc: 0x000000000151167b
 tos: 0x0000000059ad8630
 state: sleeping
 vp: 9

0x000000000151167b (oninit) yield_processor_mvp
0x0000000001642814 (oninit) smx_recv
0x0000000001237213 (oninit) isPeerSupportERSmxCon
0x000000000122d601 (oninit) nifiGenericStart
0x00000000014dd5d3 (oninit) th_init_initgls
0x00000000015264cf (oninit) startup

In onstat -g nif all, the connection is shown with either no state or, if a disconnect had been attempted, in state "INTR,SHUT". A telling detail is the Connection Start time, which is typically the same as, or very shortly before, the time of a delayed "SMX thread is exiting" message. The corresponding messages on the two servers, possibly for multiple terminating SMX pipes, appear a few seconds apart between the sites.
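The race described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Informix code: the Pipe class, responder_reply, handshake, and the timeout and retry values are all hypothetical names and choices made for illustration. The bounded wait and re-probe only show how the lost reply could be survived in principle; they are not a description of the actual fix.

```python
import queue


class Pipe:
    """Toy stand-in for one SMX pipe (hypothetical, not Informix code)."""

    def __init__(self, name, broken, peer_inbox):
        self.name = name
        self.broken = broken          # real network state, unknown to the sender
        self.peer_inbox = peer_inbox  # where delivered messages end up

    def send(self, msg):
        # A broken pipe only reveals itself when someone tries to use it.
        if self.broken:
            return False              # message is lost
        self.peer_inbox.put(msg)
        return True


def responder_reply(believed_good):
    """Reply to the probe on the first pipe the responder still trusts."""
    pipe = believed_good[0]
    delivered = pipe.send("OK")
    if not delivered:
        believed_good.remove(pipe)    # non-viability is detected only now
    return delivered


def handshake(recv_timeout, max_attempts=3):
    """Initiator side: probe, then wait for the OK / not OK feedback."""
    inbox = queue.Queue()
    old_pipe = Pipe("old", broken=True, peer_inbox=inbox)
    new_pipe = Pipe("new", broken=False, peer_inbox=inbox)
    # The initiator already dropped old_pipe; the responder has not noticed.
    responder_view = [old_pipe, new_pipe]

    for attempt in range(1, max_attempts + 1):
        responder_reply(responder_view)   # probe assumed to arrive intact
        try:
            # recv_timeout=None would block forever on the lost reply,
            # which corresponds to the thread sleeping in smx_recv.
            return inbox.get(timeout=recv_timeout), attempt
        except queue.Empty:
            continue                      # give up on this wait and re-probe
    return None, max_attempts


print(handshake(recv_timeout=0.1))  # prints ('OK', 2)
```

In this model the first reply is placed on the stale pipe and lost, so the first wait times out; the second probe is answered over the viable pipe. With an unbounded wait the initiator would stay in the first receive call forever, which matches the 'Connecting' symptom.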
Problem Summary: | |
****************************************************************
* USERS AFFECTED:
****************************************************************
* PROBLEM DESCRIPTION:
****************************************************************
* RECOMMENDATION:
****************************************************************
The fix also includes changes to the error message in order to make it better for diagnostics.
Local Fix: | |
Solution | |
Workaround | |
Comment | |
Fixed in Informix Server 14.10.xC11. | |
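To check whether a given server is at or above this fix level, a version string can be compared against 14.10.xC11. The following is a minimal sketch, assuming the version string has the layout major.minor.<platform letter>C<level> (for example 14.10.FC11, optionally followed by a suffix); the function name has_fix is hypothetical. Because this record names only 14.10.xC11, the sketch declines to answer for other version families.

```python
import re

# Taken from "Fixed in Informix Server 14.10.xC11".
FIX_FAMILY = (14, 10)
FIX_LEVEL = 11


def has_fix(version):
    """Return True/False for 14.10 versions, None when it cannot be decided.

    Assumes a layout such as '14.10.FC11'; the 'x' in 'xC11' is treated as
    a placeholder for a single platform letter.
    """
    match = re.search(r"(\d+)\.(\d+)\.[A-Za-z]C(\d+)", version)
    if match is None:
        return None                     # unrecognized version string
    major, minor, level = (int(g) for g in match.groups())
    if (major, minor) != FIX_FAMILY:
        return None                     # the record only names 14.10.xC11
    return level >= FIX_LEVEL


print(has_fix("14.10.FC10"))  # prints False
print(has_fix("14.10.FC11"))  # prints True
```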
Timestamps | |
Date - problem reported : 04.03.2024
Date - problem closed : 11.06.2024
Date - last modified : 24.09.2024
Problem solved at the following versions (IBM BugInfos) | |
Problem solved according to the fixlist(s) of the following version(s) |