Informix - Problem description

Problem IT27709 Status: Closed

PRIMARY AND SECONDARY UNABLE TO RECONNECT AFTER NETWORK FAILURE

product:
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10
Problem description:
In some cases a network interruption can leave the primary and
HDR secondary unable to reconnect until the HDR secondary is
bounced.  This may be encountered only on HDR pairs where the
secondary is an UPDATABLE secondary, or where SMX_PING_INTERVAL
or SMX_PING_RETRY is configured differently on the primary and
secondary servers.
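
One of the conditions above is a difference in SMX_PING_INTERVAL or SMX_PING_RETRY between the two servers. The following is a minimal sketch of how such a difference could be spotted, assuming the onconfig contents of both servers are available as text in the usual "PARAMETER value" layout with "#" comments. The helper names and the sample values are hypothetical and are not taken from this problem report.

```python
def read_onconfig(text):
    """Return {PARAMETER: value} from onconfig-style text.

    Assumes one 'PARAMETER value' pair per line and '#' starting a comment.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def smx_ping_mismatches(primary_text, secondary_text,
                        names=("SMX_PING_INTERVAL", "SMX_PING_RETRY")):
    """Return {name: (primary_value, secondary_value)} for differing settings."""
    primary = read_onconfig(primary_text)
    secondary = read_onconfig(secondary_text)
    return {n: (primary.get(n), secondary.get(n))
            for n in names
            if primary.get(n) != secondary.get(n)}


# Illustrative values only (hypothetical, not from the problem report).
primary_cfg = "SMX_PING_INTERVAL 20\nSMX_PING_RETRY 6\n"
secondary_cfg = "SMX_PING_INTERVAL 60  # differs from primary\nSMX_PING_RETRY 6\n"

print(smx_ping_mismatches(primary_cfg, secondary_cfg))
```

An empty result means the two settings match on both servers. A non-empty result only indicates one of the conditions named above and does not by itself confirm this defect.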

In this specific case, the issue appears to be that HDR cannot
shut itself down properly after detecting the network problem.
Because the shutdown never completes, HDR never reaches the code
that attempts to reconnect.

The symptoms of this problem can be identified by checking the
state and stack of both the dr_prsend thread and the dr_prping
thread.

At the point where the teardown appears to be stuck, onstat -g
ath would show the two threads in the following states:

Threads:
 tid      tcb              rstcb            prty status                vp-class       name
 159      112258d48        10feee060        3    join wait 32846355    14cpu          dr_prsend
 ...
 32846355 1d22fdc58        2c9555520        3    yield time            1cpu           dr_prping
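
The thread states described above can be checked mechanically. This is a minimal sketch, assuming the onstat -g ath output has been captured as text with one thread per line and the thread name as the last field. The function name is hypothetical; the thread names and status strings are the ones shown in this report.

```python
def hdr_teardown_looks_stuck(ath_output):
    """Return True if dr_prsend is in 'join wait' and dr_prping in 'yield time'.

    Assumes one thread per line with the thread name as the last field.
    """
    rows = {}
    for line in ath_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in ("dr_prsend", "dr_prping"):
            # Collapse runs of spaces so multi-word statuses match reliably.
            rows[fields[-1]] = " ".join(fields)
    return ("join wait" in rows.get("dr_prsend", "")
            and "yield time" in rows.get("dr_prping", ""))


# Sample rows from this report, joined to one line per thread.
sample = """\
 159      112258d48        10feee060        3    join wait 32846355    14cpu   dr_prsend
 32846355 1d22fdc58        2c9555520        3    yield time            1cpu    dr_prping
"""

print(hdr_teardown_looks_stuck(sample))
```

A positive result matches only the thread states; the stacks and the message sequence described in this report should be checked as well before concluding that this defect is involved.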

The stacks would look like this:

Stack for thread: 159 dr_prsend
...
0x000000001118a62c (oninit)mt_join
0x0000000010ea5030 (oninit)dr_session_thread
0x00000000111ca69c (oninit)startup

Stack for thread: 32846355 dr_prping
...
0x00000000111831a0 (oninit)mt_yield
0x00000000112ed520 (oninit)smx_recv
0x0000000010e9b7ec (oninit)dr_isSecondaryInCheckpoint
0x0000000010e86e90 (oninit)dr_primary_ping
0x00000000111ca69c (oninit)startup

Another key element is the following sequence of messages in
the MSGPATH files.  On the PRIMARY server, SMX first reports
connections being closed because the other server was
unresponsive, and then reports that it has created a new
transport to the HDR secondary.  The HDR secondary then reports
that its own SMX connections were closed because the other
server was unresponsive.  It is important that the secondary's
message occurs after the primary has reported its SMX
connections closed and the new transport created.  Here are
sample message sequences:

PRIMARY MSGPATH file:

23:40:37  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (120 seconds times the
 number of retries).
23:40:46  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (120 seconds times the
 number of retries).
23:40:56  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (120 seconds times the
 number of retries).
23:41:00  smx creates 1 transports to server allende3
23:42:55  WARNING: Detected slow or failing DNS service response 101 time(s).
23:54:30  DR: Receive error
23:54:30  dr_prsend thread : asfcode = -25582: oserr = 0: errstr = : Network connection is broken.

23:54:30  DR_ERR set to -1

SECONDARY MSGPATH file:

23:43:22  DR: ping timeout
23:43:22  DR: Receive error
23:43:22  dr_secrcv thread : asfcode = -25582: oserr = 0: errstr = : Network connection is broken.

23:43:22  DR_ERR set to -1
23:43:23  DR:  Terminating redirected write subsystem due to server disconnect.
          All open redirected transactions will be rolled back.
23:43:24  Updates from secondary currently not allowed
23:43:24  ERROR: Mach11 proxyWritePostPBlobCmdSync failed
23:43:24  DR: Turned off on secondary server
23:45:16  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (360 seconds times the
 number of retries).
23:45:18  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (360 seconds times the
 number of retries).
23:45:25  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (360 seconds times the
 number of retries).

The relative timing of the reported messages is therefore important.
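
The ordering described above can be checked against the two MSGPATH files. This is a rough sketch, assuming both logs cover the same day, that the server clocks are reasonably synchronized, and that each message starts with an HH:MM:SS stamp as in the samples. The function names are hypothetical; the message fragments are copied from the samples in this report.

```python
import re

STAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(.*)$")

SMX_CLOSED = "The SMX connection between high availability servers"
SMX_CREATE = "smx creates"


def event_times(log_text, needle):
    """Return seconds since midnight for each stamped line containing needle."""
    times = []
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if m and needle in m.group(4):
            h, mi, s = (int(x) for x in m.group(1, 2, 3))
            times.append(h * 3600 + mi * 60 + s)
    return times


def matches_reported_sequence(primary_log, secondary_log):
    """True if the primary closed SMX, then created a new transport, and the
    secondary reported its SMX connection closed only after that."""
    closed_primary = event_times(primary_log, SMX_CLOSED)
    if not closed_primary:
        return False
    created = [t for t in event_times(primary_log, SMX_CREATE)
               if t > min(closed_primary)]
    if not created:
        return False
    return any(t > min(created)
               for t in event_times(secondary_log, SMX_CLOSED))


primary = (
    "23:40:37  The SMX connection between high availability servers\n"
    "was closed because the\n"
    "23:41:00  smx creates 1 transports to server allende3\n"
)
secondary = (
    "23:45:16  The SMX connection between high availability servers\n"
    "was closed because the\n"
)

print(matches_reported_sequence(primary, secondary))
```

The sketch matches only the first line of each message, so it works whether or not the wrapped continuation lines are present. It does not handle logs that span midnight.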
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* Users of IDS prior to 12.10.xC13.                            *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* Primary and Secondary unable to reconnect after network      *
* failure.                                                     *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
Local Fix:
Solution
Workaround
not known / see Local fix
Timestamps
Date  - problem reported    : 09.01.2019
Date  - problem closed      : 24.09.2019
Date  - last modified       : 24.09.2019
Problem solved at the following versions (IBM BugInfos)
12.10.xC13
Problem solved according to the fixlist(s) of the following version(s)