Informix - Problem description

Problem IT27709 Status: Closed

PRIMARY AND SECONDARY UNABLE TO RECONNECT AFTER NETWORK FAILURE

Product:
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10
Problem description:
In some cases a network interruption can leave the primary and
the HDR secondary unable to reconnect until the HDR secondary is
restarted.  This may be encountered only on HDR pairs where the
secondary is an UPDATABLE secondary, or where
SMX_PING_INTERVAL/SMX_PING_RETRY are configured differently on
the primary and secondary servers.
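
To check the second condition, the two servers' settings can be
compared.  The following is a minimal sketch, assuming each
onconfig file has been read into a string and uses the usual
"NAME value" layout with "#" comments.  The function names and
the sample values are hypothetical and only illustrate the idea.

```python
PING_PARAMS = ("SMX_PING_INTERVAL", "SMX_PING_RETRY")


def read_ping_settings(onconfig_text):
    """Return the SMX ping parameters found in onconfig-style text."""
    settings = {}
    for line in onconfig_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] in PING_PARAMS:
            settings[parts[0]] = parts[1]
    return settings


def ping_mismatches(primary_text, secondary_text):
    """Return {name: (primary_value, secondary_value)} for differing values."""
    primary = read_ping_settings(primary_text)
    secondary = read_ping_settings(secondary_text)
    return {
        name: (primary.get(name), secondary.get(name))
        for name in PING_PARAMS
        if primary.get(name) != secondary.get(name)
    }


# Hypothetical values, for illustration only.
primary_cfg = """
SMX_PING_INTERVAL 120   # seconds
SMX_PING_RETRY 3
"""
secondary_cfg = """
SMX_PING_INTERVAL 360
SMX_PING_RETRY 3
"""

print(ping_mismatches(primary_cfg, secondary_cfg))
```

An empty result means both parameters match.  A parameter that is
missing from one file is reported with None on that side.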

In this specific case, the issue appears to be that HDR cannot
shut itself down properly after detecting the network problems.
If it cannot shut down properly, it never reaches the code that
attempts to reconnect.

The symptoms of this problem can be identified by checking the
state and stack of both the dr_prsend thread and the dr_prping
thread.

At the point where the teardown appears to be stuck,
onstat -g ath would show the two threads in the following states:

Threads:
 tid      tcb        rstcb      prty status              vp-class  name
 159      112258d48  10feee060  3    join wait 32846355  14cpu     dr_prsend
 ...
 32846355 1d22fdc58  2c9555520  3    yield time          1cpu      dr_prping
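
The thread-state check can be scripted.  This is a rough sketch,
assuming that onstat -g ath prints each thread on a single line
with the columns shown above and that thread names contain no
spaces.  The helper names are hypothetical.

```python
def parse_threads(ath_output):
    """Map thread name -> tid and status from `onstat -g ath`-style rows."""
    threads = {}
    for line in ath_output.splitlines():
        fields = line.split()
        # Data rows: tid, tcb, rstcb, prty, status words..., vp-class, name
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # header, "..." or blank line
        threads[fields[-1]] = {
            "tid": fields[0],
            "status": " ".join(fields[4:-2]),
        }
    return threads


def looks_stuck(ath_output):
    """True if dr_prsend is joining on dr_prping while dr_prping yields."""
    threads = parse_threads(ath_output)
    send = threads.get("dr_prsend")
    ping = threads.get("dr_prping")
    if send is None or ping is None:
        return False
    waiting_on_ping = send["status"] == "join wait " + ping["tid"]
    return waiting_on_ping and ping["status"].startswith("yield time")


sample = """\
 tid      tcb        rstcb      prty status              vp-class  name
 159      112258d48  10feee060  3    join wait 32846355  14cpu     dr_prsend
 32846355 1d22fdc58  2c9555520  3    yield time          1cpu      dr_prping
"""

print(looks_stuck(sample))
```

The check also confirms that the thread dr_prsend is waiting on
(the id after "join wait") is the dr_prping thread itself.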

The stacks would look like this:

Stack for thread: 159 dr_prsend
...
0x000000001118a62c (oninit)mt_join
0x0000000010ea5030 (oninit)dr_session_thread
0x00000000111ca69c (oninit)startup

Stack for thread: 32846355 dr_prping
...
0x00000000111831a0 (oninit)mt_yield
0x00000000112ed520 (oninit)smx_recv
0x0000000010e9b7ec (oninit)dr_isSecondaryInCheckpoint
0x0000000010e86e90 (oninit)dr_primary_ping
0x00000000111ca69c (oninit)startup
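
The stack signature can be matched in the same way.  A minimal
sketch, assuming the stack dump has been captured as text in the
layout shown above ("Stack for thread: <tid> <name>" followed by
one frame per line).  The helper names are hypothetical.

```python
def parse_stacks(stack_output):
    """Map thread name -> list of function names, innermost frame first."""
    stacks = {}
    current = None
    for line in stack_output.splitlines():
        line = line.strip()
        if line.startswith("Stack for thread:"):
            current = line.split()[-1]  # last word is the thread name
            stacks[current] = []
        elif current and line.startswith("0x") and ")" in line:
            # Frame form: "<address> (oninit)<function>"; drop any arguments
            function = line.split(")", 1)[1].split("(", 1)[0].strip()
            stacks[current].append(function)
    return stacks


def matches_signature(stack_output):
    """True if both threads show the frames listed in this report."""
    stacks = parse_stacks(stack_output)
    send = stacks.get("dr_prsend", [])
    ping = stacks.get("dr_prping", [])
    return (
        "mt_join" in send
        and "dr_session_thread" in send
        and "smx_recv" in ping
        and "dr_isSecondaryInCheckpoint" in ping
    )


sample_stacks = """\
Stack for thread: 159 dr_prsend
...
0x000000001118a62c (oninit)mt_join
0x0000000010ea5030 (oninit)dr_session_thread
0x00000000111ca69c (oninit)startup

Stack for thread: 32846355 dr_prping
...
0x00000000111831a0 (oninit)mt_yield
0x00000000112ed520 (oninit)smx_recv
0x0000000010e9b7ec (oninit)dr_isSecondaryInCheckpoint
0x0000000010e86e90 (oninit)dr_primary_ping
0x00000000111ca69c (oninit)startup
"""

print(matches_signature(sample_stacks))
```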

Another key element is the following sequence of events, based
on messages in the MSGPATH file.  On the PRIMARY server, SMX
messages report connections being closed because the other
server was unresponsive, followed by a message that SMX created
a new transport to the HDR secondary.  The HDR secondary then
reports that its SMX connections were closed because the other
server was unresponsive.  It is important that this message
occurs after the primary has reported its SMX connections being
closed and the new transport being created.  Sample message
sequences:

PRIMARY MSGPATH file:

23:40:37  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (120 seconds times the
 number of retries).
23:40:46  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (120 seconds times the
 number of retries).
23:40:56  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (120 seconds times the
 number of retries).
23:41:00  smx creates 1 transports to server allende3
23:42:55  WARNING: Detected slow or failing DNS service response 101 time(s).
23:54:30  DR: Receive error
23:54:30  dr_prsend thread : asfcode = -25582: oserr = 0: errstr = : Network connection is broken.

23:54:30  DR_ERR set to -1

SECONDARY MSGPATH file:

23:43:22  DR: ping timeout
23:43:22  DR: Receive error
23:43:22  dr_secrcv thread : asfcode = -25582: oserr = 0: errstr = : Network connection is broken.

23:43:22  DR_ERR set to -1
23:43:23  DR:  Terminating redirected write subsystem due to server disconnect.
          All open redirected transactions will be rolled back.
23:43:24  Updates from secondary currently not allowed
23:43:24  ERROR: Mach11 proxyWritePostPBlobCmdSync failed
23:43:24  DR: Turned off on secondary server
23:45:16  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (360 seconds times the
 number of retries).
23:45:18  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (360 seconds times the
 number of retries).
23:45:25  The SMX connection between high availability servers was closed because the
 peer server was unresponsive for the timeout period (360 seconds times the
 number of retries).

The relative timing of these messages is therefore important.
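
The ordering described above can be checked mechanically.  This
is a sketch only, assuming both MSGPATH files are available as
text, both servers' clocks are synchronized, and all messages
fall on the same day (the timestamps carry no date, so a midnight
rollover is not handled).  The function names are hypothetical.

```python
from datetime import datetime

CLOSED = "SMX connection between high availability servers"
NEW_TRANSPORT = "smx creates"


def parse_events(log_text):
    """Return (time, message) pairs for lines starting with HH:MM:SS."""
    events = []
    for line in log_text.splitlines():
        stamp, _, message = line.strip().partition(" ")
        try:
            when = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue  # continuation line or blank line
        events.append((when, message.strip()))
    return events


def matches_timing(primary_log, secondary_log):
    """True if the secondary closes SMX after the primary's new transport."""
    primary = parse_events(primary_log)
    secondary = parse_events(secondary_log)
    closed_on_primary = [t for t, m in primary if CLOSED in m]
    closed_on_secondary = [t for t, m in secondary if CLOSED in m]
    if not closed_on_primary or not closed_on_secondary:
        return False
    first_close = min(closed_on_primary)
    # New transports created after the primary first closed a connection
    transports = [t for t, m in primary
                  if m.startswith(NEW_TRANSPORT) and t >= first_close]
    if not transports:
        return False
    return any(t > min(transports) for t in closed_on_secondary)


primary_log = """\
23:40:37  The SMX connection between high availability servers was closed
23:41:00  smx creates 1 transports to server allende3
"""
secondary_log = """\
23:43:22  DR: ping timeout
23:45:16  The SMX connection between high availability servers was closed
"""

print(matches_timing(primary_log, secondary_log))
```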
Problem summary:
****************************************************************
* USERS AFFECTED:                                              *
* Users of IDS prior to 12.10.xC13.                            *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* Primary and Secondary unable to reconnect after network      *
* failure.                                                     *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
Local fix:
Solution
Workaround
none known / see Local fix
Further data
Date - problem reported : 09.01.2019
Date - problem closed   : 24.09.2019
Date - last changed     : 24.09.2019
Problem fixed as of the following versions (IBM BugInfos)
12.10.xC13
Problem fixed according to the FixList in version