suche 36x36
  • Admin-Scout-small-Banner
           

    CURSOR Admin-Scout

    get the ultimate tool for Informix

    pfeil  
Latest versionsfixlist
14.10.xC10 FixList
12.10.xC16.X5 FixList
11.70.xC9.XB FixList
11.50.xC9.X2 FixList
11.10.xC3.W5 FixList
Have problems? - contact us.
Register for free anmeldung-x26
Contact form kontakt-x26

Informix - Problem description

Problem IT30240 Status: Closed

LOGICAL RECOVERY ERROR TRYING TO RESTART SERVER IF RTREE INDEX SENT TO
SECONDARY BUT NO CHECKPOINT HAPPENS BEFORE FAILURE

product:
INFORMIX SERVER / 5725A3900 / B70 - IDS 11.70
Problem description:
So to see the problem, you would have to see something like the
following in the secondary servers MSGPATH file:

08:47:57  Checkpoint Completed:  duration was 0 seconds.
08:47:57  Tue Sep 10 - loguniq 23, logpos 0x1086080, timestamp:
0x14f597 Interval: 523

08:47:57  Maximum server connections 0
08:47:57  Checkpoint Statistics - Avg. Txn Block Time 0.000, #
Txns blocked 0, Plog used 21, Llog used 0

08:48:07  DR: Receiving index db1:"jrenaut".t2#t2_geo : Started
08:48:08  DR: Receiving index db1:"jrenaut".t2#t2_geo :
Completed.
08:48:15  The Master Daemon Died

So from this the sequence looks like a checkpoint happens and
then HDR receives an index and then the server aborts.  However,
if you look at onlog output, you can see the CINDEX log record
happens in a transaction prior to the checkpoint @ 8:47:57
logpos 0x1086080.

Here's the onlog:

107f018  56   BEGIN    15       23 0        09/10/2019 08:47:41
1522     jrenaut
...
10841fc  136  CINDEX   15       0  1084034  300002   1
1084284  136  CINDEX   15       0  10841fc  300002   1
...
1085284  56   COMMIT   15       0  10851cc  09/10/2019 08:47:41

...

1086080  68   CKPOINT  1        23 0        1

The above CINDEX is for the db1:"jrenaut".t2#t2_geo index that
says it was successfully shipped to the secondary.  The
transaction that creates the index commits prior to the
checkpoint, but the index isn't shipped until after that
checkpoint.

So now when the server is brought online, we see the following
messages and failure in the MSGPATH file:

08:48:30  IBM Informix Dynamic Server Started.
...
08:48:37  Physical Recovery Started at Page (1:12901).
08:48:37  Physical Recovery Complete: 1 Pages Examined, 1 Pages
Restored.
08:48:37  DR: Trying to connect to primary server = rtpri
08:48:37  Dataskip is now OFF for all dbspaces
08:48:37  Restartable Restore has been ENABLED
08:48:37  Recovery Mode
08:48:38  DR: Secondary server connected
08:48:39  DR: Using default behavior of failure-recovering
Secondary server

08:48:39  DR: Failure recovery from disk in progress ...
08:48:39  Logical Recovery Started.
08:48:39  10 recovery worker threads will be started.
08:48:39  Start Logical Recovery - Start Log 23, End Log ?
08:48:39  Starting Log Position - 23 0x1086080
08:48:43  Started processing open transactions on secondary
during startup
08:48:43  Finished processing open transactions on secondary
during startup.
08:48:43  Rollforward of log record failed. iserrno = 126
08:48:43  Log Record: log = 23, pos = 0x10891cc, type =
OLDRSAM:HUPDATE(73), trans = 15

So when the server tries to recovery from the checkpoint after
the index creation was committed, it hits a log record for that
rtree index that unfortunately now that we are in recovery,
hasn't been reshipped yet and it fails because it seems not all
rtree operations are flagged as index operations that would
normally be allowed to fail without then aborting recovery and
bringing the server down.

Here is a portion of the contents of the af file:

08:48:43  Log record (OLDRSAM:HUPDATE) failed, partnum 0x300002
rowid 0x801 iserrno 126
Log Record: log = 23, pos = 0x10891cc, type =
OLDRSAM:HUPDATE(73), trans = 15
0000000058709060: 64000000 00004900 1e200000 00000000   d.....I.
. ......
0000000058709070: 00000000 00000000 0f000000 3c910801   ........
....
Problem Summary:
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix 11.70.xC8 and earlier versions.            *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* Txns blocked 0, Plog used 21, Llog used 0                    *
*                                                              *
* 08:48:07  DR: Receiving index db1:"jrenaut".t2#t2_geo :      *
* Started                                                      *
* 08:48:08  DR: Receiving index db1:"jrenaut".t2#t2_geo :      *
* Completed.                                                   *
* 08:48:15  The Master Daemon Died                             *
*                                                              *
* So from this the sequence looks like a checkpoint happens    *
* and                                                          *
* then HDR receives an index and then the server aborts.       *
* However,                                                     *
* if you look at onlog output, you can see the CINDEX log      *
* record                                                       *
* happens in a transaction prior to the checkpoint @ 8:47:57   *
* logpos 0x1086080.                                            *
*                                                              *
* Here's the onlog:                                            *
*                                                              *
* 107f018  56   BEGIN    15       23 0        09/10/2019       *
* 08:47:41                                                     *
* 1522     jrenaut                                             *
* ...                                                          *
* 10841fc  136  CINDEX   15       0  1084034  300002   1       *
* 1084284  136  CINDEX   15       0  10841fc  300002   1       *
* ...                                                          *
* 1085284  56   COMMIT   15       0  10851cc  09/10/2019       *
* 08:47:41                                                     *
*                                                              *
* ...                                                          *
*                                                              *
* 1086080  68   CKPOINT  1        23 0        1                *
*                                                              *
* The above CINDEX is for the db1:"jrenaut".t2#t2_geo index    *
* that                                                         *
* says it was successfully shipped to the secondary.  The      *
* transaction that creates the index commits prior to the      *
* checkpoint, but the index isn't shipped until after that     *
* checkpoint.                                                  *
*                                                              *
* So now when the server is brought online, we see the         *
* following                                                    *
* messages and failure in the MSGPATH file:                    *
*                                                              *
* 08:48:30  IBM Informix Dynamic Server Started.               *
* ...                                                          *
* 08:48:37  Physical Recovery Started at Page (1:12901).       *
* 08:48:37  Physical Recovery Complete: 1 Pages Examined, 1    *
* Pages                                                        *
* Restored.                                                    *
* 08:48:37  DR: Trying to connect to primary server = rtpri    *
* 08:48:37  Dataskip is now OFF for all dbspaces               *
* 08:48:37  Restartable Restore has been ENABLED               *
* 08:48:37  Recovery Mode                                      *
* 08:48:38  DR: Secondary server connected                     *
* 08:48:39  DR: Using default behavior of failure-recovering   *
* Secondary server                                             *
*                                                              *
* 08:48:39  DR: Failure recovery from disk in progress ...     *
* 08:48:39  Logical Recovery Started.                          *
* 08:48:39  10 recovery worker threads will be started.        *
* 08:48:39  Start Logical Recovery - Start Log 23, End Log ?   *
* 08:48:39  Starting Log Position - 23 0x1086080               *
* 08:48:43  Started processing open transactions on secondary  *
* during startup                                               *
* 08:48:43  Finished processing open transactions on secondary *
* during startup.                                              *
* 08:48:43  Rollforward of log record failed. iserrno = 126    *
* 08:48:43  Log Record: log = 23, pos = 0x10891cc, type =      *
* OLDRSAM:HUPDATE(73), trans = 15                              *
*                                                              *
* So when the server tries to recovery from the checkpoint     *
* after                                                        *
* the index creation was committed, it hits a log record for   *
* that                                                         *
* rtree index that unfortunately now that we are in recovery,  *
* hasn't been reshipped yet and it fails because it seems not  *
* all                                                          *
* rtree operations are flagged as index operations that would  *
* normally be allowed to fail without then aborting recovery   *
* and                                                          *
* bringing the server down.                                    *
*                                                              *
* Here is a portion of the contents of the af file:            *
*                                                              *
* 08:48:43  Log record (OLDRSAM:HUPDATE) failed, partnum       *
* 0x300002                                                     *
* rowid 0x801 iserrno 126                                      *
* Log Record: log = 23, pos = 0x10891cc, type =                *
* OLDRSAM:HUPDATE(73), trans = 15                              *
* 0000000058709060: 64000000 00004900 1e200000 00000000        *
* d.....I.                                                     *
* . ......                                                     *
* 0000000058709070: 00000000 00000000 0f000000 3c910801        *
* ........                                                     *
* ....
Local Fix:
Solution
Workaround
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix 11.70.xC8 and earlier versions.            *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* Txns blocked 0, Plog used 21, Llog used 0                    *
*                                                              *
* 08:48:07  DR: Receiving index db1:"jrenaut".t2#t2_geo :      *
* Started                                                      *
* 08:48:08  DR: Receiving index db1:"jrenaut".t2#t2_geo :      *
* Completed.                                                   *
* 08:48:15  The Master Daemon Died                             *
*                                                              *
* So from this the sequence looks like a checkpoint happens    *
* and                                                          *
* then HDR receives an index and then the server aborts.       *
* However,                                                     *
* if you look at onlog output, you can see the CINDEX log      *
* record                                                       *
* happens in a transaction prior to the checkpoint @ 8:47:57   *
* logpos 0x1086080.                                            *
*                                                              *
* Here's the onlog:                                            *
*                                                              *
* 107f018  56   BEGIN    15       23 0        09/10/2019       *
* 08:47:41                                                     *
* 1522     jrenaut                                             *
* ...                                                          *
* 10841fc  136  CINDEX   15       0  1084034  300002   1       *
* 1084284  136  CINDEX   15       0  10841fc  300002   1       *
* ...                                                          *
* 1085284  56   COMMIT   15       0  10851cc  09/10/2019       *
* 08:47:41                                                     *
*                                                              *
* ...                                                          *
*                                                              *
* 1086080  68   CKPOINT  1        23 0        1                *
*                                                              *
* The above CINDEX is for the db1:"jrenaut".t2#t2_geo index    *
* that                                                         *
* says it was successfully shipped to the secondary.  The      *
* transaction that creates the index commits prior to the      *
* checkpoint, but the index isn't shipped until after that     *
* checkpoint.                                                  *
*                                                              *
* So now when the server is brought online, we see the         *
* following                                                    *
* messages and failure in the MSGPATH file:                    *
*                                                              *
* 08:48:30  IBM Informix Dynamic Server Started.               *
* ...                                                          *
* 08:48:37  Physical Recovery Started at Page (1:12901).       *
* 08:48:37  Physical Recovery Complete: 1 Pages Examined, 1    *
* Pages                                                        *
* Restored.                                                    *
* 08:48:37  DR: Trying to connect to primary server = rtpri    *
* 08:48:37  Dataskip is now OFF for all dbspaces               *
* 08:48:37  Restartable Restore has been ENABLED               *
* 08:48:37  Recovery Mode                                      *
* 08:48:38  DR: Secondary server connected                     *
* 08:48:39  DR: Using default behavior of failure-recovering   *
* Secondary server                                             *
*                                                              *
* 08:48:39  DR: Failure recovery from disk in progress ...     *
* 08:48:39  Logical Recovery Started.                          *
* 08:48:39  10 recovery worker threads will be started.        *
* 08:48:39  Start Logical Recovery - Start Log 23, End Log ?   *
* 08:48:39  Starting Log Position - 23 0x1086080               *
* 08:48:43  Started processing open transactions on secondary  *
* during startup                                               *
* 08:48:43  Finished processing open transactions on secondary *
* during startup.                                              *
* 08:48:43  Rollforward of log record failed. iserrno = 126    *
* 08:48:43  Log Record: log = 23, pos = 0x10891cc, type =      *
* OLDRSAM:HUPDATE(73), trans = 15                              *
*                                                              *
* So when the server tries to recovery from the checkpoint     *
* after                                                        *
* the index creation was committed, it hits a log record for   *
* that                                                         *
* rtree index that unfortunately now that we are in recovery,  *
* hasn't been reshipped yet and it fails because it seems not  *
* all                                                          *
* rtree operations are flagged as index operations that would  *
* normally be allowed to fail without then aborting recovery   *
* and                                                          *
* bringing the server down.                                    *
*                                                              *
* Here is a portion of the contents of the af file:            *
*                                                              *
* 08:48:43  Log record (OLDRSAM:HUPDATE) failed, partnum       *
* 0x300002                                                     *
* rowid 0x801 iserrno 126                                      *
* Log Record: log = 23, pos = 0x10891cc, type =                *
* OLDRSAM:HUPDATE(73), trans = 15                              *
* 0000000058709060: 64000000 00004900 1e200000 00000000        *
* d.....I.                                                     *
* . ......                                                     *
* 0000000058709070: 00000000 00000000 0f000000 3c910801        *
* ........                                                     *
* ....
Comment
Upgrade to Informix 11.70.xC9W2.
Timestamps
Date  - problem reported    :
Date  - problem closed      :
Date  - last modified       :
10.09.2019
27.08.2020
31.08.2020
Problem solved at the following versions (IBM BugInfos)
Problem solved according to the fixlist(s) of the following version(s)
Informix EditionsInformix Editions
Informix Editions
DocumentationDocumentation
Documentation
IBM NewsletterIBM Newsletter
IBM Newsletter
Current BugsCurrent Bugs
Current Bugs
Bug ResearchBug Research
Bug Research
Bug FixlistsBug Fixlists
Bug Fixlists
Release NotesRelease Notes
Release Notes
Machine NotesMachine Notes
Machine Notes
Release NewsRelease News
Release News
Product LifecycleProduct Lifecycle
Lifecycle
Media DownloadMedia Download
Media Download