DB2 - Problem Description

Problem IC66646 Status: Closed

HADR PRIMARY REINTEGRATION WILL FAIL WITH PRIMARY/STANDBY MISMATCH
AFTER THE PAIR REACHES PEER STATE

Product:
DB2 FOR LUW / DB2FORLUW / 970 - DB2
Problem description:
The problem can be seen after a takeover by force is issued and
either
 a) the old primary is deactivated and brought up as a standby,
or
 b) the old primary is killed and is first brought up as a
primary instead of as a standby (which will fail), and is then
reintegrated as a standby.

Either sequence causes a primary/standby LSN mismatch. The reason
is that when the old primary is deactivated, or is first brought
up as a primary (which eventually fails due to a timeout), the
last/current log file is truncated and the minbufflsn, lowtranlsn
and remote catchup start LSN are moved to the start of the next
file. The same log file that is truncated on the old primary is
NOT truncated on the new primary, which keeps using it to write
more log records. When the old primary is reintegrated as a
standby, and no log writes have been done on the new primary up
to this point, a peer connection is established between the
primary and the standby.
After peer state is established, when the new primary writes
some log records and sends them to the standby, the result is a
primary/standby LSN mismatch on the standby server, which brings
down the standby server. The error message "SQL1768N unable to
start HADR. Reason code='7'" will be given.
 
You may see the following log entries in the db2diag.log file. 
 
2010-02-10-10.36.47.166177-360 E121063953A371     LEVEL: Event
PID     : 172306               TID  : 7969        PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 7969                 EDUNAME: db2hadrs (sample) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE  : HADR state set to S-Peer (was S-NearlyPeer)

2010-02-10-10.36.51.574186-360 I121079812A498     LEVEL: Error
PID     : 172306               TID  : 7969        PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 7969                 EDUNAME: db2hadrs (sample) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrAddDataBlock, probe:40012
MESSAGE : Primary/standby mismatch. RCUStartLSN 0000000224D4000C not on record
          boundary. RCU first page bytecount 4080, firstindex 16, pagelsn
          0002230BCFFB.

2010-02-10-10.36.51.574321-360 I121080311A438     LEVEL: Severe
PID     : 172306               TID  : 7969        PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000
EDUID   : 7969                 EDUNAME: db2hadrs (sample) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrAddDataBlock, probe:40012
RETCODE : ZRC=0x87800145=-2021654203=HDR_ZRC_VALIDATION_REJECT
          "HADR shuts down due to validation rejection"
Local fix:
Back up the new primary database, restore it on the standby
machine, and enable HADR to bring it up as a standby.
If the system is in an HA (TSA) environment, applying the fix for
APAR IC65836 may avoid hitting this APAR.
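
The local fix above can be sketched as an ordered list of DB2 command line processor calls. This is only an illustration under stated assumptions: the helper name reinit_standby_commands, the database name and the backup directory are hypothetical, the online backup is an assumption (the APAR only says "backup"), and options such as TAKEN AT or REPLACE EXISTING are left out. The helper builds the command strings and does not run them.

```python
def reinit_standby_commands(db, backup_dir, online=True):
    """Return (where, command) pairs in the order they would be run."""
    backup = f"db2 backup database {db}"
    if online:
        # Assumption: an online backup, so the new primary stays available.
        backup += " online"
    backup += f" to {backup_dir}"
    return [
        ("new primary", backup),
        # The backup image must be copied to the standby machine in between;
        # that step is site-specific and not modeled here.
        ("standby machine", f"db2 restore database {db} from {backup_dir}"),
        ("standby machine", f"db2 start hadr on database {db} as standby"),
    ]


if __name__ == "__main__":
    for where, command in reinit_standby_commands("sample", "/db2/backup"):
        print(f"[{where}] {command}")
```

Keeping the steps as data makes the order explicit: the restore has to finish on the old primary's machine before HADR is started there as a standby.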
Available fix packs:
DB2 Version 9.7 Fix Pack 3 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 3a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 4 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 5 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 6 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 7 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 8 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9 for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 9a for Linux, UNIX, and Windows
DB2 Version 9.7 Fix Pack 10 for Linux, UNIX, and Windows

Solution
This issue is first fixed in DB2 Version 9.7 Fix Pack 3.
Additional data
Date - problem reported : 2010-02-25
Date - problem closed   : 2010-09-23
Date - last modified    : 2010-09-23
Problem fixed as of the following versions (IBM BugInfos)
9.7.FP3
Problem fixed according to the FixList in version
9.7.0.3 FixList