DB2 - Problem description
Problem IT20572 | Status: Closed
HADR STANDBY WAS SHUTDOWN WHEN IT TRIES TO CONVERT ALL THE BLOCKS OF THE BLOCK LIST TO ONDISK BLOCK
Product: DB2 FOR LUW / DB2FORLUW / A50 - DB2
Problem description:
The problem occurs when the HADR standby converts all the blocks of its block list to ONDISK blocks and the first block carries an extra XHDR flag. A subsequent call to hdrConvertToOnDiskBlock() starts converting from the ONDISK block, and the new ONDISK block it produces does not retain the "read before" flag. Accessing the affected log files then brings down the standby. The fix ensures that flags from ONDISK blocks are retained and set in the new ONDISK block.

Related db2diag.log entries:

2017-04-22-13.41.24.216024+480 I144971033E479        LEVEL: Warning
PID     : 7459                 TID : 140730232203008 PROC : db2sysc 0
INSTANCE: INST1                NODE : 000            DB   : SAMPLE
HOSTNAME: MACHINE1
EDUID   : 92                   EDUNAME: db2hadrs.0.0 (SAMPLE) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrConvertToOnDiskBlock, probe:2039
DATA #1 : <preformatted>
Merged block contains previous XHDR 60737 that needs to be read from disk

2017-04-22-13.41.24.216803+480 I144971513E479        LEVEL: Warning
PID     : 7459                 TID : 140730232203008 PROC : db2sysc 0
INSTANCE: INST1                NODE : 000            DB   : SAMPLE
HOSTNAME: MACHINE1
EDUID   : 92                   EDUNAME: db2hadrs.0.0 (SAMPLE) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrConvertToOnDiskBlock, probe:2352
DATA #1 : <preformatted>
Merged block contains 'before' XHDR 60737 that needs to be read from disk

...

2017-04-22-13.42.07.468464+480 I145000249E1022       LEVEL: Error
PID     : 7459                 TID : 140730160899840 PROC : db2sysc 0
INSTANCE: INST1                NODE : 000            DB   : SAMPLE
APPHDL  : 0-10                 APPID: *LOCAL.DB2.170211182828
HOSTNAME: MACHINE1
EDUID   : 109                  EDUNAME: db2shred (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpshrEdu, probe:45325
MESSAGE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
          DIA8414C Logging can not continue due to an error.
DATA #1 : String, 66 bytes
last record from this extent does not match lastLfsLsn in its XHDR
DATA #2 : SQLPG_EXTENT_NUM, PD_TYPE_SQLPG_EXTENT_NUM, 4 bytes
60737
DATA #3 : unsigned integer, 8 bytes
305704492706
DATA #4 : unsigned integer, 8 bytes
305704507694
DATA #5 : unsigned integer, 8 bytes
305704492706
DATA #6 : LFS/LSN, PD_TYPE_SQLP_LFS_LSN_PAIR, 16 bytes
40008856/000000006EECE15E
DATA #7 : LFS/LSN, PD_TYPE_SQLP_LFS_LSN_PAIR, 16 bytes
18446744073709551615/FFFFFFFFFFFFFFFF

...

2017-04-22-13.42.07.483511+480 I145002233E671        LEVEL: Warning
PID     : 7459                 TID : 140730165094144 PROC : db2sysc 0
INSTANCE: INST1                NODE : 000            DB   : SAMPLE
APPHDL  : 0-10                 APPID: *LOCAL.DB2.170211182828
HOSTNAME: MACHINE1
EDUID   : 108                  EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpshrValidateLogStreamEndPoint, probe:1105
DATA #1 : <preformatted>
Found the end of logs on a log stream but firstLFSInNextExtent was set in the last extent.
Continuing on in the hopes of finding the true EOL or a chain break.
LogStreamId 0, ExtNum 60738, firstLFSInNextExtent 40008857

2017-04-22-13.42.07.484101+480 I145002905E776        LEVEL: Error
PID     : 7459                 TID : 140730165094144 PROC : db2sysc 0
INSTANCE: INST1                NODE : 000            DB   : SAMPLE
APPHDL  : 0-10                 APPID: *LOCAL.DB2.170211182828
HOSTNAME: MACHINE1
EDUID   : 108                  EDUNAME: db2redom (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpshrLogMergeForWorker, probe:1257
MESSAGE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
          DIA8414C Logging can not continue due to an error.
DATA #1 : db2LogStreamIDType, PD_TYPE_DB2_LOG_STREAM_ID, 2 bytes
0
DATA #2 : SQLP_GLOBAL_LOG_POS, PD_TYPE_SQLP_GLOBAL_LOG_POS, 24 bytes
40008856 000000006EECE15D 0 0
DATA #3 : Hex integer, 8 bytes
0x0000000000000000

...

2017-04-22-13.42.07.883160+480 I145036269E1029       LEVEL: Severe
PID     : 7459                 TID : 140736938895104 PROC : db2sysc 0
INSTANCE: INST1                NODE : 000            DB   : SAMPLE
APPHDL  : 0-10                 APPID: *LOCAL.DB2.170211182828
HOSTNAME: MACHINE1
EDUID   : 33                   EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:2500
MESSAGE : ZRC=0x8610000D=-2045771763=SQLP_BADLOG "Log File cannot be used"
          DIA8414C Logging can not continue due to an error.
DATA #1 : String, 37 bytes
Replay master fatal error: localSqlca
DATA #2 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
 sqlcaid : SQLCA     sqlcabc: 136   sqlcode: -1042   sqlerrml: 0
 sqlerrmc:
 sqlerrp : sqlpRepl
 sqlerrd : (1) 0x00000000      (2) 0x00000000      (3) 0x00000000
           (4) 0x00000000      (5) 0x00000000      (6) 0x00000000
 sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)
           (7)      (8)      (9)      (10)       (11)
 sqlstate:
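To check whether a standby's db2diag.log shows the same pattern as the entries above, the log text can be scanned for a hdrConvertToOnDiskBlock "Merged block contains ... XHDR" warning and a later SQLP_BADLOG error that name the same extent. The following is a minimal sketch only: the function names are hypothetical, and treating a matching extent number as the signature of this problem is an assumption made for illustration, not an IBM diagnostic procedure.

```python
import re

# A db2diag.log record starts with a timestamp such as
# 2017-04-22-13.41.24.216024+480
RECORD_START = re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+")


def split_records(text):
    """Split db2diag.log text into records, one per leading timestamp."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]


def find_signature(text):
    """Return (merged_extents, badlog_extents) found in the log text."""
    merged, badlog = set(), set()
    for rec in split_records(text):
        if "hdrConvertToOnDiskBlock" in rec:
            # e.g. "Merged block contains previous XHDR 60737 ..."
            m = re.search(r"Merged block contains .*?XHDR (\d+)", rec)
            if m:
                merged.add(int(m.group(1)))
        elif "SQLP_BADLOG" in rec:
            # The extent number is the value of the SQLPG_EXTENT_NUM data item.
            m = re.search(r"PD_TYPE_SQLPG_EXTENT_NUM, 4 bytes\s+(\d+)", rec)
            if m:
                badlog.add(int(m.group(1)))
    return merged, badlog


def matches_it20572(text):
    """Extents that appear in both a merged-block warning and a BADLOG error."""
    merged, badlog = find_signature(text)
    return sorted(merged & badlog)
```

Calling matches_it20572() on the contents of a db2diag.log file returns the list of extent numbers that appear in both kinds of record; an empty list means the heuristic found no match.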
Local Fix:
Restarting the standby works around this issue.
Solution
The problem was first fixed in Version 10.5 Fix Pack 5.
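The defect and the fix from the problem description can be illustrated with a toy model. This is a sketch under stated assumptions, not DB2 source code: the Block type, the BlockFlag.READ_BEFORE_XHDR name, and both convert functions are hypothetical stand-ins for the behavior attributed to hdrConvertToOnDiskBlock().

```python
from dataclasses import dataclass
from enum import Flag, auto


class BlockFlag(Flag):
    """Hypothetical per-block flags; names are illustrative only."""
    NONE = 0
    READ_BEFORE_XHDR = auto()


@dataclass
class Block:
    on_disk: bool
    flags: BlockFlag = BlockFlag.NONE


def convert_buggy(blocks):
    """Merge blocks into a new ONDISK block, dropping flags of ONDISK inputs."""
    flags = BlockFlag.NONE
    for block in blocks:
        if not block.on_disk:  # flags of an existing ONDISK block are ignored
            flags |= block.flags
    return Block(on_disk=True, flags=flags)


def convert_fixed(blocks):
    """Merge blocks into a new ONDISK block, retaining flags of every input."""
    flags = BlockFlag.NONE
    for block in blocks:
        flags |= block.flags
    return Block(on_disk=True, flags=flags)


# First conversion: the first block carries the extra flag.
first = convert_buggy([Block(False, BlockFlag.READ_BEFORE_XHDR), Block(False)])

# Second conversion starts from the ONDISK block produced above.
lost = convert_buggy([first, Block(False)])
kept = convert_fixed([first, Block(False)])
```

In this model the first conversion preserves the flag either way. Only the second conversion, which starts from an ONDISK block, distinguishes the two variants: lost.flags is empty while kept.flags still carries READ_BEFORE_XHDR.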
Timestamps
Date - problem reported : 12.05.2017
Date - problem closed : 14.05.2017
Date - last modified : 14.05.2017