DB2 - Problem Description
| Problem IC90996 | Status: Closed |
SQL0952N : INCORRECT TIMEOUT VALUE OF -1 LEADS TO NODE FAILURES AND INTERMITTENT "LOG STATE MARKED BAD" ERRORS | |
| Product: | |
DB2 FOR LUW / DB2FORLUW / A10 - DB2 | |
| Problem description: | |
- This problem occurs intermittently in DPF (multi-partition) environments.
- You will notice INTERRUPTS (SQLCODE -952) on non-catalog nodes and ROLLBACKs (SQLCODE -1229) on the catalog node, accompanied by the following db2diag.log messages:
On non-catalog nodes:
2013-02-27-19.42.XXX XXXX LEVEL: Error
PID     : 23330818             TID : 140509     PROC : db2sysc 22
INSTANCE: db2inst1             NODE : 015       DB   : SAMPLE
APPHDL  : 0-22                 APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
AUTHID  : user                 HOSTNAME: AAAAAA
EDUID   : 140509               EDUNAME: db2agntp (SAMPLE) 15
FUNCTION: DB2 UDB, data protection services, SQLP_DBCB::setLogState, probe:5005
DATA #1 : <preformatted>
Error detected during initialization. As a result, for precautionary reasons the database log state has been marked bad.
2013-02-27-19.42.XXX XXXX LEVEL: Severe
PID     : 23330818             TID : 140509     PROC : db2sysc 22
INSTANCE: db2inst1             NODE : 015       DB   : SAMPLE
APPHDL  : 0-22                 APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
AUTHID  : user                 HOSTNAME: AAAAAA
EDUID   : 140509               EDUNAME: db2agntp (SAMPLE) 15
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:8721
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA     sqlcabc: 136   sqlcode: -952   sqlerrml: 0
sqlerrmc:
sqlerrp : SQLEDINT
sqlerrd : (1) 0x00000000      (2) 0x00000000      (3) 0x00000000
          (4) 0x00000000      (5) 0x00000000      (6) 0x00000016
sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)
          (7)      (8)      (9)      (10)       (11)
sqlstate:
- The first trigger of the problem appears in db2diag.log when the catalog node detects an FCM connection failure, caused by a timeout, while trying to communicate with a non-catalog node:
2013-02-27-19.42.XXX XXXX LEVEL: Error
PID     : 23330818             TID : 140509     PROC : db2sysc 22
INSTANCE: db2inst1             NODE : 0         DB   : SAMPLE
APPHDL  : 0-22                 APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
AUTHID  : user                 HOSTNAME: AAAAAA
EDUID   : 1800                 EDUNAME: db2fcms 0
FUNCTION: DB2 UDB, fast comm manager, sqkfNetworkServices::detectNodeFailure, probe:15
DATA #1 : <preformatted>
Detected failure for node 15 - time elapsed: 4294967295; max timeout: 500; link state: 4
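The max timeout is 500 by default. It is derived from the default values of CONN_ELAPSE (10 seconds) and MAX_CONNRETRIES (5): 10 seconds × 5 retries = 50 seconds, so the logged value of 500 appears to be expressed in tenths of a second rather than in seconds.
So in the example above, node 0 concluded that it could not reach node 15 within this maximum timeout.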
The reported time elapsed, 4294967295, is 0xFFFFFFFF in hex (2^32 - 1), which is -1 when interpreted as a signed 32-bit integer.
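The reinterpretation can be checked directly. The following Python sketch (the helper name as_signed_32 is hypothetical) packs the logged value into 4 bytes as an unsigned 32-bit integer and unpacks the same bytes as a signed 32-bit integer:

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed int."""
    # Pack as little-endian unsigned 32-bit ("<I"), then read the same 4 bytes
    # back as little-endian signed 32-bit ("<i").
    return struct.unpack("<i", struct.pack("<I", value))[0]


elapsed = 4294967295
print(hex(elapsed))           # 0xffffffff
print(as_signed_32(elapsed))  # -1
print(elapsed == 2**32 - 1)   # True
```

The conversion also works in the other direction: (-1) & 0xFFFFFFFF evaluates to 4294967295, which is how a -1 ends up printed as a very large elapsed time.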
This is the trigger of the FCM failures that result in INTERRUPTS on non-catalog nodes, -1229 errors on the catalog node, and the log state being marked bad.
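Because the elapsed time is recorded as -1 (4294967295) instead of a real duration, it exceeds the maximum timeout of 500, and the node is treated as unreachable because of a timing problem in DB2. | |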
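The following Python sketch illustrates why a -1 elapsed time trips the failure check. It is a simplified model under stated assumptions, not DB2's actual FCM code: it assumes the elapsed time is held in an unsigned 32-bit field and compared against the maximum timeout of 500 taken from the log, and the function names are hypothetical.

```python
MASK_32 = 0xFFFFFFFF


def to_uint32(value: int) -> int:
    """Store an int the way an unsigned 32-bit field would (wraps modulo 2**32)."""
    return value & MASK_32


def node_considered_failed(elapsed: int, max_timeout: int = 500) -> bool:
    """Hypothetical check: declare the node failed once the elapsed time,
    held in an unsigned 32-bit field, exceeds the maximum timeout."""
    return to_uint32(elapsed) > max_timeout


print(node_considered_failed(120))  # False: a genuine, short elapsed time
print(node_considered_failed(-1))   # True: -1 wraps to 4294967295
```

In this model no real delay is needed to declare the node failed: any path that stores -1 as the elapsed time produces the largest possible unsigned value, which is always above the timeout.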
| Problem summary: | |
USERS AFFECTED: All
PROBLEM DESCRIPTION: See Problem Description above.
RECOMMENDATION: Upgrade to DB2 Version 10.1 Fix Pack 3. | |
| Local-Fix: | |
N/A. | |
| Available fix packs: | |
DB2 Version 10.1 Fix Pack 3 for Linux, UNIX, and Windows | |
| Solution | |
First fixed in DB2 Version 10.1 Fix Pack 3. | |
| Workaround | |
None known / see Local-Fix | |
| Bug tracking | |
Predecessor: APAR is sysrouted to one or more of the following: IC95228; Successor: | |
| Additional data | |
| Date problem reported: | 2013-03-20 |
| Date problem closed: | 2013-11-19 |
| Date of last change: | 2013-11-19 |
| Problem fixed as of the following versions (IBM BugInfos) | 10.1.0.3 |
| Problem fixed according to the fix list in version | 10.1.0.3 |