DB2 - Problem description
| Problem IC95228 | Status: Closed |
SQL0952N : INCORRECT TIMEOUT VALUE OF -1 LEADS TO NODE FAILURES AND INTERMITTENT "LOG STATE MARKED BAD" ERRORS | |
| product: | |
DB2 FOR LUW / DB2FORLUW / A50 - DB2 | |
| Problem description: | |
- This problem happens intermittently in DPF (multi-partition)
environments.
- You will see interrupts (SQLCODE -952) on the non-catalog nodes
and rollbacks (SQLCODE -1229) on the catalog node, accompanied by
the following db2diag.log messages:
On non-catalog nodes:
2013-02-27-19.42.XXX XXXX LEVEL: Error
PID : 23330818 TID : 140509 PROC : db2sysc 22
INSTANCE: db2inst1 NODE : 015 DB : SAMPLE
APPHDL : 0-22 APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
AUTHID : user HOSTNAME: AAAAAA
EDUID : 140509 EDUNAME: db2agntp (SAMPLE) 15
FUNCTION: DB2 UDB, data protection services, SQLP_DBCB::setLogState, probe:5005
DATA #1 : <preformatted>
Error detected during initialization. As a result, for precautionary
reasons the database log state has been marked bad.

2013-02-27-19.42.XXX XXXX LEVEL: Severe
PID : 23330818 TID : 140509 PROC : db2sysc 22
INSTANCE: db2inst1 NODE : 015 DB : SAMPLE
APPHDL : 0-22 APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
AUTHID : user HOSTNAME: AAAAAA
EDUID : 140509 EDUNAME: db2agntp (SAMPLE) 15
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:8721
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -952 sqlerrml: 0
sqlerrmc:
sqlerrp : SQLEDINT
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000016
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
- The first trigger of the problem can be found in db2diag.log,
where the catalog node detects an FCM connection failure, caused
by a timeout, while trying to communicate with the non-catalog node:
2013-02-27-19.42.XXX XXXX LEVEL: Error
PID : 23330818 TID : 140509 PROC : db2sysc 22
INSTANCE: db2inst1 NODE : 0 DB : SAMPLE
APPHDL : 0-22 APPID: xxx.xxx.xxx.xxx.xxxxx.xxxxxxxx
AUTHID : user HOSTNAME: AAAAAA
EDUID : 1800 EDUNAME: db2fcms 0
FUNCTION: DB2 UDB, fast comm manager, sqkfNetworkServices::detectNodeFailure, probe:15
DATA #1 : <preformatted>
Detected failure for node 15 - time elapsed: 4294967295; max
timeout: 500; link state: 4
The max timeout is 500 by default: the default values of
CONN_ELAPSE (10 seconds) and MAX_CONNRETRIES (5) convert to 500
seconds.
So in the example above, node 0 is reported as unable to reach
node 15 for more than 500 seconds.
The reported time elapsed, 4294967295, is 0xFFFFFFFF in hex, which
is -1 when read as a signed 32-bit integer.
This incorrect elapsed value triggers the FCM failures, which
result in interrupts (-952) on the non-catalog nodes, rollbacks
(-1229) on the catalog node, and the log state being marked bad.
The node is therefore treated as unreachable because of a timing
problem in DB2. | |
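The check described above can be sketched in a few lines of Python. This is only an illustration for reading db2diag.log by hand, not an IBM tool: the helper names `to_signed32` and `parse_failure` are hypothetical, and the pattern assumes the message text has exactly the form shown in the detectNodeFailure excerpt. It reinterprets the unsigned "time elapsed" value as a signed 32-bit integer and flags the entry as suspect when the result is negative.

```python
import re

# Pattern for the message text in the detectNodeFailure excerpt
# (assumed format; adjust if your db2diag.log wording differs).
FAILURE_RE = re.compile(
    r"Detected failure for node (\d+) - time elapsed: (\d+);"
    r"\s*max\s+timeout: (\d+); link state: (\d+)"
)


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def parse_failure(text: str):
    """Extract the fields of a node-failure message, or return None."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(text.split())
    match = FAILURE_RE.search(flat)
    if match is None:
        return None
    node, elapsed, max_timeout, link_state = map(int, match.groups())
    signed = to_signed32(elapsed)
    return {
        "node": node,
        "elapsed": elapsed,
        "elapsed_signed": signed,
        "max_timeout": max_timeout,
        "link_state": link_state,
        # A negative elapsed time cannot be a real measurement.
        "suspect": signed < 0,
    }


sample = """Detected failure for node 15 - time elapsed: 4294967295; max
timeout: 500; link state: 4"""
info = parse_failure(sample)
print(hex(info["elapsed"]), info["elapsed_signed"], info["suspect"])
# prints: 0xffffffff -1 True
```

A "suspect" result suggests the failure was reported because of the bogus -1 value described here rather than because a real timeout was exceeded.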
| Problem Summary: | |
****************************************************************
* USERS AFFECTED:
* ALL
****************************************************************
* PROBLEM DESCRIPTION:
* See Problem Description Above
****************************************************************
* RECOMMENDATION:
* Upgrade to DB2 Version 10.5 Fix Pack 3.
**************************************************************** | |
| Local Fix: | |
N/A. | |
| available fix packs: | |
DB2 Version 10.5 Fix Pack 3 for Linux, UNIX, and Windows | |
| Solution | |
First fixed in Version 10.5 Fix Pack 3. | |
| Workaround | |
not known / see Local fix | |
| Timestamps | |
Date - problem reported : 27.08.2013
Date - problem closed : 27.02.2014
Date - last modified : 27.02.2014 | |
| Problem solved at the following versions (IBM BugInfos) | |
10.5.0.3 | |
| Problem solved according to the fixlist(s) of the following version(s) | |
10.5.0.3 | |