Informix - Problem description
Problem IT16807 | Status: Closed |
HDR PRIMARY BLOCKS AT CHECKPOINT AFTER PING TIMEOUT AND FAILURE TO SUCCESSFULLY RECONNECT | |
product: | |
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10 | |
Problem description: | |
The primary server can hang at checkpoint, unable to proceed. The hang occurs because nobody can flush logical log buffers. You would tend to see the main_loop thread waiting for threads to leave their critical sections. Some threads that are trying to flush log buffers would be waiting on the previous log buffers to flush, and ultimately there would be one or possibly more threads trying to flush a logical log buffer but waiting on the drcb_lock mutex.

Preceding the hang, the primary server would have encountered a ping timeout, and you would not see it properly reconnect to the secondary and get HDR operational again.

Here is the stack for the main_loop thread waiting for threads to leave critical sections:

Stack for thread: 7 main_loop():
    yield_processor_mvp
    wait4critex
    checkpoint
    main_loop
    th_init_initgls
    startup

From onstat -u you can see a thread or threads in critical sections and also waiting on a log buffer (the X and G flags), and lots of threads waiting on checkpoint (the C flag):

Userthreads
address   flags   sessid user     tty wait      tout locks nreads  nwrites
14d0326a8 G-BPX-- 37072  garpac   -   1207419b8 0    54    8159649 48299
11f3f63e8 C--P--- 41922  danielal -   10a45b850 0    1     592209  203
11f3f6ca8 C--P--- 43376  informix -   10a45b850 0    1     1       0
11f3f7568 C--P--- 43252  informix -   10a45b850 0    0     15      0
11f3f7e28 C--P--- 43238  andreasl 107 10a45b850 0    2     2200    0
11f3f86e8 C--P--- 43128  kasiak   131 10a45b850 0    1     17      0

Last, you can see a thread that has been waiting a long time for the drcb_lock in the onstat -g lmx output:

Locked mutexes:
mid  addr      name               holder lkcnt waiter waittime
9687 120372ad0 drcb_lock          139    0     111    5797
9688 120372b78 drcb_node_count_lo 139    0

The owner of the drcb_lock mutex is the dr_prsend thread. Its stack would look like this (it is waiting for an smx response from the secondary server):

    yield_processor_mvp
    smx_listWait
    smx_recv
    GetServerVersionInfo
    dr_state_change
    dr_session_thread
    startup

The stack of the thread waiting for the drcb_lock mutex shows that it is trying to flush a logical log buffer:

    yield_processor_mvp
    mt_lock_wait
    mt_lock
    dr_logcopy
    logwrite
    log_flushtolsn
    logm_flush
    rmiLogFlush
    rmiMonitor
    cdrMonitorThread
    cdrTrampolineThread
    th_init_initgls
    startup

(In this particular case it was a thread used for ER, but it would not have to be. It could be any thread that had tried to call logwrite().) | |
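The symptom pattern described above can be checked mechanically against captured onstat output. The following is a minimal sketch, not an IBM-supplied tool: the function names, the `UserThread` type, and the `min_wait` threshold are hypothetical. It assumes the column layouts shown in this report, and that the wait flag is the first character of the flags field and the critical-section X is the fifth. The sample text is copied from the output quoted in this report.

```python
from typing import List, NamedTuple, Tuple

# Sample output copied from this problem report.
SAMPLE_U = """Userthreads
address flags sessid user tty wait tout locks nreads nwrites
14d0326a8 G-BPX-- 37072 garpac - 1207419b8 0 54 8159649 48299
11f3f63e8 C--P--- 41922 danielal - 10a45b850 0 1 592209 203
11f3f6ca8 C--P--- 43376 informix - 10a45b850 0 1 1 0
11f3f7568 C--P--- 43252 informix - 10a45b850 0 0 15 0
11f3f7e28 C--P--- 43238 andreasl 107 10a45b850 0 2 2200 0
11f3f86e8 C--P--- 43128 kasiak 131 10a45b850 0 1 17 0
"""

SAMPLE_LMX = """Locked mutexes:
mid addr name holder lkcnt waiter waittime
9687 120372ad0 drcb_lock 139 0 111 5797
9688 120372b78 drcb_node_count_lo 139 0
"""


class UserThread(NamedTuple):
    address: str
    flags: str
    sessid: int
    user: str


def parse_onstat_u(text: str) -> List[UserThread]:
    """Extract data rows from 'onstat -u' text (assumed 10-column layout)."""
    threads = []
    for line in text.splitlines():
        f = line.split()
        # A data row has 10 fields, a 7-character flags field, numeric sessid.
        if len(f) == 10 and len(f[1]) == 7 and f[2].isdigit():
            threads.append(UserThread(f[0], f[1], int(f[2]), f[3]))
    return threads


def summarize(threads: List[UserThread]):
    """Split threads into checkpoint waiters and log-buffer waiters that are
    inside a critical section (flag positions are an assumption)."""
    ckpt_waiters = [t for t in threads if t.flags[0] == "C"]
    log_blocked = [t for t in threads
                   if t.flags[0] == "G" and t.flags[4] == "X"]
    return ckpt_waiters, log_blocked


def long_mutex_waits(text: str, name: str,
                     min_wait: int) -> List[Tuple[int, int, int]]:
    """Return (holder, waiter, waittime) for rows of 'onstat -g lmx' text
    where the named mutex has a waiter with waittime >= min_wait."""
    hits = []
    for line in text.splitlines():
        f = line.split()
        # Rows with a waiter have 7 fields; rows without one have only 5.
        if len(f) == 7 and f[2] == name and f[6].isdigit():
            if int(f[6]) >= min_wait:
                hits.append((int(f[3]), int(f[5]), int(f[6])))
    return hits
```

Calling `summarize(parse_onstat_u(SAMPLE_U))` separates the threads stuck on the checkpoint from the one that is in a critical section waiting on a log buffer, and `long_mutex_waits(SAMPLE_LMX, "drcb_lock", 60)` reports the long wait on drcb_lock. A match on all three together is only suggestive of this defect; the stacks of the holder and waiter threads still need to be compared with the ones above.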
Problem Summary: | |
****************************************************************
* USERS AFFECTED:                                              *
* All users                                                    *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Update to IBM Informix Server 12.10.xC8                      *
**************************************************************** | |
Local Fix: | |
Solution | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 29.08.2016 |
Date - problem closed : | 09.12.2016 |
Date - last modified : | 09.12.2016 |
Problem solved at the following versions (IBM BugInfos) | |
12.10.xC8 | |
Problem solved according to the fixlist(s) of the following version(s) |