Informix - Problem description
Problem IT37230 | Status: Closed |
RSS NODE BLOCKED:LOG_DROP POSSIBLE WHEN DROPPING LOGICAL LOGS ON PRIMARY WITH CONCURRENT INDEX BUILD ON RSS NODE | |
product: | |
INFORMIX SERVER / 5725A3900 / C10 - IDS 12.10 | |
Problem description: | |
There appears to be a timing issue that can leave the RSS node in a blocked state. The node is unable to advance its log position, so it starts lagging behind the primary. The situation is a deadlock: the recovery thread needs the index build to proceed, but some of the parallel index build threads need the recovery thread to proceed before they can unblock and continue. The index build session needs to be interrupted/killed with onmode -z to get the server out of this condition.

To identify the issue, first check the main onstat banner, which would show the following LOG_DROP blocked condition:

    IBM Informix Dynamic Server Version 12.10.FC9W1X4 -- Read-Only (RSS) -- Up 18 days 05:10:21 -- 223444992 Kbytes
    Blocked:LOG_DROP

Next, the stack of the xchg_2.0 recovery thread (note that some of the parallel index build threads can have the same name) would look like the following:

    yield_processor_mvp
    wait4critex
    isenter_critblock
    rslogdrop
    plogredo
    rlogm_redo
    scan_logredo
    next_lscan
    prod_loop1
    producer_thread
    startup

The recovery thread is waiting for all other threads to get out of critical sections (users with the X flag in onstat -u), so there would be at least 1 other thread in a critical section. In the onstat -u output below, the 1st user is the xchg_2.0 recovery thread. The other user/thread that the server is deadlocked with is the 2nd one, in this case from session 2327271:

    173a98ce8 --B-XRD 178     informix - 0 0 0 16178 3106443
    19c4f5c98 ----XR- 2327271 user1    - 0 0 0 0     0

From onstat -g ses you should see that the sqlexec thread has spawned multiple other threads (used to perform the fast/parallel index build):

    session                              #RSAM   total   used    dynamic
    id      user  tty pid   hostname     threads memory  memory  explain
    2327271 user1 -   14376 host1        10      2306048 2198656 off

    tid     name     rstcb     flags   curstk status
    3708024 sqlexec  1a95f1358 Y--P-R- 21455  cond wait opened_up -
    3708268 xchg_1.0 19c4f5c98 ----XR- 5903   running-
    3708269 xchg_1.1 184e7c5d8 Y----R- 4255   cond wait sortproc:0-
    3708270 xchg_1.2 1b34db5f8 Y----R- 6639   cond wait block -
    3708273 psortpro 184e724f8 Y----R- 3663   cond wait backend:0 -
    3708274 psortpro 19c4d7108 Y----R- 3663   cond wait backend:0 -
    3708277 psortpro 184e55438 Y----R- 5071   cond wait block -
    3708278 psortpro 184e5b688 Y----R- 3663   cond wait backend:1 -
    3708279 psortpro 19c4dd358 Y----R- 3663   cond wait backend:1 -
    3708280 psortpro 184e4d718 Y----R- 3663   cond wait backend:1 -

The SQL the session is running could be a query (if the optimizer plan caused the server to create a temp table and then build an index on that temp table), or it could be a manual index build on a temp table on the RSS node. In this particular case, the SQL was a query for which the optimizer required building a temp table and then building an index on it, as can be seen from the stack of the above sqlexec thread.

The main point is that the sqlexec thread has gone into code doing an index build, for whichever reason:

    yield_processor_mvp
    mt_wait
    mt_wait_sem
    open_xchg
    open_btmrg
    execute
    fastidxbld
    rsaddindex
    fmamaddindex
    fmaddindex
    tmptab_create_index
    filltemp
    scan_open
    join_open
    join_open
    join_open
    join_open
    merge_open
    merge_open
    sort_open
    filltemp
    scan_open
    materialize_viewtmp
    prepselect
    open_cursor
    sql_open
    sq_open
    sqmain

The xchg_1.0 thread (or 1 of the other parallel index build xchg threads) will be constantly running or trying to run, while 1 or more of the others will be in the block state. If you can capture a good stack for the running thread, it should show the thread in next_sort.
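The two identification checks described above (the Blocked:LOG_DROP banner and the users holding the X flag in onstat -u) can be scripted. The following is only a minimal sketch, not an IBM-provided tool. It assumes the banner and the onstat -u output have already been captured as text, and that the critical-section X appears in the fifth flag position, as it does in the sample rows of this report. The names is_log_drop_blocked, critical_section_users and UserRow are hypothetical.

```python
import re
from typing import List, NamedTuple


class UserRow(NamedTuple):
    """One onstat -u data row, reduced to the columns used here (hypothetical type)."""
    address: str
    flags: str
    sessid: int
    user: str


def is_log_drop_blocked(banner: str) -> bool:
    """Return True if the onstat banner text reports Blocked:LOG_DROP."""
    return re.search(r"Blocked:\s*LOG_DROP\b", banner) is not None


def critical_section_users(onstat_u: str) -> List[UserRow]:
    """Return the onstat -u rows whose fifth flag character is X.

    Assumption: data rows start with a hex address, followed by a
    7-character flags field, a numeric session id and a user name,
    as in the sample output of this report.
    """
    rows = []
    for line in onstat_u.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        address, flags, sessid, user = fields[:4]
        is_data_row = (
            re.fullmatch(r"[0-9a-f]+", address) is not None
            and len(flags) == 7
            and sessid.isdigit()
        )
        if is_data_row and flags[4] == "X":
            rows.append(UserRow(address, flags, int(sessid), user))
    return rows


if __name__ == "__main__":
    banner = (
        "IBM Informix Dynamic Server Version 12.10.FC9W1X4 -- Read-Only (RSS) "
        "-- Up 18 days 05:10:21 -- 223444992 Kbytes Blocked:LOG_DROP"
    )
    users = (
        "173a98ce8 --B-XRD 178 informix - 0 0 0 16178 3106443\n"
        "19c4f5c98 ----XR- 2327271 user1 - 0 0 0 0 0\n"
    )
    if is_log_drop_blocked(banner):
        for row in critical_section_users(users):
            print(row.sessid, row.user, row.flags)
```

With the sample data from this report, both the recovery thread (session 178) and the index build thread (session 2327271) are listed. Only the latter is a candidate for onmode -z, so the result still needs to be cross-checked against onstat -g ses before any session is interrupted.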
Problem Summary: | |
USERS AFFECTED:
Users of Informix Server prior to 12.10.xC15 and 14.10.xC7.

PROBLEM DESCRIPTION:
See Error Description

RECOMMENDATION:
Upgrade to Informix Server 12.10.xC15 or 14.10.xC7 (when available).
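As a rough aid for checking whether a given server falls under "USERS AFFECTED", the version string from the onstat banner can be compared against the two fixed releases named in this report. This is a sketch under assumptions: the function name is_affected and the FIXED_IN table are hypothetical, the version string is assumed to follow the pattern seen in the banner above (for example 12.10.FC9W1X4), and release lines other than 12.10 and 14.10 are reported as unknown because this APAR does not address them.

```python
import re
from typing import Optional

# Fix pack levels taken from this report: fixed in 12.10.xC15 and 14.10.xC7.
FIXED_IN = {(12, 10): 15, (14, 10): 7}


def is_affected(version: str) -> Optional[bool]:
    """Return True/False for 12.10 and 14.10 servers, None when unknown.

    Assumption: the version looks like <major>.<minor>.<platform letter>C<n>,
    optionally followed by a suffix such as W1X4, which is ignored here.
    """
    match = re.search(r"(\d+)\.(\d+)\.[A-Z]C(\d+)", version)
    if match is None:
        return None
    release_line = (int(match.group(1)), int(match.group(2)))
    fixed_level = FIXED_IN.get(release_line)
    if fixed_level is None:
        return None
    return int(match.group(3)) < fixed_level


if __name__ == "__main__":
    print(is_affected("12.10.FC9W1X4"))
```

The check only compares the xC level. Interim or special builds that carry the fix at a lower level would be misreported, so the result is a hint, not a substitute for the fix list.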
Local Fix: | |
Solution | |
Workaround | |
Comment | |
Fixed in Informix Server 12.10.xC15 and 14.10.xC7. | |
Timestamps | |
Date - problem reported : 11.06.2021
Date - problem closed : 26.08.2021
Date - last modified : 26.08.2021
Problem solved at the following versions (IBM BugInfos) | |
Problem solved according to the fixlist(s) of the following version(s) |