DB2 - Problem description
Problem IT33322 | Status: Closed |
CDE QUERY SEEMS HANGING, IT DOESN'T COMPLETE IN HOURS WHEN IT NORMALLY COMPLETES IN MINUTES THE CAUSE IS CDE PARADISE SORT HANGS | |
product: | |
DB2 FOR LUW / DB2FORLUW / B10 - DB2 | |
Problem description: | |
The issue is a race condition caused by a lack of memory barriers when barrierWaitInterruptible() is used. The unused function barrierWait() may also have a similar issue, and that will be fixed as well. The hang includes the following stacks:

================================================================================
=== 35184.248967.014.stack.txt: 248967 - db2agntcol (BLUDB) 14 [-]
0x000010001664FF80 ossWasteTime + 0x0050
0x0000100009A12B68 ibm_cde::query::NativeSortCB::paradisSort(ibm_cde::query::NSJob*, unsigned long) + 0x0f38
0x0000100009A1135C ibm_cde::query::NativeSortCB::sort(unsigned int) + 0x0d0c
0x000010000996479C ibm_cde::query::SortEvaluator::sortPartition(ibm_cde::query::SortPartition*) + 0x014c
0x00001000099628B0 ibm_cde::query::SortEvaluator::processInputsSynchronously() + 0x0590
0x0000100006CA685C ibm_cde::query::Evaluator::evaluate(bool, bool, ibm_cde::query::Evaluator::EvaluatorRestartState&, ibm_cde::query::OptPredicateTracker*) + 0x088c
0x0000100006BA279C ibm_cde::query::EvaluationRoutine::evaluate(unsigned int, sql_static_data*) + 0x03ac
0x0000100007AC3E88 ibm_cde::query::Scheduler::evaluateChain(ibm_cde::query::EvaluationRoutine*, unsigned long&, unsigned int) + 0x0418
0x0000100007AC0F18 ibm_cde::query::Scheduler::runWorkerThread(void*, int*) + 0x03b8
0x0000100007AC8CDC ibm_cde::query::cdeEntryPointImpl(sqeAgent*, void*, void*) + 0x00bc
0x0000100008CFBFDC cdeInterface::startCdeSubagent(sqeAgent*) + 0x00ec
0x000010000F214384 sqlriInvokeCde(sqlrr_cb*) + 0x0064
0x000010000F0487F0 sqlriSectInvoke(sqlrr_cb*, sqlri_opparm*) + 0x0410

================================================================================
=== 35184.226015.014.stack.txt: 226015 - db2agntcol (BLUDB) 14 [-]
=== 35184.226037.014.stack.txt: 226037 - db2agntcol (BLUDB) 14 [-]
=== 35184.248731.014.stack.txt: 248731 - db2agntcol (BLUDB) 14 [-]
=== 35184.249369.014.stack.txt: 249369 - db2agntcol (BLUDB) 14 [-]
0x00001000000942B8 __nanosleep + 0x0088
0x0000100009A107EC ibm_cde::query::NativeSortCB::sort(unsigned int) + 0x019c
0x000010000996479C ibm_cde::query::SortEvaluator::sortPartition(ibm_cde::query::SortPartition*) + 0x014c
0x00001000099628B0 ibm_cde::query::SortEvaluator::processInputsSynchronously() + 0x0590
0x0000100006CA685C ibm_cde::query::Evaluator::evaluate(bool, bool, ibm_cde::query::Evaluator::EvaluatorRestartState&, ibm_cde::query::OptPredicateTracker*) + 0x088c
0x0000100006BA279C ibm_cde::query::EvaluationRoutine::evaluate(unsigned int, sql_static_data*) + 0x03ac
0x0000100007AC3E88 ibm_cde::query::Scheduler::evaluateChain(ibm_cde::query::EvaluationRoutine*, unsigned long&, unsigned int) + 0x0418
0x0000100007AC0F18 ibm_cde::query::Scheduler::runWorkerThread(void*, int*) + 0x03b8
0x0000100007AC8CDC ibm_cde::query::cdeEntryPointImpl(sqeAgent*, void*, void*) + 0x00bc
0x0000100008CFBFDC cdeInterface::startCdeSubagent(sqeAgent*) + 0x00ec
0x000010000F214384 sqlriInvokeCde(sqlrr_cb*) + 0x0064
0x000010000F0487F0 sqlriSectInvoke(sqlrr_cb*, sqlri_opparm*) + 0x0410

These stacks are from PowerPC, where the code is heavily inlined. On other platforms there may be some barrier-related functions on the stack between paradisSort and ossWasteTime. | |
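The class of defect described above (a thread barrier that relies on shared counters without memory barriers) can be illustrated with a minimal sketch. This is not the Db2 implementation of barrierWaitInterruptible(); the names SpinBarrier and runDemo are hypothetical and exist only for this example. The sketch assumes a simple spinning barrier built on C++11 atomics and shows where acquire/release ordering is needed so that waiting threads reliably observe both the barrier release and the data written before it. On weakly ordered hardware such as POWER, omitting this ordering can allow a waiter to miss or misorder these updates.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spin barrier for illustration only; not Db2 code.
class SpinBarrier {
public:
    explicit SpinBarrier(unsigned parties)
        : parties_(parties), waiting_(0), generation_(0) {}

    void wait() {
        // Remember which generation this thread is waiting in.
        const unsigned gen = generation_.load(std::memory_order_acquire);
        // acq_rel: the last arriver observes the writes of all earlier arrivers.
        if (waiting_.fetch_add(1, std::memory_order_acq_rel) + 1 == parties_) {
            waiting_.store(0, std::memory_order_relaxed);
            // release: publishes the counter reset and all pre-barrier writes.
            generation_.fetch_add(1, std::memory_order_release);
        } else {
            // acquire: pairs with the release above, so data written before
            // the barrier is visible once the generation change is seen.
            while (generation_.load(std::memory_order_acquire) == gen) {
                std::this_thread::yield();
            }
        }
    }

private:
    const unsigned parties_;
    std::atomic<unsigned> waiting_;
    std::atomic<unsigned> generation_;
};

// Runs `rounds` phases on `n` threads. Returns true if, in every round, each
// thread saw every other thread's pre-barrier write after passing the barrier.
bool runDemo(unsigned n, unsigned rounds) {
    assert(n > 0);
    SpinBarrier barrier(n);
    std::vector<unsigned> slots(n, 0);   // plain, non-atomic shared data
    std::atomic<bool> ok(true);
    std::vector<std::thread> workers;
    for (unsigned id = 0; id < n; ++id) {
        workers.emplace_back([&, id] {
            for (unsigned r = 1; r <= rounds; ++r) {
                slots[id] = r;           // write before the barrier
                barrier.wait();
                for (unsigned v : slots) {
                    if (v != r) ok.store(false, std::memory_order_relaxed);
                }
                barrier.wait();          // all reads finish before the next write
            }
        });
    }
    for (auto& t : workers) t.join();
    return ok.load();
}
```

With the orderings shown, a call such as runDemo(4, 1000) is expected to return true. Weakening the acquire and release operations to relaxed would make the reads of slots a data race, which is the general kind of failure a missing memory barrier introduces.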
Problem Summary: | |
****************************************************************
* USERS AFFECTED: All
* PROBLEM DESCRIPTION: See Error Description
* RECOMMENDATION: None
**************************************************************** | |
Local Fix: | |
There are two possible workarounds:
1) Resubmit the query; it may complete when the system workload is different.
2) Set the following registry variable: db2set DB2_REDUCED_OPTIMIZATION=COL_NO_OLAP or pass the same setting to the query in embedded guidelines: /* */ | |
Solution | |
Workaround | |
not known / see Local fix | |
Timestamps | |
Date - problem reported : | 25.06.2020 |
Date - problem closed : | 28.01.2021 |
Date - last modified : | 28.01.2021 |
Problem solved at the following versions (IBM BugInfos) | |
Problem solved according to the fixlist(s) of the following version(s) |