DB2 - Problem description
| Problem IT01804 | Status: Closed | 
| DB2 OPTIMIZER MAY CHOOSE TO DEFER PROCESSING DISTINCT CLAUSE OF A UNION-BRANCH | |
| product: | |
| DB2 FOR LUW / DB2FORLUW / A50 - DB2 | |
| Problem description: | |
| Even if DISTINCT clause is explicitly mentioned for a 
union-branch, the DB2 optimizer may choose an access plan that 
defers processing of the DISTINCT clause i.e. duplicate values 
will not be removed from that branch before doing UNION. If the 
union-branch has many duplicate values, then delaying DISTINCT 
may result in poor performance. 
 
The optimizer may choose such access plan under the following 
conditions: 
   1. SQL query contains a DISTINCT clause in a full SELECT 
   2. There is a UNION operation in the sub-query, which is an 
input to above full select 
   3. There is a DISTINCT clause in the full select of a 
UNION-branch 
 
For example, consider the following query: 
   SELECT DISTINCT COALESCE(T1.COL1, T2.COL2) 
   FROM T1 RIGHT OUTER JOIN 
   ( SELECT DISTINCT COL2 
     FROM T0 
     WHERE COL2 IS NOT NULL 
     UNION ALL 
     SELECT '(NULL)' AS COL2 
     FROM SYSIBM.SYSDUMMY1 ) AS T2 
   ON (T1.COL2 = T2.COL2); 
 
Note the following in the above SQL query: 
   1. Query's full select contains a DISTINCT clause: "DISTINCT 
COALESCE(T1.COL1, T2.COL2)" 
   2. One UNION-branch also contains a DISTINCT clause: 
"DISTINCT COL2" 
 
DB2 optimizer may choose an access plan that does not find 
DISTINCT values in the T0.COL2 column. This could result in 
sub-optimal performance if there are many duplicate values in 
T0.COL2 column. In the exfmt output, the explain plan graph 
structure may look as shown below. Note that there is no SORT or 
UNIQUE operation for applying DISTINCT clause on T0.COL2 column. 
The SORT operation-4 is applying the "DISTINCT COALESCE(T1.COL1, 
T2.COL2)" clause. 
                                ... 
                             SORT 
                              (   4) 
                                 | 
                             HSJOIN< 
                              (   5) 
           /---------------+---------------\ 
    TBSCAN                              UNION 
       (   6)                                     (   7) 
         |                                  /--------+---------\ 
TABLE: SCHEMA1      TBSCAN           TBSCAN 
         T1                            (   8) 
(   9) 
                                             | 
| 
                            TABFNC: SYSIBM    TABLE: SCHEMA1 
                                 GENROW                 T0 | |
| Problem Summary: | |
| **************************************************************** * USERS AFFECTED: * * Aix, Linux, Windows * **************************************************************** * PROBLEM DESCRIPTION: * * See Error Description * **************************************************************** * RECOMMENDATION: * * Upgrade to DB2 V10.5 Fix Pack 3 or higher. * **************************************************************** | |
| Local Fix: | |
| The above SQL query can be rewritten as follows: SELECT DISTINCT COALESCE(T1.COL1, T2.COL2) FROM T1 RIGHT OUTER JOIN ( SELECT MAX( COL2 ) as COL2 FROM T0 WHERE COL2 IS NOT NULL GROUP BY COL2 UNION ALL SELECT '(NULL)' AS COL2 FROM SYSIBM.SYSDUMMY1 ) AS T2 ON (T1.COL2 = T2.COL2); Note the "DISTINCT COL2" clause is replaced with "MAX(COL2) AS COL2 ... GROUP BY COL2". | |
| available fix packs: | |
| DB2 Cancun Release 10.5.0.4 (also known as Fix Pack 4) for Linux, UNIX, and Windows | |
| Solution | |
| Fixed in DB2 V10.5 Fix Pack 3. | |
| Workaround | |
| not known / see Local fix | |
| Timestamps | |
| Date - problem reported : Date - problem closed : Date - last modified : | 14.05.2014 29.01.2015 29.01.2015 | 
| Problem solved at the following versions (IBM BugInfos) | |
| Problem solved according to the fixlist(s) of the following version(s) | |
| 10.5.0.4 |  | 







 
