UNION versus collations

Поиск
Список
Период
Сортировка
От Tom Lane
Тема UNION versus collations
Дата
Msg-id 3605568.1731970579@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: UNION versus collations
Список pgsql-hackers
prepunion.c's plan_union_children(), which merges
identically-propertied UNION operations into one, has this comment:

 * NOTE: currently, we ignore collations while determining if a child has
 * the same properties.  This is semantically sound only so long as all
 * collations have the same notion of equality.  It is valid from an
 * implementation standpoint because we don't care about the ordering of
 * a UNION child's result: UNION ALL results are always unordered, and
 * generate_union_paths will force a fresh sort if the top level is a UNION.

This argument seems well past its sell-by date.  In the first place,
now that we have nondeterministic collations we can't assume that
"all collations have the same notion of equality".  In the second
place, since commit 66c0185a3 it's completely untrue that "we
don't care about the ordering of a UNION child's result", and
also untrue that "generate_union_paths will force a fresh sort".

As far as I can tell from some desultory testing, the implementation
issues don't lead to any observable bugs.  Even though the code may
try to produce wrongly-ordered paths for the sub-SELECTs, the pathkey
logic will recognize that it's not really the same ordering, leading
to injection of per-child Sorts.  Nonetheless, we're wasting cycles
producing useless ordered paths, and perhaps we might pick a less than
optimal plan in some cases (not entirely sure about that).

The point about nondeterministic collations is undeniable though.
I've not bothered to build a test case, but surely one can be
made wherein a sub-UNION acts differently than expected.

So I think we ought to apply the attached as far back as we have 
nondeterministic collations.

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..fc9f005c09 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -719,9 +719,9 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,

     /*
      * If any of my children are identical UNION nodes (same op, all-flag, and
-     * colTypes) then they can be merged into this node so that we generate
-     * only one Append/MergeAppend and unique-ification for the lot.  Recurse
-     * to find such nodes.
+     * colTypes/colCollations) then they can be merged into this node so that
+     * we generate only one Append/MergeAppend and unique-ification for the
+     * lot.  Recurse to find such nodes.
      */
     rellist = plan_union_children(root,
                                   op,
@@ -1197,13 +1197,6 @@ generate_nonunion_paths(SetOperationStmt *op, PlannerInfo *root,
  *
  * NOTE: we can also pull a UNION ALL up into a UNION, since the distinct
  * output rows will be lost anyway.
- *
- * NOTE: currently, we ignore collations while determining if a child has
- * the same properties.  This is semantically sound only so long as all
- * collations have the same notion of equality.  It is valid from an
- * implementation standpoint because we don't care about the ordering of
- * a UNION child's result: UNION ALL results are always unordered, and
- * generate_union_paths will force a fresh sort if the top level is a UNION.
  */
 static List *
 plan_union_children(PlannerInfo *root,
@@ -1232,7 +1225,8 @@ plan_union_children(PlannerInfo *root,

             if (op->op == top_union->op &&
                 (op->all == top_union->all || op->all) &&
-                equal(op->colTypes, top_union->colTypes))
+                equal(op->colTypes, top_union->colTypes) &&
+                equal(op->colCollations, top_union->colCollations))
             {
                 /* Same UNION, so fold children into parent */
                 pending_rels = lcons(op->rarg, pending_rels);

В списке pgsql-hackers по дате отправления: