An improvement on parallel DISTINCT

From: Richard Guo
Subject: An improvement on parallel DISTINCT
Msg-id CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com
Replies: Re: An improvement on parallel DISTINCT (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
While reviewing Heikki's Omit-junk-columns patchset[1], I noticed that
root->upper_targets[] is used to set the target for partial_distinct_rel,
which is not great because root->upper_targets[] is not supposed to be
used by the core code.  The comment in grouping_planner() says:

  * Save the various upper-rel PathTargets we just computed into
  * root->upper_targets[].  The core code doesn't use this, but it
  * provides a convenient place for extensions to get at the info.

Then, while fixing this issue, I noticed an opportunity to improve how
we generate Gather/GatherMerge paths for the two-phase DISTINCT.  The
Gather/GatherMerge paths are added by generate_gather_paths(), which
does not consider orderings that might be useful above the GatherMerge
node.  This can be improved by using generate_useful_gather_paths()
instead.  With this change, I can see a query plan improvement in the
regression test "select_distinct.sql".  For instance,

-- Test parallel DISTINCT
SET parallel_tuple_cost=0;
SET parallel_setup_cost=0;
SET min_parallel_table_scan_size=0;
SET max_parallel_workers_per_gather=2;

-- Ensure we get a parallel plan
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;

-- on master
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Sort
         Sort Key: four
         ->  Gather
               Workers Planned: 2
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)

-- on patched
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Gather Merge
         Workers Planned: 2
         ->  Sort
               Sort Key: four
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)

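I believe the second plan is better: the Sort runs in parallel below the
Gather Merge, so the leader only has to merge the already-sorted worker
outputs instead of sorting all the gathered rows itself.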
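To make the difference between the two plan shapes concrete, here is a
toy Python model.  It is not PostgreSQL code: the function names and the
round-robin split standing in for the Parallel Seq Scan are hypothetical,
and set()/sorted() merely stand in for HashAggregate and Sort.  In the
patched shape, each worker deduplicates and sorts its own share, and the
leader does an order-preserving merge (Gather Merge) followed by Unique.

```python
import heapq
from itertools import groupby

# Toy model of the two plan shapes; NOT PostgreSQL code.

def worker_partial_distinct(rows):
    """HashAggregate + Sort inside one worker: distinct values, sorted."""
    return sorted(set(rows))

def gather_merge_unique(worker_outputs):
    """Patched shape: Gather Merge (k-way merge of sorted streams) + Unique."""
    merged = heapq.merge(*worker_outputs)
    # Unique only needs to drop adjacent duplicates, since input is sorted.
    return [value for value, _ in groupby(merged)]

def gather_sort_unique(worker_outputs):
    """Master shape: Gather (unordered), then Sort and Unique in the leader."""
    gathered = [v for out in worker_outputs for v in out]
    return [value for value, _ in groupby(sorted(gathered))]

def parallel_distinct(rows, nworkers=2):
    # Hypothetical round-robin split standing in for Parallel Seq Scan.
    chunks = [rows[i::nworkers] for i in range(nworkers)]
    return gather_merge_unique(worker_partial_distinct(c) for c in chunks)
```

For a column with four distinct values, like four in tenk1,
parallel_distinct([i % 4 for i in range(10000)]) returns [0, 1, 2, 3].
Both shapes produce the same result; the difference is only where the
sorting work happens.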

Attached is a patch that includes this change and also eliminates the
usage of root->upper_targets[] in the core code.  It also makes some
tweaks to the comment.

Any thoughts?

[1] https://www.postgresql.org/message-id/flat/2ca5865b-4693-40e5-8f78-f3b45d5378fb%40iki.fi

Thanks
Richard
