An improvement on parallel DISTINCT

From	Richard Guo
Subject	An improvement on parallel DISTINCT
Date	26 December 2023, 14:23:02
Msg-id	CAMbWs48u9VoVOouJsys1qOaC9WVGVmBa+wT1dx8KvxF5GPzezA@mail.gmail.com
Replies	Re: An improvement on parallel DISTINCT (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-hackers

While reviewing Heikki's Omit-junk-columns patchset[1], I noticed that
root->upper_targets[] is used to set the target for partial_distinct_rel,
which is not great because root->upper_targets[] is not supposed to be
used by the core code. The comment in grouping_planner() says:

* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info.

Then while fixing this issue, I noticed an opportunity for improvement
in how we generate Gather/GatherMerge paths for the two-phase DISTINCT.
The Gather/GatherMerge paths are added by generate_gather_paths(), which
does not consider ordering that might be useful above the GatherMerge
node. This can be improved by using generate_useful_gather_paths()
instead. With this change I can see query plan improvements in the
regression test "select_distinct.sql". For instance,

-- Test parallel DISTINCT
SET parallel_tuple_cost=0;
SET parallel_setup_cost=0;
SET min_parallel_table_scan_size=0;
SET max_parallel_workers_per_gather=2;

-- Ensure we get a parallel plan
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;

-- on master
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Sort
         Sort Key: four
         ->  Gather
               Workers Planned: 2
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)

-- on patched
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Gather Merge
         Workers Planned: 2
         ->  Sort
               Sort Key: four
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)

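I believe the second plan is better: the Sort runs in the parallel
workers below the Gather Merge, rather than on the gathered rows above
the Gather.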
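
To make the difference between the two plan shapes concrete, here is a
toy model in Python. It is only a sketch and not how the PostgreSQL
executor is implemented: the function names (partial_distinct,
plan_master, plan_patched) and the sample data are hypothetical, and the
"workers" are simulated sequentially. It shows that both shapes return
the same distinct values, while the patched shape sorts inside each
worker and leaves the leader with an order-preserving merge.

```python
import heapq
from itertools import groupby


def partial_distinct(chunk):
    """Phase 1 in a worker: hash-based de-duplication of its share of rows."""
    return set(chunk)


def plan_master(chunks):
    """Model of: Unique -> Sort -> Gather -> HashAggregate (per worker)."""
    gathered = []
    for chunk in chunks:
        # Gather: collect worker output with no ordering guarantee
        gathered.extend(partial_distinct(chunk))
    gathered.sort()  # a single Sort above the Gather
    return [key for key, _ in groupby(gathered)]  # Unique over sorted input


def plan_patched(chunks):
    """Model of: Unique -> Gather Merge -> Sort (per worker) -> HashAggregate."""
    # Each worker sorts its own partially de-duplicated rows
    sorted_runs = [sorted(partial_distinct(chunk)) for chunk in chunks]
    # Gather Merge: merge already-sorted streams, preserving order
    merged = heapq.merge(*sorted_runs)
    return [key for key, _ in groupby(merged)]  # Unique over sorted input


# Hypothetical stand-in for tenk1.four, split between two workers
rows = [i % 4 for i in range(10000)]
chunks = [rows[:5000], rows[5000:]]

print(plan_master(chunks))
print(plan_patched(chunks))
```

Both functions finish with the same Unique step; what changes is where
the sorting happens.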

Attached is a patch that includes this change and also eliminates the
usage of root->upper_targets[] in the core code. It also makes some
tweaks to the comment.

Any thoughts?

[1] https://www.postgresql.org/message-id/flat/2ca5865b-4693-40e5-8f78-f3b45d5378fb%40iki.fi

Thanks
Richard

Attachments

v1-0001-Improve-parallel-DISTINCT.patch
