Re: Use virtual tuple slot for Unique node

Поиск

Список

Период

Сортировка

От	Denis Smirnov
Тема	Re: Use virtual tuple slot for Unique node
Дата	31 августа 2023 г. 03:06:56
Msg-id	527984DD-4215-4A0D-ABB0-B34A9C39214E@gmail.com обсуждение исходный текст
Ответ на	Re: Use virtual tuple slot for Unique node (David Rowley <dgrowleyml@gmail.com>)
Ответы	Re: Use virtual tuple slot for Unique node
Список	pgsql-hackers

Дерево обсуждения

Here is the example (checked on the current master branch with release build + I've made about 10 runs for each explain analyze to get repeatable results)

Before the patch:

adb=# create table t(a int, primary key(a));

adb=# insert into t select random() * 5000000

from generate_series(1, 5000000)

on conflict do nothing;

adb=# explain analyze select a from t group by a;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Group (cost=0.43..98761.06 rows=3160493 width=4) (actual time=0.085..1225.139 rows=3160493 loops=1)

Group Key: a

-> Index Only Scan using t_pkey on t (cost=0.43..90859.82 rows=3160493 width=4) (actual time=0.081..641.567 rows=3160493 loops=1)

Heap Fetches: 0

Planning Time: 0.188 ms

Execution Time: 1370.027 ms

(6 rows)

adb=# explain analyze select distinct a from t;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Unique (cost=0.43..98761.06 rows=3160493 width=4) (actual time=0.135..1525.704 rows=3160493 loops=1)

-> Index Only Scan using t_pkey on t (cost=0.43..90859.82 rows=3160493 width=4) (actual time=0.130..635.742 rows=3160493 loops=1)

Heap Fetches: 0

Planning Time: 0.273 ms

Execution Time: 1660.857 ms

(5 rows)

We can see that ExecCopySlot occupies 24% of the CPU inside ExecUnique function (thanks to palloc in Unique’s minimal tuples). On the other hand ExecCopySlot is only 6% of the ExecGroup function (we use virtual tuples in Group node).

After the patch Unique node works a little bit faster then the Group node:

adb=# explain analyze select distinct a from t;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Unique (cost=0.43..98761.06 rows=3160493 width=4) (actual time=0.094..1072.007 rows=3160493 loops=1)

-> Index Only Scan using t_pkey on t (cost=0.43..90859.82 rows=3160493 width=4) (actual time=0.092..592.619 rows=3160493 loops=1)

Heap Fetches: 0

Planning Time: 0.203 ms

Execution Time: 1209.940 ms

(5 rows)

adb=# explain analyze select a from t group by a;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Group (cost=0.43..98761.06 rows=3160493 width=4) (actual time=0.074..1140.644 rows=3160493 loops=1)

Group Key: a

-> Index Only Scan using t_pkey on t (cost=0.43..90859.82 rows=3160493 width=4) (actual time=0.070..591.930 rows=3160493 loops=1)

Heap Fetches: 0

Planning Time: 0.193 ms

Execution Time: 1276.026 ms

(6 rows)

I have added current patch to the commitfest.

31 авг. 2023 г., в 04:59, David Rowley <dgrowleyml@gmail.com> написал(а):

On Thu, 31 Aug 2023 at 05:37, Денис Смирнов <darthunix@gmail.com> wrote:
I have inspected the performance of the GROUP BY and DISTINCT queries for the sorted data streams and found out, that Group node (produced by GROUP BY) works faster then the Unique node (produced by DISTINCT). The flame graph should out the reason - Unique palloc`s tuples for the result slot while the Group node doesn’t.

I wonder, why do we use minimal tuples for the Unique node instead of the virtual ones? It looks like there is no actual reason for that as Unique doesn’t make any materialization.

It would be good to see example queries and a demonstration of the
performance increase. I'm not disputing your claims, but showing some
performance numbers might catch the eye of a reviewer more quickly.

You should also add this to the September commitfest at
https://commitfest.postgresql.org/44/

David

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Use virtual tuple slot for Unique node

Вложения