Re: POC: GROUP BY optimization

Поиск

Список

Период

Сортировка

От	Andrei Lepikhov
Тема	Re: POC: GROUP BY optimization
Дата	28 декабря 2023 г. 08:22:47
Msg-id	75743fb8-42a1-4017-a108-adea6f027f96@postgrespro.ru обсуждение исходный текст
Ответ на	Re: POC: GROUP BY optimization (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: POC: GROUP BY optimization
Список	pgsql-hackers

Дерево обсуждения

On 27/12/2023 12:07, Tom Lane wrote:
> Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:
>> To be clear. In [1], I mentioned we can perform micro-benchmarks and
>> structure costs of operators. At least for fixed-length operators, it is
>> relatively easy.
> 
> I repeat what I said: this is a fool's errand.  You will not get
> trustworthy results even for the cases you measured, let alone
> all the rest.  I'd go as far as to say I would not believe your
> microbenchmarks, because they would only apply for one platform,
> compiler, backend build, phase of the moon, etc.

Thanks for the explanation.
I removed all cost-related codes. It still needs to be finished; I will 
smooth the code further and rewrite regression tests - many of them 
without cost-dependent reorderings look silly. Also, remember 
Alexander's remarks, which must be implemented, too.
But already here, it works well. Look:

Preliminaries:
CREATE TABLE t(x int, y int, z text, w int);
INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs
   FROM generate_series(1,10000) AS gs;
CREATE INDEX abc ON t(x,y);
ANALYZE t;
SET enable_hashagg = 'off';

This patch eliminates unneeded Sort operation:
explain SELECT x,y FROM t GROUP BY (x,y);
explain SELECT x,y FROM t GROUP BY (y,x);

Engages incremental sort:
explain SELECT x,y FROM t GROUP BY (x,y,z,w);
explain SELECT x,y FROM t GROUP BY (z,y,w,x);
explain SELECT x,y FROM t GROUP BY (w,z,x,y);
explain SELECT x,y FROM t GROUP BY (w,x,z,y);

Works with subqueries:
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1
GROUP BY (w,x,z,y);
explain SELECT x,y
FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1
GROUP BY (w,x,z,y);

But arrangement with an ORDER BY clause doesn't work:

DROP INDEX abc;
explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w);

I think the reason is that the sort_pathkeys and group_pathkeys are 
physically different structures, and we can't just compare pointers here.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

0001-Explore-alternative-orderings-of-group-by-pathkeys-d-20231228.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: POC: GROUP BY optimization

Вложения