Re: POC: GROUP BY optimization
От | Andrei Lepikhov |
---|---|
Тема | Re: POC: GROUP BY optimization |
Дата | |
Msg-id | 75743fb8-42a1-4017-a108-adea6f027f96@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: POC: GROUP BY optimization (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: POC: GROUP BY optimization
|
Список | pgsql-hackers |
On 27/12/2023 12:07, Tom Lane wrote: > Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes: >> To be clear. In [1], I mentioned we can perform micro-benchmarks and >> structure costs of operators. At least for fixed-length operators, it is >> relatively easy. > > I repeat what I said: this is a fool's errand. You will not get > trustworthy results even for the cases you measured, let alone > all the rest. I'd go as far as to say I would not believe your > microbenchmarks, because they would only apply for one platform, > compiler, backend build, phase of the moon, etc. Thanks for the explanation. I removed all cost-related codes. It still needs to be finished; I will smooth the code further and rewrite regression tests - many of them without cost-dependent reorderings look silly. Also, remember Alexander's remarks, which must be implemented, too. But already here, it works well. Look: Preliminaries: CREATE TABLE t(x int, y int, z text, w int); INSERT INTO t SELECT gs%100,gs%100, 'abc' || gs%10, gs FROM generate_series(1,10000) AS gs; CREATE INDEX abc ON t(x,y); ANALYZE t; SET enable_hashagg = 'off'; This patch eliminates unneeded Sort operation: explain SELECT x,y FROM t GROUP BY (x,y); explain SELECT x,y FROM t GROUP BY (y,x); Engages incremental sort: explain SELECT x,y FROM t GROUP BY (x,y,z,w); explain SELECT x,y FROM t GROUP BY (z,y,w,x); explain SELECT x,y FROM t GROUP BY (w,z,x,y); explain SELECT x,y FROM t GROUP BY (w,x,z,y); Works with subqueries: explain SELECT x,y FROM (SELECT * FROM t ORDER BY x,y,w,z) AS q1 GROUP BY (w,x,z,y); explain SELECT x,y FROM (SELECT * FROM t ORDER BY x,y,w,z LIMIT 100) AS q1 GROUP BY (w,x,z,y); But arrangement with an ORDER BY clause doesn't work: DROP INDEX abc; explain SELECT x,w,z FROM t GROUP BY (w,x,z) ORDER BY (x,z,w); I think the reason is that the sort_pathkeys and group_pathkeys are physically different structures, and we can't just compare pointers here. -- regards, Andrei Lepikhov Postgres Professional
Вложения
В списке pgsql-hackers по дате отправления: