Re: POC: GROUP BY optimization

Поиск

Список

Период

Сортировка

От	Andrei Lepikhov
Тема	Re: POC: GROUP BY optimization
Дата	2 февраля 2024 г. 04:40:12
Msg-id	f5dae142-d351-4991-a2aa-92d12dcea004@postgrespro.ru обсуждение
Ответ на	Re: POC: GROUP BY optimization (Richard Guo <guofenglinux@gmail.com>)
Ответы	Re: POC: GROUP BY optimization
Список	pgsql-hackers

Дерево обсуждения

On 2/2/2024 11:06, Richard Guo wrote:
> 
> On Fri, Feb 2, 2024 at 11:32 AM Richard Guo <guofenglinux@gmail.com 
> <mailto:guofenglinux@gmail.com>> wrote:
> 
>     On Fri, Feb 2, 2024 at 10:02 AM Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>         One of the test cases added by this commit has not been very
>         stable in the buildfarm.  Latest example is here:
> 
>         https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04
<https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-01%2021%3A28%3A04>
> 
>         and I've seen similar failures intermittently on other machines.
> 
>         I'd suggest building this test atop a table that is more stable
>         than pg_class.  You're just waving a red flag in front of a bull
>         if you expect stable statistics from that during a regression run.
>         Nor do I see any particular reason for pg_class to be especially
>         suited to the test.
> 
> 
>     Yeah, it's not a good practice to use pg_class in this place.  While
>     looking through the test cases added by this commit, I noticed some
>     other minor issues that are not great.  Such as
> 
>     * The table 'btg' is inserted with 10000 tuples, which seems a bit
>     expensive for a test.  I don't think we need such a big table to test
>     what we want.
> 
>     * I don't see why we need to manipulate GUC max_parallel_workers and
>     max_parallel_workers_per_gather.
> 
>     * I think we'd better write the tests with the keywords being all upper
>     or all lower.  A mixed use of upper and lower is not great. Such as in
> 
>          explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> 
>     * Some comments for the test queries are not easy to read.
> 
>     * For this statement
> 
>          CREATE INDEX idx_y_x_z ON btg(y,x,w);
> 
>     I think the index name would cause confusion.  It creates an index on
>     columns y, x and w, but the name indicates an index on y, x and z.
> 
>     I'd like to write a draft patch for the fixes.
> 
> 
> Here is the draft patch that fixes the issues I complained about in
> upthread.
I passed through the patch. Looks like it doesn't break anything. Why do 
you prefer to use count(*) in EXPLAIN instead of plain targetlist, like 
"SELECT x,y,..."?
Also, according to the test mentioned by Tom:
1. I see, PG uses IndexScan on (x,y). So, column x will be already 
sorted before the MergeJoin. Why not use Incremental Sort on (x,z,w) 
instead of full sort?
2. For memo, IMO, this test shows us the future near perspective of this 
feature: It is cheaper to use grouping order (w,z) instead of (z,w).

-- 
regards,
Andrei Lepikhov
Postgres Professional

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: POC: GROUP BY optimization