Re: POC: GROUP BY optimization

Поиск
Список
Период
Сортировка
От Andrei Lepikhov
Тема Re: POC: GROUP BY optimization
Дата
Msg-id d026580a-15a0-4734-9e69-cc6ebb70da1d@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC: GROUP BY optimization  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: POC: GROUP BY optimization
Список pgsql-hackers
On 22/2/2024 13:35, Richard Guo wrote:
> The avg() function on integer argument is commonly used in
> aggregates.sql.  I don't think this is an issue.  See the first test
> query in aggregates.sql.
Make sense
>      > it should be parallel to the test cases for utilize the ordering of
>      > index scan and subquery scan.
> 
>     Also, I'm unsure about removing the disabling of the
>     max_parallel_workers_per_gather parameter. Have you discovered the
>     domination of the current plan over the partial one? Do the cost
>     fluctuations across platforms not trigger a parallel plan?
> 
> 
> The table used for testing contains only 100 tuples, which is the size
> of only one page.  I don't believe it would trigger any parallel plans,
> unless we manually change min_parallel_table_scan_size.
I don't intend to argue it, but just for the information, I frequently 
reduce it to zero, allowing PostgreSQL to make a decision based on 
costs. It sometimes works much better, because one small table in multi 
join can disallow an effective parallel plan.
> 
>     What's more, I suggest to address here the complaint from [1]. As I
>     see,
>     cost difference between Sort and IncrementalSort strategies in that
>     case
>     is around 0.5. To make the test more stable I propose to change it a
>     bit
>     and add a limit:
>     SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
>     It makes efficacy of IncrementalSort more obvious difference around 10
>     cost points.
> 
> 
> I don't think that's necessary.  With Incremental Sort the final cost
> is:
> 
>      GroupAggregate  (cost=1.66..19.00 rows=100 width=25)
> 
> while with full Sort it is:
> 
>      GroupAggregate  (cost=16.96..19.46 rows=100 width=25)
> 
> With the STD_FUZZ_FACTOR (1.01), there is no doubt that the first path
> is cheaper on total cost.  Not to say that even if somehow we decide the
> two paths are fuzzily the same on total cost, the first path still
> dominates because its startup cost is much cheaper.
As before, I won't protest here - it needs some computations about how 
much cost can be added by bulk extension of the relation blocks. If 
Maxim will answer that it's enough to resolve his issue, why not?

-- 
regards,
Andrei Lepikhov
Postgres Professional




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Add lookup table for replication slot invalidation causes
Следующее
От: Laurenz Albe
Дата:
Сообщение: Re: Speeding up COPY TO for uuids and arrays