Re: POC: GROUP BY optimization

From: Gavin Flower
Subject: Re: POC: GROUP BY optimization
Date:
Msg-id: 0b3ee377-2f67-f58d-6177-bc2ed4626499@archidevsys.co.nz
In response to: Re: POC: GROUP BY optimization  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: POC: GROUP BY optimization  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
On 30/06/18 03:03, Tomas Vondra wrote:
> On 06/29/2018 04:51 PM, Teodor Sigaev wrote:
>>
>>>> I tried to attack the cost_sort() issues, and I hope that on that
>>>> basis we can solve the problems with the 0002 patch and improve the
>>>> incremental sort patch.
>>>>
>>>
>>> OK, will do. Thanks for working on this!
>>
>> I hope we now have a better cost_sort(). The obvious way is to try
>> all combinations of pathkeys in get_cheapest_group_keys_order() and
>> choose the cheapest one by cost_sort().
>
>> But that requires N! operations, which could potentially be very
>> expensive for a large number of pathkeys, and it doesn't solve the
>> issue with user-knows-what-he-does pathkeys.
>
> Not sure. There are N! combinations, but this seems like a good
> candidate for backtracking [1]. You don't have to enumerate and
> evaluate all N! combinations; just construct one and then abandon
> whole classes of combinations as soon as they get more expensive than
> the currently best one. That's thanks to the additive nature of the
> comparison costing: appending a column to the sort key can only make
> it more expensive. My guess is this will make this a non-issue.
>
> [1] https://en.wikipedia.org/wiki/Backtracking
>
>>
>> We could suggest an order of pathkeys as the patch does now, and if
>> the cost estimated by cost_sort() is less than 80% (arbitrarily
>> chosen) of the cost of the user-suggested pathkeys, use ours;
>> otherwise use the user's pathkeys.
>>
>
> I really despise such arbitrary thresholds. I'd much rather use a more
> reliable heuristic by default, even if it gets it wrong in some cases
> (which it will, but that's natural).
>
> regards
>
Additionally, put an upper limit on the number of combinations to
check, fairly large by default?

If the first threshold is exceeded, we could consider checking a few
more orderings selected at random from those not yet checked, to avoid
any bias caused by stopping a systematic search. This might prove
important when N! is fairly large.
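To make the backtracking idea concrete, here is a minimal sketch (in
Python, purely for illustration; the real code would live in the C
planner). `cost_fn` stands in for the cost_sort() estimate on a key
prefix, and `max_evals` is a hypothetical cap corresponding to the
upper-limit threshold suggested above. The pruning relies only on the
additive property Tomas describes: appending a column can never make
the sort cheaper.

```python
def cheapest_key_order(keys, cost_fn, max_evals=10000):
    """Branch-and-bound search over orderings of sort keys.

    cost_fn(prefix) estimates the sort cost of ordering by `prefix`.
    Since appending a column can only increase the cost, any prefix
    already costing at least as much as the best complete ordering
    found so far can be abandoned together with all its extensions.
    max_evals is a hypothetical cap on cost evaluations.
    """
    best_cost = float("inf")
    best_order = list(keys)
    evals = 0

    def recurse(prefix, remaining):
        nonlocal best_cost, best_order, evals
        if evals >= max_evals:
            return                  # give up the systematic search
        evals += 1
        cost = cost_fn(prefix)
        if cost >= best_cost:
            return                  # prune: every extension costs at least this
        if not remaining:
            best_cost, best_order = cost, list(prefix)
            return
        for i, key in enumerate(remaining):
            recurse(prefix + [key], remaining[:i] + remaining[i + 1:])

    recurse([], list(keys))
    return best_order, best_cost
```

With a toy additive cost model (per-key weight times position), the
search finds the cheapest ordering while skipping most of the N!
permutations once a good candidate is known.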


Cheers,
Gavin


