Re: Optimizer on sort aggregate

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Optimizer on sort aggregate
Дата
Msg-id CAApHDvpG4FOxwH5dBExY=n1oJpUDJ=5ZA5RW3UAaF85EH5Dsbw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimizer on sort aggregate  (Feng Tian <ftian@vitessedata.com>)
Список pgsql-hackers
On Sat, Oct 18, 2014 at 2:25 PM, Feng Tian <ftian@vitessedata.com> wrote:
Hi, David,

Yes, switch sorting order would loose an interesting order so if user dictates order by t, i; planner need to resort to its cost model.   Estimating cardinality of groupby is a much bigger topic than this thread.


Well I don't want to jump in and make the idea more complex than it needs to be, More research on this would be really good!

Just to make my thoughts a bit more clear, I had thought that you'd likely use attwidth from pg_statistic to determine the average width of the column in order to know if you'd want to play about with the order or not. If not, then how would you know which column to put last in the group by if there happened to be 2 text type columns in the grouping list? 

Maybe this could be determined based on some ratio between stadistinct and stawidth in pg_statistic. If for example 2 columns had the same width, then you'd likely want to use the one with more distinct values estimated. Perhaps the heuristics for determining this would be more complex as, if you had an bigint with 1000 distinct values, then it perhaps would be better to group by that first before some int with, say 5 distinct values. Or maybe not? Some series of benchmarks might some sort of indication if there is any sort of magical tipping point to be sought after here. Of course the width alone might not be a great thing to base any benchmarks on as a multibyte text type with, for example, 6 in the stawidth column, would likely be slower to group by first than a bigint, which would have 8 in the stawidth column. People that had carefully written their group bys a certain way for performance might get upset if we made their query slower by messing with them too, so I guess the logic should likely only kick in if it's a clear win to swap the order.

Regards

David Rowley



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: [PATCH] add ssl_protocols configuration option
Следующее
От: David Rowley
Дата:
Сообщение: Re: [PATCH] Simplify EXISTS subqueries containing LIMIT