Re: Huge Data sets, simple queries

From Tom Lane
Subject Re: Huge Data sets, simple queries
Date
Msg-id 18925.1138474508@sss.pgh.pa.us
In response to Re: Huge Data sets, simple queries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I wrote:
> (We might need to tweak the planner to discourage selecting
> HashAggregate in the presence of DISTINCT aggregates --- I don't
> remember whether it accounts for the sortmem usage in deciding
> whether the hash will fit in memory or not ...)

Ah, I take that all back after checking the code: we don't use
HashAggregate at all when there are DISTINCT aggregates, precisely
because of this memory-blow-out problem.

For both your group-by-date query and the original group-by-month query,
the plan of attack is going to be to read the original input in grouping
order (either via sort or indexscan, with sorting probably preferred
unless the table is pretty well correlated with the index) and then
sort/uniq on the DISTINCT value within each group.  The OP is probably
losing on that step compared to your test because it's over much larger
groups than yours, forcing some spill to disk.  And most likely he's not
got an index on month, so the first sort is in fact a sort and not an
indexscan.
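
To make the two-level sorting concrete, here is a rough Python sketch of the strategy described above. It is only an illustration, not the executor's actual code: the function name count_distinct_per_group and the (group_key, value) row layout are made up for the example, and everything here happens in memory, whereas the real sorts can spill to disk once they outgrow the sort memory.

```python
from itertools import groupby
from operator import itemgetter


def count_distinct_per_group(rows):
    """Hypothetical helper: rows is an iterable of (group_key, value) pairs.

    Returns a list of (group_key, number_of_distinct_values) in key order.
    """
    # Step 1: get the input into grouping order. This stands in for the
    # initial sort (or an indexscan on the grouping column).
    ordered = sorted(rows, key=itemgetter(0))

    results = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Step 2: sort the DISTINCT column within this one group ...
        values = sorted(value for _, value in group)
        # ... then "uniq": count each value that differs from its predecessor.
        distinct = sum(
            1 for i, v in enumerate(values) if i == 0 or v != values[i - 1]
        )
        results.append((key, distinct))
    return results


if __name__ == "__main__":
    sample = [
        ("2006-01", 7),
        ("2006-02", 3),
        ("2006-01", 7),
        ("2006-01", 9),
        ("2006-02", 3),
    ]
    print(count_distinct_per_group(sample))
```

Note that the per-group sort in step 2 grows with the size of each group, which is why grouping by month means much larger inner sorts than grouping by date.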

Bottom line is that he's probably doing a ton of on-disk sorting
where you're not doing any.  This makes me think Luke's theory about
inadequate disk horsepower may be on the money.

            regards, tom lane
