Re: Huge Data sets, simple queries

От: Tom Lane
Тема: Re: Huge Data sets, simple queries
Дата: ,
Msg-id: 18925.1138474508@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: Huge Data sets, simple queries  (Tom Lane)
Список: pgsql-performance

Скрыть дерево обсуждения

Huge Data sets, simple queries  ("Mike Biamonte", )
 Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
 Re: Huge Data sets, simple queries  (Tom Lane, )
  Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
   Re: Huge Data sets, simple queries  (Tom Lane, )
    Re: Huge Data sets, simple queries  (Tom Lane, )
 Re: Huge Data sets, simple queries  ("Luke Lonergan", )
  Re: Huge Data sets, simple queries  (hubert depesz lubaczewski, )
   Re: Huge Data sets, simple queries  (Michael Stone, )
  Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
   Re: Huge Data sets, simple queries  ("Luke Lonergan", )
    Re: Huge Data sets, simple queries  (Kevin, )
    Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
     Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
        Re: Huge Data sets, simple queries  ("Jim C. Nasby", )
    Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
     Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  (PFC, )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  ("Steinar H. Gunderson", )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
      Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
       Re: Huge Data sets, simple queries  ("Luke Lonergan", )
        Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
         Re: Huge Data sets, simple queries  (PFC, )
          Re: Huge Data sets, simple queries  ("Luke Lonergan", )
           Re: Huge Data sets, simple queries  ("Steinar H. Gunderson", )
           Re: Huge Data sets, simple queries  (Mike Rylander, )
         Re: Huge Data sets, simple queries  ("Luke Lonergan", )
       Re: Huge Data sets, simple queries  (Michael Stone, )
     Re: Huge Data sets, simple queries  (Alan Stange, )
 Re: Huge Data sets, simple queries  ("Luke Lonergan", )
  Re: Huge Data sets, simple queries  ("Jeffrey W. Baker", )
  Re: Huge Data sets, simple queries  (Charles Sprickman, )
   Re: Huge Data sets, simple queries  ("Luke Lonergan", )
  Re: Huge Data sets, simple queries  (hubert depesz lubaczewski, )
   Re: Huge Data sets, simple queries  ("Luke Lonergan", )
 Re: Huge Data sets, simple queries  (Michael Adler, )
 Re: Huge Data sets, simple queries  ("Craig A. James", )

I wrote:
> (We might need to tweak the planner to discourage selecting
> HashAggregate in the presence of DISTINCT aggregates --- I don't
> remember whether it accounts for the sortmem usage in deciding
> whether the hash will fit in memory or not ...)

Ah, I take that all back after checking the code: we don't use
HashAggregate at all when there are DISTINCT aggregates, precisely
because of this memory-blow-out problem.

For both your group-by-date query and the original group-by-month query,
the plan of attack is going to be to read the original input in grouping
order (either via sort or indexscan, with sorting probably preferred
unless the table is pretty well correlated with the index) and then
sort/uniq on the DISTINCT value within each group.  The OP is probably
losing on that step compared to your test because it's over much larger
groups than yours, forcing some spill to disk.  And most likely he's not
got an index on month, so the first sort is in fact a sort and not an
indexscan.

Bottom line is that he's probably doing a ton of on-disk sorting
where you're not doing any.  This makes me think Luke's theory about
inadequate disk horsepower may be on the money.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: Huge Data sets, simple queries
От: hubert depesz lubaczewski
Дата:
Сообщение: Re: Huge Data sets, simple queries