Re: Slow Count-Distinct Query

От: Tom Lane
Тема: Re: Slow Count-Distinct Query
Дата: ,
(см: обсуждение, исходный текст)
Ответ на: Slow Count-Distinct Query  (Christopher Jackson)
Ответы: Re: Slow Count-Distinct Query  (Christopher Jackson)
Список: pgsql-performance

Скрыть дерево обсуждения

Slow Count-Distinct Query  (Christopher Jackson, )
 Re: Slow Count-Distinct Query  (Shaun Thomas, )
 Re: Slow Count-Distinct Query  (Tom Lane, )
  Re: Slow Count-Distinct Query  (Christopher Jackson, )
 Re: Slow Count-Distinct Query  (bricklen, )
  Re: Slow Count-Distinct Query  (Christopher Jackson, )
   Re: Slow Count-Distinct Query  (Michael Paquier, )
    Re: Slow Count-Distinct Query  (Christopher Jackson, )

Christopher Jackson <> writes:
>   tl;dr - How can I speed up my count-distinct query?

EXPLAIN doesn't provide a lot of visibility into what the Aggregate plan
node is doing, but in this case what it's doing is an internal sort/uniq
operation to implement the DISTINCT.  You didn't say what value of
work_mem you're using, but it'd need to be probably 50-100MB to prevent
that sort from spilling to disk (and therefore being slow).

Note that the indexscan is actually *slower* than the seqscan so far as
the table access is concerned; if the table were big enough to not fit
in RAM, this would get very much worse.  So I'm not impressed with trying
to force the optimizer's hand as you've done here --- it might be a nice
plan now, but it's brittle.  See if a bigger work_mem improves matters
enough with the regular plan.

> *I'm concerned about setting the enable_bitmapscan and seq_page_cost values
> because I'm not yet sure what the consequences are.  Can anyone enlighten
> me on the recommended way to speed up this query?*

Turning off enable_bitmapscan globally would be a seriously bad idea.
Changing the cost settings to these values globally might be all right;
it would amount to optimizing for all-in-memory cases, which might or
might not be a good idea for your situation.  For that matter, greatly
increasing work_mem globally is usually not thought to be smart either;
remember that it's a per-sort-operation setting so you may need to
provision a considerable multiple of the setting as physical RAM,
depending on how many queries you expect to run concurrently.  So all in
all you might be well advised to just set special values for this one
query, whichever solution approach you use.

I doubt you need the "where email=email" hack, in any case.  That isn't
forcing the optimizer's decision in any meaningful fashion.

            regards, tom lane

В списке pgsql-performance по дате сообщения:

От: Will Platnick
Сообщение: Re: Sudden crazy high CPU usage
От: Niels Kristian Schjødt
Сообщение: Re: Sudden crazy high CPU usage