Re: Fwd: Slow Count-Distinct Query

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Fwd: Slow Count-Distinct Query
Дата
Msg-id CAMkU=1y9RqS7E7xsVkEqrhJ8ZtzDu1Cj6uwC4p7uWpZ5c5U-JQ@mail.gmail.com
обсуждение исходный текст
Ответ на Fwd: Slow Count-Distinct Query  (Varadharajan Mukundan <srinathsmn@gmail.com>)
Ответы Re: Fwd: Slow Count-Distinct Query
Список pgsql-performance
On Fri, Apr 4, 2014 at 2:31 AM, Varadharajan Mukundan <srinathsmn@gmail.com> wrote:
Sorry that i just joined the list and have to break the thread to reply to Tom Lane's response on this @ http://www.postgresql.org/message-id/13741.1396275339@sss.pgh.pa.us


Note that the indexscan is actually *slower* than the seqscan so far as
the table access is concerned; if the table were big enough to not fit
in RAM, this would get very much worse.  So I'm not impressed with trying
to force the optimizer's hand as you've done here --- it might be a nice
plan now, but it's brittle.  See if a bigger work_mem improves matters
enough with the regular plan.

I agree to the point that hand tuning optimiser is brittle and something that should not be done. But the reason to that was to force the index-only scan (Not the index scan). I feel Index-only scan would speed up given postgres is row oriented and we are running count-distinct on a column in a table with lot of columns (Say 6-7 in number). I think that is what have contributed to the gain in performance. 


It looks like the original emailer wrote a query that the planner is not smart enough to plan properly (A known limitation of that kind of query).  He then made a bunch of changes, none of which worked.  He then re-wrote the query into a form for which the planner does a better job on.  What we do not know is, what would happen if he undoes all of those other changes and *just* uses the new form of the query?

 

I did a similar test with around 2 million tuples with work_mem = 250 MB and got the query to respond with 2x speed up. But the speed-up got with index-only scan was huge and response was in sub-seconds whereas with work_mem the response was couple of seconds.

This change is almost certainly due to the change from a sort to a hash aggregate, and nothing to do with the index-only scan at all.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Thom Brown
Дата:
Сообщение: Re: PGSQL 9.3 - Materialized View - multithreading
Следующее
От: PARIS Nicolas
Дата:
Сообщение: Re: PGSQL 9.3 - Materialized View - multithreading