Re: weird execution plan

Поиск
Список
Период
Сортировка
От David G Johnston
Тема Re: weird execution plan
Дата
Msg-id 1410557668457-5818905.post@n5.nabble.com
обсуждение исходный текст
Ответ на Re: weird execution plan  ("Huang, Suya" <Suya.Huang@au.experian.com>)
Ответы Re: weird execution plan
Список pgsql-performance
Huang, Suya wrote
> Both queries have been run several times so cache would have same effect
> on both of them?  Below is the plan with buffer information.

Not everyone does so its nice to make certain - especially since I'm not all
that familiar with the code involved.  But since no one else has answered I
will theorize.

SELECT count(*) FROM ( SELECT DISTINCT col FROM tbl )

vs

SELECT count(DISTINCT col) FROM tbl

The code for "SELECT DISTINCT col" is likely highly efficient because it
works on complete sets of records.

The code for "SELECT count(DISTINCT col)" is at a relative disadvantage
since it must evaluate one row at a time and remember whether it had seen
the same value previously before deciding whether to increment a counter.

With a large number of duplicate rows the process of making the row set
smaller before counting the end result will perform better since fewer rows
must be evaluated in the less efficient count(DISTINCT) expression - the
time saved there more than offset by the fact that you are effectively
passing over that subset of the data a second time.

HashAggregate(1M rows) + Aggregate(200k rows) < Aggregate(1M rows)

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/weird-execution-plan-tp5818730p5818905.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


В списке pgsql-performance по дате отправления:

Предыдущее
От: "Huang, Suya"
Дата:
Сообщение: Re: weird execution plan
Следующее
От: "Huang, Suya"
Дата:
Сообщение: Re: weird execution plan