Re: weird execution plan

Поиск
Список
Период
Сортировка
От Huang, Suya
Тема Re: weird execution plan
Дата
Msg-id D83E55F5F4D99B4A9B4C4E259E6227CD01EBF653@AUX1EXC02.apac.experian.local
обсуждение исходный текст
Ответ на Re: weird execution plan  (David G Johnston <david.g.johnston@gmail.com>)
Список pgsql-performance
-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of David G
Johnston
Sent: Saturday, September 13, 2014 7:34 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] weird execution plan

>Not everyone does so its nice to make certain - especially since I'm not all that familiar with the code involved.
Butsince no one else has answered I will theorize. 
>
>SELECT count(*) FROM ( SELECT DISTINCT col FROM tbl )
>
>vs
>
>SELECT count(DISTINCT col) FROM tbl
>
>The code for "SELECT DISTINCT col" is likely highly efficient because it works on complete sets of records.
>
>The code for "SELECT count(DISTINCT col)" is at a relative disadvantage since it must evaluate one row at a time and
rememberwhether it had seen the same value previously before deciding whether to >increment a counter. 
>
>With a large number of duplicate rows the process of making the row set smaller before counting the end result will
performbetter since fewer rows must be evaluated in the less efficient count(DISTINCT) >expression - the time saved
theremore than offset by the fact that you are effectively passing over that subset of the data a second time. 
>
>HashAggregate(1M rows) + Aggregate(200k rows) < Aggregate(1M rows)
>
>David J.

Thanks David!

I am so surprised to the findings you put here. Just did an explain plan on the example you gave and pasted the result
below,you're correct.  

"Select count(distinct col1)" is really  a very common SQL statement we write daily, in Postgres, we need to rewrite it
sothat the aggregate doesn't happen on a very large data sets... I am wondering if this is something to be improved
fromthe optimizer ifself, instead of developers to rewrite SQL. Like having the optimizer just do the counting in the
endinstead of doing it each time.  I used Oracle before, never saw this issue... 

But really thank you for pointing this out, very valuable lesson-learnt in PG SQL writing for me and our developers.

dev=# explain analyze select count(*) from (select distinct wid from terms_weekly) foo;
                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1278656.00..1278656.01 rows=1 width=0) (actual time=24316.335..24316.336 rows=1 loops=1)
   ->  HashAggregate  (cost=1278651.50..1278653.50 rows=200 width=42) (actual time=23899.916..24242.010 rows=1298124
loops=1)
         ->  Append  (cost=0.00..1171738.20 rows=42765321 width=42) (actual time=0.028..13631.898 rows=42765320
loops=1)
               ->  Seq Scan on search_terms_weekly  (cost=0.00..0.00 rows=1 width=516) (actual time=0.001..0.001 rows=0
loops=1)
               ->  Seq Scan on search_terms_weekly_20140503  (cost=0.00..293352.90 rows=10702190 width=42) (actual
time=0.026..2195.460rows=10702190 loops=1) 
               ->  Seq Scan on search_terms_weekly_20140510  (cost=0.00..298773.53 rows=10878953 width=42) (actual
time=8.244..3163.087rows=10878953 loops=1) 
               ->  Seq Scan on search_terms_weekly_20140517  (cost=0.00..288321.17 rows=10537717 width=41) (actual
time=7.345..2520.531rows=10537717 loops=1) 
               ->  Seq Scan on search_terms_weekly_20140524  (cost=0.00..291290.60 rows=10646460 width=41) (actual
time=8.543..2693.833rows=10646460 loops=1) 
 Total runtime: 24333.830 ms
(9 rows)

dev=# explain analyze select count(distinct wid) from terms_weekly;



                                                                       QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1278651.50..1278651.51 rows=1 width=42) (actual time=585774.511..585774.511 rows=1 loops=1)
   ->  Append  (cost=0.00..1171738.20 rows=42765321 width=42) (actual time=0.019..10656.782 rows=42765320 loops=1)
         ->  Seq Scan on search_terms_weekly  (cost=0.00..0.00 rows=1 width=516) (actual time=0.002..0.002 rows=0
loops=1)
         ->  Seq Scan on search_terms_weekly_20140503  (cost=0.00..293352.90 rows=10702190 width=42) (actual
time=0.017..2225.397rows=10702190 loops=1) 
         ->  Seq Scan on search_terms_weekly_20140510  (cost=0.00..298773.53 rows=10878953 width=42) (actual
time=0.009..2244.918rows=10878953 loops=1) 
         ->  Seq Scan on search_terms_weekly_20140517  (cost=0.00..288321.17 rows=10537717 width=41) (actual
time=0.008..1822.088rows=10537717 loops=1) 
         ->  Seq Scan on search_terms_weekly_20140524  (cost=0.00..291290.60 rows=10646460 width=41) (actual
time=0.006..1561.229rows=10646460 loops=1) 
 Total runtime: 585774.568 ms
(8 rows)


В списке pgsql-performance по дате отправления:

Предыдущее
От: David G Johnston
Дата:
Сообщение: Re: weird execution plan
Следующее
От: "Van Der Berg, Stefan"
Дата:
Сообщение: Strange performance problem with query