equivalent queries lead to different query plans for self-joins with group by?

Поиск
Список
Период
Сортировка
От Ben
Тема equivalent queries lead to different query plans for self-joins with group by?
Дата
Msg-id 5D061DAC-31E2-4C47-AE3B-7799118C57D3@gmail.com
обсуждение исходный текст
Ответы Re: equivalent queries lead to different query plans for self-joins with group by?
Список pgsql-performance
dear pgers --

consider the following toy example (using pg 8.4.3) :

create temporary table foo (
  ts timestamp not null,
  id integer not null,
  val double precision not null,
  primary key (ts, id)
);

i might want to return the vals, minus the averages at each timestamp.  the obvious self-join results in a sequential
scanover foo -- we aggregate the average val for EVERY timestamp, then join against the timestamps we want. 

us_quotedb=# explain select ts, id, val - aval from foo join (select ts, avg(val) as aval from foo group by ts) as a
using(ts) where ts > '2010-11-11' and ts < '2010-11-13';
                                   
                                                                        QUERY PLAN

                                 

-----------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                      
 Hash Join  (cost=49.06..54.41 rows=8 width=28)

                                  
   Hash Cond: (pg_temp_2.foo.ts = pg_temp_2.foo.ts)

                                  
   ->  HashAggregate  (cost=34.45..36.95 rows=200 width=16)

                                  
         ->  Seq Scan on foo  (cost=0.00..26.30 rows=1630 width=16)

                                  
   ->  Hash  (cost=14.51..14.51 rows=8 width=20)

                                 
         ->  Bitmap Heap Scan on foo  (cost=4.33..14.51 rows=8 width=20)

                                 
               Recheck Cond: ((ts > '2010-11-11 00:00:00'::timestamp without time zone) AND (ts < '2010-11-13
00:00:00'::timestampwithout time zone))
                                           
               ->  Bitmap Index Scan on foo_pkey  (cost=0.00..4.33 rows=8 width=0)

                                 
                     Index Cond: ((ts > '2010-11-11 00:00:00'::timestamp without time zone) AND (ts < '2010-11-13
00:00:00'::timestampwithout time zone))
                                       

on the other hand, if i specify "which" timestamp i'm restricting, it appears to do the right thing:

us_quotedb=# explain select ts, id, val - aval from foo join (select ts, avg(val) as aval from foo group by ts) as a
using(ts) where a.ts > '2010-11-11' and a.ts < '2010-11-13';
                                   
                                                                        QUERY PLAN

                                 

-----------------------------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                      
 Nested Loop  (cost=18.86..29.14 rows=8 width=28)

                                  
   ->  HashAggregate  (cost=14.55..14.56 rows=1 width=16)

                                  
         ->  Bitmap Heap Scan on foo  (cost=4.33..14.51 rows=8 width=16)

                                 
               Recheck Cond: ((ts > '2010-11-11 00:00:00'::timestamp without time zone) AND (ts < '2010-11-13
00:00:00'::timestampwithout time zone))
                                           
               ->  Bitmap Index Scan on foo_pkey  (cost=0.00..4.33 rows=8 width=0)

                                 
                     Index Cond: ((ts > '2010-11-11 00:00:00'::timestamp without time zone) AND (ts < '2010-11-13
00:00:00'::timestampwithout time zone))
                                       
   ->  Bitmap Heap Scan on foo  (cost=4.31..14.45 rows=8 width=20)

                                 
         Recheck Cond: (pg_temp_2.foo.ts = pg_temp_2.foo.ts)

                                 
         ->  Bitmap Index Scan on foo_pkey  (cost=0.00..4.31 rows=8 width=0)

                                 
               Index Cond: (pg_temp_2.foo.ts = pg_temp_2.foo.ts)

                                 

i find this behavior curious.  my understanding is that both queries are equivalent, and i would expect that the query
plannerwould be able to choose either of those plans.  this is important -- with the real data i'm working with, the
tableis very large, and the sequential scan is a killer.  

are these queries equivalent, or am i mistaken?  if the planner distinguishes between these plans, how do i ensure that
whereclause restrictions propagate (correctly) to subqueries? 

best regards, ben


В списке pgsql-performance по дате отправления:

Предыдущее
От: Kenneth Marshall
Дата:
Сообщение: Re: anti-join chosen even when slower than old plan
Следующее
От: Tom Lane
Дата:
Сообщение: Re: equivalent queries lead to different query plans for self-joins with group by?