between not propated into a simple equality join

Поиск
Список
Период
Сортировка
От Benedikt Grundmann
Тема between not propated into a simple equality join
Дата
Msg-id CADbMkNM7qiD6Hp5bAsa=Gc04e13a-=Ku4Uby=yaNG5WxZ=WVgA@mail.gmail.com
обсуждение исходный текст
Ответы Re: between not propated into a simple equality join  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-hackers
We just run into a very simple query that the planner does much worse on than we thought it would (in production the table in question is ~ 100 GB).  It surprised us given the planner is generally quite good, so I thought I share our surprise

Setup:

postgres_prod@proddb_testing=# select version();[1]
                                                    version                                                     
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

Time: 69.246 ms

postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i int);   
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3                            
  (select current_date-(s.idx/1000), s.idx from generate_series(1,1000000) as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms

The bad behavior:

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i     
    from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date and td1.i = td2.i
  ) foo
  where the_date between current_date and current_date;
                                                                   QUERY PLAN                                          
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Join  (cost=55.49..21980.50 rows=1 width=8) (actual time=0.336..179.374 rows=999 loops=1)
   Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
   ->  Seq Scan on toy_data3 td2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..72.510 rows=1000000 lo
   ->  Hash  (cost=40.44..40.44 rows=1003 width=8) (actual time=0.321..0.321 rows=999 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 40kB
         ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
               Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <= ('now'::cstring)::date))
 Total runtime: 179.440 ms
(8 rows)

Time: 246.094 ms

Notice the red.  Which is sad because one would like it to realize that it could propagate the index constraint onto td2.  That is on both sides of the join do the green.

As it does correctly when one explicitly uses equality (bold below) (but of course we sometimes have multiple day ranges in production and we only used a single date range above to make it extra interesting for the planner to NOT do a seqscan):

postgres_prod@proddb_testing=# explain analyze
  select * from (
   select td1.the_date, td1.i     
    from toy_data3 td1, toy_data3 td2  where td1.the_date = td2.the_date and td1.i = td2.i        ) foo
  where the_date = current_date;
                                                                   QUERY PLAN                                          
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Join  (cost=50.47..92.17 rows=1 width=8) (actual time=0.300..0.652 rows=999 loops=1)
   Hash Cond: (td1.i = td2.i)
   ->  Index Scan using toy_data_date3 on toy_data3 td1  (cost=0.00..37.93 rows=1003 width=8) (actual time=0.023..0.169
         Index Cond: (the_date = ('now'::cstring)::date)
   ->  Hash  (cost=37.93..37.93 rows=1003 width=8) (actual time=0.270..0.270 rows=999 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 40kB
         ->  Index Scan using toy_data_date3 on toy_data3 td2  (cost=0.00..37.93 rows=1003 width=8) (actual time=0.007.
               Index Cond: (the_date = ('now'::cstring)::date)
 Total runtime: 0.713 ms
(9 rows)

Time: 66.904 ms

Cheers,

Bene

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: parallel.c is not marked as test covered
Следующее
От: Robert Haas
Дата:
Сообщение: Re: parallel.c is not marked as test covered