Re: between not propated into a simple equality join
От | David G. Johnston |
---|---|
Тема | Re: between not propated into a simple equality join |
Дата | |
Msg-id | CAKFQuwbHAD_KgaWbq6q6_RvnV0FEjGKhYCS28BBXSkGOo=Fjyg@mail.gmail.com обсуждение исходный текст |
Ответ на | between not propated into a simple equality join (Benedikt Grundmann <bgrundmann@janestreet.com>) |
Ответы |
Re: between not propated into a simple equality join
(David Rowley <david.rowley@2ndquadrant.com>)
|
Список | pgsql-hackers |
We just run into a very simple query that the planner does much worse on than we thought it would (in production the table in question is ~ 100 GB). It surprised us given the planner is generally quite good, so I thought I share our surpriseSetup:postgres_prod@proddb_testing=# select version();[1]version────────────────────────────────────────────────────────────────────────────────────────────────────────────────PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit(1 row)Time: 69.246 mspostgres_prod@proddb_testing=# create table toy_data3 (the_date date, i int);CREATE TABLETime: 67.096 mspostgres_prod@proddb_testing=# insert into toy_data3(select current_date-(s.idx/1000), s.idx from generate_series(1,1000000) as s(idx));INSERT 0 1000000Time: 1617.483 mspostgres_prod@proddb_testing=# create index toy_data_date3 on toy_data3(the_date);CREATE INDEXTime: 660.166 mspostgres_prod@proddb_testing=# analyze toy_data3;ANALYZETime: 294.984 msThe bad behavior:postgres_prod@proddb_testing=# explain analyzeselect * from (select td1.the_date, td1.ifrom toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date and td1.i = td2.i) foowhere the_date between current_date and current_date;QUERY PLAN───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────Hash Join (cost=55.49..21980.50 rows=1 width=8) (actual time=0.336..179.374 rows=999 loops=1)Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))-> Seq Scan on toy_data3 td2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..72.510 rows=1000000 lo-> Hash (cost=40.44..40.44 rows=1003 width=8) (actual time=0.321..0.321 rows=999 loops=1)Buckets: 1024 Batches: 1 Memory Usage: 40kB-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <= ('now'::cstring)::date))Total runtime: 179.440 ms(8 rows)Time: 246.094 msNotice the red. Which is sad because one would like it to realize that it could propagate the index constraint onto td2. That is on both sides of the join do the green.
FWIW
This is my plan result:
version
PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
All default settings
using "BETWEEN"
QUERY PLANNested Loop (cost=0.86..48.91 rows=1 width=8) (actual time=0.042..168.512 rows=999 loops=1)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..8.46 rows=1 width=8) (actual time=0.022..1.388 rows=999 loops=1)
Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <= ('now'::cstring)::date))
-> Index Scan using toy_data_date3 on toy_data3 td2 (cost=0.42..40.44 rows=1 width=8) (actual time=0.078..0.160 rows=1 loops=999)
Index Cond: (the_date = td1.the_date)
Filter: (td1.i = i)
Rows Removed by Filter: 998
Planning time: 0.353 ms
Execution time: 169.692 ms
using "="
QUERY PLAN
Hash Join (cost=49.89..90.46 rows=1 width=8) (actual time=2.320..5.652 rows=999 loops=1)
Hash Cond: (td1.i = td2.i)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.43..37.37 rows=967 width=8) (actual time=0.014..1.168 rows=999 loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
-> Hash (cost=37.37..37.37 rows=967 width=8) (actual time=2.292..2.292 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 48kB
-> Index Scan using toy_data_date3 on toy_data3 td2 (cost=0.43..37.37 rows=967 width=8) (actual time=0.008..1.183 rows=999 loops=1)
Index Cond: (the_date = ('now'::cstring)::date)
Planning time: 0.326 ms
Execution time: 6.673 ms
I was hoping to be able to say more but alas cannot find the words.
I'm surprised by the estimate of 1 rows for the td1 index scan in my 9.5 query - and also why the 9.2 query would choose to sequential scan hash join in favor of what seems to be a superior index scan nested loop on a fraction of the table.
The fact that the between doesn't get transitively applied to td2 through the td1=td2 condition, not as much...though whether the limitation is due to theory or implementation I do not know.
I do suspect that at least part of the issue is that the computation of "the_date" makes the two columns highly correlated while the planner assumes independence.
David J.
В списке pgsql-hackers по дате отправления: