Re:
От | Tom Lane |
---|---|
Тема | Re: |
Дата | |
Msg-id | 18267.1044924111@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: ("Scott Morrison" <smorrison@navtechinc.com>) |
Список | pgsql-novice |
"Scott Morrison" <smorrison@navtechinc.com> writes: > Query 1 (Original): > ANALYZE sample; > CREATE INDEX sample_id ON sample(id); > CREATE INDEX sample_date ON sample(date); > ANALYZE sample; > EXPLAIN ANALYZE > SELECT * FROM sample a WHERE (id,date) IN > (SELECT id,max(date) FROM sample > WHERE id=a.id AND date<='2003-02-07' > GROUP BY id); It's got to be possible to improve on that. The big problem is the IN, which is a very inefficient construct (and the upcoming 7.4 improvements won't help any, because they only address cases where the subselect is uncorrelated, ie, doesn't use variables from the outer query). But it's easy to get rid of the IN. First off, because the sub-select contains WHERE id = a.id, there is no need to return id from the sub-SELECT; it must be equal to the outer a.id, no? SELECT * FROM sample a WHERE date IN (SELECT max(date) FROM sample WHERE id=a.id AND date<='2003-02-07' GROUP BY id); For the same reason, we do not need GROUP BY in the sub-select, because only one group can be formed. This gives us SELECT * FROM sample a WHERE date IN (SELECT max(date) FROM sample WHERE id=a.id AND date<='2003-02-07'); and now we can see that the sub-select returns exactly one row, which means we can reduce IN to "=" SELECT * FROM sample a WHERE date = (SELECT max(date) FROM sample WHERE id=a.id AND date<='2003-02-07'); and now the sub-select is amenable to the LIMIT-and-index trick for implementing MAX() quickly: CREATE INDEX fooi on sample (id, date); SELECT * FROM sample a WHERE date = (SELECT date FROM sample WHERE id=a.id AND date<='2003-02-07' ORDER BY id DESC, date DESC LIMIT 1); Now this is a pretty decent subplan --- if you try it you should see a plan like Seq Scan on sample a (cost=0.00..4330.81 rows=5 width=8) Filter: (date = (subplan)) SubPlan -> Limit (cost=0.00..4.31 rows=1 width=8) -> Index Scan Backward using fooi on sample (cost=0.00..7.18 rows=2 width=8) Index Cond: ((id = $0) AND (date <= '2003-02-07'::date)) and there isn't any way to make it quicker: the index is taking us directly to the single value that we want. But the overall performance will still suck, because we are invoking the subplan once for every row of the table. And here's where you need to consider rearranging your schema. The only reason we need to scan all of the 'sample' table is that we are using the table rows as the source of potential values of 'id' to plug into the subquery. Are you willing to make a side table showing the allowed values of 'id'? (You could enforce that it's correct with a foreign key constraint on 'sample'.) If you are, then this would work: SELECT a.* FROM sample a, possible_ids p WHERE p.id = a.id AND date = (SELECT date FROM sample WHERE id = p.id AND date <= '2003-02-07' ORDER BY id DESC, date DESC LIMIT 1); Experimenting, I get a plan like Nested Loop (cost=0.00..13477.14 rows=25 width=12) -> Seq Scan on possible_ids p (cost=0.00..20.00 rows=1000 width=4) -> Index Scan using fooi on sample a (cost=0.00..9.13 rows=1 width=8) Index Cond: (("outer".id = a.id) AND (a.date = (subplan))) SubPlan -> Limit (cost=0.00..4.31 rows=1 width=8) -> Index Scan Backward using fooi on sample (cost=0.00..7.18 rows=2 width=8) Index Cond: ((id = $0) AND (date <= '2003-02-07'::date)) -> Limit (cost=0.00..4.31 rows=1 width=8) -> Index Scan Backward using fooi on sample (cost=0.00..7.18 rows=2 width=8) Index Cond: ((id = $0) AND (date <= '2003-02-07'::date)) which looks pretty promising. This should require just one subplan execution (one index probe) followed by an index probe into the 'sample' table, for each possible 'id' value. Assuming you have many fewer distinct 'id' values than you do sample rows, this should win big. (Note: the reason we see the subplan listed twice is an arcane implementation detail; in practice only one of the copies will get executed in this query.) regards, tom lane
В списке pgsql-novice по дате отправления: