Обсуждение: LIMIT 1 poor query plan
Hail there,
--
Short question:
Why would pg optimizer choose a worst (slower) query plan for a
query with 'LIMIT 1' instead of, say, 'LIMIT 3'?
Complete scenario:
Query: 'SELECT * FROM a WHERE a.b_id = 42 ORDER BY created LIMIT 1'
- b_id is a FK to b;
- created is a datetime with the time of the creation of the row;
- both 'b' and 'created' are indexed separately
This query, with the LIMIT 1, uses the index on created, which is much
slower (10x) than if it used the index on b_id
If I change the LIMIT from 1 to 3 pg starts using the index on b_id.
Already tried running REINDEX and VACUUM ANALYZE on both A and B.
Nothing changed.
Why does this happen?
Is there any way I can hint/force the optimizer to use b_id index?
Thanks
Marcio Ribeiro
Marcio Ribeiro <mribeiro@gmail.com> writes: > Short question: > Why would pg optimizer choose a worst (slower) query plan for a > query with 'LIMIT 1' instead of, say, 'LIMIT 3'? > Complete scenario: > Query: 'SELECT * FROM a WHERE a.b_id = 42 ORDER BY created LIMIT 1' > - b_id is a FK to b; > - created is a datetime with the time of the creation of the row; > - both 'b' and 'created' are indexed separately > This query, with the LIMIT 1, uses the index on created, which is much > slower (10x) than if it used the index on b_id It's trying to avoid a sort; or to be less anthropomorphic, the estimated cost of scanning the "created" index until it hits the first row with b_id=42 is less than the estimated cost of collecting all the rows with b_id=42 and then sorting them by "created". The estimates unfortunately are kind of shaky because it's hard to predict how many rows will get skipped before finding one with b_id=42. If you do this type of query often enough to care about its performance, you could consider creating a two-column index on (b_id, created) (in that order). regards, tom lane
Yes, the composite index nailed it.
Thanks mate :)
On Sat, Oct 10, 2015 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marcio Ribeiro <mribeiro@gmail.com> writes:
> Short question:
> Why would pg optimizer choose a worst (slower) query plan for a
> query with 'LIMIT 1' instead of, say, 'LIMIT 3'?
> Complete scenario:
> Query: 'SELECT * FROM a WHERE a.b_id = 42 ORDER BY created LIMIT 1'
> - b_id is a FK to b;
> - created is a datetime with the time of the creation of the row;
> - both 'b' and 'created' are indexed separately
> This query, with the LIMIT 1, uses the index on created, which is much
> slower (10x) than if it used the index on b_id
It's trying to avoid a sort; or to be less anthropomorphic, the estimated
cost of scanning the "created" index until it hits the first row with
b_id=42 is less than the estimated cost of collecting all the rows with
b_id=42 and then sorting them by "created". The estimates unfortunately
are kind of shaky because it's hard to predict how many rows will get
skipped before finding one with b_id=42.
If you do this type of query often enough to care about its performance,
you could consider creating a two-column index on (b_id, created)
(in that order).
regards, tom lane
Marcio Ribeiro