Обсуждение: Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

Поиск
Список
Период
Сортировка

Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

От
"John Surcombe"
Дата:

Hello,

 

We are using PostgreSQL 9.0.3, compiled by Visual C++ build 1500, 32-bit, installed on Windows 2003 R2 32-bit.

 

We have an ‘aisposition’ table used for a GPS tracking application, containing ~30 million rows and a number of indexes.  Two of these are:

 

idx_receiveddatetime: indexes aisposition(receiveddatetime timestamp)

 

idx_userid_receiveddatetime: indexes aisposition(userid int4 desc, receiveddatetime timestamp desc)

 

The problem we get is that the following query is taking many minutes to run:

 

select * from aisposition where userid = 311369000 order by userid desc, receiveddatetime desc limit 1

 

When we ‘EXPLAIN’ this query, PostgreSQL says it is using the index idx_receiveddatetime.  The way the application is designed means that in virtually all cases the query will have to scan a very long way into idx_receiveddatetime to find the first record where userid = 311369000.  If however we delete the idx_receiveddatetime index, the query uses the idx_userid_receiveddatetime index, and the query only takes a few milliseconds.

 

The EXPLAIN ANALYZE output with idx_receiveddatetime in place is:

 

Limit  (cost=0.00..1.30 rows=1 width=398) (actual time=1128097.540..1128097.541 rows=1 loops=1)

  ->  Index Scan Backward using idx_receiveddatetime on aisposition  (cost=0.00..2433441.05 rows=1875926 width=398) (actual time=1128097.532..1128097.532 rows=1 loops=1)

        Filter: (userid = 311369000)

Total runtime: 1128097.609 ms

 

And with that index deleted:

 

Limit  (cost=0.00..4.01 rows=1 width=398) (actual time=60.633..60.634 rows=1 loops=1)

  ->  Index Scan using idx_userid_receiveddatetime on aisposition  (cost=0.00..7517963.47 rows=1875926 width=398) (actual time=60.629..60.629 rows=1 loops=1)

        Index Cond: (userid = 311369000)

Total runtime: 60.736 ms

 

We would obviously prefer PostgreSQL to use the idx_userid_receiveddatetime index in all cases, because we know that this will guarantee results in a timely manner, whereas using idx_receiveddatetime will usually require a scan of much of the table and our application will not work.  What are we doing wrong?

 

Cheers now,

John

Re: Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

От
Tom Lane
Дата:
"John Surcombe" <John.Surcombe@digimap.gg> writes:
> When we 'EXPLAIN' this query, PostgreSQL says it is using the index
> idx_receiveddatetime.  The way the application is designed means that in
> virtually all cases the query will have to scan a very long way into
> idx_receiveddatetime to find the first record where userid = 311369000.
> If however we delete the idx_receiveddatetime index, the query uses the
> idx_userid_receiveddatetime index, and the query only takes a few
> milliseconds.

That's just bizarre ... it knows the index is applicable, and the cost
estimates clearly favor the better index, so why did it pick the worse
one?

I tried to duplicate this locally, without success, so there's some
contributing factor you've neglected to mention.  Can you put together a
self-contained test case that acts like this?

            regards, tom lane

Re: Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

От
Tom Lane
Дата:
I wrote:
> "John Surcombe" <John.Surcombe@digimap.gg> writes:
>> When we 'EXPLAIN' this query, PostgreSQL says it is using the index
>> idx_receiveddatetime.  The way the application is designed means that in
>> virtually all cases the query will have to scan a very long way into
>> idx_receiveddatetime to find the first record where userid = 311369000.
>> If however we delete the idx_receiveddatetime index, the query uses the
>> idx_userid_receiveddatetime index, and the query only takes a few
>> milliseconds.

> That's just bizarre ... it knows the index is applicable, and the cost
> estimates clearly favor the better index, so why did it pick the worse
> one?

No, scratch that, I misread the plans.  It *is* picking the plan it
thinks has lower cost; it's just a mistaken cost estimate.  It's strange
though that the less selective indexscan is getting a lower cost
estimate.  I wonder whether your table is (almost) perfectly ordered by
receiveddatetime, such that the one-column index has correlation close
to 1.0.  That could possibly lower the cost estimate to the point where
it'd appear to dominate the other index.  It'd be useful to see the
pg_stats.correlation value for both the userid and receiveddatetime
columns.

            regards, tom lane

Re: Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

От
"John Surcombe"
Дата:
> >> When we 'EXPLAIN' this query, PostgreSQL says it is using the index
> >> idx_receiveddatetime.  The way the application is designed means
that
> >> in virtually all cases the query will have to scan a very long way
> >> into idx_receiveddatetime to find the first record where userid =
> 311369000.
> >> If however we delete the idx_receiveddatetime index, the query uses
> >> the idx_userid_receiveddatetime index, and the query only takes a
few
> >> milliseconds.
>
> > That's just bizarre ... it knows the index is applicable, and the
cost
> > estimates clearly favor the better index, so why did it pick the
worse
> > one?
>
> No, scratch that, I misread the plans.  It *is* picking the plan it
thinks has
> lower cost; it's just a mistaken cost estimate.  It's strange though
that the less
> selective indexscan is getting a lower cost estimate.  I wonder
whether your
> table is (almost) perfectly ordered by receiveddatetime, such that the
one-
> column index has correlation close to 1.0.  That could possibly lower
the cost
> estimate to the point where it'd appear to dominate the other index.
It'd be
> useful to see the pg_stats.correlation value for both the userid and
> receiveddatetime columns.

Yes, the table is indeed nearly perfectly ordered by receiveddatetime
(correlation 0.998479).  correlation on userid is -0.065556.  n_distinct
on userid is also low: 1097.

Is the problem perhaps something like the following:  PostgreSQL is
thinking that because there are not many userids and there is low
correlation, that if it just scans the table from the top in date order,
this will be cheap (because receiveddatetime correlation is high so it
won't have to seek randomly), and it won't have to scan very far before
it finds the first row with a matching userid.

The problem is though that in our application the userids are not
scattered randomly.  There are small regions of the table where a
particular userid appears frequently, interspersed with much larger
regions (perhaps millions or tens of millions of rows) where it doesn't
appear at all.  So in fact the planner's preferred solution is often
pathologically bad.

Is there a solution?

Re: Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1

От
Tom Lane
Дата:
"John Surcombe" <John.Surcombe@digimap.gg> writes:
>> It'd be
>> useful to see the pg_stats.correlation value for both the userid and
>> receiveddatetime columns.

> Yes, the table is indeed nearly perfectly ordered by receiveddatetime
> (correlation 0.998479).  correlation on userid is -0.065556.  n_distinct
> on userid is also low: 1097.

Ah-hah.

> Is the problem perhaps something like the following:  PostgreSQL is
> thinking that because there are not many userids and there is low
> correlation, that if it just scans the table from the top in date order,
> this will be cheap (because receiveddatetime correlation is high so it
> won't have to seek randomly), and it won't have to scan very far before
> it finds the first row with a matching userid.

There's some of that, but I think the main problem is that there's a
very high discount on the cost estimate for a perfectly-correlated
index, and that makes it end up looking cheaper to use than the
uncorrelated one.  (It doesn't help any that we don't do correlation
properly for multicolumn indexes; but given what you say above, the
correlation estimate for the two-column index would be small even if
we'd computed it exactly.)

You might find that reducing random_page_cost would avoid the problem.
That should reduce the advantage conferred on the high-correlation
index, and it probably would represent your actual configuration better
anyway, given the results you're showing here.

            regards, tom lane