Обсуждение: Planner chooses multi-column index in 9.2 when maybe it should not

Поиск
Список
Период
Сортировка

Planner chooses multi-column index in 9.2 when maybe it should not

От
Greg Sabino Mullane
Дата:
We are seeing a performance regression when moving to 9.2. There is a
complex query that is doing a self-join, but long story short, it
is choosing to use a multi-column index when it really ought not to be.
I was not able to develop a standalone test case without resorting
to changing enable_seqscan, but this does show the difference:

CREATE TABLE gregtest (a TEXT PRIMARY KEY, b TEXT);
CREATE INDEX gregtest_i ON gregtest(b,a);
SET enable_seqscan = off;
EXPLAIN SELECT 1 FROM gregtest WHERE a <> 'panda' AND b <> 'yak';

On versions 8.2, 8.3, 8.4, 9.0, and 9.1, this gives:

 Seq Scan on gregtest  (cost=10000000000.00..10000000022.90 rows=855 width=0)
   Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text))

Which makes sense - I would imagine that b = 'yak' would use the index,
but the negation means the index is not very useful?

However, on 9.2, this gives:

 Bitmap Heap Scan on gregtest  (cost=8.76..31.66 rows=855 width=0)
   Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text))
   ->  Bitmap Index Scan on gregtest_i  (cost=0.00..8.55 rows=860 width=0)

The above was tested on stock versions of Postgres, with no changes
made to postgresql.conf. In the actual query, the result is something like
this on 9.2 (columns explained below):

 Nested Loop  (cost=0.00..6050226723847.12 rows=282638194054762 width=8)
   Join Filter: ((o.foo)::text <> ((m.foo)::text || 'panda'::text))
   ->  Index Only Scan using index_i on foobar m  (cost=0.00..889187.83 rows=16998032 width=8)
   ->  Materialize  (cost=0.00..1079773.42 rows=16627702 width=8)
         ->  Index Only Scan using index_i on foobar o  (cost=0.00..931682.91 rows=16627702 width=8)
               Filter: ((status)::text <> 'split'::text)

But like this on 8.3:

 Nested Loop  (cost=1003294.60..8207409555713.15 rows=283931552087940 width=8)
   Join Filter: ((o.foo)::text <> ((m.foo)::text || 'panda'::text))
   ->  Seq Scan on foobar o  (cost=0.00..962314.95 rows=16672204 width=8)
         Filter: ((status)::text <> 'split'::text)
   ->  Materialize  (cost=1003294.60..1240121.96 rows=17030236 width=8)
         ->  Seq Scan on foobar m  (cost=0.00..919739.36 rows=17030236 width=8)

In the above, foobar has a primary key on foo, and an index named
index_i on foobar(status, foo). In another variation of the query,
8.3 uses foobar_pkey as well, rather than index_i, and filters that.
Matter of fact, index_i is never used.

At any rate, the change causes the original query to run much, much
slower. Problem on 9.2? Something wrong with our system and/or query?
More information needed from me?


--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

Вложения

Re: Planner chooses multi-column index in 9.2 when maybe it should not

От
Tom Lane
Дата:
Greg Sabino Mullane <greg@endpoint.com> writes:
> We are seeing a performance regression when moving to 9.2. There is a
> complex query that is doing a self-join, but long story short, it
> is choosing to use a multi-column index when it really ought not to be.
> I was not able to develop a standalone test case without resorting
> to changing enable_seqscan, but this does show the difference:

> CREATE TABLE gregtest (a TEXT PRIMARY KEY, b TEXT);
> CREATE INDEX gregtest_i ON gregtest(b,a);
> SET enable_seqscan = off;
> EXPLAIN SELECT 1 FROM gregtest WHERE a <> 'panda' AND b <> 'yak';

> On versions 8.2, 8.3, 8.4, 9.0, and 9.1, this gives:

>  Seq Scan on gregtest  (cost=10000000000.00..10000000022.90 rows=855 width=0)
>    Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text))

> Which makes sense - I would imagine that b = 'yak' would use the index,
> but the negation means the index is not very useful?

> However, on 9.2, this gives:

>  Bitmap Heap Scan on gregtest  (cost=8.76..31.66 rows=855 width=0)
>    Filter: ((a <> 'panda'::text) AND (b <> 'yak'::text))
>    ->  Bitmap Index Scan on gregtest_i  (cost=0.00..8.55 rows=860 width=0)

The above doesn't seem like a regression to me.  You told it not to use
a seqscan, and it didn't.  (The reason it now considers the index is
that an index-only scan is possible; before 9.2 there was no reason to
consider an indexscan at all given this query, so you got the seqscan
despite the attempted disable.)


> The above was tested on stock versions of Postgres, with no changes
> made to postgresql.conf. In the actual query, the result is something like
> this on 9.2 (columns explained below):

>  Nested Loop  (cost=0.00..6050226723847.12 rows=282638194054762 width=8)
>    Join Filter: ((o.foo)::text <> ((m.foo)::text || 'panda'::text))
>    ->  Index Only Scan using index_i on foobar m  (cost=0.00..889187.83 rows=16998032 width=8)
>    ->  Materialize  (cost=0.00..1079773.42 rows=16627702 width=8)
>          ->  Index Only Scan using index_i on foobar o  (cost=0.00..931682.91 rows=16627702 width=8)
>                Filter: ((status)::text <> 'split'::text)

It's not obvious that this is a worse plan than a seqscan --- the
index-only scans will only have to read the index not the heap, at least
if the heap is all-visible.  If it's coming out slower, then that's a
question of whether the cost estimates match reality.  I'd wonder how
many heap fetches occur anyway, and also whether you've tweaked the
planner cost parameters.

You should be able to force it back to the seqscan based plan by turning
off enable_indexscan or enable_indexonlyscan.  It would be useful to
see EXPLAIN ANALYZE (not just EXPLAIN) results for both this plan and
the seqscan plan in 9.2.

            regards, tom lane


Re: Planner chooses multi-column index in 9.2 when maybe it should not

От
Greg Sabino Mullane
Дата:
On Wed, Oct 10, 2012 at 01:31:29PM -0400, Tom Lane wrote:
> The above doesn't seem like a regression to me.  You told it not to use
> a seqscan, and it didn't.  (The reason it now considers the index is
> that an index-only scan is possible; before 9.2 there was no reason to
> consider an indexscan at all given this query, so you got the seqscan
> despite the attempted disable.)

Ah...index-only scans. Now it makes sense.

...
> It's not obvious that this is a worse plan than a seqscan --- the
> index-only scans will only have to read the index not the heap, at least
> if the heap is all-visible.  If it's coming out slower, then that's a
> question of whether the cost estimates match reality.  I'd wonder how
> many heap fetches occur anyway, and also whether you've tweaked the
> planner cost parameters.

We've lowered random_page_cost, but raising it back to the default does
not help.

> You should be able to force it back to the seqscan based plan by turning
> off enable_indexscan or enable_indexonlyscan.  It would be useful to
> see EXPLAIN ANALYZE (not just EXPLAIN) results for both this plan and
> the seqscan plan in 9.2.

Thanks, I will play around with both a better test case and getting some
explain analyzes (they were taking too long to run; thought I should get
the email out first in case it was something obvious).


--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

Вложения

Re: Planner chooses multi-column index in 9.2 when maybe it should not

От
Greg Sabino Mullane
Дата:
Found a good demonstration of the problem. Here's explain analyze of a
query on 9.2 with enable_indexonlyscan = off; This produces the exact same
plan as 8.3. The tables in question have been analyzed. Changing
random_page_cost has no effect. The main foobar table has 17M rows.
I did multiple runs of both to eliminate any caching effects.

foobar.id     is VARCHAR(16)
foobar.status is VARCHAR(32)

Indexes:
    "foobar_pkey" PRIMARY KEY, btree (id) CLUSTER
    "foobar_status" UNIQUE, btree (status, id)

(8.3 and up, plus 9.2 with index scan disabled)
GroupAggregate  (C=30389..1754503 R=1 W=22) (AT=0.3..0.3 R=1 L=1)
 ->  Nested Loop Left Join  (C=30389..1754147 R=23751 W=22) (AT=0.1..0.2 R=7 L=1)
   ->  Nested Loop Left Join  (C=30389..835374 R=8980 W=16) (AT=0.1.0.1 R=1 L=1)
     ->  Index Scan using foobar_pkey on foobar m  (C=0..13 R=1 W=8) (AT=0.03..0.03 rows=1 L=1)
           Index Cond: ((id) = '17464097')
           Filter: ((id) !~~ '%.%')
     ->  Bitmap Heap Scan on foobar o  (C=30389..835271 R=8980 W=8) (AT=0.06..0.07 R=1 L=1)
           Recheck Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.999999')))
           Filter: (((status) <> ALL ('{panda,penguin}'[])) \
             AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
       ->  Bitmap Index Scan on foobar_pkey  (C=0..30386 R=1888670 W=0) (AT=0.02..0.02 R=1 L=1)
             Index Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.999999')))
   ->  Index Scan using baz_id on baz (C=0..101 R=40 W=22) (AT=0.03..0.06 R=7 L=1)
         Index Cond: ((o.id) = (id))
         Filter: (price <> 0::numeric)
         Rows Removed by Filter: 3
Total runtime: 0.459 ms

Now, if we turn on index only scans, we get a terrible runtime:

GroupAggregate  (C=0.00..1314945 R=1 W=22) (AT=34502..34502 R=1 L=1)
->  Nested Loop Left Join  (C=0.00..1314589 R=23751 W=22) (AT=31934..34502 R=7 L=1)
  ->  Nested Loop Left Join  (C=0.00..395816 R=8980 W=16) (AT=31934..34502 R=1 L=1)
    ->  Index Only Scan using foobar_pkey on foobar m  (C=0.00..13.81 R=1 W=8) (AT=0.029..0.034 R=1 L=1)
          Index Cond: (id = '17464097')
          Filter: ((id) !~~ '%.%')
          Heap Fetches: 0
    ->  Index Only Scan using foobar_status on foobar o  (C=0.00..395713 R=8980 W=8) (AT=31934..34502 R=1 L=1)
          Index Cond: ((id >= (m.id)) AND (id <= ((m.id) || '.999999')))
          Filter: (((status) <> ALL ('{panda,penguin}'[])) \
            AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
          Heap Fetches: 0
  ->  Index Scan using baz_id on baz (C=0..101 R=40 W=22) (AT=0.1..0.2 R=7 L=1)
        Index Cond: ((o.id) = (id))
        Filter: (price <> 0::numeric)
        Rows Removed by Filter: 3

Total runtime: 34502.670 ms

Yeah....34 seconds versus near-instant. The first index-only scan does great,
but that second one - ouch - even with no heap fetches at all!

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

Вложения

Re: Planner chooses multi-column index in 9.2 when maybe it should not

От
Tom Lane
Дата:
Greg Sabino Mullane <greg@endpoint.com> writes:
> Found a good demonstration of the problem. Here's explain analyze of a
> query on 9.2 with enable_indexonlyscan = off; This produces the exact same
> plan as 8.3. The tables in question have been analyzed. Changing
> random_page_cost has no effect. The main foobar table has 17M rows.

Can you provide a self-contained test case for this (ie, sample data)?

            regards, tom lane


Re: Planner chooses multi-column index in 9.2 when maybe it should not

От
Tom Lane
Дата:
Greg Sabino Mullane <greg@endpoint.com> writes:
>      ->  Bitmap Heap Scan on foobar o  (C=30389..835271 R=8980 W=8) (AT=0.06..0.07 R=1 L=1)
>            Recheck Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.999999')))
>            Filter: (((status) <> ALL ('{panda,penguin}'[])) \
>              AND ((id) ~ (('^' || (m.id)) || '(\.[0-9.]+)?$')))
>        ->  Bitmap Index Scan on foobar_pkey  (C=0..30386 R=1888670 W=0) (AT=0.02..0.02 R=1 L=1)
>              Index Cond: (((id) >= (m.id)) AND ((id) <= ((m.id) || '.999999')))

Actually, looking closer, I think the problem is not with the estimation
of the index-only scan on the other index; the planner is estimating
that as pretty expensive, which it is.  The problem is that it thinks
the above bitmap scan is pretty expensive, when it isn't.  And the
reason evidently is that it's totally off in the weeds about the
selectivity of the range condition on foobar.id.  Anytime you've got
1888670 estimated rows and 1 actual row, you've got a problem.

This is related to the problem I was on about a couple weeks ago:
http://archives.postgresql.org/message-id/17655.1348874742@sss.pgh.pa.us
namely that the planner fails to recognize pairs of clauses as a range
constraint if they're join clauses.  If it had recognized that, you'd
have gotten an estimate that would still be far more than "1 row", but
would be more than an order of magnitude less than this one, which would
be enough to fix this problem.

I'd not been thinking of that change as something we'd risk
back-patching, but maybe we should consider putting it into 9.2.  It
seems like the index-only scan support has put a new level of premium on
the quality of the planner's rowcount estimates.

Meanwhile, that range condition in itself looks a tad, er, klugy.
Do you really need that, or is this a crummy way of stating
foobar.id = m.id?

            regards, tom lane


Re: Planner chooses multi-column index in 9.2 when maybe it should not

От
Greg Sabino Mullane
Дата:
On Wed, Oct 10, 2012 at 09:24:32PM -0400, Tom Lane wrote:
> I'd not been thinking of that change as something we'd risk
> back-patching, but maybe we should consider putting it into 9.2.  It
> seems like the index-only scan support has put a new level of premium on
> the quality of the planner's rowcount estimates.

Yes, please do! It's that or we globally disable index-only scans,
which I'd prefer not to do. Let me know if you'd like me to test
a patch, I can apply it and see if it fixes our issue at hand.

> Meanwhile, that range condition in itself looks a tad, er, klugy.
> Do you really need that, or is this a crummy way of stating
> foobar.id = m.id?

No, it's really needed. That's merely the tip of the kluginess;
don't get me started! The dangers of an organically grown schema. :)
This thing has been growing since Postgres v6.

(Looking back at how far Postgres has come from 6.x to 9.2
is truly awe-inspiring)

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

Вложения