Обсуждение: planner not choosing fastest estimated plan

Поиск
Список
Период
Сортировка

planner not choosing fastest estimated plan

От
Jeff Janes
Дата:
I have a weird case where the planner doesn't choose the plan that it
itself believes to be the fastest plan.  If I disable seqscan, it then
chooses a faster plan (faster in estimate and faster in reality) than
the one chosen when all options were open to it.  I can't figure out
how this can be anything other than a bug.  The *collapse_limit
parameters are not restraining things.

I've created a dummy self-contained test case that is a simple
self-join of a partitioned table, with a function-based index.

If I analyze the tables after the function-based indexes are in place,
then the problems goes away.  And that is the production solution.
But still, a bug is a bug, even if there is a work around.

This is using the default configuration with LANG=C in 9.2, 9.3, and 9.4.

It was introduced in commit 5b7b5518d0ea56c422a19787, "Revise
parameterized-path mechanism to fix assorted issues"

I've tried compiling under OPTIMIZER_DEBUG, but the output did not
mean anything to me.  It looks like the only RELOPTINFO corresponding
to the join, "RELOPTINFO (1 2)" only lists the HashJoin when
enable_seqscan=on, and only contains NestLoop when enable_seqscan=off.I don't know why it doesn't list both in both
casesand then choose
 
the faster.



create table foo1 as select lpad(g::text,7,'0') as val1, g as num1
from generate_series(1,100000) g;
create table foo2 as select lpad(g::text,7,'0') as val1, g as num1
from generate_series(100001,1000000) g;
alter table foo2 inherit foo1;
create index on foo1(num1 );
create index on foo2(num1 );
analyze foo1; analyze foo2;
create language plperl;
CREATE OR REPLACE FUNCTION perlupper(text)RETURNS textLANGUAGE plperlIMMUTABLE COST 1000
AS $function$ return uc($_[0]);
$function$;
create index on foo1 (perlupper(val1));
create index on foo2 (perlupper(val1));


jjanes=# explain select a.num1 from foo1 a, foo1 b where
perlupper(a.val1)=perlupper(b.val1) and b.num1 = 987845;                                      QUERY PLAN
----------------------------------------------------------------------------------------Hash Join
(cost=32789.00..538040.65rows=10000 width=4)  Hash Cond: (perlupper(b.val1) = perlupper(a.val1))  ->  Append
(cost=0.00..16.64rows=2 width=8)        ->  Index Scan using foo1_num1_idx on foo1 b
 
(cost=0.00..8.28 rows=1 width=8)              Index Cond: (num1 = 987845)        ->  Index Scan using foo2_num1_idx on
foo2b
 
(cost=0.00..8.37 rows=1 width=8)              Index Cond: (num1 = 987845)  ->  Hash  (cost=15406.00..15406.00
rows=1000000width=12)        ->  Append  (cost=0.00..15406.00 rows=1000000 width=12)              ->  Seq Scan on foo1
a (cost=0.00..1541.00 rows=100000 width=12)              ->  Seq Scan on foo2 a  (cost=0.00..13865.00
 
rows=900000 width=12)

jjanes=# set enable_seqscan TO off;
jjanes=# explain select a.num1 from foo1 a, foo1 b where
perlupper(a.val1)=perlupper(b.val1) and b.num1 = 987845;                                          QUERY PLAN
-------------------------------------------------------------------------------------------------Nested Loop
(cost=17.14..60438.19rows=10000 width=4)  ->  Append  (cost=0.00..16.64 rows=2 width=8)        ->  Index Scan using
foo1_num1_idxon foo1 b
 
(cost=0.00..8.28 rows=1 width=8)              Index Cond: (num1 = 987845)        ->  Index Scan using foo2_num1_idx on
foo2b
 
(cost=0.00..8.37 rows=1 width=8)              Index Cond: (num1 = 987845)  ->  Append  (cost=17.14..30160.77 rows=5000
width=12)       ->  Bitmap Heap Scan on foo1 a  (cost=17.14..3022.65 rows=500 width=12)              Recheck Cond:
(perlupper(val1)= perlupper(b.val1))              ->  Bitmap Index Scan on foo1_perlupper_idx
 
(cost=0.00..17.01 rows=500 width=0)                    Index Cond: (perlupper(val1) = perlupper(b.val1))        ->
BitmapHeap Scan on foo2 a  (cost=92.22..27138.12
 
rows=4500 width=12)              Recheck Cond: (perlupper(val1) = perlupper(b.val1))              ->  Bitmap Index Scan
onfoo2_perlupper_idx
 
(cost=0.00..91.10 rows=4500 width=0)                    Index Cond: (perlupper(val1) = perlupper(b.val1))

Cheers,

Jeff



Re: planner not choosing fastest estimated plan

От
Pavel Stehule
Дата:
Hello

pls, can you send EXPLAIN ANALYZE result?

Regards

Pavel Stehule

2013/7/6 Jeff Janes <jeff.janes@gmail.com>:
> I have a weird case where the planner doesn't choose the plan that it
> itself believes to be the fastest plan.  If I disable seqscan, it then
> chooses a faster plan (faster in estimate and faster in reality) than
> the one chosen when all options were open to it.  I can't figure out
> how this can be anything other than a bug.  The *collapse_limit
> parameters are not restraining things.
>
> I've created a dummy self-contained test case that is a simple
> self-join of a partitioned table, with a function-based index.
>
> If I analyze the tables after the function-based indexes are in place,
> then the problems goes away.  And that is the production solution.
> But still, a bug is a bug, even if there is a work around.
>
> This is using the default configuration with LANG=C in 9.2, 9.3, and 9.4.
>
> It was introduced in commit 5b7b5518d0ea56c422a19787, "Revise
> parameterized-path mechanism to fix assorted issues"
>
> I've tried compiling under OPTIMIZER_DEBUG, but the output did not
> mean anything to me.  It looks like the only RELOPTINFO corresponding
> to the join, "RELOPTINFO (1 2)" only lists the HashJoin when
> enable_seqscan=on, and only contains NestLoop when enable_seqscan=off.
>  I don't know why it doesn't list both in both cases and then choose
> the faster.
>
>
>
> create table foo1 as select lpad(g::text,7,'0') as val1, g as num1
> from generate_series(1,100000) g;
> create table foo2 as select lpad(g::text,7,'0') as val1, g as num1
> from generate_series(100001,1000000) g;
> alter table foo2 inherit foo1;
> create index on foo1(num1 );
> create index on foo2(num1 );
> analyze foo1; analyze foo2;
> create language plperl;
> CREATE OR REPLACE FUNCTION perlupper(text)
>  RETURNS text
>  LANGUAGE plperl
>  IMMUTABLE COST 1000
> AS $function$
>   return uc($_[0]);
> $function$;
> create index on foo1 (perlupper(val1));
> create index on foo2 (perlupper(val1));
>
>
> jjanes=# explain select a.num1 from foo1 a, foo1 b where
> perlupper(a.val1)=perlupper(b.val1) and b.num1 = 987845;
>                                        QUERY PLAN
> ----------------------------------------------------------------------------------------
>  Hash Join  (cost=32789.00..538040.65 rows=10000 width=4)
>    Hash Cond: (perlupper(b.val1) = perlupper(a.val1))
>    ->  Append  (cost=0.00..16.64 rows=2 width=8)
>          ->  Index Scan using foo1_num1_idx on foo1 b
> (cost=0.00..8.28 rows=1 width=8)
>                Index Cond: (num1 = 987845)
>          ->  Index Scan using foo2_num1_idx on foo2 b
> (cost=0.00..8.37 rows=1 width=8)
>                Index Cond: (num1 = 987845)
>    ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12)
>          ->  Append  (cost=0.00..15406.00 rows=1000000 width=12)
>                ->  Seq Scan on foo1 a  (cost=0.00..1541.00 rows=100000 width=12)
>                ->  Seq Scan on foo2 a  (cost=0.00..13865.00
> rows=900000 width=12)
>
> jjanes=# set enable_seqscan TO off;
> jjanes=# explain select a.num1 from foo1 a, foo1 b where
> perlupper(a.val1)=perlupper(b.val1) and b.num1 = 987845;
>                                            QUERY PLAN
> -------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=17.14..60438.19 rows=10000 width=4)
>    ->  Append  (cost=0.00..16.64 rows=2 width=8)
>          ->  Index Scan using foo1_num1_idx on foo1 b
> (cost=0.00..8.28 rows=1 width=8)
>                Index Cond: (num1 = 987845)
>          ->  Index Scan using foo2_num1_idx on foo2 b
> (cost=0.00..8.37 rows=1 width=8)
>                Index Cond: (num1 = 987845)
>    ->  Append  (cost=17.14..30160.77 rows=5000 width=12)
>          ->  Bitmap Heap Scan on foo1 a  (cost=17.14..3022.65 rows=500 width=12)
>                Recheck Cond: (perlupper(val1) = perlupper(b.val1))
>                ->  Bitmap Index Scan on foo1_perlupper_idx
> (cost=0.00..17.01 rows=500 width=0)
>                      Index Cond: (perlupper(val1) = perlupper(b.val1))
>          ->  Bitmap Heap Scan on foo2 a  (cost=92.22..27138.12
> rows=4500 width=12)
>                Recheck Cond: (perlupper(val1) = perlupper(b.val1))
>                ->  Bitmap Index Scan on foo2_perlupper_idx
> (cost=0.00..91.10 rows=4500 width=0)
>                      Index Cond: (perlupper(val1) = perlupper(b.val1))
>
> Cheers,
>
> Jeff
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: planner not choosing fastest estimated plan

От
Jeff Janes
Дата:
On Sat, Jul 6, 2013 at 10:52 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hello
>
> pls, can you send EXPLAIN ANALYZE result?

Sure.  I've put them on depesz.  (I don't know how useful it will be,
as I'm worried about the internal inconsistency rather than the actual
run time):

enable_seqscan=on
http://explain.depesz.com/s/8GF

enable_seqscan=off
http://explain.depesz.com/s/51V

Cheers,

Jeff



Re: planner not choosing fastest estimated plan

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> I have a weird case where the planner doesn't choose the plan that it
> itself believes to be the fastest plan.

I poked into this a bit and found where things are going wrong.  The
ideal plan for this case involves a nestloop with a parameterized inner
scan over an inheritance tree (an append relation).  The Path for that
scan has to be built by set_append_rel_pathlist(), around lines 810-860
of allpaths.c in HEAD.  And what that code is doing is taking the
cheapest path for each appendrel member, then "reparameterizing" it
if necessary, which means modifying it to include enforcement of any
available parameterized join quals that it wasn't using already.
In this case what we've got for each child is a simple seqscan path
(enforcing no quals) and a bitmap indexscan path that uses the
parameterized joinqual perlupper(a.val1)=perlupper(b.val1).  Ordinarily
the bitmap path would be the cheapest and everything would be fine.
However, Jeff's example assigns a very high cost to the UDF, and the
bitmap scan's cost estimate includes (correctly) one evaluation of the
UDF.  With a high enough UDF cost, the bare seqscan is cheaper, and so
it gets picked.  Now, once we reparameterize the seqscan, which in this
case amounts to having it evaluate the parameterized joinqual at every
row, it's hugely expensive; so we end up with a very expensive
parameterized append path that isn't going to look attractive when join
paths are made, and the planner picks a hash or merge join instead.

So in short, the error lies in assuming that the cheapest plain path
will still be the cheapest one after reparameterization.  I think that
I did recognize that as a risk when I wrote this code, but supposed that
the possible addition of extra quals to evaluate wouldn't change the
outcome.  Jeff's example shows that actually the added quals can make a
huge difference.

The simplest correct logic would involve running through all the
available paths for the child relation, reparameterizing each one, and
only then comparing their costs.  That sounds a bit expensive though.
What we could do to ameliorate the planning cost is to still search for
the overall-cheapest child path, and if it is already parameterized the
way we want (which will frequently be the case, I think), then we're
done.  Only if it requires reparameterization do we have to go through
the more complicated process.

Thoughts, better ideas?
        regards, tom lane