Thread: [HACKERS] Regression in join selectivity estimations when using foreign keys

[HACKERS] Regression in join selectivity estimations when using foreign keys

From: David Rowley
I've been analyzing a reported regression case between a 9.5 plan and
a 9.6 plan. I tracked this down to the foreign key join selectivity
code, specifically the use_smallest_selectivity code, which is applied
to outer joins where the referenced table is on the outer side of the
join.

A vastly simplified example case is:

create table fkest (a int, b int, c int unique, primary key(a,b));
create table fkest1 (a int, b int, primary key(a,b));

insert into fkest select x/10,x%10, x from generate_Series(1,400) x;
insert into fkest1 select x/10,x%10 from generate_Series(1,400) x;

alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b)
references fkest;

analyze fkest;
analyze fkest1;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=24.15..41.89 rows=996 width=36)
   Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
   ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28)
         Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
         ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20)
               ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12)
                     Filter: (c = 1)
               ->  Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8)
                     Index Cond: ((a = f.a) AND (b = f.b))
         ->  Hash  (cost=6.00..6.00 rows=400 width=8)
               ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8)
         ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8)
(13 rows)

-- now drop the foreign key to simulate the behaviour pre-9.6

alter table fkest1 drop constraint fkest1_a_b_fkey;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.44..32.62 rows=1 width=36)
   ->  Nested Loop Left Join  (cost=0.29..24.41 rows=1 width=28)
         ->  Nested Loop Left Join  (cost=0.15..16.21 rows=1 width=20)
               ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12)
                     Filter: (c = 1)
               ->  Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8)
                     Index Cond: ((a = f.a) AND (b = f.b))
         ->  Index Only Scan using fkest1_pkey on fkest1 f2
(cost=0.15..8.17 rows=1 width=8)
               Index Cond: ((a = f.a) AND (b = f.b))
   ->  Index Only Scan using fkest1_pkey on fkest1 f3
(cost=0.15..8.17 rows=1 width=8)
         Index Cond: ((a = f.a) AND (b = f.b))
(11 rows)

The problem is that the use_smallest_selectivity code in
get_foreign_key_join_selectivity() is taking the smallest of the
per-column selectivities for each join level rather than something
closer to their product, which inflates each level's estimate by
roughly a factor of ten here. Really it should be multiplying the
selectivities instead of taking the minimum, but even that does not
seem entirely correct.

That bad estimate propagates up the join tree until some pretty insane
estimates come back.
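To make the arithmetic concrete (assuming ANALYZE arrives at
per-column ndistinct estimates of roughly 40 for "a" and 10 for "b",
giving per-clause selectivities of about 1/40 and 1/10):

level 1:    1 row  * 400 * (1/40)  =  ~10 rows
level 2:   10 rows * 400 * (1/40)  =  ~100 rows
level 3:  100 rows * 400 * (1/40)  =  ~1000 rows  (the 996 above)

whereas the product (1/40 * 1/10) would keep each level at ~1 row.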

The following is the offending code:

foreach(cell, removedlist)
{
    RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
    Selectivity csel;

    /* estimate each removed join clause individually... */
    csel = clause_selectivity(root, (Node *) rinfo,
                              0, jointype, sjinfo);

    /* ...but keep only the smallest selectivity seen so far */
    thisfksel = Min(thisfksel, csel);
}
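For comparison, a minimal sketch of the multiplication alternative
(just to illustrate, not a proposed final form; it assumes thisfksel
starts at 1.0 rather than at the FK selectivity):

foreach(cell, removedlist)
{
    RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);

    /* multiply the per-clause selectivities, matching the clause
     * independence assumption the pre-9.6 code effectively made */
    thisfksel *= clause_selectivity(root, (Node *) rinfo,
                                    0, jointype, sjinfo);
}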

This code is only applied to outer joins. I've studied this a bit and,
generally, I'm unsure why we're treating outer joins in a special way
at all. Some comments indicate that this is to do with NULLs, but
I'm not sure why we have special code for outer joins and none for
INNER joins.

It's quite easy to show how this regresses in general with regard to
NULLs on INNER joins:

create table part (partid int primary key);
create table sale (saleid int primary key, partid int references part);
insert into part select generate_series(1,1000);
insert into sale select x,NULL from generate_series(1,1000) x;
analyze part;
analyze sale;
explain analyze select * from part p inner join sale s on p.partid =
s.partid; -- using foreign key join estimations
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.50..55.12 rows=1000 width=12) (actual
time=0.733..0.733 rows=0 loops=1)
   Hash Cond: (s.partid = p.partid)
   ->  Seq Scan on sale s  (cost=0.00..15.00 rows=1000 width=8)
(actual time=0.018..0.332 rows=1000 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual
time=0.336..0.336 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  Seq Scan on part p  (cost=0.00..15.00 rows=1000 width=4)
(actual time=0.014..0.132 rows=1000 loops=1)
 Planning time: 0.468 ms
 Execution time: 0.787 ms
(8 rows)

alter table sale drop constraint sale_partid_fkey; -- simulate pre-9.6
behaviour by dropping the fkey

explain analyze select * from part p inner join sale s on p.partid =
s.partid; -- normal join estimations.
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.50..55.12 rows=1 width=12) (actual
time=0.546..0.546 rows=0 loops=1)
   Hash Cond: (s.partid = p.partid)
   ->  Seq Scan on sale s  (cost=0.00..15.00 rows=1000 width=8)
(actual time=0.014..0.102 rows=1000 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual
time=0.387..0.387 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  Seq Scan on part p  (cost=0.00..15.00 rows=1000 width=4)
(actual time=0.009..0.110 rows=1000 loops=1)
 Planning time: 0.278 ms
 Execution time: 0.572 ms
(8 rows)
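The statistic being ignored here is visible directly in pg_stats; a
quick check (any equivalent query works):

select tablename, attname, null_frac
from pg_stats
where tablename = 'sale' and attname = 'partid';

null_frac is 1 for sale.partid, so effectively no sale row can match a
part row, which is what the non-FK estimate of rows=1 reflects.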

I've attached fixes, based on master, for both of these cases.

Patches:

0001: A minimal fix for the reported regression. It may cause further
regressions since it does nothing to account for NULL-valued columns
in outer joins, but at least it is consistent with how INNER joins are
estimated with regard to NULLs.

0002: Adds further code to apply the nullfrac from pg_statistic.

I've given this a round of testing and I can't find any case where it
gives worse estimates than the standard join estimation you get when
you drop the foreign key, but I'd like to challenge someone else to
give it a go and see if they can; not because I'm super confident,
but because I want this to be correct.

Thoughts?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Attachments

Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

From: David Rowley
On 18 May 2017 at 20:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
> A vastly simplified example case is:
>
> create table fkest (a int, b int, c int unique, primary key(a,b));
> create table fkest1 (a int, b int, primary key(a,b));
>
> insert into fkest select x/10,x%10, x from generate_Series(1,400) x;
> insert into fkest1 select x/10,x%10 from generate_Series(1,400) x;
>
> alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b)
> references fkest;
>
> analyze fkest;
> analyze fkest1;
>
> explain (costs on) select * from fkest f
> left join fkest1 f1 on f.a = f1.a and f.b = f1.b
> left join fkest1 f2 on f.a = f2.a and f.b = f2.b
> left join fkest1 f3 on f.a = f3.a and f.b = f3.b
> where f.c = 1;

I should have shown the EXPLAIN ANALYZE of this instead.


                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=24.15..41.89 rows=996 width=36) (actual time=0.430..0.463 rows=1 loops=1)
   Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
   ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28) (actual time=0.255..0.288 rows=1 loops=1)
         Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
         ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20) (actual time=0.046..0.079 rows=1 loops=1)
               ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12) (actual time=0.013..0.045 rows=1 loops=1)
                     Filter: (c = 1)
                     Rows Removed by Filter: 399
               ->  Index Only Scan using fkest1_pkey on fkest1 f1  (cost=0.15..8.17 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1)
                     Index Cond: ((a = f.a) AND (b = f.b))
                     Heap Fetches: 1
         ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.180..0.180 rows=400 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8) (actual time=0.006..0.041 rows=400 loops=1)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.162..0.162 rows=400 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8) (actual time=0.010..0.040 rows=400 loops=1)
 Planning time: 0.409 ms
 Execution time: 0.513 ms
(19 rows)

Here you can clearly see the poor estimate propagating up the plan tree.

If we add another left join, the final estimate is even worse:

explain analyze select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
left join fkest1 f4 on f.a = f4.a and f.b = f4.b where f.c = 1;

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=36.15..69.06 rows=9915 width=44) (actual time=0.535..0.569 rows=1 loops=1)
   Hash Cond: ((f.a = f4.a) AND (f.b = f4.b))
   ->  Hash Left Join  (cost=24.15..41.89 rows=996 width=36) (actual time=0.371..0.404 rows=1 loops=1)
         Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
         ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28) (actual time=0.208..0.241 rows=1 loops=1)
               Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
               ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20) (actual time=0.029..0.062 rows=1 loops=1)
                     ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12) (actual time=0.014..0.047 rows=1 loops=1)
                           Filter: (c = 1)
                           Rows Removed by Filter: 399
                     ->  Index Only Scan using fkest1_pkey on fkest1 f1  (cost=0.15..8.17 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
                           Index Cond: ((a = f.a) AND (b = f.b))
                           Heap Fetches: 1
               ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.168..0.168 rows=400 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 24kB
                     ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8) (actual time=0.008..0.043 rows=400 loops=1)
         ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.156..0.156 rows=400 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8) (actual time=0.006..0.035 rows=400 loops=1)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8) (actual time=0.155..0.155 rows=400 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 24kB
         ->  Seq Scan on fkest1 f4  (cost=0.00..6.00 rows=400 width=8) (actual time=0.004..0.034 rows=400 loops=1)
 Planning time: 0.864 ms
 Execution time: 0.698 ms

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

From: Tom Lane
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've been analyzing a reported regression case between a 9.5 plan and
> a 9.6 plan. I tracked this down to the foreign key join selectivity
> code, specifically the use_smallest_selectivity code which is applied
> to outer joins where the referenced table is on the outer side of the
> join.
> ...
> I've attached fixes, based on master, for both of these cases.

I'm entirely unconvinced by this patch --- it seems to simply be throwing
away a lot of logic.  Notably it lobotomizes the FK code altogether for
semi/antijoin cases, but you've not shown any example that even involves
such joins, so what's the argument for doing that?  Also, the reason
we had the use_smallest_selectivity code in the first place was that we
didn't believe the FK-based selectivity could be applied safely to
outer-join cases, so simply deciding that it's OK to apply it after all
seems insufficiently justified.

Or in short, exhibiting one counterexample to the existing code is not
a sufficient argument for changing things this much.  You need to give
an argument why this is the right thing to do instead.

Stepping back a bit, it seems like the core thing the FK selectivity code
was meant to do was to prevent us from underestimating selectivity of
multiple-clause join conditions through a false assumption of clause
independence.  The problem with the use_smallest_selectivity code is that
it's overestimating selectivity, but that doesn't mean that we want to go
all the way back to the old way of doing things.  I wonder if we could get
any traction in these dubious cases by computing the product, instead of
minimum, of the clause selectivities (thus matching the old estimate) and
then taking the greater of that and the FK-based selectivity.
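Roughly, against the loop quoted upthread (a sketch only; the variable
names are illustrative):

Selectivity prodsel = 1.0;

foreach(cell, removedlist)
{
    /* product of the per-clause selectivities: the old-style estimate */
    prodsel *= clause_selectivity(root, (Node *) lfirst(cell),
                                  0, jointype, sjinfo);
}

/* take the greater of the old-style and FK-based selectivities */
thisfksel = Max(prodsel, fksel);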
        regards, tom lane



Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

From: David Rowley
On 21 May 2017 at 07:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm entirely unconvinced by this patch --- it seems to simply be throwing
> away a lot of logic.  Notably it lobotomizes the FK code altogether for
> semi/antijoin cases, but you've not shown any example that even involves
> such joins, so what's the argument for doing that?  Also, the reason
> we had the use_smallest_selectivity code in the first place was that we
> didn't believe the FK-based selectivity could be applied safely to
> outer-join cases, so simply deciding that it's OK to apply it after all
> seems insufficiently justified.

Originally I looked at just multiplying the selectivities in
use_smallest_selectivity, but on further thought I couldn't see why we
shouldn't use the foreign key to assist with selectivity estimation
when ref_is_outer is true. We still have a very high probability that
the outer rel contains a tuple matching each inner rel tuple
(because inner references outer). The only difference between OUTER
and INNER joins is that an OUTER join produces NULL-extended rows
where there is no matching inner row. That part seems to be handled in
calc_joinrel_size_estimate() by code such as:
if (nrows < outer_rows)
    nrows = outer_rows;

We could do much better than we do today by also adding outer_rows -
(the ndistinct estimate of the inner rel's referencing Vars) to
account for those unmatched rows, as sketched below. However, I'd say
that's not back-patching material, although it may work especially
well now that we have multi-variate ndistinct in PG10.
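A rough sketch of that adjustment (hypothetical names;
inner_ref_ndistinct would be the ndistinct estimate of the inner
rel's referencing Vars, matched_rows the FK-based estimate):

/* each outer row with no match on the referencing side emits one
 * NULL-extended row */
nrows = matched_rows + Max(outer_rows - inner_ref_ndistinct, 0.0);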

> Or in short, exhibiting one counterexample to the existing code is not
> a sufficient argument for changing things this much.  You need to give
> an argument why this is the right thing to do instead.
>
> Stepping back a bit, it seems like the core thing the FK selectivity code
> was meant to do was to prevent us from underestimating selectivity of
> multiple-clause join conditions through a false assumption of clause
> independence.  The problem with the use_smallest_selectivity code is that
> it's overestimating selectivity, but that doesn't mean that we want to go
> all the way back to the old way of doing things.  I wonder if we could get
> any traction in these dubious cases by computing the product, instead of
> minimum, of the clause selectivities (thus matching the old estimate) and
> then taking the greater of that and the FK-based selectivity.

My concern with going back to selectivity multiplication was exactly
that I didn't want to return to the original undesired behaviour of
underestimation when the conditions are correlated. I'm also unsure
why taking the greater is any better than just using the foreign key
selectivity. It seems senseless to use clause-based selectivities at
all when we've proved the foreign key exists -- there will be no
unmatched inner tuples.

The reason I dropped the non-singleton FK handling for SEMI and ANTI
joins is that I couldn't see how we could improve on the original
join estimation code for those cases. Perhaps I'm assuming too much
about how get_foreign_key_join_selectivity() is used by doing this?
I'm assuming that leaving the clause list intact and referring the
whole case to calc_joinrel_size_estimate() is okay to do.

The reason I added the nullfrac code in 0002 was that I was concerned
about regressing OUTER join cases where the nullfrac is accounted for
by clause_selectivity(), although that case had already regressed
between 9.5 and 9.6 for INNER joins with a non-zero nullfrac on the
join condition.

In short: if we replace the use_smallest_selectivity code with fkselec
*= clauselist_selectivity(root, removedlist, 0, jointype, sjinfo);
then I'm worried about regressing back to the old underestimations.
The extended statistics code in PG10's version of
clauselist_selectivity() will ignore these join conditions, so nothing
can be done to assist the underestimations by adding extended
statistics yet.
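For reference, the PG10 syntax for such extended statistics (the
statistics name here is arbitrary); as noted above, these would not
help with these join clauses yet:

create statistics fkest1_a_b_ndist (ndistinct) on a, b from fkest1;
analyze fkest1;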

I also just noticed that I don't think I've got ANTI join cases
correct in the patch I sent. I'll look at that now.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

From: David Rowley
On 22 May 2017 at 16:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I also just noticed that I don't think I've got ANTI join cases
> correct in the patch I sent. I'll look at that now.

I've attached an updated patch.

This one is much less invasive than my original attempt.

There are two fundamental changes here:

1) OUTER joins now use the foreign key as proof that the join
condition must match.
2) The nullfrac of NULL-valued columns is no longer taken into account
for OUTER joins. This is now consistent with INNER joins, which might
not be perfect, but it is less surprising. If this turns out to be a
problem we can consider applying something like my 0002 patch above;
however, that can mean NULLs are double-counted if there are any other
strict clauses which are not part of the foreign key constraint, so
that idea is not perfect either.

In addition to those two things, the poor selectivity estimate from my
original email is also fixed.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Attachments

Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

From: Tom Lane
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 22 May 2017 at 16:10, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I also just noticed that I don't think I've got ANTI join cases
>> correct in the patch I sent. I'll look at that now.

> I've attached an updated patch.
> This one is much less invasive than my original attempt.

Sorry for not dealing with this sooner --- I put it on the back burner
for PGCon, and just now got back to it.

First off, I agree that it was probably a mistake for this code to
special-case LEFT/FULL joins at all.  eqjoinsel() does not do so;
it relies on the join-type correction applied later by
calc_joinrel_size_estimate().  Since that correction is also downstream
of this code, we should be able to do the same here, and indeed may
be double-counting somehow if we don't.

What that leaves is that we're using the "smallest per-column selectivity"
hack only for SEMI/ANTI joins where we can't really get anything helpful
from knowledge of the FK.  What your patch does is to fall back on the
traditional clauselist_selectivity calculation for the relevant clauses.
But we'd be better off ignoring the FK altogether and leaving those
clauses to be processed later.  That saves some cycles, and it might allow
those clauses to be used more fruitfully with a different FK, and even
if that doesn't happen it's better to let clauselist_selectivity see as
many clauses at once as possible.

So I whacked the patch around to do it like that and pushed it.

I'm not totally satisfied that there isn't any case where the smallest
selectivity hack is appropriate.  In the example you're showing here,
the FK columns are independent so that we get more or less the right
answer with or without the FK.  But in some quick testing I could not
produce a counterexample proving that that heuristic is helpful;
so for now let's can it.

Thanks, and sorry again for the delay.
        regards, tom lane



Re: [HACKERS] Regression in join selectivity estimations when using foreign keys

From: David Rowley
On 20 June 2017 at 07:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not totally satisfied that there isn't any case where the smallest
> selectivity hack is appropriate.  In the example you're showing here,
> the FK columns are independent so that we get more or less the right
> answer with or without the FK.  But in some quick testing I could not
> produce a counterexample proving that that heuristic is helpful;
> so for now let's can it.
>
> Thanks, and sorry again for the delay.

Many thanks for taking the time on this.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services