Thread: BUG #15383: Join Filter cost estimation problem in 10.5


BUG #15383: Join Filter cost estimation problem in 10.5

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15383
Logged by:          Marko Tiikkaja
Email address:      marko@joh.to
PostgreSQL version: 10.5
Operating system:   Linux
Description:

I was looking at a problematic plan submitted by "sjamaan" on IRC and I
noticed that the join filter estimation seems completely wrong here:

create function expensive_func(int) returns int as $$ begin return 1; end $$
language plpgsql stable cost 10000;
create table unique_inner(a int primary key);
insert into unique_inner select generate_series(1, 10000);

explain select * from unique_inner gs1(i) join generate_series(1, 10) gs2(i)
using (i) where expensive_func(gs1.i + gs2.i) > 0;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Hash Join  (cost=303.19..315.81 rows=333 width=4)
   Hash Cond: (gs2.i = gs1.i)
   Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
   ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
   ->  Hash  (cost=159.75..159.75 rows=11475 width=4)
         ->  Seq Scan on unique_inner gs1  (cost=0.00..159.75 rows=11475 width=4)
(6 rows)

(Notice how even though the function is expected to be called at least 333
times, the cost doesn't account for even a single call.)

Dropping the primary key constraint makes the costs more reasonable (though
I'm still not sure how the planner arrives at these costs):

alter table unique_inner drop constraint unique_inner_pkey;
explain select * from unique_inner gs1(i) join generate_series(1, 10) gs2(i)
using (i) where expensive_func(gs1.i + gs2.i) > 0;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..1436880.94 rows=19125 width=4)
   Hash Cond: (gs1.i = gs2.i)
   Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
   ->  Seq Scan on unique_inner gs1  (cost=0.00..159.75 rows=11475 width=4)
   ->  Hash  (cost=10.00..10.00 rows=1000 width=4)
         ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
(6 rows)


Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Hash Join  (cost=303.19..315.81 rows=333 width=4)
>    Hash Cond: (gs2.i = gs1.i)
>    Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
>    ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
>    ->  Hash  (cost=159.75..159.75 rows=11475 width=4)
>          ->  Seq Scan on unique_inner gs1  (cost=0.00..159.75 rows=11475 width=4)
> (6 rows)

> (Notice how even though the function is expected to be called at least 333
> times, the cost doesn't account for even a single call.)

Yeah.  This evidently got broken sometime during v10 development,
because 9.6 and below generate a more reasonable cost:

 Hash Join  (cost=270.00..25298.75 rows=333 width=4)
   Hash Cond: (gs2.i = gs1.i)
   Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
   ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
         ->  Seq Scan on unique_inner gs1  (cost=0.00..145.00 rows=10000 width=4)

> Dropping the primary key constraint makes the costs more reasonable

Interesting.  That sort of points the finger in the direction of the
inner_unique patch, though it could be elsewhere.

Will look into it if nobody beats me to it.

            regards, tom lane


Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
David Rowley
Date:
On 14 September 2018 at 05:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > (Notice how even though the function is expected to be called at least 333
> > times, the cost doesn't account for even a single call.)
>
> Yeah.  This evidently got broken sometime during v10 development,
> because 9.6 and below generate a more reasonable cost:
>
>  Hash Join  (cost=270.00..25298.75 rows=333 width=4)
>    Hash Cond: (gs2.i = gs1.i)
>    Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
>    ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
>    ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
>          ->  Seq Scan on unique_inner gs1  (cost=0.00..145.00 rows=10000 width=4)
>
> > Dropping the primary key constraint makes the costs more reasonable
>
> Interesting.  That sort of points the finger in the direction of the
> inner_unique patch, though it could be elsewhere.

This seems to be a result of using the semifactors.outer_match_frac in
final_cost_hashjoin(). This is calculated to be very low @
3.3333333333333335e-05, which results in outer_matched_rows being set
to 0 in:

    outer_matched_rows = rint(outer_path_rows *
                              extra->semifactors.outer_match_frac);

It's not all that obvious what might be done to fix this given that
the low outer_match_frac is the result of performing an estimation on
both the gs1.i = gs2.i qual and the function call, each of which is
estimated independently as:

gs1.i = gs2.i = 0.0001
expensive_func(gs1.i + gs2.i) > 0 = 0.33333333333333331
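Plugged into the formula above, those two per-clause estimates reproduce the zero-call costing (a toy illustration of the arithmetic only, not the actual planner code):

```python
# Illustration of the numbers quoted above; variable names mirror the
# planner's but this is not the actual C code.
eq_sel = 0.0001          # estimated selectivity of gs1.i = gs2.i
func_sel = 1.0 / 3.0     # default selectivity of expensive_func(...) > 0
outer_path_rows = 1000   # default row estimate for generate_series(1, 10)

# clauselist_selectivity multiplies independent clause selectivities
outer_match_frac = eq_sel * func_sel   # ~3.3333e-05

# final_cost_hashjoin: outer_matched_rows = rint(outer_path_rows * frac)
outer_matched_rows = round(outer_path_rows * outer_match_frac)
print(outer_matched_rows)  # 0 -- so no expensive_func calls get charged
```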

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 14 September 2018 at 05:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah.  This evidently got broken sometime during v10 development,
>> because 9.6 and below generate a more reasonable cost:

> This seems to be a result of using the semifactors.outer_match_frac in
> final_cost_hashjoin(). This is calculated to be very low @
> 3.3333333333333335e-05, which results in outer_matched_rows being set
> to 0 in:

So after poking around for awhile, my conclusion is that the cost
estimation aspects of the inner_unique patch are completely broken,
and it's not going to be very easy to fix.

The core issue here is that compute_semi_anti_join_factors was never
designed to work for any joins that aren't really SEMI or ANTI joins,
and just taking out the assert about that doesn't fix it.  In
particular, passing jointype == JOIN_SEMI to clauselist_selectivity,
when the underlying sjinfo has jointype == JOIN_INNER, isn't a supported
combination.  If you look at eqjoinsel() you will notice that it pays
no attention at all to the jointype parameter, only to sjinfo->jointype.
Therefore what we get out of the first clauselist_selectivity is the
same value as we get from the second one (ie, inner-join selectivity)
leading to entirely insane results from compute_semi_anti_join_factors.

There are more problems though.  If you change eqjoinsel() to look at
the jointype parameter, you get a cost estimate of around 4000, which
is because outer_match_frac gets set to 0.16667, which is better but
still not exactly good.  The reason for that is that eqjoinsel_semi
punts (due to lack of any stats for the generate_series RTE) and
returns 0.5, and likewise we get a default estimate of 0.33333 from
the inequality on the expensive_func result, so 0.16667 is the
combined jselec estimate.  So the default estimate from eqjoinsel_semi
is unrelated to the default estimate from eqjoinsel_inner, which is
unhelpful for what we're doing here, plus scalargtjoinsel didn't
adjust its behavior at all.  We have, in fact, not really built out
the semijoin estimation infrastructure anywhere except eqjoinsel and
neqjoinsel (though I see somebody made it work for networkjoinsel).
This is mostly tolerable for the purposes of real SEMI and ANTI joins,
because it's relatively hard/unusual for those to have join quals that
aren't equality quals.  But if you expect that infrastructure to give
you sane results for random other joins, you're going to be sorely
disappointed.

Also worth noting here is that it's wrong in the first place to be
including the selectivity of the inequality qual in our calculation
of how many rows are going to be fed to the inequality qual :-(.
Again, the infrastructure involved isn't really broken for its
designed purpose, because with a normal semijoin there aren't any
filter quals to be considered separately; but that assumption breaks
down when you try to use that infrastructure for a regular join that
happens to have a unique inner side.

So my conclusion here is that we probably ought to revert the changes
in compute_semi_anti_join_factors that made it not reject other join
types, and that unique_inner cases need some other cost estimation
mechanism that doesn't depend on pretending that a join is a SEMI
join when it isn't.

(Another, longer-term project is to rationalize the situation with
joinsel functions getting jointype parameters that are different
from sjinfo->jointype.  The cases where that can happen are pretty
random and underdocumented, and to the extent that there's any
guidance at all, it's the comment at the head of selfuncs.c that
says it's better to ignore jointype in favor of sjinfo->jointype.
So I'd not be very much on board with just changing eqjoinsel
even if that were enough to fix this completely --- we'd need to
take a very hard look at a bunch of cases to figure out what behavior
we really want the estimators to have.  In the end it might be best to
just refuse to allow those values to be different.)

            regards, tom lane


Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
David Rowley
Date:
On 20 September 2018 at 08:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So after poking around for awhile, my conclusion is that the cost
> estimation aspects of the inner_unique patch are completely broken,
> and it's not going to be very easy to fix.
>
> The core issue here is that compute_semi_anti_join_factors was never
> designed to work for any joins that aren't really SEMI or ANTI joins,
> and just taking out the assert about that doesn't fix it.  In
> particular, passing jointype == JOIN_SEMI to clauselist_selectivity,
> when the underlying sjinfo has jointype == JOIN_INNER, isn't a supported
> combination.  If you look at eqjoinsel() you will notice that it pays
> no attention at all to the jointype parameter, only to sjinfo->jointype.
> Therefore what we get out of the first clauselist_selectivity is the
> same value as we get from the second one (ie, inner-join selectivity)
> leading to entirely insane results from compute_semi_anti_join_factors.

I looked at this again and I see that when we're building the join rel
we call calc_joinrel_size_estimate() to set ->rows. We do this based
on the actual join type as at this stage we've yet to detect if the
join has a unique inner side or not.  My original idea with this code
was that unique joins would be costed in the same way as semi joins
which I had hoped would encourage the unique side to be put on the
inner side when the costs were roughly the same.  Now in light of this
bug, and given that the rows for the join rel is set based on the
original join type, I think it's better just to pull out the code that
attempts to cost unique joins differently.  I don't really see
changing the rows property of the joinrel as something we can actually
do since any updated estimate would be bogus if the join order was
reversed, which is possible for JOIN_INNER with a unique inner, but
not possible for a JOIN_SEMI.

I've attached a patch which is a partial revert of the costing code
that was changed in 9c7f5229ad6.

I'm a bit unsure how good an idea it is to put back the Assert in
compute_semi_anti_join_factors() in the back branches.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached a patch which is a partial revert of the costing code
> that was changed in 9c7f5229ad6.

I've spent some time thinking about this (sorry for the delay).
I don't think I like doing what you suggest here.  While it makes
Marko's example better, it will make other cases worse, because
it effectively causes the planner not to give any cost preference
to inner_unique plans.  That means it'll sometimes choose another
plan over a plan making use of inner_unique when we don't want it
to.  So that's not very nice, and it definitely makes me not want
to back-patch this: people aren't that happy with back-patching
plan-destabilizing changes even when we think they are 100% wins.

So what I'm wondering about is if we can't fix
compute_semi_anti_join_factors() to produce something sane without
trying to apply JOIN_SEMI estimation mode to non-semijoins.  It
only has to compute two things, and match_count seems pretty
trivial: it's 1 by definition for an inner_unique case.  All we
need is a rule for computing outer_match_frac.

I experimented with the attached, which estimates outer_match_frac
as joinrel size divided by outer rel size, relying on the knowledge
that no outer row matches more than one inner row.  This makes
Marko's case better, but it's not completely better: it underestimates
the cost of the expensive function by 3X compared to what pre-v10 did.
That is due to the point I made upthread:

>> Also worth noting here is that it's wrong in the first place to be
>> including the selectivity of the inequality qual in our calculation
>> of how many rows are going to be fed to the inequality qual :-(.
>> Again, the infrastructure involved isn't really broken for its
>> designed purpose, because with a normal semijoin there aren't any
>> filter quals to be considered separately; but that assumption breaks
>> down when you try to use that infrastructure for a regular join that
>> happens to have a unique inner side.

We're including the estimated selectivity of the expensive_func qual in
our estimate of how many rows will be fed to the expensive function :-(.
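The 3X gap falls out of the arithmetic (a toy illustration using the estimates from the example; the names are mine, not the planner's):

```python
# Rough account of the remaining ~3x underestimate described above.
joinrel_rows = 333   # planner's joinrel estimate; this already folds in
                     # the 1/3 default selectivity of the expensive qual
outer_rows = 1000    # estimated outer rows (generate_series default)

# stopgap patch: outer_match_frac = joinrel size / outer rel size
outer_match_frac = joinrel_rows / outer_rows   # 0.333

charged_calls = outer_rows * outer_match_frac  # ~333 expensive_func calls
actual_calls = outer_rows                      # but all matched outer rows
                                               # still run the filter qual
print(actual_calls / charged_calls)            # ~3.0
```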

It's not easy to make this better, though: the places that use
outer_match_frac aren't really drawing a distinction between filter
quals and quals that relate to the uniqueness property, and that's
what we'd need in order to have a sane estimate here.  Nor does
compute_semi_anti_join_factors have any idea which quals are which.

I think we'd have to redesign those cost calculations completely to
get ideal answers, and I don't especially want to do that right now.
So I'm wondering about the attached as a stopgap.  It seems like it's
an improvement over what we have, even if not a full fix.

(BTW, another point is that the executor is also imperfect here:
it will keep scanning if the "expensive_func(gs1.i + gs2.i) > 0"
qual fails, though it could stop, because the inner_unique property
is not dependent on that qual.)

Notes:

* For ease of review, I didn't reindent the body of
compute_semi_anti_join_factors, though it'd need it.

* The one join.sql test case that changes plan seems to need a redesign;
the new plan is correct, but it seems like it's not testing what its
comment says it's for.

Thoughts?

            regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 4b9be13..fcdccf1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4078,7 +4078,7 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
  *    joinrel: join relation under consideration
  *    outerrel: outer relation under consideration
  *    innerrel: inner relation under consideration
- *    jointype: if not JOIN_SEMI or JOIN_ANTI, we assume it's inner_unique
+ *    jointype: if not JOIN_SEMI or JOIN_ANTI, must be an inner_unique case
  *    sjinfo: SpecialJoinInfo relevant to this join
  *    restrictlist: join quals
  * Output parameters:
@@ -4101,14 +4101,16 @@ compute_semi_anti_join_factors(PlannerInfo *root,
     List       *joinquals;
     ListCell   *l;

+    /* We have to do this differently for semi/anti joins vs. inner_unique */
+    if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+    {
     /*
      * In an ANTI join, we must ignore clauses that are "pushed down", since
      * those won't affect the match logic.  In a SEMI join, we do not
      * distinguish joinquals from "pushed down" quals, so just use the whole
-     * restrictinfo list.  For other outer join types, we should consider only
-     * non-pushed-down quals, so that this devolves to an IS_OUTER_JOIN check.
+     * restrictinfo list.
      */
-    if (IS_OUTER_JOIN(jointype))
+    if (jointype == JOIN_ANTI)
     {
         joinquals = NIL;
         foreach(l, restrictlist)
@@ -4128,7 +4130,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
     jselec = clauselist_selectivity(root,
                                     joinquals,
                                     0,
-                                    (jointype == JOIN_ANTI) ? JOIN_ANTI : JOIN_SEMI,
+                                    jointype,
                                     sjinfo);

     /*
@@ -4155,7 +4157,7 @@ compute_semi_anti_join_factors(PlannerInfo *root,
                                     &norm_sjinfo);

     /* Avoid leaking a lot of ListCells */
-    if (IS_OUTER_JOIN(jointype))
+    if (jointype == JOIN_ANTI)
         list_free(joinquals);

     /*
@@ -4178,6 +4180,21 @@ compute_semi_anti_join_factors(PlannerInfo *root,
     }
     else
         avgmatch = 1.0;
+    }
+    else
+    {
+        /*
+         * Must be an inner_unique case.  match_count is 1 by definition, and
+         * we can compute outer_match_frac as joinrel size / outerrel size.
+         * For paranoia's sake, clamp that to 0..1.
+         */
+        if (outerrel->rows > 0)
+            jselec = joinrel->rows / outerrel->rows;
+        else
+            jselec = 1.0;
+        CLAMP_PROBABILITY(jselec);
+        avgmatch = 1.0;
+    }

     semifactors->outer_match_frac = jselec;
     semifactors->match_count = avgmatch;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 593aec2..b58ca75 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5712,15 +5712,14 @@ select * from j1 inner join j3 on j1.id = j3.id;
 -----------------------------------
  Hash Join
    Output: j1.id, j3.id
-   Inner Unique: true
-   Hash Cond: (j3.id = j1.id)
-   ->  Seq Scan on public.j3
-         Output: j3.id
-   ->  Hash
+   Hash Cond: (j1.id = j3.id)
+   ->  Seq Scan on public.j1
          Output: j1.id
-         ->  Seq Scan on public.j1
-               Output: j1.id
-(10 rows)
+   ->  Hash
+         Output: j3.id
+         ->  Seq Scan on public.j3
+               Output: j3.id
+(9 rows)

 -- ensure left join is marked as unique
 explain (verbose, costs off)

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Tom Lane
Date:
I wrote:
> I think we'd have to redesign those cost calculations completely to
> get ideal answers, and I don't especially want to do that right now.
> So I'm wondering about the attached as a stopgap.  It seems like it's
> an improvement over what we have, even if not a full fix.

> Thoughts?

Ping?  If nobody has any further ideas here, I'm going to clean up
the regression test issue and push this.

            regards, tom lane



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
David Rowley
Date:
On Thu, 4 Apr 2019 at 06:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > I think we'd have to redesign those cost calculations completely to
> > get ideal answers, and I don't especially want to do that right now.
> > So I'm wondering about the attached as a stopgap.  It seems like it's
> > an improvement over what we have, even if not a full fix.
>
> > Thoughts?
>
> Ping?  If nobody has any further ideas here, I'm going to clean up
> the regression test issue and push this.

Going back to the original example:

create function expensive_func(int) returns int as $$ begin return 1; end $$
language plpgsql stable cost 10000;
create table unique_inner(a int primary key);
insert into unique_inner select generate_series(1, 10000);
analyze unique_inner;

The problem query was:

explain verbose select * from unique_inner gs1(i) join
generate_series(1, 10) gs2(i) using (i) where expensive_func(gs1.i +
gs2.i) > 0;

It is my expectation that the plan for the above should be the same as
the plan for the query below.

explain verbose select * from generate_series(1,10) gs2(i) where
exists (select i from unique_inner gs1(i) where gs1.i = gs2.i and
expensive_func(gs1.i + gs2.i) > 0);

and it is.

However, I'm a bit surprised at the following not having the plan come
in cheaper when the unique rel is on the inner side:

create table t1 (a int primary key);
create table t2 (a int);
create index on t2(a);
insert into t1 select generate_Series(1,10000);
insert into t2 select generate_Series(1,10000);
analyze t1,t2;
explain verbose select * from t1 inner join t2 on t1.a=t2.a;
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=270.00..552.50 rows=10000 width=8)
   Output: t1.a, t2.a
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on public.t1  (cost=0.00..145.00 rows=10000 width=4)
         Output: t1.a
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
         Output: t2.a
         ->  Seq Scan on public.t2  (cost=0.00..145.00 rows=10000 width=4)
               Output: t2.a
(9 rows)

-- again but with the relations written in the opposite order.
explain verbose select * from t2 inner join t1 on t1.a=t2.a;
                                QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=270.00..552.50 rows=10000 width=8)
   Output: t2.a, t1.a
   Inner Unique: true
   Hash Cond: (t2.a = t1.a)
   ->  Seq Scan on public.t2  (cost=0.00..145.00 rows=10000 width=4)
         Output: t2.a
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
         Output: t1.a
         ->  Seq Scan on public.t1  (cost=0.00..145.00 rows=10000 width=4)
               Output: t1.a
(10 rows)

If I try the same on master it always prefers t1 on the inside of the
join. It would be nice if the costs reflected that since there are
savings to be had during execution with the unique rel on the inner
side.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Alvaro Herrera
Date:
On 2019-Apr-03, Tom Lane wrote:

> Ping?  If nobody has any further ideas here, I'm going to clean up
> the regression test issue and push this.

Is this still live?  If so, is it something for pg12?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Apr-03, Tom Lane wrote:
>> Ping?  If nobody has any further ideas here, I'm going to clean up
>> the regression test issue and push this.

> Is this still live?  If so, is it something for pg12?

It's still live --- Rowley seemed not happy with my proposal so
I didn't push it.  I'm not sure that it's must-fix-for-v12;
the problem is older than that.

            regards, tom lane



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Alvaro Herrera
Date:
On 2019-Jun-28, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > On 2019-Apr-03, Tom Lane wrote:
> >> Ping?  If nobody has any further ideas here, I'm going to clean up
> >> the regression test issue and push this.
> 
> > Is this still live?  If so, is it something for pg12?
> 
> It's still live --- Rowley seemed not happy with my proposal so
> I didn't push it.  I'm not sure that it's must-fix-for-v12;
> the problem is older than that.

Hmm, but we cannot backpatch because of plan stability, right?  So it's
a commitfest item for pg13?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
David Rowley
Date:
On Sat, 29 Jun 2019 at 10:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > On 2019-Apr-03, Tom Lane wrote:
> >> Ping?  If nobody has any further ideas here, I'm going to clean up
> >> the regression test issue and push this.
>
> > Is this still live?  If so, is it something for pg12?
>
> It's still live --- Rowley seemed not happy with my proposal so
> I didn't push it.  I'm not sure that it's must-fix-for-v12;
> the problem is older than that.

One thing we could look at would be to charge an additional
cpu_tuple_cost per outer row for all joins except semi, anti and
unique joins.  This would account for the additional lookup for
another matching row which won't be found and cause the planner to
slightly favour keeping the unique rel on the inner side of the join,
when everything else is equal.
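A minimal sketch of how such a surcharge might be computed (the function and names here are illustrative assumptions, not actual planner symbols):

```python
# Hedged sketch of the proposed tweak: charge one cpu_tuple_cost per outer
# row for joins that cannot stop probing after the first inner match.
CPU_TUPLE_COST = 0.01    # PostgreSQL's default cpu_tuple_cost

def extra_probe_cost(outer_rows, jointype, inner_unique):
    """Surcharge for looking for a second (nonexistent) match; zero when
    the executor can skip to the next outer tuple (semi/anti/unique)."""
    if inner_unique or jointype in ("SEMI", "ANTI"):
        return 0.0
    return outer_rows * CPU_TUPLE_COST

# For the 10000-row t1/t2 example above, the non-unique direction would
# cost an extra 10000 * 0.01 = 100 units, breaking the tie in favour of
# keeping the unique rel on the inner side.
print(extra_probe_cost(10000, "INNER", False))
print(extra_probe_cost(10000, "INNER", True))
```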

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Alvaro Herrera
Date:
On 2019-Jul-01, David Rowley wrote:

> One thing we could look at would be to charge an additional
> cpu_tuple_cost per outer row for all joins except semi, anti and
> unique joins.  This would account for the additional lookup for
> another matching row which won't be found and cause the planner to
> slightly favour keeping the unique rel on the inner side of the join,
> when everything else is equal.

So does this proposal fix the problem that David was describing earlier?
If it does, Tom, would you like to get this finalized and pushed?

I've bcc'ed someone who might be interested in chiming in and/or helping
out.  Please look here:
https://postgr.es/m/153683552113.22350.18441286362867559841@wrigleys.postgresql.org

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Peter Bex
Date:
On Thu, Sep 05, 2019 at 07:20:14PM -0400, Alvaro Herrera wrote:
> So does this proposal fix the problem that David was describing earlier?
> If it does, Tom, would you like to get this finalized and pushed?
>
> I've bcc'ed someone who might be interested in chiming in and/or helping
> out.  Please look here:
> https://postgr.es/m/153683552113.22350.18441286362867559841@wrigleys.postgresql.org

Hi all,

I was the one who originally reported the issue on IRC (Alvaro BCCed me).
It took me a while to find the time to set up a VM with Postgres patched
and unpatched, and I had to figure out again which query it was that
triggered the issue (thank god Firefox still had my explain pastes
in its history).

But I'm happy to report that the patch actually affects the real life
query in a positive way; it's indeed much faster again after applying
the patch!

Will this make it into Postgres 12?

Cheers,
Peter

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Michael Paquier
Date:
On Thu, Sep 05, 2019 at 07:20:14PM -0400, Alvaro Herrera wrote:
> So does this proposal fix the problem that David was describing earlier?
> If it does, Tom, would you like to get this finalized and pushed?

Tom, David, we still have this patch registered as a bug fix in the CF
app.  Are you planning to look at it and potentially commit something?
--
Michael

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Thu, Sep 05, 2019 at 07:20:14PM -0400, Alvaro Herrera wrote:
>> So does this proposal fix the problem that David was describing earlier?
>> If it does, Tom, would you like to get this finalized and pushed?

> Tom, David, we still have this patch registered as a bug fix in the CF
> app.  Are you planning to look at it and potentially commit something?

I don't think we have any committable patch at the moment.  The "stopgap"
patch I posted seems unacceptable in view of the counterexample David
found, while his original partial-revert patch doesn't look any better
than it did then.

It's possible we could salvage the "stopgap" patch with the idea David
mentioned,

>>> One thing we could look at would be to charge an additional
>>> cpu_tuple_cost per outer row for all joins except semi, anti and
>>> unique joins.  This would account for the additional lookup for
>>> another matching row which won't be found and cause the planner to
>>> slightly favour keeping the unique rel on the inner side of the join,
>>> when everything else is equal.

which'd help break ties in the right direction.  It's a bit scary to
be fixing this issue by changing the cost estimates for non-unique
joins --- that could have side-effects we don't want.  But arguably,
the above is a correct refinement to the cost model, so maybe it's
okay.

Or, since I think we're out of the realm of what would be sane to
back-patch anyway, maybe it's time to take two steps back and try
to find a real, non-stopgap solution.  As I said upthread, I think
the core problem here is failure to distinguish quals associated with
the uniqueness condition from additional "filter" quals that have to
be checked anyway.  I wonder how hard that is to fix.

            regards, tom lane



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
Alvaro Herrera
Date:
On 2019-Nov-30, Tom Lane wrote:

> Or, since I think we're out of the realm of what would be sane to
> back-patch anyway, maybe it's time to take two steps back and try
> to find a real, non-stopgap solution.  As I said upthread, I think
> the core problem here is failure to distinguish quals associated with
> the uniqueness condition from additional "filter" quals that have to
> be checked anyway.  I wonder how hard that is to fix.

So does anyone have a way to move this forward?  David, any input on
this?

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From:
David Rowley
Date:
On Sun, 1 Dec 2019 at 06:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Michael Paquier <michael@paquier.xyz> writes:
> >>> One thing we could look at would be to charge an additional
> >>> cpu_tuple_cost per outer row for all joins except semi, anti and
> >>> unique joins.  This would account for the additional lookup for
> >>> another matching row which won't be found and cause the planner to
> >>> slightly favour keeping the unique rel on the inner side of the join,
> >>> when everything else is equal.
>
> which'd help break ties in the right direction.  It's a bit scary to
> be fixing this issue by changing the cost estimates for non-unique
> joins --- that could have side-effects we don't want.  But arguably,
> the above is a correct refinement to the cost model, so maybe it's
> okay.

I wanted to see just how much fallout there'd be in the regression
tests if we did add the small non-unique join surcharge to all joins
which are unable to skip to the next outer tuple after matching the
first inner one.  I went and added a cpu_tuple_cost per row to try to
account for the additional cost during execution.  In the simple test
that I showed in April last year on this thread, it now does always
prefer to keep the unique rel on the inner side of the join with all 3
join types.  However, there's quite a bit of fallout in the regression
tests, mostly around changes in join order, but I also see a weird
failure in join.out on a test that claims it shouldn't allow a unique
join: its output has changed to actually performing one.  The test
itself looks to be broken, since the qual being compared should allow
unique joins to be detected.

Perhaps one cpu_tuple_cost per outer row is far too big a surcharge.
For hash join with the example case from April 2019, the surcharge
adds about 18% to the overall cost of the join, taking it from 552.50
up to 652.50.
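
As a sanity check of those numbers (assuming the default cpu_tuple_cost of 0.01 and the 10,000 outer rows implied by the 100-unit difference):

```python
# Hypothetical arithmetic check of the ~18% figure quoted above, assuming
# 10000 outer rows and the default cpu_tuple_cost of 0.01.
cpu_tuple_cost = 0.01
outer_rows = 10000
surcharge = cpu_tuple_cost * outer_rows       # 100.0
base_cost = 552.50
print(base_cost + surcharge)                  # 652.5
print(round(100 * surcharge / base_cost, 1))  # 18.1 (percent)
```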

The attached patch is Tom's patch from this thread from March last
year minus his regression test changes plus my join surcharge stuff.
This is just intended as a topic of conversation at the moment and
does not make any adjustments to the expected test outputs.

David

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From: Michael Paquier
On Fri, Apr 24, 2020 at 06:26:49PM +1200, David Rowley wrote:
> The attached patch is Tom's patch from this thread from March last
> year minus his regression test changes plus my join surcharge stuff.
> This is just intended as a topic of conversation at the moment and
> does not make any adjustments to the expected test outputs.

So, this patch has been moved across 7 successive CFs already.  Tom,
you are registered as a reviewer of this patch.  Do we have any plans
to move this forward?  Or, as this thread has now been idle for months,
should it just be marked Returned with Feedback (RwF) or such?
--
Michael

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From: Tom Lane
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Apr 24, 2020 at 06:26:49PM +1200, David Rowley wrote:
>> The attached patch is Tom's patch from this thread from March last
>> year minus his regression test changes plus my join surcharge stuff.
>> This is just intended as a topic of conversation at the moment and
>> does not make any adjustments to the expected test outputs.

> So, this patch has been moved across 7 successive CFs already.  Tom,
> you are registered as a reviewer of this patch.  Do we have any plans
> to move this forward?  Or, as this thread is now idle for months,
> should it be just RwF or such?

Well, there's a real problem here, but the patch is not pretending
to be committable -- as David says, it's more a starting point for
investigation.  It's fair to say that this isn't the sort of material
that should be on the CF list.  We don't have a good way to keep
track of "problems that should be investigated" :-(

            regards, tom lane



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From: Michael Paquier
On Thu, Sep 24, 2020 at 12:06:31AM -0400, Tom Lane wrote:
> Well, there's a real problem here, but the patch is not pretending
> to be committable -- as David says, it's more a starting point for
> investigation.  It's fair to say that this isn't the sort of material
> that should be on the CF list.  We don't have a good way to keep
> track of "problems that should be investigated" :-(

Even the TODO list?
--
Michael

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From: David Rowley
On Thu, 24 Sep 2020 at 16:11, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 24, 2020 at 12:06:31AM -0400, Tom Lane wrote:
> > Well, there's a real problem here, but the patch is not pretending
> > to be committable -- as David says, it's more a starting point for
> > investigation.  It's fair to say that this isn't the sort of material
> > that should be on the CF list.  We don't have a good way to keep
> > track of "problems that should be investigated" :-(
>
> Even the TODO list?

... putting it in a place we might one day look again might be better :-)

David



Re: BUG #15383: Join Filter cost estimation problem in 10.5

From: Michael Paquier
On Thu, Sep 24, 2020 at 05:10:26PM +1200, David Rowley wrote:
> ... putting it in a place we might one day look again might be better :-)

So am I understanding correctly that you are suggesting we wipe out the
existing TODO list entirely and recreate a new one?  That works for me :p

Except for that, I don't have a better idea than creating a new page
on the wiki, like something named after planner improvements, if we
don't want to keep this stuff in the CF for now.
--
Michael

Attachments

Re: BUG #15383: Join Filter cost estimation problem in 10.5

From: David Rowley
On Thu, 24 Sep 2020 at 18:33, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 24, 2020 at 05:10:26PM +1200, David Rowley wrote:
> > ... putting it in a place we might one day look again might be better :-)
>
> So am I getting correctly that you are suggesting to wipe out entirely
> the existing TODO list and recreate a new one?  That works for me :p

What I meant was, I never look at the TODO list on the wiki. I don't
think it's a good place to put it if we want to maintain the
motivation to get the problem fixed.

> Except for that, I don't have a better idea than creating a new page
> on the wiki, like something named after planner improvements, if we
> don't want to keep this stuff in the CF for now.

It seems a bit backwards to me to move a reminder for an item that we
want to fix to somewhere less visible. Feels a bit like sweeping bugs
under the carpet.  That does not make them go away.

Rather than see the item moved off somewhere else, my personal view is
that if people don't like the item being there then the likely best
course of action is for them to have a look at the problem and/or the
patch and voice their opinion on it and try to get the discussion
going again.  If the discussion concludes that the problem is not
big enough to warrant fixing, then we can leave a note and withdraw
the item.

However, to me it feels like a good time to make this sort of change
in the planner.  There's still plenty of time before PG14 ships for
people to complain if they don't like what we've done.

David