On 14 September 2018 at 05:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > (Notice how even though the function is expected to be called at least 333
> > times, the cost doesn't account for even a single call.)
>
> Yeah. This evidently got broken sometime during v10 development,
> because 9.6 and below generate a more reasonable cost:
>
> Hash Join (cost=270.00..25298.75 rows=333 width=4)
> Hash Cond: (gs2.i = gs1.i)
> Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
> -> Function Scan on generate_series gs2 (cost=0.00..10.00 rows=1000 width=4)
> -> Hash (cost=145.00..145.00 rows=10000 width=4)
> -> Seq Scan on unique_inner gs1 (cost=0.00..145.00 rows=10000 width=4)
>
> > Dropping the primary key constraint makes the costs more reasonable
>
> Interesting. That sort of points the finger in the direction of the
> inner_unique patch, though it could be elsewhere.
This seems to be a result of using the semifactors.outer_match_frac in
final_cost_hashjoin(). This is calculated to be very low @
3.3333333333333335e-05, which results in outer_matched_rows being set
to 0 in:
outer_matched_rows = rint(outer_path_rows *
extra->semifactors.outer_match_frac);
It's not all that obvious what might be done to fix this giving that
the low outer_match_frac is the result of performing an estimation on
both the gs1.i = gs2.i qual and the function call. Each of which are
estimated independently as:
gs1.i = gs2.i = 0.0001
expensive_func(gs1.i + gs2.i) > 0 = 0.33333333333333331
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services