Re: pg9.6 segfault using simple query (related to use fk for join estimates)

From Robert Haas
Subject Re: pg9.6 segfault using simple query (related to use fk for join estimates)
Date
Msg-id CA+TgmobTp1XoWSzX01aD6BGiUQf_o1fmbwpoXjKnREO+foOY-w@mail.gmail.com
In response to Re: pg9.6 segfault using simple query (related to use fk for join estimates)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, May 4, 2016 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I spent some time trying to make a test case that was impossibly slow,
> without any really impressive result: I saw at most about a 25% growth in
> planning time, for a 27-way join with one or two foreign keys per table.
> I noted however that with a simple FROM list of tables, you don't really
> get the full force of the combinatorial explosion, because
> join_search_one_level prefers to generate left-deep trees first, and so
> the first creation of a given joinrel is always N left-side rels against 1
> right-side rel, causing the second level of looping to always iterate just
> once.  (GEQO behaves likewise, I think.)  I spent a little time trying to
> devise join order constraints that would result in a lot of high-level
> joinrels being formed with many relations on both sides, which would cause
> both of the second and third levels to iterate O(N) times not just once.
> I didn't have much luck, but I have zero confidence that our users won't
> find such cases.

Have you looked at the patch David Rowley proposed to fix this by
doing some caching?  I am not crazy about accepting even a 25% growth
in planning time on a 27-way join, although maybe 27-way joins are
rare enough and vulnerable enough to bad plans that it would be worth
it if we could convince ourselves that plan quality would go up.  But
if that patch drops it to some much lesser number, we should consider
that as a possible fix.

> Bugs in quals_match_foreign_key():
>
> * Test for clause->opno == fkinfo->conpfeqop[i] fails to consider
> cross-type operators, ie, what's in the clause might be int2 = int4
> while the conpfeqop is int4 = int2.
>
> * parent_ec check isn't really the right thing, since EC-derived clauses
> might not have that set.  I think it may be adequate given that you only
> deal with simple Vars, but at least a comment about that would be good.
>
> * Come to think of it, you could probably dodge the commutator operator
> problem altogether for clauses with nonnull parent_ec, because they must
> contain a relevant equality operator.  (Although if it's redesigned as I
> suggest above, the code path for a clause with parent_ec set would look
> totally different from this anyway.)
>
> * Maintaining the fkmatches bitmapset is useless expense, just use a
> counter of matched keys.  Or for that matter, why not just fail
> immediately if i'th key is not found?

Technically, only the first of these is a clear bug, IMHO.  But it
seems like they should all be fixed.
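
As a rough illustration of the first and last points in that list (not code
from the patch), the sketch below matches a clause operator against either
the foreign key's equality operator or its commutator, and gives up as soon
as one key column has no matching clause.  The operator table, the column
names, and the helper names clause_matches_key and all_keys_matched are all
hypothetical; the real code works with operator OIDs and Vars.

```python
# Hypothetical operator table: each operator name maps to its commutator.
# "int4=int2" stands for an equality operator taking (int4, int2) operands.
COMMUTATOR = {
    "int4=int2": "int2=int4",
    "int2=int4": "int4=int2",
    "int4=int4": "int4=int4",
}


def clause_matches_key(clause, key):
    """Check one join clause against one foreign key column pair.

    clause is (left_col, op, right_col); key is (pk_col, fk_col, pf_op),
    where pf_op takes the referenced (PK) column as its left operand.
    """
    left, op, right = clause
    pk_col, fk_col, pf_op = key
    if left == pk_col and right == fk_col:
        return op == pf_op
    if left == fk_col and right == pk_col:
        # Operands are swapped, so compare against the commutator instead.
        return op == COMMUTATOR.get(pf_op)
    return False


def all_keys_matched(clauses, keys):
    """Return True only if every key column pair is matched by some clause."""
    for key in keys:
        if not any(clause_matches_key(c, key) for c in clauses):
            return False  # fail immediately; no bitmapset or counter needed
    return True


# The key's operator is int4 = int2, but the clause is written the other
# way around as int2 = int4; a plain equality test on the operator misses it.
keys = [("a.id", "b.a_id", "int4=int2")]
clauses = [("b.a_id", "int2=int4", "a.id")]
matched = all_keys_matched(clauses, keys)
```

With only an exact comparison of the two operators, the clause above would
not be recognized as matching the key.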

> find_best_foreign_key_quals():
>
> * Test on enable_fkey_estimates should be one call level up to dodge the
> useless doubly-nested loop in clauselist_join_selectivity (which would
> make the "fast path" exit here pretty much useless)

Yes, that's pretty stupid, and should be fixed.  Coding style is not
per project spec, either.  Also, the header comment for
find_best_foreign_key_quals and in fact the name of the function look
pretty poor.  It seems that the return value is the number of columns
in the foreign key and that there is an out parameter, joinqualsbitmap,
whose exact meaning doesn't seem to be documented in any comment
anywhere in the patch.

> clauselist_join_selectivity():
>
> * "either could be zero, but not both" is a pretty unhelpful comment given
> the if-test just above it.  What *could* have used some explanation is
> what the next two dozen lines are doing, because they're as opaque as can
> be; and the XXX comment doesn't leave a warm feeling that the author
> understands it either.  I'm not prepared to opine on whether this segment
> of code is correct at all without better commentary.

I'm pretty baffled by this code, too.  I think what the overlap stuff
is doing is trying to calculate selectivity when we match multiple
foreign key constraints, but it doesn't look very principled.
find_best_foreign_key_quals discards "shorter" matches entirely,
picking arbitrarily among longer ones, but then we try to deal with
all of the ones that survive that stage even if they overlap.  It's
hard to judge whether any of this makes sense without more
explanation.
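
To make that reading concrete, here is a small sketch of the selection rule
as described above: keep the candidate that matches the most join clauses
and let ties fall to whichever was seen first.  This is only an
interpretation of the behavior, not the patch's code; the function name
pick_longest_match, the foreign key names, and the representation of
matches as sets of clause indexes are all assumptions.

```python
def pick_longest_match(matches):
    """Pick the match covering the most join clauses.

    matches is a list of (fk_name, frozenset_of_clause_indexes).
    Shorter matches are dropped; among equally long ones the first seen
    wins, which is an arbitrary choice.
    """
    best = None
    for name, quals in matches:
        if best is None or len(quals) > len(best[1]):
            best = (name, quals)
    return best


# Hypothetical candidates: fk_ab matches clause 0 only, while fk_abc and
# fk_abd each match two clauses and share clause 1.
candidates = [
    ("fk_ab", frozenset({0})),
    ("fk_abc", frozenset({0, 1})),
    ("fk_abd", frozenset({1, 2})),
]
best = pick_longest_match(candidates)

# Clauses counted by both of the equally long candidates.
overlap = candidates[1][1] & candidates[2][1]
```

Here fk_ab is discarded for being shorter, fk_abd loses to fk_abc only
because of list order, and the two longer candidates overlap on one clause,
which is the situation whose handling is unclear.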

> calc_joinrel_size_estimate():
>
> * why is clauselist_join_selectivity applied to pushed-down clauses and
> not local ones in an outer join?  If that's not an oversight, it at least
> requires an explanatory comment.  Note that this means we are not applying
> FK knowledge for "a left join b on x = y", though it seems like we could.
>
> compute_semi_anti_join_factors isn't using clauselist_join_selectivity
> either.  I don't know whether it should be, but if not, a comment about
> why not seems appropriate.  More generally, I wonder why this logic
> wasn't just folded into clauselist_selectivity.

Good questions.

> guc.c:
> undocumented GUCs are not acceptable

Agreed.

> paths.h:
> patch introduces an extern that's referenced noplace

Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


