Re: [HACKERS] postgres_fdw bug in 9.6

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] postgres_fdw bug in 9.6
Дата
Msg-id CAFjFpRfYETZLiKERpxCeAN1MsiJypsyNd6x9FQ7OWjY=_TGK2Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] postgres_fdw bug in 9.6  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] postgres_fdw bug in 9.6  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Список pgsql-hackers
On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I have a test case where I made the fdw connect back to itself, and
>> stripped out all the objects that I could and still reproduce the case.  It
>> is large, 21MB compressed, 163MB uncompressed, so I am linking it here:
>> https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc
>
> Thanks for the test case.  I believe I understand the fundamental problem,
> or one of the fundamental problems --- I'm not exactly convinced there
> aren't several here.
>
> The key issue is that GetExistingLocalJoinPath believes it can return
> any random join path as what to use for the fdw_outerpath of a foreign
> join.  You can get away with that, perhaps, as long as you only consider
> two-way joins.  But in a nest of foreign joins, this fails to consider
> the interrelationships of join paths and their children.  In particular,
> what we have happening in this example is that GetExistingLocalJoinPath
> seizes randomly on a MergePath for an upper join relation, clones it,
> sees that the outer child is a join ForeignPath, and replaces that outer
> child with its fdw_outerpath ... which was chosen equally cavalierly by
> some previous execution of GetExistingLocalJoinPath, and does not have
> the sort ordering expected by the MergePath.  So we'd generate an invalid
> EPQ plan, except that the debug cross-check in create_mergejoin_plan
> notices the discrepancy.

Thanks for the detailed explanation.

>
> I believe there are probably more problems here, or at least if there
> aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
> lack of curiosity about what's underneath the join pathnode it picks,
> it seems to me that it's possible for it to return a path tree that
> *isn't* all local joins.  If we're looking at, say, a hash path for
> a 4-way join, whose left input is a hash path for a 3-way join, whose
> left input is a 2-way foreign join, what's stopping that from being
> returned as a "local" path tree?

Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths. Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.

>
> Likewise, it seems like the code is trying to reject any custom-path join
> types, or at least this barely-intelligible comment seems to imply that:
>
>                  * Just skip anything else. We don't know if corresponding
>                  * plan would build the output row from whole-row references
>                  * of base relations and execute the EPQ checks.
>
> But this coding fails to notice any such join type that's below the
> level of the immediate two join inputs.

Similarly, here we assume that any custom plan would tell us whether
the given row survives EPQ recheck. As long as a custom plan doing
that correctly, it shouldn't be a problem.

>
> We've probably managed to not notice this so far because foreign joins
> generally ought to dominate any local join method, so that there wouldn't
> often be cases where the surviving paths use local joins for input
> sub-joins.  But Jeff's test case proves it can happen.
>
> I kind of wonder why this infrastructure exists at all; it's not the way
> I'd have foreseen handling EPQ for remote joins.  However, while "throw
> it away and start again" might be good advice going forward, I suppose
> it won't be very popular for applying to 9.6.
>
> One way that we could make things better is to rely on the knowledge
> that EPQ isn't asked to evaluate joins for more than one row per input
> relation, and therefore using merge or hash join technology is really
> overkill.  We could make a tree of simple nestloop joins, which aren't
> going to care about sort order, if we could identify the correct join
> clauses to apply.  At least some of that could be laid off on the FDW,
> which if it's gotten this far at all, ought to know what join clauses
> need to be enforced by the foreign join.  So I'm thinking a little bit
> in terms of "just collect the foreign scans for all the base rels
> included in the join and then build a cross-product nestloop join tree,
> applying all the join clauses at the top".  This would have the signal
> value that it's guaranteed to succeed and so can be left for later,
> rather than having to expensively redo it at each level of join planning.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?

>
> (Hm, that does sound a lot like "throw it away and start again", doesn't
> it.  But what we've got here is busted enough that I'm not sure there
> are good alternatives.  Maybe for 9.6 we could just rewrite
> GetExistingLocalJoinPath, and limp along doing a lot of redundant
> computation during planning.)

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from GetExistingLocalJoinPath().
postgres_fdw wouldn't push down the join in that case and we will
loose some optimization, but that might be better than throwing an
error.

I agree that a join tree consisting of nested loop joins would be more
efficient in this case.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jesper Pedersen
Дата:
Сообщение: Re: [HACKERS] Hash Indexes
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraintviolation [and 2 more messages]