Re: Push down more full joins in postgres_fdw

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: Push down more full joins in postgres_fdw
Дата
Msg-id CAFjFpRdyEcLkwHTzTd23SdWgNNQWqxFJaqPHqoeXjro2DX7p0Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Push down more full joins in postgres_fdw  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Ответы Re: Push down more full joins in postgres_fdw  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Список pgsql-hackers





        3. I think registerAlias stuff should happen really at the time of
        creating paths and should be stored in fpinfo. Without that it
        will be
        computed every time we deparse the query. This means every time
        we try
        to EXPLAIN the query at various levels in the join tree and once
        for the
        query to be sent to the foreign server.

    Hmm.  I think the overhead in calculating aliases would be
    negligible compared to the overhead in explaining each remote query
    for costing or sending the remote query for execution.  So, I
    created aliases in the same way as remote params created and stored
    into params_list in deparse_expr_cxt.  I'm not sure it's worth
    complicating the code.

We defer remote parameter creation till deparse time since the the
parameter numbers are dependent upon the sequence in which we deparse
the query. Creating them at the time of path creation and storing them
in fpinfo is not possible because a. those present in the joining
relations will conflict with each other and need some kind of
serialization at the time of deparsing b. those defer for differently
parameterized paths or paths with different pathkeys. We don't defer it
because it's very light on performance.

That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique
aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
can have alias r123 without conflicting with any other alias.

Hmm.  But another concern about the approach of generating an subselect alias for a path, if needed, at the path creation time would be that the path might be rejected by add_path, which would result in totally wasting cycles for generating the subselect alias for the path.

A path may get rejected but the relation is not rejected. The alias applies to a relation and its targetlist, which will remain same for all paths created for that relation, esp when it's a base relation or join. So, AFAIU that work never gets wasted. Also, for costing paths with use_remote_estimate, we deparse the query, which builds the alias information again and again for very path. That's much worse than building it once for a given relation.
 

However minimum overhead it might have, we will deparse the query every
time we create a path involving that kind of relation i.e. for different
pathkeys, different parameterization and different joins in which that
relation participates in. This number can be significant. We want to
avoid this overhead if we can.

Exactly.  As I said above, I think the overhead would be negligible compared to the overhead in explaining each remote query for costing or the overhead in sending the final remote query for execution, though.

It won't remain minimal as the number of paths created increases, increasing the number of times a query is deparsed. We deparse query every time time we cost a path for a relation with use_remote_estimates true. As we try to push down more and more stuff, we will create more paths and deparse the query more time.

Also, that makes the interface better. Right now, in your patch, you have changed the order of deparsing in the existing code, so that the aliases are registered while deparsing FROM clause and before any Var nodes are deparsed. If we create aliases at the time of path creation, only once in GetForeignJoinPaths or GetForeignPaths as appropriate, that would require less code churn and would save some CPU cycles as well.
 

        6.
        !     if (is_placeholder)
        !         errcontext("placeholder expression at position %d in
        select list",
        !                    errpos->cur_attno);
        A user wouldn't know what a placeholder expression is, there is
        no such
        term in the documentation. We have to device a better way to
        provide an
        error context for an expression in general.

    Though I proposed that, I don't think that it's that important to
    let users know that the expression is from a PHV.  How about just
    saying "expression", not "placeholder expression", so that we have
    the message "expression at position %d in select list" in the context?

Hmm, that's better than placeholder expression, but not as explanatory
as it should be since we won't be printing the "select list" in the
error context and user won't have a clue as to what error context is
about.

I don't think so.  Consider an example of the conversion error message, which is from the regression test:

SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1;
ERROR:  invalid input syntax for integer: "foo"
CONTEXT:  whole-row reference to foreign table "ft1"

As shown in the example, the error message is displayed under a remote query for execution.  So, ISTM it's reasonable to print something like "expression at position %d in select list" in the context if an expression in a PHV.

I missed it. Sorry. Looks ok.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: Partition-wise join for join between (declaratively) partitioned tables
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: 9.6 TAP tests and extensions