postgres_fdw: oddity in costing presorted foreign scans with local stats

Поиск
Список
Период
Сортировка
От Etsuro Fujita
Тема postgres_fdw: oddity in costing presorted foreign scans with local stats
Дата
Msg-id CAPmGK17jaJLPDEkgnP2VmkOg=5wT8YQ1CqssU8JRpZ_NSE+dqQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: postgres_fdw: oddity in costing presorted foreign scans withlocal stats  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Список pgsql-hackers
Hi,

I noticed that there is (another) oddity in commit aa09cd242f: in the
!fpinfo->use_remote_estimate mode, when first called for costing an
unsorted foreign scan, estimate_path_cost_size() computes
retrieved_rows, which is the estimated number of rows fetched from the
remote server, by these:

    retrieved_rows = clamp_row_est(rows / fpinfo->local_conds_sel)
    retrieved_rows = Min(retrieved_rows, foreignrel->tuples);

where rows is the estimated number of output rows emitted from the
foreign scan, fpinfo->local_conds_sel is the selectivity of local
conditions, and foreignrel->tuples is the foreign table's reltuples.
This is good, BUT when next called for costing the presorted foreign
scan, that function re-computes retrieved_rows by the former, but
doesn't clamp it by the latter, which would produce wrong results.
Here is such an example:

create table t (a int, b int);
create foreign table ft (a int, b int) server loopback options
(table_name 't');insert into ft values (1, 10);
insert into ft values (2, 20);
analyze ft;
create function postgres_fdw_abs(int) returns int as $$ begin return
abs($1); end $$ language plpgsql immutable;
explain verbose select * from ft where postgres_fdw_abs(b) > 10 order by a;
                            QUERY PLAN
-------------------------------------------------------------------
 Foreign Scan on public.ft  (cost=100.00..101.89 rows=1 width=8)
   Output: a, b
   Filter: (postgres_fdw_abs(ft.b) > 10)
   Remote SQL: SELECT a, b FROM public.t ORDER BY a ASC NULLS LAST
(4 rows)

For this query, we have rows=1 and foreignrel->tuples=2.
postgres_fdw_abs(b) > 10 is a local condition, for which we have
fpinfo->local_conds_sel=0.333333 (I got this by printf debugging).  So
when first called for costing an unsorted foreign scan, by the former
equation retrieved_rows=3, then by the latter retrieved_rows=2, which
is correct.  BUT when next called for costing the presorted foreign
scan, we have retrieved_rows=3, as that function doesn't clamp the
retrieved_rows.  This is wrong, leading to incorrect cost estimates.
(This is an issue for the foreign-scan case, but I think we would have
the same issue for the foreign-join case.)

To fix, I propose to handle retrieved_rows in the same way as cached
costs; 1) cache retrieved_rows computed in the first call of
estimate_path_cost_size() into the foreign table's fpinfo, and 2) use
it after the first call.  Also, I'd like to propose to put this code
in that function for !use_remote_estimate mode in each of the below
code for the cases of foreign scan, foreign join, and foreign grouping
as needed, and use the rows/width estimates stored in the fpinfo (ie,
fpinfo->rows and fpinfo->width) after the first call, like the
attached.

        /*
         * Use rows/width estimates made by set_baserel_size_estimates() for
         * base foreign relations and set_joinrel_size_estimates() for join
         * between foreign relations.
         */
        rows = foreignrel->rows;
        width = foreignrel->reltarget->width;

I think that that would make the code more consistent and easier to
understand.  Also, there is another two reasons: a) this code seems
confusing to me for the foreign-grouping case, as the core code
doesn't set foreignrel->rows at all for grouped relations.  The change
proposed above would avoid that confusion.  And b) we can remove a
change made by commit ffab494a4d, which added support for sorting
grouped relations remotely in postgres_fdw.  In that commit, to extend
the logic for re-using cached costs to the foreign-grouping case, I
modified add_foreign_grouping_paths() so that it saves the rows
estimate for a grouped relation made by estimate_path_cost_size() into
the grouped relation's foreignrel->rows.  But for grouped relations,
we already save the row/width estimates into fpinfo->rows and
fpinfo->width, so the change proposed above would make that change
unnecessary.

Other change is: I noticed that commit 7012b132d0 incorrectly re-sets
the width estimates for grouped relations in the
!fpinfo->use_remote_estimate mode, so I fixed that as well in the
attached.

Best regards,
Etsuro Fujita

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrey Borodin
Дата:
Сообщение: Re: pglz performance
Следующее
От: Juan José Santamaría Flecha
Дата:
Сообщение: Re: [Doc] pg_restore documentation didn't explain how to useconnection string