Re: postgres_fdw: oddity in costing presorted foreign scans withlocal stats

Поиск
Список
Период
Сортировка
От Etsuro Fujita
Тема Re: postgres_fdw: oddity in costing presorted foreign scans withlocal stats
Дата
Msg-id CAPmGK14KydtC3z_kRNONv-K5yJ3wDbKxM2BWCacYL-xhgGn2Ag@mail.gmail.com
обсуждение исходный текст
Ответ на postgres_fdw: oddity in costing presorted foreign scans with local stats  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Ответы Re: postgres_fdw: oddity in costing presorted foreign scans withlocal stats  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Список pgsql-hackers
On Fri, May 17, 2019 at 8:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> I noticed that there is (another) oddity in commit aa09cd242f: in the
> !fpinfo->use_remote_estimate mode, when first called for costing an
> unsorted foreign scan, estimate_path_cost_size() computes
> retrieved_rows, which is the estimated number of rows fetched from the
> remote server, by these:
>
>     retrieved_rows = clamp_row_est(rows / fpinfo->local_conds_sel)
>     retrieved_rows = Min(retrieved_rows, foreignrel->tuples);
>
> where rows is the estimated number of output rows emitted from the
> foreign scan, fpinfo->local_conds_sel is the selectivity of local
> conditions, and foreignrel->tuples is the foreign table's reltuples.
> This is good, BUT when next called for costing the presorted foreign
> scan, that function re-computes retrieved_rows by the former, but
> doesn't clamp it by the latter, which would produce wrong results.
> Here is such an example:
>
> create table t (a int, b int);
> create foreign table ft (a int, b int) server loopback options
> (table_name 't');insert into ft values (1, 10);
> insert into ft values (2, 20);
> analyze ft;
> create function postgres_fdw_abs(int) returns int as $$ begin return
> abs($1); end $$ language plpgsql immutable;
> explain verbose select * from ft where postgres_fdw_abs(b) > 10 order by a;
>                             QUERY PLAN
> -------------------------------------------------------------------
>  Foreign Scan on public.ft  (cost=100.00..101.89 rows=1 width=8)
>    Output: a, b
>    Filter: (postgres_fdw_abs(ft.b) > 10)
>    Remote SQL: SELECT a, b FROM public.t ORDER BY a ASC NULLS LAST
> (4 rows)
>
> For this query, we have rows=1 and foreignrel->tuples=2.
> postgres_fdw_abs(b) > 10 is a local condition, for which we have
> fpinfo->local_conds_sel=0.333333 (I got this by printf debugging).  So
> when first called for costing an unsorted foreign scan, by the former
> equation retrieved_rows=3, then by the latter retrieved_rows=2, which
> is correct.  BUT when next called for costing the presorted foreign
> scan, we have retrieved_rows=3, as that function doesn't clamp the
> retrieved_rows.  This is wrong, leading to incorrect cost estimates.
> (This is an issue for the foreign-scan case, but I think we would have
> the same issue for the foreign-join case.)
>
> To fix, I propose to handle retrieved_rows in the same way as cached
> costs; 1) cache retrieved_rows computed in the first call of
> estimate_path_cost_size() into the foreign table's fpinfo, and 2) use
> it after the first call.  Also, I'd like to propose to put this code
> in that function for !use_remote_estimate mode in each of the below
> code for the cases of foreign scan, foreign join, and foreign grouping
> as needed, and use the rows/width estimates stored in the fpinfo (ie,
> fpinfo->rows and fpinfo->width) after the first call, like the
> attached.
>
>         /*
>          * Use rows/width estimates made by set_baserel_size_estimates() for
>          * base foreign relations and set_joinrel_size_estimates() for join
>          * between foreign relations.
>          */
>         rows = foreignrel->rows;
>         width = foreignrel->reltarget->width;
>
> I think that that would make the code more consistent and easier to
> understand.  Also, there is another two reasons: a) this code seems
> confusing to me for the foreign-grouping case, as the core code
> doesn't set foreignrel->rows at all for grouped relations.  The change
> proposed above would avoid that confusion.  And b) we can remove a
> change made by commit ffab494a4d, which added support for sorting
> grouped relations remotely in postgres_fdw.  In that commit, to extend
> the logic for re-using cached costs to the foreign-grouping case, I
> modified add_foreign_grouping_paths() so that it saves the rows
> estimate for a grouped relation made by estimate_path_cost_size() into
> the grouped relation's foreignrel->rows.  But for grouped relations,
> we already save the row/width estimates into fpinfo->rows and
> fpinfo->width, so the change proposed above would make that change
> unnecessary.
>
> Other change is: I noticed that commit 7012b132d0 incorrectly re-sets
> the width estimates for grouped relations in the
> !fpinfo->use_remote_estimate mode, so I fixed that as well in the
> attached.

I made stricter an assertion test I added on retrieved_rows.  Also, I
did some editorialization further and added the commit message.
Attached is an updated version of the patch.  If there are no
objections, I'll commit the patch.

Best regards,
Etsuro Fujita

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Gilles Darold
Дата:
Сообщение: Re: idea: log_statement_sample_rate - bottom limit for sampling
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Why does pg_checksums -r not have a long option?