Обсуждение: Possible regression regarding estimating relation width in FDWs

Поиск
Список
Период
Сортировка

Possible regression regarding estimating relation width in FDWs

От
Ronan Dunklau
Дата:
Hello,

While working on adapting the Multicorn FDW for 9.6, I noticed that there is a  
regression with regards to estimating the remote relation width.

This behavior can be exposed using the postgres_fdw, using 
"use_remote_estimate".

Test case:

CREATE EXTENSION postgres_fdw;
CREATE SERVER localhost FOREIGN DATA WRAPPER postgres_fdw;
CREATE USER MAPPING FOR CURRENT_USER SERVER localhost;
CREATE TABLE local_table (c1 text);
INSERT INTO local_table (c1) (SELECT repeat('test', 10000));
ANALYZE local_table;
CREATE FOREIGN TABLE foreign_table (c1 text) SERVER localhost OPTIONS 
(table_name 'local_table', use_remote_estimate 'true');
EXPLAIN SELECT * FROM foreign_table;

Output, on current HEAD:
                             QUERY PLAN                              
----------------------------------------------------------------------Foreign Scan on foreign_table
(cost=100.00..101.03rows=1 width=32)
 


On 9.5:                             QUERY PLAN                               
-----------------------------------------------------------------------Foreign Scan on foreign_table
(cost=100.00..101.03rows=1 width=472)
 


While the FDW correctly sets the pathtarget width, it is then overriden at a 
later point. I'm not sure what happens exactly, but it seems that the relation 
path target is ignored completely, in planner.c:1695:
    /*     * Convert the query's result tlist into PathTarget format.     *     * Note: it's desirable to not do this
tillafter query_planner(),     * because the target width estimates can use per-Var width numbers     * that were
obtainedwithin query_planner().     */    final_target = create_pathtarget(root, tlist);
 

It says explicitly that it will be computed using per-Var width numbers.

I think the current_rel->cheapest_total_path->pathtarget should be taken into 
account, at least in the FDW case.

I'm not sure if the ability to estimate the whole relation width should be 
deprecated in favor of per-var width, or if it still should be supported 
(after all, the GetForeignRelSize callback is called AFTER having set a value 
computed from the individual attr_widths, in set_foreign_size). But in any 
case, at least postgres_fdw should be updated to support that.

Sorry if that was not clear, I'm at PGCon at the moment so if anyone want to 
discuss that in person I'm available.


-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org



Re: Possible regression regarding estimating relation width in FDWs

От
Tom Lane
Дата:
Ronan Dunklau <ronan.dunklau@dalibo.com> writes:
> While working on adapting the Multicorn FDW for 9.6, I noticed that there is a  
> regression with regards to estimating the remote relation width.

Hm.  In the old code it was basically a chance artifact that this example
happens to produce the "right" width estimate.  There are plenty of other
cases where the per-column width estimates were already the relevant ones,
for example if you join the foreign table to another one.  postgres_fdw
is falling down on the job by not making the per-column width estimates
consistent with the overall relation width estimate.

We could imagine extending use_remote_estimate mode to collect per-column
width estimates from the remote end, but that would add quite a lot of
cost.  It's not really necessary either, IMO, because you can instead
ANALYZE the foreign table to cause column width estimates to be computed
and stored locally.

If you do that in this example, you find another interesting thing about
HEAD's behavior:

regression=# EXPLAIN SELECT * FROM foreign_table;                             QUERY PLAN                              
----------------------------------------------------------------------Foreign Scan on foreign_table
(cost=100.00..101.03rows=1 width=32)
 
(1 row)

regression=# ANALYZE foreign_table;
ANALYZE
regression=# EXPLAIN SELECT * FROM foreign_table;                              QUERY PLAN

 
-------------------------------------------------------------------------Foreign Scan on foreign_table
(cost=100.00..101.03rows=1 width=40004)
 
(1 row)

The width estimate is now based on the decompressed/detoasted column
width, which is really the right thing here because that is the
representation we'll be working with for any local operations ---
estimating the size of a hash table using the remote's toasted column
width, for example, is just wrong.  So I don't see any basis for arguing
that 472 is the "right" width to use for the foreign table.

In both HEAD and 9.5, join cases (post-ANALYZE) look pretty wacko:

regression=# EXPLAIN SELECT * FROM foreign_table cross join int8_tbl;                                QUERY PLAN
                        
 
-----------------------------------------------------------------------------Nested Loop  (cost=100.00..102.13 rows=5
width=40020) ->  Foreign Scan on foreign_table  (cost=100.00..101.03 rows=1 width=472)  ->  Seq Scan on int8_tbl
(cost=0.00..1.05rows=5 width=16)
 
(3 rows)

The top-level estimate here is actually right, IMV, but the width estimate
for the ForeignScan is not.

In view of this, I'm a bit tempted to double down on the ANALYZE
dependency by having postgres_fdw not pay attention to the remote's width
estimates at all, but just use whatever column width data is cached
locally, and sum those numbers to get a relation width.  That would be
more reliable if we've done an ANALYZE recently, and I'm not convinced
it'd be worse if we have not.

Anyway, the bottom line is that the behavior of 9.5 and before is not
so great in this area that I feel a need to be bug-compatible with it.
        regards, tom lane