[HACKERS] postgres_fdw cost estimation defaults and documentation

From: Jeff Janes
Subject: [HACKERS] postgres_fdw cost estimation defaults and documentation
Date:
Msg-id: CAMkU=1yYPsTxrne8=NghrGeyg4WWbXUce_Z-y=9vGW5G3na=2g@mail.gmail.com
Replies: Re: [HACKERS] postgres_fdw cost estimation defaults and documentation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] postgres_fdw cost estimation defaults and documentation  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

The default value for fdw_tuple_cost is 0.01, which seems way too low.  If I set up a loop-back foreign server with a large fetch_size, then tests like:

select * from pgbench_accounts except select * from loopback.pgbench_accounts 

vs

select * from pgbench_accounts except select * from pgbench_accounts

indicate that 0.1 is about the lowest value for fdw_tuple_cost that could make sense, and a reasonable default would probably be 0.25.  Yes, it is only a default, but the default should make sense for at least some situation, and I can't imagine any situation in which 0.01 makes sense.
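For reference, a loopback setup along these lines can be sketched as follows; the server, schema, and database names are illustrative, not taken from the original test, and the fetch_size value is just an example of "large":

```sql
-- Hypothetical loopback configuration for comparing local vs. foreign scans.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', dbname 'pgbench', fetch_size '10000');

CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

-- Expose the remote pgbench_accounts under a "loopback" schema,
-- so it can be queried as loopback.pgbench_accounts.
CREATE SCHEMA loopback;
IMPORT FOREIGN SCHEMA public LIMIT TO (pgbench_accounts)
    FROM SERVER loopback INTO loopback;

-- Try the proposed default instead of the shipped 0.01:
ALTER SERVER loopback OPTIONS (fdw_tuple_cost '0.25');
```

With that in place, comparing EXPLAIN ANALYZE timings of the two EXCEPT queries above gives a rough handle on the real per-tuple transfer overhead.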

In the documentation for fdw_startup_cost, it says "This represents the additional overhead of establishing a connection, parsing and planning the query on the remote side, etc.".  I think that "establishing a connection" should be stricken. Either you need a connection or you don't; there is nothing the optimizer can do about this.  And if you do need one, you establish it only once (at most), not once per query sent to the remote side.  I think the implementation correctly doesn't try to account for the overhead of establishing a connection, so the docs should remove that claim.
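To see what the option actually models, one can bump it and watch the planner's startup cost for a foreign scan change; this is a sketch against the hypothetical loopback server and table names used above:

```sql
-- fdw_startup_cost is charged once per foreign-table scan in a plan,
-- regardless of whether a connection already exists for this session.
ALTER SERVER loopback OPTIONS (fdw_startup_cost '1000');

-- The estimated startup cost of the Foreign Scan node should now
-- reflect the inflated per-query (not per-connection) overhead.
EXPLAIN SELECT * FROM loopback.pgbench_accounts;
```

Since the cost is applied to every query that scans the table, describing it as connection-establishment overhead overstates what the planner can account for.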

In regards to use_remote_estimate, the documentation says "Running ANALYZE on the foreign table is the way to update the local statistics; this will perform a scan of the remote table and then calculate and store statistics just as though the table were local. Keeping local statistics can be a useful way to reduce per-query planning overhead for a remote table — but if the remote table is frequently updated, the local statistics will soon be obsolete."  This makes it sound like local stats are basically equivalent to use_remote_estimate, other than the staleness issue.  But they are far from equivalent.  use_remote_estimate has implicit knowledge of the indexes on the foreign server (implicit via the reduced cost estimates derived from the foreign side for parameterized queries), whereas local stats of foreign tables just assume there are no indexes for planning purposes. Perhaps add something like "Also, local statistics do not contain information on the available indexes on the remote side, while use_remote_estimate does take these into account"?
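The two approaches can be contrasted concretely; again the table and server names are the hypothetical ones from the loopback sketch:

```sql
-- Option A: ask the remote server for estimates at plan time.
-- postgres_fdw issues EXPLAIN on the remote side, so remote index
-- costs show up (indirectly) in the estimates for parameterized paths.
ALTER FOREIGN TABLE loopback.pgbench_accounts
    OPTIONS (ADD use_remote_estimate 'true');

-- Option B: keep local statistics instead.  This scans the remote
-- table and stores stats locally, but records nothing about which
-- indexes exist on the remote side.
ANALYZE loopback.pgbench_accounts;
```

With option B, a nested-loop join parameterized on the remote table's indexed key can look far more expensive than it really is, because the planner has no way to know the remote scan would use an index.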

Cheers,

Jeff

In the pgsql-hackers list, by date sent:

Previous
From: Shubham Barai
Message: [HACKERS] GSoC 2017 weekly progress reports ("Explicitly support predicate locks in index access methods besides b-tree")
Next
From: Tom Lane
Message: Re: [HACKERS] shm_toc_lookup API