Re: Wrong cost estimation for foreign tables join with use_remote_estimatedisabled

Поиск
Список
Период
Сортировка
От Etsuro Fujita
Тема Re: Wrong cost estimation for foreign tables join with use_remote_estimatedisabled
Дата
Msg-id 5B2CBE49.8070006@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Wrong cost estimation for foreign tables join withuse_remote_estimate disabled  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Ответы Re: Wrong cost estimation for foreign tables join withuse_remote_estimate disabled  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Список pgsql-hackers
Hi Konstantin,

(2018/06/22 15:26), Konstantin Knizhnik wrote:
> On 21.06.2018 20:08, Tom Lane wrote:
>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>> The following very simple test reduce the problem with wrong cost
>>> estimation:
>>> create foreign table t1_fdw(x integer, y integer) server pg_fdw options
>>> (table_name 't1', use_remote_estimate 'false');
>>> create foreign table t2_fdw(x integer) server pg_fdw options (table_name
>>> 't2', use_remote_estimate 'false');
>>> It is possible to force Postgres to use correct plan by setting
>>> "fdw_startup_cost" to some very large value (100000000 for example).
>>> ...
>>> Also correct plan is used when use_remote_estimate is true.
>> If you are unhappy about the results with use_remote_estimate off, don't
>> run it that way. The optimizer does not have a crystal ball.
>
> As I wrote, use_remote_estimate can not be used because in this case
> query compilation time is unacceptable (10 seconds, while time of query
> execution itself is ~200msec).
> So the problem can be addressed in two ways:
>
> 1. Try to reduce time of remote estimation. I wonder why postgres_fdw
> sends so much queries to remote server. For join of two tables there are
> 7 queries.
> I suspect that for ~20 joined tables in the original query number of
> calls is more than hundred,  so on wonder that it takes so much time.
> 2. Try to make optimizer make better estimation of join cost based on
> local statistic (please notice that ANALYZE is explicitly called for all
> foreign tables and number of rows in the result was correctly calculated).

To make local estimates more accurate, I think we need other information 
on remote tables such as remote indexes.

> What do you think: which of this two direction is more perspective? Or
> it is better to address both of them?

I'd vote for #2.  One idea for that is to introduce CREATE FOREIGN INDEX 
to have information on remote indexes on the local side, which I 
proposed before.  I have been putting it on hold since then, though.

Best regards,
Etsuro Fujita


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: server crashed with TRAP: FailedAssertion("!(!parallel_aware || pathnode->path.parallel_safe)"
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: add default parallel query to v10 release notes? (Re: [PERFORM]performance drop after upgrade (9.6 > 10))