Re: Foreign join search stops on the first try

Поиск
Список
Период
Сортировка
От Alexander Pyhalov
Тема Re: Foreign join search stops on the first try
Дата
Msg-id ad549fc7fcb10998cca7485d61c75bdb@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Foreign join search stops on the first try  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers
Ashutosh Bapat писал 2022-01-25 17:08:
> This code was written long ago. So I may have some recollection
> errors. But AFAIR, the reasons we wanted to avoid repeated
> estimation/planning for the same foreign join rel were
> 1. If use_remote_estimate = true, we fetch EXPLAIN output from the
> foreign server for various pathkeys. Fetching EXPLAIN output is
> expensive. Irrespective of the join order being considered locally, we
> expect the foreign server to give us the same cost since the join is
> the same. So we avoid running EXPLAIN again and again.
> 2. If use_remote_estimate = false, the logic to estimate a foreign
> join locally is independent of the join order so should yield same
> cost again and again. For some reason that doesn't seem to be the case
> here.

Hi.
use_remote_estimate was set to false in our case, and yes, it fixed this 
issue.
The problem is that if use_remote_estimate = false, the logic to 
estimate a foreign join locally
is not independent from the join order.

In above example, without patch we see plan with cost: 
cost=382.31..966.86 rows=2 width=37

If we avoid exiting on (joinrel->fdw_private), we can see in gdb the 
following cases, when joining all 3 relations:

case 1:
outerrel:relids (stock, order_line), startup_cost = 100, total_cost = 
2415.9200000000001, rel_startup_cost = 0, rel_total_cost = 2315.5, 
retrieved_rows = 21
innerrel: relid (district) startup_cost = 100, total_cost = 
101.14500000000001, rel_startup_cost = 0, rel_total_cost = 1.125, 
retrieved_rows = 1
joinrel: startup_cost = 100, total_cost = 2416.875, retrieved_rows = 2

case 2:
outerrel: relids (district, order_line), startup_cost = 100, total_cost 
= 281.41999999999996, rel_total_cost = 180, retrieved_rows = 71
innerrel: relid (stock), startup_cost = 100, total_cost = 
683.28500000000008, rel_startup_cost = 0, rel_total_cost = 576.625, 
retrieved_rows = 333
joinrel:  startup_cost = 100, total_cost = 974.88, retrieved_rows = 2


So, (stock join order_line) join district has different cost from 
(district join order_line) join stock.

> 
> On Tue, Jan 25, 2022 at 1:26 PM Alexander Pyhalov
> <a.pyhalov@postgrespro.ru> wrote:
> 
> 
> It is surprising that the planning time halves with the patch. I
> expected it to increase slightly since we will compute estimates
> thrice instead of once.

I wouldn't look at estimate times here precisely (and would looked at 
costs). Real example where we found it had 100 times more data, but 
effect was the same. Here some differences in planing time could be 
related to restarting instances with or without patches.

> 
> What is use_remote_estimate? Is it ON/OFF?
> 

Yes, it was off.

> If we want to proceed along this line, we should take care not to fire
> more EXPLAIN queries on the foreign server.

You are correct. Fixed patch to avoid extensive join search when 
use_remote_estimate is true.

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Fix BUG #17335: Duplicate result rows in Gather node
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: CREATEROLE and role ownership hierarchies