FDW join vs full join push down

Поиск

Список

Период

Сортировка

От	Marc Olivé
Тема	FDW join vs full join push down
Дата	7 сентября 2021 г. 17:02:03
Msg-id	CAB7_X5xDKQe0ZwHw4cQ=2aFkjs7R0qHoyvvZigK4E_QBsQffpg@mail.gmail.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

Hello all,

We have some PG servers which we merge into a "coordinator" node using FDW and partitioned tables, we partition them by a synthetic "shard_id" field.

There are around 30 tables coordinated this way, with all foreign servers having the same schema structure.

We have some performance issues when joining foreign tables, always done by the same "shard_id", where the major bottleneck is how rows from joined tables are fetched. explain(verbose) shows:

Remote SQL: SELECT entity_id, execution_id, shard_id FROM entity_execution WHERE ((shard_id = 5)) AND (($1::bigint = entity_id))

This way, PG is doing a lot of round trips between the coordinator and the foreign nodes, fetching a single row every time, and we have a very high latency between the coordinator and the nodes.

As the joins are done on the same node, it could send the whole query and fetch all results in a single round trip.

The FDW are configured with 'use_remote_estimate' to true and we have the parameters enable_partition_pruning, enable_partitionwise_aggregate and enable_partitionwise_join activated.

The tables involved can have from a million rows to more than 1000 millions, but the queries usually return a few thousand rows.

A full sample plan and it's query: https://explain.depesz.com/s/TbJy

explain(verbose)

select *
from nlp.note_entity_label nel
join nlp.note_entity ne on ne.note_entity_id = nel.note_entity_id and ne.shard_id = nel.shard_id
join nlp.note_entity_execution nex on nex.note_entity_id = ne.note_entity_id and nex.shard_id = nel.shard_id
where
nel.label_id = 192
and nel.shard_id = 5

The row estimates are quite off the true ones, even though we have run 'analyze' on the remote nodes before, and 'use_remote_estimate' is on.

The above query ends in about 6 minutes.

The interesting part is that if we change the 'join' by 'full joins', with some extra filter, the plan is the one we believe is the optimal one, and indeed the query ends in 1 second: https://explain.depesz.com/s/b3As

explain(verbose)

with ents as(
select nel.note_entity_id nelid, ne.note_entity_id neid, nex.note_entity_id nexid, *
  from nlp.note_entity_label nel
  full join nlp.note_entity ne on ne.note_entity_id = nel.note_entity_id and ne.shard_id = nel.shard_id
  full join nlp.note_entity_execution nex on nex.note_entity_id = ne.note_entity_id and nex.shard_id = nel.shard_id
  where
      nel.label_id = 192
      and nel.shard_id = 5
)
select *
from ents
where nelid is not null
  and neid is not null
  and nexid is not null
;

Here we can see that the whole query is sent to the fdw and it finishes in a reasonable time.

So, the question is if we can do something to make the fdw send the whole query to the remote nodes when the involved joins use the same partition, or why isn't PG sending it when we use 'inner join'.

We have tried tweaking the "fdw_tuple_cost" , increasing and lowering it to unreasonable values

10, 1000, 100000 and 1000000 without the desired result.

Thanks,

В списке pgsql-performance по дате отправления:

Предыдущее

От: Andrew Dunstan
Дата: 02 сентября 2021 г., 19:59:51
Сообщение: Re: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

Следующее

От: Stepan Yankevych
Дата: 07 сентября 2021 г., 21:05:42
Сообщение: Foreign table as partition - Non optimal aggregation plan

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

FDW join vs full join push down

Предыдущее

Следующее