Re: Support run-time partition pruning for hash join

Поиск

Список

Период

Сортировка

От	Andy Fan
Тема	Re: Support run-time partition pruning for hash join
Дата	21 августа 2023 г. 15:34:24
Msg-id	CAKU4AWpGo7wXEQqvsv13s=Cto3k9QbucW5DJtjwD1a0dF9iTgA@mail.gmail.com обсуждение исходный текст
Ответ на	Support run-time partition pruning for hash join (Richard Guo <guofenglinux@gmail.com>)
Ответы	Re: Support run-time partition pruning for hash join Re: Support run-time partition pruning for hash join
Список	pgsql-hackers

Дерево обсуждения

On Mon, Aug 21, 2023 at 11:48 AM Richard Guo <guofenglinux@gmail.com> wrote:

If we have a hash join with an Append node on the outer side, something
like

Hash Join
Hash Cond: (pt.a = t.a)
-> Append
-> Seq Scan on pt_p1 pt_1
-> Seq Scan on pt_p2 pt_2
-> Seq Scan on pt_p3 pt_3
-> Hash
-> Seq Scan on t

We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join. To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition. When we have built the Hash table and start to execute the
Append node, we should have known which subnodes are survived and thus
can skip other subnodes.

This feature looks good, but is it possible to know if we can prune

any subnodes before we pay the extra effort (building the Hash

table, for each row... stuff)? IIUC, looks no. If so, I think this area

needs more attention. I can't provide any good suggestions yet.

Maybe at least, if we have found no subnodes can be skipped

during the hashing, we can stop doing such work anymore.

There are several points that need more consideration.

1. All the join partition prunning decisions are made in createplan.c
where the best path tree has been decided. This is not great. Maybe
it's better to make it happen when we build up the path tree, so that
we can take the partition prunning into consideration when estimating
the costs.

fwiw, the current master totally ignores the cost reduction for run-time

partition prune, even for init partition prune. So in some real cases,

pg chooses a hash join just because the cost of nest loop join is

highly over estimated.

4. Is it possible and worthwhile to extend the join partition prunning
mechanism to support nestloop and mergejoin also?

In my current knowledge, we have to build the inner table first for this

optimization? so hash join and sort merge should be OK, but nestloop should

be impossible unless I missed something.

Best Regards

Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Christoph Berg
Дата: 21 августа 2023 г., 15:23:27
Сообщение: Reproducibility (Re: Remove distprep)

Следующее

От: "Hayato Kuroda (Fujitsu)"
Дата: 21 августа 2023 г., 16:02:30
Сообщение: RE: [PoC] pg_upgrade: allow to upgrade publisher node

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Support run-time partition pruning for hash join

Предыдущее

Следующее