MergeJoin beats HashJoin in the case of multiple hash clauses

From	Andrey Lepikhov
Subject	MergeJoin beats HashJoin in the case of multiple hash clauses
Date	15 June 2023, 08:30:10
Msg-id	52257607-57f6-850d-399a-ec33a654457b@postgrespro.ru
Responses	Re: MergeJoin beats HashJoin in the case of multiple hash clauses Re: MergeJoin beats HashJoin in the case of multiple hash clauses
List	pgsql-hackers

Hi, all.

Some of my clients use JOINs with three or four clauses. Quite
frequently, I see complaints about an unreasonable switch of the join
algorithm to Merge Join instead of Hash Join. A quick investigation has
shown one weak spot: the estimation of the average bucket size in
final_cost_hashjoin (see q2.sql in the attachment), which follows a very
conservative strategy.
Unlike the estimation of groups, here we rely on the estimate from a
single clause (the smallest bucket size across all hash clauses) instead
of multiplying the ndistinct values (or attempting multivariate
analysis).
This works fine for the case of one clause. But if there are many
clauses, and each has a high ndistinct value, we overestimate the
average size of a bucket and, as a result, prefer Merge Join. As the
example in the attachment shows, this leads to a worse plan than
possible, sometimes drastically worse.
I assume this is done out of fear of functional dependencies between the
hash clause components. But in my opinion, we should take the same
approach here as in the estimation of groups.
The attached patch shows a sketch of the solution.
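
To make the difference concrete, here is a minimal Python sketch of the two
strategies. It is not the attached patch and not the planner code: the
function names, the uniform-distribution model, and the numbers (one million
inner rows, three clauses with ndistinct = 100 each) are all assumptions
chosen for illustration, and MCV skew and NULLs are ignored.

```python
from math import prod


def bucket_fraction(ndistinct, nbuckets):
    """Fraction of inner rows expected in an average bucket for one key.

    Simplified assumption: values are uniformly distributed, so the
    fraction is 1/ndistinct, but it cannot drop below 1/nbuckets.
    """
    return 1.0 / min(ndistinct, nbuckets)


def conservative_fraction(ndistincts, nbuckets):
    # One estimate per hash clause; only the smallest one is kept,
    # so the other clauses contribute nothing to the result.
    return min(bucket_fraction(nd, nbuckets) for nd in ndistincts)


def multiplied_fraction(ndistincts, inner_rows, nbuckets):
    # Assumes the columns are independent: ndistinct values are
    # multiplied, capped by the number of inner rows.
    combined = min(prod(ndistincts), inner_rows)
    return bucket_fraction(combined, nbuckets)


# Hypothetical inputs, not taken from q2.sql.
inner_rows = 1_000_000
nbuckets = 2 ** 20
ndistincts = [100, 100, 100]

rows_conservative = inner_rows * conservative_fraction(ndistincts, nbuckets)
rows_multiplied = inner_rows * multiplied_fraction(
    ndistincts, inner_rows, nbuckets
)
print(rows_conservative, rows_multiplied)
```

Under these assumptions the single-clause estimate predicts roughly 10,000
rows per bucket, while the multiplied estimate predicts about one, which is
the kind of gap that can push the planner away from Hash Join.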

-- 
regards,
Andrey Lepikhov
Postgres Professional
