Обсуждение: Short-circuit sort_inner_and_outer if there are no mergejoin clauses

Поиск
Список
Период
Сортировка

Short-circuit sort_inner_and_outer if there are no mergejoin clauses

От
Richard Guo
Дата:
In sort_inner_and_outer, we create mergejoin join paths by explicitly
sorting both relations on each possible ordering of the available
mergejoin clauses.  However, if there are no available mergejoin
clauses, we can skip this process entirely.  It seems that this is a
relatively common scenario.  Checking the regression tests I noticed
that there are a lot of cases where we would arrive here with an empty
mergeclause_list.

So I'm wondering if it's worth considering a shortcut in
sort_inner_and_outer by checking if mergeclause_list is empty.  This can
help skip all the statements preceding select_outer_pathkeys_for_merge.
In particular this may help avoid building UniquePath paths in the case
of JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER.

I asked this because in the "Right Semi Join" patch [1] I wanted to
exclude mergejoin from being considered for JOIN_RIGHT_SEMI.  So I set
mergeclause_list to NIL, but noticed that it still runs the statements
in sort_inner_and_outer until no available outer pathkeys are found in
select_outer_pathkeys_for_merge.

Attached is a trivial patch for this.  Thoughts?

Вложения

Re: Short-circuit sort_inner_and_outer if there are no mergejoin clauses

От
Richard Guo
Дата:

On Wed, Apr 24, 2024 at 5:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
In sort_inner_and_outer, we create mergejoin join paths by explicitly
sorting both relations on each possible ordering of the available
mergejoin clauses.  However, if there are no available mergejoin
clauses, we can skip this process entirely.  It seems that this is a
relatively common scenario.  Checking the regression tests I noticed
that there are a lot of cases where we would arrive here with an empty
mergeclause_list.

FWIW, during the run of the core regression tests, I found that we enter
sort_inner_and_outer with an empty mergeclause_list a total of 11064
times.  Out of these occurrences, there are 293 instances where the join
type is JOIN_UNIQUE_OUTER, indicating the need to create a UniquePath
for the outer path.  Similarly, there are also 293 instances where the
join type is JOIN_UNIQUE_INNER, indicating the need to create a
UniquePath for the inner path.

Thanks
Richard

Re: Short-circuit sort_inner_and_outer if there are no mergejoin clauses

От
Ashutosh Bapat
Дата:


On Thu, Apr 25, 2024 at 12:50 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Apr 24, 2024 at 5:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
In sort_inner_and_outer, we create mergejoin join paths by explicitly
sorting both relations on each possible ordering of the available
mergejoin clauses.  However, if there are no available mergejoin
clauses, we can skip this process entirely.  It seems that this is a
relatively common scenario.  Checking the regression tests I noticed
that there are a lot of cases where we would arrive here with an empty
mergeclause_list.

FWIW, during the run of the core regression tests, I found that we enter
sort_inner_and_outer with an empty mergeclause_list a total of 11064
times.  Out of these occurrences, there are 293 instances where the join
type is JOIN_UNIQUE_OUTER, indicating the need to create a UniquePath
for the outer path.  Similarly, there are also 293 instances where the
join type is JOIN_UNIQUE_INNER, indicating the need to create a
UniquePath for the inner path.

Quickly looking at the function, the patch would make it more apparent that the function is a noop when mergeclause_list is empty. I haven't looked closely to see if creating unique path nonetheless is useful somewhere else. Please add to the next commitfest. If the patch shows some measurable performance improvement, it would become more attractive.

--
Best Wishes,
Ashutosh Bapat

Re: Short-circuit sort_inner_and_outer if there are no mergejoin clauses

От
Richard Guo
Дата:

On Thu, Apr 25, 2024 at 7:25 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Quickly looking at the function, the patch would make it more apparent that the function is a noop when mergeclause_list is empty.

Thanks for looking at this patch.  Yes, that's what it does.
 
I haven't looked closely to see if creating unique path nonetheless is useful somewhere else.

It seems that one of the side effects of create_unique_path is that it
caches the generated unique path so that we can avoid creating it
repeatedly for the same rel.  But this does not seem to justify calling
create_unique_path when we know it is unnecessary. 

Please add to the next commitfest.

Done.
 
If the patch shows some measurable performance improvement, it would become more attractive.

I doubt that there is measurable performance improvement.  But I found
that throughout the run of the regression tests, sort_inner_and_outer is
called a total of 44,424 times.  Among these calls, there are 11,064
instances where mergeclause_list is found to be empty.  This accounts
for ~1/4.  I think maybe this suggests that it's worth the shortcut as
the patch does.

Thanks
Richard