Re: Avoiding hash join batch explosions with extreme skew and weird stats

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Avoiding hash join batch explosions with extreme skew and weird stats
Дата
Msg-id CAAKRu_b_XVe0LGVJGoQA77pZJ6xs7xsyyQPbQ+GBA8=Ot79i7g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers


On Tue, Jun 4, 2019 at 5:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jun 3, 2019 at 5:10 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I was talking to Jeff Davis about this on Saturday, and, he felt that
> there might be a way to solve the problem differently if we thought of
> the left join case as performing an inner join and an antijoin
> instead.
>
> Riffing on this idea a bit, I started trying to write a patch that
> would basically emit a tuple if it matches and write the tuple out to
> a file if it does not match. Then, after iterating through the outer
> batch the first time for the first inner chunk, any tuples which do
> not yet have a match are the only ones which need to be joined against
> the other inner chunks. Instead of iterating through the outer side
> original batch file, use the unmatched outer tuples file to do the
> join against the next chunk. Repeat this for all chunks.

I'm not sure that I understanding this proposal correctly, but if I am
then I think it doesn't work in the case where a single outer row
matches rows in many different inner chunks.  When you "use the
unmatched outer tuples file to do the join against the next chunk,"
you deny any rows that have already matched the chance to produce
additional matches.


Oops! You are totally right.
I will amend the idea:
For each chunk on the inner side, loop through both the original batch
file and the unmatched outer tuples file created for the last chunk.
Emit any matches and write out any unmatched tuples to a new unmatched
outer tuples file.

I think, in the worst case, if no tuples from the outer have a match,
you end up writing out all of the outer tuples for each chunk on the
inner side. However, using the match bit in the tuple header solution
would require this much writing.
Probably the bigger problem is that in this worst case you would also
need to read double the number of outer tuples for each inner chunk.

However, in the best case it seems like it would be better than the
match bit/write everything from the outer side out solution.

--
Melanie Plageman

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Melanie Plageman
Дата:
Сообщение: Re: Sort support for macaddr8
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Avoiding hash join batch explosions with extreme skew and weird stats