Re: Avoiding hash join batch explosions with extreme skew and weird stats

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Avoiding hash join batch explosions with extreme skew and weird stats
Дата
Msg-id CAAKRu_bYcBaL=+XTqLKebD8Hq+Wz=4v7jrsVEHqwbOstrEVHww@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers


On Tue, Jun 4, 2019 at 12:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jun 4, 2019 at 2:47 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Oops! You are totally right.
> I will amend the idea:
> For each chunk on the inner side, loop through both the original batch
> file and the unmatched outer tuples file created for the last chunk.
> Emit any matches and write out any unmatched tuples to a new unmatched
> outer tuples file.
>
> I think, in the worst case, if no tuples from the outer have a match,
> you end up writing out all of the outer tuples for each chunk on the
> inner side. However, using the match bit in the tuple header solution
> would require this much writing.
> Probably the bigger problem is that in this worst case you would also
> need to read double the number of outer tuples for each inner chunk.
>
> However, in the best case it seems like it would be better than the
> match bit/write everything from the outer side out solution.

I guess so, but the downside of needing to read twice as many outer
tuples for each inner chunk seems pretty large.  It would be a lot
nicer if we could find a way to store the matched-bits someplace other
than where we are storing the tuples, what Thomas called a
bits-in-order scheme, because then the amount of additional read and
write I/O would be tiny -- one bit per tuple doesn't add up very fast.

In the scheme you propose here, I think that after you read the
original outer tuples for each chunk and the unmatched outer tuples
for each chunk, you'll have to match up the unmatched tuples to the
original tuples, probably by using memcmp() or something.  Otherwise,
when a new match occurs, you won't know which tuple should now not be
emitted into the new unmatched outer tuples file that you're going to
produce.  So I think what's going to happen is that you'll read the
original batch file, then read the unmatched tuples file and use that
to set or not set a bit on each tuple in memory, then do the real work
setting more bits, then write out a new unmatched-tuples file with the
tuples that still don't have the bit set.  So your unmatched tuple
file is basically a list of tuple identifiers in the least compact
form imaginable: the tuple is identified by the entire tuple contents.
That doesn't seem very appealing, although I expect that it would
still win for some queries.


I'm not sure I understand why you would need to compare the original
tuples to the unmatched tuples file.

This is the example I used to try and reason through it.

let's say you have a batch (you are joining two single column tables)
and your outer side is:
5,7,9,11,10,11
and your inner is:
7,10,7,12,5,9
and for the inner, let's say that only two values can fit in memory,
so it is split into 3 chunks:
7,10 | 7,12 | 5,9
The first time you iterate through the outer side (joining it to the
first chunk), you emit as matched
7,7
10,10
and write to unmatched tuples file
5
9
11
11
The second time you iterate through the outer side (joining it to the
second chunk) you emit as matched
7,7
Then, you iterate again through the outer side a third time to join it
to the unmatched tuples in the unmatched tuples file (from the first
chunk) and write the following to a new unmatched tuples file:
5
9
11
11
The fourth time you iterate through the outer side (joining it to the
third chunk), you emit as matched
5,5
9,9
Then you iterate a fifth time through the outer side to join it to the
unmatched tuples in the unmatched tuples file (from the second chunk)
and write the following to a new unmatched tuples file:
11
11
Now that all chunks from the inner side have been processed, you can
loop through the final unmatched tuples file, NULL-extend, and emit
them

Wouldn't that work?

--
Melanie Plageman

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: behaviour change - default_tablesapce + partition table
Следующее
От: Melanie Plageman
Дата:
Сообщение: Re: Avoiding hash join batch explosions with extreme skew and weird stats