Re: accounting for memory used for BufFile during hash joins

From: Melanie Plageman
Subject: Re: accounting for memory used for BufFile during hash joins
Date:
Msg-id: CAAKRu_ZTy=a2jqj0++jDwke1_ssJTBB0ezU2t8OwogmXDbYdYw@mail.gmail.com
In reply to: accounting for memory used for BufFile during hash joins (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: accounting for memory used for BufFile during hash joins (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers


On Fri, May 3, 2019 at 5:34 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

The second patch tries to enforce work_mem more strictly. That would be
impossible if we were to keep all the BufFile structs in memory, so
instead it slices the batches into chunks that fit into work_mem, and
then uses a single "overflow" file for slices currently not in memory.
These extra files can't be counted into work_mem, but we should need
very few of them. For example, with work_mem=4MB the slice is 128
batches, so we need 128x fewer overflow files (compared to per-batch).
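
To check my arithmetic on the slicing scheme, here is a trivial sketch;
the 32kB per-batch BufFile cost is inferred from the work_mem=4MB /
128-batch numbers above, not taken from the patch:

/*
 * Trivial sketch of the slice-size arithmetic. The 32kB per-batch
 * BufFile cost is my inference from the 4MB/128 numbers above; the
 * real per-batch footprint in the patch may differ.
 */
#include <stdio.h>

#define WORK_MEM_BYTES          (4 * 1024 * 1024)   /* work_mem = 4MB */
#define PER_BATCH_BUFFILE_BYTES (32 * 1024)         /* inferred, not from patch */

int
main(void)
{
    int     nbatch = 1024;      /* hypothetical number of batches */
    int     batches_per_slice = WORK_MEM_BYTES / PER_BATCH_BUFFILE_BYTES;
    int     nslices = (nbatch + batches_per_slice - 1) / batches_per_slice;

    printf("batches per slice: %d\n", batches_per_slice);   /* 128 */
    printf("slices for %d batches: %d\n", nbatch, nslices); /* 8 */
    return 0;
}

So, if I have this right, 1024 batches at work_mem=4MB would need on the
order of 8 slices' worth of overflow files rather than ~1024 per-batch
files, which matches the 128x figure.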

I want to see if I understand the implications of the per-slice-overflow patch
for hash join execution:
For each bucket in the hashtable, when attempting to double the number of
batches, if the memory the BufFile structs would occupy after doubling would
exceed work_mem, split each batch into slices that fit into memory. This means
that, for each probe-side tuple hashing to that bucket, you have to load every
slice of each batch into memory separately to ensure correct results.
Is this right?
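
Concretely, the control flow I am imagining is something like this toy
model (all of the names here are mine, not the patch's):

/*
 * Toy model of the probe-side control flow I am describing; none of
 * these names come from the patch. The point is just that, if only one
 * slice's batch contents can be in memory at a time, the probe side of
 * a sliced batch has to be replayed once per slice.
 */
#include <stdio.h>

#define NSLICES         3   /* hypothetical: batch split into 3 slices */
#define NPROBE_TUPLES   4   /* hypothetical probe tuples for this batch */

static void
load_inner_slice(int slice)
{
    printf("load inner slice %d into the hash table\n", slice);
}

static void
probe_tuple(int tuple, int slice)
{
    printf("  probe tuple %d against slice %d\n", tuple, slice);
}

int
main(void)
{
    /* One pass over the probe side per slice of the inner batch. */
    for (int slice = 0; slice < NSLICES; slice++)
    {
        load_inner_slice(slice);
        for (int tuple = 0; tuple < NPROBE_TUPLES; tuple++)
            probe_tuple(tuple, slice);
    }
    return 0;
}

It's those repeated passes over the probe side that I want to make sure
I'm understanding correctly.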

I'm not entirely sure which of those approaches is the right one. The
first one is clearly just damage control for cases where the hash
side turned out to be much larger than we expected. With good estimates
we probably would not have picked a hash join for those (that is, we
should have realized we can't stay within work_mem and prohibited hash join).

The second patch, however, makes hash join viable for some of those cases,
and it seems to work pretty well (there are some numbers in the message
posted to the pgsql-performance thread). So I kinda like this second one.

So, my initial reaction after taking a look at the patches is that I prefer the
first approach--increasing the resize threshold. The second patch, the
per-slice-overflow patch, adds a major new mechanism to hash join in order to
address what is, based on my understanding, an edge case.
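
To make concrete what I mean by preferring the first approach, here is a
rough sketch, not the patch's actual code, of the kind of guard I have
in mind: charge the per-batch BufFile descriptors against work_mem and
stop doubling nbatch once the descriptors would crowd out the hash table
itself (the 8kB per-BufFile figure and all names are assumptions of mine):

/*
 * Rough sketch, not the patch's code, of the accounting I have in
 * mind: charge the per-batch BufFile descriptors against work_mem and
 * refuse to double nbatch once the doubled descriptor arrays would
 * crowd out the hash table itself. The 8kB per-BufFile figure and all
 * names here are assumptions of mine.
 */
#include <stdbool.h>
#include <stdio.h>

#define WORK_MEM_BYTES  (4 * 1024 * 1024)   /* work_mem = 4MB */
#define BUFFILE_BYTES   (8 * 1024)          /* assumed per-BufFile footprint */

/* Two BufFiles per batch: one inner-side, one outer-side. */
static size_t
buffile_overhead(int nbatch)
{
    return (size_t) nbatch * 2 * BUFFILE_BYTES;
}

static bool
worth_doubling_nbatch(int nbatch, size_t hashtable_bytes)
{
    return hashtable_bytes + buffile_overhead(nbatch * 2) < WORK_MEM_BYTES;
}

int
main(void)
{
    size_t  hashtable_bytes = 2 * 1024 * 1024;  /* hypothetical table size */

    for (int nbatch = 1; nbatch <= 1024; nbatch *= 2)
        printf("nbatch = %4d -> double? %s\n", nbatch,
               worth_doubling_nbatch(nbatch, hashtable_bytes) ? "yes" : "no");
    return 0;
}

The appeal, to me, is that this stays within the existing batching
mechanism rather than introducing slices.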

--
Melanie Plageman
