Re: Memory-Bounded Hash Aggregation

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Memory-Bounded Hash Aggregation
Дата
Msg-id 97c46a59c27f3c38e486ca170fcbc618d97ab049.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
Re: Memory-Bounded Hash Aggregation  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On Wed, 2020-01-29 at 14:48 -0800, Jeff Davis wrote:
> 2) Use a different structure more capable of handling a large
> fraction
> of free space. A compressed bitmap might make sense, but that seems
> like overkill to waste effort tracking a lot of space that is
> unlikely
> to ever be used.

I ended up converting the freelist to a min heap.

Attached is a patch which makes three changes to better support
HashAgg:

1. Use a minheap for the freelist. The original design used an array
that had to be sorted between a read (which frees a block) and a write
(which needs to sort the array to consume the lowest block number). The
comments said:

  * sorted.  This is an efficient way to handle it because we expect
cycles
  * of releasing many blocks followed by re-using many blocks, due to
  * the larger read buffer.  

But I didn't find a case where that actually wins over a simple
minheap. With that in mind, a minheap seems closer to what one might
expect for that purpose, and more robust when the assumptions don't
hold up as well. If someone knows of a case where the re-sorting
behavior is important, please let me know.

Changing to a minheap effectively solves the problem for HashAgg,
though in theory the memory consumption of the freelist itself could
become significant (though it's only 0.1% of the free space being
tracked).

2. Lazily-allocate the read buffer. The write buffer was always lazily-
allocated, so this patch creates better symmetry. More importantly, it
means freshly-rewound tapes don't have any buffer allocated, so it
greatly expands the number of tapes that can be managed efficiently as
long as only a limited number are active at once.

3. Allow expanding the number of tapes for an existing tape set. This
is useful for HashAgg, which doesn't know how many tapes will be needed
in advance.

Regards,
    Jeff Davis


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mark Dilger
Дата:
Сообщение: Re: Portal->commandTag as an enum
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Dumping/restoring fails on inherited generated column