Re: 9.5: Memory-bounded HashAgg

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: 9.5: Memory-bounded HashAgg
Дата
Msg-id 1418294807.5584.32.camel@jeff-desktop
обсуждение исходный текст
Ответ на 9.5: Memory-bounded HashAgg  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: 9.5: Memory-bounded HashAgg  (Tomas Vondra <tv@fuzzy.cz>)
Re: 9.5: Memory-bounded HashAgg  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote:
> This patch is requires the Memory Accounting patch, or something similar
> to track memory usage.
>
> The attached patch enables hashagg to spill to disk, which means that
> hashagg will contain itself to work_mem even if the planner makes a
> bad misestimate of the cardinality.

New patch attached. All open items are complete, though the patch may
have a few rough edges.

Summary of changes:

 * rebased on top of latest memory accounting patch
http://www.postgresql.org/message-id/1417497257.5584.5.camel@jeff-desktop
 * added a flag to hash_create to prevent it from creating an extra
level of memory context
   - without this, the memory accounting would have a measurable impact
on performance
 * cost model for the disk usage
 * intelligently choose the number of partitions for each pass of the
data
 * explain support
 * in build_hash_table(), be more intelligent about the value of
nbuckets to pass to BuildTupleHashTable()
   - BuildTupleHashTable tries to choose a value to keep the table in
work_mem, but it isn't very accurate.
 * some very rudimentary testing (sanity checks, really) shows good
results

Summary of previous discussion (my summary; I may have missed some
points):

Tom Lane requested that the patch also handle the case where transition
values grow (e.g. array_agg) beyond work_mem. I feel this patch provides
a lot of benefit as it is, and trying to handle that case would be a lot
more work (we need a way to write the transition values out to disk at a
minimum, and perhaps combine them with other transition values). I also
don't think my patch would interfere with a fix there in the future.

Tomas Vondra suggested an alternative design that more closely resembles
HashJoin: instead of filling up the hash table and then spilling any new
groups, the idea would be to split the current data into two partitions,
keep one in the hash table, and spill the other (see
ExecHashIncreaseNumBatches()). This has the advantage that it's very
fast to identify whether the tuple is part of the in-memory batch or
not; and we can avoid even looking in the memory hashtable if not.

The batch-splitting approach has a major downside, however: you are
likely to evict a skew value from the in-memory batch, which will result
in all subsequent tuples with that skew value going to disk. My approach
never evicts from the in-memory table until we actually finalize the
groups, so the skew values are likely to be completely processed in the
first pass.

So, the attached patch implements my original approach, which I still
feel is the best solution.

Regards,
    Jeff Davis


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: inherit support for foreign tables
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Directory/File Access Permissions for COPY and Generic File Access Functions