Re: tweaking NTUP_PER_BUCKET

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: tweaking NTUP_PER_BUCKET
Дата
Msg-id 53B59ADC.5060709@fuzzy.cz
обсуждение исходный текст
Ответ на tweaking NTUP_PER_BUCKET  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: tweaking NTUP_PER_BUCKET  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-hackers
On 3.7.2014 02:13, Tomas Vondra wrote:
> Hi,
>
> while hacking on the 'dynamic nbucket' patch, scheduled for the next CF
> (https://commitfest.postgresql.org/action/patch_view?id=1494) I was
> repeatedly stumbling over NTUP_PER_BUCKET. I'd like to propose a change
> in how we handle it.
>
>
> TL;DR; version
> --------------
>
> I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1)
> once the table is built and there's free space in work_mem. The patch
> mentioned above makes implementing this possible / rather simple.

Attached is v1 of this experimental patch. It's supposed to be applied
on top of v7 of this patch

   http://www.postgresql.org/message-id/53B59498.3000800@fuzzy.cz

as it shared most of the implementation. So apply it like this:

   patch -p1 < hashjoin-nbuckets-growth-v7.patch
   patch -p1 < hashjoin-dynamic-ntup-v1.patch

It implements the ideas outlined in the previous message, most importantly:

   (a) decreases NTUP_PER_BUCKET to 4

   (b) checks free work_mem when deciding whether to add batch

   (c) after building the batches, increases the number of buckets as
       much as possible, i.e. up to the number of batch tuples, but not
       exceeding work_mem

The improvements for the queries I tried so far are quite significant
(partially due to the NTUP_PER_BUCKET decrease, partially due to the
additional bucket count increase).

The rebuild is quite fast - the patch measures and reports this as a
WARNING, and the timings I've seen are ~12ms per 7MB (i.e. ~120ms for
70MB and ~1200ms for 700MB). Of course, this only makes sense when
compared to how much time it saved, but for the queries I tested so far
this was a good investment.

However it's likely there are queries where this may not be the case,
i.e. where rebuilding the hash table is not worth it. Let me know if you
can construct such query (I wasn't).

regards
Tomas

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: bad estimation together with large work_mem generates terrible slow hash joins
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: tweaking NTUP_PER_BUCKET