Re: tweaking NTUP_PER_BUCKET

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: tweaking NTUP_PER_BUCKET
Дата
Msg-id 53BD81DB.2070504@fuzzy.cz
обсуждение исходный текст
Ответ на Re: tweaking NTUP_PER_BUCKET  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: tweaking NTUP_PER_BUCKET  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On 9.7.2014 16:07, Robert Haas wrote:
> On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Thinking about this a bit more, do we really need to build the
>> hash table on the first pass? Why not to do this:
>> 
>> (1) batching - read the tuples, stuff them into a simple list -
>> don't build the hash table yet
>> 
>> (2) building the hash table - we have all the tuples in a simple
>> list, batching is done - we know exact row count, can size the
>> table properly - build the table
> 
> We could do this, and in fact we could save quite a bit of memory if 
> we allocated say 1MB chunks and packed the tuples in tightly instead 
> of palloc-ing each one separately.  But I worry that rescanning the 
> data to build the hash table would slow things down too much.

OK. I think we should shoot for no negative impact on well estimated
plans, and the rescans might violate that. I need to think about this
for some time, but my current plan is this:

(1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number   of batches this requires)

(2) build the batches as in the current patch

(3) if the hash table is too small, resize it

Which is not really different from the current patch.

>> Also, maybe we could use a regular linear hash table [1], instead
>> of using the current implementation with NTUP_PER_BUCKET=1.
>> (Although, that'd be absolutely awful with duplicates.)
> 
> Linear probing is pretty awful unless your load factor is << 1.
> You'd probably want NTUP_PER_BUCKET=0.25, or something like that,
> which would eat up a lot of memory.

My experience is that 0.5-0.75 load factor is quite OK, and
NTUP_PER_BUCKET=1 gives you ~0.75 load on average, so it's not that
different. Anyway, the inability to handle duplicies reasonably is
probably a show-stopper.

Tomas



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rajmohan C
Дата:
Сообщение: how to find the order of joins from Explain command XML plan output in PostgreSQL
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: IMPORT FOREIGN SCHEMA statement