Re: tweaking NTUP_PER_BUCKET

From: Robert Haas
Subject: Re: tweaking NTUP_PER_BUCKET
Date:
Msg-id: CA+TgmoZRWuAxAKzodDEjRUJyrcFVGhuE9_LAvexzx8ffWrUJwA@mail.gmail.com
In response to: Re: tweaking NTUP_PER_BUCKET  (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: tweaking NTUP_PER_BUCKET  (Tomas Vondra <tv@fuzzy.cz>)
           Re: tweaking NTUP_PER_BUCKET  (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Thinking about this a bit more, do we really need to build the hash
> table on the first pass? Why not do this:
>
> (1) batching
>     - read the tuples, stuff them into a simple list
>     - don't build the hash table yet
>
> (2) building the hash table
>     - we have all the tuples in a simple list, batching is done
>     - we know exact row count, can size the table properly
>     - build the table

We could do this, and in fact we could save quite a bit of memory if
we allocated, say, 1MB chunks and packed the tuples into them tightly
instead of palloc-ing each one separately.  But I worry that rescanning
the data to build the hash table would slow things down too much.
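
For concreteness, here's a rough sketch of that scheme: pass one packs
the tuples tightly into 1MB chunks and counts them; pass two sizes the
bucket array from the exact count and links the stored tuples into it.
All the names are hypothetical (this is not the actual nodeHash.c
code), and error handling is skipped:

/*
 * Sketch only: two-pass hash build over densely packed 1MB chunks.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (1024 * 1024)              /* 1MB chunks, as above */
#define ALIGN8(x)  (((x) + 7) & ~(size_t) 7)  /* keep tuples aligned */

typedef struct Tuple
{
    struct Tuple *next;     /* bucket chain link, filled in pass 2 */
    uint32_t      hash;     /* hash value saved at store time */
    uint32_t      len;      /* length of data[] */
    char          data[];   /* tuple body, packed in line */
} Tuple;

typedef struct Chunk
{
    struct Chunk *next;     /* chain of filled chunks */
    size_t        used;     /* bytes consumed in data[] */
    char          data[];
} Chunk;

typedef struct
{
    Chunk   *chunks;
    size_t   ntuples;       /* exact count, known after pass 1 */
    size_t   nbuckets;
    Tuple  **buckets;
} HashState;

/* Pass 1: append one tuple to the current chunk; no per-tuple palloc. */
static void
store_tuple(HashState *hs, uint32_t hash, const void *src, uint32_t len)
{
    size_t  need = ALIGN8(sizeof(Tuple) + len);
    Tuple  *tup;

    if (hs->chunks == NULL || hs->chunks->used + need > CHUNK_SIZE)
    {
        Chunk *c = malloc(sizeof(Chunk) + CHUNK_SIZE);

        c->next = hs->chunks;
        c->used = 0;
        hs->chunks = c;
    }
    tup = (Tuple *) (hs->chunks->data + hs->chunks->used);
    tup->next = NULL;
    tup->hash = hash;
    tup->len = len;
    memcpy(tup->data, src, len);
    hs->chunks->used += need;
    hs->ntuples++;
}

/* Pass 2: the row count is exact, so size the table once and build it. */
static void
build_table(HashState *hs)
{
    hs->nbuckets = 1;
    while (hs->nbuckets < hs->ntuples)        /* NTUP_PER_BUCKET = 1 */
        hs->nbuckets <<= 1;
    hs->buckets = calloc(hs->nbuckets, sizeof(Tuple *));

    /* This rescan of the chunks is the extra pass I'm worried about. */
    for (Chunk *c = hs->chunks; c != NULL; c = c->next)
        for (size_t off = 0; off < c->used;)
        {
            Tuple  *tup = (Tuple *) (c->data + off);
            size_t  b = tup->hash & (hs->nbuckets - 1);

            tup->next = hs->buckets[b];
            hs->buckets[b] = tup;
            off += ALIGN8(sizeof(Tuple) + tup->len);
        }
}

int main(void)
{
    HashState   hs = {0};
    const char *w[] = {"alpha", "beta", "gamma", "delta"};

    for (uint32_t i = 0; i < 4; i++)
        store_tuple(&hs, i * 2654435761u, w[i], (uint32_t) strlen(w[i]) + 1);
    build_table(&hs);
    printf("%zu tuples, %zu buckets\n", hs.ntuples, hs.nbuckets);
    return 0;
}

The upside is that per-tuple overhead shrinks to the Tuple header plus
alignment padding rather than a full allocation per tuple; the rescan
in build_table() is where the extra cost shows up.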

> Also, maybe we could use a regular linear hash table [1], instead of
> using the current implementation with NTUP_PER_BUCKET=1. (Although
> that'd be absolutely awful with duplicates.)

Linear probing is pretty awful unless your load factor is << 1.  You'd
probably want NTUP_PER_BUCKET=0.25, or something like that, which
would eat up a lot of memory.
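
To put rough numbers on that, using the standard uniform-hashing
approximations for linear probing (Knuth, TAOCP vol. 3): a successful
lookup costs about (1 + 1/(1-a))/2 probes at load factor a, and an
unsuccessful one about (1 + 1/(1-a)^2)/2.  A quick sketch to tabulate
that:

/* Expected probes for linear probing at various load factors, per
 * Knuth's classic approximations (uniform hashing, no duplicates). */
#include <stdio.h>

int main(void)
{
    static const double alphas[] = {0.25, 0.50, 0.75, 0.90, 0.95};

    for (int i = 0; i < 5; i++)
    {
        double a = alphas[i];
        double hit  = 0.5 * (1.0 + 1.0 / (1.0 - a));
        double miss = 0.5 * (1.0 + 1.0 / ((1.0 - a) * (1.0 - a)));

        printf("load %.2f: %5.1f probes/hit, %6.1f probes/miss\n",
               a, hit, miss);
    }
    return 0;
}

A miss at load 0.95 averages about 200 probes versus about 1.4 at load
0.25, which is why the bucket array would have to be over-allocated so
aggressively to keep linear probing fast.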

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


