Hash index build patch has *worse* performance at small table sizes

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Hash index build patch has *worse* performance at small table sizes
Дата
Msg-id 2292.1205697013@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Hash index build patch has *worse* performance at small table sizes  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
I've been reviewing the hash index build patch submitted here:
http://archives.postgresql.org/pgsql-patches/2007-10/msg00154.php

Although it definitely helps on large indexes, it's actually
counterproductive on not-so-large ones.  The test case I'm using
is random integers generated like this:create table foo as select (random() * N)::int as f1  from
generate_series(1,N);selectcount(*) from foo; -- force hint bit updatescheckpoint;
 
then timingcreate index fooi on foo using hash(f1);

Using all-default configuration settings on some not-very-new hardware,
at N = 1E6 I see

8.3.1:                    30 sec
With pre-expansion of index (CVS HEAD):    24 sec
With sorting:                72 sec
To build a btree index on same data:    34 sec

Now this isn't amazingly surprising, because the original argument for
doing sorting was to improve locality of access to the index during
the build, and that only matters if you've got an index significantly
bigger than memory.  If the index fits in RAM then the sort is pure
overhead.

The obvious response to this is to use the sorting approach only when
the estimated index size exceeds some threshold.  One possible choice of
threshold would be shared_buffers (or temp_buffers for a temp index)
but I think that is probably too conservative, because in most scenarios
the kernel's disk cache is available too.  Plus you can't tweak that
setting without a postmaster restart.  I'm tempted to use
effective_cache_size, which attempts to measure an appropriate number
and can be set locally within the session doing the CREATE INDEX if
necessary.  Or we could invent a new GUC parameter, but that is probably
overkill.

Comments?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Single table forcing sequential scans on query plans
Следующее
От: "Heikki Linnakangas"
Дата:
Сообщение: Rewriting Free Space Map