Fixing hash index build time

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Fixing hash index build time
Дата	20 марта 2007 г. 16:52:37
Msg-id	16911.1174420341@sss.pgh.pa.us обсуждение
Ответы	Re: Fixing hash index build time
Список	pgsql-hackers

Дерево обсуждения

There are several reasons why Postgres' hash indexes currently suck,
but one of the bigger ones is that the time to build an index on a
large existing table is excessive, eg
http://archives.postgresql.org/pgsql-novice/2007-03/msg00064.php

I'm not sure if this has been discussed before, but I suddenly realized
while responding to the above message that the reason for the awful
performance is pretty obvious: hashbuild starts with a minimum-size
index (two buckets) and repeatedly splits buckets as insertions are
done, exactly the same as ordinary dynamic growth of the index would do.
This means that for an N-row table, approximately N/entries-per-bucket
splits need to occur during index build, which results in roughly O(N^2)
performance because we have to reprocess already-inserted entries over
and over.  This explains the empiric observation I made a long time ago:
http://archives.postgresql.org/pgsql-hackers/2002-04/msg01379.php

This could be fixed with a relatively small amount of new code: when
beginning hashbuild, estimate the parent table's rowcount (the same
method used by the planner will do fine, viz RelationGetNumberOfBlocks
times an estimated tuple density) and construct the appropriate number
of buckets immediately.  No splits, just write out empty pages as fast
as we can.  *Then* do the insertions.

Comments?
        regards, tom lane

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Fixing hash index build time