Re: WIP: Fast GiST index build

Поиск

Список

Период

Сортировка

От	Alexander Korotkov
Тема	Re: WIP: Fast GiST index build
Дата	22 июля 2011 г. 06:39:03
Msg-id	CAPpHfdvoOh78ycX3eRePTS29635pHCSTfLdFDzHZhxTKsggCuQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы	Re: WIP: Fast GiST index build
Список	pgsql-hackers

Дерево обсуждения

Hi!

Patch with my try to detect ordered datasets is attached. The implemented idea is desribed below.

Index tuples are divided by chunks of 128. On each chunk we measure how much leaf pages where index tuples was inserted don't match those of previous chunk. Based on statistics of several chunks we estimate distribution of accesses between lead pages (exponential distribution law is accumed and it's seems to be an error). After that we can estimate portion of index tuples which can be processed without actual IO. If this estimate exceeds threshold then we should switch to buffering build.

Now my implementation successfully detects randomly mixed datasets and well ordered datasets, but it's seems to be too optimistic about intermediate cases. I believe it's due to wrong assumption about distribution law.

Do you think this approach is acceptable? Probably there are some researches about distribution law for such cases (while I didn't find anything relevant in google scholar)?

As an alternative I can propose take into account actual average IO operations per tuple rather then an estimate.

------
With best regards,
Alexander Korotkov.

On Mon, Jul 18, 2011 at 10:00 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Hi!

New version of patch is attached. There are following changes.
1) Since proposed tchnique is not always a "fast" build, it was renamed everywhere in the patch to "buffering" build.
2) Parameter "buffering" now has 3 possible values "yes", "no" and "auto". "auto" means automatic switching from regular index build to buffering one. Currently it just switch when index size exceeds maintenance_work_mem.
3) Holding of many buffers pinned is avoided.
4) Rebased with head.

TODO:
1) Take care about ordered datasets in automatic switching.
2) Take care about concurrent backends in automatic switching.

------
With best regards,
Alexander Korotkov.

Вложения

gist_fast_build-0.8.0.patch.gz

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: WIP: Fast GiST index build

Вложения