Re: WIP: Fast GiST index build

From Alexander Korotkov
Subject Re: WIP: Fast GiST index build
Date
Msg-id CAPpHfdvoOh78ycX3eRePTS29635pHCSTfLdFDzHZhxTKsggCuQ@mail.gmail.com
In response to Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: WIP: Fast GiST index build
List pgsql-hackers
Hi!

A patch with my attempt to detect ordered datasets is attached. The implemented idea is described below.
Index tuples are divided into chunks of 128. For each chunk we measure how many of the leaf pages into which its index tuples were inserted don't match those of the previous chunk. Based on statistics from several chunks, we estimate the distribution of accesses among leaf pages (an exponential distribution law is assumed, and that seems to be an error). After that we can estimate the portion of index tuples which can be processed without actual IO. If this estimate exceeds a threshold, then we should switch to buffering build.
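To make the per-chunk measurement concrete, here is a minimal sketch in Python (not the C code of the patch). It assumes the measure is the fraction of distinct leaf pages touched by a chunk that were not touched by the previous chunk; the function name and the input format (one leaf page number per inserted tuple) are hypothetical, and the distribution estimate itself is not modelled.

```python
from typing import Iterable, List

CHUNK_SIZE = 128  # tuples per chunk, as described above


def chunk_mismatch_fractions(leaf_pages: Iterable[int],
                             chunk_size: int = CHUNK_SIZE) -> List[float]:
    """Return one value per chunk (except the first): the fraction of
    distinct leaf pages in that chunk that did not occur in the previous
    chunk. Assumption: mismatches are counted per page, not per tuple."""
    pages = list(leaf_pages)
    fractions: List[float] = []
    prev = None
    for start in range(0, len(pages), chunk_size):
        # Set of leaf pages that received tuples from this chunk.
        current = set(pages[start:start + chunk_size])
        if prev is not None:
            new_pages = current - prev  # pages not seen in previous chunk
            fractions.append(len(new_pages) / len(current))
        prev = current
    return fractions
```

With this measure, a dataset whose chunks keep hitting the same leaf pages yields values near 0, while one whose chunks each land on pages unseen in the previous chunk yields values near 1.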
Now my implementation successfully detects randomly mixed datasets and well-ordered datasets, but it seems to be too optimistic about intermediate cases. I believe this is due to a wrong assumption about the distribution law.
Do you think this approach is acceptable? Probably there is some research on the distribution law for such cases (though I didn't find anything relevant in Google Scholar)?
As an alternative, I can propose taking into account the actual average number of IO operations per tuple rather than an estimate.
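A rough sketch of that alternative, again in Python and only as an illustration: a small LRU set of pages stands in for the buffer cache, and every miss is counted as one IO. The class, the cache model and the threshold are all hypothetical; a real backend would count actual buffer misses instead of simulating them.

```python
from collections import OrderedDict


class IOPerTupleMeter:
    """Tracks the running average of simulated IO operations per tuple."""

    def __init__(self, cache_pages: int) -> None:
        self.cache_pages = cache_pages  # assumed cache capacity, in pages
        self.cache: "OrderedDict[int, bool]" = OrderedDict()
        self.tuples = 0
        self.misses = 0

    def record_insert(self, leaf_page: int) -> None:
        """Account for one tuple inserted into the given leaf page."""
        self.tuples += 1
        if leaf_page in self.cache:
            self.cache.move_to_end(leaf_page)  # mark as recently used
        else:
            self.misses += 1  # page not cached: counted as one IO
            self.cache[leaf_page] = True
            if len(self.cache) > self.cache_pages:
                self.cache.popitem(last=False)  # evict least recently used

    def average_io(self) -> float:
        return self.misses / self.tuples if self.tuples else 0.0


def should_switch(meter: IOPerTupleMeter, threshold: float) -> bool:
    """Switch to buffering build once average IO per tuple is too high."""
    return meter.average_io() > threshold
```

The appeal of this variant is that it needs no assumption about the distribution law; the cost is that the decision is made only after the IO has already been paid for.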

------
With best regards,
Alexander Korotkov.

On Mon, Jul 18, 2011 at 10:00 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi!

A new version of the patch is attached. It has the following changes.
1) Since the proposed technique is not always a "fast" build, it was renamed everywhere in the patch to "buffering" build.
2) The "buffering" parameter now has 3 possible values: "yes", "no" and "auto". "auto" means automatic switching from a regular index build to a buffering one. Currently it just switches when the index size exceeds maintenance_work_mem.
3) Keeping many buffers pinned is avoided.
4) Rebased with head.

TODO:
1) Take care of ordered datasets in automatic switching.
2) Take care of concurrent backends in automatic switching.

------
With best regards,
Alexander Korotkov.
