Re: GSoC 2011: Fast GiST index build


From	Alexander Korotkov
Subject	Re: GSoC 2011: Fast GiST index build
Date	26 April 2011, 09:10:28
Msg-id	BANLkTinGCBTpniGqOPnKmLY4UBPKHRa8Tw@mail.gmail.com
In reply to	Re: GSoC 2011: Fast GiST index build (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: GSoC 2011: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
List	pgsql-hackers


On Tue, Apr 26, 2011 at 10:46 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

Just palloc() the buffers in memory, at least in the first phase. That'll work fine for index creation. Dealing with concurrent searches and inserts makes it a lot more complicated; it's better to make it work for index creation first, and investigate something like the GIN fastupdate buffers later if you have time left.

Since the algorithm is designed to reduce I/O, we should expect the greatest speedup when the index doesn't fit in memory. The size of the buffers is comparable to the size of the whole index. That means that if we can hold the buffers in memory, we can mostly hold the whole index in memory too. That's why I think we need simple on-disk buffer management for the first representative benchmark.
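To make "simple on-disk buffer management" concrete, here is a rough sketch of one possible shape: a per-node buffer that keeps only one page of entries in memory and spills full pages to a temporary file. It is plain standard C rather than PostgreSQL code, so it uses neither palloc() nor the real buffer manager, and NodeBuffer, Entry, PAGE_ENTRIES and the buffer_* functions are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_ENTRIES 4          /* hypothetical: entries per in-memory page */

typedef struct {
    long key;                   /* stand-in for an index tuple */
} Entry;

typedef struct {
    Entry  page[PAGE_ENTRIES];  /* the only part held in memory */
    size_t in_page;             /* entries currently in the page */
    size_t spilled_pages;       /* full pages written to the temp file */
    FILE  *spill;               /* temp file, created on first spill */
} NodeBuffer;

void buffer_init(NodeBuffer *b)
{
    b->in_page = 0;
    b->spilled_pages = 0;
    b->spill = NULL;
}

/* Total entries in the buffer, in memory and on disk. */
size_t buffer_count(const NodeBuffer *b)
{
    return b->spilled_pages * PAGE_ENTRIES + b->in_page;
}

/* Add an entry; a full page is written to disk first. Returns 0 on success. */
int buffer_push(NodeBuffer *b, Entry e)
{
    if (b->in_page == PAGE_ENTRIES) {
        if (b->spill == NULL && (b->spill = tmpfile()) == NULL)
            return -1;
        /* Always seek: the slot may hold a stale page that was popped earlier. */
        if (fseek(b->spill, (long) (b->spilled_pages * sizeof b->page), SEEK_SET) != 0)
            return -1;
        if (fwrite(b->page, sizeof(Entry), PAGE_ENTRIES, b->spill) != PAGE_ENTRIES)
            return -1;
        b->spilled_pages++;
        b->in_page = 0;
    }
    b->page[b->in_page++] = e;
    return 0;
}

/* Remove one entry (last in, first out). Returns 1 if an entry was
 * returned, 0 if the buffer is empty, -1 on I/O error. */
int buffer_pop(NodeBuffer *b, Entry *out)
{
    assert(out != NULL);
    if (b->in_page == 0) {
        if (b->spilled_pages == 0)
            return 0;
        /* Reload the most recently spilled page. */
        if (fseek(b->spill, (long) ((b->spilled_pages - 1) * sizeof b->page), SEEK_SET) != 0)
            return -1;
        if (fread(b->page, sizeof(Entry), PAGE_ENTRIES, b->spill) != PAGE_ENTRIES)
            return -1;
        b->spilled_pages--;
        b->in_page = PAGE_ENTRIES;
    }
    *out = b->page[--b->in_page];
    return 1;
}

void buffer_free(NodeBuffer *b)
{
    if (b->spill != NULL)
        fclose(b->spill);
    b->spill = NULL;
}
```

With this layout the memory cost per buffer stays at one page no matter how many entries are queued, which is the property that matters when the buffers together are about as large as the index. Whether one temp file per buffer is acceptable, or all buffers should share a single file, is left open here.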

The first priority should be to have something that works well enough to be benchmarked. The paper you referred to in the GSoC application [1] contained empirical results on the number of I/O operations needed with the algorithm, but it didn't take the operating system cache into account at all. That makes the empirical results next to worthless; keeping some stuff in in-memory buffers is obviously going to reduce I/O if you don't take the OS cache into account.

So we're going to need benchmark results that show a benefit, or there's no point in doing this at all. The sooner we get to benchmarking, even with a very limited and buggy version of the patch, the better. If the algorithm described in that paper doesn't give much benefit, you might have to switch to some other algorithm half-way through the project. Fortunately, there are plenty of R-tree bulk loading algorithms in the literature; it should be possible to adapt some of them to GiST.

[1] http://dx.doi.org/10.1007/s00453-001-0107-6

Yes, these priorities seem very reasonable. We should get a first confirmation of effectiveness as soon as possible. I'll stick to this priority.

----
With best regards,
Alexander Korotkov.
