Re: WIP: Fast GiST index build

From: Heikki Linnakangas
Subject: Re: WIP: Fast GiST index build
Msg-id: 4E569A16.8090405@enterprisedb.com
In reply to: Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Replies: Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers
On 24.08.2011 16:57, Alexander Korotkov wrote:
> I've added some testing results to the wiki page:
> http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011
> Not all the results I planned for the first chunk are there yet, because it
> takes more time than I expected.
> Some notes about it.
>
> I now see two factors that speed up the regular build of GiST indexes:
> 1) As noted before, the regular index build of a fairly well-ordered dataset
> is fast.
> 2) I found that a worse index is faster to build. By a worse index I mean an
> index with higher overlap. The function gistchoose selects the first index
> tuple with zero penalty, if there is one. Thus, with higher overlap in the
> root page, only a few of its index tuples will be chosen for inserts. And,
> recursively, only a small part of the tree will be used for actual inserts,
> and that part of the tree fits more easily in the cache. Thus, high overlap
> makes inserts cheaper just as it makes searches more expensive.

As an extreme case, a trivial penalty function that just always returns 
0 will make index build fast - but the index will be useless for querying.

> In the tests on the first version of the patch, I found the index quality of
> the regular build much better than that of the buffering build (without
> neighbor relocation). Now they are similar, but that's because the index
> quality of the regular build has become worse. As a result, in the current
> tests the regular index build is faster than in the previous ones. I see the
> following possible causes:
> 1) I didn't save the source random data, so this is new random data.
> 2) Some environment parameters of my test setup may have changed, though I
> doubt it.
> Despite these possible explanations, it seems quite strange to me.

That's pretty surprising. Assuming the data is truly random, I wouldn't 
expect a big difference in index quality between one random data set and 
another. If the index quality depends so much on, say, the distribution 
of the first few tuples that are inserted into it, that's quite an 
interesting finding in its own right, and merits some further research.

> In order to compare index build methods on higher-quality indexes, I've
> tried to build indexes with my double sorting split method (see:
> http://syrcose.ispras.ru/2011/files/SYRCoSE2011_Proceedings.pdf#page=36). On
> the uniform dataset, search is about 10 times faster! And, as expected, the
> regular index build becomes much slower. It has been running for more than
> 60 hours, and only 50% of the index is complete (estimated by file sizes).
>
> Also, automatic switching to the buffering build shows better index quality
> in all the tests, though it's hard for me to explain that.

Hmm, makes me a bit uneasy that we're testing with a modified page 
splitting algorithm. But if the new algorithm is that good, could you 
post that as a separate patch, please?

That said, I don't see any new evidence that the buffering build 
algorithm would be significantly worse. There's the case of ordered data 
that we already knew about, and will have to just accept for now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
