Re: GIN index build speed

Поиск
Список
Период
Сортировка
От Teodor Sigaev
Тема Re: GIN index build speed
Дата
Msg-id 49352C18.30702@sigaev.ru
обсуждение исходный текст
Ответ на GIN index build speed  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Список pgsql-hackers
> The issue is that the GIN index build code accumulates the lexemes into 
> a binary tree, but there's nothing to keep the tree balanced. My test 
> case with almost monotonically increasing keys, happens to be a 
> worst-case scenario, and the tree degenerates into almost linked list 
> that every insertion has to grovel through.
Agree, but in most cases it works well. Because lexemes in documents aren't ordered.


> 
> The obvious fix is to use a balanced tree algorithm. I wrote a quick 
> patch to turn the tree into a splay tree. That fixed the degenerative 
> behavior, and the runtime of CREATE INDEX for the above test case fell 
> from 40s to 1.5s.
BTW, your patch helps to GIN's btree emulation. With typical scenarios of usage 
of btree emulation scalar column will be more or less ordered.

> 
> Magnus kindly gave me a dump of the full-text-search tables from 
> search.postgresql.org, for some real-world testing. Quick testing with 
> that suggests that the patch unfortunately makes the index build 5-10% 
> slower with that data set.
Do you see ways to  improve that?
> 
> We're in commitfest, not supposed to be submitting new features, so I'm 
> not going to pursue this further right now. Patch attached, however, 
> which seems to work fine.
Personally, I don't  object to improve that.
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Fujii Masao"
Дата:
Сообщение: Re: Sync Rep: First Thoughts on Code
Следующее
От: Teodor Sigaev
Дата:
Сообщение: Re: [PATCHES] GIN improvements