Re: Searching for substring with tsearch(1/2)

From: Teodor Sigaev
Subject: Re: Searching for substring with tsearch(1/2)
Msg-id: 3FD6F013.6050906@sigaev.ru
In reply to: Re: Searching for substring with tsearch(1/2)  (Hannu Krosing <hannu@tm.ee>)
List: pgsql-hackers

> I meant that the expansion of 'hu%' is done before and outside of
> tsearch, so the question is how efficient will tsearch be for searching
> for hundreds or thousands of words in one expression.

OK, I see. The answer: badly. The index structure is a signature tree with a
constant signature length, 2016 bits by default. A signature is made by hashing
each word and setting bit number HASHVAL % 2016 to 1. So, if a query has many
terms and all of the terms are ORed, then a lot of signatures will match the
query. This means a lot of pages in the index will be read.
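
To make the cost concrete, here is a minimal Python sketch of the signature
scheme described above. It is only an illustration: the names word_bit,
make_signature, may_match_any and fraction_of_bits_hit are made up for this
example, and CRC32 stands in for whatever hash tsearch really uses. Only the
2016-bit length and the HASHVAL % 2016 rule come from the message.

```python
import zlib

SIGLEN_BITS = 2016  # default signature length quoted in the message


def word_bit(word: str) -> int:
    """Map a word to one bit position (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(word.encode("utf-8")) % SIGLEN_BITS


def make_signature(words) -> int:
    """Build a signature, using a Python int as a 2016-bit set."""
    sig = 0
    for w in words:
        sig |= 1 << word_bit(w)
    return sig


def may_match_any(sig: int, terms) -> bool:
    """True if the signature has the bit of at least one ORed term set.

    A True result can be a false positive, because different words
    may hash to the same bit.
    """
    return any((sig >> word_bit(t)) & 1 for t in terms)


def fraction_of_bits_hit(terms) -> float:
    """Share of the bit positions that an ORed query probes."""
    return len({word_bit(t) for t in terms}) / SIGLEN_BITS
```

With a handful of terms, fraction_of_bits_hit stays tiny, so few signatures
match. With hundreds or thousands of ORed terms the query probes a large share
of the 2016 positions, so nearly every signature matches and little of the
index can be skipped.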


>>>How hard (or sensible ;) would be creating such an index using GiST ?
>>>As proved by tsearch GiST can cope well with many-to-many indexes.
>>
>>Sorry, I don't understand. Do you mean that GiST supports one heap tuple in 
>>several index tuples? If yes, then no :). GiST doesn't support this feature. I 
>>don't think that GiST can help in this situation.
> 
> 
> but tsearch seems to support this, and tsearch uses GiST. Is this
> functionality added entirely by tsearch ?
No, one heap tuple - one index tuple.

I'll try to explain the index structure used by tsearch (three levels, just as an example):
Root page
  internal tuple 1 -> second level page 1
                        internal tuple 1.1 ->
                        internal tuple 1.2 ->
  internal tuple 2 -> second level page 2
                        internal tuple 2.1 ->
                        internal tuple 2.2 -> third level (leaf) page 2.2
                                                leaf tuple 2.2.1 -> heap tuple
                                                leaf tuple 2.2.2 -> heap tuple
 

A leaf tuple contains one of two types of predicates:
  1. just the lexemes (without position information);
  2. if the storage size of the first type is too big, then the tuple
     stores a signature as described above.
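
The choice between the two leaf predicate types could look roughly like the
Python sketch below. Everything in it is an assumption made for illustration:
the names leaf_key and leaf_may_contain, the CRC32 hash, and above all the
MAX_LEXEME_BYTES threshold, which is a made-up number and not the limit
tsearch actually applies.

```python
import zlib

SIGLEN_BITS = 2016      # signature length quoted in the message
MAX_LEXEME_BYTES = 64   # hypothetical size limit, not taken from tsearch


def _bit(lexeme: str) -> int:
    """Bit position for a lexeme (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(lexeme.encode("utf-8")) % SIGLEN_BITS


def leaf_key(lexemes):
    """Return ("lexemes", tuple) if small enough, else ("signature", int)."""
    unique = sorted(set(lexemes))  # no position information is kept
    size = sum(len(x.encode("utf-8")) for x in unique)
    if size <= MAX_LEXEME_BYTES:
        return ("lexemes", tuple(unique))
    sig = 0
    for x in unique:
        sig |= 1 << _bit(x)
    return ("signature", sig)


def leaf_may_contain(key, lexeme: str) -> bool:
    """Exact answer for a lexeme list, possible false positive for a signature."""
    kind, value = key
    if kind == "lexemes":
        return lexeme in value
    return bool((value >> _bit(lexeme)) & 1)
```

A leaf that still holds its lexemes can be checked exactly, while a leaf that
fell back to a signature can only say "maybe", so a match has to be rechecked
against the heap tuple.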
 

An internal tuple contains the ORed (superimposed) signatures of its children.
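
As a rough illustration of how superimposed signatures drive the search, here
is a small Python sketch. The Leaf and Internal classes and the search_any
function are invented for this example, and it ignores pages, lexeme-list
leaves and everything else a real GiST index has. It only shows that an
internal signature is the OR of its children and that a subtree is skipped
when its signature shares no bit with the query.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    heap_id: int   # stands for the pointer to the single heap tuple
    sig: int       # signature of that tuple's lexemes


@dataclass
class Internal:
    children: list
    sig: int = field(init=False)

    def __post_init__(self):
        # Superimpose (bitwise OR) the signatures of all children.
        self.sig = 0
        for child in self.children:
            self.sig |= child.sig


def search_any(node, query_mask: int, visited: list) -> list:
    """Collect heap ids that may match any term whose bit is in query_mask.

    Every node that has to be read is appended to visited.
    """
    visited.append(node)
    if (node.sig & query_mask) == 0:
        return []              # no common bit: prune this subtree
    if isinstance(node, Leaf):
        return [node.heap_id]  # candidate, may still be a false positive
    hits = []
    for child in node.children:
        hits += search_any(child, query_mask, visited)
    return hits
```

The more bits the query mask has set, the fewer subtrees can be pruned, which
is why an OR over many terms ends up reading most of the index.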



-- 
Teodor Sigaev                                  E-mail: teodor@sigaev.ru


