Re: [GENERAL] Incorrect FTS result with GIN index

From: Oleg Bartunov
Subject: Re: [GENERAL] Incorrect FTS result with GIN index
Msg-id: Pine.LNX.4.64.1007291459270.32129@sn.sai.msu.ru
In reply to: Re: [GENERAL] Incorrect FTS result with GIN index  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [GENERAL] Incorrect FTS result with GIN index  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Tom,

we're not able to work on this right now, so go ahead if you have time.
I also wonder why I got the "right" result :) I just repeated the query:

test=# select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));
 count
-------
   123
(1 row)

Time: 26.185 ms


Oleg
On Wed, 28 Jul 2010, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
>> you can download dump http://mira.sai.msu.su/~megera/tmp/search_tab.dump
>
> Hmm ... I'm not sure why you're failing to reproduce it, because it's
> falling over pretty easily for me.  After poking at it for awhile,
> I am of the opinion that scanGetItem's handling of multiple keys is
> fundamentally broken and needs to be rewritten completely.  The
> particular case I'm seeing here is that one key returns this sequence of
> TIDs/lossy flags:
>
> ...
> 1085/4 0
> 1086/65535 1
> 1087/4 0
> ...
>
> while the other one returns this:
>
> ...
> 1083/11 0
> 1086/6 0
> 1086/10 0
> 1087/10 0
> ...
>
> and what comes out of scanGetItem is just
>
> ...
> 1086/6 1
> ...
>
> because after returning that, on the next call it advances both input
> keystreams.  So 1086/10 should be visited and is not.
>
> I think that depending on the previous entryRes state to determine what
> to do is basically unworkable, and what should probably be done instead
> is to remember the last-returned TID and advance keystreams with TIDs <=
> that.  I haven't quite thought through how that should interact with
> lossy-page TIDs but it seems more robust than what we've got.
>
> I'm also noticing that the ANDing behavior for the "ee:* & dd:*" query
> style seems very much stupider than it needs to be --- it's returning
> lossy pages that very obviously don't need to be examined because the
> other keystream has no match at all on that page.  But I haven't had
> time to probe into the reason why.
>
> I'm out of time for today, do you want to work on it?
>
>             regards, tom lane
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
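
The fix Tom proposes in the quoted message (remember the last-returned TID and advance every keystream whose TID is <= it) can be sketched in Python. This is only an illustration under stated assumptions, not the PostgreSQL code: the names Item and and_keystreams are hypothetical, and the sketch assumes that a lossy page pointer is stored with offset 65535 (as in "1086/65535 1" above), that it matches every offset on its page, and that real item offsets start at 1.

```python
from typing import Iterator, List, NamedTuple, Sequence

LOSSY_OFFSET = 0xFFFF  # assumed marker for a whole-page (lossy) pointer


class Item(NamedTuple):
    block: int
    offset: int
    lossy: bool  # for results: True means the row must be rechecked


def and_keystreams(streams: Sequence[List[Item]]) -> Iterator[Item]:
    """AND together TID streams, each sorted by (block, offset)."""
    pos = [0] * len(streams)
    last = (-1, 0)  # last-returned TID; nothing returned yet
    while True:
        # Advance every stream whose current TID is <= the last-returned TID.
        # A lossy pointer (block, 65535) sorts after all items on its page,
        # so it stays current until the whole page has been consumed.
        for i, s in enumerate(streams):
            while pos[i] < len(s) and (s[pos[i]].block, s[pos[i]].offset) <= last:
                pos[i] += 1
            if pos[i] == len(s):
                return  # one stream is exhausted: no further AND matches
        heads = [s[p] for s, p in zip(streams, pos)]
        # Smallest TID each head could match: its own TID, or the start of
        # the page for a lossy pointer. The candidate is the largest of these.
        candidate = max((h.block, 0 if h.lossy else h.offset) for h in heads)
        # Heads that end before the candidate cannot match it: skip them.
        behind = [i for i, h in enumerate(heads) if (h.block, h.offset) < candidate]
        if behind:
            for i in behind:
                pos[i] += 1
            continue
        if candidate[1] == 0:
            # Every head is a lossy pointer to the same page.
            result = Item(candidate[0], LOSSY_OFFSET, True)
        else:
            result = Item(candidate[0], candidate[1], any(h.lossy for h in heads))
        yield result
        last = (result.block, result.offset)


# The two keystreams from the quoted message (the "..." parts are omitted).
key_a = [Item(1085, 4, False), Item(1086, LOSSY_OFFSET, True), Item(1087, 4, False)]
key_b = [Item(1083, 11, False), Item(1086, 6, False),
         Item(1086, 10, False), Item(1087, 10, False)]
result = list(and_keystreams([key_a, key_b]))
for item in result:
    print(f"{item.block}/{item.offset} {int(item.lossy)}")
```

On this input the sketch yields both 1086/6 and 1086/10 (each flagged for recheck), because the lossy pointer for page 1086 is not advanced until the other stream has moved past that page. A lossy page is also skipped outright when some other stream has no entry on it, which is the cheaper ANDing behavior the quoted message asks for.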

