[BUGS] Mishandling of right-associated phrase operators in FTS

From: Tom Lane
Subject: [BUGS] Mishandling of right-associated phrase operators in FTS
Date:
Msg-id: 26706.1482087250@sss.pgh.pa.us
List: pgsql-bugs

What do you think a tsquery like 'x <-> (y <-> z)' should mean?
I find it hard to assign it any meaning other than the same thing
as '(x <-> y) <-> z', i.e., it should match a 3-lexeme sequence 'x y z'.

Right now, the execution engine gets this wrong:

regression=# select to_tsvector('x y z') @@ to_tsquery('x <-> y <-> z');
 ?column? 
----------
 t         -- okay
(1 row)

regression=# select to_tsvector('x y z') @@ to_tsquery('x <-> (y <-> z)');
 ?column? 
----------
 f         -- not so okay
(1 row)

This happens because the lower (righthand) <-> operator returns the
position of its righthand-side input ('z'), but that's two away from
where the 'x' is, so the upper phrase operator doesn't think there
is a match.
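To make the failure concrete, here is a toy Python model of the current behavior. It is a hypothetical simplification, not PostgreSQL's TS_phrase_execute(). It assumes the following: a document is a dict mapping each lexeme to its positions (roughly what to_tsvector('x y z') holds: x=1, y=2, z=3); a query node is either ("lex", word) or ("phrase", distance, left, right); and a phrase match requires exactly that distance. As described above, every phrase node reports the position of its right-hand operand.

```python
# Hypothetical toy model of phrase matching; not PostgreSQL source code.
# A query node is ("lex", word) or ("phrase", distance, left, right).
# A document maps each lexeme to the set of positions where it occurs.

def lex(word):
    return ("lex", word)

def phrase(distance, left, right):
    return ("phrase", distance, left, right)

def positions_current(node, doc):
    """Mimic the current behavior: a phrase node always reports the
    position of its right-hand operand for each match."""
    if node[0] == "lex":
        return set(doc.get(node[1], ()))
    _, distance, left, right = node
    lpos = positions_current(left, doc)
    rpos = positions_current(right, doc)
    # Keep right-hand positions sitting exactly `distance` after a left-hand one.
    return {r for r in rpos if r - distance in lpos}

def matches_current(query, doc):
    return bool(positions_current(query, doc))

doc = {"x": {1}, "y": {2}, "z": {3}}
left_assoc = phrase(1, phrase(1, lex("x"), lex("y")), lex("z"))   # x <-> y <-> z
right_assoc = phrase(1, lex("x"), phrase(1, lex("y"), lex("z")))  # x <-> (y <-> z)

print(matches_current(left_assoc, doc))   # True
print(matches_current(right_assoc, doc))  # False: inner node reports z at 3, two away from x at 1
```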

I considered trying to fix this by forcing right-associated cases into
left-associated form during tsquery parsing, but that has all the same
problems that I pointed out with respect to normalize_phrase_tree().
Really it'd be best to fix this by making the executor cope properly.
I think what we want is to pass down a flag telling recursive invocations
of TS_phrase_execute whether to return the position of the left-side or
right-side argument of a phrase match, which we would set according to
whether we are within the right or left argument of the most closely
nested upper phrase operator.  I propose to incorporate that fix into
the TS_phrase_execute rewrite I'm working on.
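The proposed flag can be sketched in the same toy model. This is again hypothetical Python, not the planned TS_phrase_execute rewrite. An operand inside a left argument reports its right end, and an operand inside a right argument reports its left end, so each distance check compares the two lexemes that are actually adjacent. The model has only lexemes and exact-distance phrase operators, so every subtree has a fixed width. A width() helper, which is an assumption of this sketch, converts between a subtree's two ends.

```python
# Hypothetical toy model of the proposed fix; not PostgreSQL source code.
# A query node is ("lex", word) or ("phrase", distance, left, right).

def lex(word):
    return ("lex", word)

def phrase(distance, left, right):
    return ("phrase", distance, left, right)

def width(node):
    """Positions spanned by a subtree, assuming exact phrase distances."""
    if node[0] == "lex":
        return 0
    _, distance, left, right = node
    return width(left) + distance + width(right)

def positions(node, doc, want_left=False):
    """Return match positions of `node`: the position of its leftmost
    lexeme if want_left is true, otherwise of its rightmost lexeme."""
    if node[0] == "lex":
        return set(doc.get(node[1], ()))
    _, distance, left, right = node
    # Left operand reports its right end; right operand reports its left end.
    lpos = positions(left, doc, want_left=False)
    rpos = positions(right, doc, want_left=True)
    starts = {r for r in rpos if r - distance in lpos}  # matching left ends of right operand
    if want_left:
        # Step back over the distance and over the whole left operand.
        return {r - distance - width(left) for r in starts}
    # Step forward over the whole right operand.
    return {r + width(right) for r in starts}

def matches(query, doc):
    return bool(positions(query, doc))

doc = {"x": {1}, "y": {2}, "z": {3}}
left_assoc = phrase(1, phrase(1, lex("x"), lex("y")), lex("z"))   # x <-> y <-> z
right_assoc = phrase(1, lex("x"), phrase(1, lex("y"), lex("z")))  # x <-> (y <-> z)
print(matches(left_assoc, doc), matches(right_assoc, doc))  # True True
```

At the top level, either flag value works, because only the emptiness of the result matters there.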

A related problem appears in clean_fakeval_intree()'s attempts to adjust
phrase-operator distances when it removes a stopword.  For example, 'a'
is a stopword, so we get:

regression=# select to_tsquery('(b <-> a) <-> c');
 to_tsquery  
-------------
 'b' <2> 'c'
(1 row)

That's fine, but I don't think this answer is right:

regression=# select to_tsquery('b <-> (a <-> c)');
 to_tsquery  
-------------
 'b' <-> 'c'
(1 row)

It should be 'b <2> c', same as the other one.

I haven't worked this out in detail, but I think a similar solution
would work for clean_fakeval_intree: pass down a flag indicating if
we're within the left or right argument of a <-> op, and return the
appropriate adjustment distance based on that.
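A hypothetical Python sketch of the stopword case follows. It is not the actual clean_fakeval_intree(). It is a variant of the idea above. Instead of passing a left/right flag down and returning one adjustment, each call returns how far the surviving subtree's left end and right end moved inward. The parent then applies whichever shift faces the operator it is adjusting. A subtree made only of stopwords disappears and reports its full width on both sides. The sketch assumes the stopword set is just {'a'} and models only lexemes and phrase operators.

```python
# Hypothetical sketch of stopword removal with phrase-distance adjustment;
# not PostgreSQL source code. Nodes are ("lex", word) or
# ("phrase", distance, left, right).

STOPWORDS = {"a"}  # assumption for this example

def lex(word):
    return ("lex", word)

def phrase(distance, left, right):
    return ("phrase", distance, left, right)

def clean(node):
    """Remove stopwords from `node`.

    Returns (new_node, lshift, rshift): how many positions the left and
    right ends of the surviving subtree moved inward. If the whole subtree
    was removed, new_node is None and lshift == rshift == its width.
    """
    if node[0] == "lex":
        return (None, 0, 0) if node[1] in STOPWORDS else (node, 0, 0)
    _, distance, left, right = node
    lnode, l_lshift, l_rshift = clean(left)
    rnode, r_lshift, r_rshift = clean(right)
    if lnode is None and rnode is None:
        width = l_rshift + distance + r_lshift
        return None, width, width
    if lnode is None:
        # Left operand vanished: our left end moves right past it.
        return rnode, l_rshift + distance + r_lshift, r_rshift
    if rnode is None:
        # Right operand vanished: our right end moves left past it.
        return lnode, l_lshift, l_rshift + distance + r_lshift
    # Both survive: widen the distance by the shifts facing this operator.
    return phrase(distance + l_rshift + r_lshift, lnode, rnode), l_lshift, r_rshift

def to_text(node):
    if node[0] == "lex":
        return f"'{node[1]}'"
    _, distance, left, right = node
    op = "<->" if distance == 1 else f"<{distance}>"
    return f"({to_text(left)} {op} {to_text(right)})"

b, a, c = lex("b"), lex("a"), lex("c")
print(to_text(clean(phrase(1, phrase(1, b, a), c))[0]))  # ('b' <2> 'c')
print(to_text(clean(phrase(1, b, phrase(1, a, c)))[0]))  # ('b' <2> 'c')
```

Under the same rules, '(b <-> (a <-> a)) <-> c' comes out as b <3> c, because the removed 'a <-> a' spans one extra position.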

            regards, tom lane


