Re: Mailing list search engine: surprising missing results?

From: Ivan Panchenko
Subject: Re: Mailing list search engine: surprising missing results?
Msg-id: a73f39bc-94f9-e8c6-9428-9ce94b33a4a7@postgrespro.ru
In reply to: Re: Mailing list search engine: surprising missing results?  (James Addison <jay@jp-hosting.net>)
Replies: Re: Mailing list search engine: surprising missing results?
List: pgsql-www
On 25.01.2022 23:48, James Addison wrote:
> I'm uncertain why parsing hyphenated query text produces compound tokens?

Because in some cases the user wants to search for the full hyphenated 
word, not for its parts.

But the parser is pluggable, so it is possible to develop another one, 
such as pg_tsparser [1], which handles underscores the same way as hyphens.

The *to_tsquery functions are replaceable as well. There can be many of 
them, one for each set of user requirements.
Such a function just translates a query from the user's query language, 
with its own semantics, into the tsquery language.
So you may write your own, and contribute it to the community or not. 
Another option is to write a wrapper function that modifies the result 
of an existing *to_tsquery function to fit your task.
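
A minimal sketch of such a wrapper, assuming the goal is to also accept the individual parts when the compound token is missing from the document. The function name hyphen_tolerant_tsquery and its parameter names are hypothetical; plainto_tsquery, replace and the tsquery || (OR) operator are standard PostgreSQL.

```sql
-- Hypothetical wrapper, not part of PostgreSQL.
-- It ORs the stock plainto_tsquery result with the result obtained
-- after replacing hyphens by spaces, so that either form can match.
CREATE OR REPLACE FUNCTION hyphen_tolerant_tsquery(cfg regconfig, q text)
RETURNS tsquery
LANGUAGE sql
STABLE
AS $$
    SELECT plainto_tsquery(cfg, q)
        || plainto_tsquery(cfg, replace(q, '-', ' '));
$$;

-- Usage: match a longer hyphenated word against a shorter hyphenated query.
SELECT to_tsvector('simple', 'boyers-moore-horspool')
       @@ hyphen_tolerant_tsquery('simple', 'boyers-moore') AS matches;
```

For the query 'boyers-moore' the second branch reduces to 'boyers' & 'moore', so a document indexed from 'boyers-moore-horspool' should match. For input without hyphens both branches are identical, which is redundant but harmless.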

> There are a couple of references[1][2] in the documentation about the
> dash character being converted to a boolean not (!) operator by
> websearch_to_tsquery, but that seems unrelated.
>
> postgres=# select plainto_tsquery('simple', 'a-b');
>    plainto_tsquery
> -------------------
>   'a-b' & 'a' & 'b'
> (1 row)
>
> postgres=# select plainto_tsquery('simple', 'a_b');
>   plainto_tsquery
> -----------------
>   'a' & 'b'
> (1 row)
>
> postgres=# select plainto_tsquery('simple', 'a+b');
>   plainto_tsquery
> -----------------
>   'a' & 'b'
> (1 row)
In these examples the parser drops some of the characters. Run 
ts_debug('simple', 'a+b') to see how the input is tokenized.
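
For instance, the three inputs from the quoted examples can be compared side by side. This is only a sketch: ts_debug and its alias, token and lexemes columns are standard, while the VALUES list and the names s, d and txt are just for illustration.

```sql
-- Show how the default parser tokenizes each of the three inputs.
SELECT s.txt, d.alias, d.token, d.lexemes
FROM (VALUES ('a-b'), ('a_b'), ('a+b')) AS s(txt),
     LATERAL ts_debug('simple', s.txt) AS d;
```

Rows whose lexemes column is NULL correspond to tokens that no dictionary recognized, so they do not appear in the resulting tsquery.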
>
> [1] - https://www.postgresql.org/docs/14/functions-textsearch.html
> [2] - https://www.postgresql.org/docs/14/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
> On Tue, 25 Jan 2022 at 17:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ivan Panchenko <i.panchenko@postgrespro.ru> writes:
>>> The actual explanation can be seen from comparing a tsvector with a tsquery.
>>> To avoid stemming effects, we use the simple configuration below.
>>> # select plainto_tsquery('simple','boyers-moore');
>>>              plainto_tsquery
>>> -------------------------------------
>>>    'boyers-moore' & 'boyers' & 'moore'
>>> # select to_tsvector('simple','boyers-moore-horspool');
>>>                            to_tsvector
>>> -------------------------------------------------------------
>>>    'boyers':2 'boyers-moore-horspool':1 'horspool':4 'moore':3
>>> Obviously, such a tsvector does not match the above tsquery. I think a better tsquery for this query would be
>>>    'boyers-moore' | ('boyers' & 'moore')
>>> Maybe it is worth changing to_tsquery() behavior for such cases.
>> Changing the behavior of to_tsquery is certainly a lot less scary
>> than changing to_tsvector --- it wouldn't call the validity of
>> existing tsvector indexes into question.
>>
>> I see that to_tsquery is even sillier than plainto_tsquery:
>>
>> regression=# select to_tsquery('simple','boyers-moore');
>>                 to_tsquery
>> -----------------------------------------
>>   'boyers-moore' <-> 'boyers' <-> 'moore'
>> (1 row)
>>
>> which is absolutely not a sane translation.
>>
>> It seems to me that in both cases we'd be better off generating
>> "'boyers' <-> 'moore'", without the compound token at all.
>> Maybe there's a case for the weaker 'boyers' & 'moore' translation,
>> but I think if people wanted that they'd just enter separate words.

Matching the compound token might be significant for ranking (?)
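
One way to check this would be to rank the same document against both query shapes. A sketch only: ts_rank is the standard ranking function, the sample text and the two tsquery literals are taken from this thread, and the names q, shape, tsq and score are just for illustration. No particular result is claimed here.

```sql
-- Rank one document against the query with and without the compound token.
SELECT q.shape,
       ts_rank(to_tsvector('simple', 'boyers-moore'), q.tsq) AS score
FROM (VALUES
        ('with compound token', $$'boyers-moore' & 'boyers' & 'moore'$$::tsquery),
        ('parts only',          $$'boyers' & 'moore'$$::tsquery)
     ) AS q(shape, tsq);
```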

Probably there is no universal *to_tsquery function, and no universal 
parser, that fits all users.

[1] https://github.com/postgrespro/pg_tsparser

>>
>>                          regards, tom lane
>>
>>
regards, Ivan
  



