Re: 9.6 phrase search distance specification

From: Ryan Pedela
Subject: Re: 9.6 phrase search distance specification
Date:
Msg-id: CACu89FTgqJyeKCDG1+PqNhJzhO5ywuhfLGxM3nwusL3WVstQKQ@mail.gmail.com
In response to: Re: 9.6 phrase search distance specification  (Oleg Bartunov <obartunov@gmail.com>)
Responses: Re: 9.6 phrase search distance specification  (Ryan Pedela <rpedela@datalanche.com>)
List: pgsql-hackers

On Thu, Aug 11, 2016 at 9:27 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
> On Tue, Aug 9, 2016 at 9:59 PM, Ryan Pedela <rpedela@datalanche.com> wrote:
>> I would say that it is worth it to have a "phrase slop" operator (Apache
>> Lucene terminology). Proximity search is extremely useful for improving
>> relevance and phrase slop is one of the tools to achieve that.
>
> It'd be great if you explain what is "phrase slop". I assume it's not
> about search, but about relevance.

Sure. An exact phrase query has slop = 0, which means find all terms in their exact positions relative to each other. A phrase query with slop > 0 means find all terms within <slop> positions of each other. For example, with slop = 10, find all terms within 10 positions of each other. Here is a concrete example from my current work searching SEC filings.

Bill Gates' full legal name is William H. Gates, III. In the SEC database [1], his name is GATES WILLIAM H III. Most users searching the records of people in the SEC database for Bill Gates will type "bill gates". Since there are many people with the first name Bill (William) and the last name Gates, Bill Gates most likely won't be the first result of a standard keyword query. Likewise, an exact phrase query (slop = 0) will not find him, because the first and last names are transposed. What you need is a phrase query with slop = 2, which will match "William Gates", "William H Gates", "Gates William", etc. There is still the issue of Bill vs William, but that can be solved with synonyms and is a different topic.
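
To make the slop numbers above concrete, here is a minimal Python sketch of sloppy phrase matching. It is not PostgreSQL code and not Lucene's API; the function names `min_slop` and `phrase_match` are hypothetical, tokenization is plain whitespace splitting, and the distance measure (how far the matched terms drift from their positions in the query phrase) is my simplified reading of how Lucene scores slop. Treat it as an illustration under those assumptions.

```python
from itertools import product


def min_slop(query_terms, doc_tokens):
    """Return the smallest slop at which the phrase matches, or None.

    Assumption: slop is measured as the spread of (document position minus
    query offset) over the matched terms, so an exact in-order phrase needs
    slop 0 and two transposed adjacent terms need slop 2.
    """
    candidates = []
    for term in query_terms:
        hits = [i for i, tok in enumerate(doc_tokens) if tok == term]
        if not hits:
            return None  # a query term is missing entirely
        candidates.append(hits)

    best = None
    # Brute force over every way of assigning a position to each query term.
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # each query term must occupy its own position
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        width = max(shifts) - min(shifts)
        if best is None or width < best:
            best = width
    return best


def phrase_match(query, text, slop=0):
    """True if `text` contains the phrase `query` within the given slop."""
    needed = min_slop(query.lower().split(), text.lower().split())
    return needed is not None and needed <= slop


if __name__ == "__main__":
    record = "GATES WILLIAM H III"
    for slop in (0, 1, 2):
        print(slop, phrase_match("william gates", record, slop=slop))
```

With this measure, "william gates" needs slop 0 against "William Gates", slop 1 against "William H Gates", and slop 2 against the transposed "GATES WILLIAM H III". A query for "bill gates" still finds nothing in that record, which is the synonym issue mentioned above.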


Thanks,
Ryan
