Re: old bug in full text parser

From: Tom Lane
Subject: Re: old bug in full text parser
Msg-id: 27036.1455121278@sss.pgh.pa.us
In reply to: old bug in full text parser (Oleg Bartunov <obartunov@gmail.com>)
Responses: Re: old bug in full text parser (Oleg Bartunov <obartunov@gmail.com>)
List: pgsql-hackers
Oleg Bartunov <obartunov@gmail.com> writes:
> It looks like there is a very old bug in the full text parser (somebody
> pointed it out to me), which appeared after tsearch2 was moved into the core.
> The problem is in how the full text parser processes hyphenated words. Our
> original idea was to report the hyphenated word itself as well as its parts,
> and to ignore the hyphen. That was how tsearch2 worked.

> This behaviour changed after tsearch2 was moved into the core:
> 1. The hyphen is now reported by the parser, which is useless.
> 2. Hyphenated words with numbers ('4-dot', 'dot-4') are processed
> differently from plain-word ones like 'four-dot': the hyphenated word
> itself is not reported.

> I think we should consider this a bug and produce a fix for all supported
> versions.
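To make the two definitions under debate concrete, here is a toy Python sketch. It is hypothetical and does not reproduce PostgreSQL's default text search parser or its token types; the function name `tokenize` and the `allow_digit_parts` flag are invented for illustration. It follows the original tsearch2 idea as described above (report the whole hyphenated word plus its parts, drop the hyphen), with a switch for whether a compound containing a purely numeric part, such as '4-dot', counts as a hyphenated word at all.

```python
import re

# Hypothetical toy tokenizer, NOT PostgreSQL's actual parser.
# A "part" is a run of ASCII letters or digits; a compound is parts
# joined by single hyphens. Anything else (spaces, stray hyphens) is skipped.
COMPOUND = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tokenize(text, allow_digit_parts=True):
    """Report each hyphenated word and its parts, never the hyphen itself.

    allow_digit_parts (hypothetical flag): if False, compounds that contain
    a purely numeric part (e.g. '4-dot', 'dot-4') are not treated as
    hyphenated words, so only their parts are reported.
    """
    tokens = []
    for match in COMPOUND.finditer(text):
        word = match.group(0)
        parts = word.split("-")
        if len(parts) == 1:
            # Plain word or number: report it as-is.
            tokens.append(word)
        elif not allow_digit_parts and any(p.isdigit() for p in parts):
            # Mixed number/word compound: parts only, no whole word.
            tokens.extend(parts)
        else:
            # Hyphenated word: the whole word first, then its parts.
            tokens.append(word)
            tokens.extend(parts)
    return tokens


print(tokenize("four-dot 4-dot"))
print(tokenize("four-dot 4-dot", allow_digit_parts=False))
```

With the default setting, '4-dot' and 'four-dot' are handled identically; with `allow_digit_parts=False`, mixed compounds yield only their parts, which is roughly the second behaviour complained about above minus the reported hyphen. Which of these is "right" is exactly the definitional question, not something the sketch settles.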

I don't see anything here that looks like a bug, more like a definition
disagreement.  As such, I'd be pretty dubious about back-patching a
change.  But it's hard to debate the merits when you haven't said exactly
what you'd do instead.

I believe the commit you mention was intended to fix this inconsistency:

http://www.postgresql.org/message-id/6269.1193184058@sss.pgh.pa.us

so I would be against simply reverting it.  In any case, the examples
given there make it look like there was already inconsistency about mixed
words and numbers.  Do we really think that "4-dot" should be considered
a hyphenated word?  I'm not sure.
        regards, tom lane