Re: Latin vs non-Latin words in text search parsing

From: Tom Lane
Subject: Re: Latin vs non-Latin words in text search parsing
Date:
Msg-id: 11092.1193150561@sss.pgh.pa.us
In reply to: Re: Latin vs non-Latin words in text search parsing (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: Latin vs non-Latin words in text search parsing (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: Latin vs non-Latin words in text search parsing (Michael Glaesemann <grzm@seespotcode.net>)
         Re: Latin vs non-Latin words in text search parsing (Gregory Stark <stark@enterprisedb.com>)
List: pgsql-hackers
I wrote:
> Maybe "aword", "word", and "numword"?

Does the lack of response mean people are satisfied with that?

Fleshing the proposal out to include the hyphenated-word categories:

aword           All ASCII letters
word            All letters according to iswalpha()
numword         Mixed letters and digits (all iswalnum())

ahword          Hyphenated word, all ASCII letters
hword           Hyphenated word, all letters
numhword        Hyphenated word, mixed letters and digits

apart_hword     Part of hyphenated word, all ASCII letters
part_hword      Part of hyphenated word, all letters
numpart_hword   Part of hyphenated word, mixed letters and digits

(As an example, "foo-beta1" is a numhword, with component tokens
"foo" an aword and "beta1" a numword.  This is how it works now
modulo the redefinition of the base categories.)
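To make the proposed categories concrete, here is a minimal C sketch of the classification rules only. It is not PostgreSQL's actual text search parser (which is a state machine); the names `classify_word`, `classify_token`, `part_type` and the `tok_type` enum are hypothetical. It assumes wide-character input classified under whatever locale the process has set, treats a token with no letters at all (e.g. a bare number) as outside these categories, and requires every hyphen-separated part to be a word in its own right.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical enum naming the proposed categories. */
enum tok_type {
    TOK_NONE,                                   /* outside this sketch */
    TOK_AWORD, TOK_WORD, TOK_NUMWORD,           /* order matters below */
    TOK_AHWORD, TOK_HWORD, TOK_NUMHWORD,
    TOK_APART_HWORD, TOK_PART_HWORD, TOK_NUMPART_HWORD
};

int is_ascii_letter(wchar_t c)
{
    return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z');
}

/* Classify the n characters at s as aword, word or numword. */
enum tok_type classify_word(const wchar_t *s, size_t n)
{
    int all_ascii = 1, all_alpha = 1, has_alpha = 0;
    size_t i;

    if (n == 0)
        return TOK_NONE;
    for (i = 0; i < n; i++) {
        wint_t c = (wint_t) s[i];
        if (!iswalnum(c))
            return TOK_NONE;                    /* not a word character */
        if (iswalpha(c)) {
            has_alpha = 1;
            if (!is_ascii_letter(s[i]))
                all_ascii = 0;
        } else {                                /* a digit */
            all_alpha = 0;
            all_ascii = 0;
        }
    }
    if (!has_alpha)
        return TOK_NONE;                        /* e.g. a bare number */
    if (all_ascii)
        return TOK_AWORD;
    return all_alpha ? TOK_WORD : TOK_NUMWORD;
}

/* Map a base category to the matching part_hword category. */
enum tok_type part_type(enum tok_type base)
{
    switch (base) {
    case TOK_AWORD: return TOK_APART_HWORD;
    case TOK_WORD:  return TOK_PART_HWORD;
    default:        return TOK_NUMPART_HWORD;
    }
}

/* Classify a whole token. Without a hyphen it is a plain word; with
 * hyphens, the compound takes the least restrictive category among its
 * parts. Part categories go to parts[] (up to max_parts entries) and
 * *nparts receives the part count (0 for a plain or invalid token). */
enum tok_type classify_token(const wchar_t *s, enum tok_type *parts,
                             size_t max_parts, size_t *nparts)
{
    size_t len = wcslen(s), start = 0, count = 0, i;
    enum tok_type worst = TOK_AWORD;

    if (nparts != NULL)
        *nparts = 0;
    if (wcschr(s, L'-') == NULL)
        return classify_word(s, len);

    for (i = 0; i <= len; i++) {                /* s[len] is L'\0' */
        if (s[i] == L'-' || s[i] == L'\0') {
            enum tok_type t = classify_word(s + start, i - start);
            if (t == TOK_NONE)
                return TOK_NONE;                /* empty or non-word part */
            if (t > worst)
                worst = t;                      /* AWORD < WORD < NUMWORD */
            if (parts != NULL && count < max_parts)
                parts[count] = part_type(t);
            count++;
            start = i + 1;
        }
    }
    if (nparts != NULL)
        *nparts = count;
    switch (worst) {
    case TOK_AWORD: return TOK_AHWORD;
    case TOK_WORD:  return TOK_HWORD;
    default:        return TOK_NUMHWORD;
    }
}
```

Under these assumptions, `classify_token(L"foo-beta1", parts, 4, &n)` returns `TOK_NUMHWORD` with `n == 2` and part categories `TOK_APART_HWORD` and `TOK_NUMPART_HWORD`, matching the "foo-beta1" example. Whether a non-ASCII letter counts as a `word` character depends on the locale passed to `setlocale()`, just as it does for `iswalpha()` itself.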

I'm not totally thrilled with these short names for the hyphenation
categories, but they will seem at least somewhat familiar to users
of contrib/tsearch2, and it's probably not worth changing them just
to make them look prettier.
        regards, tom lane

