Thread: Email parsing in Text Search

Email parsing in Text Search

From: Martin Dubé

Hi,

I'm seeing odd behavior from the email parser and wonder whether it is a bug or a feature.

When I use the default regconfig to parse an email address whose first part (the part before the @) consists only of digits, it is not recognized as an email token.

db=# select * from ts_debug('pg_catalog.english', '000000001@asdf.com');
 alias |   description    |   token   | dictionaries | dictionary |   lexemes   
-------+------------------+-----------+--------------+------------+-------------
 uint  | Unsigned integer | 000000001 | {simple}     | simple     | {000000001}
 blank | Space symbols    | @         | {}           |            | 
 host  | Host             | asdf.com  | {simple}     | simple     | {asdf.com}
(3 rows)


However, if I add a letter to the first part, the whole string is parsed as an email token.

db=# select * from ts_debug('pg_catalog.english', '000000001a@asdf.com');
 alias |  description  |        token        | dictionaries | dictionary |        lexemes        
-------+---------------+---------------------+--------------+------------+-----------------------
 email | Email address | 000000001a@asdf.com | {simple}     | simple     | {000000001a@asdf.com}
(1 row)

According to the RFC and several forums, an email address whose first part contains only digits is valid.

Is this expected behavior?

I ran the test on OpenBSD 5.9 with PostgreSQL 9.4.6.

Thanks,


--
Mart

Re: Email parsing in Text Search

From: Tom Lane

Martin Dubé <martin.dube@gmail.com> writes:
> When using the default regconfig and parse an email where the first part is
> numbers only, it is not parsed as an email.

This has been changed for 9.6:

        * Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)

                        regards, tom lane
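
For readers who want to confirm this on their own server, the check from the original post can be repeated after an upgrade. This is only a sketch, assuming a 9.6 or later server is available. It reuses the test string from the question and selects just the alias and token columns to keep the output narrow. Per the release note quoted above, the expectation is a single email token rather than the uint/blank/host split, but the result should be verified rather than assumed.

```sql
-- Confirm which server version is actually answering the query.
SHOW server_version;

-- Re-run the test from the original post; only two of the
-- ts_debug output columns are selected for readability.
SELECT alias, token
FROM ts_debug('pg_catalog.english', '000000001@asdf.com');
```

On a pre-9.6 server such as the 9.4.6 installation described in the question, the same query should reproduce the three-row result shown there.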



Re: Email parsing in Text Search

From: Martin Dubé

I should have seen that! Thank you very much!

On Wed, Sep 7, 2016 at 2:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Martin Dubé <martin.dube@gmail.com> writes:
> > When using the default regconfig and parse an email where the first part is
> > numbers only, it is not parsed as an email.
>
> This has been changed for 9.6:
>
>         * Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)
>
>                         regards, tom lane



--
Mart