Thread: unexpected result from to_tsvector

unexpected result from to_tsvector

From
Artur Zakirov
Date:
Hello,

Here is a little patch. It fixes this issue
http://www.postgresql.org/message-id/20160217080048.26357.49416@wrigleys.postgresql.org

Without the patch we get a wrong result for the second email, 'test@123-reg.ro':

=> SELECT * FROM ts_debug('simple', 'test@vauban-reg.ro');
 alias |  description  |       token        | dictionaries | dictionary |       lexemes
-------+---------------+--------------------+--------------+------------+----------------------
 email | Email address | test@vauban-reg.ro | {simple}     | simple     | {test@vauban-reg.ro}
(1 row)

=> SELECT * FROM ts_debug('simple', 'test@123-reg.ro');
   alias   |   description    | token  | dictionaries | dictionary | lexemes
-----------+------------------+--------+--------------+------------+----------
 asciiword | Word, all ASCII  | test   | {simple}     | simple     | {test}
 blank     | Space symbols    | @      | {}           |            |
 uint      | Unsigned integer | 123    | {simple}     | simple     | {123}
 blank     | Space symbols    | -      | {}           |            |
 host      | Host             | reg.ro | {simple}     | simple     | {reg.ro}
(5 rows)

After the patch we get the correct result for the second email:

=> SELECT * FROM ts_debug('simple', 'test@123-reg.ro');
 alias |  description  |      token      | dictionaries | dictionary |      lexemes
-------+---------------+-----------------+--------------+------------+-------------------
 email | Email address | test@123-reg.ro | {simple}     | simple     | {test@123-reg.ro}
(1 row)

This patch allows the parser to handle the emails 'test@123-reg.ro',
'123@123-reg.ro' and 'test@123_reg.ro' correctly.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments

Re: unexpected result from to_tsvector

From
Dmitrii Golub
Date:
2016-02-23 20:53 GMT+03:00 Artur Zakirov <a.zakirov@postgrespro.ru>:
> Here is a little patch. It fixes this issue
> http://www.postgresql.org/message-id/20160217080048.26357.49416@wrigleys.postgresql.org

Hello,

Should we add tests for this case?

123_reg.ro is not a valid domain name because of the "_" symbol.

https://tools.ietf.org/html/rfc1035 page 8.

Dmitrii Golub

Re: unexpected result from to_tsvector

From
Artur Zakirov
Date:
Hello,

On 07.03.2016 23:55, Dmitrii Golub wrote:
>
>
> Hello,
>
> Should we add tests for this case?

I think we should. I have added tests for teodor@123-stack.net and
123@stack.net emails.

>
> 123_reg.ro is not a valid domain name because of
> the "_" symbol.
>
> https://tools.ietf.org/html/rfc1035 page 8.
>
> Dmitrii Golub

Thank you for the information. Fixed.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments

Re: unexpected result from to_tsvector

From
Dmitrii Golub
Date:
2016-03-08 0:46 GMT+03:00 Artur Zakirov <a.zakirov@postgrespro.ru>:
> On 07.03.2016 23:55, Dmitrii Golub wrote:
>> Should we add tests for this case?
>
> I think we should. I have added tests for teodor@123-stack.net and 123@stack.net emails.
>
>> 123_reg.ro is not a valid domain name because of the "_" symbol.
>>
>> https://tools.ietf.org/html/rfc1035 page 8.
>
> Thank you for the information. Fixed.

Looks good to me

Dmitrii Golub

Re: unexpected result from to_tsvector

From
"Shulgin, Oleksandr"
Date:
On Mon, Mar 7, 2016 at 10:46 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> On 07.03.2016 23:55, Dmitrii Golub wrote:
>> Should we add tests for this case?
>
> I think we should. I have added tests for teodor@123-stack.net and 123@stack.net emails.
>
>> 123_reg.ro is not a valid domain name because of the "_" symbol.
>>
>> https://tools.ietf.org/html/rfc1035 page 8.
>
> Thank you for the information. Fixed.

Hm...  now that doesn't look all that consistent to me (after applying the patch):

=# select ts_debug('simple', 'aaa@123-yyy.zzz');
                                 ts_debug                                  
---------------------------------------------------------------------------
 (email,"Email address",aaa@123-yyy.zzz,{simple},simple,{aaa@123-yyy.zzz})
(1 row)

But:

=# select ts_debug('simple', 'aaa@123_yyy.zzz');
                        ts_debug                         
---------------------------------------------------------
 (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
 (blank,"Space symbols",@,{},,)
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(5 rows)

One can also see that if we only keep the domain name, the result is similar:

=# select ts_debug('simple', '123-yyy.zzz');
                       ts_debug                        
-------------------------------------------------------
 (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
(1 row)

=# select ts_debug('simple', '123_yyy.zzz');
                      ts_debug                       
-----------------------------------------------------
 (uint,"Unsigned integer",123,{simple},simple,{123})
 (blank,"Space symbols",_,{},,)
 (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
(3 rows)

But, this only has to do with 123 being recognized as a number, not with the underscore:

=# select ts_debug('simple', 'abc_yyy.zzz');
                       ts_debug                        
-------------------------------------------------------
 (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
(1 row)

=# select ts_debug('simple', '1abc_yyy.zzz');
                       ts_debug                        
-------------------------------------------------------
 (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
(1 row)

In fact, the 123-yyy.zzz domain is not valid either according to the RFC (subdomain can't start with a digit), but since we already allow it, should we not allow 123_yyy.zzz to be recognized as a Host?  Then why not recognize aaa@123_yyy.zzz as an email address?

Another option is to prohibit underscore in recognized host names, but this has more breakage potential IMO.

--
Alex

Re: unexpected result from to_tsvector

From
Artur Zakirov
Date:
On 14.03.2016 16:22, Shulgin, Oleksandr wrote:
>
> Hm...  now that doesn't look all that consistent to me (after applying
> the patch):
>
> =# select ts_debug('simple', 'aaa@123-yyy.zzz');
>                                   ts_debug
> ---------------------------------------------------------------------------
>   (email,"Email address",aaa@123-yyy.zzz,{simple},simple,{aaa@123-yyy.zzz})
> (1 row)
>
> But:
>
> =# select ts_debug('simple', 'aaa@123_yyy.zzz');
>                          ts_debug
> ---------------------------------------------------------
>   (asciiword,"Word, all ASCII",aaa,{simple},simple,{aaa})
>   (blank,"Space symbols",@,{},,)
>   (uint,"Unsigned integer",123,{simple},simple,{123})
>   (blank,"Space symbols",_,{},,)
>   (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (5 rows)
>
> One can also see that if we only keep the domain name, the result is
> similar:
>
> =# select ts_debug('simple', '123-yyy.zzz');
>                         ts_debug
> -------------------------------------------------------
>   (host,Host,123-yyy.zzz,{simple},simple,{123-yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '123_yyy.zzz');
>                        ts_debug
> -----------------------------------------------------
>   (uint,"Unsigned integer",123,{simple},simple,{123})
>   (blank,"Space symbols",_,{},,)
>   (host,Host,yyy.zzz,{simple},simple,{yyy.zzz})
> (3 rows)
>
> But, this only has to do with 123 being recognized as a number, not with
> the underscore:
>
> =# select ts_debug('simple', 'abc_yyy.zzz');
>                         ts_debug
> -------------------------------------------------------
>   (host,Host,abc_yyy.zzz,{simple},simple,{abc_yyy.zzz})
> (1 row)
>
> =# select ts_debug('simple', '1abc_yyy.zzz');
>                         ts_debug
> -------------------------------------------------------
>   (host,Host,1abc_yyy.zzz,{simple},simple,{1abc_yyy.zzz})
> (1 row)
>
> In fact, the 123-yyy.zzz domain is not valid either according to the RFC
> (subdomain can't start with a digit), but since we already allow it,
> should we not allow 123_yyy.zzz to be recognized as a Host?  Then why
> not recognize aaa@123_yyy.zzz as an email address?
>
> Another option is to prohibit underscore in recognized host names, but
> this has more breakage potential IMO.
>
> --
> Alex
>

It seems reasonable to me. I prefer the first option. But I am not
confident that we should allow 123_yyy.zzz to be recognized as a Host.

By the way, in this question http://webmasters.stackexchange.com/a/775
you can see examples of domain names with numbers (but not subdomains).

If there are no objections from others, I will send a new patch later
today or tomorrow that recognizes 123_yyy.zzz.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: unexpected result from to_tsvector

From
Artur Zakirov
Date:
I found the discussion about allowing an underscore in emails
http://www.postgresql.org/message-id/200908281359.n7SDxfaf044556@wwwmaster.postgresql.org

That bug report is about recognizing an underscore in the local part of
an email, not in a domain name. But that patch also allows an underscore
in recognized host names.

I am not well versed in the RFCs, so here are excerpts from Wikipedia,
https://en.wikipedia.org/wiki/Email_address:

> The local-part of the email address may use any of these ASCII characters:
>
> Uppercase and lowercase Latin letters (A–Z, a–z) (ASCII: 65–90, 97–122)
> Digits 0 to 9 (ASCII: 48–57)
> These special characters: !#$%&'*+-/=?^_`{|}~ (ASCII: 33, 35–39, 42, 43, 45, 47, 61, 63, 94–96, 123–126)
> Character . (dot, period, full stop), ASCII 46, provided that it is not the first or last character, and provided also that it does not appear consecutively (e.g. John..Doe@example.com is not allowed).
> Other special characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash). These characters are:
> Space and "(),:;<>@[\] (ASCII: 32, 34, 40, 41, 44, 58, 59, 60, 62, 64, 91–93)
> Comments are allowed with parentheses at either end of the local part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

and https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names

> The Internet standards (Requests for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters 'a' through 'z' (in a case-insensitive manner), the digits '0' through '9', and the hyphen ('-'). The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or white space are permitted.
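
The difference between the two hostname rules quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted rules, not the logic of the PostgreSQL text search parser. The names `is_valid_label`, `is_valid_hostname` and the `allow_leading_digit` flag are made up for this sketch, and the 63-character label limit is an assumption taken from the DNS specifications rather than from the thread.

```python
import re

# Letters, digits and hyphen only; no hyphen at either end.
# The 63-character limit per label is assumed from the DNS specs.
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_label(label: str, allow_leading_digit: bool = True) -> bool:
    """Check a single hostname label.

    allow_leading_digit=True models the relaxed RFC 1123 rule,
    allow_leading_digit=False models the original RFC 952 rule.
    """
    if not _LABEL_RE.fullmatch(label):
        return False
    if not allow_leading_digit and label[0].isdigit():
        return False
    return True


def is_valid_hostname(name: str, allow_leading_digit: bool = True) -> bool:
    """Check every dot-separated label of a hostname."""
    return all(
        is_valid_label(label, allow_leading_digit)
        for label in name.split(".")
    )


if __name__ == "__main__":
    # Columns: host, valid under RFC 1123, valid under RFC 952.
    for host in ("vauban-reg.ro", "123-reg.ro", "123_reg.ro", "abc_yyy.zzz"):
        print(host, is_valid_hostname(host), is_valid_hostname(host, False))
```

Under these rules a leading digit is acceptable only with the relaxed rule, while an underscore is rejected by both, regardless of whether the label starts with a letter or a digit.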

Hence the following emails are valid (I might be wrong):

123-s@sample.com
123_s@sample.com
123@123-sample.com
123@123sample.com
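
As a rough cross-check of this list, here is a minimal Python sketch that combines the unquoted local-part rules from the first Wikipedia excerpt with the letters-digits-hyphen rule for the domain. The name `looks_like_email` is hypothetical, quoted strings and comments in the local part are deliberately not handled, and none of this reflects how the PostgreSQL parser classifies tokens.

```python
import re

# Characters allowed in an unquoted local part, apart from the dot.
_ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"

# One or more runs of allowed characters separated by single dots:
# no leading dot, no trailing dot, no two dots in a row.
_LOCAL_RE = re.compile(rf"[{_ATEXT}]+(?:\.[{_ATEXT}]+)*")

# Domain labels: letters, digits and hyphens, no hyphen at either end.
# A leading digit is allowed, following the relaxed RFC 1123 rule.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_DOMAIN_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def looks_like_email(address: str) -> bool:
    """Simplified check: unquoted local part, '@', hostname."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False  # no '@' at all
    return bool(_LOCAL_RE.fullmatch(local)) and bool(_DOMAIN_RE.fullmatch(domain))


if __name__ == "__main__":
    for addr in (
        "123-s@sample.com",
        "123_s@sample.com",
        "123@123-sample.com",
        "123@123sample.com",
        "test@123_reg.ro",
    ):
        print(addr, looks_like_email(addr))
```

With these simplified rules the four addresses listed above pass, while an address with an underscore in the domain part, such as test@123_reg.ro, does not.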

The attached patch allows them to be recognized as emails. But this
patch does not prohibit underscores in recognized host names.

As a result this patch gives the following results with underscores:

=# select * from ts_debug('simple', 'aaa@123_yyy.zzz');
 alias |  description  |      token      | dictionaries | dictionary |      lexemes
-------+---------------+-----------------+--------------+------------+-------------------
 email | Email address | aaa@123_yyy.zzz | {simple}     | simple     | {aaa@123_yyy.zzz}
(1 row)

=# select * from ts_debug('simple', '123_yyy.zzz');
 alias | description |    token    | dictionaries | dictionary |    lexemes
-------+-------------+-------------+--------------+------------+---------------
 host  | Host        | 123_yyy.zzz | {simple}     | simple     | {123_yyy.zzz}
(1 row)


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments

Re: unexpected result from to_tsvector

From
Dmitrii Golub
Date:
2016-03-14 16:22 GMT+03:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
> In fact, the 123-yyy.zzz domain is not valid either according to the RFC (subdomain can't start with a digit), but since we already allow it, should we not allow 123_yyy.zzz to be recognized as a Host?  Then why not recognize aaa@123_yyy.zzz as an email address?
>
> Another option is to prohibit underscore in recognized host names, but this has more breakage potential IMO.

Alex, actually subdomain can start with digit, try it.

Re: unexpected result from to_tsvector

From
"Shulgin, Oleksandr"
Date:
On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com> wrote:
>
> 2016-03-14 16:22 GMT+03:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>>
>> In fact, the 123-yyy.zzz domain is not valid either according to the RFC (subdomain can't start with a digit), but since we already allow it, should we not allow 123_yyy.zzz to be recognized as a Host?  Then why not recognize aaa@123_yyy.zzz as an email address?
>>
>> Another option is to prohibit underscore in recognized host names, but this has more breakage potential IMO.
>>
>
> Alex, actually subdomain can start with digit,

Not according to the RFC you have linked to.

> try it.

What do you mean? Try it with ts_debug()? I already did, you could see me referring to this example above: 123-yyy.zzz

--
Alex

Re: unexpected result from to_tsvector

From
Tom Lane
Date:
"Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
> On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com> wrote:
>> Alex, actually subdomain can start with digit,

> Not according to the RFC you have linked to.

The powers-that-be relaxed that some time ago; I assume there's a newer
RFC.  For instance, "163.com" is a real domain:

$ dig 163.com
...
;; QUESTION SECTION:
;163.com.                       IN      A

;; ANSWER SECTION:
163.com.                600     IN      A       123.58.180.8
163.com.                600     IN      A       123.58.180.7

;; AUTHORITY SECTION:
163.com.                4516    IN      NS      ns3.nease.net.
163.com.                4516    IN      NS      ns2.nease.net.
...

$ whois 163.com
...
Registry Registrant ID: 
Registrant Name: Domain Admin
Registrant Organization: Guangzhou NetEase Computer System Co., Ltd
Registrant Street: No. 16, Keyun Road, Tianhe District, 
Registrant City: Guangzhou
Registrant State/Province: Guangdong
Registrant Postal Code: 510665
Registrant Country: CN
Registrant Phone: +86.2085106370
Registrant Phone Ext: 
Registrant Fax: +86.2085106370
Registrant Fax Ext: 
Registrant Email: nsadmin@corp.netease.com
...
        regards, tom lane



Re: unexpected result from to_tsvector

From
David Steele
Date:
Hi Artur,

On 3/20/16 10:42 AM, Tom Lane wrote:
> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
>> On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com> wrote:
>>> Alex, actually subdomain can start with digit,
>
>> Not according to the RFC you have linked to.
>
> The powers-that-be relaxed that some time ago; I assume there's a newer
> RFC.  For instance, "163.com" is a real domain:

You marked this patch "needs review" and then a few minutes later 
changed it to "waiting on author".

If this was a mistake please change it back to "needs review".  If you 
really are working on a new patch when can we expect that?

Thanks,
-- 
-David
david@pgmasters.net



Re: unexpected result from to_tsvector

From
Artur Zakirov
Date:
On 25.03.2016 18:19, David Steele wrote:
> Hi Artur,
>
> On 3/20/16 10:42 AM, Tom Lane wrote:
>> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
>>> On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com> wrote:
>>>> Alex, actually subdomain can start with digit,
>>
>>> Not according to the RFC you have linked to.
>>
>> The powers-that-be relaxed that some time ago; I assume there's a newer
>> RFC.  For instance, "163.com" is a real domain:
>
> You marked this patch "needs review" and then a few minutes later
> changed it to "waiting on author".
>
> If this was a mistake please change it back to "needs review".  If you
> really are working on a new patch when can we expect that?
>
> Thanks,

Hi,

The previous patch is the current one, and it can be committed.

I marked this patch as "needs review" because I noticed that it
was marked as "waiting on author", and I thought that I had forgotten to
mark it as "needs review".

But then I noticed that Robert Haas had marked the patch as "waiting on
author" after my answer, so I set it back to "waiting on author". But I
can't find any questions or comments addressed to me after my last answer.

Actually I think that this patch should be marked as "needs review".

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: unexpected result from to_tsvector

From
David Steele
Date:
On 3/25/16 12:14 PM, Artur Zakirov wrote:
> On 25.03.2016 18:19, David Steele wrote:
>> Hi Artur,
>>
>> On 3/20/16 10:42 AM, Tom Lane wrote:
>>> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
>>>> On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com> wrote:
>>>>> Alex, actually subdomain can start with digit,
>>>
>>>> Not according to the RFC you have linked to.
>>>
>>> The powers-that-be relaxed that some time ago; I assume there's a newer
>>> RFC.  For instance, "163.com" is a real domain:
>>
>> You marked this patch "needs review" and then a few minutes later
>> changed it to "waiting on author".
>>
>> If this was a mistake please change it back to "needs review".  If you
>> really are working on a new patch when can we expect that?
>>
>> Thanks,
>
> Hi,
>
> The previous patch is the current one, and it can be committed.
>
> I marked this patch as "needs review" because I noticed that it
> was marked as "waiting on author", and I thought that I had forgotten to
> mark it as "needs review".
>
> But then I noticed that Robert Haas had marked the patch as "waiting on
> author" after my answer, so I set it back to "waiting on author". But I
> can't find any questions or comments addressed to me after my last answer.
>
> Actually I think that this patch should be marked as "needs review".

Done.

-- 
-David
david@pgmasters.net



Re: unexpected result from to_tsvector

From
Artur Zakirov
Date:
On 25.03.2016 19:15, David Steele wrote:
> On 3/25/16 12:14 PM, Artur Zakirov wrote:
>> On 25.03.2016 18:19, David Steele wrote:
>>> Hi Artur,
>>>
>>> On 3/20/16 10:42 AM, Tom Lane wrote:
>>>> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
>>>>> On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com>
>>>>> wrote:
>>>>>> Alex, actually subdomain can start with digit,
>>>>
>>>>> Not according to the RFC you have linked to.
>>>>
>>>> The powers-that-be relaxed that some time ago; I assume there's a newer
>>>> RFC.  For instance, "163.com" is a real domain:
>>>
>>> You marked this patch "needs review" and then a few minutes later
>>> changed it to "waiting on author".
>>>
>>> If this was a mistake please change it back to "needs review".  If you
>>> really are working on a new patch when can we expect that?
>>>
>>> Thanks,
>>
>> Hi,
>>
>> The previous patch is the current one, and it can be committed.
>>
>> I marked this patch as "needs review" because I noticed that it
>> was marked as "waiting on author", and I thought that I had forgotten to
>> mark it as "needs review".
>>
>> But then I noticed that Robert Haas had marked the patch as "waiting on
>> author" after my answer, so I set it back to "waiting on author". But I
>> can't find any questions or comments addressed to me after my last answer.
>>
>> Actually I think that this patch should be marked as "needs review".
>
> Done.
>

Thank you!

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: unexpected result from to_tsvector

From
"Shulgin, Oleksandr"
Date:
On Sun, Mar 20, 2016 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
>> On Mar 20, 2016 01:09, "Dmitrii Golub" <dmitrii.golub@gmail.com> wrote:
>>> Alex, actually subdomain can start with digit,
>>
>> Not according to the RFC you have linked to.
>
> The powers-that-be relaxed that some time ago; I assume there's a newer
> RFC.  For instance, "163.com" is a real domain:
>
> $ dig 163.com
> ...
> ;; QUESTION SECTION:
> ;163.com.                       IN      A

Hm, indeed.   Unfortunately, it is not quite easy to find "the" new RFC, there was quite a number of correcting and extending RFCs issued over the last (almost) 30 years, which is not that surprising...

Are we going to do something about it?  Is it likely that relaxing/changing the rules on our side will break any possible workarounds that people might have employed to make the search work like they want it to work?

--
Alex

Re: unexpected result from to_tsvector

From
Artur Zakirov
Date:
On 29.03.2016 19:17, Shulgin, Oleksandr wrote:
>
> Hm, indeed.   Unfortunately, it is not quite easy to find "the" new RFC,
> there was quite a number of correcting and extending RFCs issued over
> the last (almost) 30 years, which is not that surprising...
>
> Are we going to do something about it?  Is it likely that
> relaxing/changing the rules on our side will break any possible
> workarounds that people might have employed to make the search work like
> they want it to work?

Do you mean here workarounds to recognize such values as 
'test@123-reg.ro' as an email address? Actually I do not see any 
workarounds except a patch to PostgreSQL.

By the way, Teodor committed the patch yesterday.

>
> --
> Alex
>


-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: unexpected result from to_tsvector

From
"Shulgin, Oleksandr"
Date:
On Wed, Mar 30, 2016 at 10:17 AM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> On 29.03.2016 19:17, Shulgin, Oleksandr wrote:
>> Hm, indeed.   Unfortunately, it is not quite easy to find "the" new RFC,
>> there was quite a number of correcting and extending RFCs issued over
>> the last (almost) 30 years, which is not that surprising...
>>
>> Are we going to do something about it?  Is it likely that
>> relaxing/changing the rules on our side will break any possible
>> workarounds that people might have employed to make the search work like
>> they want it to work?
>
> Do you mean here workarounds to recognize such values as 'test@123-reg.ro' as an email address? Actually I do not see any workarounds except a patch to PostgreSQL.

No, more like disallowing '_' in the host/domain names.  Anyway, that is pure speculation on my part.

> By the way, Teodor committed the patch yesterday.

I've seen that after posting my reply to the list ;-)

--
Alex