Text search parser's treatment of URLs and emails

From: Thom Brown
Subject: Text search parser's treatment of URLs and emails
Msg-id: AANLkTikf=K=pen6M4bWKkt1QOzh8mbrEXKOYJ=H0qCMh@mail.gmail.com
List: pgsql-general

Hi,

I noticed that if I run this:

SELECT alias, description, token FROM
ts_debug('http://www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary');

I get:

  alias   |  description  |                                    token
----------+---------------+------------------------------------------------------------------------------
 protocol | Protocol head | http://
 url      | URL           | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary
 host     | Host          | www.postgresql.org:2345
 url_path | URL path      | /directory/page.html?version=9.1&build=alpha1#summary
(4 rows)


It could be me being picky, but I don't regard parameters or page
fragments as part of the URL path.  Ideally, I'd sort of expect:

    alias     |  description  |                                    token
--------------+---------------+------------------------------------------------------------------------------
 protocol     | Protocol head | http://
 url          | URL           | www.postgresql.org:2345/directory/page.html?version=9.1&build=alpha1#summary
 host         | Host          | www.postgresql.org
 port         | Port          | 2345
 url_path     | URL path      | /directory/page.html
 query_string | Query string  | version=9.1&build=alpha1
 fragment     | Page fragment | summary
(7 rows)

... of course that's if there was support for query strings and page
fragments, which there isn't.  But if changes were made to support my
definition of a URL path, they'd have to be considered breaking
changes.
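
For comparison, the breakdown described above lines up with the generic
URI components (scheme, authority, path, query, fragment). A minimal
sketch using Python's standard urllib.parse.urlsplit, purely as an
illustration of that split and not of how the text search parser works;
split_url and its dictionary keys are hypothetical names chosen to
mirror the aliases in the table:

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Break a URL into the components listed in the expected table.

    The keys are illustrative only; they are not token types that
    PostgreSQL's default text search parser emits today.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme + "://",
        "host": parts.hostname,        # host name without the port
        "port": parts.port,            # int, or None if no port was given
        "url_path": parts.path,        # path only, no query or fragment
        "query_string": parts.query,   # text after '?', before '#'
        "fragment": parts.fragment,    # text after '#'
    }


if __name__ == "__main__":
    url = ("http://www.postgresql.org:2345/directory/page.html"
           "?version=9.1&build=alpha1#summary")
    for alias, token in split_url(url).items():
        print(f"{alias:<12} | {token}")
```

Here the path stops at the "?" and the fragment is reported separately,
which is the distinction the name "url_path" seems to suggest.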

But my main gripe is with the name "url_path".

Also:

SELECT alias, description, token FROM ts_debug('myname+priority@gmail.com');

Yields:

   alias   |   description   |       token
-----------+-----------------+--------------------
 asciiword | Word, all ASCII | myname
 blank     | Space symbols   | +
 email     | Email address   | priority@gmail.com
(3 rows)

The entire string I entered is a valid email address, and this style
isn't totally uncommon.  Shouldn't such email address styles be taken
into account?  The example above misidentifies the email address,
since the real destination address would most likely be
myname@gmail.com.
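
To make the point about the destination address concrete, here is a
minimal sketch of plus-address handling in Python. It assumes the
Gmail-style convention that everything from the first "+" in the local
part onward is a tag; other mail providers may treat "+" differently.
strip_plus_tag is a hypothetical helper, not something the text search
parser provides:

```python
def strip_plus_tag(address: str) -> str:
    """Return the address with any '+tag' removed from the local part.

    Assumption: the provider treats 'name+tag@domain' as an alias of
    'name@domain' (as Gmail does). Input that has no '@', or whose
    local part starts with '+', is returned unchanged.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address              # no '@' at all: leave untouched
    base, _, _tag = local.partition("+")
    if not base:
        return address              # nothing before '+': leave untouched
    return f"{base}@{domain}"


if __name__ == "__main__":
    print(strip_plus_tag("myname+priority@gmail.com"))
```

Under that assumption the full string maps to myname@gmail.com, whereas
the token the parser reports is priority@gmail.com.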

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
