Re: BUG #6375: tsearch does not recognize all valid emails

Поиск
Список
Период
Сортировка
От Valentine Gogichashvili
Тема Re: BUG #6375: tsearch does not recognize all valid emails
Дата
Msg-id CAP93muXYJLXSVpQ68UbvtrikZYOkXcG_tRSSB4L3skZZMHbKBw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #6375: tsearch does not recognize all valid emails  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-bugs
>
> This email thread from 2010 has a similar problem:
>        http://archives.postgresql.org/pgsql-hackers/2010-10/msg00772.php
> What is limiting a fix for this is the breaking of existing behavior,
> and the breaking of indexes used during pg_upgrade.
> I have added your email to the existing TODO item:
>        http://wiki.postgresql.org/wiki/Todo#Text_Search
>        Improve handling of dash and plus signs in email address user
> names, and
>        perhaps improve URL parsing
>
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg00772.php
>            tsearch does not recognize all valid emails



Thank you Bruce,

as an intermediate solution, so that people, who have the same problem and
search the mailing archives, can get at least some way to overcome this
issue, I am rewriting such emails when building tverctor, and use rewrite
the tsearch queries as well:

code from a function, that builds the tsvector:

  select (select string_agg(
                 case when CASE WHEN n in ( 1, s ) -- all special outer
chars should be escaped
                                THEN c = ANY
('{.,!,#,$,%,&,'',*,+,/,=,?,_,`,"{",|,"}",~,^,-}'::text[])
                                ELSE c = ANY
('{!,#,$,%,&,'',*,+,/,=,?,`,"{",|,"}",~,^}'::text[])
                           END
                      then 'BCHR' || ascii(c)::text || 'END'
                      else c
                      end, '')
            from ( select row_number() over() as n, count(1) over() as s, c
from regexp_split_to_table(split_part(p, '@', 1 ), '') as e(c) ) as ee
          ) || '@' || split_part(p, '@', 2 )
         into email
    from ( select p_customer_user_row.email::text as p ) as e;

  return to_tsvector('simple', coalesce( p_customer_user_row.first_name,
'') ) ||
         to_tsvector('simple', coalesce( p_customer_user_row.last_name, '')
) ||
         to_tsvector('simple', coalesce( p_customer_user_row.customer_id,
'') ) ||
         to_tsvector('simple', coalesce( email, '') );


code from a function, that builds a tsquery:

  RETURN (select to_tsquery('simple',
                 string_agg(
                   case when p ~ '^[^@]+@[^@]+$' -- has only one @ inside
                        then (select string_agg(
                                     case when CASE WHEN n in ( 1, s ) --
all special outer chars should be escaped
                                                    THEN c = ANY
('{.,!,#,$,%,&,'',*,+,/,=,?,_,`,"{",|,"}",~,^,-}'::text[])
                                                    ELSE c = ANY
('{!,#,$,%,&,'',*,+,/,=,?,`,"{",|,"}",~,^}'::text[])
                                               END
                                          then 'BCHR' || ascii(c)::text ||
'END'
                                          else c
                                          end, '')
                                from ( select row_number() over() as n,
count(1) over() as s, c from regexp_split_to_table(split_part(p, '@', 1 ),
'') as e(c) ) as ee
                             ) || '@' || split_part(p, '@', 2 )
                        else (select string_agg(token, ' & ')
                                from ( select unnest(lexemes) || ':*' as
token
                                         from ts_debug('simple', p)
                                      ) as g
                              )
                   end, ' & ' )
                 )
            from regexp_split_to_table(btrim(search_text), E'\\s+') as s(p)
         );

Cheers,

-- Valentine

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Valentine Gogichashvili
Дата:
Сообщение: Re: BUG #6661: out-of-order XID insertion in KnownAssignedXids
Следующее
От: Andres Freund
Дата:
Сообщение: Re: BUG #6661: out-of-order XID insertion in KnownAssignedXids