Discussion: Teaching regex operators about collations

Teaching regex operators about collations

From
Tom Lane
Date:
Since ILIKE now responds to collations, it would be nice if the
case-insensitive regex operators did too.  The hard part of that is
getting the information from src/backend/utils/adt/regexp.c to
src/backend/regex/regc_locale.c.  In principle we could probably add
a field to the data structures carried around in the regex library,
but that is looking a bit invasive, and since we share that code with
the Tcl project I'm loath to change it too much.  So what I'm thinking
about is just having a couple of static variables in regc_locale.c that
we initialize before each use of the regex library.  This is a bit
grotty, but there's no need for the regex library to be re-entrant,
so it wouldn't cause any problems until that improbable day when
somebody succeeds in multi-threading the backend.
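To make the proposal concrete, here is a minimal sketch of the static-variable approach in C. The names `set_regex_collation`, `regex_tolower`, and `chars_match_icase`, the `Oid` typedef, and the two collation constants are all hypothetical stand-ins, not PostgreSQL's actual code. For non-C collations the sketch simply defers to the process-wide C library locale, whereas real collation-aware folding would consult the collation's own locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Stand-in for PostgreSQL's Oid type (assumption for this sketch). */
typedef unsigned int Oid;

/* Illustrative collation identifiers, hypothetical for this sketch. */
#define SKETCH_DEFAULT_COLLATION ((Oid) 100)
#define SKETCH_C_COLLATION       ((Oid) 950)

/*
 * "regc_locale.c" side: file-level statics consulted by the case-folding
 * helper.  Not re-entrant: only one regex operation may be in flight.
 */
static Oid  regex_collation;
static bool regex_collation_set = false;

/* Called from the "regexp.c" side before each use of the regex library. */
void
set_regex_collation(Oid collation)
{
    regex_collation = collation;
    regex_collation_set = true;
}

/* Case folding used by case-insensitive matching. */
int
regex_tolower(unsigned char c)
{
    assert(regex_collation_set);    /* caller skipped initialization */

    if (regex_collation == SKETCH_C_COLLATION)
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;

    /*
     * Other collations: this sketch just defers to the process-wide C
     * library locale; real code would use the collation's own locale.
     */
    return tolower(c);
}

/* Hypothetical "regexp.c" caller comparing two characters ignoring case. */
bool
chars_match_icase(Oid collation, unsigned char a, unsigned char b)
{
    set_regex_collation(collation);   /* must precede any library use */
    return regex_tolower(a) == regex_tolower(b);
}
```

The point of the design is that only the caller's setup step changes; the regex library's own data structures are left alone.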

Comments?
        regards, tom lane


Re: Teaching regex operators about collations

From
"David E. Wheeler"
Date:
On Apr 9, 2011, at 2:40 PM, Tom Lane wrote:

> Since ILIKE now responds to collations, it would be nice if the
> case-insensitive regex operators did too.  The hard part of that is
> getting the information from src/backend/utils/adt/regexp.c to
> src/backend/regex/regc_locale.c.  In principle we could probably add
> a field to the data structures carried around in the regex library,
> but that is looking a bit invasive, and since we share that code with
> the Tcl project I'm loath to change it too much.  So what I'm thinking
> about is just having a couple of static variables in regc_locale.c that
> we initialize before each use of the regex library.  This is a bit
> grotty, but there's no need for the regex library to be re-entrant,
> so it wouldn't cause any problems until that improbable day when
> somebody succeeds in multi-threading the backend.
>
> Comments?

Sounds reasonable. Is this something that CITEXT could take advantage of somehow? Right now, it's using a nasty hack to
make ILIKE and friends work properly…

Best,

David



Re: Teaching regex operators about collations

From
Tom Lane
Date:
I wrote:
>> Since ILIKE now responds to collations, it would be nice if the
>> case-insensitive regex operators did too.  The hard part of that is
>> getting the information from src/backend/utils/adt/regexp.c to
>> src/backend/regex/regc_locale.c.  In principle we could probably add
>> a field to the data structures carried around in the regex library,
>> but that is looking a bit invasive, and since we share that code with
>> the Tcl project I'm loath to change it too much.  So what I'm thinking
>> about is just having a couple of static variables in regc_locale.c that
>> we initialize before each use of the regex library.  This is a bit
>> grotty, but there's no need for the regex library to be re-entrant,
>> so it wouldn't cause any problems until that improbable day when
>> somebody succeeds in multi-threading the backend.

In the event, it seemed least messy to store a collation Oid in struct
regex_t, but not to pass it down via the regex library's "struct vars"
private data structure.  So the interface to the regex library is clean
and if anyone ever wants to get rid of the static variables, it'll just
be necessary to fix the innards.
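A rough sketch of the final arrangement, under stated assumptions: the collation Oid is recorded in `regex_t`, the library-private `struct vars` gains no new field, and each entry point copies the stored collation into a static that the innards read. The trimmed `regex_t` layout, `sketch_regcomp`, `sketch_regexec`, `fold_case`, and the collation constant are hypothetical, and the toy "compiler" handles only single-character patterns so that the data flow stays visible.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Oid type (assumption for this sketch). */
typedef unsigned int Oid;

/* Illustrative collation identifier, hypothetical for this sketch. */
#define SKETCH_C_COLLATION ((Oid) 950)

/* Trimmed, hypothetical regex_t: the collation rides in the public struct. */
typedef struct
{
    Oid           re_collation;   /* recorded at compile time */
    unsigned char re_char;        /* toy "compiled program": one folded char */
} regex_t;

/* Library-private compile state: deliberately gains no collation field. */
struct vars
{
    const char *pattern;
};

/* Innards-only static, refreshed from regex_t at each entry point. */
static Oid regex_collation;

static int
fold_case(unsigned char c)
{
    if (regex_collation == SKETCH_C_COLLATION)
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    return tolower(c);   /* sketch: process locale stands in for others */
}

/* Toy compile: accepts exactly one character.  Returns 0 on success. */
int
sketch_regcomp(regex_t *re, const char *pattern, Oid collation)
{
    struct vars v = {pattern};

    if (v.pattern == NULL || v.pattern[0] == '\0' || v.pattern[1] != '\0')
        return -1;
    re->re_collation = collation;
    regex_collation = re->re_collation;   /* hand it to the innards */
    re->re_char = (unsigned char) fold_case((unsigned char) v.pattern[0]);
    return 0;
}

/* Toy exec: true if subject contains the pattern char, ignoring case. */
bool
sketch_regexec(const regex_t *re, const char *subject)
{
    assert(re != NULL && subject != NULL);
    regex_collation = re->re_collation;   /* re-establish before use */
    for (; *subject != '\0'; subject++)
    {
        if (fold_case((unsigned char) *subject) == re->re_char)
            return true;
    }
    return false;
}
```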

"David E. Wheeler" <david@kineticode.com> writes:
> Sounds reasonable. Is this something that CITEXT could take advantage of somehow? Right now, it's using a nasty hack
> to make ILIKE and friends work properly…


You mean the alias operators?  Doesn't seem that bad, and anyway you'd
still need aliases to inject a non-default collation, I think.

But more to the point, we're still a long way from being able to allow a
collation that has special equality semantics, so I don't foresee being
able to replace citext with a "case insensitive collation" anytime soon.
        regards, tom lane