Обсуждение: 7.3 no longer using indexes for LIKE queries

Поиск
Список
Период
Сортировка

7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
I have a database that has a lot of records (~6mil, iirc), with a varchar
column that often wants to be queried using something like "where acc like
'foo%'".  There is a B-Tree index on the acc column.  In 7.2.3, Postgres
would use that index to do the queries and things were lightning fast.  In
7.3, it is refusing to use the index, even if I set enable_seqscan = off,
meaning that the query that used to take a few msec now takes a few aeons.
I've run vacuum analyze on the whole database, and it doesn't change
anything.

I'm trying to cluster the table on the index (since that's the only way that
particular table is ever queried), so I can't give an explain analyze, but
here's one for another table using the same idea:

     Index "public.xfoo"
   Column    |          Type
-------------+------------------------
 stringthing | character varying(255)
btree, for table "public.foo"

xxx=> explain analyze select * from foo where stringthing like 'ABCDEF%';
 Seq Scan on foo  (cost=0.00..148503.29 rows=1 width=111) (actual
time=30512.99..32082.95 rows=4 loops=1)
   Filter: (stringthing ~~ 'ABCDEF%'::text)
 Total runtime: 32083.07 msec

For reference, there are 4,688,317 rows in this table.  Changing the select
* to select stringthing doesn't affect the query plan either.

I can coerce it to do an index scan by making the condition "stringthing >=
'ABCDEF' and stringthing < 'ABCDEG'", in which case it executes nice and
fast:

xxx=> explain analyze select * from foo where stringthing >= 'ABCDEF' and
stringthing < 'ABCDEG';
 Index Scan using xfoo on foo  (cost=0.00..6.02 rows=1 width=111) (actual
time=0.08..0.08 rows=0 loops=1)
   Index Cond: ((stringthing >= 'ABCDEF'::character varying) AND
(stringthing < 'ABCDEG'::character varying))
 Total runtime: 0.17 msec

This is an ugly workaround, though :(

Something I noticed in trying to force the use of an index scan ... setting
enable_seqscan = off here doesn't change whether it uses a seq scan, but it
makes it change the cost estimate to '100000000.00..100148503.29'; bit
weird, that, if you ask me.

    -Matt

Re: 7.3 no longer using indexes for LIKE queries

От
Stephan Szabo
Дата:
On Tue, 3 Dec 2002, Matthew Gabeler-Lee wrote:

> I have a database that has a lot of records (~6mil, iirc), with a varchar
> column that often wants to be queried using something like "where acc like
> 'foo%'".  There is a B-Tree index on the acc column.  In 7.2.3, Postgres
> would use that index to do the queries and things were lightning fast.  In
> 7.3, it is refusing to use the index, even if I set enable_seqscan = off,
> meaning that the query that used to take a few msec now takes a few aeons.
> I've run vacuum analyze on the whole database, and it doesn't change
> anything.

Did you perhaps not initdb in "C" locale?  I think pg_controldata will
tell you what it was created with.  The optimization for converting
anchored likes into an indexable form doesn't work in all locales and is
currently only done in C locale I believe.

> Something I noticed in trying to force the use of an index scan ... setting
> enable_seqscan = off here doesn't change whether it uses a seq scan, but it
> makes it change the cost estimate to '100000000.00..100148503.29'; bit
> weird, that, if you ask me.

That's probably because it doesn't believe that there's an index scan
possible here.



Re: 7.3 no longer using indexes for LIKE queries

От
Joe Conway
Дата:
Matthew Gabeler-Lee wrote:
> I have a database that has a lot of records (~6mil, iirc), with a varchar
> column that often wants to be queried using something like "where acc like
> 'foo%'".  There is a B-Tree index on the acc column.  In 7.2.3, Postgres
> would use that index to do the queries and things were lightning fast.  In
> 7.3, it is refusing to use the index, even if I set enable_seqscan = off,
> meaning that the query that used to take a few msec now takes a few aeons.
> I've run vacuum analyze on the whole database, and it doesn't change
> anything.

Whats the output of pg_controldata, specifically, what is LC_COLLATE? If it
isn't "C", then LIKE won't use your index.

See:
http://developer.postgresql.org/docs/postgres/charset.html

Joe


Re: 7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
Thanks, it did it in en_US ... This seems like it should be a fixable thing
to me

* decides to do his day long database dump/restore tomorrow ...

    -Matt


-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone23.bigpanda.com]
Sent: Tuesday, December 03, 2002 19:29
To: Matthew Gabeler-Lee
Cc: 'pgsql-general@postgresql.org'
Subject: Re: [GENERAL] 7.3 no longer using indexes for LIKE queries

Did you perhaps not initdb in "C" locale?  I think pg_controldata will
tell you what it was created with.  The optimization for converting
anchored likes into an indexable form doesn't work in all locales and is
currently only done in C locale I believe.


Re: 7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
Can someone please elaborate on why it stops doing this optimization?  The
only reasons for it that I can think of for it to be unsafe in a locale is
that two characters that are not the same character still compare as being
equal (does this ever really happen?).  Otherwise, saying "foo LIKE 'xxx%'"
is really saying are the first 3 characters of foo equal to 'xxx'.
Regardless of sort order, in a sorted list of values, anything starting with
'xxx' should appear together, and thus be optimizable by an index unless
there are values of x and y for which 'x' = 'y'.  If that were the case,
though, you couldn't use the index for "foo = 'xxx'" either, which is
obviously not the case.

So, can someone enlighten me as to where I'm making a wrong assumption or
something here?


    -Matt


-----Original Message-----
From: Joe Conway [mailto:mail@joeconway.com]
Sent: Tuesday, December 03, 2002 19:31

Whats the output of pg_controldata, specifically, what is LC_COLLATE? If it
isn't "C", then LIKE won't use your index.

See:
http://developer.postgresql.org/docs/postgres/charset.html

Re: 7.3 no longer using indexes for LIKE queries

От
Tom Lane
Дата:
Matthew Gabeler-Lee <mgabelerlee@zycos.com> writes:
> Can someone please elaborate on why it stops doing this optimization?  The
> only reasons for it that I can think of for it to be unsafe in a locale is
> that two characters that are not the same character still compare as being
> equal (does this ever really happen?).

Unfortunately, locale-dependent sorting rules are much stranger than you
seem to think.  You can dig in the pghackers archives for some of the
reasons that we had to disable that optimization (we tried repeatedly to
find ways around this, btw).  Special digraph rules and multipass
comparisons are a couple of examples of things that completely break
the LIKE optimization.

            regards, tom lane

Re: 7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
Okay, now I understand the problem better.

How about, as an intermediate solution, a list of 'sane' locales in which
the optimization applied to the C/POSIX locale works?  It should be fairly
straight forward to write a test program that will test a locale and see if
any of the common problems apply.  And I expect that several locales can be
added straight away as they are known to obey simple sorting rules.

Alternately, though I don't think this would be a very good idea, a SET
option along the lines of enable_dangerous_like_optimization or such that
would (of course) default to off.

    -Matt


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Unfortunately, locale-dependent sorting rules are much stranger than you
seem to think.  You can dig in the pghackers archives for some of the
reasons that we had to disable that optimization (we tried repeatedly to
find ways around this, btw).  Special digraph rules and multipass
comparisons are a couple of examples of things that completely break
the LIKE optimization.

            regards, tom lane

Re: 7.3 no longer using indexes for LIKE queries

От
Bruce Momjian
Дата:
Matthew Gabeler-Lee wrote:
> Okay, now I understand the problem better.
>
> How about, as an intermediate solution, a list of 'sane' locales in which
> the optimization applied to the C/POSIX locale works?  It should be fairly
> straight forward to write a test program that will test a locale and see if
> any of the common problems apply.  And I expect that several locales can be
> added straight away as they are known to obey simple sorting rules.
>
> Alternately, though I don't think this would be a very good idea, a SET
> option along the lines of enable_dangerous_like_optimization or such that
> would (of course) default to off.

I think we do know the C local is OK for this.  It is just the other
ones we don't know about, and we don't even know.  Is there some test we
can write for this?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: 7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
Looking at the list archives, it looks like there's some general classes of
problems:

1) Multibyte charsets; me not knowing much, I'd guess that initially none of
these are 'sane'.

2) longer string less than shorter string ('ABC\NNN' < 'ABC'); can test by
iterating through bytes (or potentially pairs of bytes, to catch needing two
special chars as a digraph or something) appeneded to a base string and
seeing if any of them trigger the longer-less.

3) characters that are ignored in a sort (e.g. 'ABC\NNNDEF' sorts same as
'ABCDEF'); can test again by iterating through bytes or pairs of bytes
looking for the condition.

4) accent folding; I'm not entirely sure like is supposed to do this.  I'm
going to pretend for the rest of this that the like operator shouldn't fold
accented characters.

Employing extreme paranoia and assuming great ignorance of any specialties
of the locale being tested, I think this should cover all the bases:

// pglub(x) = postgres' upper bound string for the like optimization on 'x%'
// <, >, = mean sorts less than, etc.
report insane if locale is multibyte
for every 2-byte string $a
  let $b = pglub($a)
  for every 2-byte string $c
    report insane if "$a$c" < $a or "$a$c" > $b
for some 4 byte string(s) of plain characters $d
  for every 2 byte string $e
    report insane if $d = "$d[0,1]$e$d[2,3]"
else report sane

On a fast machine, this would probably take a couple minutes[1] to run per
locale, so it would definitely be a one-time thing, not a compile-time
thing, but would be practical to run on a one time or per-release basis to
generate some source file that listed sane locales.

[1] Basically iterating through all values of 32 bits, takes my pc a few
seconds to just do 'for(unsigned n=0;n<0xffffffff;++n);', so I extrapolated
an estimate of what needs to be done beyond the counting.  The second loop
runs so many less times that its runtime is in the noise.

With some slight modifications, this sanity-finding routine could be used to
generate tables that for use by a function to find the proper bounds for
like optimization on locales that only suffered from problem #2.  Yes, this
embeds knowledge about locales into postgres, but it does it through
exhaustive examination, not usage of personal expertise.

An addendum on accent folding:

To my ignorant mind, characters with different accents are different
characters.  My French teacher certainly marked me off if I got the accents
wrong.

It seems to me that the most common place one wants to think about this is
in full text searching and the like.  In this case, maybe I'm daft, but
perhaps the thing to do is to create a functional index, where the function
being indexed strips all the accents off characters.

Does the SQL spec have anything to say on accent folding in comparisons?

    -Matt


-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]

I think we do know the C local is OK for this.  It is just the other
ones we don't know about, and we don't even know.  Is there some test we
can write for this?

Re: 7.3 no longer using indexes for LIKE queries

От
Jean-Luc Lachance
Дата:
C_COLLATE is what is involved with accents.

How would you sort:

CÔTE
CÔTES
COTÉ
COTÉS
COTE

You can't fold accented character into non accented because unique index
would barf on CÔTE if COTE is already in.

You still need to know if 'CÔTE' < 'COTÉ' or not when you do a sort.

Collating in french, for example, is not a byte to byte compare.

If you compare words based only on their binary representation, the sort
will be wrong
CRIME before CÔTE.


JLL


Matthew Gabeler-Lee wrote:

> 4) accent folding; I'm not entirely sure like is supposed to do this.  I'm
> going to pretend for the rest of this that the like operator shouldn't fold
> accented characters.
>
> [...]
>
> It seems to me that the most common place one wants to think about this is
> in full text searching and the like.  In this case, maybe I'm daft, but
> perhaps the thing to do is to create a functional index, where the function
> being indexed strips all the accents off characters.
>
> Does the SQL spec have anything to say on accent folding in comparisons?

Re: 7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
What I was referring to there was the idea that when a user searched for
'cote', you'd transform that into a condition along the lines of "WHERE
foldaccents(textcol) LIKE 'cote%'" or such so that failing to enter accents
wouldn't make the query not hit, which is especially desirable for someone
who wants to query something with accents, but can't type them easily on
their keyboard.

In the case of optimizing the LIKE operator, if you ask for txtcol LIKE
'CÔTE%', then (assuming the charset is 'sane'), it would do an index scan
with a filter something along the lines of (txtcol >= 'CÔTE' AND txtcol
<'CÔTF') before applying the LIKE operator.  Assuming the locale is set
right in the database, the indexing stuff will know how to colate CÔTE vs.
COTÉ; that's the whole point of setting the COLLATE thing, so that the index
*will* have them in the correct order.

My point was mostly that expecting txtcol LIKE 'CÔTE%' to match 'COTÉ' is
probably bad logic.  Now, if there is a good argument why that isn't so, I'm
all ears, but it seems to me that 'CÔTE' and 'COTÉ' are different words and
thus shouldn't match.

    -Matt


-----Original Message-----
From: Jean-Luc Lachance [mailto:jllachan@nsd.ca]
Sent: Wednesday, December 04, 2002 15:25
To: Matthew Gabeler-Lee
Cc: 'pgsql-general@postgresql.org'
Subject: Re: [GENERAL] 7.3 no longer using indexes for LIKE queries


C_COLLATE is what is involved with accents.

How would you sort:

CÔTE
CÔTES
COTÉ
COTÉS
COTE

You can't fold accented character into non accented because unique index
would barf on CÔTE if COTE is already in.

You still need to know if 'CÔTE' < 'COTÉ' or not when you do a sort.

Collating in french, for example, is not a byte to byte compare.

If you compare words based only on their binary representation, the sort
will be wrong
CRIME before CÔTE.


JLL


Matthew Gabeler-Lee wrote:

> 4) accent folding; I'm not entirely sure like is supposed to do this.  I'm
> going to pretend for the rest of this that the like operator shouldn't
fold
> accented characters.
>
> [...]
>
> It seems to me that the most common place one wants to think about this is
> in full text searching and the like.  In this case, maybe I'm daft, but
> perhaps the thing to do is to create a functional index, where the
function
> being indexed strips all the accents off characters.
>
> Does the SQL spec have anything to say on accent folding in comparisons?

Re: 7.3 no longer using indexes for LIKE queries

От
Tom Lane
Дата:
Matthew Gabeler-Lee <mgabelerlee@zycos.com> writes:
> How about, as an intermediate solution, a list of 'sane' locales in which
> the optimization applied to the C/POSIX locale works?

If you provide such a list, we'll be happy to improve locale_is_like_safe().

Offhand though, I suspect that *most* if not all non-C locales have
problems; even en_US, which has no character set issues, still manages
to insist on a multipass sort algorithm :-(.  An example on a recent
Linux release:

[tgl@g3]$ echo -e  "a\na a\naa\na  a\nab\na b" | LC_ALL=en_US sort
a
aa
a a
a  a
ab
a b

There's no way to use an index ordered like this to look for strings
beginning "a ", because the sorting of spaces depends on what comes
after them.

Making any real dent in the problem will probably require in-depth
analysis of common locale (mis)behaviors.  For example, if space sorting
is the only thing that's funny about en_US, it might make sense for us
to support a modified form of the LIKE optimization that doesn't
consider space as a "safe" prefix character (ie, we could index only
for "a" not "a ", even if the pattern is LIKE 'a %').

I have no idea exactly what sort of compromises would be effective
though.  Any localedef experts out there?

            regards, tom lane

Re: 7.3 no longer using indexes for LIKE queries

От
Matthew Gabeler-Lee
Дата:
Can anyone explain the rationale behind that sort order?  I'm guessing it
has something to do with getting sequences of words sorted 'right', but I
fail to see how that is right.  I'd think when sorting it would make sense
to have all sequences of words that start with the same word(s) together.

$ echo -e 'bobbill\nbobrob\nbob bill\nbob robber' | LC_ALL=en_US sort
bobbill
bob bill    -\
bobrob       |-- Seems these ought to be adjacent?
bob robber  -/

Perhaps I'm just old fashioned.  In any case, I don't define locale
collation orders, so *shrug*.

The pseudocode I gave for finding insane ones could also be extended to find
information for the modified LIKE optimization you suggest.  Find chars
(like space) that are ignored in the collation order, as it does, and report
them.  When picking the lower bound, drop all such chars from the beginning
of the expression, then find the upper bound from the thus shortened lower
bound.  This would work for stuff that sorted like en_US does with spaces,
but something that sorted 'a b' *before* 'ab' gets into the realms of the
impossible to deal with (there was a message in one of the lists talking
about a locale that apparently suffered from this problem, it at least
sorted 'abc\NNN' before 'abc' for some value of NNN which I forget).

In locales whose second-pass sort only moves things forwards, the least the
planner could know these 'danger' characters and only optimize if none of
them are present in the query string.  foo LIKE 'ab%' can be safely
optimized in en_US since it only has chars that sort simply.  Finding which
chars sort simply should be straight forward and only require testing 2^8 or
maybe 2^16 values.

    -Matt


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

[tgl@g3]$ echo -e  "a\na a\naa\na  a\nab\na b" | LC_ALL=en_US sort
a
aa
a a
a  a
ab
a b

There's no way to use an index ordered like this to look for strings
beginning "a ", because the sorting of spaces depends on what comes
after them.

Making any real dent in the problem will probably require in-depth
analysis of common locale (mis)behaviors.  For example, if space sorting
is the only thing that's funny about en_US, it might make sense for us
to support a modified form of the LIKE optimization that doesn't
consider space as a "safe" prefix character (ie, we could index only
for "a" not "a ", even if the pattern is LIKE 'a %').

I have no idea exactly what sort of compromises would be effective
though.  Any localedef experts out there?

            regards, tom lane

Re: 7.3 no longer using indexes for LIKE queries

От
Tom Lane
Дата:
Matthew Gabeler-Lee <mgabelerlee@zycos.com> writes:
> foo LIKE 'ab%' can be safely optimized in en_US since it only has
> chars that sort simply.  Finding which chars sort simply should be
> straight forward and only require testing 2^8 or maybe 2^16 values.

We have several times thought we had a solution to LIKE optimization
in non-C locales, only to discover the hard way that our solution
didn't cover all cases.  After that bitter experience, I am *deeply*
distrustful of any proposal that involves black-box testing of locale
behavior.  A black-box test will only find the behaviors that you
already know about.

The only thing that would make me comfortable is a white-box approach:
go in and read the source code to find out what collation behaviors are
possible (I'm assuming here that commercial locale libraries don't have
anything that's not present in an open-source one, eg glibc's locale
code).  Once we have that information for sure, we can think about how
to detect and deal with the different sorting oddities.  But right now
I have no confidence that we know what we need to deal with.

            regards, tom lane

Re: 7.3 no longer using indexes for LIKE queries

От
Bruno Wolff III
Дата:
On Wed, Dec 04, 2002 at 16:02:28 -0500,
  Matthew Gabeler-Lee <mgabelerlee@zycos.com> wrote:
> Can anyone explain the rationale behind that sort order?  I'm guessing it
> has something to do with getting sequences of words sorted 'right', but I
> fail to see how that is right.  I'd think when sorting it would make sense
> to have all sequences of words that start with the same word(s) together.
>
> $ echo -e 'bobbill\nbobrob\nbob bill\nbob robber' | LC_ALL=en_US sort
> bobbill
> bob bill    -\
> bobrob       |-- Seems these ought to be adjacent?
> bob robber  -/
>
> Perhaps I'm just old fashioned.  In any case, I don't define locale
> collation orders, so *shrug*.

I prefer en_US for sorting names. I like that last names like "des Jardins",
"desJardins" and "Des Jardins" will all show up next to each other, because
that is what I think humans will expect. (Note that I sort by first, middle
and last names separately.)

Re: 7.3 no longer using indexes for LIKE queries

От
Greg Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> [tgl@g3]$ echo -e  "a\na a\naa\na  a\nab\na b" | LC_ALL=en_US sort
> a
> aa
> a a
> a  a
> ab
> a b
>
> There's no way to use an index ordered like this to look for strings
> beginning "a ", because the sorting of spaces depends on what comes
> after them.

It seems like there's an obvious easy fix for this. Allow indexes to be
created a simple non-locale dependent lexical sort order. They wouldn't be
useful for sorting in the locale sort order but they would be useful for the
case at hand.

This has the disadvantage of requiring two indexes if you really do want both
"WHERE x BETWEEN ? and ?" and "WHERE x LIKE 'foo%' to be fast, but then you're
saying that if the user really wants one of these "unsafe" locales then that's
what it'll cost the achieve it.

Perhaps a warning about it in the initdb stage since it's probably usually not
what the user wants would be nice too.

--
greg

Re: 7.3 no longer using indexes for LIKE queries

От
Peter Eisentraut
Дата:
Greg Stark writes:

> It seems like there's an obvious easy fix for this. Allow indexes to be
> created a simple non-locale dependent lexical sort order. They wouldn't be
> useful for sorting in the locale sort order but they would be useful for the
> case at hand.

There has already been a proposed implementation of that idea, but it has
been rejected because of some interpretational problems with how exactly
the LIKE operator should respond to locale settings.

According to the SQL standard, constant strings that are part of a
pattern should be compared using the relevant collation order.  If this
were implemented (which it currently isn't), then an index based on
strxfrm() should be used.  The current implementation should use an index
based on a binary comparison opclass.  We need to figure out which exactly
we want to proceed with.

I will point out that I believe that an implemenation following the SQL
standard model won't be particularly practical.  First of all, strings
that are not binary equivalents won't be compare as equal under any
reasonable collation.  Second, even if that were the case, it would
certainly not be appropriate to use for the LIKE operator.  Third, some of
the rules that underly many collations would cause the LIKE operator to
have pretty bizarre results.  For example, sometimes the strings are
compared backwards from the end of the string to the start, and it's not
clear how that should behave when faced with a wildcard pattern anchored
to the start.

--
Peter Eisentraut   peter_e@gmx.net


Re: 7.3 no longer using indexes for LIKE queries

От
Hiroshi Inoue
Дата:
Peter Eisentraut wrote:
>
> Greg Stark writes:
>
> > It seems like there's an obvious easy fix for this. Allow
> > indexes to be created a simple non-locale dependent lexical
> > sort order. They wouldn't be useful for sorting in the locale
> > sort order but they would be useful for the case at hand.
>
> There has already been a proposed implementation of that idea,
> but it has been rejected because of some interpretational
> problems with how exactly the LIKE operator should respond
> to locale settings.
>
> According to the SQL standard, constant strings that are part of a
> pattern should be compared using the relevant collation order.  If this
> were implemented (which it currently isn't), then an index based on
> strxfrm() should be used.  The current implementation should use an
> index based on a binary comparison opclass.  We need to figure out
> which exactly we want to proceed with.
>
> I will point out that I believe that an implemenation following the SQL
> standard model won't be particularly practical.  First of all, strings
> that are not binary equivalents won't be compare as equal under any
> reasonable collation.  Second, even if that were the case, it would
> certainly not be appropriate to use for the LIKE operator.

For example, there could be case-insensitive collations.

regards,
Hiroshi Inoue
    http://w2422.nsk.ne.jp/~inoue/

Re: 7.3 no longer using indexes for LIKE queries

От
Peter Eisentraut
Дата:
Hiroshi Inoue writes:

> For example, there could be case-insensitive collations.

You could do that, but normally you would take account the case
differences in the third pass.

Another interesting problem is how you would expect to handle
multicharacter collating elements that cross wildcard boundaries (like
'oc%' when 'ch' is a collating element).  That would give weird results.

--
Peter Eisentraut   peter_e@gmx.net


Re: 7.3 no longer using indexes for LIKE queries

От
Hiroshi Inoue
Дата:
Peter Eisentraut wrote:
>
> Hiroshi Inoue writes:
>
> > For example, there could be case-insensitive collations.
>
> You could do that, but normally you would take account the case
> differences in the third pass.

I would handle the cases as the case may be if
such collations are available.
You can develop the code applicable for the
restricted cases where the equality induced
by the collation is equivalent to the binary
equality, can't you ?
BTW you seem to be too inclined to the locale
support. How can we unify the collation support
(possibly introduced in the near future) and
the locale support ?

regards,
Hiroshi Inoue
    http://w2422.nsk.ne.jp/~inoue/