Re: 7.3 no longer using indexes for LIKE queries

Поиск
Список
Период
Сортировка
От Matthew Gabeler-Lee
Тема Re: 7.3 no longer using indexes for LIKE queries
Дата
Msg-id ABABFB80F35AD311848B0090279918EF010B9B66@ZYCOSNT2.hq.zycos.com
обсуждение исходный текст
Ответ на 7.3 no longer using indexes for LIKE queries  (Matthew Gabeler-Lee <mgabelerlee@zycos.com>)
Список pgsql-general
What I was referring to there was the idea that when a user searched for
'cote', you'd transform that into a condition along the lines of "WHERE
foldaccents(textcol) LIKE 'cote%'" or such so that failing to enter accents
wouldn't make the query not hit, which is especially desirable for someone
who wants to query something with accents, but can't type them easily on
their keyboard.

In the case of optimizing the LIKE operator, if you ask for txtcol LIKE
'CÔTE%', then (assuming the charset is 'sane'), it would do an index scan
with a filter something along the lines of (txtcol >= 'CÔTE' AND txtcol
<'CÔTF') before applying the LIKE operator.  Assuming the locale is set
right in the database, the indexing stuff will know how to colate CÔTE vs.
COTÉ; that's the whole point of setting the COLLATE thing, so that the index
*will* have them in the correct order.

My point was mostly that expecting txtcol LIKE 'CÔTE%' to match 'COTÉ' is
probably bad logic.  Now, if there is a good argument why that isn't so, I'm
all ears, but it seems to me that 'CÔTE' and 'COTÉ' are different words and
thus shouldn't match.

    -Matt


-----Original Message-----
From: Jean-Luc Lachance [mailto:jllachan@nsd.ca]
Sent: Wednesday, December 04, 2002 15:25
To: Matthew Gabeler-Lee
Cc: 'pgsql-general@postgresql.org'
Subject: Re: [GENERAL] 7.3 no longer using indexes for LIKE queries


C_COLLATE is what is involved with accents.

How would you sort:

CÔTE
CÔTES
COTÉ
COTÉS
COTE

You can't fold accented character into non accented because unique index
would barf on CÔTE if COTE is already in.

You still need to know if 'CÔTE' < 'COTÉ' or not when you do a sort.

Collating in french, for example, is not a byte to byte compare.

If you compare words based only on their binary representation, the sort
will be wrong
CRIME before CÔTE.


JLL


Matthew Gabeler-Lee wrote:

> 4) accent folding; I'm not entirely sure like is supposed to do this.  I'm
> going to pretend for the rest of this that the like operator shouldn't
fold
> accented characters.
>
> [...]
>
> It seems to me that the most common place one wants to think about this is
> in full text searching and the like.  In this case, maybe I'm daft, but
> perhaps the thing to do is to create a functional index, where the
function
> being indexed strips all the accents off characters.
>
> Does the SQL spec have anything to say on accent folding in comparisons?

В списке pgsql-general по дате отправления:

Предыдущее
От: "scott.marlowe"
Дата:
Сообщение: Re: performance tuning
Следующее
От: Tom Lane
Дата:
Сообщение: Re: 7.3 no longer using indexes for LIKE queries