Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?

From Tom Lane
Subject Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?
Msg-id 495.941820396@sss.pgh.pa.us
In reply to Re: [GENERAL] indexed regex select optimisation missing?  ("Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>)
Responses Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?  (Charles Tassell <ctassell@isn.net>)
Re: [HACKERS] Re: [GENERAL] indexed regex select optimisation missing?  (Stuart Woolford <stuartw@newmail.net>)
List pgsql-general
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
> Reviewing my email logs from June, most of the work on this has to do with
> people who need locales, and potentially multibyte character sets. Tom
> Lane is of the opinion that this particular optimization needs to be moved
> out of the parser, and deeper into the planner or optimizer/rewriter,
> so a good fix may be some ways out.

Actually, that part is already done: addition of the index-enabling
comparisons is gone from the parser and is now done in the optimizer,
which has a whole bunch of benefits (one being that the comparison
clauses don't get added to the query unless they are actually used
with an index!).

But the underlying LOCALE problem still remains: I don't know a good
character-set-independent method for generating a "just a little bit
larger" string to use as the righthand limit.  If anyone out there is
an expert on foreign and multibyte character sets, some help would
be appreciated.  Basically, given that we know the LIKE or regex
pattern can only match values beginning with FOO, we want to generate
string comparisons that select out the range of values that begin with
FOO (or, at worst, a slightly larger range).  In USASCII locale it's not
hard: you can do
    field >= 'FOO' AND field < 'FOP'
but it's not immediately obvious how to make this idea work reliably
in the presence of odd collation orders or multibyte characters...
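For the plain byte-wise case only, the "just a little bit larger" limit can be sketched as below. This is a rough illustration, not the optimizer's actual code: the name prefix_range is hypothetical, and it assumes values compare as raw byte strings (as in the C/USASCII case), which is exactly the assumption that fails under odd collation orders or multibyte encodings.

```python
from typing import Optional, Tuple


def prefix_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (lower, upper) so that lower <= v < upper selects the byte
    strings v that begin with prefix, under plain byte-wise comparison.

    Hypothetical helper for illustration; assumes byte-wise ordering only.
    upper is None when no finite upper limit exists (empty prefix, or a
    prefix made up entirely of 0xFF bytes).
    """
    buf = bytearray(prefix)
    while buf:
        if buf[-1] < 0xFF:
            # Bump the last byte that can still be incremented.
            buf[-1] += 1
            return prefix, bytes(buf)
        # A trailing 0xFF cannot be incremented: drop it and carry left.
        buf.pop()
    return prefix, None


lower, upper = prefix_range(b"FOO")
print(lower, upper)  # b'FOO' b'FOP'
```

With the FOO example this reproduces field >= 'FOO' AND field < 'FOP'. The carry step handles a prefix ending in byte 0xFF, but nothing here says what "the next character" means in a locale where sort order differs from byte order.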

BTW: the \377 hack is actually wrong for USASCII too, since it'll
exclude a data value like 'FOO\377x' which should be included.
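A small check makes the difference concrete. It assumes the \377 hack has the form field >= 'FOO' AND field <= 'FOO\377' (the exact form is not quoted in this message), and it again compares raw byte strings; both function names are made up for the illustration.

```python
def in_range_377(value: bytes, prefix: bytes) -> bool:
    # Assumed form of the hack: upper limit is the prefix plus one 0xFF byte,
    # compared inclusively.
    return prefix <= value <= prefix + b"\xff"


def in_range_incremented(value: bytes, prefix: bytes) -> bool:
    # Exclusive upper limit with the last byte incremented ('FOO' -> 'FOP').
    # Assumes a non-empty prefix whose last byte is below 0xFF.
    upper = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix <= value < upper


value = b"FOO\xffx"  # starts with FOO, so it should be selected
print(in_range_377(value, b"FOO"))          # False: sorts after 'FOO\377'
print(in_range_incremented(value, b"FOO"))  # True
```

Any value that continues past the \377 byte sorts after the hack's upper limit, so the hack drops it; the incremented limit does not have that gap.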

            regards, tom lane
