Re: non-static LIKE patterns

Поиск
Список
Период
Сортировка
От hamann.w@t-online.de
Тема Re: non-static LIKE patterns
Дата
Msg-id wolfgang-1120414084536.A0226706@amadeus3.local
обсуждение исходный текст
Ответ на Re: non-static LIKE patterns  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
>>
>> hamann.w@t-online.de writes:
>> > Tom Lane wrote:
>> > If you want it to be bulletproof, what I'd think about is something like
>> >     WHERE second.path LIKE quote_like(first.path)||'%'
>>
>> > Just out of curiosity: wouldn't that (as well as using non-static like)
>> > be an enormous performance problem?
>>
>> Well, it won't be free, but I think you've already doomed yourself to
>> a not-very-bright plan by using LIKE in this way at all.
>>
>> In any case, as a wise man once said, you can make it run arbitrarily
>> fast if it doesn't have to give the right answer.  Correctness trumps
>> any micro-optimization questions, so if you have to have prefix matching
>> of this sort, it's gonna cost ya somehow.
>>

Hi Tom,

I just stumbled across this question because I regularly come across problems that,
at first, look like they should be solved with non-static LIKE or REGEX patterns

I actually have two situations where I would need a better plan. One is, fortunately,
fairly static (mostly lookups, hardly inserts) for name matches. Many famous people
appear in different spellings, say these two musicians
Franz|Ferenc Liszt
Fr(e|y)der(ic|yk) Chopin
So the first plan would be to regex-compare the sought name against the first name (or last name)
regexes. Run-time is astronomical, though
My current approach is to
a) keep the regexes in a separate table/column, so names with a regex entry are handled in
a smaller query
b) reverse the query: for every regex (they are well-behaved in this context) I pre-create a pattern
so that my actual query becomes
where pre-made-pattern ~ searched_name
c) while preparing the pattern, a common initial character (the "F" for Franz and Ferenc) is identified
to build an index. In the rare case that the first letter is already different, there would be
two entries in the table. So the actual query can check for first letter before it does the regex.

The other situation, unfortionately, is ad-hoc queries where I cannot do that kind of preparation
typically, the DB would contain strings like XY4711A, XY271, XY17321AAA, and I want to
check whether an input like XY17321 matches a database entry up to the end of the
numerals. So I add [^0-9]*$ to the end of my candidates, select
  where candidate ~ entry-in-table
and go for a coffee or two

Of course I would prefer to see a pre-built solution do all that mess for me...

Regards
Wolfgang Hamann


В списке pgsql-general по дате отправления:

Предыдущее
От: Abel Abraham Camarillo Ojeda
Дата:
Сообщение: Re: Horizontal aggregation?
Следующее
От: hamann.w@t-online.de
Дата:
Сообщение: Re: Horizontal aggregation?