Re: WIP: index support for regexp search

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: WIP: index support for regexp search
Дата
Msg-id CAPpHfduK1u+ykcwKmjOTe30e6JmStMg6LUCmNttGinvYSQQ0RQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: index support for regexp search  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: WIP: index support for regexp search  ("Erikjan Rijkers" <er@xs4all.nl>)
Список pgsql-hackers
On Mon, Mar 25, 2013 at 2:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Mon, Mar 25, 2013 at 1:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Similarly, pushing PG-specific declarations like RE_compile_and_cache()
>> into regex/regex.h is completely not the right thing for preserving a
>> clear library boundary (even positing that we want to expose that
>> function outside adt/regexp.c, which I'd rather we didn't).
>>
>> Perhaps we could avoid these issues by defining a library API that
>> provides accessors like these for the opaque regex_t struct:
>>
>> * get the number of states in the CNFA
>>
>> * get the numbers of the initial and final states
>>
>> * get the number of out-arcs for the N'th state
>>
>> * get the out-arcs for the N'th state into a caller-provided array
>> (sized using the previous function), where each out-arc is represented
>> by a color and an end-state
>>
>> * get the number of character codes represented by color C
>>
>> * get the wchar codes for color C into a caller-provided array
>>
>> (The reason for letting the caller allocate the result arrays is so we
>> can use palloc for that; if we allocate it in backend/regex/ we must
>> use malloc, which will greatly increase the risk of leakages.  Also,
>> as far as the color API goes, the above lets the caller decide how
>> many characters is "too many" to bother with.)

> I like the this idea. Seems like clear and not over-engineered API. I can
> implement it. Could you propose something particular to do with
> RE_compile_and_cache in this patch? With this API we still need a way to
> get regex_t or equivalent from string.

Well, the brute force answer is for pg_trgm to not go through
RE_compile_and_cache at all, but just call the regex library for itself.
That would lose the ability to cache regex compilations, but I'm not
sure we care.  The motivation for RE_compile_and_cache is mainly to
prevent having to compile the regex again for *every row*, which is what
would otherwise happen in a simple SELECT using a regex function.
Unless I'm misunderstanding something, pg_trgm would only need to
compile the regex once per indexscan, which is probably tolerable.

For me it makes sense.
Patch with proposed API implementation is attached.

------
With best regards,
Alexander Korotkov.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Page replacement algorithm in buffer cache
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Page replacement algorithm in buffer cache