>> Similarly, pushing PG-specific declarations like RE_compile_and_cache() >> into regex/regex.h is completely not the right thing for preserving a >> clear library boundary (even positing that we want to expose that >> function outside adt/regexp.c, which I'd rather we didn't). >>
>> Perhaps we could avoid these issues by defining a library API that >> provides accessors like these for the opaque regex_t struct: >> >> * get the number of states in the CNFA >> >> * get the numbers of the initial and final states >> >> * get the number of out-arcs for the N'th state >> >> * get the out-arcs for the N'th state into a caller-provided array >> (sized using the previous function), where each out-arc is represented >> by a color and an end-state >> >> * get the number of character codes represented by color C >> >> * get the wchar codes for color C into a caller-provided array >> >> (The reason for letting the caller allocate the result arrays is so we >> can use palloc for that; if we allocate it in backend/regex/ we must >> use malloc, which will greatly increase the risk of leakages. Also, >> as far as the color API goes, the above lets the caller decide how >> many characters is "too many" to bother with.)
> I like the this idea. Seems like clear and not over-engineered API. I can > implement it. Could you propose something particular to do with > RE_compile_and_cache in this patch? With this API we still need a way to > get regex_t or equivalent from string.
Well, the brute force answer is for pg_trgm to not go through RE_compile_and_cache at all, but just call the regex library for itself. That would lose the ability to cache regex compilations, but I'm not sure we care. The motivation for RE_compile_and_cache is mainly to prevent having to compile the regex again for *every row*, which is what would otherwise happen in a simple SELECT using a regex function. Unless I'm misunderstanding something, pg_trgm would only need to compile the regex once per indexscan, which is probably tolerable.
For me it makes sense.
Patch with proposed API implementation is attached.