Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING
От | Jeff Davis |
---|---|
Тема | Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING |
Дата | |
Msg-id | 1405131377.9081.244.camel@jeff-desktop обсуждение исходный текст |
Ответ на | Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Allow multi-byte characters as escape in SIMILAR TO
and SUBSTRING
Re: Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING |
Список | pgsql-hackers |
On Fri, 2014-07-11 at 11:51 -0400, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Attached is a small patch to $SUBJECT. > > In master, only single-byte characters are allowed as an escape. Of > > course, with the patch it must still be a single character, but it may > > be multi-byte. > > I'm concerned about the performance cost of this patch. Have you done > any measurements about what kind of overhead you are putting on the > inner loop of similar_escape? I didn't consider this very performance critical, because this is looping through the pattern, which I wouldn't expect to be a long string. On my machine using en_US.UTF-8, the difference is imperceptible for a SIMILAR TO ... ESCAPE query. I was able to see about a 2% increase in runtime when using the similar_escape function directly. I made a 10M tuple table and did: explain analyze select similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#') from t; which was the worst reasonable case I could think of. (It appears that selecting from a table is faster than from generate_series. I'm curious what you use when testing the performance of an individual function at the SQL level.) > At the very least, please don't call GetDatabaseEncoding() over again > every single time through the inner loop. More generally, why do we > need to use pg_encoding_verifymb? The input data is presumably validly > encoded. ISTM the significantly cheaper pg_mblen() would be more > appropriate. Thank you. Using the non-verifying variants reduces the penalty in the above test to 1%. If needed, we could optimize further through code specialization, as like_escape() does. Though I think like_escape() is specialized just because MatchText() is specialized; and MatchText is specialized because it operates on the actual data, not just the pattern. So I don't see a reason to specialize similar_escape(). Regards, Jeff Davis
Вложения
В списке pgsql-hackers по дате отправления: