Обсуждение: ilike multi-byte pattern cache

Поиск
Список
Период
Сортировка

ilike multi-byte pattern cache

От
Andrew Dunstan
Дата:
The attached patch implements a one-value pattern cache for the
multi-byte encoding case for ILIKE. This reduces calls to lower() by
(50% -1) in the common case where the pattern is a constant.  My own
testing and Guillaume Smet's show that this cuts roughly in half the
performance penalty we inflicted by using lower() in that case.

Is this sufficiently low risk to sneak into 8.3?

cheers

andrew
Index: src/backend/utils/adt/like.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/like.c,v
retrieving revision 1.71
diff -c -r1.71 like.c
*** src/backend/utils/adt/like.c    22 Sep 2007 03:58:34 -0000    1.71
--- src/backend/utils/adt/like.c    22 Sep 2007 12:16:23 -0000
***************
*** 139,144 ****
--- 139,149 ----
                 *p;
      int            slen,
                  plen;
+     static char patcache[512], lpatcache[512];
+     static int  patcachelen = 0, lpatcachelen = 0;
+
+     p = VARDATA_ANY(pat);
+     plen = VARSIZE_ANY_EXHDR(pat);

      /* For efficiency reasons, in the single byte case we don't call
       * lower() on the pattern and text, but instead call to_lower on each
***************
*** 147,156 ****

      if (pg_database_encoding_max_length() > 1)
      {
          /* lower's result is never packed, so OK to use old macros here */
-         pat = DatumGetTextP(DirectFunctionCall1(lower, PointerGetDatum(pat)));
-         p = VARDATA(pat);
-         plen = (VARSIZE(pat) - VARHDRSZ);
          str = DatumGetTextP(DirectFunctionCall1(lower, PointerGetDatum(str)));
          s = VARDATA(str);
          slen = (VARSIZE(str) - VARHDRSZ);
--- 152,192 ----

      if (pg_database_encoding_max_length() > 1)
      {
+         if (plen > 0  && plen == patcachelen && strncmp(p,patcache,plen) == 0)
+         {
+             p = lpatcache;
+             plen = lpatcachelen;
+         }
+         else
+         {
+             char *lp;
+             int   lplen;
+
+             pat = DatumGetTextP(DirectFunctionCall1(lower,
+                                                     PointerGetDatum(pat)));
+
+             /* lower's result is never packed, so OK to use old macros here */
+             lp = VARDATA(pat);
+             lplen = (VARSIZE(pat) - VARHDRSZ);
+
+             if (plen < 512 && lplen < 512)
+             {
+                 patcachelen = plen;
+                 lpatcachelen = lplen;
+                 memcpy(patcache,p,plen);
+                 memcpy(lpatcache,lp,lplen);
+             }
+             else
+             {
+                 patcachelen = 0;
+                 lpatcachelen = 0;
+             }
+
+             p = lp;
+             plen = lplen;
+         }
+
          /* lower's result is never packed, so OK to use old macros here */
          str = DatumGetTextP(DirectFunctionCall1(lower, PointerGetDatum(str)));
          s = VARDATA(str);
          slen = (VARSIZE(str) - VARHDRSZ);
***************
*** 161,168 ****
      }
      else
      {
-         p = VARDATA_ANY(pat);
-         plen = VARSIZE_ANY_EXHDR(pat);
          s = VARDATA_ANY(str);
          slen = VARSIZE_ANY_EXHDR(str);
          return SB_IMatchText(s, slen, p, plen);
--- 197,202 ----

Re: ilike multi-byte pattern cache

От
Tom Lane
Дата:
Andrew Dunstan <andrew@dunslane.net> writes:
> The attached patch implements a one-value pattern cache for the
> multi-byte encoding case for ILIKE. This reduces calls to lower() by
> (50% -1) in the common case where the pattern is a constant.  My own
> testing and Guillaume Smet's show that this cuts roughly in half the
> performance penalty we inflicted by using lower() in that case.

> Is this sufficiently low risk to sneak into 8.3?

This seems awfully ugly ... and considering that you don't get to
avoid lower() on the data side, it seems pretty dubious that it
buys very much percentagewise.  It would also be a net loss for
non-constant patterns, which are by no means unheard of --- or even
two constant patterns used in the same query.

We've lived with this in 8.2 without much complaint.  I think we can
let it go until we think of a better solution.  To my mind this is
all tied up in the problem of handling locales in a better fashion
--- a lot of the inefficiency of lower() is due to having a poor
impedance match to the libc locale-related functions, and might be
eliminated if we had locale support with better APIs.

            regards, tom lane

Re: ilike multi-byte pattern cache

От
Andrew Dunstan
Дата:

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> The attached patch implements a one-value pattern cache for the
>> multi-byte encoding case for ILIKE. This reduces calls to lower() by
>> (50% -1) in the common case where the pattern is a constant.  My own
>> testing and Guillaume Smet's show that this cuts roughly in half the
>> performance penalty we inflicted by using lower() in that case.
>>
>
>
>> Is this sufficiently low risk to sneak into 8.3?
>>
>
> This seems awfully ugly ... and considering that you don't get to
> avoid lower() on the data side, it seems pretty dubious that it
> buys very much percentagewise.  It would also be a net loss for
> non-constant patterns, which are by no means unheard of --- or even
> two constant patterns used in the same query.
>

The cost of using lower() is demonstrably high. Even on a very small
pattern the speedup is easily measurable.

The cost of the patch is effectively 1 call to strcmp() per call and 2
calls to memcpy() per cache miss, which should be quite cheap.

> We've lived with this in 8.2 without much complaint.  I think we can
> let it go until we think of a better solution.  To my mind this is
> all tied up in the problem of handling locales in a better fashion
> --- a lot of the inefficiency of lower() is due to having a poor
> impedance match to the libc locale-related functions, and might be
> eliminated if we had locale support with better APIs.
>
>
>

Well, we have a complaint now :-( This was aimed at being some temporary
relief rather than a long term fix. I guess Guillaume can use this as a
patch if he wants.

I agree that we need better locale APIs.

cheers

andrew

Re: ilike multi-byte pattern cache

От
ITAGAKI Takahiro
Дата:
Andrew Dunstan <andrew@dunslane.net> wrote:

> The attached patch implements a one-value pattern cache for the
> multi-byte encoding case for ILIKE. This reduces calls to lower() by
> (50% -1) in the common case where the pattern is a constant.  My own
> testing and Guillaume Smet's show that this cuts roughly in half the
> performance penalty we inflicted by using lower() in that case.

It might be a better solution to create the new text type 'lower_text'
and replace 'text ILIKE text' to 'text ILIKE lower_text'. The converter
function 'lower' is marked as immutable, planner can evaluate it just
one time in planning if the right-hand side is a constant expression.

Here is a pseudo-code for the above.

CREATE TYPE lower_text (INPUT = lower, ... );
CREATE CAST (text AS lower_text)
    WITH FUNCTION lower(text) AS IMPLICIT;
CREATE OPERATOR ILIKE (
    LEFTARG = text,
    RIGHTARG = lower_text,
);

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center