Обсуждение: Small performance regression in 9.2 has a big impact

Поиск
Список
Период
Сортировка

Small performance regression in 9.2 has a big impact

От
Scott Marlowe
Дата:
OK so there's a simple set of tree functions we use at work. They're
quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
a simple mix of sql and plpgsql functions which are at
http://pastebin.com/SXTnNhd5 and which I've attached.

Here's a test query:

select tree_ancestor_keys('000000000000000100000001');

According to explain analyze on both 8.4 and 9.2 they have the same
plan. However, on the same machine the query is about 40% slower on
9.2. Note we're not hitting the disks, or even buffers here. It's pure
in memory plpsql and sql that we're running.

explain analyze select tree_ancestor_keys('000000000000000100000001')
from generate_series(1,1000);

On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
same diff, it's just easier to see with the generate series forcing it
to run 1000 times to kind of even out the noise.)

On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
that difference seems to be there on all plpgsql and sql functions.

In our application, these tree functions get called millions and
millions of times a day, and a 40% performance penalty is a pretty big
deal.

We're already using the trick of telling the query planner that this
function will return 1 row with alter function rows 1 etc. That helps
a lot but it doesn't fix this underlying performance issue.

Server versions are 8.4.22 (last I think) and 9.2.9.

If anyone has any suggestions I'd love to hear them.
--
To understand recursion, one must first understand recursion.

Вложения

Re: Small performance regression in 9.2 has a big impact

От
Heikki Linnakangas
Дата:
On 11/25/2014 10:36 PM, Scott Marlowe wrote:
> OK so there's a simple set of tree functions we use at work. They're
> quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
> a simple mix of sql and plpgsql functions which are at
> http://pastebin.com/SXTnNhd5 and which I've attached.
>
> Here's a test query:
>
> select tree_ancestor_keys('000000000000000100000001');
>
> According to explain analyze on both 8.4 and 9.2 they have the same
> plan. However, on the same machine the query is about 40% slower on
> 9.2. Note we're not hitting the disks, or even buffers here. It's pure
> in memory plpsql and sql that we're running.
>
> explain analyze select tree_ancestor_keys('000000000000000100000001')
> from generate_series(1,1000);
>
> On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
> same diff, it's just easier to see with the generate series forcing it
> to run 1000 times to kind of even out the noise.)
>
> On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
> that difference seems to be there on all plpgsql and sql functions.
>
> In our application, these tree functions get called millions and
> millions of times a day, and a 40% performance penalty is a pretty big
> deal.
>
> We're already using the trick of telling the query planner that this
> function will return 1 row with alter function rows 1 etc. That helps
> a lot but it doesn't fix this underlying performance issue.
>
> Server versions are 8.4.22 (last I think) and 9.2.9.
>
> If anyone has any suggestions I'd love to hear them.

I don't know why this regressed between those versions, but looking at
the functions, there's some low-hanging fruit:

1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe
duplicates are expected here, although I'm not 100% sure).

2. tree_ancestor_keys() calculates tree_level($1) every time it
recurses. Would be cheaper to calculate once, and pass it as argument.

Put together:

CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer)
RETURNS SETOF bit varying
     LANGUAGE sql IMMUTABLE STRICT
     AS $_$
   select tree_ancestor_key($1, $2)
   union all
   select tree_ancestor_keys($1, $2 + 1, $3)
   where $2 < $3
$_$;

CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer)
RETURNS SETOF bit varying
     LANGUAGE sql IMMUTABLE STRICT
     AS $_$
   select tree_ancestor_keys($1, $2 + 1, tree_level($1))
$_$;

These changes make your test query go about 2x faster on my laptop, with
git master. I'm sure you could optimize the functions further, but those
at least seem like fairly safe and simple changes.

- Heikki



Re: Small performance regression in 9.2 has a big impact

От
Tom Lane
Дата:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> OK so there's a simple set of tree functions we use at work. They're
> quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
> a simple mix of sql and plpgsql functions which are at
> http://pastebin.com/SXTnNhd5 and which I've attached.

> Here's a test query:

> select tree_ancestor_keys('000000000000000100000001');

> According to explain analyze on both 8.4 and 9.2 they have the same
> plan. However, on the same machine the query is about 40% slower on
> 9.2. Note we're not hitting the disks, or even buffers here. It's pure
> in memory plpsql and sql that we're running.

> explain analyze select tree_ancestor_keys('000000000000000100000001')
> from generate_series(1,1000);

Hmm, I don't like the trend here.  For the repeat-1000x query, I get
these reported execution times:

8.4    360 ms
9.0    365 ms
9.1    440 ms
9.2    510 ms
9.3    550 ms
9.4    570 ms
head    570 ms

(This is in assert-enabled builds, I'm too lazy to rebuild all the
branches without that.)

oprofile isn't showing any smoking gun AFAICS; it seems like things
are just generally slower.  Still, we've not seen similar reports
elsewhere, so somehow this usage style is getting penalized in newer
branches compared to other cases.  If it were all on 9.2's head
I'd be suspicious of the plancache mechanism, but evidently that's
not it, or at least not the whole story.

HEAD profile entries above 1%:

samples  %        image name               symbol name
7573      7.2448  postgres                 AllocSetAlloc
6059      5.7964  postgres                 SearchCatCache
3533      3.3799  postgres                 AllocSetCheck
3420      3.2718  postgres                 base_yyparse
2104      2.0128  postgres                 AllocSetFree
1613      1.5431  postgres                 palloc
1523      1.4570  postgres                 ExecMakeFunctionResultNoSets
1313      1.2561  postgres                 check_list_invariants
1241      1.1872  postgres                 palloc0
1213      1.1604  postgres                 pfree
1157      1.1069  postgres                 SPI_plan_get_cached_plan
1136      1.0868  postgres                 GetPrivateRefCountEntry
1098      1.0504  postgres                 sentinel_ok
1085      1.0380  postgres                 hash_any
1057      1.0112  postgres                 core_yylex
1053      1.0074  postgres                 expression_tree_walker
1046      1.0007  postgres                 hash_search_with_hash_value

8.4 profile entries above 1%:

samples  %        image name               symbol name
11782    10.3680  postgres                 AllocSetAlloc
7369      6.4846  postgres                 AllocSetCheck
5623      4.9482  postgres                 base_yyparse
4166      3.6660  postgres                 SearchCatCache
2671      2.3504  postgres                 ExecMakeFunctionResultNoSets
2060      1.8128  postgres                 MemoryContextAllocZeroAligned
2030      1.7864  postgres                 MemoryContextAlloc
1679      1.4775  postgres                 ExecEvalParam
1607      1.4141  postgres                 base_yylex
1389      1.2223  postgres                 check_list_invariants
1348      1.1862  postgres                 RevalidateCachedPlan
1341      1.1801  postgres                 AcquireExecutorLocks
1266      1.1141  postgres                 MemoryContextCreate
1256      1.1053  postgres                 hash_any
1255      1.1044  postgres                 expression_tree_walker
1202      1.0577  postgres                 expression_tree_mutator
1191      1.0481  postgres                 AllocSetReset

            regards, tom lane


Re: Small performance regression in 9.2 has a big impact

От
Scott Marlowe
Дата:
On Tue, Nov 25, 2014 at 1:58 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 11/25/2014 10:36 PM, Scott Marlowe wrote:
>>
>> OK so there's a simple set of tree functions we use at work. They're
>> quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
>> a simple mix of sql and plpgsql functions which are at
>> http://pastebin.com/SXTnNhd5 and which I've attached.
>>
>> Here's a test query:
>>
>> select tree_ancestor_keys('000000000000000100000001');
>>
>> According to explain analyze on both 8.4 and 9.2 they have the same
>> plan. However, on the same machine the query is about 40% slower on
>> 9.2. Note we're not hitting the disks, or even buffers here. It's pure
>> in memory plpsql and sql that we're running.
>>
>> explain analyze select tree_ancestor_keys('000000000000000100000001')
>> from generate_series(1,1000);
>>
>> On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
>> same diff, it's just easier to see with the generate series forcing it
>> to run 1000 times to kind of even out the noise.)
>>
>> On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
>> that difference seems to be there on all plpgsql and sql functions.
>>
>> In our application, these tree functions get called millions and
>> millions of times a day, and a 40% performance penalty is a pretty big
>> deal.
>>
>> We're already using the trick of telling the query planner that this
>> function will return 1 row with alter function rows 1 etc. That helps
>> a lot but it doesn't fix this underlying performance issue.
>>
>> Server versions are 8.4.22 (last I think) and 9.2.9.
>>
>> If anyone has any suggestions I'd love to hear them.
>
>
> I don't know why this regressed between those versions, but looking at the
> functions, there's some low-hanging fruit:
>
> 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe
> duplicates are expected here, although I'm not 100% sure).
>
> 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses.
> Would be cheaper to calculate once, and pass it as argument.
>
> Put together:
>
> CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer) RETURNS
> SETOF bit varying
>     LANGUAGE sql IMMUTABLE STRICT
>     AS $_$
>   select tree_ancestor_key($1, $2)
>   union all
>   select tree_ancestor_keys($1, $2 + 1, $3)
>   where $2 < $3
> $_$;
>
> CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer) RETURNS
> SETOF bit varying
>     LANGUAGE sql IMMUTABLE STRICT
>     AS $_$
>   select tree_ancestor_keys($1, $2 + 1, tree_level($1))
> $_$;
>
> These changes make your test query go about 2x faster on my laptop, with git
> master. I'm sure you could optimize the functions further, but those at
> least seem like fairly safe and simple changes.

Wow that made a huge difference. About a 50% increase across the
board. Sadly, 9.2 is still way behind 8.4 (see Tom's email)

Here's the results of 10 runs, 9.2 old functions, 9.2 new functions,
8.4 old functions, 8.4 new functions:

402.454 ms 217.718 ms 283.289 ms 167.108 ms
390.828 ms 219.644 ms 282.994 ms 165.524 ms
397.987 ms 216.864 ms 289.053 ms 170.821 ms
391.670 ms 220.943 ms 296.410 ms 164.190 ms
437.099 ms 233.360 ms 284.279 ms 183.919 ms
473.945 ms 291.199 ms 347.916 ms 168.300 ms
453.974 ms 231.350 ms 367.517 ms 158.717 ms
377.221 ms 226.697 ms 297.255 ms 164.196 ms
396.812 ms 262.638 ms 300.073 ms 161.325 ms
436.822 ms 227.489 ms 292.553 ms 179.194 ms
405.929 ms 233.461 ms 267.355 ms 162.847 ms

so about 400 versus about 220, and about 290 versus about 165 or so.


Re: Small performance regression in 9.2 has a big impact

От
Tom Lane
Дата:
I wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> OK so there's a simple set of tree functions we use at work. They're
>> quite fast in 8.4 and they've gotten about 40% slower in 9.2.

> Hmm, I don't like the trend here.  For the repeat-1000x query, I get
> these reported execution times:

> 8.4    360 ms
> 9.0    365 ms
> 9.1    440 ms
> 9.2    510 ms
> 9.3    550 ms
> 9.4    570 ms
> head    570 ms

I found part of the issue: you're doing a lot of UNIONs on varbit
columns, and every time we parse one of those, get_sort_group_operators
asks the typcache about hash operators for the type.  typcache finds
out that varbit has no default hash opclass ... but *it doesn't cache
negative answers*.  So that means a physical lookup in pg_opclass every
time :-(.  That is actually the only bufmgr traffic induced by this
test query, once the catalog caches are loaded.  Versions before 9.1
don't have that hit because they didn't consider hashing for UNIONs.

I made a quick-hack patch to suppress redundant GetDefaultOpclass calls
in typcache.c, and found that that brought HEAD's runtime down to 460ms.
I don't think I'd want to commit this in its current form, but with
some additions to flush the cache after pg_opclass updates it would
be a credible improvement.

So that probably explains the jump from 9.0 to 9.1.  Don't know yet
about the other lossage.

            regards, tom lane


Re: Small performance regression in 9.2 has a big impact

От
Tom Lane
Дата:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> On Tue, Nov 25, 2014 at 1:58 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> I don't know why this regressed between those versions, but looking at the
>> functions, there's some low-hanging fruit:
>>
>> 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe
>> duplicates are expected here, although I'm not 100% sure).
>>
>> 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses.
>> Would be cheaper to calculate once, and pass it as argument.

> Wow that made a huge difference. About a 50% increase across the
> board. Sadly, 9.2 is still way behind 8.4 (see Tom's email)

Switching from UNION to UNION ALL would dodge the varbit hash-opclass
caching issue, I think.  But there's still something else going on.

            regards, tom lane


Re: Small performance regression in 9.2 has a big impact

От
Tom Lane
Дата:
I wrote:
>> Hmm, I don't like the trend here.  For the repeat-1000x query, I get
>> these reported execution times:

>> 8.4    360 ms
>> 9.0    365 ms
>> 9.1    440 ms
>> 9.2    510 ms
>> 9.3    550 ms
>> 9.4    570 ms
>> head    570 ms

> I made a quick-hack patch to suppress redundant GetDefaultOpclass calls
> in typcache.c, and found that that brought HEAD's runtime down to 460ms.

I found some additional low-hanging fruit by comparing gprof call counts
in 8.4 and HEAD:

* OverrideSearchPathMatchesCurrent(), which is not there at all in 8.4
or 9.2, accounts for a depressingly large amount of palloc/pfree traffic.
The implementation was quick-n-dirty to begin with, but workloads
like this one call it often enough to make it a pain point.

* plpgsql's setup_param_list() contributes another large fraction of
added palloc/pfree traffic; this is evidently caused by the temporary
bitmapset needed for its bms_first_member() loop, which was not there
in 8.4 but is there in 9.2.

I've been able to bring HEAD's runtime down to about 415 ms with the
collection of more-or-less quick hacks attached.  None of them are
ready to commit but I thought I'd post them for the record.

After review of all this, I think the aspect of your example that is
causing performance issues is that there are a lot of non-inline-able
SQL-language function calls.  That's not a case that we've put much
thought into lately.  I doubt we are going to get all the way back to
where 8.4 was in the short term, because I can see that there is a
significant amount of new computation associated with collation
management during parsing (catcache lookups driven by get_typcollation,
assign_collations_walker, etc).  The long-term answer to that is to
improve the SQL-language function support so that it can cache the results
of parsing the function body; we have surely got enough plancache support
for that now, but no one's attempted to apply it in functions.c.

            regards, tom lane

diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 8c6c7fc..bf80563 100644
*** a/src/backend/utils/cache/typcache.c
--- b/src/backend/utils/cache/typcache.c
*************** static HTAB *TypeCacheHash = NULL;
*** 77,82 ****
--- 77,84 ----
  #define TCFLAGS_CHECKED_FIELD_PROPERTIES    0x0010
  #define TCFLAGS_HAVE_FIELD_EQUALITY            0x0020
  #define TCFLAGS_HAVE_FIELD_COMPARE            0x0040
+ #define TCFLAGS_CHECKED_BTREE_OPCLASS        0x0080
+ #define TCFLAGS_CHECKED_HASH_OPCLASS        0x0100

  /* Private information to support comparisons of enum values */
  typedef struct
*************** lookup_type_cache(Oid type_id, int flags
*** 227,249 ****
      {
          Oid            opclass;

!         opclass = GetDefaultOpClass(type_id, BTREE_AM_OID);
!         if (OidIsValid(opclass))
          {
!             typentry->btree_opf = get_opclass_family(opclass);
!             typentry->btree_opintype = get_opclass_input_type(opclass);
          }
          /* If no btree opclass, we force lookup of the hash opclass */
          if (typentry->btree_opf == InvalidOid)
          {
              if (typentry->hash_opf == InvalidOid)
              {
!                 opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
!                 if (OidIsValid(opclass))
                  {
!                     typentry->hash_opf = get_opclass_family(opclass);
!                     typentry->hash_opintype = get_opclass_input_type(opclass);
                  }
              }
          }
          else
--- 229,259 ----
      {
          Oid            opclass;

!         if (!(typentry->flags & TCFLAGS_CHECKED_BTREE_OPCLASS))
          {
!             opclass = GetDefaultOpClass(type_id, BTREE_AM_OID);
!             if (OidIsValid(opclass))
!             {
!                 typentry->btree_opf = get_opclass_family(opclass);
!                 typentry->btree_opintype = get_opclass_input_type(opclass);
!             }
!             typentry->flags |= TCFLAGS_CHECKED_BTREE_OPCLASS;
          }
          /* If no btree opclass, we force lookup of the hash opclass */
          if (typentry->btree_opf == InvalidOid)
          {
              if (typentry->hash_opf == InvalidOid)
              {
!                 if (!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS))
                  {
!                     opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
!                     if (OidIsValid(opclass))
!                     {
!                         typentry->hash_opf = get_opclass_family(opclass);
!                         typentry->hash_opintype = get_opclass_input_type(opclass);
!                     }
                  }
+                 typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
              }
          }
          else
*************** lookup_type_cache(Oid type_id, int flags
*** 268,278 ****
      {
          Oid            opclass;

!         opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
!         if (OidIsValid(opclass))
          {
!             typentry->hash_opf = get_opclass_family(opclass);
!             typentry->hash_opintype = get_opclass_input_type(opclass);
          }
      }

--- 278,292 ----
      {
          Oid            opclass;

!         if (!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS))
          {
!             opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
!             if (OidIsValid(opclass))
!             {
!                 typentry->hash_opf = get_opclass_family(opclass);
!                 typentry->hash_opintype = get_opclass_input_type(opclass);
!             }
!             typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
          }
      }

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 911f015..ca93307 100644
*** a/src/backend/catalog/namespace.c
--- b/src/backend/catalog/namespace.c
*************** CopyOverrideSearchPath(OverrideSearchPat
*** 3145,3164 ****
  bool
  OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)
  {
!     /* Easiest way to do this is GetOverrideSearchPath() and compare */
!     bool        result;
!     OverrideSearchPath *cur;

!     cur = GetOverrideSearchPath(CurrentMemoryContext);
!     if (path->addCatalog == cur->addCatalog &&
!         path->addTemp == cur->addTemp &&
!         equal(path->schemas, cur->schemas))
!         result = true;
!     else
!         result = false;
!     list_free(cur->schemas);
!     pfree(cur);
!     return result;
  }

  /*
--- 3145,3178 ----
  bool
  OverrideSearchPathMatchesCurrent(OverrideSearchPath *path)
  {
!     ListCell *lc, *lcc;

!     recomputeNamespacePath();
!     lc = list_head(activeSearchPath);
!     if (path->addTemp)
!     {
!         if (lc && lfirst_oid(lc) == myTempNamespace)
!             lc = lnext(lc);
!         else
!             return false;
!     }
!     if (path->addCatalog)
!     {
!         if (lc && lfirst_oid(lc) == PG_CATALOG_NAMESPACE)
!             lc = lnext(lc);
!         else
!             return false;
!     }
!     foreach(lcc, path->schemas)
!     {
!         if (lc && lfirst_oid(lc) == lfirst_oid(lcc))
!             lc = lnext(lc);
!         else
!             return false;
!     }
!     if (lc)
!         return false;
!     return true;
  }

  /*
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index f770608..a78ff48 100644
*** a/src/include/nodes/bitmapset.h
--- b/src/include/nodes/bitmapset.h
*************** extern Bitmapset *bms_join(Bitmapset *a,
*** 89,94 ****
--- 89,95 ----

  /* support for iterating through the integer elements of a set: */
  extern int    bms_first_member(Bitmapset *a);
+ extern int    bms_next_member(const Bitmapset *a, int prevbit);

  /* support for hashtables using Bitmapsets as keys: */
  extern uint32 bms_hash_value(const Bitmapset *a);
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index c927b78..fbaf377 100644
*** a/src/backend/nodes/bitmapset.c
--- b/src/backend/nodes/bitmapset.c
*************** bms_first_member(Bitmapset *a)
*** 841,846 ****
--- 841,899 ----
  }

  /*
+  * bms_next_member - find next member of a set
+  *
+  * Returns smallest member greater than "prevbit", or -1 if there is none.
+  *
+  * This is intended as support for iterating through the members of a set.
+  * The typical pattern is
+  *
+  *            x = -1;
+  *            while ((x = bms_next_member(inputset, x)) >= 0)
+  *                process member x;
+  */
+ int
+ bms_next_member(const Bitmapset *a, int prevbit)
+ {
+     int            nwords;
+     int            wordnum;
+     bitmapword    mask;
+
+     if (a == NULL)
+         return -1;
+     nwords = a->nwords;
+     prevbit++;
+     mask = (~(bitmapword) 0) << BITNUM(prevbit);
+     for (wordnum = WORDNUM(prevbit); wordnum < nwords; wordnum++)
+     {
+         bitmapword    w = a->words[wordnum];
+
+         /* ignore bits before prevbit */
+         w &= mask;
+
+         if (w != 0)
+         {
+             int            result;
+
+             w = RIGHTMOST_ONE(w);
+
+             result = wordnum * BITS_PER_BITMAPWORD;
+             while ((w & 255) == 0)
+             {
+                 w >>= 8;
+                 result += 8;
+             }
+             result += rightmost_one_pos[w & 255];
+             return result;
+         }
+
+         /* in subsequent words, consider all bits */
+         mask = (~(bitmapword) 0);
+     }
+     return -1;
+ }
+
+ /*
   * bms_hash_value - compute a hash key for a Bitmapset
   *
   * Note: we must ensure that any two bitmapsets that are bms_equal() will
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 11cb47b..d6d414f 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** setup_param_list(PLpgSQL_execstate *esta
*** 5263,5269 ****
       */
      if (!bms_is_empty(expr->paramnos))
      {
-         Bitmapset  *tmpset;
          int            dno;

          paramLI = (ParamListInfo)
--- 5263,5268 ----
*************** setup_param_list(PLpgSQL_execstate *esta
*** 5276,5283 ****
          paramLI->numParams = estate->ndatums;

          /* Instantiate values for "safe" parameters of the expression */
!         tmpset = bms_copy(expr->paramnos);
!         while ((dno = bms_first_member(tmpset)) >= 0)
          {
              PLpgSQL_datum *datum = estate->datums[dno];

--- 5275,5282 ----
          paramLI->numParams = estate->ndatums;

          /* Instantiate values for "safe" parameters of the expression */
!         dno = -1;
!         while ((dno = bms_next_member(expr->paramnos, dno)) >= 0)
          {
              PLpgSQL_datum *datum = estate->datums[dno];

*************** setup_param_list(PLpgSQL_execstate *esta
*** 5292,5298 ****
                  prm->ptype = var->datatype->typoid;
              }
          }
-         bms_free(tmpset);

          /*
           * Set up link to active expr where the hook functions can find it.
--- 5291,5296 ----

Re: Small performance regression in 9.2 has a big impact

От
Scott Marlowe
Дата:
On Tue, Nov 25, 2014 at 8:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>>> Hmm, I don't like the trend here.  For the repeat-1000x query, I get
>>> these reported execution times:
>
>>> 8.4  360 ms
>>> 9.0  365 ms
>>> 9.1  440 ms
>>> 9.2  510 ms
>>> 9.3  550 ms
>>> 9.4  570 ms
>>> head 570 ms
>
>> I made a quick-hack patch to suppress redundant GetDefaultOpclass calls
>> in typcache.c, and found that that brought HEAD's runtime down to 460ms.
>
> I found some additional low-hanging fruit by comparing gprof call counts
> in 8.4 and HEAD:
>
> * OverrideSearchPathMatchesCurrent(), which is not there at all in 8.4
> or 9.2, accounts for a depressingly large amount of palloc/pfree traffic.
> The implementation was quick-n-dirty to begin with, but workloads
> like this one call it often enough to make it a pain point.
>
> * plpgsql's setup_param_list() contributes another large fraction of
> added palloc/pfree traffic; this is evidently caused by the temporary
> bitmapset needed for its bms_first_member() loop, which was not there
> in 8.4 but is there in 9.2.
>
> I've been able to bring HEAD's runtime down to about 415 ms with the
> collection of more-or-less quick hacks attached.  None of them are
> ready to commit but I thought I'd post them for the record.
>
> After review of all this, I think the aspect of your example that is
> causing performance issues is that there are a lot of non-inline-able
> SQL-language function calls.  That's not a case that we've put much
> thought into lately.  I doubt we are going to get all the way back to
> where 8.4 was in the short term, because I can see that there is a
> significant amount of new computation associated with collation
> management during parsing (catcache lookups driven by get_typcollation,
> assign_collations_walker, etc).  The long-term answer to that is to
> improve the SQL-language function support so that it can cache the results
> of parsing the function body; we have surely got enough plancache support
> for that now, but no one's attempted to apply it in functions.c.

Thanks so much for the work on this. We won't be applying a patch in
prod but we can definitely get a feel for the change on some test
boxes.

And if I didn't say it, Thanks to Heikki for his advice. Huge
difference there too.

This is exactly why I love using Postgres so much. The community
support. No other software package has this kind of support.