Обсуждение: Small performance regression in 9.2 has a big impact
OK so there's a simple set of tree functions we use at work. They're quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're a simple mix of sql and plpgsql functions which are at http://pastebin.com/SXTnNhd5 and which I've attached. Here's a test query: select tree_ancestor_keys('000000000000000100000001'); According to explain analyze on both 8.4 and 9.2 they have the same plan. However, on the same machine the query is about 40% slower on 9.2. Note we're not hitting the disks, or even buffers here. It's pure in memory plpsql and sql that we're running. explain analyze select tree_ancestor_keys('000000000000000100000001') from generate_series(1,1000); On 8.4 runs in about 280 to 300 ms. (you can run it once and get the same diff, it's just easier to see with the generate series forcing it to run 1000 times to kind of even out the noise.) On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And that difference seems to be there on all plpgsql and sql functions. In our application, these tree functions get called millions and millions of times a day, and a 40% performance penalty is a pretty big deal. We're already using the trick of telling the query planner that this function will return 1 row with alter function rows 1 etc. That helps a lot but it doesn't fix this underlying performance issue. Server versions are 8.4.22 (last I think) and 9.2.9. If anyone has any suggestions I'd love to hear them. -- To understand recursion, one must first understand recursion.
Вложения
On 11/25/2014 10:36 PM, Scott Marlowe wrote: > OK so there's a simple set of tree functions we use at work. They're > quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're > a simple mix of sql and plpgsql functions which are at > http://pastebin.com/SXTnNhd5 and which I've attached. > > Here's a test query: > > select tree_ancestor_keys('000000000000000100000001'); > > According to explain analyze on both 8.4 and 9.2 they have the same > plan. However, on the same machine the query is about 40% slower on > 9.2. Note we're not hitting the disks, or even buffers here. It's pure > in memory plpsql and sql that we're running. > > explain analyze select tree_ancestor_keys('000000000000000100000001') > from generate_series(1,1000); > > On 8.4 runs in about 280 to 300 ms. (you can run it once and get the > same diff, it's just easier to see with the generate series forcing it > to run 1000 times to kind of even out the noise.) > > On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And > that difference seems to be there on all plpgsql and sql functions. > > In our application, these tree functions get called millions and > millions of times a day, and a 40% performance penalty is a pretty big > deal. > > We're already using the trick of telling the query planner that this > function will return 1 row with alter function rows 1 etc. That helps > a lot but it doesn't fix this underlying performance issue. > > Server versions are 8.4.22 (last I think) and 9.2.9. > > If anyone has any suggestions I'd love to hear them. I don't know why this regressed between those versions, but looking at the functions, there's some low-hanging fruit: 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe duplicates are expected here, although I'm not 100% sure). 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses. Would be cheaper to calculate once, and pass it as argument. Put together: CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer) RETURNS SETOF bit varying LANGUAGE sql IMMUTABLE STRICT AS $_$ select tree_ancestor_key($1, $2) union all select tree_ancestor_keys($1, $2 + 1, $3) where $2 < $3 $_$; CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer) RETURNS SETOF bit varying LANGUAGE sql IMMUTABLE STRICT AS $_$ select tree_ancestor_keys($1, $2 + 1, tree_level($1)) $_$; These changes make your test query go about 2x faster on my laptop, with git master. I'm sure you could optimize the functions further, but those at least seem like fairly safe and simple changes. - Heikki
Scott Marlowe <scott.marlowe@gmail.com> writes: > OK so there's a simple set of tree functions we use at work. They're > quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're > a simple mix of sql and plpgsql functions which are at > http://pastebin.com/SXTnNhd5 and which I've attached. > Here's a test query: > select tree_ancestor_keys('000000000000000100000001'); > According to explain analyze on both 8.4 and 9.2 they have the same > plan. However, on the same machine the query is about 40% slower on > 9.2. Note we're not hitting the disks, or even buffers here. It's pure > in memory plpsql and sql that we're running. > explain analyze select tree_ancestor_keys('000000000000000100000001') > from generate_series(1,1000); Hmm, I don't like the trend here. For the repeat-1000x query, I get these reported execution times: 8.4 360 ms 9.0 365 ms 9.1 440 ms 9.2 510 ms 9.3 550 ms 9.4 570 ms head 570 ms (This is in assert-enabled builds, I'm too lazy to rebuild all the branches without that.) oprofile isn't showing any smoking gun AFAICS; it seems like things are just generally slower. Still, we've not seen similar reports elsewhere, so somehow this usage style is getting penalized in newer branches compared to other cases. If it were all on 9.2's head I'd be suspicious of the plancache mechanism, but evidently that's not it, or at least not the whole story. HEAD profile entries above 1%: samples % image name symbol name 7573 7.2448 postgres AllocSetAlloc 6059 5.7964 postgres SearchCatCache 3533 3.3799 postgres AllocSetCheck 3420 3.2718 postgres base_yyparse 2104 2.0128 postgres AllocSetFree 1613 1.5431 postgres palloc 1523 1.4570 postgres ExecMakeFunctionResultNoSets 1313 1.2561 postgres check_list_invariants 1241 1.1872 postgres palloc0 1213 1.1604 postgres pfree 1157 1.1069 postgres SPI_plan_get_cached_plan 1136 1.0868 postgres GetPrivateRefCountEntry 1098 1.0504 postgres sentinel_ok 1085 1.0380 postgres hash_any 1057 1.0112 postgres core_yylex 1053 1.0074 postgres expression_tree_walker 1046 1.0007 postgres hash_search_with_hash_value 8.4 profile entries above 1%: samples % image name symbol name 11782 10.3680 postgres AllocSetAlloc 7369 6.4846 postgres AllocSetCheck 5623 4.9482 postgres base_yyparse 4166 3.6660 postgres SearchCatCache 2671 2.3504 postgres ExecMakeFunctionResultNoSets 2060 1.8128 postgres MemoryContextAllocZeroAligned 2030 1.7864 postgres MemoryContextAlloc 1679 1.4775 postgres ExecEvalParam 1607 1.4141 postgres base_yylex 1389 1.2223 postgres check_list_invariants 1348 1.1862 postgres RevalidateCachedPlan 1341 1.1801 postgres AcquireExecutorLocks 1266 1.1141 postgres MemoryContextCreate 1256 1.1053 postgres hash_any 1255 1.1044 postgres expression_tree_walker 1202 1.0577 postgres expression_tree_mutator 1191 1.0481 postgres AllocSetReset regards, tom lane
On Tue, Nov 25, 2014 at 1:58 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 11/25/2014 10:36 PM, Scott Marlowe wrote: >> >> OK so there's a simple set of tree functions we use at work. They're >> quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're >> a simple mix of sql and plpgsql functions which are at >> http://pastebin.com/SXTnNhd5 and which I've attached. >> >> Here's a test query: >> >> select tree_ancestor_keys('000000000000000100000001'); >> >> According to explain analyze on both 8.4 and 9.2 they have the same >> plan. However, on the same machine the query is about 40% slower on >> 9.2. Note we're not hitting the disks, or even buffers here. It's pure >> in memory plpsql and sql that we're running. >> >> explain analyze select tree_ancestor_keys('000000000000000100000001') >> from generate_series(1,1000); >> >> On 8.4 runs in about 280 to 300 ms. (you can run it once and get the >> same diff, it's just easier to see with the generate series forcing it >> to run 1000 times to kind of even out the noise.) >> >> On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And >> that difference seems to be there on all plpgsql and sql functions. >> >> In our application, these tree functions get called millions and >> millions of times a day, and a 40% performance penalty is a pretty big >> deal. >> >> We're already using the trick of telling the query planner that this >> function will return 1 row with alter function rows 1 etc. That helps >> a lot but it doesn't fix this underlying performance issue. >> >> Server versions are 8.4.22 (last I think) and 9.2.9. >> >> If anyone has any suggestions I'd love to hear them. > > > I don't know why this regressed between those versions, but looking at the > functions, there's some low-hanging fruit: > > 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe > duplicates are expected here, although I'm not 100% sure). > > 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses. > Would be cheaper to calculate once, and pass it as argument. > > Put together: > > CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer) RETURNS > SETOF bit varying > LANGUAGE sql IMMUTABLE STRICT > AS $_$ > select tree_ancestor_key($1, $2) > union all > select tree_ancestor_keys($1, $2 + 1, $3) > where $2 < $3 > $_$; > > CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer) RETURNS > SETOF bit varying > LANGUAGE sql IMMUTABLE STRICT > AS $_$ > select tree_ancestor_keys($1, $2 + 1, tree_level($1)) > $_$; > > These changes make your test query go about 2x faster on my laptop, with git > master. I'm sure you could optimize the functions further, but those at > least seem like fairly safe and simple changes. Wow that made a huge difference. About a 50% increase across the board. Sadly, 9.2 is still way behind 8.4 (see Tom's email) Here's the results of 10 runs, 9.2 old functions, 9.2 new functions, 8.4 old functions, 8.4 new functions: 402.454 ms 217.718 ms 283.289 ms 167.108 ms 390.828 ms 219.644 ms 282.994 ms 165.524 ms 397.987 ms 216.864 ms 289.053 ms 170.821 ms 391.670 ms 220.943 ms 296.410 ms 164.190 ms 437.099 ms 233.360 ms 284.279 ms 183.919 ms 473.945 ms 291.199 ms 347.916 ms 168.300 ms 453.974 ms 231.350 ms 367.517 ms 158.717 ms 377.221 ms 226.697 ms 297.255 ms 164.196 ms 396.812 ms 262.638 ms 300.073 ms 161.325 ms 436.822 ms 227.489 ms 292.553 ms 179.194 ms 405.929 ms 233.461 ms 267.355 ms 162.847 ms so about 400 versus about 220, and about 290 versus about 165 or so.
I wrote: > Scott Marlowe <scott.marlowe@gmail.com> writes: >> OK so there's a simple set of tree functions we use at work. They're >> quite fast in 8.4 and they've gotten about 40% slower in 9.2. > Hmm, I don't like the trend here. For the repeat-1000x query, I get > these reported execution times: > 8.4 360 ms > 9.0 365 ms > 9.1 440 ms > 9.2 510 ms > 9.3 550 ms > 9.4 570 ms > head 570 ms I found part of the issue: you're doing a lot of UNIONs on varbit columns, and every time we parse one of those, get_sort_group_operators asks the typcache about hash operators for the type. typcache finds out that varbit has no default hash opclass ... but *it doesn't cache negative answers*. So that means a physical lookup in pg_opclass every time :-(. That is actually the only bufmgr traffic induced by this test query, once the catalog caches are loaded. Versions before 9.1 don't have that hit because they didn't consider hashing for UNIONs. I made a quick-hack patch to suppress redundant GetDefaultOpclass calls in typcache.c, and found that that brought HEAD's runtime down to 460ms. I don't think I'd want to commit this in its current form, but with some additions to flush the cache after pg_opclass updates it would be a credible improvement. So that probably explains the jump from 9.0 to 9.1. Don't know yet about the other lossage. regards, tom lane
Scott Marlowe <scott.marlowe@gmail.com> writes: > On Tue, Nov 25, 2014 at 1:58 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> I don't know why this regressed between those versions, but looking at the >> functions, there's some low-hanging fruit: >> >> 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe >> duplicates are expected here, although I'm not 100% sure). >> >> 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses. >> Would be cheaper to calculate once, and pass it as argument. > Wow that made a huge difference. About a 50% increase across the > board. Sadly, 9.2 is still way behind 8.4 (see Tom's email) Switching from UNION to UNION ALL would dodge the varbit hash-opclass caching issue, I think. But there's still something else going on. regards, tom lane
I wrote: >> Hmm, I don't like the trend here. For the repeat-1000x query, I get >> these reported execution times: >> 8.4 360 ms >> 9.0 365 ms >> 9.1 440 ms >> 9.2 510 ms >> 9.3 550 ms >> 9.4 570 ms >> head 570 ms > I made a quick-hack patch to suppress redundant GetDefaultOpclass calls > in typcache.c, and found that that brought HEAD's runtime down to 460ms. I found some additional low-hanging fruit by comparing gprof call counts in 8.4 and HEAD: * OverrideSearchPathMatchesCurrent(), which is not there at all in 8.4 or 9.2, accounts for a depressingly large amount of palloc/pfree traffic. The implementation was quick-n-dirty to begin with, but workloads like this one call it often enough to make it a pain point. * plpgsql's setup_param_list() contributes another large fraction of added palloc/pfree traffic; this is evidently caused by the temporary bitmapset needed for its bms_first_member() loop, which was not there in 8.4 but is there in 9.2. I've been able to bring HEAD's runtime down to about 415 ms with the collection of more-or-less quick hacks attached. None of them are ready to commit but I thought I'd post them for the record. After review of all this, I think the aspect of your example that is causing performance issues is that there are a lot of non-inline-able SQL-language function calls. That's not a case that we've put much thought into lately. I doubt we are going to get all the way back to where 8.4 was in the short term, because I can see that there is a significant amount of new computation associated with collation management during parsing (catcache lookups driven by get_typcollation, assign_collations_walker, etc). The long-term answer to that is to improve the SQL-language function support so that it can cache the results of parsing the function body; we have surely got enough plancache support for that now, but no one's attempted to apply it in functions.c. regards, tom lane diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index 8c6c7fc..bf80563 100644 *** a/src/backend/utils/cache/typcache.c --- b/src/backend/utils/cache/typcache.c *************** static HTAB *TypeCacheHash = NULL; *** 77,82 **** --- 77,84 ---- #define TCFLAGS_CHECKED_FIELD_PROPERTIES 0x0010 #define TCFLAGS_HAVE_FIELD_EQUALITY 0x0020 #define TCFLAGS_HAVE_FIELD_COMPARE 0x0040 + #define TCFLAGS_CHECKED_BTREE_OPCLASS 0x0080 + #define TCFLAGS_CHECKED_HASH_OPCLASS 0x0100 /* Private information to support comparisons of enum values */ typedef struct *************** lookup_type_cache(Oid type_id, int flags *** 227,249 **** { Oid opclass; ! opclass = GetDefaultOpClass(type_id, BTREE_AM_OID); ! if (OidIsValid(opclass)) { ! typentry->btree_opf = get_opclass_family(opclass); ! typentry->btree_opintype = get_opclass_input_type(opclass); } /* If no btree opclass, we force lookup of the hash opclass */ if (typentry->btree_opf == InvalidOid) { if (typentry->hash_opf == InvalidOid) { ! opclass = GetDefaultOpClass(type_id, HASH_AM_OID); ! if (OidIsValid(opclass)) { ! typentry->hash_opf = get_opclass_family(opclass); ! typentry->hash_opintype = get_opclass_input_type(opclass); } } } else --- 229,259 ---- { Oid opclass; ! if (!(typentry->flags & TCFLAGS_CHECKED_BTREE_OPCLASS)) { ! opclass = GetDefaultOpClass(type_id, BTREE_AM_OID); ! if (OidIsValid(opclass)) ! { ! typentry->btree_opf = get_opclass_family(opclass); ! typentry->btree_opintype = get_opclass_input_type(opclass); ! } ! typentry->flags |= TCFLAGS_CHECKED_BTREE_OPCLASS; } /* If no btree opclass, we force lookup of the hash opclass */ if (typentry->btree_opf == InvalidOid) { if (typentry->hash_opf == InvalidOid) { ! if (!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS)) { ! opclass = GetDefaultOpClass(type_id, HASH_AM_OID); ! if (OidIsValid(opclass)) ! { ! typentry->hash_opf = get_opclass_family(opclass); ! typentry->hash_opintype = get_opclass_input_type(opclass); ! } } + typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS; } } else *************** lookup_type_cache(Oid type_id, int flags *** 268,278 **** { Oid opclass; ! opclass = GetDefaultOpClass(type_id, HASH_AM_OID); ! if (OidIsValid(opclass)) { ! typentry->hash_opf = get_opclass_family(opclass); ! typentry->hash_opintype = get_opclass_input_type(opclass); } } --- 278,292 ---- { Oid opclass; ! if (!(typentry->flags & TCFLAGS_CHECKED_HASH_OPCLASS)) { ! opclass = GetDefaultOpClass(type_id, HASH_AM_OID); ! if (OidIsValid(opclass)) ! { ! typentry->hash_opf = get_opclass_family(opclass); ! typentry->hash_opintype = get_opclass_input_type(opclass); ! } ! typentry->flags |= TCFLAGS_CHECKED_HASH_OPCLASS; } } diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c index 911f015..ca93307 100644 *** a/src/backend/catalog/namespace.c --- b/src/backend/catalog/namespace.c *************** CopyOverrideSearchPath(OverrideSearchPat *** 3145,3164 **** bool OverrideSearchPathMatchesCurrent(OverrideSearchPath *path) { ! /* Easiest way to do this is GetOverrideSearchPath() and compare */ ! bool result; ! OverrideSearchPath *cur; ! cur = GetOverrideSearchPath(CurrentMemoryContext); ! if (path->addCatalog == cur->addCatalog && ! path->addTemp == cur->addTemp && ! equal(path->schemas, cur->schemas)) ! result = true; ! else ! result = false; ! list_free(cur->schemas); ! pfree(cur); ! return result; } /* --- 3145,3178 ---- bool OverrideSearchPathMatchesCurrent(OverrideSearchPath *path) { ! ListCell *lc, *lcc; ! recomputeNamespacePath(); ! lc = list_head(activeSearchPath); ! if (path->addTemp) ! { ! if (lc && lfirst_oid(lc) == myTempNamespace) ! lc = lnext(lc); ! else ! return false; ! } ! if (path->addCatalog) ! { ! if (lc && lfirst_oid(lc) == PG_CATALOG_NAMESPACE) ! lc = lnext(lc); ! else ! return false; ! } ! foreach(lcc, path->schemas) ! { ! if (lc && lfirst_oid(lc) == lfirst_oid(lcc)) ! lc = lnext(lc); ! else ! return false; ! } ! if (lc) ! return false; ! return true; } /* diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h index f770608..a78ff48 100644 *** a/src/include/nodes/bitmapset.h --- b/src/include/nodes/bitmapset.h *************** extern Bitmapset *bms_join(Bitmapset *a, *** 89,94 **** --- 89,95 ---- /* support for iterating through the integer elements of a set: */ extern int bms_first_member(Bitmapset *a); + extern int bms_next_member(const Bitmapset *a, int prevbit); /* support for hashtables using Bitmapsets as keys: */ extern uint32 bms_hash_value(const Bitmapset *a); diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c index c927b78..fbaf377 100644 *** a/src/backend/nodes/bitmapset.c --- b/src/backend/nodes/bitmapset.c *************** bms_first_member(Bitmapset *a) *** 841,846 **** --- 841,899 ---- } /* + * bms_next_member - find next member of a set + * + * Returns smallest member greater than "prevbit", or -1 if there is none. + * + * This is intended as support for iterating through the members of a set. + * The typical pattern is + * + * x = -1; + * while ((x = bms_next_member(inputset, x)) >= 0) + * process member x; + */ + int + bms_next_member(const Bitmapset *a, int prevbit) + { + int nwords; + int wordnum; + bitmapword mask; + + if (a == NULL) + return -1; + nwords = a->nwords; + prevbit++; + mask = (~(bitmapword) 0) << BITNUM(prevbit); + for (wordnum = WORDNUM(prevbit); wordnum < nwords; wordnum++) + { + bitmapword w = a->words[wordnum]; + + /* ignore bits before prevbit */ + w &= mask; + + if (w != 0) + { + int result; + + w = RIGHTMOST_ONE(w); + + result = wordnum * BITS_PER_BITMAPWORD; + while ((w & 255) == 0) + { + w >>= 8; + result += 8; + } + result += rightmost_one_pos[w & 255]; + return result; + } + + /* in subsequent words, consider all bits */ + mask = (~(bitmapword) 0); + } + return -1; + } + + /* * bms_hash_value - compute a hash key for a Bitmapset * * Note: we must ensure that any two bitmapsets that are bms_equal() will diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 11cb47b..d6d414f 100644 *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *************** setup_param_list(PLpgSQL_execstate *esta *** 5263,5269 **** */ if (!bms_is_empty(expr->paramnos)) { - Bitmapset *tmpset; int dno; paramLI = (ParamListInfo) --- 5263,5268 ---- *************** setup_param_list(PLpgSQL_execstate *esta *** 5276,5283 **** paramLI->numParams = estate->ndatums; /* Instantiate values for "safe" parameters of the expression */ ! tmpset = bms_copy(expr->paramnos); ! while ((dno = bms_first_member(tmpset)) >= 0) { PLpgSQL_datum *datum = estate->datums[dno]; --- 5275,5282 ---- paramLI->numParams = estate->ndatums; /* Instantiate values for "safe" parameters of the expression */ ! dno = -1; ! while ((dno = bms_next_member(expr->paramnos, dno)) >= 0) { PLpgSQL_datum *datum = estate->datums[dno]; *************** setup_param_list(PLpgSQL_execstate *esta *** 5292,5298 **** prm->ptype = var->datatype->typoid; } } - bms_free(tmpset); /* * Set up link to active expr where the hook functions can find it. --- 5291,5296 ----
On Tue, Nov 25, 2014 at 8:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >>> Hmm, I don't like the trend here. For the repeat-1000x query, I get >>> these reported execution times: > >>> 8.4 360 ms >>> 9.0 365 ms >>> 9.1 440 ms >>> 9.2 510 ms >>> 9.3 550 ms >>> 9.4 570 ms >>> head 570 ms > >> I made a quick-hack patch to suppress redundant GetDefaultOpclass calls >> in typcache.c, and found that that brought HEAD's runtime down to 460ms. > > I found some additional low-hanging fruit by comparing gprof call counts > in 8.4 and HEAD: > > * OverrideSearchPathMatchesCurrent(), which is not there at all in 8.4 > or 9.2, accounts for a depressingly large amount of palloc/pfree traffic. > The implementation was quick-n-dirty to begin with, but workloads > like this one call it often enough to make it a pain point. > > * plpgsql's setup_param_list() contributes another large fraction of > added palloc/pfree traffic; this is evidently caused by the temporary > bitmapset needed for its bms_first_member() loop, which was not there > in 8.4 but is there in 9.2. > > I've been able to bring HEAD's runtime down to about 415 ms with the > collection of more-or-less quick hacks attached. None of them are > ready to commit but I thought I'd post them for the record. > > After review of all this, I think the aspect of your example that is > causing performance issues is that there are a lot of non-inline-able > SQL-language function calls. That's not a case that we've put much > thought into lately. I doubt we are going to get all the way back to > where 8.4 was in the short term, because I can see that there is a > significant amount of new computation associated with collation > management during parsing (catcache lookups driven by get_typcollation, > assign_collations_walker, etc). The long-term answer to that is to > improve the SQL-language function support so that it can cache the results > of parsing the function body; we have surely got enough plancache support > for that now, but no one's attempted to apply it in functions.c. Thanks so much for the work on this. We won't be applying a patch in prod but we can definitely get a feel for the change on some test boxes. And if I didn't say it, Thanks to Heikki for his advice. Huge difference there too. This is exactly why I love using Postgres so much. The community support. No other software package has this kind of support.