Re: Protect syscache from bloating with negative cache entries

From: Kyotaro Horiguchi
Subject: Re: Protect syscache from bloating with negative cache entries
Date:
Msg-id: 20190701.160259.247597303.horikyota.ntt@gmail.com
In reply to: Re: Protect syscache from bloating with negative cache entries  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses: Re: Protect syscache from bloating with negative cache entries  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
Hello, 

my_gripe> But, still fluctuates by around 5%..
my_gripe> 
my_gripe> If this level of the degradation is still not acceptable, that
my_gripe> means that nothing can be inserted in the code path and the new
my_gripe> code path should be isolated from existing code by using indirect
my_gripe> call.

Finally, after some struggling, I think I have managed to measure
the impact on performance precisely and reliably. Starting every
build from "make distclean", then removing everything in $TARGET
before installation, makes things stable enough. (I don't think it
should be necessary, but I didn't investigate the cause..)

I measured the time per call by directly calling SearchSysCache3()
many times. It showed that the patch causes around 0.1 microseconds
of degradation per call. (The function overall took about 6.9
microseconds on average.)

Next, I counted how many times SearchSysCache is called during
planning, using as an example a query on a partitioned table
having 3000 columns and 1000 partitions.

  explain analyze select sum(c0000) from test.p;

The planner made 6020608 syscache calls while planning, and the
overall planning time was 8641 ms. (Execution time was 48 ms.)
6020608 calls times 0.1 us is about 602 ms of degradation, so the
estimate is roughly a 7% degradation in planning time (602 / 8641).
The degradation comes from really just two successive instructions,
an ADD and a conditional move (CMOV). This leads to the conclusion
that the existing code path, as is, doesn't have room for any
additional code.


So I looked for room for at least one branch, and found it (on
gcc 7.3.1/CentOS7/x64). Interestingly, de-inlining
SearchCatCacheInternal gave a performance gain of about 3%, and
further inlining of CatalogCacheComputeHashValue() gave another
gain of about 3%. That gain leaves room to add a branch in
SearchCatCacheInternal.

I also tried indirect calls, but the degradation overwhelmed the
gain, so I chose branching rather than indirect calls. I didn't
investigate why that is.
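
The resulting shape of SearchCatCacheInternal in 0004 is roughly the
following (a simplified sketch only; the real code is in the attached
patches):

  if (likely(catalog_cache_prune_min_age < 0))
  {
      /* existing search loop, unchanged */
  }
  else
  {
      /*
       * duplicate of the loop that additionally maintains
       * naccess/lastaccess so that CatCacheCleanupOldEntries()
       * can prune entries that haven't been used recently
       */
  }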


The following is the result. The binaries are built with the same
configuration using -O2.

binary means
  master      : master HEAD.
  patched_off : patched, but pruning disabled (catalog_cache_prune_min_age = -1).
  patched_on  : patched, with pruning enabled.
                (catalog_cache_prune_min_age = "300s" for bench 1, "1s" for bench 2, "0" for bench 3)

bench:
  1: corresponds to catcachebench(1); fetching STATRELATTINH 3000
     * 1000 times, generating new cache entries. (Massive cache
     creation.)
     Pruning doesn't happen while running this.

  2: catcachebench(2); 60000 rounds of cache access on 1000
     STATRELATTINH entries. (Frequent cache reference.)
     Pruning doesn't happen while running this.

  3: catcachebench(3); fetching 1000(tbls) * 3000(cols)
     STATRELATTINH entries. The catcache clock advances at an
     interval of 100(tbls) * 3000(cols) accesses, and pruning
     happens.

     While running catcachebench(3) once, pruning happens 28
     times; most of the time 202202 entries are removed, and the
     total number of entries was limited to 524289. (The
     systable has 3000 * 1001 = 3003000 tuples.)

iter: Number of iterations. Time (ms) and stddev are calculated
     over the iterations.


    binary   | bench | iter  |  time ms | stddev
-------------+-------+-------+----------+--------
 master      | 1     |    10 |  8150.30 |  12.96
 master      | 2     |    10 |  4002.88 |  16.18
 master      | 3     |    10 |  9065.06 |  11.46
-------------+-------+-------+----------+--------
 patched_off | 1     |    10 |  8090.95 |   9.95
 patched_off | 2     |    10 |  3984.67 |  12.33
 patched_off | 3     |    10 |  9050.46 |   4.64
-------------+-------+-------+----------+--------
 patched_on  | 1     |    10 |  8158.95 |   6.29
 patched_on  | 2     |    10 |  4023.72 |  10.41
 patched_on  | 3     |    10 | 16532.66 |  18.39

patched_off is slightly faster than master, and patched_on is
generally a bit slower. Although patched_on/3 seems to take too
long, the extra time comes from increased catalog table access in
exchange for the memory savings. (That is, it is expected
behavior.) I ran it several times and most of them showed the same
tendency.

As a side effect, once the branch is added, the shared syscache
being discussed in a neighbour thread can be inserted at the same
point without further impact on the existing code path.


===
The benchmark script is used as follows:

- Create many tables (3000, as an example) in the "test" schema. I
  created a partitioned table with 3000 children. (See the schema
  sketch after these steps.)

- The tables have many columns, 1000 in my case.

- Run the following commands.

  =# select catcachebench(0);  -- warm up the system catalogs.
  =# set catalog_cache_prune_min_age = <value>; -- as required
  =# select catcachebench(n);  -- n is 1, 2 or 3, the "bench" number above.
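
  For reference, a schema of that shape can be generated with
  something like the following (a sketch only, not the exact script
  used for the numbers above; adjust the table and column counts as
  needed):

    CREATE SCHEMA test;

    -- parent table with many columns (c0000 .. c0999)
    SELECT format('CREATE TABLE test.p (%s) PARTITION BY RANGE (c0000)',
                  string_agg(format('c%s int', lpad(i::text, 4, '0')), ', ' ORDER BY i))
      FROM generate_series(0, 999) i
    \gexec

    -- many partitions
    SELECT format('CREATE TABLE test.c%s PARTITION OF test.p FOR VALUES FROM (%s) TO (%s)',
                  i, i, i + 1)
      FROM generate_series(0, 2999) i
    \gexec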


  The above results were taken with the following query.

  =# select 'patched_on', '3', count(a), avg(a)::numeric(10,2), stddev(a)::numeric(10,2)
       from (select catcachebench(3) from generate_series(1, 10)) as a(a);
 

====
The attached patches are:

0001-Adjust-inlining-of-some-functions.patch:

 Changes the inlining of two functions, SearchCatCacheInternal and
 CatalogCacheComputeHashValue.

0002-Benchmark-extension-and-required-core-change.patch:

 A micro benchmark of SearchSysCache3() plus the core-side tweaks it
 needs; functionality-wise this is outside the patch set proper.
 Works with 0001 but not with 0004 or later; 0003 adjusts for that.

0003-Adjust-catcachebench-for-later-patches.patch

 Adjusts 0002 so that it benchmarks 0004, the body of this patch
 set. Breaks consistency until 0004 is applied.

0004-Catcache-pruning-feature.patch

 The feature patch. It intentionally leaves the indentation of an
 existing code block in SearchCatCacheInternal unchanged to keep the
 patch small; the indentation is adjusted in the next patch, 0005.

0005-Adjust-indentation-of-SearchCatCacheInternal.patch

 Adjusts the indentation left unchanged by 0004.


0001+0004+0005 is the final shape of the patch set; 0002+0003 are
only for benchmarking.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 927ce9035e13240378c7c332610bdde9377c2d7b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 28 Jun 2019 16:29:52 +0900
Subject: [PATCH 1/5] Adjust inlining of some functions

The SearchCatCacheInternal code path is quite short and hot, so it
cannot accept additional cycles. However, changing the inline
attributes of SearchCatCacheInternal and CatalogCacheComputeHashValue
makes SearchCatCacheN faster by about 6%. This makes room for an extra
branch that serves as the door to other implementations of catcache.
---
 src/backend/utils/cache/catcache.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 00def27881..8fc067ce31 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -63,10 +63,10 @@
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
-static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
-                                               int nkeys,
-                                               Datum v1, Datum v2,
-                                               Datum v3, Datum v4);
+static HeapTuple SearchCatCacheInternal(CatCache *cache,
+                                        int nkeys,
+                                        Datum v1, Datum v2,
+                                        Datum v3, Datum v4);
 
 static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
                                                 int nkeys,
@@ -75,8 +75,9 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
                                                 Datum v1, Datum v2,
                                                 Datum v3, Datum v4);
 
-static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
-                                           Datum v1, Datum v2, Datum v3, Datum v4);
+static inline uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
+                                                  Datum v1, Datum v2,
+                                                  Datum v3, Datum v4);
 static uint32 CatalogCacheComputeTupleHashValue(CatCache *cache, int nkeys,
                                                 HeapTuple tuple);
 static inline bool CatalogCacheCompareTuple(const CatCache *cache, int nkeys,
@@ -266,7 +267,7 @@ GetCCHashEqFuncs(Oid keytype, CCHashFN *hashfunc, RegProcedure *eqfunc, CCFastEq
  *
  * Compute the hash value associated with a given set of lookup keys
  */
-static uint32
+static inline uint32
 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
                              Datum v1, Datum v2, Datum v3, Datum v4)
 {
@@ -1194,7 +1195,7 @@ SearchCatCache4(CatCache *cache,
 /*
  * Work-horse for SearchCatCache/SearchCatCacheN.
  */
-static inline HeapTuple
+static HeapTuple
 SearchCatCacheInternal(CatCache *cache,
                        int nkeys,
                        Datum v1,
-- 
2.16.3

From f0f5833ddfd0aac934cc7b5ded93541810c486d3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 28 Jun 2019 17:03:07 +0900
Subject: [PATCH 2/5] Benchmark extension and required core change

Micro benchmark extension for SearchSysCache and required core-side
code.
---
 contrib/catcachebench/Makefile               |  17 ++
 contrib/catcachebench/catcachebench--0.0.sql |   9 +
 contrib/catcachebench/catcachebench.c        | 281 +++++++++++++++++++++++++++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  13 ++
 5 files changed, 326 insertions(+)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644 contrib/catcachebench/catcachebench.control

diff --git a/contrib/catcachebench/Makefile b/contrib/catcachebench/Makefile
new file mode 100644
index 0000000000..0478818b25
--- /dev/null
+++ b/contrib/catcachebench/Makefile
@@ -0,0 +1,17 @@
+MODULE_big = catcachebench
+OBJS = catcachebench.o
+
+EXTENSION = catcachebench
+DATA = catcachebench--0.0.sql
+PGFILEDESC = "catcachebench - benchmark for catcache pruning feature"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/catcachebench
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/catcachebench/catcachebench--0.0.sql b/contrib/catcachebench/catcachebench--0.0.sql
new file mode 100644
index 0000000000..e091baaaa7
--- /dev/null
+++ b/contrib/catcachebench/catcachebench--0.0.sql
@@ -0,0 +1,9 @@
+/* contrib/catcachebench/catcachebench--0.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION catcachebench" to load this file. \quit
+
+CREATE FUNCTION catcachebench(IN type int)
+RETURNS double precision
+AS 'MODULE_PATHNAME', 'catcachebench'
+LANGUAGE C STRICT VOLATILE;
diff --git a/contrib/catcachebench/catcachebench.c b/contrib/catcachebench/catcachebench.c
new file mode 100644
index 0000000000..0cebbbde4f
--- /dev/null
+++ b/contrib/catcachebench/catcachebench.c
@@ -0,0 +1,281 @@
+/*
+ * catcachebench: test code for cache pruning feature
+ */
+#include "postgres.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_statistic.h"
+#include "executor/spi.h"
+#include "libpq/pqsignal.h"
+#include "utils/catcache.h"
+#include "utils/syscache.h"
+#include "utils/timestamp.h"
+
+Oid        tableoids[10000];
+int        ntables = 0;
+int16    attnums[1000];
+int        natts = 0;
+
+PG_MODULE_MAGIC;
+
+double catcachebench1(void);
+double catcachebench2(void);
+double catcachebench3(void);
+void collectinfo(void);
+void catcachewarmup(void);
+
+PG_FUNCTION_INFO_V1(catcachebench);
+
+Datum
+catcachebench(PG_FUNCTION_ARGS)
+{
+    int        testtype = PG_GETARG_INT32(0);
+    double    ms;
+    extern bool _catcache_shrink_buckets;
+
+    collectinfo();
+
+    /* flush the catalog -- safe? don't mind. */
+    
+    _catcache_shrink_buckets = true;
+    CatalogCacheFlushCatalog(StatisticRelationId);
+    _catcache_shrink_buckets = false;
+
+    switch (testtype)
+    {
+    case 0:
+        catcachewarmup(); /* prewarm of syscatalog */
+        PG_RETURN_NULL();
+    case 1:
+        ms = catcachebench1(); break;
+    case 2:
+        ms = catcachebench2(); break;
+    case 3:
+        ms = catcachebench3(); break;
+    default:
+        elog(ERROR, "Invalid test type: %d", testtype);
+    }
+
+    PG_RETURN_DATUM(Float8GetDatum(ms));
+}
+
+/*
+ * fetch all attribute entries of all tables.
+ */
+double
+catcachebench1(void)
+{
+    int t, a;
+    instr_time    start,
+                duration;
+
+    PG_SETMASK(&BlockSig);
+    INSTR_TIME_SET_CURRENT(start);
+    for (t = 0 ; t < ntables ; t++)
+    {
+        for (a = 0 ; a < natts ; a++)
+        {
+            HeapTuple tup;
+
+            tup = SearchSysCache3(STATRELATTINH,
+                                  ObjectIdGetDatum(tableoids[t]),
+                                  Int16GetDatum(attnums[a]),
+                                  BoolGetDatum(false));
+            /* should be null, but.. */
+            if (HeapTupleIsValid(tup))
+                ReleaseSysCache(tup);
+        }
+    }
+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    PG_SETMASK(&UnBlockSig);
+
+    return INSTR_TIME_GET_MILLISEC(duration);
+};
+
+/*
+ * fetch all attribute entries of a table 60000 times.
+ */
+double
+catcachebench2(void)
+{
+    const int clock_step = 100;
+    int t, a;
+    int ct = clock_step;
+    instr_time    start,
+                duration;
+
+    PG_SETMASK(&BlockSig);
+    INSTR_TIME_SET_CURRENT(start);
+    for (t = 0 ; t < 60000 ; t++)
+    {
+
+        /*
+         * catcacheclock is normally updated with the transaction timestamp,
+         * so it needs to be updated by other means for this test to work.
+         * Here I chose to update the clock every 100 table scans.
+         */
+        if (--ct < 0)
+        {
+            // We don't have it yet.
+            //SetCatCacheClock(GetCurrentTimestamp());
+            GetCurrentTimestamp();
+            ct = clock_step;
+        }
+        for (a = 0 ; a < natts ; a++)
+        {
+            HeapTuple tup;
+
+            tup = SearchSysCache3(STATRELATTINH,
+                                  ObjectIdGetDatum(tableoids[0]),
+                                  Int16GetDatum(attnums[a]),
+                                  BoolGetDatum(false));
+            /* should be null, but.. */
+            if (HeapTupleIsValid(tup))
+                ReleaseSysCache(tup);
+        }
+    }
+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    PG_SETMASK(&UnBlockSig);
+
+    return INSTR_TIME_GET_MILLISEC(duration);
+};
+
+/*
+ * fetch all attribute entries of all tables twice, with expiration
+ * happening.
+ */
+double
+catcachebench3(void)
+{
+    const int clock_step = 100;
+    int i, t, a;
+    instr_time    start,
+                duration;
+
+    PG_SETMASK(&BlockSig);
+    INSTR_TIME_SET_CURRENT(start);
+    for (i = 0 ; i < 2 ; i++)
+    {
+        int ct = clock_step;
+
+        for (t = 0 ; t < ntables ; t++)
+        {
+            /*
+             * catcacheclock is normally updated with the transaction timestamp,
+             * so it needs to be updated by other means for this test to work.
+             * Here I chose to update the clock every 100 table scans.
+             */
+            if (--ct < 0)
+            {
+                // We don't have it yet.
+                //SetCatCacheClock(GetCurrentTimestamp());
+                GetCurrentTimestamp();
+                ct = clock_step;
+            }
+            for (a = 0 ; a < natts ; a++)
+            {
+                HeapTuple tup;
+
+                tup = SearchSysCache3(STATRELATTINH,
+                                      ObjectIdGetDatum(tableoids[t]),
+                                      Int16GetDatum(attnums[a]),
+                                      BoolGetDatum(false));
+                /* should be null, but.. */
+                if (HeapTupleIsValid(tup))
+                    ReleaseSysCache(tup);
+            }
+        }
+    }
+    INSTR_TIME_SET_CURRENT(duration);
+    INSTR_TIME_SUBTRACT(duration, start);
+    PG_SETMASK(&UnBlockSig);
+
+    return INSTR_TIME_GET_MILLISEC(duration);
+};
+
+void
+catcachewarmup(void)
+{
+    int t, a;
+
+    /* load up catalog tables */
+    for (t = 0 ; t < ntables ; t++)
+    {
+        for (a = 0 ; a < natts ; a++)
+        {
+            HeapTuple tup;
+
+            tup = SearchSysCache3(STATRELATTINH,
+                                  ObjectIdGetDatum(tableoids[t]),
+                                  Int16GetDatum(attnums[a]),
+                                  BoolGetDatum(false));
+            /* should be null, but.. */
+            if (HeapTupleIsValid(tup))
+                ReleaseSysCache(tup);
+        }
+    }
+}
+
+void
+collectinfo(void)
+{
+    int ret;
+    Datum    values[10000];
+    bool    nulls[10000];
+    Oid        types0[] = {OIDOID};
+    int i;
+
+    ntables = 0;
+    natts = 0;
+
+    SPI_connect();
+    /* collect target tables */
+    ret = SPI_execute("select oid from pg_class where relnamespace = (select oid from pg_namespace where nspname = \'test\')",
+                      true, 0);
+    if (ret != SPI_OK_SELECT)
+        elog(ERROR, "Failed 1");
+    if (SPI_processed == 0)
+        elog(ERROR, "no relation found in schema \"test\"");
+    if (SPI_processed > 10000)
+        elog(ERROR, "too many relation found in schema \"test\"");
+
+    for (i = 0 ; i < SPI_processed ; i++)
+    {
+        heap_deform_tuple(SPI_tuptable->vals[i], SPI_tuptable->tupdesc,
+                          values, nulls);
+        if (nulls[0])
+            elog(ERROR, "Failed 2");
+
+        tableoids[ntables++] = DatumGetObjectId(values[0]);
+    }
+    SPI_finish();
+    elog(DEBUG1, "%d tables found", ntables);
+
+    values[0] = ObjectIdGetDatum(tableoids[0]);
+    nulls[0] = false;
+    SPI_connect();
+    ret = SPI_execute_with_args("select attnum from pg_attribute where attrelid = (select oid from pg_class where oid = $1)",
+                                1, types0, values, NULL, true, 0);
+    if (SPI_processed == 0)
+        elog(ERROR, "no attribute found in table %d", tableoids[0]);
+    if (SPI_processed > 10000)
+        elog(ERROR, "too many relation found in table %d", tableoids[0]);
+    
+    /* collect target attributes. assuming all tables have the same attnums */
+    for (i = 0 ; i < SPI_processed ; i++)
+    {
+        int16 attnum;
+
+        heap_deform_tuple(SPI_tuptable->vals[i], SPI_tuptable->tupdesc,
+                          values, nulls);
+        if (nulls[0])
+            elog(ERROR, "Failed 3");
+        attnum = DatumGetInt16(values[0]);
+
+        if (attnum > 0)
+            attnums[natts++] = attnum;
+    }
+    SPI_finish();
+    elog(DEBUG1, "%d attributes found", natts);
+}
diff --git a/contrib/catcachebench/catcachebench.control b/contrib/catcachebench/catcachebench.control
new file mode 100644
index 0000000000..3fc9d2e420
--- /dev/null
+++ b/contrib/catcachebench/catcachebench.control
@@ -0,0 +1,6 @@
+# catcachebench
+
+comment = 'benchmark for catcache pruning'
+default_version = '0.0'
+module_pathname = '$libdir/catcachebench'
+relocatable = true
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 8fc067ce31..98427b67cd 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -716,6 +716,9 @@ ResetCatalogCaches(void)
  *    rather than relying on the relcache to keep a tupdesc for us.  Of course
  *    this assumes the tupdesc of a cachable system table will not change...)
  */
+/* CODE FOR catcachebench: REMOVE ME AFTER USE */
+bool _catcache_shrink_buckets = false;
+/* END: CODE FOR catcachebench*/
 void
 CatalogCacheFlushCatalog(Oid catId)
 {
@@ -735,6 +738,16 @@ CatalogCacheFlushCatalog(Oid catId)
 
             /* Tell inval.c to call syscache callbacks for this cache */
             CallSyscacheCallbacks(cache->id, 0);
+
+            /* CODE FOR catcachebench: REMOVE ME AFTER USE */
+            if (_catcache_shrink_buckets)
+            {
+                cache->cc_nbuckets = 128;
+                pfree(cache->cc_bucket);
+                cache->cc_bucket = palloc0(128 * sizeof(dlist_head));
+                elog(LOG, "Catcache reset");
+            }
+            /* END: CODE FOR catcachebench*/
         }
     }
 
-- 
2.16.3

From fcd0273933f3f53979749ba2e315043f1a6f6f31 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 1 Jul 2019 15:08:11 +0900
Subject: [PATCH 3/5] Adjust catcachebench for later patches

Make the benchmark use SetCatCacheClock, which is introduced by the
next patch. This temporarily breaks consistency until the next patch
is applied.
---
 contrib/catcachebench/catcachebench.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/contrib/catcachebench/catcachebench.c b/contrib/catcachebench/catcachebench.c
index 0cebbbde4f..63a7400463 100644
--- a/contrib/catcachebench/catcachebench.c
+++ b/contrib/catcachebench/catcachebench.c
@@ -116,9 +116,7 @@ catcachebench2(void)
          */
         if (--ct < 0)
         {
-            // We don't have it yet.
-            //SetCatCacheClock(GetCurrentTimestamp());
-            GetCurrentTimestamp();
+            SetCatCacheClock(GetCurrentTimestamp());
             ct = clock_step;
         }
         for (a = 0 ; a < natts ; a++)
@@ -168,9 +166,7 @@ catcachebench3(void)
              */
             if (--ct < 0)
             {
-                // We don't have it yet.
-                //SetCatCacheClock(GetCurrentTimestamp());
-                GetCurrentTimestamp();
+                SetCatCacheClock(GetCurrentTimestamp());
                 ct = clock_step;
             }
             for (a = 0 ; a < natts ; a++)
-- 
2.16.3

From 57129a5a001b7729f8888579bc11e47cf7192801 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 1 Jul 2019 11:31:54 +0900
Subject: [PATCH 4/5] Catcache pruning feature.

Currently we don't have a mechanism to limit the amount of memory used
by the syscache. Syscache bloat often causes a process to be killed by
the OOM killer, among other problems. This patch removes old syscache
entries so that the amount of cache is eventually limited.

This patch intentionally leaves the indentation of an existing code
block in SearchCatCacheInternal unchanged so that the patch stays
small. The indentation is adjusted in the next patch.
---
 src/backend/utils/cache/catcache.c | 186 +++++++++++++++++++++++++++++++++++++
 src/backend/utils/misc/guc.c       |  12 +++
 src/include/utils/catcache.h       |  17 ++++
 3 files changed, 215 insertions(+)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 98427b67cd..b552ae960c 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable that defines the minimum age, in seconds, for entries to be
+ * considered for eviction. -1 disables the feature.
+ */
+int catalog_cache_prune_min_age = -1;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
+/* Clock for the last accessed time of a catcache entry. */
+TimestampTz    catcacheclock = 0;
+
 static HeapTuple SearchCatCacheInternal(CatCache *cache,
                                         int nkeys,
                                         Datum v1, Datum v2,
@@ -864,9 +873,107 @@ InitCatCache(int id,
      */
     MemoryContextSwitchTo(oldcxt);
 
+    /* initialize the catcache reference clock if it hasn't been done yet */
+    if (catcacheclock == 0)
+        catcacheclock = GetCurrentTimestamp();
+
+    /*
+     * This cache doesn't contain a tuple older than the current time. Prevent
+     * the first pruning from happening too early.
+     */
+    cp->cc_oldest_ts = catcacheclock;
+
     return cp;
 }
 
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries can be left unused for a long time for various reasons.
+ * Remove such entries to prevent the catcache from bloating. The algorithm is
+ * similar to the one used for buffer eviction: entries that are accessed
+ * several times within a certain period live longer than those accessed less
+ * often in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+    int        nremoved = 0;
+    int        i;
+    TimestampTz oldest_ts = catcacheclock;
+    long    age;
+    int        us;
+
+    /* Return immediately if disabled */
+    if (catalog_cache_prune_min_age < 0)
+        return false;
+
+    /* Don't scan the hash when we know we don't have prunable entries */
+    TimestampDifference(cp->cc_oldest_ts, catcacheclock, &age, &us);
+    if (age < catalog_cache_prune_min_age)
+        return false;
+
+    /* Scan over the whole hash to find entries to remove */
+    for (i = 0 ; i < cp->cc_nbuckets ; i++)
+    {
+        dlist_mutable_iter    iter;
+
+        dlist_foreach_modify(iter, &cp->cc_bucket[i])
+        {
+            CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+
+            /* Don't remove referenced entries */
+            if (ct->refcount == 0 &&
+                (ct->c_list == NULL || ct->c_list->refcount == 0))
+            {
+                /*
+                 * Calculate the duration from the last access to the
+                 * "current" time. catcacheclock is updated on a per-statement
+                 * basis and additionally updated periodically during a long
+                 * running query.
+                 */
+                TimestampDifference(ct->lastaccess, catcacheclock, &age, &us);
+
+                if (age > catalog_cache_prune_min_age)
+                {
+                    /*
+                     * Entries that have not been accessed since the last
+                     * pruning are removed in this round; the lives of other
+                     * entries are prolonged according to how many times they
+                     * have been accessed, up to three times the duration. We
+                     * don't try to shrink the buckets since pruning
+                     * effectively caps catcache expansion in the long term.
+                     */
+                    if (ct->naccess > 2)
+                        ct->naccess = 1;
+                    else if (ct->naccess > 0)
+                        ct->naccess--;
+                    else
+                    {
+                        CatCacheRemoveCTup(cp, ct);
+                        nremoved++;
+
+                        /* don't update oldest_ts by removed entry */
+                        continue;
+                    }
+                }
+            }
+
+            /* update oldest timestamp if the entry remains alive */
+            if (ct->lastaccess < oldest_ts)
+                oldest_ts = ct->lastaccess;
+        }
+    }
+
+    cp->cc_oldest_ts = oldest_ts;
+
+    if (nremoved > 0)
+        elog(DEBUG1, "pruning catalog cache id=%d for %s: removed %d / %d",
+             cp->id, cp->cc_relname, nremoved, cp->cc_ntup + nremoved);
+
+    return nremoved > 0;
+}
+
 /*
  * Enlarge a catcache, doubling the number of buckets.
  */
@@ -880,6 +987,10 @@ RehashCatCache(CatCache *cp)
     elog(DEBUG1, "rehashing catalog cache id %d for %s; %d tups, %d buckets",
          cp->id, cp->cc_relname, cp->cc_ntup, cp->cc_nbuckets);
 
+    /* try removing old entries before expanding the hash */
+    if (CatCacheCleanupOldEntries(cp))
+        return;
+
     /* Allocate a new, larger, hash table. */
     newnbuckets = cp->cc_nbuckets * 2;
     newbucket = (dlist_head *) MemoryContextAllocZero(CacheMemoryContext, newnbuckets * sizeof(dlist_head));
@@ -1257,6 +1368,14 @@ SearchCatCacheInternal(CatCache *cache,
      * dlist within the loop, because we don't continue the loop afterwards.
      */
     bucket = &cache->cc_bucket[hashIndex];
+
+    /*
+     * Even though this branch duplicates quite a bit of code, we want as few
+     * branches as possible here so the path stays fastest when pruning is
+     * disabled. Don't try to move this branch into the foreach to save lines.
+     */
+    if (likely(catalog_cache_prune_min_age < 0))
+    {
     dlist_foreach(iter, bucket)
     {
         ct = dlist_container(CatCTup, cache_elem, iter.cur);
@@ -1309,6 +1428,71 @@ SearchCatCacheInternal(CatCache *cache,
             return NULL;
         }
     }
+    }
+    else
+    {
+        /*
+         * We manage the age of each entry for pruning in this branch.
+         */
+        dlist_foreach(iter, bucket)
+        {
+            /* The following section is the same as the if() block above */
+            ct = dlist_container(CatCTup, cache_elem, iter.cur);
+
+            if (ct->dead)
+                continue;
+
+            if (ct->hash_value != hashValue)
+                continue;
+
+            if (!CatalogCacheCompareTuple(cache, nkeys, ct->keys, arguments))
+                continue;
+
+            dlist_move_head(bucket, &ct->cache_elem);
+
+            /*
+             * Prolong the life of this entry. Since we want to run as few
+             * instructions as possible and want the branch to be predictable
+             * for performance reasons, we don't put a strict cap on the
+             * counter. All values above 1 are regarded as 2 in
+             * CatCacheCleanupOldEntries().
+             */
+            ct->naccess++;
+            if (unlikely(ct->naccess == 0))
+                ct->naccess = 2;
+            ct->lastaccess = catcacheclock;
+
+            /* The following part is also the same as the if() block above */
+            if (!ct->negative)
+            {
+                ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
+                ct->refcount++;
+                ResourceOwnerRememberCatCacheRef(CurrentResourceOwner,
+                                                 &ct->tuple);
+
+                CACHE_elog(DEBUG2, "SearchCatCache(%s): found in bucket %d",
+                           cache->cc_relname, hashIndex);
+                
+#ifdef CATCACHE_STATS
+                cache->cc_hits++;
+#endif
+                
+
+                return &ct->tuple;
+            }
+            else
+            {
+                CACHE_elog(DEBUG2, "SearchCatCache(%s): found neg entry in bucket %d",
+                           cache->cc_relname, hashIndex);
+
+#ifdef CATCACHE_STATS
+                cache->cc_neg_hits++;
+#endif
+
+                return NULL;
+            }
+        }
+    }
 
     return SearchCatCacheMiss(cache, nkeys, hashValue, hashIndex, v1, v2, v3, v4);
 }
@@ -1902,6 +2086,8 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
     ct->dead = false;
     ct->negative = negative;
     ct->hash_value = hashValue;
+    ct->naccess = 0;
+    ct->lastaccess = catcacheclock;
 
     dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 92c4fee8f8..c2a4caa44b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -82,6 +82,7 @@
 #include "tsearch/ts_cache.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/catcache.h"
 #include "utils/guc_tables.h"
 #include "utils/float.h"
 #include "utils/memutils.h"
@@ -2252,6 +2253,17 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"catalog_cache_prune_min_age", PGC_USERSET, RESOURCES_MEM,
+            gettext_noop("System catalog cache entries that live unused for longer than this seconds are considered
forremoval."),
 
+            gettext_noop("The value of -1 turns off pruning."),
+            GUC_UNIT_S
+        },
+        &catalog_cache_prune_min_age,
+        300, -1, INT_MAX,
+        NULL, NULL, NULL
+    },
+
     /*
      * We use the hopefully-safely-small value of 100kB as the compiled-in
      * default for max_stack_depth.  InitializeGUCOptions will increase it if
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index ff1fabaca1..ad962fb096 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -22,6 +22,7 @@
 
 #include "access/htup.h"
 #include "access/skey.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "utils/relcache.h"
 
@@ -61,6 +62,7 @@ typedef struct catcache
     slist_node    cc_next;        /* list link */
     ScanKeyData cc_skey[CATCACHE_MAXKEYS];    /* precomputed key info for heap
                                              * scans */
+    TimestampTz    cc_oldest_ts;    /* timestamp of the oldest tuple in the hash */
 
     /*
      * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
@@ -119,6 +121,8 @@ typedef struct catctup
     bool        dead;            /* dead but not yet removed? */
     bool        negative;        /* negative cache entry? */
     HeapTupleData tuple;        /* tuple management header */
+    unsigned int naccess;        /* # of accesses to this entry */
+    TimestampTz    lastaccess;        /* timestamp of the last usage */
 
     /*
      * The tuple may also be a member of at most one CatCList.  (If a single
@@ -189,6 +193,19 @@ typedef struct catcacheheader
 /* this extern duplicates utils/memutils.h... */
 extern PGDLLIMPORT MemoryContext CacheMemoryContext;
 
+/* for guc.c, not PGDLLIMPORT'ed */
+extern int catalog_cache_prune_min_age;
+
+/* source clock for access timestamp of catcache entries */
+extern TimestampTz catcacheclock;
+
+/* SetCatCacheClock - set the catcache timestamp source clock */
+static inline void
+SetCatCacheClock(TimestampTz ts)
+{
+    catcacheclock = ts;
+}
+
 extern void CreateCacheMemoryContext(void);
 
 extern CatCache *InitCatCache(int id, Oid reloid, Oid indexoid,
-- 
2.16.3

From 772d7fda030e8990fbd84ec44f07120d73682256 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 1 Jul 2019 14:11:08 +0900
Subject: [PATCH 5/5] Adjust indentation of SearchCatCacheInternal

The previous patch leaves the indentation of an existing code block in
SearchCatCacheInternal unchanged to keep the diff small. This patch
adjusts the indentation.
---
 src/backend/utils/cache/catcache.c | 82 +++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index b552ae960c..60d0fd28a8 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1376,59 +1376,59 @@ SearchCatCacheInternal(CatCache *cache,
      */
     if (likely(catalog_cache_prune_min_age < 0))
     {
-    dlist_foreach(iter, bucket)
-    {
-        ct = dlist_container(CatCTup, cache_elem, iter.cur);
-
-        if (ct->dead)
-            continue;            /* ignore dead entries */
-
-        if (ct->hash_value != hashValue)
-            continue;            /* quickly skip entry if wrong hash val */
-
-        if (!CatalogCacheCompareTuple(cache, nkeys, ct->keys, arguments))
-            continue;
-
-        /*
-         * We found a match in the cache.  Move it to the front of the list
-         * for its hashbucket, in order to speed subsequent searches.  (The
-         * most frequently accessed elements in any hashbucket will tend to be
-         * near the front of the hashbucket's list.)
-         */
-        dlist_move_head(bucket, &ct->cache_elem);
-
-        /*
-         * If it's a positive entry, bump its refcount and return it. If it's
-         * negative, we can report failure to the caller.
-         */
-        if (!ct->negative)
+        dlist_foreach(iter, bucket)
         {
-            ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
-            ct->refcount++;
-            ResourceOwnerRememberCatCacheRef(CurrentResourceOwner, &ct->tuple);
+            ct = dlist_container(CatCTup, cache_elem, iter.cur);
 
-            CACHE_elog(DEBUG2, "SearchCatCache(%s): found in bucket %d",
-                       cache->cc_relname, hashIndex);
+            if (ct->dead)
+                continue;            /* ignore dead entries */
+
+            if (ct->hash_value != hashValue)
+                continue;            /* quickly skip entry if wrong hash val */
+
+            if (!CatalogCacheCompareTuple(cache, nkeys, ct->keys, arguments))
+                continue;
+
+            /*
+             * We found a match in the cache.  Move it to the front of the
+             * list for its hashbucket, in order to speed subsequent searches.
+             * (The most frequently accessed elements in any hashbucket will
+             * tend to be near the front of the hashbucket's list.)
+             */
+            dlist_move_head(bucket, &ct->cache_elem);
+
+            /*
+             * If it's a positive entry, bump its refcount and return it. If
+             * it's negative, we can report failure to the caller.
+             */
+            if (!ct->negative)
+            {
+                ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
+                ct->refcount++;
+                ResourceOwnerRememberCatCacheRef(CurrentResourceOwner, &ct->tuple);
+
+                CACHE_elog(DEBUG2, "SearchCatCache(%s): found in bucket %d",
+                           cache->cc_relname, hashIndex);
 
 #ifdef CATCACHE_STATS
-            cache->cc_hits++;
+                cache->cc_hits++;
 #endif
 
-            return &ct->tuple;
-        }
-        else
-        {
-            CACHE_elog(DEBUG2, "SearchCatCache(%s): found neg entry in bucket %d",
-                       cache->cc_relname, hashIndex);
+                return &ct->tuple;
+            }
+            else
+            {
+                CACHE_elog(DEBUG2, "SearchCatCache(%s): found neg entry in bucket %d",
+                           cache->cc_relname, hashIndex);
 
 #ifdef CATCACHE_STATS
-            cache->cc_neg_hits++;
+                cache->cc_neg_hits++;
 #endif
 
-            return NULL;
+                return NULL;
+            }
         }
     }
-    }
     else
     {
         /*
-- 
2.16.3

