Re: pg_dump versus hash partitioning

From	Tom Lane
Subject	Re: pg_dump versus hash partitioning
Date
Msg-id	1912821.1676402493@sss.pgh.pa.us
In response to	Re: pg_dump versus hash partitioning  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: pg_dump versus hash partitioning  (Robert Haas <robertmhaas@gmail.com>)
	Re: pg_dump versus hash partitioning  (Julien Rouhaud <rjuju123@gmail.com>)
List	pgsql-hackers
Here's a set of draft patches around this issue.

0001 does what I last suggested, i.e., force load-via-partition-root for
leaf tables underneath a table that is hash-partitioned on an enum
column.  It wasn't quite as messy as I first feared, although we do
need a new query (and pg_dump now knows more about pg_partitioned_table
than it used to).
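
For illustration, here's a minimal sketch of the failure mode being
fixed (the type and table names are made up; only the hash-on-enum
partitioning scheme matters):

    -- Hash partitioning keyed on an enum column.
    CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
    CREATE TABLE feelings (id int, m mood) PARTITION BY HASH (m);
    CREATE TABLE feelings_p0 PARTITION OF feelings
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE feelings_p1 PARTITION OF feelings
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    INSERT INTO feelings VALUES (1, 'sad'), (2, 'ok'), (3, 'happy');

A plain dump emits COPY commands against the leaf tables (feelings_p0,
feelings_p1).  In the destination database the enum labels get new
OIDs, so the same values can hash into different partitions and the
restore can fail with a partition-constraint violation.  With 0001,
pg_dump notices the hash-on-enum scheme and routes these rows through
the root table instead, as if --load-via-partition-root had been given.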

I was a bit unhappy to read this in the documentation:

        It is best not to use parallelism when restoring from an archive made
        with this option, because <application>pg_restore</application> will
        not know exactly which partition(s) a given archive data item will
        load data into.  This could result in inefficiency due to lock
        conflicts between parallel jobs, or perhaps even restore failures due
        to foreign key constraints being set up before all the relevant data
        is loaded.

This made me wonder whether this could be a usable solution at all, but
after thinking for a while, I don't see how the claim about foreign key
constraints is anything but FUD.  pg_dump/pg_restore have sufficient
dependency logic to prevent that from happening.  I think we can just
drop the "or perhaps ..." clause here, and tolerate the possible
inefficiency as better than failing.

0002 and 0003 are not part of the bug fix, but are some performance
improvements I noticed while working on this.  0002 is pretty minor,
but 0003 is possibly significant if you have a ton of partitions.
I haven't done any performance measurement on it though.

            regards, tom lane

From 0c66c9f1211e16366cf471ef48a52e5447c79424 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 14 Feb 2023 13:09:04 -0500
Subject: [PATCH v1 1/3] Force load-via-partition-root for hash partitioning on
 an enum column.

Without this, dump and restore will almost always fail because the
enum values will receive different OIDs in the destination database,
and their hash codes will therefore be different.  (Improving the
hash algorithm would not make this situation better, and would
indeed break other cases as well.)

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
---
 src/bin/pg_dump/common.c  |  18 +++---
 src/bin/pg_dump/pg_dump.c | 120 +++++++++++++++++++++++++++++++++++---
 src/bin/pg_dump/pg_dump.h |   2 +
 3 files changed, 124 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index a43f2e5553..3d0cfc1b8e 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -230,6 +230,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
     pg_log_info("flagging inherited columns in subtables");
     flagInhAttrs(fout->dopt, tblinfo, numTables);

+    pg_log_info("reading partitioning data");
+    getPartitioningInfo(fout);
+
     pg_log_info("reading indexes");
     getIndexes(fout, tblinfo, numTables);

@@ -285,7 +288,6 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
               InhInfo *inhinfo, int numInherits)
 {
-    DumpOptions *dopt = fout->dopt;
     int            i,
                 j;

@@ -301,18 +303,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
             continue;

         /*
-         * Normally, we don't bother computing anything for non-target tables,
-         * but if load-via-partition-root is specified, we gather information
-         * on every partition in the system so that getRootTableInfo can trace
-         * from any given to leaf partition all the way up to the root.  (We
-         * don't need to mark them as interesting for getTableAttrs, though.)
+         * Normally, we don't bother computing anything for non-target tables.
+         * However, we must find the parents of non-root partitioned tables in
+         * any case, so that we can trace from leaf partitions up to the root
+         * (in case a leaf is to be dumped but its parents are not).  We need
+         * not mark such parents interesting for getTableAttrs, though.
          */
         if (!tblinfo[i].dobj.dump)
         {
             mark_parents = false;

-            if (!dopt->load_via_partition_root ||
-                !tblinfo[i].ispartition)
+            if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
+                  tblinfo[i].ispartition))
                 find_parents = false;
         }

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 527c7651ab..51b317aa3d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -320,6 +320,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AH);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static bool forcePartitionRootLoad(const TableInfo *tbinfo);


 int
@@ -2228,11 +2229,13 @@ dumpTableData_insert(Archive *fout, const void *dcontext)
             insertStmt = createPQExpBuffer();

             /*
-             * When load-via-partition-root is set, get the root table name
-             * for the partition table, so that we can reload data through the
-             * root table.
+             * When load-via-partition-root is set or forced, get the root
+             * table name for the partition table, so that we can reload data
+             * through the root table.
              */
-            if (dopt->load_via_partition_root && tbinfo->ispartition)
+            if (tbinfo->ispartition &&
+                (dopt->load_via_partition_root ||
+                 forcePartitionRootLoad(tbinfo)))
                 targettab = getRootTableInfo(tbinfo);
             else
                 targettab = tbinfo;
@@ -2430,6 +2433,35 @@ getRootTableInfo(const TableInfo *tbinfo)
     return parentTbinfo;
 }

+/*
+ * forcePartitionRootLoad
+ *     Check if we must force load_via_partition_root for this partition.
+ *
+ * This is required if any level of ancestral partitioned table has an
+ * unsafe partitioning scheme.
+ */
+static bool
+forcePartitionRootLoad(const TableInfo *tbinfo)
+{
+    TableInfo  *parentTbinfo;
+
+    Assert(tbinfo->ispartition);
+    Assert(tbinfo->numParents == 1);
+
+    parentTbinfo = tbinfo->parents[0];
+    if (parentTbinfo->unsafe_partitions)
+        return true;
+    while (parentTbinfo->ispartition)
+    {
+        Assert(parentTbinfo->numParents == 1);
+        parentTbinfo = parentTbinfo->parents[0];
+        if (parentTbinfo->unsafe_partitions)
+            return true;
+    }
+
+    return false;
+}
+
 /*
  * dumpTableData -
  *      dump the contents of a single table
@@ -2456,11 +2488,13 @@ dumpTableData(Archive *fout, const TableDataInfo *tdinfo)
         dumpFn = dumpTableData_copy;

         /*
-         * When load-via-partition-root is set, get the root table name for
-         * the partition table, so that we can reload data through the root
-         * table.
+         * When load-via-partition-root is set or forced, get the root table
+         * name for the partition table, so that we can reload data through
+         * the root table.
          */
-        if (dopt->load_via_partition_root && tbinfo->ispartition)
+        if (tbinfo->ispartition &&
+            (dopt->load_via_partition_root ||
+             forcePartitionRootLoad(tbinfo)))
         {
             TableInfo  *parentTbinfo;

@@ -6744,6 +6778,76 @@ getInherits(Archive *fout, int *numInherits)
     return inhinfo;
 }

+/*
+ * getPartitioningInfo
+ *      get information about partitioning
+ *
+ * For the most part, we only collect partitioning info about tables we
+ * intend to dump.  However, this function has to consider all partitioned
+ * tables in the database, because we need to know about parents of partitions
+ * we are going to dump even if the parents themselves won't be dumped.
+ *
+ * Specifically, what we need to know is whether each partitioned table
+ * has an "unsafe" partitioning scheme that requires us to force
+ * load-via-partition-root mode for its children.  Currently the only case
+ * for which we force that is hash partitioning on enum columns, since the
+ * hash codes depend on enum value OIDs which won't be replicated across
+ * dump-and-reload.  There are other cases in which load-via-partition-root
+ * might be necessary, but we expect users to cope with them.
+ */
+void
+getPartitioningInfo(Archive *fout)
+{
+    PQExpBuffer query;
+    PGresult   *res;
+    int            ntups;
+
+    /* no partitions before v10 */
+    if (fout->remoteVersion < 100000)
+        return;
+    /* needn't bother if schema-only dump */
+    if (fout->dopt->schemaOnly)
+        return;
+
+    query = createPQExpBuffer();
+
+    /*
+     * Unsafe partitioning schemes are exactly those for which hash enum_ops
+     * appears among the partition opclasses.  We needn't check partstrat.
+     *
+     * Note that this query may well retrieve info about tables we aren't
+     * going to dump and hence have no lock on.  That's okay since we need not
+     * invoke any unsafe server-side functions.
+     */
+    appendPQExpBufferStr(query,
+                         "SELECT partrelid FROM pg_partitioned_table WHERE\n"
+                         "(SELECT c.oid FROM pg_opclass c JOIN pg_am a "
+                         "ON c.opcmethod = a.oid\n"
+                         "WHERE opcname = 'enum_ops' "
+                         "AND opcnamespace = 'pg_catalog'::regnamespace "
+                         "AND amname = 'hash') = ANY(partclass)");
+
+    res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+
+    ntups = PQntuples(res);
+
+    for (int i = 0; i < ntups; i++)
+    {
+        Oid            tabrelid = atooid(PQgetvalue(res, i, 0));
+        TableInfo  *tbinfo;
+
+        tbinfo = findTableByOid(tabrelid);
+        if (tbinfo == NULL)
+            pg_fatal("failed sanity check, table OID %u appearing in pg_partitioned_table not found",
+                     tabrelid);
+        tbinfo->unsafe_partitions = true;
+    }
+
+    PQclear(res);
+
+    destroyPQExpBuffer(query);
+}
+
 /*
  * getIndexes
  *      get information about every index on a dumpable table
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index e7cbd8d7ed..bfd3cf17b7 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -318,6 +318,7 @@ typedef struct _tableInfo
     bool        dummy_view;        /* view's real definition must be postponed */
     bool        postponed_def;    /* matview must be postponed into post-data */
     bool        ispartition;    /* is table a partition? */
+    bool        unsafe_partitions;    /* is it an unsafe partitioned table? */

     /*
      * These fields are computed only if we decide the table is interesting
@@ -715,6 +716,7 @@ extern ConvInfo *getConversions(Archive *fout, int *numConversions);
 extern TableInfo *getTables(Archive *fout, int *numTables);
 extern void getOwnedSeqs(Archive *fout, TableInfo tblinfo[], int numTables);
 extern InhInfo *getInherits(Archive *fout, int *numInherits);
+extern void getPartitioningInfo(Archive *fout);
 extern void getIndexes(Archive *fout, TableInfo tblinfo[], int numTables);
 extern void getExtendedStatistics(Archive *fout);
 extern void getConstraints(Archive *fout, TableInfo tblinfo[], int numTables);
--
2.31.1
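
(As a side note, the detection query in getPartitioningInfo can also be
run by hand against an existing database, v10 or later, to see which
partitioned tables would have load-via-partition-root forced; this is
essentially the same catalog query, made standalone:)

    SELECT partrelid::regclass
    FROM pg_partitioned_table
    WHERE (SELECT c.oid
           FROM pg_opclass c JOIN pg_am a ON c.opcmethod = a.oid
           WHERE c.opcname = 'enum_ops'
             AND c.opcnamespace = 'pg_catalog'::regnamespace
             AND a.amname = 'hash') = ANY (partclass);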

From 5f73193c120ce04f4e5bb3530fe1ee84afc841f1 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 14 Feb 2023 13:18:29 -0500
Subject: [PATCH v1 2/3] Don't create useless TableAttachInfo objects.

It's silly to create a TableAttachInfo object that we're not
going to dump, when we know perfectly well at creation time
that it won't be dumped.

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
---
 src/bin/pg_dump/common.c  | 3 ++-
 src/bin/pg_dump/pg_dump.c | 3 ---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 3d0cfc1b8e..ce9d2a8ee8 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -336,7 +336,8 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
         }

         /* Create TableAttachInfo object if needed */
-        if (tblinfo[i].dobj.dump && tblinfo[i].ispartition)
+        if ((tblinfo[i].dobj.dump & DUMP_COMPONENT_DEFINITION) &&
+            tblinfo[i].ispartition)
         {
             TableAttachInfo *attachinfo;

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 51b317aa3d..35bba8765f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -16082,9 +16082,6 @@ dumpTableAttach(Archive *fout, const TableAttachInfo *attachinfo)
     if (dopt->dataOnly)
         return;

-    if (!(attachinfo->partitionTbl->dobj.dump & DUMP_COMPONENT_DEFINITION))
-        return;
-
     q = createPQExpBuffer();

     if (!fout->is_prepared[PREPQUERY_DUMPTABLEATTACH])
--
2.31.1

From 7cc336043e750ec154185b972970b2364f0f57ee Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 14 Feb 2023 13:35:52 -0500
Subject: [PATCH v1 3/3] Simplify pg_dump's creation of parent-table links.

Instead of trying to optimize this by skipping creation of the
links for tables we don't plan to dump, just create them all in
bulk with a single scan over the pg_inherits data.  The previous
approach was more or less O(N^2) in the number of pg_inherits
entries, not to mention being way too complicated.

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
---
 src/bin/pg_dump/common.c  | 125 ++++++++++++++++----------------------
 src/bin/pg_dump/pg_dump.h |   5 +-
 2 files changed, 54 insertions(+), 76 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index ce9d2a8ee8..84d99ee8b6 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -83,8 +83,6 @@ static void flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
                           InhInfo *inhinfo, int numInherits);
 static void flagInhIndexes(Archive *fout, TableInfo *tblinfo, int numTables);
 static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables);
-static void findParentsByOid(TableInfo *self,
-                             InhInfo *inhinfo, int numInherits);
 static int    strInArray(const char *pattern, char **arr, int arr_size);
 static IndxInfo *findIndexByOid(Oid oid);

@@ -288,45 +286,70 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
               InhInfo *inhinfo, int numInherits)
 {
+    TableInfo  *child = NULL;
+    TableInfo  *parent = NULL;
     int            i,
                 j;

-    for (i = 0; i < numTables; i++)
+    /*
+     * Set up links from child tables to their parents.
+     *
+     * We used to attempt to skip this work for tables that are not to be
+     * dumped; but the optimizable cases are rare in practice, and setting up
+     * these links in bulk is cheaper than the old way.  (Note in particular
+     * that it's very rare for a child to have more than one parent.)
+     */
+    for (i = 0; i < numInherits; i++)
     {
-        bool        find_parents = true;
-        bool        mark_parents = true;
-
-        /* Some kinds never have parents */
-        if (tblinfo[i].relkind == RELKIND_SEQUENCE ||
-            tblinfo[i].relkind == RELKIND_VIEW ||
-            tblinfo[i].relkind == RELKIND_MATVIEW)
-            continue;
-
         /*
-         * Normally, we don't bother computing anything for non-target tables.
-         * However, we must find the parents of non-root partitioned tables in
-         * any case, so that we can trace from leaf partitions up to the root
-         * (in case a leaf is to be dumped but its parents are not).  We need
-         * not mark such parents interesting for getTableAttrs, though.
+         * Skip a hashtable lookup if it's same table as last time.  This is
+         * unlikely for the child, but less so for the parent.  (Maybe we
+         * should ask the backend for a sorted array to make it more likely?
+         * Not clear the sorting effort would be repaid, though.)
          */
-        if (!tblinfo[i].dobj.dump)
+        if (child == NULL ||
+            child->dobj.catId.oid != inhinfo[i].inhrelid)
         {
-            mark_parents = false;
+            child = findTableByOid(inhinfo[i].inhrelid);

-            if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
-                  tblinfo[i].ispartition))
-                find_parents = false;
+            /*
+             * If we find no TableInfo, assume the pg_inherits entry is for a
+             * partitioned index, which we don't need to track.
+             */
+            if (child == NULL)
+                continue;
         }
+        if (parent == NULL ||
+            parent->dobj.catId.oid != inhinfo[i].inhparent)
+        {
+            parent = findTableByOid(inhinfo[i].inhparent);
+            if (parent == NULL)
+                pg_fatal("failed sanity check, parent OID %u of table \"%s\" (OID %u) not found",
+                         inhinfo[i].inhparent,
+                         child->dobj.name,
+                         child->dobj.catId.oid);
+        }
+        /* Add this parent to the child's list of parents. */
+        if (child->numParents > 0)
+            child->parents = pg_realloc_array(child->parents,
+                                              TableInfo *,
+                                              child->numParents + 1);
+        else
+            child->parents = pg_malloc_array(TableInfo *, 1);
+        child->parents[child->numParents++] = parent;
+    }

-        /* If needed, find all the immediate parent tables. */
-        if (find_parents)
-            findParentsByOid(&tblinfo[i], inhinfo, numInherits);
-
+    /*
+     * Now consider all child tables and mark parents interesting as needed.
+     */
+    for (i = 0; i < numTables; i++)
+    {
         /*
          * If needed, mark the parents as interesting for getTableAttrs and
-         * getIndexes.
+         * getIndexes.  We only need this for direct parents of dumpable
+         * tables.
          */
-        if (mark_parents)
+        if (tblinfo[i].dobj.dump)
         {
             int            numParents = tblinfo[i].numParents;
             TableInfo **parents = tblinfo[i].parents;
@@ -979,52 +1002,6 @@ findOwningExtension(CatalogId catalogId)
 }


-/*
- * findParentsByOid
- *      find a table's parents in tblinfo[]
- */
-static void
-findParentsByOid(TableInfo *self,
-                 InhInfo *inhinfo, int numInherits)
-{
-    Oid            oid = self->dobj.catId.oid;
-    int            i,
-                j;
-    int            numParents;
-
-    numParents = 0;
-    for (i = 0; i < numInherits; i++)
-    {
-        if (inhinfo[i].inhrelid == oid)
-            numParents++;
-    }
-
-    self->numParents = numParents;
-
-    if (numParents > 0)
-    {
-        self->parents = pg_malloc_array(TableInfo *, numParents);
-        j = 0;
-        for (i = 0; i < numInherits; i++)
-        {
-            if (inhinfo[i].inhrelid == oid)
-            {
-                TableInfo  *parent;
-
-                parent = findTableByOid(inhinfo[i].inhparent);
-                if (parent == NULL)
-                    pg_fatal("failed sanity check, parent OID %u of table \"%s\" (OID %u) not found",
-                             inhinfo[i].inhparent,
-                             self->dobj.name,
-                             oid);
-                self->parents[j++] = parent;
-            }
-        }
-    }
-    else
-        self->parents = NULL;
-}
-
 /*
  * parseOidArray
  *      parse a string of numbers delimited by spaces into a character array
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index bfd3cf17b7..bb4e981e7d 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -320,6 +320,9 @@ typedef struct _tableInfo
     bool        ispartition;    /* is table a partition? */
     bool        unsafe_partitions;    /* is it an unsafe partitioned table? */

+    int            numParents;        /* number of (immediate) parent tables */
+    struct _tableInfo **parents;    /* TableInfos of immediate parents */
+
     /*
      * These fields are computed only if we decide the table is interesting
      * (it's either a table to dump, or a direct parent of a dumpable table).
@@ -352,8 +355,6 @@ typedef struct _tableInfo
     /*
      * Stuff computed only for dumpable tables.
      */
-    int            numParents;        /* number of (immediate) parent tables */
-    struct _tableInfo **parents;    /* TableInfos of immediate parents */
     int            numIndexes;        /* number of indexes */
     struct _indxInfo *indexes;    /* indexes */
     struct _tableDataInfo *dataObj; /* TableDataInfo, if dumping its data */
--
2.31.1

