Re: pg_dump and dependencies and --section ... it's a mess

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: pg_dump and dependencies and --section ... it's a mess
Дата
Msg-id 18836.1340397797@sss.pgh.pa.us
обсуждение исходный текст
Ответ на pg_dump and dependencies and --section ... it's a mess  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: pg_dump and dependencies and --section ... it's a mess  (Andrew Dunstan <andrew@dunslane.net>)
Re: pg_dump and dependencies and --section ... it's a mess  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> I believe the right fix for both of these issues is to add knowledge of
> the section concept to the topological sort logic, so that an ordering
> that puts POST_DATA before DATA or PRE_DATA after DATA is considered to
> be a dependency-ordering violation.  One way to do that is to add dummy
> "fencepost" objects to the sort, representing the start and end of the
> DATA section.  However, these objects would need explicit dependency
> links to every other DumpableObject, so that doesn't sound very good
> from a performance standpoint.  What I'm going to go look at is whether
> we can mark DumpableObjects with their SECTION codes at creation time
> (rather than adding that information at ArchiveEntry() time) and then
> have the topo sort logic take that marking into account in addition to
> the explicit dependency links.

I gave up on putting any fancy hacks into the topological sort code;
it would have made it extremely ugly, and besides every solution I could
think of wanted to have at least one extra auxiliary array with an entry
per DumpableObject.  Which would have eaten about as much space as the
extra dependency links.  Furthermore, it turns out that the post-data
dependencies need to be changeable, since rules and constraints should
only be forced to be post-data *if* we've decided to dump them
separately from their parent tables/views.  (My original try at this
ended up forcing every rule & constraint to be dumped separately, which
is not what we want.)  Putting knowledge of that into the core
topological sort code seemed right out.  So the attached draft patch
does it the straightforward way, actually creating two dummy boundary
objects and setting up explicit dependency links with them.

I did some simple performance tests and found that this adds a
measurable but pretty negligible cost to pg_dump's runtime.  For
instance, dumping several thousand empty tables went from 9.34 to
9.57 seconds of pg_dump CPU time, compared to multiple minutes of
CPU time spent on the backend side (even with the recent lockmanager
fixes).  So I no longer feel any strong need to "optimize" the code.

A disadvantage of representing the dependencies explicitly is that
the ones attached to DATA and POST_DATA objects show up in the output
archive.  I'm not particularly worried about this so far as HEAD and
9.2 are concerned, because the other patch to fix emitted dependencies
will make them go away again.  But as I mentioned, I'm not big on
back-patching that one into 9.1.  We could hack something simpler
to directly suppress dependencies on the boundary objects only, or
we could just write it off as not mattering much.  I'd barely have
noticed it except I was testing whether I got an exact match to the
archive produced by an unpatched pg_dump (in cases not involving
the view-vs-constraint bug).

Anyway, the attached patch does seem to fix the constraint bug.

A possible objection to it is that there are now three different ways in
which the pg_dump code knows which DO_XXX object types go in which dump
section: the new addBoundaryDependencies() function knows this, the
SECTION_xxx arguments to ArchiveEntry calls know it, and the sort
ordering constants in pg_dump_sort.c have to agree too.  My original
idea was to add an explicit section field to DumpableObject to reduce
the number of places that know this, but that would increase pg_dump's
memory consumption still more, and yet still not give us a single point
of knowledge.  Has anybody got a better idea?

Barring objections or better ideas, I'll push forward with applying
this patch and the dependency-fixup patch.

            regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5fde18921ac3c20f878fbe5ad22c60fabc13a916..71cc3416bb9fd9dcf482c7e8c3a99d01acd7f238 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** static void dumpACL(Archive *fout, Catal
*** 210,215 ****
--- 210,220 ----
          const char *acls);

  static void getDependencies(Archive *fout);
+
+ static DumpableObject *createBoundaryObjects(void);
+ static void addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
+                         DumpableObject *boundaryObjs);
+
  static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
  static void getTableData(TableInfo *tblinfo, int numTables, bool oids);
  static void makeTableDataInfo(TableInfo *tbinfo, bool oids);
*************** main(int argc, char **argv)
*** 270,275 ****
--- 275,281 ----
      int            numTables;
      DumpableObject **dobjs;
      int            numObjs;
+     DumpableObject *boundaryObjs;
      int            i;
      enum trivalue prompt_password = TRI_DEFAULT;
      int            compressLevel = -1;
*************** main(int argc, char **argv)
*** 691,696 ****
--- 697,713 ----
       */
      getDependencies(fout);

+     /* Lastly, create dummy objects to represent the section boundaries */
+     boundaryObjs = createBoundaryObjects();
+
+     /* Get pointers to all the known DumpableObjects */
+     getDumpableObjects(&dobjs, &numObjs);
+
+     /*
+      * Add dummy dependencies to enforce the dump section ordering.
+      */
+     addBoundaryDependencies(dobjs, numObjs, boundaryObjs);
+
      /*
       * Sort the objects into a safe dump order (no forward references).
       *
*************** main(int argc, char **argv)
*** 700,713 ****
       * will dump identically.  Before 7.3 we don't have dependencies and we
       * use OID ordering as an (unreliable) guide to creation order.
       */
-     getDumpableObjects(&dobjs, &numObjs);
-
      if (fout->remoteVersion >= 70300)
          sortDumpableObjectsByTypeName(dobjs, numObjs);
      else
          sortDumpableObjectsByTypeOid(dobjs, numObjs);

!     sortDumpableObjects(dobjs, numObjs);

      /*
       * Create archive TOC entries for all the objects to be dumped, in a safe
--- 717,729 ----
       * will dump identically.  Before 7.3 we don't have dependencies and we
       * use OID ordering as an (unreliable) guide to creation order.
       */
      if (fout->remoteVersion >= 70300)
          sortDumpableObjectsByTypeName(dobjs, numObjs);
      else
          sortDumpableObjectsByTypeOid(dobjs, numObjs);

!     sortDumpableObjects(dobjs, numObjs,
!                         boundaryObjs[0].dumpId, boundaryObjs[1].dumpId);

      /*
       * Create archive TOC entries for all the objects to be dumped, in a safe
*************** dumpDumpableObject(Archive *fout, Dumpab
*** 7184,7189 ****
--- 7200,7209 ----
                           dobj->dependencies, dobj->nDeps,
                           dumpBlobs, NULL);
              break;
+         case DO_PRE_DATA_BOUNDARY:
+         case DO_POST_DATA_BOUNDARY:
+             /* never dumped, nothing to do */
+             break;
      }
  }

*************** dumpDefaultACL(Archive *fout, DefaultACL
*** 11672,11678 ****
         daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
                   NULL,
                   daclinfo->defaclrole,
!                  false, "DEFAULT ACL", SECTION_NONE,
                   q->data, "", NULL,
                   daclinfo->dobj.dependencies, daclinfo->dobj.nDeps,
                   NULL, NULL);
--- 11692,11698 ----
         daclinfo->dobj.namespace ? daclinfo->dobj.namespace->dobj.name : NULL,
                   NULL,
                   daclinfo->defaclrole,
!                  false, "DEFAULT ACL", SECTION_POST_DATA,
                   q->data, "", NULL,
                   daclinfo->dobj.dependencies, daclinfo->dobj.nDeps,
                   NULL, NULL);
*************** getDependencies(Archive *fout)
*** 14028,14033 ****
--- 14048,14160 ----


  /*
+  * createBoundaryObjects - create dummy DumpableObjects to represent
+  * dump section boundaries.
+  */
+ static DumpableObject *
+ createBoundaryObjects(void)
+ {
+     DumpableObject *dobjs;
+
+     dobjs = (DumpableObject *) pg_malloc(2 * sizeof(DumpableObject));
+
+     dobjs[0].objType = DO_PRE_DATA_BOUNDARY;
+     dobjs[0].catId = nilCatalogId;
+     AssignDumpId(dobjs + 0);
+     dobjs[0].name = pg_strdup("PRE-DATA BOUNDARY");
+
+     dobjs[1].objType = DO_POST_DATA_BOUNDARY;
+     dobjs[1].catId = nilCatalogId;
+     AssignDumpId(dobjs + 1);
+     dobjs[1].name = pg_strdup("POST-DATA BOUNDARY");
+
+     return dobjs;
+ }
+
+ /*
+  * addBoundaryDependencies - add dependencies as needed to enforce the dump
+  * section boundaries.
+  */
+ static void
+ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
+                         DumpableObject *boundaryObjs)
+ {
+     DumpableObject *preDataBound = boundaryObjs + 0;
+     DumpableObject *postDataBound = boundaryObjs + 1;
+     int            i;
+
+     for (i = 0; i < numObjs; i++)
+     {
+         DumpableObject *dobj = dobjs[i];
+
+         /*
+          * The classification of object types here must match the SECTION_xxx
+          * values assigned during subsequent ArchiveEntry calls!
+          */
+         switch (dobj->objType)
+         {
+             case DO_NAMESPACE:
+             case DO_EXTENSION:
+             case DO_TYPE:
+             case DO_SHELL_TYPE:
+             case DO_FUNC:
+             case DO_AGG:
+             case DO_OPERATOR:
+             case DO_OPCLASS:
+             case DO_OPFAMILY:
+             case DO_COLLATION:
+             case DO_CONVERSION:
+             case DO_TABLE:
+             case DO_ATTRDEF:
+             case DO_PROCLANG:
+             case DO_CAST:
+             case DO_DUMMY_TYPE:
+             case DO_TSPARSER:
+             case DO_TSDICT:
+             case DO_TSTEMPLATE:
+             case DO_TSCONFIG:
+             case DO_FDW:
+             case DO_FOREIGN_SERVER:
+             case DO_BLOB:
+                 /* Pre-data objects: must come before the pre-data boundary */
+                 addObjectDependency(preDataBound, dobj->dumpId);
+                 break;
+             case DO_TABLE_DATA:
+             case DO_BLOB_DATA:
+                 /* Data objects: must come between the boundaries */
+                 addObjectDependency(dobj, preDataBound->dumpId);
+                 addObjectDependency(postDataBound, dobj->dumpId);
+                 break;
+             case DO_INDEX:
+             case DO_TRIGGER:
+             case DO_DEFAULT_ACL:
+                 /* Post-data objects: must come after the post-data boundary */
+                 addObjectDependency(dobj, postDataBound->dumpId);
+                 break;
+             case DO_RULE:
+                 /* Rules are post-data, but only if dumped separately */
+                 if (((RuleInfo *) dobj)->separate)
+                     addObjectDependency(dobj, postDataBound->dumpId);
+                 break;
+             case DO_CONSTRAINT:
+             case DO_FK_CONSTRAINT:
+                 /* Constraints are post-data, but only if dumped separately */
+                 if (((ConstraintInfo *) dobj)->separate)
+                     addObjectDependency(dobj, postDataBound->dumpId);
+                 break;
+             case DO_PRE_DATA_BOUNDARY:
+                 /* nothing to do */
+                 break;
+             case DO_POST_DATA_BOUNDARY:
+                 /* must come after the pre-data boundary */
+                 addObjectDependency(dobj, preDataBound->dumpId);
+                 break;
+         }
+     }
+ }
+
+
+ /*
   * selectSourceSchema - make the specified schema the active search path
   * in the source database.
   *
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index f26d8d3ab66e3d4e894f96cdf0f33aa739050e7e..b44187bbdcfc3a809b32059a00966b09581cd2a3 100644
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
*************** typedef enum
*** 97,102 ****
--- 97,103 ----
      DO_OPERATOR,
      DO_OPCLASS,
      DO_OPFAMILY,
+     DO_COLLATION,
      DO_CONVERSION,
      DO_TABLE,
      DO_ATTRDEF,
*************** typedef enum
*** 118,124 ****
      DO_DEFAULT_ACL,
      DO_BLOB,
      DO_BLOB_DATA,
!     DO_COLLATION
  } DumpableObjectType;

  typedef struct _dumpableObject
--- 119,126 ----
      DO_DEFAULT_ACL,
      DO_BLOB,
      DO_BLOB_DATA,
!     DO_PRE_DATA_BOUNDARY,
!     DO_POST_DATA_BOUNDARY
  } DumpableObjectType;

  typedef struct _dumpableObject
*************** extern bool simple_string_list_member(Si
*** 520,526 ****

  extern void parseOidArray(const char *str, Oid *array, int arraysize);

! extern void sortDumpableObjects(DumpableObject **objs, int numObjs);
  extern void sortDumpableObjectsByTypeName(DumpableObject **objs, int numObjs);
  extern void sortDumpableObjectsByTypeOid(DumpableObject **objs, int numObjs);

--- 522,529 ----

  extern void parseOidArray(const char *str, Oid *array, int arraysize);

! extern void sortDumpableObjects(DumpableObject **objs, int numObjs,
!                     DumpId preBoundaryId, DumpId postBoundaryId);
  extern void sortDumpableObjectsByTypeName(DumpableObject **objs, int numObjs);
  extern void sortDumpableObjectsByTypeOid(DumpableObject **objs, int numObjs);

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 9a82e4b6c58a6d22f6785d261ddf9fe1a8c34e96..9aa69052d492872f09b990faa088d4cedb882a7b 100644
*** a/src/bin/pg_dump/pg_dump_sort.c
--- b/src/bin/pg_dump/pg_dump_sort.c
*************** static const char *modulename = gettext_
*** 26,31 ****
--- 26,36 ----
   * behavior for old databases without full dependency info.)  Note: collations,
   * extensions, text search, foreign-data, and default ACL objects can't really
   * happen here, so the rather bogus priorities for them don't matter.
+  *
+  * NOTE: object-type priorities must match the section assignments made in
+  * pg_dump.c; that is, PRE_DATA objects must sort before DO_PRE_DATA_BOUNDARY,
+  * POST_DATA objects must sort after DO_POST_DATA_BOUNDARY, and DATA objects
+  * must sort between them.
   */
  static const int oldObjectTypePriority[] =
  {
*************** static const int oldObjectTypePriority[]
*** 38,70 ****
      3,                            /* DO_OPERATOR */
      4,                            /* DO_OPCLASS */
      4,                            /* DO_OPFAMILY */
      5,                            /* DO_CONVERSION */
      6,                            /* DO_TABLE */
      8,                            /* DO_ATTRDEF */
!     13,                            /* DO_INDEX */
!     14,                            /* DO_RULE */
!     15,                            /* DO_TRIGGER */
!     12,                            /* DO_CONSTRAINT */
!     16,                            /* DO_FK_CONSTRAINT */
      2,                            /* DO_PROCLANG */
      2,                            /* DO_CAST */
!     10,                            /* DO_TABLE_DATA */
      7,                            /* DO_DUMMY_TYPE */
!     3,                            /* DO_TSPARSER */
      4,                            /* DO_TSDICT */
!     3,                            /* DO_TSTEMPLATE */
!     5,                            /* DO_TSCONFIG */
!     3,                            /* DO_FDW */
      4,                            /* DO_FOREIGN_SERVER */
!     17,                            /* DO_DEFAULT_ACL */
      9,                            /* DO_BLOB */
!     11,                            /* DO_BLOB_DATA */
!     2                            /* DO_COLLATION */
  };

  /*
   * Sort priority for object types when dumping newer databases.
   * Objects are sorted by type, and within a type by name.
   */
  static const int newObjectTypePriority[] =
  {
--- 43,82 ----
      3,                            /* DO_OPERATOR */
      4,                            /* DO_OPCLASS */
      4,                            /* DO_OPFAMILY */
+     4,                            /* DO_COLLATION */
      5,                            /* DO_CONVERSION */
      6,                            /* DO_TABLE */
      8,                            /* DO_ATTRDEF */
!     15,                            /* DO_INDEX */
!     16,                            /* DO_RULE */
!     17,                            /* DO_TRIGGER */
!     14,                            /* DO_CONSTRAINT */
!     18,                            /* DO_FK_CONSTRAINT */
      2,                            /* DO_PROCLANG */
      2,                            /* DO_CAST */
!     11,                            /* DO_TABLE_DATA */
      7,                            /* DO_DUMMY_TYPE */
!     4,                            /* DO_TSPARSER */
      4,                            /* DO_TSDICT */
!     4,                            /* DO_TSTEMPLATE */
!     4,                            /* DO_TSCONFIG */
!     4,                            /* DO_FDW */
      4,                            /* DO_FOREIGN_SERVER */
!     19,                            /* DO_DEFAULT_ACL */
      9,                            /* DO_BLOB */
!     12,                            /* DO_BLOB_DATA */
!     10,                            /* DO_PRE_DATA_BOUNDARY */
!     13                            /* DO_POST_DATA_BOUNDARY */
  };

  /*
   * Sort priority for object types when dumping newer databases.
   * Objects are sorted by type, and within a type by name.
+  *
+  * NOTE: object-type priorities must match the section assignments made in
+  * pg_dump.c; that is, PRE_DATA objects must sort before DO_PRE_DATA_BOUNDARY,
+  * POST_DATA objects must sort after DO_POST_DATA_BOUNDARY, and DATA objects
+  * must sort between them.
   */
  static const int newObjectTypePriority[] =
  {
*************** static const int newObjectTypePriority[]
*** 77,93 ****
      8,                            /* DO_OPERATOR */
      9,                            /* DO_OPCLASS */
      9,                            /* DO_OPFAMILY */
      11,                            /* DO_CONVERSION */
      18,                            /* DO_TABLE */
      20,                            /* DO_ATTRDEF */
!     25,                            /* DO_INDEX */
!     26,                            /* DO_RULE */
!     27,                            /* DO_TRIGGER */
!     24,                            /* DO_CONSTRAINT */
!     28,                            /* DO_FK_CONSTRAINT */
      2,                            /* DO_PROCLANG */
      10,                            /* DO_CAST */
!     22,                            /* DO_TABLE_DATA */
      19,                            /* DO_DUMMY_TYPE */
      12,                            /* DO_TSPARSER */
      14,                            /* DO_TSDICT */
--- 89,106 ----
      8,                            /* DO_OPERATOR */
      9,                            /* DO_OPCLASS */
      9,                            /* DO_OPFAMILY */
+     3,                            /* DO_COLLATION */
      11,                            /* DO_CONVERSION */
      18,                            /* DO_TABLE */
      20,                            /* DO_ATTRDEF */
!     27,                            /* DO_INDEX */
!     28,                            /* DO_RULE */
!     29,                            /* DO_TRIGGER */
!     26,                            /* DO_CONSTRAINT */
!     30,                            /* DO_FK_CONSTRAINT */
      2,                            /* DO_PROCLANG */
      10,                            /* DO_CAST */
!     23,                            /* DO_TABLE_DATA */
      19,                            /* DO_DUMMY_TYPE */
      12,                            /* DO_TSPARSER */
      14,                            /* DO_TSDICT */
*************** static const int newObjectTypePriority[]
*** 95,106 ****
      15,                            /* DO_TSCONFIG */
      16,                            /* DO_FDW */
      17,                            /* DO_FOREIGN_SERVER */
!     29,                            /* DO_DEFAULT_ACL */
      21,                            /* DO_BLOB */
!     23,                            /* DO_BLOB_DATA */
!     3                            /* DO_COLLATION */
  };


  static int    DOTypeNameCompare(const void *p1, const void *p2);
  static int    DOTypeOidCompare(const void *p1, const void *p2);
--- 108,123 ----
      15,                            /* DO_TSCONFIG */
      16,                            /* DO_FDW */
      17,                            /* DO_FOREIGN_SERVER */
!     31,                            /* DO_DEFAULT_ACL */
      21,                            /* DO_BLOB */
!     24,                            /* DO_BLOB_DATA */
!     22,                            /* DO_PRE_DATA_BOUNDARY */
!     25                            /* DO_POST_DATA_BOUNDARY */
  };

+ static DumpId preDataBoundId;
+ static DumpId postDataBoundId;
+

  static int    DOTypeNameCompare(const void *p1, const void *p2);
  static int    DOTypeOidCompare(const void *p1, const void *p2);
*************** DOTypeOidCompare(const void *p1, const v
*** 237,252 ****
  /*
   * Sort the given objects into a safe dump order using dependency
   * information (to the extent we have it available).
   */
  void
! sortDumpableObjects(DumpableObject **objs, int numObjs)
  {
      DumpableObject **ordering;
      int            nOrdering;

!     if (numObjs <= 0)
          return;

      ordering = (DumpableObject **) pg_malloc(numObjs * sizeof(DumpableObject *));
      while (!TopoSort(objs, numObjs, ordering, &nOrdering))
          findDependencyLoops(ordering, nOrdering, numObjs);
--- 254,280 ----
  /*
   * Sort the given objects into a safe dump order using dependency
   * information (to the extent we have it available).
+  *
+  * The DumpIds of the PRE_DATA_BOUNDARY and POST_DATA_BOUNDARY objects are
+  * passed in separately, in case we need them during dependency loop repair.
   */
  void
! sortDumpableObjects(DumpableObject **objs, int numObjs,
!                     DumpId preBoundaryId, DumpId postBoundaryId)
  {
      DumpableObject **ordering;
      int            nOrdering;

!     if (numObjs <= 0)            /* can't happen anymore ... */
          return;

+     /*
+      * Saving the boundary IDs in static variables is a bit grotty, but seems
+      * better than adding them to parameter lists of subsidiary functions.
+      */
+     preDataBoundId = preBoundaryId;
+     postDataBoundId = postBoundaryId;
+
      ordering = (DumpableObject **) pg_malloc(numObjs * sizeof(DumpableObject *));
      while (!TopoSort(objs, numObjs, ordering, &nOrdering))
          findDependencyLoops(ordering, nOrdering, numObjs);
*************** repairViewRuleMultiLoop(DumpableObject *
*** 701,706 ****
--- 729,736 ----
      ((RuleInfo *) ruleobj)->separate = true;
      /* put back rule's dependency on view */
      addObjectDependency(ruleobj, viewobj->dumpId);
+     /* now that rule is separate, it must be post-data */
+     addObjectDependency(ruleobj, postDataBoundId);
  }

  /*
*************** repairTableConstraintMultiLoop(DumpableO
*** 736,741 ****
--- 766,773 ----
      ((ConstraintInfo *) constraintobj)->separate = true;
      /* put back constraint's dependency on table */
      addObjectDependency(constraintobj, tableobj->dumpId);
+     /* now that constraint is separate, it must be post-data */
+     addObjectDependency(constraintobj, postDataBoundId);
  }

  /*
*************** repairDomainConstraintMultiLoop(Dumpable
*** 782,787 ****
--- 814,821 ----
      ((ConstraintInfo *) constraintobj)->separate = true;
      /* put back constraint's dependency on domain */
      addObjectDependency(constraintobj, domainobj->dumpId);
+     /* now that constraint is separate, it must be post-data */
+     addObjectDependency(constraintobj, postDataBoundId);
  }

  /*
*************** describeDumpableObject(DumpableObject *o
*** 1190,1195 ****
--- 1224,1239 ----
                       "BLOB DATA  (ID %d)",
                       obj->dumpId);
              return;
+         case DO_PRE_DATA_BOUNDARY:
+             snprintf(buf, bufsize,
+                      "PRE-DATA BOUNDARY  (ID %d)",
+                      obj->dumpId);
+             return;
+         case DO_POST_DATA_BOUNDARY:
+             snprintf(buf, bufsize,
+                      "POST-DATA BOUNDARY  (ID %d)",
+                      obj->dumpId);
+             return;
      }
      /* shouldn't get here */
      snprintf(buf, bufsize,

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stefan Kaltenbrunner
Дата:
Сообщение: Re: random failing builds on spoonbill - backends not exiting...
Следующее
От: Tom Lane
Дата:
Сообщение: Re: random failing builds on spoonbill - backends not exiting...