Обсуждение: Further cleanup of pg_dump/pg_restore item selection code

Поиск
Список
Период
Сортировка

Further cleanup of pg_dump/pg_restore item selection code

От
Tom Lane
Дата:
As I've been hacking on the pg_dump code recently, I got annoyed at the
ugliness of its code for deciding whether or not to emit database-related
TOC entries.  That logic is implemented in a completely different place
from other TOC entry selection decisions, and it has a bunch of strange
behaviors that can really only be characterized as bugs.  An example is
that

    pg_dump --create -t sometable somedb

will not emit a CREATE DATABASE command as you might expect, because
the use of -t prevents the DATABASE TOC entry from being made in the
first place.  Also,

    pg_dump --clean --create -t sometable somedb

will not emit any DROP commands: the code expects that with the
combination of --clean and --create, it should emit only DROP
DATABASE, but nothing happens because there's no DATABASE entry.

The same behaviors occur if you do e.g.

    pg_dump -Fc -t sometable somedb | pg_restore --create

which is another undocumented misbehavior: the docs for pg_restore do not
suggest that --create might fail if the source archive was selective.

Another set of problems is that if you use pg_restore's item
selection switches (-t etc), those do not restore ACLs, comments,
or security labels associated with the selected object(s).  This
is strange, and unlike the behavior of the comparable switches in
pg_dump, and certainly undocumented.

Attached is a patch that proposes to clean this up.  It arranges
things so that CREATE DATABASE (and, if --clean, DROP DATABASE)
is emitted if and only if you said --create, regardless of other
selectivity switches.  And it fixes selective pg_restore to include
subsidiary ACLs, comments, and security labels.

This does not go all the way towards making pg_restore's item selection
switches dump subsidiary objects the same as pg_dump would, because
pg_restore doesn't really have enough info to deal with indexes and
table constraints the way pg_dump does.  Perhaps we could intuit the
parent table by examining object dependencies, but that would be a
bunch of new logic that seems like fit material for a different patch.
In the meantime I just added a disclaimer to pg_restore.sgml explaining
this.

I also made an effort to make _tocEntryRequired() a little better
organized and better documented.  It's grown a lot of different
behaviors over the years without much thought about layout or
keeping related checks together.  (There's a chunk in the middle
of the function that now needs to be indented one stop to the right,
but in the interests of keeping the patch reviewable, I've not done
so yet.)

Comments?

            regards, tom lane

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index a2ebf75..b1b507f 100644
*** a/doc/src/sgml/ref/pg_restore.sgml
--- b/doc/src/sgml/ref/pg_restore.sgml
***************
*** 446,451 ****
--- 446,456 ----
           flag of <application>pg_dump</application>.  There is not currently
           any provision for wild-card matching in <application>pg_restore</application>,
           nor can you include a schema name within its <option>-t</option>.
+          And, while <application>pg_dump</application>'s <option>-t</option>
+          flag will also dump subsidiary objects (such as indexes) of the
+          selected table(s),
+          <application>pg_restore</application>'s <option>-t</option>
+          flag does not include such subsidiary objects.
          </para>
         </note>

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index fb03793..051df9e 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*************** static void _selectOutputSchema(ArchiveH
*** 70,76 ****
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, teSection curSection, RestoreOptions *ropt);
  static RestorePass _tocEntryRestorePass(TocEntry *te);
  static bool _tocEntryIsACL(TocEntry *te);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
--- 70,76 ----
  static void _selectTablespace(ArchiveHandle *AH, const char *tablespace);
  static void processEncodingEntry(ArchiveHandle *AH, TocEntry *te);
  static void processStdStringsEntry(ArchiveHandle *AH, TocEntry *te);
! static teReqs _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH);
  static RestorePass _tocEntryRestorePass(TocEntry *te);
  static bool _tocEntryIsACL(TocEntry *te);
  static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
*************** ProcessArchiveRestoreOptions(Archive *AH
*** 312,318 ****
          if (te->section != SECTION_NONE)
              curSection = te->section;

!         te->reqs = _tocEntryRequired(te, curSection, ropt);
      }

      /* Enforce strict names checking */
--- 312,318 ----
          if (te->section != SECTION_NONE)
              curSection = te->section;

!         te->reqs = _tocEntryRequired(te, curSection, AH);
      }

      /* Enforce strict names checking */
*************** RestoreArchive(Archive *AHX)
*** 488,496 ****
               * In createDB mode, issue a DROP *only* for the database as a
               * whole.  Issuing drops against anything else would be wrong,
               * because at this point we're connected to the wrong database.
!              * Conversely, if we're not in createDB mode, we'd better not
!              * issue a DROP against the database at all.  (The DATABASE
!              * PROPERTIES entry, if any, works like the DATABASE entry.)
               */
              if (ropt->createDB)
              {
--- 488,495 ----
               * In createDB mode, issue a DROP *only* for the database as a
               * whole.  Issuing drops against anything else would be wrong,
               * because at this point we're connected to the wrong database.
!              * (The DATABASE PROPERTIES entry, if any, should be treated like
!              * the DATABASE entry.)
               */
              if (ropt->createDB)
              {
*************** RestoreArchive(Archive *AHX)
*** 498,509 ****
                      strcmp(te->desc, "DATABASE PROPERTIES") != 0)
                      continue;
              }
-             else
-             {
-                 if (strcmp(te->desc, "DATABASE") == 0 ||
-                     strcmp(te->desc, "DATABASE PROPERTIES") == 0)
-                     continue;
-             }

              /* Otherwise, drop anything that's selected and has a dropStmt */
              if (((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0) && te->dropStmt)
--- 497,502 ----
*************** restore_toc_entry(ArchiveHandle *AH, Toc
*** 752,776 ****

      AH->currentTE = te;

-     /* Work out what, if anything, we want from this entry */
-     reqs = te->reqs;
-
-     /*
-      * Ignore DATABASE and related entries unless createDB is specified.  We
-      * must check this here, not in _tocEntryRequired, because !createDB
-      * should not prevent emitting these entries to an archive file.
-      */
-     if (!ropt->createDB &&
-         (strcmp(te->desc, "DATABASE") == 0 ||
-          strcmp(te->desc, "DATABASE PROPERTIES") == 0 ||
-          (strcmp(te->desc, "ACL") == 0 &&
-           strncmp(te->tag, "DATABASE ", 9) == 0) ||
-          (strcmp(te->desc, "COMMENT") == 0 &&
-           strncmp(te->tag, "DATABASE ", 9) == 0) ||
-          (strcmp(te->desc, "SECURITY LABEL") == 0 &&
-           strncmp(te->tag, "DATABASE ", 9) == 0)))
-         reqs = 0;
-
      /* Dump any relevant dump warnings to stderr */
      if (!ropt->suppressDumpWarnings && strcmp(te->desc, "WARNING") == 0)
      {
--- 745,750 ----
*************** restore_toc_entry(ArchiveHandle *AH, Toc
*** 780,785 ****
--- 754,762 ----
              write_msg(modulename, "warning from original dump file: %s\n", te->copyStmt);
      }

+     /* Work out what, if anything, we want from this entry */
+     reqs = te->reqs;
+
      defnDumped = false;

      /*
*************** PrintTOCSummary(Archive *AHX)
*** 1191,1197 ****
          if (te->section != SECTION_NONE)
              curSection = te->section;
          if (ropt->verbose ||
!             (_tocEntryRequired(te, curSection, ropt) & (REQ_SCHEMA | REQ_DATA)) != 0)
          {
              char       *sanitized_name;
              char       *sanitized_schema;
--- 1168,1174 ----
          if (te->section != SECTION_NONE)
              curSection = te->section;
          if (ropt->verbose ||
!             (_tocEntryRequired(te, curSection, AH) & (REQ_SCHEMA | REQ_DATA)) != 0)
          {
              char       *sanitized_name;
              char       *sanitized_schema;
*************** StrictNamesCheck(RestoreOptions *ropt)
*** 2824,2839 ****
      }
  }

  static teReqs
! _tocEntryRequired(TocEntry *te, teSection curSection, RestoreOptions *ropt)
  {
      teReqs        res = REQ_SCHEMA | REQ_DATA;

      /* ENCODING and STDSTRINGS items are treated specially */
      if (strcmp(te->desc, "ENCODING") == 0 ||
          strcmp(te->desc, "STDSTRINGS") == 0)
          return REQ_SPECIAL;

      /* If it's an ACL, maybe ignore it */
      if (ropt->aclsSkip && _tocEntryIsACL(te))
          return 0;
--- 2801,2842 ----
      }
  }

+ /*
+  * Determine whether we want to restore this TOC entry.
+  *
+  * Returns 0 if entry should be skipped, or some combination of the
+  * REQ_SCHEMA and REQ_DATA bits if we want to restore schema and/or data
+  * portions of this TOC entry, or REQ_SPECIAL if it's a special entry.
+  */
  static teReqs
! _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
  {
      teReqs        res = REQ_SCHEMA | REQ_DATA;
+     RestoreOptions *ropt = AH->public.ropt;

      /* ENCODING and STDSTRINGS items are treated specially */
      if (strcmp(te->desc, "ENCODING") == 0 ||
          strcmp(te->desc, "STDSTRINGS") == 0)
          return REQ_SPECIAL;

+     /*
+      * DATABASE and DATABASE PROPERTIES also have a special rule: they are
+      * restored in createDB mode, and not restored otherwise, independently of
+      * all else.
+      */
+     if (strcmp(te->desc, "DATABASE") == 0 ||
+         strcmp(te->desc, "DATABASE PROPERTIES") == 0)
+     {
+         if (ropt->createDB)
+             return REQ_SCHEMA;
+         else
+             return 0;
+     }
+
+     /*
+      * Process exclusions that affect certain classes of TOC entries.
+      */
+
      /* If it's an ACL, maybe ignore it */
      if (ropt->aclsSkip && _tocEntryIsACL(te))
          return 0;
*************** _tocEntryRequired(TocEntry *te, teSectio
*** 2842,2852 ****
      if (ropt->no_publications && strcmp(te->desc, "PUBLICATION") == 0)
          return 0;

!     /* If it's security labels, maybe ignore it */
      if (ropt->no_security_labels && strcmp(te->desc, "SECURITY LABEL") == 0)
          return 0;

!     /* If it's a subcription, maybe ignore it */
      if (ropt->no_subscriptions && strcmp(te->desc, "SUBSCRIPTION") == 0)
          return 0;

--- 2845,2855 ----
      if (ropt->no_publications && strcmp(te->desc, "PUBLICATION") == 0)
          return 0;

!     /* If it's a security label, maybe ignore it */
      if (ropt->no_security_labels && strcmp(te->desc, "SECURITY LABEL") == 0)
          return 0;

!     /* If it's a subscription, maybe ignore it */
      if (ropt->no_subscriptions && strcmp(te->desc, "SUBSCRIPTION") == 0)
          return 0;

*************** _tocEntryRequired(TocEntry *te, teSectio
*** 2870,2876 ****
              return 0;
      }

!     /* Check options for selective dump/restore */
      if (ropt->schemaNames.head != NULL)
      {
          /* If no namespace is specified, it means all. */
--- 2873,2924 ----
              return 0;
      }

!     /* Ignore it if rejected by idWanted[] (cf. SortTocFromFile) */
!     if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
!         return 0;
!
!     /*
!      * Check options for selective dump/restore.
!      */
!     if (strcmp(te->desc, "ACL") == 0 ||
!         strcmp(te->desc, "COMMENT") == 0 ||
!         strcmp(te->desc, "SECURITY LABEL") == 0)
!     {
!         /* Database properties react to createDB, not selectivity options. */
!         if (strncmp(te->tag, "DATABASE ", 9) == 0)
!         {
!             if (!ropt->createDB)
!                 return 0;
!         }
!         else if (ropt->schemaNames.head != NULL ||
!                  ropt->schemaExcludeNames.head != NULL ||
!                  ropt->selTypes)
!         {
!             /*
!              * In a selective dump/restore, we want to restore these dependent
!              * TOC entry types only if their parent object is being restored.
!              * Without selectivity options, we let through everything in the
!              * archive.  Note there may be such entries with no parent, eg
!              * non-default ACLs for built-in objects.
!              *
!              * This code depends on the parent having been marked already,
!              * which should be the case; if it isn't, perhaps due to
!              * SortTocFromFile rearrangement, skipping the dependent entry
!              * seems prudent anyway.
!              *
!              * Ideally we'd handle, eg, table CHECK constraints this way too.
!              * But it's hard to tell which of their dependencies is the one to
!              * consult.
!              */
!             if (te->nDeps != 1 ||
!                 TocIDRequired(AH, te->dependencies[0]) == 0)
!                 return 0;
!         }
!     }
!     else
!     {
!         /* Apply selective-restore rules for standalone TOC entries. */
!         /* XXX reindent before committing */
      if (ropt->schemaNames.head != NULL)
      {
          /* If no namespace is specified, it means all. */
*************** _tocEntryRequired(TocEntry *te, teSectio
*** 2926,2934 ****
          else
              return 0;
      }

      /*
!      * Check if we had a dataDumper. Indicates if the entry is schema or data
       */
      if (!te->hadDumper)
      {
--- 2974,2986 ----
          else
              return 0;
      }
+     }

      /*
!      * Determine whether the TOC entry contains schema and/or data components,
!      * and mask off inapplicable REQ bits.  If it had a dataDumper, assume
!      * it's both schema and data.  Otherwise it's probably schema-only, but
!      * there are exceptions.
       */
      if (!te->hadDumper)
      {
*************** _tocEntryRequired(TocEntry *te, teSectio
*** 2952,2957 ****
--- 3004,3013 ----
              res = res & ~REQ_DATA;
      }

+     /* If there's no definition command, there's no schema component */
+     if (!te->defn || !te->defn[0])
+         res = res & ~REQ_SCHEMA;
+
      /*
       * Special case: <Init> type with <Max OID> tag; this is obsolete and we
       * always ignore it.
*************** _tocEntryRequired(TocEntry *te, teSectio
*** 2963,2974 ****
      if (ropt->schemaOnly)
      {
          /*
!          * The sequence_data option overrides schema-only for SEQUENCE SET.
           *
!          * In binary-upgrade mode, even with schema-only set, we do not mask
!          * out large objects.  Only large object definitions, comments and
!          * other information should be generated in binary-upgrade mode (not
!          * the actual data).
           */
          if (!(ropt->sequence_data && strcmp(te->desc, "SEQUENCE SET") == 0) &&
              !(ropt->binary_upgrade &&
--- 3019,3030 ----
      if (ropt->schemaOnly)
      {
          /*
!          * The sequence_data option overrides schemaOnly for SEQUENCE SET.
           *
!          * In binary-upgrade mode, even with schemaOnly set, we do not mask
!          * out large objects.  (Only large object definitions, comments and
!          * other metadata should be generated in binary-upgrade mode, not the
!          * actual data, but that need not concern us here.)
           */
          if (!(ropt->sequence_data && strcmp(te->desc, "SEQUENCE SET") == 0) &&
              !(ropt->binary_upgrade &&
*************** _tocEntryRequired(TocEntry *te, teSectio
*** 2986,2999 ****
      if (ropt->dataOnly)
          res = res & REQ_DATA;

-     /* Mask it if we don't have a schema contribution */
-     if (!te->defn || strlen(te->defn) == 0)
-         res = res & ~REQ_SCHEMA;
-
-     /* Finally, if there's a per-ID filter, limit based on that as well */
-     if (ropt->idWanted && !ropt->idWanted[te->dumpId - 1])
-         return 0;
-
      return res;
  }

--- 3042,3047 ----
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 50399c9..749182e 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*************** main(int argc, char **argv)
*** 632,637 ****
--- 632,644 ----
  #endif

      /*
+      * If emitting an archive format, we always want to emit a DATABASE item,
+      * in case --create is specified at pg_restore time.
+      */
+     if (!plainText)
+         dopt.outputCreateDB = 1;
+
+     /*
       * On Windows we can only have at most MAXIMUM_WAIT_OBJECTS (= 64 usually)
       * parallel jobs because that's the maximum limit for the
       * WaitForMultipleObjects() call.
*************** main(int argc, char **argv)
*** 841,847 ****
      dumpStdStrings(fout);

      /* The database items are always next, unless we don't want them at all */
!     if (dopt.include_everything && !dopt.dataOnly)
          dumpDatabase(fout);

      /* Now the rearrangeable objects. */
--- 848,854 ----
      dumpStdStrings(fout);

      /* The database items are always next, unless we don't want them at all */
!     if (dopt.outputCreateDB)
          dumpDatabase(fout);

      /* Now the rearrangeable objects. */

Further cleanup of pg_dump/pg_restore item selection code

От
"David G. Johnston"
Дата:
On Wednesday, January 24, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and it has a bunch of strange
behaviors that can really only be characterized as bugs.  An example is
that

        pg_dump --create -t sometable somedb


pg_dump -t:

"The -n and -N switches have no effect when -t is used, because tables selected by -t will be dumped regardless of those switches, and non-table objects will not be dumped."

a database is a non-table object...though the proposed behavior seems reasonable; but maybe worthy of being pointed out just like the -n switches are.

(probably been asked before but why 'no effect' instead of 'will cause the dump to fail before it begins'?) 
 
The same behaviors occur if you do e.g.

        pg_dump -Fc -t sometable somedb | pg_restore --create

which is another undocumented misbehavior: the docs for pg_restore do not
suggest that --create might fail if the source archive was selective.

pg_restore -t:

"When -t is specified, pg_restore makes no attempt to restore any other database objects that the selected table(s) might depend upon. Therefore, there is no guarantee that a specific-table restore into a clean database will succeed."

That -t was specified on the dump instead of the restore could be clarified but applies nonetheless.

Do we document anywhere what is a "database object" and what are "property objects" that are restored even when -t is specified?


Attached is a patch that proposes to clean this up.  It arranges
things so that CREATE DATABASE (and, if --clean, DROP DATABASE)
is emitted if and only if you said --create, regardless of other
selectivity switches.

Makes sense, the bit about regardless of other switches probably should find it's way into the docs.

Adding a drop database where there never was one before is a bit disconcerting though...even if the switches imply the user should be expecting one,
 

 And it fixes selective pg_restore to include
subsidiary ACLs, comments, and security labels.

+1

David J.

Re: Further cleanup of pg_dump/pg_restore item selection code

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wednesday, January 24, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The same behaviors occur if you do e.g.
>>     pg_dump -Fc -t sometable somedb | pg_restore --create
>> which is another undocumented misbehavior: the docs for pg_restore do not
>> suggest that --create might fail if the source archive was selective.

> pg_restore -t:

> "When -t is specified, pg_restore makes no attempt to restore any other
> database objects that the selected table(s) might depend upon. Therefore,
> there is no guarantee that a specific-table restore into a clean database
> will succeed."

I think you might be missing one of the main points here, which is
that --create is specified as causing both a CREATE DATABASE and a
reconnect to that database.  If it silently becomes a no-op, which
is what happens today, the restore is likely to go into the wrong
database entirely (or at least not the DB the user expected).

I won't deny the possibility that some people are depending on the
existing wrong behavior, but that's a hazard with any bug fix.

            regards, tom lane


Re: Further cleanup of pg_dump/pg_restore item selection code

От
"David G. Johnston"
Дата:
On Wednesday, January 24, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This does not go all the way towards making pg_restore's item selection
switches dump subsidiary objects the same as pg_dump would, because
pg_restore doesn't really have enough info to deal with indexes and
table constraints the way pg_dump does.  Perhaps we could intuit the
parent table by examining object dependencies, but that would be a
bunch of new logic that seems like fit material for a different patch.
In the meantime I just added a disclaimer to pg_restore.sgml explaining
this.

Unless we really wanted to keep prior-version compatibility it would seem more reliable to have pg_dump generate a new TOC entry type describing these dependencies and have pg_restore read and interpret them during selective restore mode.  Basically a cross-referencing "if you restore A also restore B". Where A can only be tables and B indexes and constraints (if we can get away with it being that limited initially).

David J.

Re: Further cleanup of pg_dump/pg_restore item selection code

От
"David G. Johnston"
Дата:
On Wednesday, January 24, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think you might be missing one of the main points here, which is
that --create is specified as causing both a CREATE DATABASE and a
reconnect to that database.

I missed the implication though I read and even thought about questioning that aspect (in pg_dump though, not restore).

Should pg_restore fail if asked to --create without a database entry in the TOC?

David J.

Re: Further cleanup of pg_dump/pg_restore item selection code

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Should pg_restore fail if asked to --create without a database entry in the
> TOC?

Yeah, I wondered about that too.  This patch makes it a non-issue for
archives created with v11 or later pg_dump, but there's still a hazard
if you're restoring from an archive made by an older pg_dump.  Is it
worth putting in logic to catch that?

Given the lack of field complaints, I felt fixing it going forward is
sufficient, but there's room to argue differently.

            regards, tom lane