Обсуждение: Proposed patch: make pg_dump --data-only consider FK constraints

Поиск
Список
Период
Сортировка

Proposed patch: make pg_dump --data-only consider FK constraints

От
Tom Lane
Дата:
Okay, I got tired of seeing people complain about foreign-key constraint
violations in data-only dumps.  While it's true that the problem can't
be solved in the general case (because of potentially circular
references), we could certainly make pg_dump at least *try* to order the
tables according to foreign key relationships.  It turns out not to even
require a whole lot of new code.  Accordingly I propose the attached
patch.  It will order the tables safely if it can, and otherwise
complain like this:

pg_dump: WARNING: circular foreign-key constraints among these table(s):
pg_dump:   master
pg_dump:   child
pg_dump: You may not be able to restore the dump without using --disable-triggers or temporarily dropping the
constraints.

Comments?

            regards, tom lane

Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.499
diff -c -r1.499 pg_dump.c
*** src/bin/pg_dump/pg_dump.c    30 Jul 2008 19:35:13 -0000    1.499
--- src/bin/pg_dump/pg_dump.c    7 Sep 2008 18:03:38 -0000
***************
*** 166,171 ****
--- 166,172 ----
  static void getDependencies(void);
  static void getDomainConstraints(TypeInfo *tinfo);
  static void getTableData(TableInfo *tblinfo, int numTables, bool oids);
+ static void getTableDataFKConstraints(void);
  static char *format_function_arguments(FuncInfo *finfo, char *funcargs);
  static char *format_function_arguments_old(FuncInfo *finfo, int nallargs,
                            char **allargtypes,
***************
*** 659,665 ****
--- 660,670 ----
          guessConstraintInheritance(tblinfo, numTables);

      if (!schemaOnly)
+     {
          getTableData(tblinfo, numTables, oids);
+         if (dataOnly)
+             getTableDataFKConstraints();
+     }

      if (outputBlobs && hasBlobs(g_fout))
      {
***************
*** 1392,1401 ****
--- 1397,1455 ----
              tdinfo->tdtable = &(tblinfo[i]);
              tdinfo->oids = oids;
              addObjectDependency(&tdinfo->dobj, tblinfo[i].dobj.dumpId);
+
+             tblinfo[i].dataObj = tdinfo;
          }
      }
  }

+ /*
+  * getTableDataFKConstraints -
+  *      add dump-order dependencies reflecting foreign key constraints
+  *
+  * This code is executed only in a data-only dump --- in schema+data dumps
+  * we handle foreign key issues by not creating the FK constraints until
+  * after the data is loaded.  In a data-only dump, however, we want to
+  * order the table data objects in such a way that a table's referenced
+  * tables are restored first.  (In the presence of circular references or
+  * self-references this may be impossible; we'll detect and complain about
+  * that during the dependency sorting step.)
+  */
+ static void
+ getTableDataFKConstraints(void)
+ {
+     DumpableObject **dobjs;
+     int            numObjs;
+     int            i;
+
+     /* Search through all the dumpable objects for FK constraints */
+     getDumpableObjects(&dobjs, &numObjs);
+     for (i = 0; i < numObjs; i++)
+     {
+         if (dobjs[i]->objType == DO_FK_CONSTRAINT)
+         {
+             ConstraintInfo *cinfo = (ConstraintInfo *) dobjs[i];
+             TableInfo *ftable;
+
+             /* Not interesting unless both tables are to be dumped */
+             if (cinfo->contable == NULL ||
+                 cinfo->contable->dataObj == NULL)
+                 continue;
+             ftable = findTableByOid(cinfo->confrelid);
+             if (ftable == NULL ||
+                 ftable->dataObj == NULL)
+                 continue;
+             /*
+              * Okay, make referencing table's TABLE_DATA object depend on
+              * the referenced table's TABLE_DATA object.
+              */
+             addObjectDependency(&cinfo->contable->dataObj->dobj,
+                                 ftable->dataObj->dobj.dumpId);
+         }
+     }
+     free(dobjs);
+ }
+

  /*
   * guessConstraintInheritance:
***************
*** 3626,3631 ****
--- 3680,3686 ----
                  constrinfo[j].condomain = NULL;
                  constrinfo[j].contype = contype;
                  constrinfo[j].condef = NULL;
+                 constrinfo[j].confrelid = InvalidOid;
                  constrinfo[j].conindex = indxinfo[j].dobj.dumpId;
                  constrinfo[j].conislocal = true;
                  constrinfo[j].separate = true;
***************
*** 3666,3675 ****
      ConstraintInfo *constrinfo;
      PQExpBuffer query;
      PGresult   *res;
!     int            i_condef,
!                 i_contableoid,
                  i_conoid,
!                 i_conname;
      int            ntups;

      /* pg_constraint was created in 7.3, so nothing to do if older */
--- 3721,3731 ----
      ConstraintInfo *constrinfo;
      PQExpBuffer query;
      PGresult   *res;
!     int            i_contableoid,
                  i_conoid,
!                 i_conname,
!                 i_confrelid,
!                 i_condef;
      int            ntups;

      /* pg_constraint was created in 7.3, so nothing to do if older */
***************
*** 3697,3703 ****

          resetPQExpBuffer(query);
          appendPQExpBuffer(query,
!                           "SELECT tableoid, oid, conname, "
                            "pg_catalog.pg_get_constraintdef(oid) as condef "
                            "FROM pg_catalog.pg_constraint "
                            "WHERE conrelid = '%u'::pg_catalog.oid "
--- 3753,3759 ----

          resetPQExpBuffer(query);
          appendPQExpBuffer(query,
!                           "SELECT tableoid, oid, conname, confrelid, "
                            "pg_catalog.pg_get_constraintdef(oid) as condef "
                            "FROM pg_catalog.pg_constraint "
                            "WHERE conrelid = '%u'::pg_catalog.oid "
***************
*** 3711,3716 ****
--- 3767,3773 ----
          i_contableoid = PQfnumber(res, "tableoid");
          i_conoid = PQfnumber(res, "oid");
          i_conname = PQfnumber(res, "conname");
+         i_confrelid = PQfnumber(res, "confrelid");
          i_condef = PQfnumber(res, "condef");

          constrinfo = (ConstraintInfo *) malloc(ntups * sizeof(ConstraintInfo));
***************
*** 3727,3732 ****
--- 3784,3790 ----
              constrinfo[j].condomain = NULL;
              constrinfo[j].contype = 'f';
              constrinfo[j].condef = strdup(PQgetvalue(res, j, i_condef));
+             constrinfo[j].confrelid = atooid(PQgetvalue(res, j, i_confrelid));
              constrinfo[j].conindex = 0;
              constrinfo[j].conislocal = true;
              constrinfo[j].separate = true;
***************
*** 3810,3815 ****
--- 3868,3874 ----
          constrinfo[i].condomain = tinfo;
          constrinfo[i].contype = 'c';
          constrinfo[i].condef = strdup(PQgetvalue(res, i, i_consrc));
+         constrinfo[i].confrelid = InvalidOid;
          constrinfo[i].conindex = 0;
          constrinfo[i].conislocal = true;
          constrinfo[i].separate = false;
***************
*** 4788,4793 ****
--- 4847,4853 ----
                  constrs[j].condomain = NULL;
                  constrs[j].contype = 'c';
                  constrs[j].condef = strdup(PQgetvalue(res, j, 3));
+                 constrs[j].confrelid = InvalidOid;
                  constrs[j].conindex = 0;
                  constrs[j].conislocal = (PQgetvalue(res, j, 4)[0] == 't');
                  constrs[j].separate = false;
Index: src/bin/pg_dump/pg_dump.h
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.h,v
retrieving revision 1.140
diff -c -r1.140 pg_dump.h
*** src/bin/pg_dump/pg_dump.h    9 May 2008 23:32:04 -0000    1.140
--- src/bin/pg_dump/pg_dump.h    7 Sep 2008 18:03:38 -0000
***************
*** 280,285 ****
--- 280,286 ----
       */
      int            numParents;        /* number of (immediate) parent tables */
      struct _tableInfo **parents;    /* TableInfos of immediate parents */
+     struct _tableDataInfo *dataObj;    /* TableDataInfo, if dumping its data */
  } TableInfo;

  typedef struct _attrDefInfo
***************
*** 352,357 ****
--- 353,359 ----
      TypeInfo   *condomain;        /* NULL if table constraint */
      char        contype;
      char       *condef;            /* definition, if CHECK or FOREIGN KEY */
+     Oid            confrelid;        /* referenced table, if FOREIGN KEY */
      DumpId        conindex;        /* identifies associated index if any */
      bool        conislocal;        /* TRUE if constraint has local definition */
      bool        separate;        /* TRUE if must dump as separate item */
Index: src/bin/pg_dump/pg_dump_sort.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump_sort.c,v
retrieving revision 1.20
diff -c -r1.20 pg_dump_sort.c
*** src/bin/pg_dump/pg_dump_sort.c    1 Jan 2008 19:45:55 -0000    1.20
--- src/bin/pg_dump/pg_dump_sort.c    7 Sep 2008 18:03:38 -0000
***************
*** 947,952 ****
--- 947,975 ----
      }

      /*
+      * If all the objects are TABLE_DATA items, what we must have is a
+      * circular set of foreign key constraints (or a single self-referential
+      * table).  Give an appropriate warning and break the loop arbitrarily.
+      */
+     for (i = 0; i < nLoop; i++)
+     {
+         if (loop[i]->objType != DO_TABLE_DATA)
+             break;
+     }
+     if (i >= nLoop)
+     {
+         write_msg(NULL, "WARNING: circular foreign-key constraints among these table(s):\n");
+         for (i = 0; i < nLoop; i++)
+             write_msg(NULL, "  %s\n", loop[i]->name);
+         write_msg(NULL, "You may not be able to restore the dump without using --disable-triggers or temporarily
droppingthe constraints.\n"); 
+         if (nLoop > 1)
+             removeObjectDependency(loop[0], loop[1]->dumpId);
+         else                        /* must be a self-dependency */
+             removeObjectDependency(loop[0], loop[0]->dumpId);
+         return;
+     }
+
+     /*
       * If we can't find a principled way to break the loop, complain and break
       * it in an arbitrary fashion.
       */
***************
*** 958,964 ****
          describeDumpableObject(loop[i], buf, sizeof(buf));
          write_msg(modulename, "  %s\n", buf);
      }
!     removeObjectDependency(loop[0], loop[1]->dumpId);
  }

  /*
--- 981,991 ----
          describeDumpableObject(loop[i], buf, sizeof(buf));
          write_msg(modulename, "  %s\n", buf);
      }
!
!     if (nLoop > 1)
!         removeObjectDependency(loop[0], loop[1]->dumpId);
!     else                        /* must be a self-dependency */
!         removeObjectDependency(loop[0], loop[0]->dumpId);
  }

  /*

Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> Okay, I got tired of seeing people complain about foreign-key constraint
> violations in data-only dumps.  While it's true that the problem can't
> be solved in the general case (because of potentially circular
> references), we could certainly make pg_dump at least *try* to order the
> tables according to foreign key relationships.  It turns out not to even
> require a whole lot of new code.  Accordingly I propose the attached
> patch.  It will order the tables safely if it can, and otherwise
> complain like this:

+1

> pg_dump: WARNING: circular foreign-key constraints among these table(s):
> pg_dump:   master
> pg_dump:   child
> pg_dump: You may not be able to restore the dump without using --disable-triggers or temporarily dropping the
constraints.

WARNING feels a bit too strong. I realize that that message isn't going 
to the postmaster's log, bloating it, but if a user does that regularly, 
always disabling triggers as instructed, or there is in fact never 
circular references in the data with a particular schema, seeing that 
big fat warning every time is going to become a bit tiresome. Perhaps 
"NOTE: ..." ?

How about printing that notice at the top of the dump file as well? Most 
people probably don't look at the dump files, but if someone needs to 
deal with a data-only dumps that contain circular constraints, and also 
those that don't, it would be invaluable information.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
David Fetter
Дата:
On Sun, Sep 07, 2008 at 02:06:40PM -0400, Tom Lane wrote:
> Okay, I got tired of seeing people complain about foreign-key
> constraint violations in data-only dumps.

Isn't this something solved in the more general case by having
pre-data, data, and post-data dump options?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Alvaro Herrera
Дата:
Heikki Linnakangas wrote:
> Tom Lane wrote:

>> pg_dump: WARNING: circular foreign-key constraints among these table(s):
>> pg_dump:   master
>> pg_dump:   child
>> pg_dump: You may not be able to restore the dump without using --disable-triggers or temporarily dropping the
constraints.
>
> WARNING feels a bit too strong. I realize that that message isn't going  
> to the postmaster's log, bloating it, but if a user does that regularly,  
> always disabling triggers as instructed, or there is in fact never  
> circular references in the data with a particular schema, seeing that  
> big fat warning every time is going to become a bit tiresome. Perhaps  
> "NOTE: ..." ?

But the warning is only going to be emitted if there are actual circular
FK constraints, so it seems OK.

> How about printing that notice at the top of the dump file as well? Most  
> people probably don't look at the dump files, but if someone needs to  
> deal with a data-only dumps that contain circular constraints, and also  
> those that don't, it would be invaluable information.

I assume that this trick will only work at restore time only for custom
or tar dumps.  A text-only dump would produce the warning to stderr at
dump time, no?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Tom Lane
Дата:
David Fetter <david@fetter.org> writes:
> On Sun, Sep 07, 2008 at 02:06:40PM -0400, Tom Lane wrote:
>> Okay, I got tired of seeing people complain about foreign-key
>> constraint violations in data-only dumps.

> Isn't this something solved in the more general case by having
> pre-data, data, and post-data dump options?

No, not unless you expect that that patch will somehow forbid people
from using data-only dumps.
        regards, tom lane


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Tom Lane
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> pg_dump: WARNING: circular foreign-key constraints among these table(s):
>> pg_dump:   master
>> pg_dump:   child
>> pg_dump: You may not be able to restore the dump without using --disable-triggers or temporarily dropping the
constraints.

> WARNING feels a bit too strong. I realize that that message isn't going 
> to the postmaster's log, bloating it, but if a user does that regularly, 
> always disabling triggers as instructed, or there is in fact never 
> circular references in the data with a particular schema, seeing that 
> big fat warning every time is going to become a bit tiresome. Perhaps 
> "NOTE: ..." ?

I doubt that very many people will ever see it at all, actually --- how
common are circular FK relationships?  And it does seem appropriate to
me for pg_dump to be noisy about the possibility of trouble at restore
time.  (Maybe the message should also suggest using a schema+data dump,
since that would be a solution at dump time?)

> How about printing that notice at the top of the dump file as well?

Hmm ... that might be feasible in plain text output, but I don't see
any easy way to get a similar effect in archive modes.
        regards, tom lane


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Heikki Linnakangas wrote:
>> How about printing that notice at the top of the dump file as well?

> I assume that this trick will only work at restore time only for custom
> or tar dumps.  A text-only dump would produce the warning to stderr at
> dump time, no?

Yes, the warning (and the re-sorting) must happen at dump time.  Given a
data-only dump, pg_restore wouldn't even have the information needed to
do anything about this.
        regards, tom lane


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Gregory Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I doubt that very many people will ever see it at all, actually --- how
> common are circular FK relationships?  And it does seem appropriate to
> me for pg_dump to be noisy about the possibility of trouble at restore
> time.  (Maybe the message should also suggest using a schema+data dump,
> since that would be a solution at dump time?)

I think they're surprisingly common actually. Most complex databases end up
with them one way or another. Either through a parent-child relationship or
from two different types of relationships (such as "user which owns this
directory" and "home directory of this user").

The other reason to think NOTICE might be better is that it's something which,
if it occurs once, will always occur for that database. So a sysadmin will
become inured to seeing WARNING on his backups. Are there any other warning
conditions which could occur spontaneously that this would mask?

One minor thought -- surely the main use case for data-only dumps is for
importing into another brand of database. In which case the message seems a
bit awkward -- it could talk generically about disabling or dropping the
constraints and then have a hint to indicate how to do that with Postgres.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> The other reason to think NOTICE might be better is that it's something which,
> if it occurs once, will always occur for that database. So a sysadmin will
> become inured to seeing WARNING on his backups. Are there any other warning
> conditions which could occur spontaneously that this would mask?

[ shrug ... ]  Seems I'm outvoted, so NOTICE it will be.

> One minor thought -- surely the main use case for data-only dumps is for
> importing into another brand of database. In which case the message seems a
> bit awkward -- it could talk generically about disabling or dropping the
> constraints and then have a hint to indicate how to do that with Postgres.

I'm not convinced that data-only dumps are used mainly for that purpose.
In any case I don't want to turn this message into a paragraph;
mentioning all the things you might do about it in a Postgres context is
already making it longer than I would like...
        regards, tom lane


Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Stephen Frost
Дата:
* Gregory Stark (stark@enterprisedb.com) wrote:
> The other reason to think NOTICE might be better is that it's something which,
> if it occurs once, will always occur for that database. So a sysadmin will
> become inured to seeing WARNING on his backups. Are there any other warning
> conditions which could occur spontaneously that this would mask?

Impartial on NOTICE vs. WARNING, it could go either way for me.

> One minor thought -- surely the main use case for data-only dumps is for
> importing into another brand of database. In which case the message seems a
> bit awkward -- it could talk generically about disabling or dropping the
> constraints and then have a hint to indicate how to do that with Postgres.

I have to disagree strongly with this.  We have multiple PG instances
and often have cause to copy between them using data-only pg_dump.  On
the other side, I've never used pg_dump (data-only or not) to generate
something to load data into a different database.
Thanks,
    Stephen

Re: Proposed patch: make pg_dump --data-only consider FK constraints

От
Philip Warner
Дата:
Tom Lane wrote:
>> How about printing that notice at the top of the dump file as well?
>>     
>
> Hmm ... that might be feasible in plain text output, but I don't see
> any easy way to get a similar effect in archive modes.
>   

Just saw this, obviously very late, but from memory there is a TOC entry
type for comments or warnings that get output when the dump is used.