Re: pg_upgrade - link mode and transaction-wraparound data loss

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: pg_upgrade - link mode and transaction-wraparound data loss
Дата
Msg-id 201005191829.o4JITUr04829@momjian.us
обсуждение исходный текст
Ответ на Re: pg_upgrade - link mode and transaction-wraparound data loss  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
Bruce Momjian wrote:
> > This is the "production system". I have absolutely no indications that
> > anything should be wrong in there. It has run rock-solid since it got
> > migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> > scared about you telling that it seems wrong. (but that cannot be
> > attributed to pg_upgrade)
>
> I am on chat with Alvaro now and it seems we do somehow connect to
> template0 for transaction id wraparound.  I think Alvaro will post
> shortly on this.
>
> > > OK, thanks.  This does seem odd.  Frankly, having template0's
> > > datfrozenxid be wrong would not cause any kind of instability because
> > > template0 is used only by pg_dump, so I am wondering if something else
> > > is seriously wrong.
> > >
> > I also think that something was seriously wrong with the pg_upgrade'd
> > version. I'll try to reproduce and be a bit more carefull in tracking
> > the steps
> > this time.
>
> Thanks, but I think the entire problem might be this template0 xid issue
> that Alvaro and I are researching.  I can now see how invalid template0
> xids could cause the instability you saw in the new database.  Odd no
> one has seen this bug before.

OK, after talking to Alvaro and Heikki, the problem is that while you
cannot connect to template0, it is accessed by autovacuum for vacuum
freeze, even if autovacuum is turned off.  I think the reason you are
seeing this bug is that your xid counter is near 2 billion (50% to
wraparound) and the original template0 xids are the maximum distance
from your counter.

I am attaching the newest patch which fixes this issue.  I did modify
this code yesterday with another patch, and I am unclear exactly if you
need that patch as well.  CVS now has all these changes.

If you could test with this and the earlier patch, I think it will now
work fine.  Thanks for the valuable testing, and quick feedback.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Index: contrib/pg_upgrade/pg_upgrade.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/pg_upgrade/pg_upgrade.c,v
retrieving revision 1.3
diff -c -c -r1.3 pg_upgrade.c
*** contrib/pg_upgrade/pg_upgrade.c    18 May 2010 18:40:51 -0000    1.3
--- contrib/pg_upgrade/pg_upgrade.c    19 May 2010 18:20:03 -0000
***************
*** 164,170 ****
      check_ok(ctx);

      /*
!      * We do freeze after analyze so pg_statistic is also frozen
       */
      prep_status(ctx, "Freezing all rows on the new cluster");
      exec_prog(ctx, true,
--- 164,173 ----
      check_ok(ctx);

      /*
!      * We do freeze after analyze so pg_statistic is also frozen.
!      * template0 is not frozen here, but data rows were frozen by initdb,
!      * and we set its datfrozenxid and relfrozenxids later to match the
!      * new xid counter later.
       */
      prep_status(ctx, "Freezing all rows on the new cluster");
      exec_prog(ctx, true,
***************
*** 292,339 ****
  set_frozenxids(migratorContext *ctx)
  {
      int            dbnum;
!     PGconn       *conn;
      PGresult   *dbres;
      int            ntups;

      prep_status(ctx, "Setting frozenxid counters in new cluster");

!     conn = connectToServer(ctx, "template1", CLUSTER_NEW);

      /* set pg_database.datfrozenxid */
!     PQclear(executeQueryOrDie(ctx, conn,
                                "UPDATE pg_catalog.pg_database "
!                               "SET    datfrozenxid = '%u' "
!                               "WHERE datallowconn = true",
                                ctx->old.controldata.chkpnt_nxtxid));

      /* get database names */
!     dbres = executeQueryOrDie(ctx, conn,
!                               "SELECT    datname "
!                               "FROM    pg_catalog.pg_database "
!                               "WHERE datallowconn = true");

!     /* free dbres below */
!     PQfinish(conn);

      ntups = PQntuples(dbres);
      for (dbnum = 0; dbnum < ntups; dbnum++)
      {
!         conn = connectToServer(ctx, PQgetvalue(dbres, dbnum, 0), CLUSTER_NEW);

          /* set pg_class.relfrozenxid */
          PQclear(executeQueryOrDie(ctx, conn,
                                    "UPDATE    pg_catalog.pg_class "
                                    "SET    relfrozenxid = '%u' "
          /* only heap and TOAST are vacuumed */
!                                   "WHERE    relkind = 'r' OR "
!                                   "        relkind = 't'",
                                    ctx->old.controldata.chkpnt_nxtxid));
          PQfinish(conn);
      }

      PQclear(dbres);

      check_ok(ctx);
  }

--- 295,366 ----
  set_frozenxids(migratorContext *ctx)
  {
      int            dbnum;
!     PGconn       *conn, *conn_template1;
      PGresult   *dbres;
      int            ntups;
+     int            i_datname;
+     int            i_datallowconn;

      prep_status(ctx, "Setting frozenxid counters in new cluster");

!     conn_template1 = connectToServer(ctx, "template1", CLUSTER_NEW);

      /* set pg_database.datfrozenxid */
!     PQclear(executeQueryOrDie(ctx, conn_template1,
                                "UPDATE pg_catalog.pg_database "
!                               "SET    datfrozenxid = '%u'",
                                ctx->old.controldata.chkpnt_nxtxid));

      /* get database names */
!     dbres = executeQueryOrDie(ctx, conn_template1,
!                               "SELECT    datname, datallowconn "
!                               "FROM    pg_catalog.pg_database");

!     i_datname = PQfnumber(dbres, "datname");
!     i_datallowconn = PQfnumber(dbres, "datallowconn");

      ntups = PQntuples(dbres);
      for (dbnum = 0; dbnum < ntups; dbnum++)
      {
!         char *datname = PQgetvalue(dbres, dbnum, i_datname);
!         char *datallowconn= PQgetvalue(dbres, dbnum, i_datallowconn);
!
!         /*
!          *    We must update databases where datallowconn = false, e.g.
!          *    template0, because autovacuum increments their datfrozenxids and
!          *    relfrozenxids even if autovacuum is turned off, and even though
!          *    all the data rows are already frozen  To enable this, we
!          *    temporarily change datallowconn.
!          */
!         if (strcmp(datallowconn, "f") == 0)
!             PQclear(executeQueryOrDie(ctx, conn_template1,
!                                   "UPDATE pg_catalog.pg_database "
!                                   "SET    datallowconn = true "
!                                   "WHERE datname = '%s'", datname));
!
!         conn = connectToServer(ctx, datname, CLUSTER_NEW);

          /* set pg_class.relfrozenxid */
          PQclear(executeQueryOrDie(ctx, conn,
                                    "UPDATE    pg_catalog.pg_class "
                                    "SET    relfrozenxid = '%u' "
          /* only heap and TOAST are vacuumed */
!                                   "WHERE    relkind IN ('r', 't')",
                                    ctx->old.controldata.chkpnt_nxtxid));
          PQfinish(conn);
+
+         /* Reset datallowconn flag */
+         if (strcmp(datallowconn, "f") == 0)
+             PQclear(executeQueryOrDie(ctx, conn_template1,
+                                   "UPDATE pg_catalog.pg_database "
+                                   "SET    datallowconn = false "
+                                   "WHERE datname = '%s'", datname));
      }

      PQclear(dbres);

+     PQfinish(conn_template1);
+
      check_ok(ctx);
  }


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: C function argument types
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: pg_upgrade - link mode and transaction-wraparound data loss