Re: pg_upgrade - link mode and transaction-wraparound data loss
От | Bruce Momjian |
---|---|
Тема | Re: pg_upgrade - link mode and transaction-wraparound data loss |
Дата | |
Msg-id | 201005191829.o4JITUr04829@momjian.us обсуждение исходный текст |
Ответ на | Re: pg_upgrade - link mode and transaction-wraparound data loss (Bruce Momjian <bruce@momjian.us>) |
Список | pgsql-hackers |
Bruce Momjian wrote: > > This is the "production system". I have absolutely no indications that > > anything should be wrong in there. It has run rock-solid since it got > > migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit > > scared about you telling that it seems wrong. (but that cannot be > > attributed to pg_upgrade) > > I am on chat with Alvaro now and it seems we do somehow connect to > template0 for transaction id wraparound. I think Alvaro will post > shortly on this. > > > > OK, thanks. This does seem odd. Frankly, having template0's > > > datfrozenxid be wrong would not cause any kind of instability because > > > template0 is used only by pg_dump, so I am wondering if something else > > > is seriously wrong. > > > > > I also think that something was seriously wrong with the pg_upgrade'd > > version. I'll try to reproduce and be a bit more carefull in tracking > > the steps > > this time. > > Thanks, but I think the entire problem might be this template0 xid issue > that Alvaro and I are researching. I can now see how invalid template0 > xids could cause the instability you saw in the new database. Odd no > one has seen this bug before. OK, after talking to Alvaro and Heikki, the problem is that while you cannot connect to template0, it is accessed by autovacuum for vacuum freeze, even if autovacuum is turned off. I think the reason you are seeing this bug is that your xid counter is near 2 billion (50% to wraparound) and the original template0 xids are the maximum distance from your counter. I am attaching the newest patch which fixes this issue. I did modify this code yesterday with another patch, and I am unclear exactly if you need that patch as well. CVS now has all these changes. If you could test with this and the earlier patch, I think it will now work fine. Thanks for the valuable testing, and quick feedback. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com Index: contrib/pg_upgrade/pg_upgrade.c =================================================================== RCS file: /cvsroot/pgsql/contrib/pg_upgrade/pg_upgrade.c,v retrieving revision 1.3 diff -c -c -r1.3 pg_upgrade.c *** contrib/pg_upgrade/pg_upgrade.c 18 May 2010 18:40:51 -0000 1.3 --- contrib/pg_upgrade/pg_upgrade.c 19 May 2010 18:20:03 -0000 *************** *** 164,170 **** check_ok(ctx); /* ! * We do freeze after analyze so pg_statistic is also frozen */ prep_status(ctx, "Freezing all rows on the new cluster"); exec_prog(ctx, true, --- 164,173 ---- check_ok(ctx); /* ! * We do freeze after analyze so pg_statistic is also frozen. ! * template0 is not frozen here, but data rows were frozen by initdb, ! * and we set its datfrozenxid and relfrozenxids later to match the ! * new xid counter later. */ prep_status(ctx, "Freezing all rows on the new cluster"); exec_prog(ctx, true, *************** *** 292,339 **** set_frozenxids(migratorContext *ctx) { int dbnum; ! PGconn *conn; PGresult *dbres; int ntups; prep_status(ctx, "Setting frozenxid counters in new cluster"); ! conn = connectToServer(ctx, "template1", CLUSTER_NEW); /* set pg_database.datfrozenxid */ ! PQclear(executeQueryOrDie(ctx, conn, "UPDATE pg_catalog.pg_database " ! "SET datfrozenxid = '%u' " ! "WHERE datallowconn = true", ctx->old.controldata.chkpnt_nxtxid)); /* get database names */ ! dbres = executeQueryOrDie(ctx, conn, ! "SELECT datname " ! "FROM pg_catalog.pg_database " ! "WHERE datallowconn = true"); ! /* free dbres below */ ! PQfinish(conn); ntups = PQntuples(dbres); for (dbnum = 0; dbnum < ntups; dbnum++) { ! conn = connectToServer(ctx, PQgetvalue(dbres, dbnum, 0), CLUSTER_NEW); /* set pg_class.relfrozenxid */ PQclear(executeQueryOrDie(ctx, conn, "UPDATE pg_catalog.pg_class " "SET relfrozenxid = '%u' " /* only heap and TOAST are vacuumed */ ! "WHERE relkind = 'r' OR " ! " relkind = 't'", ctx->old.controldata.chkpnt_nxtxid)); PQfinish(conn); } PQclear(dbres); check_ok(ctx); } --- 295,366 ---- set_frozenxids(migratorContext *ctx) { int dbnum; ! PGconn *conn, *conn_template1; PGresult *dbres; int ntups; + int i_datname; + int i_datallowconn; prep_status(ctx, "Setting frozenxid counters in new cluster"); ! conn_template1 = connectToServer(ctx, "template1", CLUSTER_NEW); /* set pg_database.datfrozenxid */ ! PQclear(executeQueryOrDie(ctx, conn_template1, "UPDATE pg_catalog.pg_database " ! "SET datfrozenxid = '%u'", ctx->old.controldata.chkpnt_nxtxid)); /* get database names */ ! dbres = executeQueryOrDie(ctx, conn_template1, ! "SELECT datname, datallowconn " ! "FROM pg_catalog.pg_database"); ! i_datname = PQfnumber(dbres, "datname"); ! i_datallowconn = PQfnumber(dbres, "datallowconn"); ntups = PQntuples(dbres); for (dbnum = 0; dbnum < ntups; dbnum++) { ! char *datname = PQgetvalue(dbres, dbnum, i_datname); ! char *datallowconn= PQgetvalue(dbres, dbnum, i_datallowconn); ! ! /* ! * We must update databases where datallowconn = false, e.g. ! * template0, because autovacuum increments their datfrozenxids and ! * relfrozenxids even if autovacuum is turned off, and even though ! * all the data rows are already frozen To enable this, we ! * temporarily change datallowconn. ! */ ! if (strcmp(datallowconn, "f") == 0) ! PQclear(executeQueryOrDie(ctx, conn_template1, ! "UPDATE pg_catalog.pg_database " ! "SET datallowconn = true " ! "WHERE datname = '%s'", datname)); ! ! conn = connectToServer(ctx, datname, CLUSTER_NEW); /* set pg_class.relfrozenxid */ PQclear(executeQueryOrDie(ctx, conn, "UPDATE pg_catalog.pg_class " "SET relfrozenxid = '%u' " /* only heap and TOAST are vacuumed */ ! "WHERE relkind IN ('r', 't')", ctx->old.controldata.chkpnt_nxtxid)); PQfinish(conn); + + /* Reset datallowconn flag */ + if (strcmp(datallowconn, "f") == 0) + PQclear(executeQueryOrDie(ctx, conn_template1, + "UPDATE pg_catalog.pg_database " + "SET datallowconn = false " + "WHERE datname = '%s'", datname)); } PQclear(dbres); + PQfinish(conn_template1); + check_ok(ctx); }
В списке pgsql-hackers по дате отправления:
Следующее
От: Bruce MomjianДата:
Сообщение: Re: pg_upgrade - link mode and transaction-wraparound data loss