Обсуждение: visibility map corruption
Hi hackers, We recently ran into an issue where the visibility map of a relation was corrupt, running Postgres 12.4. The error we'd getwhen running a SELECT * from this table is: could not access status of transaction 3704450152 DETAIL: Could not open file "pg_xact/0DCC": No such file or directory. On the lists I could find several similar reports, but corruption like this could obviously have a very wide range of rootcauses.. see [1] [2] [3] for example - not all of them have their root cause known. This particular case was similar to reported cases above, but also has some differences. The following query returns ~21.000 rows, which indicates something inconsistent between the visibility map and the pd_all_visibleflag on the page: select * from pg_check_frozen('tbl'); Looking at one of the affected pages with pageinspect: =# SELECT lp,lp_off,lp_flags,lp_len,t_xmin,t_xmax,t_field3,t_ctid,t_infomask2,t_infomask,t_hoff,t_oid FROM heap_page_items(get_raw_page('tbl',726127)); ┌────┬────────┬──────────┬────────┬────────────┬────────────┬──────────┬────────────┬─────────────┬────────────┬────────┬───────┐ │ lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │t_oid │ ├────┼────────┼──────────┼────────┼────────────┼────────────┼──────────┼────────────┼─────────────┼────────────┼────────┼───────┤ │ 1 │ 6328 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │ (726127,1) │ 249 │ 8339 │ 56 │ ∅ │ │ 2 │ 4464 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │ (726127,2) │ 249 │ 8339 │ 56 │ ∅ │ │ 3 │ 2600 │ 1 │ 1864 │ 3704450155 │ 3704450155 │ 1 │ (726127,3) │ 249 │ 8339 │ 56 │ ∅ │ │ 4 │ 680 │ 1 │ 1920 │ 3704450155 │ 3704450155 │ 1 │ (726127,4) │ 249 │ 8339 │ 56 │ ∅ │ └────┴────────┴──────────┴────────┴────────────┴────────────┴──────────┴────────────┴─────────────┴────────────┴────────┴───────┘ t_infomask shows that HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID bits are both unset. This pg_visibility() call shows the inconsistency between VM and page, with PD_ALL_VISIBLE=false =# select * from pg_visibility('tbl', 726127); ┌─────────────┬────────────┬────────────────┐ │ all_visible │ all_frozen │ pd_all_visible │ ├─────────────┼────────────┼────────────────┤ │ t │ t │ f │ └─────────────┴────────────┴────────────────┘ Looking at other pages show the same information. What's interesting is that out of the affected tuples returned by pg_check_frozen, over 99% belong to 1 transaction (3704450155as seen above) and the remaining few are from one other transaction that occurred at roughly the same time. I find it hard to believe that this is due to some random bit flipping, because many pages are affected, but the "incorrect"ones are in total only from two specific transactions which occurred at roughly the same time. There were alsono server crashes or other known failures around the time of this transaction. I'm not ruling out any other kind of failurestill, but I also cannot really explain how this could have happened. The server has PG12.4 with full_page_writes=on, data_checksums=off. It's a large analytics database. The VM inconsistenciesalso occur on the streaming replicas. I realize these cases are pretty rare and hard to debug, but I wanted to share the information I found so far here for reference.Maybe someone has an idea what occurred, or maybe someone in the future finds it useful when he finds somethingsimilar. I have no idea how the inconsistency between VM and PD_ALL_VISIBLE started - from looking through the code I can't reallyfind any way how this could occur. However, for it to lead to the problem described here, I believe there should be*no* SELECT that touches that particular page after the insert/update transaction and before the transaction log gets truncated.If the page is read before the transaction log gets truncated, then the hint bit HEAP_XMIN_COMMITTED will get setand future reads will succeed regardless of tx log truncation. One of the replica's had this happen to it: the affectedpages are identical to the primary except that the HEAP_XMIN_COMMITTED flag is set (note that the VM inconsistencyis still there on the replica though: PD_ALL_VISIBLE=false even though VM shows that all_frozen=all_visible=true).But I can query these rows on the replica without issues, because it doesn't check the tx logwhen it sees that HEAP_XMIN_COMMITTED is set. -Floris [1] https://postgrespro.com/list/thread-id/2422376 [2] https://postgrespro.com/list/thread-id/2501800 [3] https://postgrespro.com/list/thread-id/2321949
On Sun, Jul 4, 2021 at 1:44 PM Floris Van Nee <florisvannee@optiver.com> wrote: > We recently ran into an issue where the visibility map of a relation was corrupt, running Postgres 12.4. The error we'dget when running a SELECT * from this table is: > > could not access status of transaction 3704450152 > DETAIL: Could not open file "pg_xact/0DCC": No such file or directory. Have you ever used pg_upgrade on this database? -- Peter Geoghegan
> On Sun, Jul 4, 2021 at 1:44 PM Floris Van Nee <florisvannee@optiver.com> > wrote: > > We recently ran into an issue where the visibility map of a relation was > corrupt, running Postgres 12.4. The error we'd get when running a SELECT * > from this table is: > > > > could not access status of transaction 3704450152 > > DETAIL: Could not open file "pg_xact/0DCC": No such file or directory. > > Have you ever used pg_upgrade on this database? > Yes. The last time (from v11 to v12) was in October 2020. The transaction id in the tuples (the one PG is trying to checkin the tx log) dated from February 2021. I do believe (but am not 100% certain) that the affected table already existedat the time of the last pg_upgrade though.
On Sun, Jul 4, 2021 at 2:26 PM Floris Van Nee <florisvannee@optiver.com> wrote: > > Have you ever used pg_upgrade on this database? > > > > Yes. The last time (from v11 to v12) was in October 2020. The transaction id in the tuples (the one PG is trying to checkin the tx log) dated from February 2021. I do believe (but am not 100% certain) that the affected table already existedat the time of the last pg_upgrade though. I wonder if it's related to this issue: https://www.postgresql.org/message-id/20210423234256.hwopuftipdmp3okf@alap3.anarazel.de Have you increased autovacuum_freeze_max_age from its default? This already sounds like the kind of database where that would make sense. -- Peter Geoghegan
> > I wonder if it's related to this issue: > > https://www.postgresql.org/message- > id/20210423234256.hwopuftipdmp3okf@alap3.anarazel.de > > Have you increased autovacuum_freeze_max_age from its default? This > already sounds like the kind of database where that would make sense. > autovacuum_freeze_max_age is increased in our setup indeed (it is set to 500M). However, we do regularly run manual VACUUM(FREEZE) on individual tables in the database, including this one. A lot of tables in the database follow an INSERT-onlypattern and since it's not running v13 yet, this meant that these tables would only rarely be touched by autovacuum.Autovacuum would sometimes kick in on some of these tables at the same time at unfortunate moments. Thereforewe have some regular job running that VACUUM (FREEZE)s tables with a xact age higher than a (low, 10M) thresholdourselves.
On Sun, Jul 4, 2021 at 10:28:25PM +0000, Floris Van Nee wrote: > > > > I wonder if it's related to this issue: > > > > https://www.postgresql.org/message- > > id/20210423234256.hwopuftipdmp3okf@alap3.anarazel.de > > > > Have you increased autovacuum_freeze_max_age from its default? This > > already sounds like the kind of database where that would make > > sense. > > > > autovacuum_freeze_max_age is increased in our setup indeed (it is > set to 500M). However, we do regularly run manual VACUUM (FREEZE) > on individual tables in the database, including this one. A lot of > tables in the database follow an INSERT-only pattern and since it's > not running v13 yet, this meant that these tables would only rarely > be touched by autovacuum. Autovacuum would sometimes kick in on some > of these tables at the same time at unfortunate moments. Therefore we > have some regular job running that VACUUM (FREEZE)s tables with a xact > age higher than a (low, 10M) threshold ourselves. OK, this is confirmation that the pg_resetwal bug, and its use by pg_upgrade, is a serious issue that needs to be addressed. I am prepared to work on it now. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Jul 6, 2021 at 10:27 AM Bruce Momjian <bruce@momjian.us> wrote: > OK, this is confirmation that the pg_resetwal bug, and its use by > pg_upgrade, is a serious issue that needs to be addressed. I am > prepared to work on it now. To be clear, I'm not 100% sure that this is related to the pg_upgrade + "pg_resetwal sets oldestXid to an invented value" issue. I am sure that that is a serious issue that needs to be addressed sooner rather than later, though. -- Peter Geoghegan
On Tue, Jul 6, 2021 at 10:32:24AM -0700, Peter Geoghegan wrote: > On Tue, Jul 6, 2021 at 10:27 AM Bruce Momjian <bruce@momjian.us> wrote: > > OK, this is confirmation that the pg_resetwal bug, and its use by > > pg_upgrade, is a serious issue that needs to be addressed. I am > > prepared to work on it now. > > To be clear, I'm not 100% sure that this is related to the pg_upgrade > + "pg_resetwal sets oldestXid to an invented value" issue. I am sure > that that is a serious issue that needs to be addressed sooner rather > than later, though. Well, pg_upgrade corruptions are rare, but so is modifying autovacuum_freeze_max_age. If we have a corruption and we know autovacuum_freeze_max_age was modified, odds are that is the cause. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Jul 6, 2021 at 10:58 AM Bruce Momjian <bruce@momjian.us> wrote: > Well, pg_upgrade corruptions are rare, but so is modifying > autovacuum_freeze_max_age. If we have a corruption and we know > autovacuum_freeze_max_age was modified, odds are that is the cause. My point is that there isn't necessarily that much use in trying to determine what really happened here. It would be nice to know for sure, but it shouldn't affect what we do about the bug. In a way the situation with the bug is simple. Clearly Tom wasn't thinking of pg_upgrade when he wrote the relevant pg_resetwal code that sets oldestXid, because pg_upgrade didn't exist at the time. He was thinking of restoring the database to a relatively sane state in the event of some kind of corruption, with all of the uncertainty that goes with that. Nobody noticed that pg_upgrade gets this same behavior until much more recently. Now that we see the problem laid out, there isn't much to think about that will affect the response to the issue. At least as far as I can tell. We know that pg_upgrade uses pg_resetwal's -x flag in a context where there is no reason at all to think that the database is corrupt -- Tom can't have anticipated that all those years ago. It's easy to see that the behavior is wrong for pg_upgrade, and it's very hard to imagine any way in which it might have accidentally made some sense all along. We should just carry forward the original oldestXid. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > ... We should just carry forward the original oldestXid. Yup. It's a bit silly that we recognized the need to do that for oldestMultiXid yet not for oldestXid. BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal so many times? Why can't we pass all of the update-this options in one call? Who's going to do the legwork on this? regards, tom lane
Hi, On 7/6/21 8:57 PM, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: >> ... We should just carry forward the original oldestXid. > Yup. It's a bit silly that we recognized the need to do that > for oldestMultiXid yet not for oldestXid. FWIW there is a commitfest entry for it: https://commitfest.postgresql.org/33/3105/ Thanks Bertrand
On Tue, Jul 6, 2021 at 11:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > ... We should just carry forward the original oldestXid. > > Yup. It's a bit silly that we recognized the need to do that > for oldestMultiXid yet not for oldestXid. True. But at the same time it somehow doesn't seem silly at all. IME some of the most devious bugs evade detection by hiding in plain sight. It looks like amcheck's verify_heapam.c functionality almost catches bugs like this one. Something for Mark (CC'd) to consider. Does it matter that we usually "ctx.oldest_xid = ctx.relfrozenxid", and so usually use pg_class.relfrozenxid as our oldest_xid (and not ShmemVariableCache->oldestXid)? In other words, could we be doing more to sanitize ShmemVariableCache->oldestXid, especially when the relation's pg_class.relfrozenxid happens to be set to a real XID? > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > so many times? Why can't we pass all of the update-this options in one > call? No opinion here. > Who's going to do the legwork on this? Can Bruce take care of committing the patch for this? Bruce? This should definitely be in the next point release IMV. -- Peter Geoghegan
> On Jul 6, 2021, at 2:27 PM, Peter Geoghegan <pg@bowt.ie> wrote: > > It looks like amcheck's verify_heapam.c functionality almost catches > bugs like this one. Something for Mark (CC'd) to consider. Does it > matter that we usually "ctx.oldest_xid = ctx.relfrozenxid", and so > usually use pg_class.relfrozenxid as our oldest_xid (and not > ShmemVariableCache->oldestXid)? In other words, could we be doing more > to sanitize ShmemVariableCache->oldestXid, especially when the > relation's pg_class.relfrozenxid happens to be set to a real XID? Thanks, Peter, for drawing my attention to this. I had already been following this thread, but had not yet thought aboutthe problem in terms of amcheck. I will investigate possible solutions in verify_heapam(). — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jul 6, 2021 at 3:12 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > Thanks, Peter, for drawing my attention to this. I had already been following this thread, but had not yet thought aboutthe problem in terms of amcheck. > > I will investigate possible solutions in verify_heapam(). Thanks! Great that we might be able to make a whole class of bugs detectable with the new amcheck stuff. Glad that I didn't forget about amcheck myself -- I almost forgot. When I was working on the btree amcheck code, I looked for interesting historical bugs and made sure that I could detect them. That seems even more important with heapam. I wouldn't be surprised if one or two important invariants were missed, in part because the heapam design didn't have invariants as a starting point. -- Peter Geoghegan
On Tue, Jul 6, 2021 at 02:27:34PM -0700, Peter Geoghegan wrote: > > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > > so many times? Why can't we pass all of the update-this options in one > > call? > > No opinion here. > > > Who's going to do the legwork on this? > > Can Bruce take care of committing the patch for this? Bruce? > > This should definitely be in the next point release IMV. Yes, I can, though it seems like a much bigger issue than pg_upgrade. I will be glad to dig into it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian <bruce@momjian.us> wrote: > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > I will be glad to dig into it. I'm not sure what you mean by that. Technically this would be an issue for any program that uses "pg_resetwal -x" in the way that pg_upgrade does, with those same expectations. But isn't pg_upgrade the only known program that behaves like that? I don't see any reason why this wouldn't be treated as a pg_upgrade bug in the release notes, regardless of the exact nature or provenance of the issue -- the pg_upgrade framing seems useful because this is a practical problem for pg_upgrade users alone. Have I missed something? -- Peter Geoghegan
On Tue, Jul 6, 2021 at 06:30:41PM -0400, Bruce Momjian wrote: > On Tue, Jul 6, 2021 at 02:27:34PM -0700, Peter Geoghegan wrote: > > > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > > > so many times? Why can't we pass all of the update-this options in one > > > call? > > > > No opinion here. > > > > > Who's going to do the legwork on this? > > > > Can Bruce take care of committing the patch for this? Bruce? > > > > This should definitely be in the next point release IMV. > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > I will be glad to dig into it. Bertrand Drouvot provided a patch in the thread, so I will start from that; CC'ing him too. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Jul 6, 2021 at 03:46:48PM -0700, Peter Geoghegan wrote: > On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian <bruce@momjian.us> wrote: > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > > I will be glad to dig into it. > > I'm not sure what you mean by that. Technically this would be an issue > for any program that uses "pg_resetwal -x" in the way that pg_upgrade > does, with those same expectations. But isn't pg_upgrade the only > known program that behaves like that? > > I don't see any reason why this wouldn't be treated as a pg_upgrade > bug in the release notes, regardless of the exact nature or provenance > of the issue -- the pg_upgrade framing seems useful because this is a > practical problem for pg_upgrade users alone. Have I missed something? My point is that there are a lot internals involved here that are not part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, Bertrand patch seems to have what I need. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Jul 6, 2021 at 3:49 PM Bruce Momjian <bruce@momjian.us> wrote: > My point is that there are a lot internals involved here that are not > part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, > Bertrand patch seems to have what I need. I was confused by your remarks because I am kind of looking at it from the opposite angle. At least now that I've thought about it a bit. Since the snippet of pg_resetwal code that sets oldestXid wasn't ever intended to be used by pg_upgrade, but was anyway, what we have is a something that's clearly totally wrong (at least in the pg_upgrade case). It's not just wrong for pg_upgrade to do things that way -- it's also wildly unreasonable. We heard a complaint about this from Reddit only because it worked "as designed", and so made the cluster immediately have an anti-wraparound autovacuum. But why would anybody want that behavior, even if it was implemented correctly? It simply makes no sense. The consequences of this bug are indeed complicated and subtle and will probably never be fully understood. But at the same time fixing the bug now seems kind of simple. (Working backwards to arrive here was a bit tricky, mind you.) -- Peter Geoghegan
On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote: > On Tue, Jul 6, 2021 at 03:46:48PM -0700, Peter Geoghegan wrote: > > On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian <bruce@momjian.us> wrote: > > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > > > I will be glad to dig into it. > > > > I'm not sure what you mean by that. Technically this would be an issue > > for any program that uses "pg_resetwal -x" in the way that pg_upgrade > > does, with those same expectations. But isn't pg_upgrade the only > > known program that behaves like that? > > > > I don't see any reason why this wouldn't be treated as a pg_upgrade > > bug in the release notes, regardless of the exact nature or provenance > > of the issue -- the pg_upgrade framing seems useful because this is a > > practical problem for pg_upgrade users alone. Have I missed something? > > My point is that there are a lot internals involved here that are not > part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, > Bertrand patch seems to have what I need. One question is how do we want to handle cases where -x next_xid is used but -u oldestXid is not used? Compute a value for oldestXid like we did previously? Throw an error? Leave oldestXid unchanged? I am thinking the last option. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Tue, Jul 6, 2021 at 08:36:13PM -0400, Bruce Momjian wrote: > On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote: > > My point is that there are a lot internals involved here that are not > > part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, > > Bertrand patch seems to have what I need. > > One question is how do we want to handle cases where -x next_xid is used > but -u oldestXid is not used? Compute a value for oldestXid like we did > previously? Throw an error? Leave oldestXid unchanged? I am thinking > the last option. Here is a modified version of Bertrand's patch, with docs, that does the last option. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Вложения
Hi, On 7/7/21 3:49 AM, Bruce Momjian wrote: > On Tue, Jul 6, 2021 at 08:36:13PM -0400, Bruce Momjian wrote: >> On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote: >>> My point is that there are a lot internals involved here that are not >>> part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, >>> Bertrand patch seems to have what I need. >> One question is how do we want to handle cases where -x next_xid is used >> but -u oldestXid is not used? Compute a value for oldestXid like we did >> previously? Throw an error? Leave oldestXid unchanged? I am thinking >> the last option. > Here is a modified version of Bertrand's patch, with docs, that does the > last option. Thanks for having looked at it. It looks good to me, but i have one question: + printf(_(" -u, --oldest-transaction-id=XID set oldest transaction ID\n")); and + if (!TransactionIdIsNormal(set_oldest_xid)) + { + pg_log_error("oldest transaction ID (-u) must be greater or equal to %u", FirstNormalTransactionId); + exit(1); + } I am wondering if we should not keep my original proposal "oldest unfrozen transaction" (as compare to "oldest transaction") in both output to: - make the wording similar with what we can found in StartupXLOG(): ereport(DEBUG1, (errmsg_internal("oldest unfrozen transaction ID: %u, in database %u", checkPoint.oldestXid, checkPoint.oldestXidDB))); - give the new "-u" a sense (somehow) from a naming point of view. What do you think? Thanks Bertrand
On Thu, Jul 8, 2021 at 07:35:58AM +0200, Drouvot, Bertrand wrote: > Thanks for having looked at it. > > It looks good to me, but i have one question: > > + printf(_(" -u, --oldest-transaction-id=XID set oldest transaction > ID\n")); > > and > > + if (!TransactionIdIsNormal(set_oldest_xid)) > + { > + pg_log_error("oldest transaction ID (-u) must be > greater or equal to %u", FirstNormalTransactionId); > + exit(1); > + } > > I am wondering if we should not keep my original proposal "oldest unfrozen > transaction" (as compare to "oldest transaction") in both output to: > > - make the wording similar with what we can found in StartupXLOG(): > > ereport(DEBUG1, > (errmsg_internal("oldest unfrozen transaction ID: %u, in > database %u", > checkPoint.oldestXid, > checkPoint.oldestXidDB))); > > - give the new "-u" a sense (somehow) from a naming point of view. > > What do you think? I was wondering about that too. We don't use the term "unfrozen" in the pg_control output, and only in a few places in our docs. I added the word "unfrozen" for the -u doc description in this updated patch --- not sure how much farther to go in using this term, but I am afraid if I use it in the areas you suggested above, it will confuse people who are trying to match it to the pg_control output. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Вложения
Also, the pg_upgrade status message still seems to be misplaced: In 20210706190612.GM22043@telsasoft.com, Justin Pryzby wrote: > I re-arranged the pg_upgrade output of that patch: it was in the middle of the > two halves: "Setting next transaction ID and epoch for new cluster" +++ b/src/bin/pg_upgrade/pg_upgrade.c @@ -473,6 +473,12 @@ copy_xact_xlog_xid(void) "\"%s/pg_resetwal\" -f -x %u \"%s\"", new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, new_cluster.pgdata); + check_ok(); + prep_status("Setting oldest XID for new cluster"); + exec_prog(UTILITY_LOG_FILE, NULL, true, true, + "\"%s/pg_resetwal\" -f -u %u \"%s\"", + new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid, + new_cluster.pgdata); exec_prog(UTILITY_LOG_FILE, NULL, true, true, "\"%s/pg_resetwal\" -f -e %u \"%s\"", new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch, -- Justin
On Thu, Jul 8, 2021 at 08:11:14AM -0500, Justin Pryzby wrote: > Also, the pg_upgrade status message still seems to be misplaced: > > In 20210706190612.GM22043@telsasoft.com, Justin Pryzby wrote: > > I re-arranged the pg_upgrade output of that patch: it was in the middle of the > > two halves: "Setting next transaction ID and epoch for new cluster" > > +++ b/src/bin/pg_upgrade/pg_upgrade.c > @@ -473,6 +473,12 @@ copy_xact_xlog_xid(void) > "\"%s/pg_resetwal\" -f -x %u \"%s\"", > new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, > new_cluster.pgdata); > + check_ok(); > + prep_status("Setting oldest XID for new cluster"); > + exec_prog(UTILITY_LOG_FILE, NULL, true, true, > + "\"%s/pg_resetwal\" -f -u %u \"%s\"", > + new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid, > + new_cluster.pgdata); > exec_prog(UTILITY_LOG_FILE, NULL, true, true, > "\"%s/pg_resetwal\" -f -e %u \"%s\"", > new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch, Wow, you are 100% correct. Updated patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Вложения
On 7/8/21 3:08 PM, Bruce Momjian wrote: > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you canconfirm the sender and know the content is safe. > > > > On Thu, Jul 8, 2021 at 07:35:58AM +0200, Drouvot, Bertrand wrote: >> Thanks for having looked at it. >> >> It looks good to me, but i have one question: >> >> + printf(_(" -u, --oldest-transaction-id=XID set oldest transaction >> ID\n")); >> >> and >> >> + if (!TransactionIdIsNormal(set_oldest_xid)) >> + { >> + pg_log_error("oldest transaction ID (-u) must be >> greater or equal to %u", FirstNormalTransactionId); >> + exit(1); >> + } >> >> I am wondering if we should not keep my original proposal "oldest unfrozen >> transaction" (as compare to "oldest transaction") in both output to: >> >> - make the wording similar with what we can found in StartupXLOG(): >> >> ereport(DEBUG1, >> (errmsg_internal("oldest unfrozen transaction ID: %u, in >> database %u", >> checkPoint.oldestXid, >> checkPoint.oldestXidDB))); >> >> - give the new "-u" a sense (somehow) from a naming point of view. >> >> What do you think? > I was wondering about that too. We don't use the term "unfrozen" in the > pg_control output, and only in a few places in our docs. I added the > word "unfrozen" for the -u doc description in this updated patch Thanks! > --- > not sure how much farther to go in using this term, but I am afraid if I > use it in the areas you suggested above, it will confuse people who are > trying to match it to the pg_control output. Makes sense, thanks for your feedback. Bertrand
On Thu, Jul 8, 2021 at 09:51:47AM -0400, Bruce Momjian wrote: > On Thu, Jul 8, 2021 at 08:11:14AM -0500, Justin Pryzby wrote: > > Also, the pg_upgrade status message still seems to be misplaced: > > > > In 20210706190612.GM22043@telsasoft.com, Justin Pryzby wrote: > > > I re-arranged the pg_upgrade output of that patch: it was in the middle of the > > > two halves: "Setting next transaction ID and epoch for new cluster" > > > > +++ b/src/bin/pg_upgrade/pg_upgrade.c > > @@ -473,6 +473,12 @@ copy_xact_xlog_xid(void) > > "\"%s/pg_resetwal\" -f -x %u \"%s\"", > > new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid, > > new_cluster.pgdata); > > + check_ok(); > > + prep_status("Setting oldest XID for new cluster"); > > + exec_prog(UTILITY_LOG_FILE, NULL, true, true, > > + "\"%s/pg_resetwal\" -f -u %u \"%s\"", > > + new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid, > > + new_cluster.pgdata); > > exec_prog(UTILITY_LOG_FILE, NULL, true, true, > > "\"%s/pg_resetwal\" -f -e %u \"%s\"", > > new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch, > > Wow, you are 100% correct. Updated patch attached. OK, I have the patch ready to apply to all supported Postgres versions, and it passes all my cross-version pg_upgrade tests. However, I am now stuck on the commit message text, and I think this is the point Peter Geoghegan was trying to make earlier --- while we know that preserving the oldest xid in pg_control is the right thing to do, and that setting it to the current xid - 2 billion (the old behavior) causes vacuum freeze to run on all tables, but what else does this patch affect? As far as I know, seeing a very low oldest xid causes autovacuum to check all objects and make sure their relfrozenxid is less then autovacuum_freeze_max_age, but isn't that just a check? Would that cause any table scans? I would think not. And would this cause incorrect truncation of pg_xact or fsm or vm files? I would think not too. Even if the old and new cluster had mismatched autovacuum_freeze_max_age values, I don't see how that would cause any corruption either. I could perhaps see corruption happening if pg_control's oldest xid value was closer to the current xid value than it should be, but I can't see how having it 2-billion away could cause harm, unless perhaps pg_upgrade itself used enough xids to cause the counter to wrap more than 2^31 away from the oldest xid recorded in pg_control. What I am basically asking is how to document this and what it fixes. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Fri, Jul 23, 2021 at 5:08 PM Bruce Momjian <bruce@momjian.us> wrote: > However, I am now stuck on the commit message text, and I think this is > the point Peter Geoghegan was trying to make earlier --- while we know > that preserving the oldest xid in pg_control is the right thing to do, > and that setting it to the current xid - 2 billion (the old behavior) > causes vacuum freeze to run on all tables, but what else does this patch > affect? As far as I know the only other thing that it might affect is the traditional use of pg_resetwal: recovering likely-corrupt data. Getting the database to limp along for long enough to pg_dump. That is the only interpretation that makes sense, because the code in question predates pg_upgrade. AFAICT that was the original spirit of the code that we're changing here. > As far as I know, seeing a very low oldest xid causes autovacuum to > check all objects and make sure their relfrozenxid is less then > autovacuum_freeze_max_age, but isn't that just a check? Would that > cause any table scans? I would think not. And would this cause > incorrect truncation of pg_xact or fsm or vm files? I would think not > too. Tom actually wrote this code. I believe that he questioned the whole basis of it himself quite recently. Whether or not it's okay to change the behavior in contexts outside of pg_upgrade (contexts where the user invokes pg_resetwal -x to get the system to start) is perhaps debatable. It probably doesn't matter very much if you preserve that behavior for non-pg_upgrade cases -- hard to say. At the same time it's now easy to see that pg_upgrade shouldn't be doing this. > Even if the old and new cluster had mismatched autovacuum_freeze_max_age > values, I don't see how that would cause any corruption either. Sometimes the pg_control value for oldest XID is used as the oldest non-frozen XID that's expected in the table. Other times it's relfrozenxid itself IIRC. > I could perhaps see corruption happening if pg_control's oldest xid > value was closer to the current xid value than it should be, but I can't > see how having it 2-billion away could cause harm, unless perhaps > pg_upgrade itself used enough xids to cause the counter to wrap more > than 2^31 away from the oldest xid recorded in pg_control. > > What I am basically asking is how to document this and what it fixes. ISTM that this is a little like commits 78db307bb2 and a61daa14. Maybe take a look at those? -- Peter Geoghegan
On Fri, Jul 23, 2021 at 05:47:18PM -0700, Peter Geoghegan wrote: > > I could perhaps see corruption happening if pg_control's oldest xid > > value was closer to the current xid value than it should be, but I can't > > see how having it 2-billion away could cause harm, unless perhaps > > pg_upgrade itself used enough xids to cause the counter to wrap more > > than 2^31 away from the oldest xid recorded in pg_control. > > > > What I am basically asking is how to document this and what it fixes. > > ISTM that this is a little like commits 78db307bb2 and a61daa14. Maybe > take a look at those? Agreed. I just wanted to make sure I wasn't missing an important aspect of this patch. Thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Fri, Jul 23, 2021 at 09:01:18PM -0400, Bruce Momjian wrote: > On Fri, Jul 23, 2021 at 05:47:18PM -0700, Peter Geoghegan wrote: > > > I could perhaps see corruption happening if pg_control's oldest xid > > > value was closer to the current xid value than it should be, but I can't > > > see how having it 2-billion away could cause harm, unless perhaps > > > pg_upgrade itself used enough xids to cause the counter to wrap more > > > than 2^31 away from the oldest xid recorded in pg_control. > > > > > > What I am basically asking is how to document this and what it fixes. > > > > ISTM that this is a little like commits 78db307bb2 and a61daa14. Maybe > > take a look at those? > > Agreed. I just wanted to make sure I wasn't missing an important aspect > of this patch. Thanks. Another question --- with the previous code, the oldest xid was always set to a reasonable value, -2 billion less than the current xid. With the new code, the oldest xid might be slightly higher than the current xid if they use -x but not -u. Is that acceptable? I think we agreed it was. pg_upgrade will always set both. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Sat, Jul 24, 2021 at 10:01:05AM -0400, Bruce Momjian wrote: > On Fri, Jul 23, 2021 at 09:01:18PM -0400, Bruce Momjian wrote: > > On Fri, Jul 23, 2021 at 05:47:18PM -0700, Peter Geoghegan wrote: > > > > I could perhaps see corruption happening if pg_control's oldest xid > > > > value was closer to the current xid value than it should be, but I can't > > > > see how having it 2-billion away could cause harm, unless perhaps > > > > pg_upgrade itself used enough xids to cause the counter to wrap more > > > > than 2^31 away from the oldest xid recorded in pg_control. > > > > > > > > What I am basically asking is how to document this and what it fixes. > > > > > > ISTM that this is a little like commits 78db307bb2 and a61daa14. Maybe > > > take a look at those? > > > > Agreed. I just wanted to make sure I wasn't missing an important aspect > > of this patch. Thanks. > > Another question --- with the previous code, the oldest xid was always > set to a reasonable value, -2 billion less than the current xid. With > the new code, the oldest xid might be slightly higher than the current > xid if they use -x but not -u. Is that acceptable? I think we agreed it > was. pg_upgrade will always set both. This patch has been applied back to 9.6 and will appear in the next minor release. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.