Обсуждение: Avoiding another needless ERROR during nbtree page deletion
Work from commit 5b861baa (later backpatched as commit 43e409ce) taught nbtree to press on with vacuuming an index when page deletion fails to "re-find" a downlink in the target page's parent (or in some page to the right of the parent) due to index corruption. To recap, avoiding ERRORs during vacuuming (even those caused by index corruption) is useful because there is no reason to expect the error to go away on its own; we're relying on the DBA to notice the error and REINDEX before wraparound/xidStopLimit kicks in. This is at least the case on versions before 14, where the failsafe can eventually kick-in and avoid catastrophe (though the failsafe can only be expected to avoid the worst consequences). It has come to my attention that there is a remaining issue of the same general nature in nbtree VACUUM's page deletion code. Though this remaining issue seems significantly less likely to come up in practice, there is no reason to take any chances here. Attached patch fixes it. Also attached is a bugfix for a minor issue in amcheck's bt_index_parent_check() function, which I noticed in passing, while I tested the first patch. We assumed that we'd always land on the leftmost page on each level first (the leftmost according to internal pages one level up). That assumption is faulty because page deletion of the leftmost page is quite possible. Page deletion can be interrupted, leaving a half-dead leaf page (possibly the leftmost leaf page) without any downlink one level up, while still leaving a left sibling link on the leaf level (in the leaf page that isn't about to become the leftmost, but won't until the interrupted page deletion can be completed). IMV this should be backpatched all the way. The issue in question is rather unlikely to come up. But the fix that I've come up with is very well targeted. It seems just about impossible for it to affect any user that didn't already have a serious problem (without the fix). -- Peter Geoghegan
Вложения
On 20/05/2023 05:17, Peter Geoghegan wrote: > It has come to my attention that there is a remaining issue of the > same general nature in nbtree VACUUM's page deletion code. Though this > remaining issue seems significantly less likely to come up in > practice, there is no reason to take any chances here. Yeah, let's be consistent. Any idea what might cause this corruption? > Attached patch fixes it. > + /* > + * Validate target's right sibling page's left link points back to target. > + * > + * This is known to fail in the field in the presence of index corruption, > + * so we go to the trouble of avoiding a hard ERROR. This is fairly close > + * to what we did earlier on when we located the target's left sibling > + * (iff target has a left sibling). > + */ This comment notes that this is similar to what we did with the left sibling, but there isn't really any mention at the left sibling code about avoiding hard ERRORs. Feels a bit backwards. Maybe move the comment about avoiding the hard ERROR to where the left sibling is handled. Or explain it in the function comment and just have short "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment here. > Also attached is a bugfix for a minor issue in amcheck's > bt_index_parent_check() function, which I noticed in passing, while I > tested the first patch. We assumed that we'd always land on the > leftmost page on each level first (the leftmost according to internal > pages one level up). That assumption is faulty because page deletion > of the leftmost page is quite possible. Page deletion can be > interrupted, leaving a half-dead leaf page (possibly the leftmost leaf > page) without any downlink one level up, while still leaving a left > sibling link on the leaf level (in the leaf page that isn't about to > become the leftmost, but won't until the interrupted page deletion can > be completed). You could check that the left sibling is indeed a half-dead page. > ereport(DEBUG1, > (errcode(ERRCODE_NO_DATA), > errmsg("block %u is not leftmost in index \"%s\"", > current, RelationGetRelationName(state->rel)))); ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode. -- Heikki Linnakangas Neon (https://neon.tech)
On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Any idea what might cause this corruption? Not really, no. As far as I know the specific case that was brought to my attention (that put me on the path to writing this patch) was just an isolated incident. The interesting detail (if any) is that it was a relatively recent version of Postgres (13), and that there were no other known problems. This means that there is a plausible remaining gap in the defensive checks in nbtree VACUUM on recent versions -- we might have expected to avoid a hard ERROR in some other way, from one of the earlier checks, but that didn't happen on at least one occasion. You can find several references to the "right sibling's left-link doesn't match:" error message by googling. Most of them are clearly from the page split ERROR. But there are some from VACUUM, too: https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8 Granted, that was from a 9.2 database -- before your 9.4 work that made this whole area much more robust. > This comment notes that this is similar to what we did with the left > sibling, but there isn't really any mention at the left sibling code > about avoiding hard ERRORs. Feels a bit backwards. Maybe move the > comment about avoiding the hard ERROR to where the left sibling is > handled. Or explain it in the function comment and just have short > "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment > here. Good point. Will do it that way. > > Also attached is a bugfix for a minor issue in amcheck's > > bt_index_parent_check() function, which I noticed in passing, while I > > tested the first patch. > You could check that the left sibling is indeed a half-dead page. It's very hard to see, but...I think that we do. Sort of. Since bt_recheck_sibling_links() is prepared to check that the left sibling's right link points back to the target page. One problem with that is that it only happens in the AccessShareLock case, whereas we're concerned with fixing an issue in the ShareLock case. Another problem is that it's awkward and complicated to explain. It's not obvious that it's worth trying to explain all this and/or making sure that it happens in the ShareLock case, so that we have everything covered. I'm unsure. > ERRCODE_NO_DATA doesn't look right. Let's just leave out the errcode. Agreed. -- Peter Geoghegan
On Mon, May 22, 2023 at 9:22 AM Peter Geoghegan <pg@bowt.ie> wrote: > > This comment notes that this is similar to what we did with the left > > sibling, but there isn't really any mention at the left sibling code > > about avoiding hard ERRORs. Feels a bit backwards. Maybe move the > > comment about avoiding the hard ERROR to where the left sibling is > > handled. Or explain it in the function comment and just have short > > "shouldn't happen, but avoid hard ERROR if the index is corrupt" comment > > here. > > Good point. Will do it that way. Attached is v2, which does it that way. It also adjusts the approach taken to release locks and pins when the left sibling validation check fails. This makes it simpler and more consistent with surrounding code. I might not include this change in the backpatch. Not including a revised amcheck patch here, since I'm not exactly sure what to do with your feedback on that one just yet. -- Peter Geoghegan
Вложения
On Mon, May 22, 2023 at 10:59 AM Peter Geoghegan <pg@bowt.ie> wrote: > Attached is v2, which does it that way. It also adjusts the approach > taken to release locks and pins when the left sibling validation check > fails. I pushed this just now, backpatching all the way. > Not including a revised amcheck patch here, since I'm not exactly sure > what to do with your feedback on that one just yet. I'd like to go with a minimal approach in my patch to address the remaining issue in amcheck. Something similar to the patch that was posted as part of v1. While it seems important to address the issue, making sure that we have coverage of the leftmost page really being half-dead (as opposed to something that would constitute corruption) seems much less important. Ideally we'd have exhaustive coverage, but it's not a priority for me right now. -- Peter Geoghegan
On Mon, May 22, 2023, 12:31 Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Any idea what might cause this corruption?
Not really, no. As far as I know the specific case that was brought to
my attention (that put me on the path to writing this patch) was just
an isolated incident. The interesting detail (if any) is that it was a
relatively recent version of Postgres (13), and that there were no
other known problems. This means that there is a plausible remaining
gap in the defensive checks in nbtree VACUUM on recent versions -- we
might have expected to avoid a hard ERROR in some other way, from one
of the earlier checks, but that didn't happen on at least one
occasion.
What error would one expect to see? I did have a case where vacuum was erroring on a btree in $previous_job.
On Thu, Jun 1, 2023 at 6:47 AM Greg Stark <stark@mit.edu> wrote: > What error would one expect to see? I did have a case where vacuum was erroring on a btree in $previous_job. You mean in general? It's usually this one: https://gitlab.com/gitlab-org/gitlab/-/issues/381443 In the case of this particular issue, the error is "right sibling's left-link doesn't match". Per: https://stackoverflow.com/questions/49307292/error-in-postgresql-right-siblings-left-link-doesnt-match-block-5-links-to-8 -- Peter Geoghegan