Re: Fixes for two separate bugs in nbtree VACUUM's page deletion

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Дата
Msg-id CA+fd4k5XWCXcF2XyO6EDv2jcrw9az8aaBWgp33YE-4Jmr8RT4A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fixes for two separate bugs in nbtree VACUUM's page deletion  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Список pgsql-hackers
On Tue, 28 Apr 2020 at 11:35, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Apr 27, 2020 at 11:02 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I would like to backpatch both patches to all branches that have
> > commit 857f9c36cda -- v11, v12, and master. The second issue isn't
> > serious, but it seems worth keeping v11+ in sync in this area. Note
> > that any backpatch theoretically creates an ABI break for callers of
> > the _bt_pagedel() function.
>
> I plan to commit both fixes, while backpatching to v11, v12 and the
> master branch on Friday -- barring objections.

Thank you for the patches and for starting the new thread.

I agree with both patches.

For the first fix it seems better to push down the logic to the page
deletion code as your 0001 patch does so. The following change changes
the page deletion code so that it emits LOG message indicating the
index corruption when a deleted page is passed. But we used to ignore
in this case.

@@ -1511,14 +1523,21 @@ _bt_pagedel(Relation rel, Buffer buf)

    for (;;)
    {
-       page = BufferGetPage(buf);
+       page = BufferGetPage(leafbuf);
        opaque = (BTPageOpaque) PageGetSpecialPointer(page);

        /*
         * Internal pages are never deleted directly, only as part of deleting
         * the whole branch all the way down to leaf level.
+        *
+        * Also check for deleted pages here.  Caller never passes us a fully
+        * deleted page.  Only VACUUM can delete pages, so there can't have
+        * been a concurrent deletion.  Assume that we reached any deleted
+        * page encountered here by following a sibling link, and that the
+        * index is corrupt.
         */
-       if (!P_ISLEAF(opaque))
+       Assert(!P_ISDELETED(opaque));
+       if (!P_ISLEAF(opaque) || P_ISDELETED(opaque))
        {
            /*
             * Pre-9.4 page deletion only marked internal pages as half-dead,
@@ -1537,13 +1556,21 @@ _bt_pagedel(Relation rel, Buffer buf)
                         errmsg("index \"%s\" contains a half-dead
internal page",
                                RelationGetRelationName(rel)),
                         errhint("This can be caused by an interrupted
VACUUM in version 9.3 or older, before upgrade. Please REINDEX
it.")));
-           _bt_relbuf(rel, buf);
+
+           if (P_ISDELETED(opaque))
+               ereport(LOG,
+                       (errcode(ERRCODE_INDEX_CORRUPTED),
+                        errmsg("index \"%s\" contains a deleted page
with a live sibling link",
+                               RelationGetRelationName(rel))));
+
+           _bt_relbuf(rel, leafbuf);
            return ndeleted;
        }

I agree with this change but since I'm not sure this change directly
relates to this bug I wonder if it can be a separate patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Handling of concurrent aborts in logical decoding of in-progress xacts
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: [HACKERS] Restricting maximum keep segments by repslots