Re: Fixing a couple of buglets in how VACUUM sets visibility map bits

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Дата
Msg-id 20230109195729.7n5gidgfirsbtsjx@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Fixing a couple of buglets in how VACUUM sets visibility map bits  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Список pgsql-hackers
Hi,

On 2023-01-09 10:16:03 -0800, Peter Geoghegan wrote:
> Attached is v3, which explicitly checks the need to set the PD_ALL_VISIBLE
> flag at the relevant visibilitymap_set() call site. It also has improved
> comments.

Afaict we'll need to backpatch this all the way?


> From e7788ebdb589fb7c6f866cf53658cc369f9858b5 Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan <pg@bowt.ie>
> Date: Sat, 31 Dec 2022 15:13:01 -0800
> Subject: [PATCH v3] Don't accidentally unset all-visible bit in VM.
> 
> One of the calls to visibilitymap_set() during VACUUM's initial heap
> pass could unset a page's all-visible bit during the process of setting
> the same page's all-frozen bit.

As just mentioned upthread, this just seems wrong.


> This could happen in the event of a
> concurrent HOT update from a transaction that aborts soon after.  Since
> the all_visible_according_to_vm local variable that lazy_scan_heap works
> off of when setting the VM doesn't reflect the current state of the VM,
> and since visibilitymap_set() just requested that the all-frozen bit get
> set in one case, there was a race condition.  Heap pages could initially
> be all-visible just as all_visible_according_to_vm is established, then
> not be all-visible after the update, and then become eligible to be set
> all-visible once more following pruning by VACUUM.  There is no reason
> why VACUUM can't remove a concurrently aborted heap-only tuple right
> away, and so no reason why such a page won't be able to reach the
> relevant visibilitymap_set() call site.

Do you have a reproducer for this?


> @@ -1120,8 +1123,8 @@ lazy_scan_heap(LVRelState *vacrel)
>           * got cleared after lazy_scan_skip() was called, so we must recheck
>           * with buffer lock before concluding that the VM is corrupt.
>           */
> -        else if (all_visible_according_to_vm && !PageIsAllVisible(page)
> -                 && VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
> +        else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
> +                 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
>          {
>              elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page
%u",
>                   vacrel->relname, blkno);

Hm. The message gets a bit less accurate with the change. Perhaps OK? OTOH, it
might be useful to know what bit was wrong when debugging problems.


> @@ -1164,12 +1167,34 @@ lazy_scan_heap(LVRelState *vacrel)
>                   !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
>          {
>              /*
> -             * We can pass InvalidTransactionId as the cutoff XID here,
> -             * because setting the all-frozen bit doesn't cause recovery
> -             * conflicts.
> +             * Avoid relying on all_visible_according_to_vm as a proxy for the
> +             * page-level PD_ALL_VISIBLE bit being set, since it might have
> +             * become stale -- even when all_visible is set in prunestate.
> +             *
> +             * Consider the example of a page that starts out all-visible and
> +             * then has a tuple concurrently deleted by an xact that aborts.
> +             * The page will be all_visible_according_to_vm, and will have
> +             * all_visible set in prunestate.  It will nevertheless not have
> +             * PD_ALL_VISIBLE set by here (plus neither VM bit will be set).
> +             * And so we must check if PD_ALL_VISIBLE needs to be set.
>               */
> +            if (!PageIsAllVisible(page))
> +            {
> +                PageSetAllVisible(page);
> +                MarkBufferDirty(buf);
> +            }
> +
> +            /*
> +             * Set the page all-frozen (and all-visible) in the VM.
> +             *
> +             * We can pass InvalidTransactionId as our visibility_cutoff_xid,
> +             * since a snapshotConflictHorizon sufficient to make everything
> +             * safe for REDO was logged when the page's tuples were frozen.
> +             */
> +            Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
>              visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
>                                vmbuffer, InvalidTransactionId,
> +                              VISIBILITYMAP_ALL_VISIBLE    |
>                                VISIBILITYMAP_ALL_FROZEN);
>          }
>  
> @@ -1311,7 +1336,11 @@ lazy_scan_skip(LVRelState *vacrel, Buffer *vmbuffer, BlockNumber next_block,
>  
>          /* DISABLE_PAGE_SKIPPING makes all skipping unsafe */
>          if (!vacrel->skipwithvm)
> +        {
> +            /* Caller shouldn't rely on all_visible_according_to_vm */
> +            *next_unskippable_allvis = false;
>              break;
> +        }
>  
>          /*
>           * Aggressive VACUUM caller can't skip pages just because they are
> @@ -1818,7 +1847,11 @@ retry:
>               * cutoff by stepping back from OldestXmin.
>               */
>              if (prunestate->all_visible && prunestate->all_frozen)
> +            {
> +                /* Using same cutoff when setting VM is now unnecessary */
>                  snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
> +                prunestate->visibility_cutoff_xid = InvalidTransactionId;
> +            }
>              else
>              {
>                  /* Avoids false conflicts when hot_standby_feedback in use */
> @@ -2388,8 +2421,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
>  static void
>  lazy_vacuum_heap_rel(LVRelState *vacrel)
>  {
> -    int            index;
> -    BlockNumber vacuumed_pages;
> +    int            index = 0;
> +    BlockNumber vacuumed_pages = 0;
>      Buffer        vmbuffer = InvalidBuffer;
>      LVSavedErrInfo saved_err_info;
>  
> @@ -2406,42 +2439,42 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
>                               VACUUM_ERRCB_PHASE_VACUUM_HEAP,
>                               InvalidBlockNumber, InvalidOffsetNumber);
>  
> -    vacuumed_pages = 0;
> -
> -    index = 0;
>      while (index < vacrel->dead_items->num_items)
>      {
> -        BlockNumber tblk;
> +        BlockNumber blkno;
>          Buffer        buf;
>          Page        page;
>          Size        freespace;
>  
>          vacuum_delay_point();

I still think such changes are inappropriate for a bugfix, particularly one
that needs to be backpatched.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher