Обсуждение: Bug in VACUUM reporting of "removed %d row versions" in 9.2+

Поиск
Список
Период
Сортировка

Bug in VACUUM reporting of "removed %d row versions" in 9.2+

От
Simon Riggs
Дата:
Commit d0dcb315db0043f10073a9a244cea138e9e60edd and previous
introduced a bug into the reporting of removed row versions. ('Twas
myself et al, before you ask).

In those commits, lazy_vacuum_heap() skipped pinned blocks, but then
failed to report that accurately, claiming that the tuples were
actually removed when they were not. That bug has masked the effect of
the page skipping behaviour.

Bug is in 9.2 and HEAD.

Attached patch corrects that, with logic to move to the next block
rather than re-try the lock in a tight loop once per tuple, which was
mostly ineffective.

Attached patch also changes the algorithm slightly to retry a skipped
block by sleeping and then retrying the block, following observation
of the effects of the current skipping algorithm once skipped rows are
correctly reported.

It also adds a comment which explains the skipping behaviour.

Viewpoints?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: Bug in VACUUM reporting of "removed %d row versions" in 9.2+

От
Robert Haas
Дата:
On Fri, May 10, 2013 at 11:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Commit d0dcb315db0043f10073a9a244cea138e9e60edd and previous
> introduced a bug into the reporting of removed row versions. ('Twas
> myself et al, before you ask).
>
> In those commits, lazy_vacuum_heap() skipped pinned blocks, but then
> failed to report that accurately, claiming that the tuples were
> actually removed when they were not. That bug has masked the effect of
> the page skipping behaviour.
>
> Bug is in 9.2 and HEAD.
>
> Attached patch corrects that, with logic to move to the next block
> rather than re-try the lock in a tight loop once per tuple, which was
> mostly ineffective.
>
> Attached patch also changes the algorithm slightly to retry a skipped
> block by sleeping and then retrying the block, following observation
> of the effects of the current skipping algorithm once skipped rows are
> correctly reported.
>
> It also adds a comment which explains the skipping behaviour.
>
> Viewpoints?

I think this patch as currently written is going to leave us with the
following dubious-looking construct.
    if (!ConditionalLockBufferForCleanup(buf))    {        if (!ConditionalLockBufferForCleanup(buf))        {

Modulo that minor gripe, I think it's definitely worth doing this in
master.  I'm a bit disinclined to change the message string in 9.2,
and therefore might not back-patch at all, since there's basically no
consequence to this except for mildly inaccurate reporting.  But if
people feel it's worth a translation break for this, I don't object to
back-patching it either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bug in VACUUM reporting of "removed %d row versions" in 9.2+

От
Alvaro Herrera
Дата:
Robert Haas escribió:

> I think this patch as currently written is going to leave us with the
> following dubious-looking construct.
>
>         if (!ConditionalLockBufferForCleanup(buf))
>         {
>             if (!ConditionalLockBufferForCleanup(buf))
>             {
>
> Modulo that minor gripe, I think it's definitely worth doing this in
> master.  I'm a bit disinclined to change the message string in 9.2,
> and therefore might not back-patch at all, since there's basically no
> consequence to this except for mildly inaccurate reporting.  But if
> people feel it's worth a translation break for this, I don't object to
> back-patching it either.

We break translations to fix bugs from time to time.  Translators do
update for minor releases, so if this is the only consideration, I don't
think we should hold this bugfix for it (or any other, IMV).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Bug in VACUUM reporting of "removed %d row versions" in 9.2+

От
Bruce Momjian
Дата:
Where are we on this?

---------------------------------------------------------------------------

On Fri, May 10, 2013 at 04:37:58PM +0100, Simon Riggs wrote:
> Commit d0dcb315db0043f10073a9a244cea138e9e60edd and previous
> introduced a bug into the reporting of removed row versions. ('Twas
> myself et al, before you ask).
> 
> In those commits, lazy_vacuum_heap() skipped pinned blocks, but then
> failed to report that accurately, claiming that the tuples were
> actually removed when they were not. That bug has masked the effect of
> the page skipping behaviour.
> 
> Bug is in 9.2 and HEAD.
> 
> Attached patch corrects that, with logic to move to the next block
> rather than re-try the lock in a tight loop once per tuple, which was
> mostly ineffective.
> 
> Attached patch also changes the algorithm slightly to retry a skipped
> block by sleeping and then retrying the block, following observation
> of the effects of the current skipping algorithm once skipped rows are
> correctly reported.
> 
> It also adds a comment which explains the skipping behaviour.
> 
> Viewpoints?
> 
> --
>  Simon Riggs                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

> diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
> index 02f3cf3..f0d054a 100644
> --- a/src/backend/commands/vacuumlazy.c
> +++ b/src/backend/commands/vacuumlazy.c
> @@ -1052,15 +1052,15 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
>  static void
>  lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
>  {
> -    int            tupindex;
> -    int            npages;
> +    int            tupindex = 0;
> +    int            npages = 0;
> +    int            ntupskipped = 0;
> +    int            npagesskipped = 0;
>      PGRUsage    ru0;
>      Buffer        vmbuffer = InvalidBuffer;
>  
>      pg_rusage_init(&ru0);
> -    npages = 0;
>  
> -    tupindex = 0;
>      while (tupindex < vacrelstats->num_dead_tuples)
>      {
>          BlockNumber tblk;
> @@ -1075,9 +1075,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
>                                   vac_strategy);
>          if (!ConditionalLockBufferForCleanup(buf))
>          {
> -            ReleaseBuffer(buf);
> -            ++tupindex;
> -            continue;
> +            /*
> +             * If we can't get the lock, sleep, then try again just once.
> +             *
> +             * If we can't get the lock the second time, skip this block and
> +             * move onto the next one. This is possible because by now we
> +             * know the tuples are dead and all index pointers to them have been
> +             * removed, so it is safe to ignore them, even if not ideal.
> +             */
> +            VacuumCostBalance += VacuumCostLimit;
> +            vacuum_delay_point();
> +            if (!ConditionalLockBufferForCleanup(buf))
> +            {
> +                BlockNumber blkno = tblk;
> +
> +                ReleaseBuffer(buf);
> +                tupindex++;
> +                for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
> +                {
> +                    ntupskipped++;
> +                    tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
> +                    if (tblk != blkno)
> +                        break;
> +                }
> +                npagesskipped++;
> +                continue;
> +            }
>          }
>          tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
>                                      &vmbuffer);
> @@ -1098,9 +1121,9 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
>      }
>  
>      ereport(elevel,
> -            (errmsg("\"%s\": removed %d row versions in %d pages",
> +            (errmsg("\"%s\": removed %d row versions in %d pages (skipped %d row versions in %d pages)",
>                      RelationGetRelationName(onerel),
> -                    tupindex, npages),
> +                    tupindex - ntupskipped, npages, ntupskipped, npagesskipped),
>               errdetail("%s.",
>                         pg_rusage_show(&ru0))));
>  }

> 
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Bug in VACUUM reporting of "removed %d row versions" in 9.2+

От
Simon Riggs
Дата:
On 9 December 2013 21:24, Bruce Momjian <bruce@momjian.us> wrote:
>
> Where are we on this?

Good question, see below.

>> In those commits, lazy_vacuum_heap() skipped pinned blocks, but then
>> failed to report that accurately, claiming that the tuples were
>> actually removed when they were not. That bug has masked the effect of
>> the page skipping behaviour.

Wow, this is more complex than just that. Can't foresee backpatching anything.

We prune rows down to dead item pointers. Then we remove from the
index, then we try to remove them from heap, but may skip removal for
some blocks.

The user description of that is just wrong and needs more thought and work.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services