Обсуждение: Bug in VACUUM reporting of "removed %d row versions" in 9.2+
Commit d0dcb315db0043f10073a9a244cea138e9e60edd and previous introduced a bug into the reporting of removed row versions. ('Twas myself et al, before you ask). In those commits, lazy_vacuum_heap() skipped pinned blocks, but then failed to report that accurately, claiming that the tuples were actually removed when they were not. That bug has masked the effect of the page skipping behaviour. Bug is in 9.2 and HEAD. Attached patch corrects that, with logic to move to the next block rather than re-try the lock in a tight loop once per tuple, which was mostly ineffective. Attached patch also changes the algorithm slightly to retry a skipped block by sleeping and then retrying the block, following observation of the effects of the current skipping algorithm once skipped rows are correctly reported. It also adds a comment which explains the skipping behaviour. Viewpoints? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
On Fri, May 10, 2013 at 11:37 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Commit d0dcb315db0043f10073a9a244cea138e9e60edd and previous > introduced a bug into the reporting of removed row versions. ('Twas > myself et al, before you ask). > > In those commits, lazy_vacuum_heap() skipped pinned blocks, but then > failed to report that accurately, claiming that the tuples were > actually removed when they were not. That bug has masked the effect of > the page skipping behaviour. > > Bug is in 9.2 and HEAD. > > Attached patch corrects that, with logic to move to the next block > rather than re-try the lock in a tight loop once per tuple, which was > mostly ineffective. > > Attached patch also changes the algorithm slightly to retry a skipped > block by sleeping and then retrying the block, following observation > of the effects of the current skipping algorithm once skipped rows are > correctly reported. > > It also adds a comment which explains the skipping behaviour. > > Viewpoints? I think this patch as currently written is going to leave us with the following dubious-looking construct. if (!ConditionalLockBufferForCleanup(buf)) { if (!ConditionalLockBufferForCleanup(buf)) { Modulo that minor gripe, I think it's definitely worth doing this in master. I'm a bit disinclined to change the message string in 9.2, and therefore might not back-patch at all, since there's basically no consequence to this except for mildly inaccurate reporting. But if people feel it's worth a translation break for this, I don't object to back-patching it either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > I think this patch as currently written is going to leave us with the > following dubious-looking construct. > > if (!ConditionalLockBufferForCleanup(buf)) > { > if (!ConditionalLockBufferForCleanup(buf)) > { > > Modulo that minor gripe, I think it's definitely worth doing this in > master. I'm a bit disinclined to change the message string in 9.2, > and therefore might not back-patch at all, since there's basically no > consequence to this except for mildly inaccurate reporting. But if > people feel it's worth a translation break for this, I don't object to > back-patching it either. We break translations to fix bugs from time to time. Translators do update for minor releases, so if this is the only consideration, I don't think we should hold this bugfix for it (or any other, IMV). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Where are we on this? --------------------------------------------------------------------------- On Fri, May 10, 2013 at 04:37:58PM +0100, Simon Riggs wrote: > Commit d0dcb315db0043f10073a9a244cea138e9e60edd and previous > introduced a bug into the reporting of removed row versions. ('Twas > myself et al, before you ask). > > In those commits, lazy_vacuum_heap() skipped pinned blocks, but then > failed to report that accurately, claiming that the tuples were > actually removed when they were not. That bug has masked the effect of > the page skipping behaviour. > > Bug is in 9.2 and HEAD. > > Attached patch corrects that, with logic to move to the next block > rather than re-try the lock in a tight loop once per tuple, which was > mostly ineffective. > > Attached patch also changes the algorithm slightly to retry a skipped > block by sleeping and then retrying the block, following observation > of the effects of the current skipping algorithm once skipped rows are > correctly reported. > > It also adds a comment which explains the skipping behaviour. > > Viewpoints? > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c > index 02f3cf3..f0d054a 100644 > --- a/src/backend/commands/vacuumlazy.c > +++ b/src/backend/commands/vacuumlazy.c > @@ -1052,15 +1052,15 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, > static void > lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) > { > - int tupindex; > - int npages; > + int tupindex = 0; > + int npages = 0; > + int ntupskipped = 0; > + int npagesskipped = 0; > PGRUsage ru0; > Buffer vmbuffer = InvalidBuffer; > > pg_rusage_init(&ru0); > - npages = 0; > > - tupindex = 0; > while (tupindex < vacrelstats->num_dead_tuples) > { > BlockNumber tblk; > @@ -1075,9 +1075,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) > vac_strategy); > if (!ConditionalLockBufferForCleanup(buf)) > { > - ReleaseBuffer(buf); > - ++tupindex; > - continue; > + /* > + * If we can't get the lock, sleep, then try again just once. > + * > + * If we can't get the lock the second time, skip this block and > + * move onto the next one. This is possible because by now we > + * know the tuples are dead and all index pointers to them have been > + * removed, so it is safe to ignore them, even if not ideal. > + */ > + VacuumCostBalance += VacuumCostLimit; > + vacuum_delay_point(); > + if (!ConditionalLockBufferForCleanup(buf)) > + { > + BlockNumber blkno = tblk; > + > + ReleaseBuffer(buf); > + tupindex++; > + for (; tupindex < vacrelstats->num_dead_tuples; tupindex++) > + { > + ntupskipped++; > + tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]); > + if (tblk != blkno) > + break; > + } > + npagesskipped++; > + continue; > + } > } > tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats, > &vmbuffer); > @@ -1098,9 +1121,9 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats) > } > > ereport(elevel, > - (errmsg("\"%s\": removed %d row versions in %d pages", > + (errmsg("\"%s\": removed %d row versions in %d pages (skipped %d row versions in %d pages)", > RelationGetRelationName(onerel), > - tupindex, npages), > + tupindex - ntupskipped, npages, ntupskipped, npagesskipped), > errdetail("%s.", > pg_rusage_show(&ru0)))); > } > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 9 December 2013 21:24, Bruce Momjian <bruce@momjian.us> wrote: > > Where are we on this? Good question, see below. >> In those commits, lazy_vacuum_heap() skipped pinned blocks, but then >> failed to report that accurately, claiming that the tuples were >> actually removed when they were not. That bug has masked the effect of >> the page skipping behaviour. Wow, this is more complex than just that. Can't foresee backpatching anything. We prune rows down to dead item pointers. Then we remove from the index, then we try to remove them from heap, but may skip removal for some blocks. The user description of that is just wrong and needs more thought and work. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services