Discussion: Incomplete freezing when truncating a relation during vacuum

Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
Hi,

A customer of ours reported that some columns were missing from
pg_attribute. Investigation showed that they were visible to sequential
but not index scans. Looking closer, the page with the missing
attributes is marked as all-visible, but the xids on the individual rows
were xids more than 2^31 in the past - so, effectively in the future and
invisible.
pg_class.relfrozenxid, pg_database.datfrozenxid looked perfectly normal,
not indicating any wraparound and were well past the xid in the
particular rows.
So evidently, something around freezing has gone wrong. We haven't
frozen each row, but updated relfrozenxid regardless.
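To illustrate why an xid more than 2^31 in the past reads as "in the future": here is a minimal sketch of PostgreSQL's circular XID comparison, modeled loosely on TransactionIdPrecedes(). The name and the omission of permanent/special XIDs are simplifications of mine, not the actual server code:

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Circular XID comparison: XIDs live on a modulo-2^32 circle, and
 * "a precedes b" is decided by the sign of the 32-bit difference.
 * (The real TransactionIdPrecedes also special-cases permanent XIDs;
 * that is omitted here.)
 */
static int
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}
```

So a tuple whose xmin lies more than 2^31 transactions behind the current XID no longer "precedes" it: to every visibility check it appears not yet committed, which is exactly why the rows here were invisible.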

A longer period of staring revealed a likely reason, in lazy_vacuum_rel:

    /* Do the vacuuming */
    lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
...
    if (whatever)
       lazy_truncate_heap(onerel, vacrelstats);
...
    new_frozen_xid = FreezeLimit;
    if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
        new_frozen_xid = InvalidTransactionId;

but lazy_truncate_heap() does, after it's finished truncating:

    vacrelstats->rel_pages = new_rel_pages;

Which means, we might consider a partial vacuum as a vacuum that has
frozen all old rows if just enough pages have been truncated away.

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.
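The interaction is easy to model outside the server. Below is a self-contained sketch contrasting the buggy ordering with the second fix (computing new_frozen_xid before truncation). The struct and function names are simplified stand-ins of mine for the LVRelStats fields, not the actual PostgreSQL code:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Simplified stand-in for the two LVRelStats fields involved. */
typedef struct
{
	uint32_t	rel_pages;		/* relation size */
	uint32_t	scanned_pages;	/* pages this vacuum actually visited */
} VacStats;

/*
 * Buggy ordering: truncation shrinks rel_pages first, so the
 * "did we scan everything?" test can pass for a partial scan.
 */
static TransactionId
new_frozen_xid_buggy(VacStats *stats, uint32_t truncated_to,
					 TransactionId freeze_limit)
{
	stats->rel_pages = truncated_to;	/* lazy_truncate_heap() ran first */
	return (stats->scanned_pages < stats->rel_pages)
		? InvalidTransactionId : freeze_limit;
}

/*
 * Fixed ordering: decide against the pre-truncation rel_pages,
 * then let truncation update the field.
 */
static TransactionId
new_frozen_xid_fixed(VacStats *stats, uint32_t truncated_to,
					 TransactionId freeze_limit)
{
	TransactionId result = (stats->scanned_pages < stats->rel_pages)
		? InvalidTransactionId : freeze_limit;

	stats->rel_pages = truncated_to;
	return result;
}
```

A vacuum that visited 10 of 100 pages and then truncated the relation down to 5 pages reports FreezeLimit as the new relfrozenxid under the buggy ordering, and InvalidTransactionId (no update) under the fixed one.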

This is somewhat nasty :(.

I am not sure how we could fix up the resulting corruption. In some cases
we can check for the page-level all-visible bit and fix up the individual
xids. But it's not guaranteed, although likely, that the page-level
all-visible bit has been set...

Comments?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From:
Robert Haas
Date:
On Tue, Nov 26, 2013 at 7:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> This is somewhat nasty :(.

Yeah, that's bad.  Real bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
On 2013-11-26 13:32:44 +0100, Andres Freund wrote:
> A longer period of staring revealed a likely reason, in lazy_vacuum_rel:
>     /* Do the vacuuming */
>     lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
> ...
>     if (whatever)
>        lazy_truncate_heap(onerel, vacrelstats);
> ...
>     new_frozen_xid = FreezeLimit;
>     if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
>         new_frozen_xid = InvalidTransactionId;
> but lazy_tuncate_heap() does, after it's finished truncating:
>     vacrelstats->rel_pages = new_rel_pages;
>
> Which means, we might consider a partial vacuum as a vacuum that has
> frozen all old rows if just enough pages have been truncated away.

repro.sql is a reproducer for the problem.

> This seems to be the case since
> b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
> scan_all to determine whether we can set new_frozen_xid. That's a slight
> pessimization when we scan a relation fully without explicitly scanning
> it in its entirety, but given this isn't the first bug around
> scanned_pages/rel_pages I'd rather go that way. The aforementioned
> commit wasn't primarily concerned with that.
> Alternatively we could just compute new_frozen_xid et al before the
> lazy_truncate_heap.

I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Incomplete freezing when truncating a relation during vacuum

From:
Heikki Linnakangas
Date:
On 11/27/13 01:21, Andres Freund wrote:
> On 2013-11-26 13:32:44 +0100, Andres Freund wrote:
>> This seems to be the case since
>> b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
>> scan_all to determine whether we can set new_frozen_xid. That's a slight
>> pessimization when we scan a relation fully without explicitly scanning
>> it in its entirety, but given this isn't the first bug around
>> scanned_pages/rel_pages I'd rather go that way. The aforementioned
>> commit wasn't primarily concerned with that.
>> Alternatively we could just compute new_frozen_xid et al before the
>> lazy_truncate_heap.
>
> I've gone for the latter in this preliminary patch. Not increasing
> relfrozenxid after an initial data load seems like a bit of a shame.
>
> I wonder if we should just do scan_all || vacrelstats->scanned_pages <
> vacrelstats->rel_pages?

Hmm, you did (scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages) for relminmxid, and just
(vacrelstats->scanned_pages < vacrelstats->rel_pages) for relfrozenxid.
That was probably not what you meant to do, the thing you did for
relfrozenxid looks good to me.

Does the attached look correct to you?

- Heikki

Attachments

Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote:
> On 11/27/13 01:21, Andres Freund wrote:
> >On 2013-11-26 13:32:44 +0100, Andres Freund wrote:
> >>This seems to be the case since
> >>b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
> >>scan_all to determine whether we can set new_frozen_xid. That's a slight
> >>pessimization when we scan a relation fully without explicitly scanning
> >>it in its entirety, but given this isn't the first bug around
> >>scanned_pages/rel_pages I'd rather go that way. The aforementioned
> >>commit wasn't primarily concerned with that.
> >>Alternatively we could just compute new_frozen_xid et al before the
> >>lazy_truncate_heap.
> >
> >I've gone for the latter in this preliminary patch. Not increasing
> >relfrozenxid after an initial data load seems like a bit of a shame.
> >
> >I wonder if we should just do scan_all || vacrelstats->scanned_pages <
> >vacrelstats->rel_pages?
> 
> Hmm, you did (scan_all || vacrelstats->scanned_pages <
> vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages
> < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you
> meant to do, the thing you did for relfrozenxid looks good to me.

I said it's a preliminary patch ;). Really, I wasn't sure which of the
two to go for.

> Does the attached look correct to you?

Looks good.

I wonder if we need to integrate any mitigating logic? Currently the
corruption may only become apparent long after it occurred, that's
pretty bad. And instructing people to run a vacuum after the upgrade will
cause the corrupted data to be lost if the rows are already 2^31 xids in
the past. But integrating logic to fix things into heap_page_prune()
looks somewhat ugly as well.
Afaics the likelihood of the issue occurring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From:
Heikki Linnakangas
Date:
On 11/27/13 11:15, Andres Freund wrote:
> On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote:
>> On 11/27/13 01:21, Andres Freund wrote:
>>> On 2013-11-26 13:32:44 +0100, Andres Freund wrote:
>>>> This seems to be the case since
>>>> b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
>>>> scan_all to determine whether we can set new_frozen_xid. That's a slight
>>>> pessimization when we scan a relation fully without explicitly scanning
>>>> it in its entirety, but given this isn't the first bug around
>>>> scanned_pages/rel_pages I'd rather go that way. The aforementioned
>>>> commit wasn't primarily concerned with that.
>>>> Alternatively we could just compute new_frozen_xid et al before the
>>>> lazy_truncate_heap.
>>>
>>> I've gone for the latter in this preliminary patch. Not increasing
>>> relfrozenxid after an initial data load seems like a bit of a shame.
>>>
>>> I wonder if we should just do scan_all || vacrelstats->scanned_pages <
>>> vacrelstats->rel_pages?
>>
>> Hmm, you did (scan_all || vacrelstats->scanned_pages <
>> vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages
>> < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you
>> meant to do, the thing you did for relfrozenxid looks good to me.
>
> I said it's a preliminary patch ;), really, I wasn't sure what of both
> to go for.
>
>> Does the attached look correct to you?
>
> Looks good.

Ok, committed and backpatched that.

> I wonder if we need to integrate any mitigating logic? Currently the
> corruption may only become apparent long after it occurred, that's
> pretty bad. And instructing people run a vacuum after the ugprade will
> cause the corrupted data being lost if they are already 2^31 xids.

Ugh :-(. Running vacuum after the upgrade is the right thing to do to 
prevent further damage, but you're right that it will cause any 
already-wrapped around data to be lost forever. Nasty.

> But integrating logic to fix things into heap_page_prune() looks
> somewhat ugly as well.

I think any mitigating logic we might add should go into vacuum. It 
should be possible for a DBA to run a command, and after it's finished, 
be confident that you're safe. That means vacuum.

> Afaics the likelihood of the issue occuring on non-all-visible pages is
> pretty low, since they'd need to be skipped due to lock contention
> repeatedly.

Hmm. If a page has its visibility-map flag set, but contains a tuple 
that appears to be dead because you've wrapped around, vacuum will give 
a warning:  "page containing dead tuples is marked as all-visible in 
relation \"%s\" page %u". So I think if a manual VACUUM FREEZE passes 
without giving that warning, that vacuum hasn't destroyed any data. So 
we could advise to take a physical backup of the data directory, and run 
a manual VACUUM FREEZE on all databases. If it doesn't give a warning, 
you're safe from that point onwards. If it does, you'll want to recover 
from an older backup, or try to manually salvage just the lost rows from 
the backup, and re-index. Ugly, but hopefully rare in practice.

Unfortunately that doesn't mean that you haven't already lost some data. 
Wrap-around could've already happened, and vacuum might already have run 
and removed some rows. You'll want to examine your logs and grep for 
that warning.

- Heikki



Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:
> Ok, committed and backpatched that.

Thanks.

> >I wonder if we need to integrate any mitigating logic? Currently the
> >corruption may only become apparent long after it occurred, that's
> >pretty bad. And instructing people run a vacuum after the ugprade will
> >cause the corrupted data being lost if they are already 2^31 xids.
> 
> Ugh :-(. Running vacuum after the upgrade is the right thing to do to
> prevent further damage, but you're right that it will cause any
> already-wrapped around data to be lost forever. Nasty.

> >But integrating logic to fix things into heap_page_prune() looks
> >somewhat ugly as well.
> 
> I think any mitigating logic we might add should go into vacuum. It should
> be possible for a DBA to run a command, and after it's finished, be
> confident that you're safe. That means vacuum.

Well, heap_page_prune() is the first thing that's executed by
lazy_scan_heap(), that's why I was talking about it. So anything we do
needs to happen in there or before.

> >Afaics the likelihood of the issue occuring on non-all-visible pages is
> >pretty low, since they'd need to be skipped due to lock contention
> >repeatedly.

> Hmm. If a page has its visibility-map flag set, but contains a tuple that
> appears to be dead because you've wrapped around, vacuum will give a
> warning:  "page containing dead tuples is marked as all-visible in relation
> \"%s\" page %u".

I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
right? And so has_dead_tuples won't be set.

Independent from this, ISTM we should add a

    else if (PageIsAllVisible(page) && all_visible)

to those checks.


Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From:
Heikki Linnakangas
Date:
On 11/27/13 14:11, Andres Freund wrote:
> On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:
>>> Afaics the likelihood of the issue occuring on non-all-visible pages is
>>> pretty low, since they'd need to be skipped due to lock contention
>>> repeatedly.
>
>> Hmm. If a page has its visibility-map flag set, but contains a tuple that
>> appears to be dead because you've wrapped around, vacuum will give a
>> warning:  "page containing dead tuples is marked as all-visible in relation
>> \"%s\" page %u".
>
> I don't think this warning is likely to be hit as the code stands -
> heap_page_prune() et. al. will have removed all dead tuples already,
> right and so has_dead_tuples won't be set.

It might be a good idea to add such a warning to heap_page_prune(). Or 
also emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0.

> Independent from this, ISTM we should add a
>                 else if (PageIsAllVisible(page) && all_visible)
> to those checks.

Can you elaborate, where should that be added?

- Heikki



Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
On 2013-11-27 14:45:25 +0200, Heikki Linnakangas wrote:
> On 11/27/13 14:11, Andres Freund wrote:
> >I don't think this warning is likely to be hit as the code stands -
> >heap_page_prune() et. al. will have removed all dead tuples already,
> >right and so has_dead_tuples won't be set.
> 
> It might be a good idea to add such a warning to heap_page_prune(). Or also
> emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0.

> >Independent from this, ISTM we should add a
> >                else if (PageIsAllVisible(page) && all_visible)
> >to those checks.
> 
> Can you elaborate, where should that be added?

I was thinking of adding such a warning below

    elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u", ...)

but we cannot warn against that because GetOldestXmin() can go backwards...

I think it's probably sufficient to set has_dead_tuples = true in the
ItemIdIsDead() branch in lazy_scan_heap(). That should catch relevant
actions from heap_page_prune().

Besides not warning against deletions from heap_page_prune(), the
current warning logic is also buggy for tables without indexes...

    /*
     * If there are no indexes then we can vacuum the page right now
     * instead of doing a second scan.
     */
    if (nindexes == 0 &&
        vacrelstats->num_dead_tuples > 0)
    {
        /* Remove tuples from heap */
        lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
        has_dead_tuples = false;

That happens before the

    else if (PageIsAllVisible(page) && has_dead_tuples)

check.
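The suppression is easier to see in isolation. This is a minimal model of the order of operations quoted above, with hypothetical stand-ins of mine rather than the real lazy_scan_heap() locals:

```c
#include <stdbool.h>

/*
 * Models the control flow: for a table without indexes,
 * lazy_vacuum_page() removes the dead tuples right away and
 * has_dead_tuples is reset, so the later sanity check against
 * all-visible pages can no longer fire.
 */
static bool
warning_fires(int nindexes, int num_dead_tuples, bool page_all_visible)
{
	bool		has_dead_tuples = (num_dead_tuples > 0);

	if (nindexes == 0 && num_dead_tuples > 0)
	{
		/* lazy_vacuum_page(...) would run here */
		has_dead_tuples = false;
	}

	/* ... later: the PageIsAllVisible(page) && has_dead_tuples check */
	return page_all_visible && has_dead_tuples;
}
```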

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL-log
this other than using log_newpage()?
I am really not sure what the best course of action is :(

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From:
Noah Misch
Date:
How would you characterize the chances of this happening with default
*vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
this bug during each of ~10 generations of autovacuum_freeze_max_age before
the old rows actually become invisible.
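The "~10 generations" figure matches a back-of-envelope check against the defaults (my arithmetic, not from the thread): autovacuum_freeze_max_age defaults to 200 million transactions, and visibility wraps after 2^31 XIDs.

```c
#include <stdint.h>

/*
 * Rough generation count: how many anti-wraparound vacuum cycles of
 * freeze_max_age XIDs each fit before an unfrozen row falls 2^31 XIDs
 * behind and flips to "invisible".
 */
static uint32_t
generations_to_wraparound(uint32_t freeze_max_age)
{
	return (uint32_t) ((UINT32_C(1) << 31) / freeze_max_age);
}
```

With the default of 200 million, 2^31 / 2e8 gives roughly 10 cycles, as estimated above.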

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
> With regard to fixing things up, ISTM the best bet is heap_prune_chain()
> so far. That's executed b vacuum and by opportunistic pruning and we
> know we have the appropriate locks there. Looks relatively easy to fix
> up things there. Not sure if there are any possible routes to WAL log
> this but using log_newpage()?
> I am really not sure what the best course of action is :(

Maximizing detection is valuable, and the prognosis for automated repair is
poor.  I would want a way to extract tuples having xmin outside the range of
CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible page.  At
first, I supposed we could offer a tool to blindly freeze such tuples.
However, there's no guarantee that they are in harmony with recent changes to
the database; transactions that wrongly considered those tuples invisible may
have made decisions incompatible with their existence.  For example, reviving
such a tuple could violate a UNIQUE constraint if the user had already
replaced the missing row manually.  A module that offers "SELECT * FROM
rows_wrongly_invisible('anytable')" would aid manual cleanup efforts.
freeze_if_wrongly_invisible(tid) would be useful, too.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
> How would you characterize the chances of this happening with default
> *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
> this bug during each of ~10 generations of autovacuum_freeze_max_age before
> the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen
quite a bit more often. But in some workloads it's pretty darn easy to
hit. E.g. if significant parts of the table are regularly deleted, lots,
if not most, of your vacuums will spuriously increase relfrozenxid above
the actual value. Each time only by a small amount, but due to that
small increase there never will be an actual full table vacuum since
freeze_table_age will never even remotely be reached.

The client that made me look into the issue noticed problems on
pg_attribute - presumably because of temporary table usage primarily
affecting the tail end of pg_attribute.

> On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
> > With regard to fixing things up, ISTM the best bet is heap_prune_chain()
> > so far. That's executed b vacuum and by opportunistic pruning and we
> > know we have the appropriate locks there. Looks relatively easy to fix
> > up things there. Not sure if there are any possible routes to WAL log
> > this but using log_newpage()?
> > I am really not sure what the best course of action is :(
> 
> Maximizing detection is valuable, and the prognosis for automated repair is
> poor.  I would want a way to extract tuples having xmin outside the range of
> CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
> page.

I think the likelihood of the problem affecting !all-visible pages is
close to zero. Each vacuum will try to clean those, so they surely will
get vacuumed at some point. I think the only way that could happen is if
the ConditionalLockBufferForCleanup() fails in each vacuum. And that
seems a bit unlikely.

> At first, I supposed we could offer a tool to blindly freeze such tuples.
> However, there's no guarantee that they are in harmony with recent changes to
> the database; transactions that wrongly considered those tuples invisible may
> have made decisions incompatible with their existence.  For example, reviving
> such a tuple could violate a UNIQUE constraint if the user had already
> replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential
scans will currently already find those. It's primarily index scans that
won't. So it's not really reviving them...
The primary reason why I think it might be a good idea to "revive"
automatically is that an eventual full-table/freeze vacuum will
currently delete them, which seems bad.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From:
Noah Misch
Date:
On Wed, Nov 27, 2013 at 10:43:05PM +0100, Andres Freund wrote:
> On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
> > How would you characterize the chances of this happening with default
> > *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
> > this bug during each of ~10 generations of autovacuum_freeze_max_age before
> > the old rows actually become invisible.
> 
> I think realistically, to actually trigger the bug, it needs to happen
> quite a bit more often. But in some workloads it's pretty darn easy to
> hit. E.g. if significant parts of the table are regularly deleted, lots,
> if not most, of your vacuums will spuriously increase relfrozenxid above
> the actual value. Each time only by a small amount, but due to that
> small increase there never will be an actual full table vacuum since
> freeze_table_age will never even remotely be reached.

That makes sense.

> > Maximizing detection is valuable, and the prognosis for automated repair is
> > poor.  I would want a way to extract tuples having xmin outside the range of
> > CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
> > page.
> 
> I think the likelihood of the problem affecting !all-visible pages is
> close to zero. Each vacuum will try to clean those, so they surely will
> get vacuumed at some point. I think the only way that could happen is if
> the ConditionalLockBufferForCleanup() fails in each vacuum. And that
> seems a bit unlikely.

The page could have sat all-visible (through multiple XID epochs, let's say)
until a recent UPDATE.

> > At first, I supposed we could offer a tool to blindly freeze such tuples.
> > However, there's no guarantee that they are in harmony with recent changes to
> > the database; transactions that wrongly considered those tuples invisible may
> > have made decisions incompatible with their existence.  For example, reviving
> > such a tuple could violate a UNIQUE constraint if the user had already
> > replaced the missing row manually.
> 
> Good point, although since they are all on all-visible pages sequential
> scans will currently already find those. It's primarily index scans that
> won't. So it's not really reviving them...

True.  Since a dump/reload of the database would already get the duplicate key
violation, the revival is not making anything clearly worse.  And if we hope
for manual repair, many DBAs just won't do that at all.

> The primary reason why I think it might be a good idea to "revive"
> automatically is, that an eventual full-table/freeze vacuum will
> currently delete them which seems bad.

Will it?  When the page became all-visible, the tuples were all hinted.  They
will never be considered dead.  Every 2B transactions, they will alternate
between live and not-yet-committed.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
On 2013-11-27 18:18:02 -0500, Noah Misch wrote:
> > I think the likelihood of the problem affecting !all-visible pages is
> > close to zero. Each vacuum will try to clean those, so they surely will
> > get vacuumed at some point. I think the only way that could happen is if
> > the ConditionalLockBufferForCleanup() fails in each vacuum. And that
> > seems a bit unlikely.
> 
> The page could have sat all-visible (through multiple XID epochs, let's say)
> until a recent UPDATE.

Good point.

> > > At first, I supposed we could offer a tool to blindly freeze such tuples.
> > > However, there's no guarantee that they are in harmony with recent changes to
> > > the database; transactions that wrongly considered those tuples invisible may
> > > have made decisions incompatible with their existence.  For example, reviving
> > > such a tuple could violate a UNIQUE constraint if the user had already
> > > replaced the missing row manually.
> > 
> > Good point, although since they are all on all-visible pages sequential
> > scans will currently already find those. It's primarily index scans that
> > won't. So it's not really reviving them...
> 
> True.  Since a dump/reload of the database would already get the duplicate key
> violation, the revival is not making anything clearly worse.  And if we hope
> for manual repair, many DBAs just won't do that at all.

Especially if it involves compiling C code...

> > The primary reason why I think it might be a good idea to "revive"
> > automatically is, that an eventual full-table/freeze vacuum will
> > currently delete them which seems bad.
> 
> Will it?  When the page became all-visible, the tuples were all hinted.  They
> will never be considered dead.  Every 2B transactions, they will alternate
> between live and not-yet-committed.

Good point again. And pretty damn good luck.

Although 9.3+ multixact rows look like they could return _DEAD since
we'll do a TransactionIdDidAbort() (via GetUpdateXid()) and
TransactionIdDidCommit() in there, and we don't set XMAX_COMMITTED hint
bits for XMAX_IS_MULTI rows. As an additional problem, once multixact.c
has pruned the old multis away we'll get errors from there on.
But that's less likely to affect many rows.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From:
Noah Misch
Date:
> > On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
> > > With regard to fixing things up, ISTM the best bet is heap_prune_chain()
> > > so far. That's executed b vacuum and by opportunistic pruning and we
> > > know we have the appropriate locks there. Looks relatively easy to fix
> > > up things there. Not sure if there are any possible routes to WAL log
> > > this but using log_newpage()?
> > > I am really not sure what the best course of action is :(

Based on subsequent thread discussion, the plan you outline sounds reasonable.
Here is a sketch of the specific semantics of that fixup.  If a HEAPTUPLE_LIVE
tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
in-progress deleter aborts.  Using log_newpage_buffer() seems fine; there's no
need to optimize performance there.  (All the better if we can, with minimal
hacks, convince heap_freeze_tuple() itself to log the right changes.)

I am wary about the performance loss of doing these checks in every
heap_prune_chain() call, for all time.  If it's measurable, can we shed
the overhead once corrections are done?  Maybe bump the page layout version
and skip the checks for v5 pages?  (Ugh.)

Time is tight to finalize this, but it would be best to get this into next
week's release.  That way, the announcement, fix, and mitigating code
pertaining to this data loss bug all land in the same release.  If necessary,
I think it would be worth delaying the release, or issuing a new release a
week or two later, to closely align those events.  That being said, I'm
prepared to review a patch in this area over the weekend.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From:
Andres Freund
Date:
Hi Noah,

On 2013-11-30 00:40:06 -0500, Noah Misch wrote:
> > > On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
> > > > With regard to fixing things up, ISTM the best bet is heap_prune_chain()
> > > > so far. That's executed b vacuum and by opportunistic pruning and we
> > > > know we have the appropriate locks there. Looks relatively easy to fix
> > > > up things there. Not sure if there are any possible routes to WAL log
> > > > this but using log_newpage()?
> > > > I am really not sure what the best course of action is :(
> 
> Based on subsequent thread discussion, the plan you outline sounds reasonable.
> Here is a sketch of the specific semantics of that fixup.  If a HEAPTUPLE_LIVE
> tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
> or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
> Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
> in-progress deleter aborts.  Using log_newpage_buffer() seems fine; there's no
> need to optimize performance there.

We'd need to decide what to do with xmax values; they'd likely need to
be treated differently.

The problem with log_newpage_buffer() is that we'd quite possibly issue
one such call per item on a page. And that might become quite
expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
scary.

> (All the better if we can, with minimal
> hacks, convince heap_freeze_tuple() itself to log the right changes.)

That likely comes too late - we've already pruned the page and might have
made wrong decisions there. Also, heap_freeze_tuple() is run on both the
primary and standbys.
I think our xl_heap_freeze format, which relies on running
heap_freeze_tuple() during recovery, is a terrible idea, but we can't
change that right now.

> Time is tight to finalize this, but it would be best to get this into next
> week's release.  That way, the announcement, fix, and mitigating code
> pertaining to this data loss bug all land in the same release.  If necessary,
> I think it would be worth delaying the release, or issuing a new release a
> week or two later, to closely align those events.  That being said, I'm
> prepared to review a patch in this area over the weekend.

I don't think I currently have the energy/brainpower/time to develop
such a fix in suitable quality till Monday. I've worked pretty hard on
trying to fix the host of multixact data corruption bugs over the last
few days, and developing a solution that I'd be happy to put into such
critical paths is certainly several days' worth of work.

I am not sure if it's a good idea to delay the release because of this,
there are so many other critical issues that that seems like a bad
tradeoff.

That said, if somebody else is taking the lead I am certainly willing to
help in detail with review and testing.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> I am not sure if it's a good idea to delay the release because of this,
> there are so many other critical issues that that seems like a bad
> tradeoff.

Indeed.  We already said that this release was being done *now* because
of the replication bug, and I see no reason to change that.  The more
last-minute stuff we try to cram in, the bigger risk of (another)
mistake.  We've already taken a credibility hit from introducing a new
bug into the last round of update releases; let's please not take a
risk of doing that again.
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-30 11:18:09 -0500, Tom Lane wrote:
> We've already taken a credibility hit from introducing a new
> bug into the last round of update releases; let's please not take a
> risk of doing that again.

On that front: I'd love for somebody else to look at the revised 9.3
freezing logic. It's way too complicated, and there are quite a few
subtleties about freezing that are hard to get right, as evidenced by
9.3.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Noah Misch
Date:
On Sat, Nov 30, 2013 at 05:22:04PM +0100, Andres Freund wrote:
> On that front: I'd love for somebody else to look at the revised 9.3
> freezing logic.

Do you speak of the changes to xmax freezing arising from the FK lock
optimization?

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-30 11:18:09 -0500, Tom Lane wrote:
> Indeed.  We already said that this release was being done *now* because
> of the replication bug, and I see no reason to change that.

FWIW, I think the two other data corrupting bugs, "incomplete freezing
due to truncation" (all branches) and freezing overall (in 9.3), are at
least as bad because they take effect on the primary.
Not saying that because of my involvement, but because I think they need
to be presented at least as prominently in the release notes.
The bugs themselves are all fixed in the relevant branches, but I do
think we need to talk about how to detect and fix possible
corruption.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> FWIW, I think the two other data corrupting bugs, "incomplete freezing
> due to truncation" (all branches) and freezing overall (in 9.3), are at
> least as bad because they take effect on the primary.
> Not saying that because of my involvement, but because I think they need
> to be presented at least as prominently in the release notes.
> The bugs themselves are all fixed in the relevant branches, but I do
> think we need to talk about how to detect and fix possible
> corruption.

I was planning to draft up the release notes today.  Can you propose
text about the above?
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-30 11:40:36 -0500, Noah Misch wrote:
> On Sat, Nov 30, 2013 at 05:22:04PM +0100, Andres Freund wrote:
> > On that front: I'd love for somebody else to look at the revised 9.3
> > freezing logic.
> 
> Do you speak of the changes to xmax freezing arising from the FK lock
> optimization?

Yes. There had been several major bugs in 9.3+ around freezing:
* Old "updater" xids in multixacts weren't subjected to the new xmin horizon => inaccessible or duplicated rows.
* If a multi was too old, we simply removed it, even if it contained a committed xmax => duplicate rows.
* If HEAP_XMAX_INVALID was set in heap_freeze_tuple() and heap_tuple_needs_freeze(), we didn't do anything about xmax. That's completely not okay since the hint bit might not have been set on the standby => diverging standby and primary, with unfrozen rows remaining on the standby, missing or duplicate rows there.


The fixed (2393c7d102368717283d7121a6ea8164e902b011) heap_freeze_tuple()
and heap_tuple_needs_freeze() look much safer to me, but there's lots of
complexity there. I don't see any alternative to the complexity until we
change the format of xl_heap_freeze, but some more eyes than Alvaro's
and mine definitely would be good.

71ad5a8475b4df896a7baa71e6dd3c455eebae99 also touches quite a few
intricate code paths and fixes a good number of bugs, so it's definitely
also worthy of further inspection.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-30 11:50:57 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > FWIW, I think the two other data corrupting bugs, "incomplete freezing
> > due to truncation" (all branches) and freezing overall (in 9.3), are at
> > least as bad because they take effect on the primary.
> > Not saying that because of my involvement, but because I think they need
> > to be presented at least as prominently in the release notes.
> > The bugs themselves are all fixed in the relevant branches, but I do
> > think we need to talk about how to detect and fix possible
> > corruption.
> 
> I was planning to draft up the release notes today.  Can you propose
> text about the above?

I can, but it will be a couple of hours before I can give it serious
thought (starving and insanity being serious perils otherwise ;)).

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Noah Misch
Date:
On Sat, Nov 30, 2013 at 05:00:58PM +0100, Andres Freund wrote:
> The problem with log_newpage_buffer() is that we'd quite possibly issue
> one such call per item on a page. And that might become quite
> expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
> scary.

I had in mind issuing at most one call per page.  heap_page_prune() has a
structure conducive to that.

> On 2013-11-30 00:40:06 -0500, Noah Misch wrote:
> > Time is tight to finalize this, but it would be best to get this into next
> > week's release.  That way, the announcement, fix, and mitigating code
> > pertaining to this data loss bug all land in the same release.  If necessary,
> > I think it would be worth delaying the release, or issuing a new release a
> > week or two later, to closely align those events.

> I am not sure if it's a good idea to delay the release because of this,
> there are so many other critical issues that that seems like a bad
> tradeoff.

Fair enough; I'll drop that proposal.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-30 12:22:16 -0500, Noah Misch wrote:
> On Sat, Nov 30, 2013 at 05:00:58PM +0100, Andres Freund wrote:
> > The problem with log_newpage_buffer() is that we'd quite possibly issue
> > one such call per item on a page. And that might become quite
> > expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
> > scary.
> 
> I had in mind issuing at most one call per page.  heap_page_prune() has a
> structure conducive to that.

That, at least as far as I can imagine, would make the fix quite
complicated though. In the first phase of heap_page_prune() we aren't in
a critical section and cannot modify the buffer yet, so we would have to
make all the involved code cope with the unfixed xids and hint bits.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-11-30 11:50:57 -0500, Tom Lane wrote:
>> I was planning to draft up the release notes today.  Can you propose
>> text about the above?

> I can, but it will be a couple of hours before I can give it serious
> thought (starving and insanity being serious perils otherwise ;)).

Sure.  I can wait till tomorrow as far as this aspect is concerned.
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-30 11:50:57 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > FWIW, I think the two other data corrupting bugs, "incomplete freezing
> > due to truncation" (all branches) and freezing overall (in 9.3), are at
> > least as bad because they take effect on the primary.
> > Not saying that because of my involvement, but because I think they need
> > to be presented at least as prominent in the release notes.
> > They bugs themselves are all fixed in the relevant branches, but I do
> > think we need to talk about about how to detect and fix possible
> > corruption.
> 
> I was planning to draft up the release notes today.  Can you propose
> text about the above?

* Fix possible data corruptions due to incomplete vacuuming (Andres Freund, Heikki Linnakangas)

Due to this bug, (auto-)vacuum could sometimes treat a partial vacuum as
a full-table vacuum, mistakenly increasing relfrozenxid as a result. This
could happen if it managed to truncate the tail end of the table due to
dead space. Possible consequences are:
* Errors like "could not access status of transaction XXX" when accessing such rows.
* Vanishing rows after more than 2^31 transactions have passed.

Tables in which some parts change only infrequently while others change
heavily are more likely to be affected.

It is recommended to perform a VACUUM of all tables in all databases
with vacuum_freeze_table_age set to zero. This will fix latent
corruption but will not be able to fix all errors.

To detect whether a database is possibly affected, check whether either
"SELECT txid_current() < 2^31" returns 'f' or a VACUUM of all tables
with vacuum_freeze_table_age set to zero returns errors. If neither is
the case, the database is safe after performing the VACUUM.

If you think you are suffering from this corruption, please contact the
pgsql-hackers mailing list or your service provider; data is likely to
be recoverable.

Users updating from 9.0.4/8.4.8 or earlier are not affected.

All branches.

Commit: 82b43f7df2036d06b4410721f77512969846b6d0

* Fix possible data corruptions due to several bugs around vacuuming [in the 9.3 foreign key locks feature] (Andres Freund, Alvaro Herrera)


The VACUUM implementation in 9.3 had several bugs: it removed multixact
xmax values without regard for the importance of the contained xids, it
did not remove multixacts if the contained xids were too old, and it
relied on hint bits, which might not have been set on replicas, when
checking whether a row needed to be frozen.

Databases on a primary are unlikely to be affected if no VACUUM FREEZE
and no VACUUM with a nondefault vacuum_freeze_min_age was ever executed,
and if SELECT relminmxid FROM pg_class WHERE relkind
= 'r' AND NOT oid = 1262 AND NOT relminmxid = 1 returns no rows.

Possible consequences are:
* Duplicated or vanishing rows.
* Errors like "could not access status of transaction XXX" when accessing rows.
* Primary and standby servers getting out of sync.

It is strongly recommended to re-clone all standby servers after
upgrading, especially if full_page_writes was set to false.  On the
primary it is recommended to execute a VACUUM of all tables in all
databases after upgrading both the primary and any existing standbys,
with vacuum_freeze_table_age set to zero. This will fix latent
corruption on primaries but will not be able to fix all pre-existing
errors.

If you think you are suffering from data loss due to this corruption on
the primary, please contact the pgsql-hackers mailing list or your
service provider; some data might be recoverable.

9.3 only, but should be mentioned first as corruption due to this is quite likely.

Commit: 2393c7d102368717283d7121a6ea8164e902b011

I had quite a hard time - likely noticeably - summarizing the second
item in easy-to-understand terms. The interactions are quite
complicated.
We could tell users they don't necessarily need to re-clone standbys if
no xids have been truncated away (txid_current() < 2^31, and
datfrozenxid of at least one database = 1), but given the replication
issue that seems like unnecessary confusion.

Questions?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
> How would you characterize the chances of this happening with default
> *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
> this bug during each of ~10 generations of autovacuum_freeze_max_age before
> the old rows actually become invisible.

On second thought, it's quite possible to see problems before that,
leading to more problems. A single occurrence of such an illegitimate
increase in relfrozenxid can be enough to cause problems of a slightly
different nature.
As relfrozenxid has been updated, we might now, or after vacuuming some
other tables, become eligible to truncate the clog. In that case we'll
get ERRORs about "could not access status of transaction" if the tuple
hasn't been fully hinted when it is scanned later.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-12-01 13:33:42 +0100, Andres Freund wrote:
> On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
> > How would you characterize the chances of this happening with default
> > *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
> > this bug during each of ~10 generations of autovacuum_freeze_max_age before
> > the old rows actually become invisible.
> 
> On second thought, it's quite possible to see problems before that,
> leading to more problems. A single occurrence of such an illegitimate
> increase in relfrozenxid can be enough to cause problems of a slightly
> different nature.
> As relfrozenxid has been updated, we might now, or after vacuuming some
> other tables, become eligible to truncate the clog. In that case we'll
> get ERRORs about "could not access status of transaction" if the tuple
> hasn't been fully hinted when it is scanned later.

And indeed, a quick search shows up some threads that might suffer from
it:
BD7EE863F673A14EBF99D95562AEE15E44B1DA71@digi-pdc.digitilitiprod.int
CAAzPmNxfDrV72wDmBEv5tcQOByE_wvGSeqRkQj0FizXmCYyaPQ@mail.gmail.com
CAK9oVJwvAZLmdMrHMPg1+s37z16j+BZ8FbarZSpmrHsXxH-4GQ@mail.gmail.com

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Noah Misch
Date:
On Sun, Dec 01, 2013 at 01:55:45PM +0100, Andres Freund wrote:
> On 2013-12-01 13:33:42 +0100, Andres Freund wrote:
> > On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
> > > How would you characterize the chances of this happening with default
> > > *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
> > > this bug during each of ~10 generations of autovacuum_freeze_max_age before
> > > the old rows actually become invisible.
> > 
> > On second thought, it's quite possible to see problems before that,
> > leading to more problems. A single occurrence of such an illegitimate
> > increase in relfrozenxid can be enough to cause problems of a slightly
> > different nature.
> > As relfrozenxid has been updated, we might now, or after vacuuming some
> > other tables, become eligible to truncate the clog. In that case we'll
> > get ERRORs about "could not access status of transaction" if the tuple
> > hasn't been fully hinted when it is scanned later.

Agreed.  Probably, the use of hint bits and the low frequency with which
TruncateCLOG() can actually remove something has kept this below the radar.

> And indeed, a quick search shows up some threads that might suffer from
> it:
> BD7EE863F673A14EBF99D95562AEE15E44B1DA71@digi-pdc.digitilitiprod.int

This system had multiple problems, a missing pg_subtrans file and a missing
TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
the freeze bug at hand.

> CAAzPmNxfDrV72wDmBEv5tcQOByE_wvGSeqRkQj0FizXmCYyaPQ@mail.gmail.com

This report is against PostgreSQL 8.1.11, which never had a commit like
b4b6923.  If a similar bug is at work, this older version acquired it through
a different vector.

> CAK9oVJwvAZLmdMrHMPg1+s37z16j+BZ8FbarZSpmrHsXxH-4GQ@mail.gmail.com

Possible match, but suggestions of lower-level problems cloud the diagnosis.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-12-01 12:49:40 -0500, Noah Misch wrote:
> This system had multiple problems, a missing pg_subtrans file and a missing
> TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
> the freeze bug at hand.

Those all sound like possible problems caused by the bug, no?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Noah Misch
Date:
On Sun, Dec 01, 2013 at 06:56:10PM +0100, Andres Freund wrote:
> On 2013-12-01 12:49:40 -0500, Noah Misch wrote:
> > This system had multiple problems, a missing pg_subtrans file and a missing
> > TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
> > the freeze bug at hand.
> 
> Those all sound like possible problems caused by the bug, no?

pg_subtrans has a lifecycle unrelated to datfrozenxid.  I am not aware of a
mechanism connecting that problem to the bug at hand.

The missing TOAST chunk (in pg_statistic, not pg_attribute as I wrote above)
could happen from the XID space wrapping with that TOAST table page marked
all-visible but not frozen.  The bug reporter describes the start of that
error coinciding with a minor version upgrade, so that would be an odd
coincidence: 8.4.3 did not have the bug as we know it, so considerable time
would typically pass between the upgrade and that symptom appearing.  Can't
rule it out, but this user report fits the known bug symptoms only loosely.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> The VACUUM implementation in 9.3 had several bugs: It removed multixact
> xmax values without regard of the importance of contained xids, it did
> not remove multixacts if the contained xids were too old and it relied
> on hint bits when checking whether a row needed to be frozen which might
> not have been set on replicas.

Uh ... what does the last have to do with it?  Surely we don't run
VACUUM on replicas.  Or are you talking about what might happen when
VACUUM is run on a former replica that's been promoted to master?
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:

Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Andres Freund <andres@2ndquadrant.com> writes:
>> The VACUUM implementation in 9.3 had several bugs: It removed multixact
>> xmax values without regard of the importance of contained xids, it did
>> not remove multixacts if the contained xids were too old and it relied
>> on hint bits when checking whether a row needed to be frozen which might
>> not have been set on replicas.
>
>Uh ... what does the last have to do with it?  Surely we don't run
>VACUUM on replicas.  Or are you talking about what might happen when
>VACUUM is run on a former replica that's been promoted to master?

Unfortunately not. The problem is that xl_heap_freeze's redo function simply
reexecutes heap_freeze_tuple() instead of logging much about each tuple...
 

Andres


-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:

Noah Misch <noah@leadboat.com> wrote:
>On Sun, Dec 01, 2013 at 06:56:10PM +0100, Andres Freund wrote:
>> On 2013-12-01 12:49:40 -0500, Noah Misch wrote:
>> > This system had multiple problems, a missing pg_subtrans file and a missing
>> > TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
>> > the freeze bug at hand.
>>
>> Those all sound like possible problems caused by the bug, no?
>
>pg_subtrans has a lifecycle unrelated to datfrozenxid.  I am not aware of a
>mechanism connecting that problem to the bug at hand.

TransactionIdIsInProgress returns true for future xids. Which triggers the use
of XactLockTableWait. Which then does SubtransGetParent of a far-future xid...
 

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> schrieb:
>> Uh ... what does the last have to do with it?  Surely we don't run
>> VACUUM on replicas.  Or are you talking about what might happen when
>> VACUUM is run on a former replica that's been promoted to master?

> Unfortunately not. The problem is that xl_heap_freeze's redo function simply
> reexecutes heap_freeze_tuple() instead of logging much about each tuple...


That was a pretty stupid choice ... we should think seriously about
changing that for 9.4.  In general the application of a WAL record
needs to be 100% deterministic.
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> * Fix possible data corruptions due to incomplete vacuuming (Andres Freund, Heikki Linnakangas)

> Due to this bug (auto-)vacuum could sometimes treat a partial vacuum as
> a full table vacuum mistakenly increasing relfrozenxid as a result. This
> could happen if it managed to truncate the tail end of the table due to
> dead space. Possible consequences are:
> * Errors like "could not access status of transaction XXX" when
>   accessing such rows.
> * Vanishing rows after more than 2^31 transactions have passed.

Is there really a significant risk of clog access errors due to this bug?
IIUC, the risk is that tuples in pages that vacuum skips due to being
all-visible might not be frozen when intended.  But it seems just about
certain that such tuples would be properly hinted already, which means
that nothing would ever go to clog to confirm that.  So ISTM the only real
risk is that rows would become invisible after 2^31 transactions (and then
visible again after 2^31 more).

And even then you'd need persistent bad luck, in the form of incorrect
advancements of relfrozenxid having happened often enough to prevent any
anti-wraparound vacuums from getting launched.

Am I missing something?  It's certainly a bad bug, but it seems to me
that the probability of data loss is low enough that it's not so
surprising this has escaped detection for so long.
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-12-01 17:15:31 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > * Fix possible data corruptions due to incomplete vacuuming (Andres Freund, Heikki Linnakangas)
> 
> > Due to this bug (auto-)vacuum could sometimes treat a partial vacuum as
> > a full table vacuum mistakenly increasing relfrozenxid as a result. This
> > could happen if it managed to truncate the tail end of the table due to
> > dead space. Possible consequences are:
> > * Errors like "could not access status of transaction XXX" when
> >   accessing such rows.
> > * Vanishing rows after more than 2^31 transactions have passed.
> 
> Is there really a significant risk of clog access errors due to this bug?
> IIUC, the risk is that tuples in pages that vacuum skips due to being
> all-visible might not be frozen when intended.  But it seems just about
> certain that such tuples would be properly hinted already, which means
> that nothing would ever go to clog to confirm that.  So ISTM the only real
> risk is that rows would become invisible after 2^31 transactions (and then
> visible again after 2^31 more).

Unfortunately it's not actually too hard to hit, due to the following
part of the code in vacuumlazy.c:

/* We need buffer cleanup lock so that we can prune HOT chains. */
if (!ConditionalLockBufferForCleanup(buf))
{
    /*
     * If we're not scanning the whole relation to guard against XID
     * wraparound, it's OK to skip vacuuming a page.  The next vacuum
     * will clean it up.
     */
    if (!scan_all)
    {
        ReleaseBuffer(buf);
        continue;
    }
...

If you have some concurrency, you hit that path pretty often. And once
such a vacuum has gone through the table it will have a higher
relfrozenxid, so an impending "wave" of anti-wraparound vacuums won't
scan it.

Also, the hint bits won't be on potential standbys, so that's not
necessarily preventing problems.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-12-01 15:54:41 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> schrieb:
> >> Uh ... what does the last have to do with it?  Surely we don't run
> >> VACUUM on replicas.  Or are you talking about what might happen when
> >> VACUUM is run on a former replica that's been promoted to master?
> 
> > Unfortunately not. The problem is that xl_heap_freeze's redo function simply
> > reexecutes heap_freeze_tuple() instead of logging much about each tuple...

> 
> That was a pretty stupid choice ... we should think seriously about
> changing that for 9.4.  In general the application of a WAL record
> needs to be 100% deterministic.

Completely agreed. I'm not really sure what led to that design choice
except the desire to save a bit of WAL volume. It's a pretty old piece
of code - from a good while before I followed development in any detail.
It's actually in the original commit
(48188e1621bb6711e7d092bee48523b18cd80177) that introduced today's form
of freezing.

If it had been a more robust format all along, potential damage from the
replication bug could have been fixed by a VACUUM FREEZE...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Incomplete freezing when truncating a relation during vacuum

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-12-01 17:15:31 -0500, Tom Lane wrote:
>> Is there really a significant risk of clog access errors due to this bug?
>> IIUC, the risk is that tuples in pages that vacuum skips due to being
>> all-visible might not be frozen when intended.

> Unfortunately it's not actually too hard to hit, due to the following part of
> the code in vacuumlazy.c:

>     /*
>      * If we're not scanning the whole relation to guard against XID
>      * wraparound, it's OK to skip vacuuming a page.  The next vacuum
>      * will clean it up.
>      */

Ah.  So it's only been *seriously* broken since commit bbb6e559c, ie 9.2.
        regards, tom lane



Re: Incomplete freezing when truncating a relation during vacuum

From
Andres Freund
Date:
On 2013-12-01 18:02:27 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2013-12-01 17:15:31 -0500, Tom Lane wrote:
> >> Is there really a significant risk of clog access errors due to this bug?
> >> IIUC, the risk is that tuples in pages that vacuum skips due to being
> >> all-visible might not be frozen when intended.
> 
> > Unfortunately it's not actually too hard to hit, due to the following part of
> > the code in vacuumlazy.c:
> 
> >     /*
> >      * If we're not scanning the whole relation to guard against XID
> >      * wraparound, it's OK to skip vacuuming a page.  The next vacuum
> >      * will clean it up.
> >      */
> 
> Ah.  So it's only been *seriously* broken since commit bbb6e559c, ie 9.2.

Well, even before that, crash recovery/replication didn't necessarily
preserve the hint bits. Even more so if somebody dared to set
full_page_writes=off.

I personally think full_page_writes=off should conflict with wal_level
!= minimal, btw, but I don't see much chance of gaining acceptance for
that.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services