Thread: BlowAwayRelationBuffers

BlowAwayRelationBuffers

From: Adriaan Joubert

Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
during a vacuum verbose analyze. Ended up with duplicate rows of
everything.

NOTICE:  --Relation tasksids--
NOTICE:  Pages 1356: Changed 349, Reapped 875, Empty 0, New 0; Tup
88946: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 123921, MinLen 41, MaxLen
41; Re-using: Free/Avail. Space 5965708/5965708; EndEmpty/Avail. Pages
0/875. Elapsed 0/0 sec.
NOTICE:  Rel tasksids: Pages: 1356 --> 567; Tuple(s) moved: 31746.
Elapsed 0/0 sec.
NOTICE:  BlowawayRelationBuffers(tasksids, 567): block 764 is referenced
(private 0, last 0, global 1)
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
We have lost the connection to the backend, so further processing is
impossible.  Terminating.

According to the mailing list archive

http://www.postgresql.org/mhonarc/pgsql-hackers/1999-02/msg00052.html

a bug in this area was fixed in 6.4. I seem to remember that somebody is
looking at vacuum at the moment, so this may be something to keep in
mind.

Adriaan



Re: [HACKERS] BlowAwayRelationBuffers

From: Tom Lane

Adriaan Joubert <a.joubert@albourne.com> writes:
> Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> during a vacuum verbose analyze. Ended up with duplicate rows of
> everything.

Really!?  The reference count failure doesn't surprise me a whole lot,
given the refcount bugs that I fixed a couple months ago (no, those
fixes are not in 6.5.* :-().  But VACUUM is supposed to be guaranteed
proof against generating duplicate tuples by design --- that's what
all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.

Perhaps there is a glitch in the tuple validity checking logic for
HEAP_MOVED_OFF/HEAP_MOVED_IN?  Anyone see it?

Given that this was on an Alpha, it could be a 64-bit-platform-
dependency kind of bug...
        regards, tom lane


Re: [HACKERS] BlowAwayRelationBuffers

From: Adriaan Joubert

Tom Lane wrote:

> Adriaan Joubert <a.joubert@albourne.com> writes:
> > Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> > during a vacuum verbose analyze. Ended up with duplicate rows of
> > everything.
>
> Really!?  The reference count failure doesn't surprise me a whole lot,
> given the refcount bugs that I fixed a couple months ago (no, those
> fixes are not in 6.5.* :-().  But VACUUM is supposed to be guaranteed
> proof against generating duplicate tuples by design --- that's what
> all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.
>
> Perhaps there is a glitch in the tuple validity checking logic for
> HEAP_MOVED_OFF/HEAP_MOVED_IN?  Anyone see it?
>
> Given that this was on an Alpha, it could be a 64-bit-platform-
> dependency kind of bug...

This is not the first time that I've ended up with duplicate tuples: I
even have a standard mechanism to deal with them :-(! Initially I thought
this was due to tables getting corrupted by having index entries that
were too large, but that has been fixed (and has caused no problems since
the fix you sent -- thanks again!), and this still happens. It seems to
happen most frequently when there have been a very large number of
changes to the tables between vacuums.

Adriaan



RE: [HACKERS] BlowAwayRelationBuffers

From: Hiroshi Inoue

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> 
> Adriaan Joubert <a.joubert@albourne.com> writes:
> > Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> > during a vacuum verbose analyze. Ended up with duplicate rows of
> > everything.
> 
> Really!?  The reference count failure doesn't surprise me a whole lot,
> given the refcount bugs that I fixed a couple months ago (no, those
> fixes are not in 6.5.* :-().  But VACUUM is supposed to be guaranteed
> proof against generating duplicate tuples by design --- that's what
> all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.
> 
> Perhaps there is a glitch in the tuple validity checking logic for
> HEAP_MOVED_OFF/HEAP_MOVED_IN?  Anyone see it?
>

I committed the following change to the REL tree after 6.5.2.
It might be too late for Adriaan.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** xact.c.orig Wed Jan 12 17:53:19 2000
--- xact.c      Tue Oct 19 11:54:39 1999
***************
*** 733,741 ****
        /*
         * Have the transaction access methods record the status of
         * this transaction id in the pg_log relation. We skip it
!        * if no one shared buffer was changed by this transaction.
         */
!       if (SharedBufferChanged)
                TransactionIdAbort(xid);

        ResetBufferPool();
--- 733,742 ----
        /*
         * Have the transaction access methods record the status of
         * this transaction id in the pg_log relation. We skip it
!        * if no one shared buffer was changed by this transaction
!        * or this transaction has been committed already.
         */
!       if (SharedBufferChanged && !TransactionIdDidCommit(xid))
                TransactionIdAbort(xid);

        ResetBufferPool();


Re: [HACKERS] BlowAwayRelationBuffers

From: Alfred Perlstein

* Tom Lane <tgl@sss.pgh.pa.us> [000112 00:56] wrote:
> Adriaan Joubert <a.joubert@albourne.com> writes:
> > Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> > during a vacuum verbose analyze. Ended up with duplicate rows of
> > everything.
> 
> Really!?  The reference count failure doesn't surprise me a whole lot,
> given the refcount bugs that I fixed a couple months ago (no, those
> fixes are not in 6.5.* :-().  But VACUUM is supposed to be guaranteed
> proof against generating duplicate tuples by design --- that's what
> all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.
> 
> Perhaps there is a glitch in the tuple validity checking logic for
> HEAP_MOVED_OFF/HEAP_MOVED_IN?  Anyone see it?
> 
> Given that this was on an Alpha, it could be a 64-bit-platform-
> dependency kind of bug...

We've seen this on PostgreSQL 6.5.3 on i386 + FreeBSD 4.0; the only
way I was able to fix it was by dumping the entire table, running
sort on it and re-importing it.

Btw, I'd be interested in your opinion on the issues I recently
brought up with libpq when you have the time.

-Alfred



> 
>             regards, tom lane


Re: [HACKERS] BlowAwayRelationBuffers

From: Adriaan Joubert

Thanks Hiroshi, I will patch my database and see whether that helps. I guess
I really ought to upgrade to 6.5.3, but I had some compile problems on
Alpha which I haven't looked at closely yet.

thanks again,

Adriaan



RE: [HACKERS] BlowAwayRelationBuffers

From: Hiroshi Inoue

> -----Original Message-----
> From: a.joubert@albourne.com [mailto:a.joubert@albourne.com]
>
> Thanks Hiroshi, I will patch my database and see whether that helps. I guess
> I really ought to upgrade to 6.5.3, but I had some compile problems on
> Alpha which I haven't looked at closely yet.
>

Unfortunately, the patch can neither repair your current state
nor prevent the BlowAwayRelationBuffers error from occurring.
It can only prevent the inconsistency that follows the error.

BlowAwayRelationBuffers is called immediately before truncation of
the target relation file in VACUUM. Without my patch,
HEAP_MOVED_OFF tuples would come back to life after a
BlowAwayRelationBuffers error.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
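
A rough sketch of the kind of pre-truncation check the NOTICE in the original report suggests, under stated assumptions: the `ToyBuffer` layout and the function name are hypothetical, and reading "private" and "global" as per-backend and shared pin counts is an interpretation of the message, not something confirmed in this thread. It is not the actual BlowAwayRelationBuffers code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer descriptor for illustration only. */
typedef struct {
    int block;         /* block number within the relation */
    int private_refs;  /* assumed: pins held by this backend */
    int global_refs;   /* assumed: pins held across all backends */
} ToyBuffer;

/* Scans the buffers of one relation before it is truncated to new_nblocks.
 * Returns the block number of the first buffer at or beyond the new end
 * that is still referenced, or -1 if truncation looks safe. */
int toy_first_referenced_block(const ToyBuffer *bufs, size_t n, int new_nblocks)
{
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].block < new_nblocks)
            continue;                  /* block survives the truncation */
        if (bufs[i].private_refs > 0 || bufs[i].global_refs > 0)
            return bufs[i].block;      /* still pinned: cannot be discarded */
    }
    return -1;
}
```

With the numbers from the report (truncation to 567 pages, block 764 with a global count of 1), this check would flag block 764, which matches the point at which the VACUUM failed.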



Re: [HACKERS] BlowAwayRelationBuffers

From: Bruce Momjian

> > -----Original Message-----
> > From: a.joubert@albourne.com [mailto:a.joubert@albourne.com]
> >
> > Thanks Hiroshi, I will patch my database and see whether that helps. I guess
> > I really ought to upgrade to 6.5.3, but I had some compile problems on
> > Alpha which I haven't looked at closely yet.
> >
> 
> Unfortunately, the patch can neither repair your current state
> nor prevent the BlowAwayRelationBuffers error from occurring.
> It can only prevent the inconsistency that follows the error.
>
> BlowAwayRelationBuffers is called immediately before truncation of
> the target relation file in VACUUM. Without my patch,
> HEAP_MOVED_OFF tuples would come back to life after a
> BlowAwayRelationBuffers error.

Wow, our team is really getting good at understanding this low-level code.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] BlowAwayRelationBuffers

From: Tom Lane

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I committed the following change to the REL tree after 6.5.2.
> It might be too late for Adriaan.

> !       if (SharedBufferChanged)
>                 TransactionIdAbort(xid);

> !       if (SharedBufferChanged && !TransactionIdDidCommit(xid))
>                 TransactionIdAbort(xid);

OK, I guess the point is that if VACUUM aborts at some time after
it's done its internal commit, this code would have un-done the
commit, thereby allowing HEAP_MOVED_OFF tuples to spring back to
life?

I was trying to figure out if this change might fix the duplicate-
tuples-after-failed-VACUUM problems that we've just been hearing
about.  Certainly there is plenty of stuff going on in VACUUM after
its internal commit, so plenty of places that could elog(ERROR).
But it looks like the very first thing that happens after commit
is a scan to commit HEAP_MOVED_IN tuples and kill HEAP_MOVED_OFF
tuples, so this couldn't help much unless the failure happened
during that scan.  Which doesn't seem really likely...
        regards, tom lane
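
To make the effect of the quoted guard concrete, here is a small model of the abort path, offered as a sketch only. `ToyXactStatus` and `toy_abort_path` are hypothetical names, the `guarded` flag simply switches between the old and the patched condition, and the real code records status in pg_log rather than returning a value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction states for illustration only. */
typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyXactStatus;

/* Models what the abort path records for a transaction.
 * guarded == false: old condition, abort whenever a shared buffer changed.
 * guarded == true:  patched condition, skip the abort if the transaction
 *                   has already been committed. */
ToyXactStatus toy_abort_path(ToyXactStatus current,
                             bool shared_buffer_changed,
                             bool guarded)
{
    assert(current != TOY_ABORTED);
    bool already_committed = (current == TOY_COMMITTED);

    if (shared_buffer_changed && !(guarded && already_committed))
        return TOY_ABORTED;   /* status overwritten with "aborted" */
    return current;           /* status left as it was */
}
```

In this model, a VACUUM that fails after its internal commit goes from `TOY_COMMITTED` to `TOY_ABORTED` under the old condition, which is the "un-done commit" described above, and stays `TOY_COMMITTED` under the patched one.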



RE: [HACKERS] BlowAwayRelationBuffers

From: Hiroshi Inoue

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I committed the following change to the REL tree after 6.5.2.
> > It might be too late for Adriaan.
> 
> > !       if (SharedBufferChanged)
> >                 TransactionIdAbort(xid);
> 
> > !       if (SharedBufferChanged && !TransactionIdDidCommit(xid))
> >                 TransactionIdAbort(xid);
> 
> OK, I guess the point is that if VACUUM aborts at some time after
> it's done its internal commit, this code would have un-done the
> commit, thereby allowing HEAP_MOVED_OFF tuples to spring back to
> life?
>

Yes.

> I was trying to figure out if this change might fix the duplicate-
> tuples-after-failed-VACUUM problems that we've just been hearing
> about.  Certainly there is plenty of stuff going on in VACUUM after
> its internal commit, so plenty of places that could elog(ERROR).
> But it looks like the very first thing that happens after commit
> is a scan to commit HEAP_MOVED_IN tuples and kill HEAP_MOVED_OFF

Certainly, by the time BlowAwayRelationBuffers() is called, the commit
step for the HEAP_MOVED_IN (and HEAP_MOVED_OFF) tuples has already
completed. However, it seems that the pages which are about to be
truncated are not touched.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp