Thread: BlowAwayRelationBuffers
Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha during a vacuum verbose analyze. Ended up with duplicate rows of everything.

NOTICE: --Relation tasksids--
NOTICE: Pages 1356: Changed 349, Reapped 875, Empty 0, New 0; Tup 88946: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 123921, MinLen 41, MaxLen 41; Re-using: Free/Avail. Space 5965708/5965708; EndEmpty/Avail. Pages 0/875. Elapsed 0/0 sec.
NOTICE: Rel tasksids: Pages: 1356 --> 567; Tuple(s) moved: 31746. Elapsed 0/0 sec.
NOTICE: BlowawayRelationBuffers(tasksids, 567): block 764 is referenced (private 0, last 0, global 1)
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally before or while processing the request.
We have lost the connection to the backend, so further processing is impossible. Terminating.

According to the mailing list archive
http://www.postgresql.org/mhonarc/pgsql-hackers/1999-02/msg00052.html
a bug in this area was fixed in 6.4. I seem to remember that somebody is looking at vacuum at the moment, so this may be something to keep in mind.

Adriaan
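The last NOTICE above reports that a page beyond the new end of the relation (block 764, with the relation shrunk to 567 pages) was still referenced in the buffer pool. A minimal sketch of that kind of check follows, as a toy model only: `ToyBuffer`, `toy_blow_away` and the two-pass structure are assumptions made for illustration, not the actual PostgreSQL buffer manager code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified buffer descriptor; not the real BufferDesc. */
typedef struct {
    int rel_id;     /* relation the cached page belongs to */
    int block_num;  /* block number within that relation */
    int ref_count;  /* how many users still pin this page */
    int valid;      /* 1 while the cached page may be used */
} ToyBuffer;

/*
 * Drop every cached page of rel_id at or beyond first_del_block.
 * Returns 0 on success. Returns -1, leaving all buffers unchanged,
 * if any of those pages is still referenced.
 */
int toy_blow_away(ToyBuffer *bufs, int nbufs, int rel_id, int first_del_block)
{
    int i;

    assert(bufs != NULL && nbufs >= 0);

    /* First pass: refuse to proceed if a doomed page is still pinned. */
    for (i = 0; i < nbufs; i++) {
        if (bufs[i].valid && bufs[i].rel_id == rel_id &&
            bufs[i].block_num >= first_del_block && bufs[i].ref_count != 0) {
            fprintf(stderr,
                    "toy_blow_away(rel %d, %d): block %d is referenced (refcount %d)\n",
                    rel_id, first_del_block, bufs[i].block_num, bufs[i].ref_count);
            return -1;
        }
    }

    /* Second pass: nothing is pinned, so invalidate the doomed pages. */
    for (i = 0; i < nbufs; i++) {
        if (bufs[i].valid && bufs[i].rel_id == rel_id &&
            bufs[i].block_num >= first_del_block)
            bufs[i].valid = 0;
    }
    return 0;
}
```

In this model, calling `toy_blow_away(bufs, n, rel, 567)` while a buffer for block 764 has a reference count of 1 fails in the same way as the log above. A caller that is about to truncate the file cannot safely continue after such a failure.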
Adriaan Joubert <a.joubert@albourne.com> writes:
> Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> during a vacuum verbose analyze. Ended up with duplicate rows of
> everything.

Really!? The reference count failure doesn't surprise me a whole lot, given the refcount bugs that I fixed a couple months ago (no, those fixes are not in 6.5.* :-(). But VACUUM is supposed to be guaranteed proof against generating duplicate tuples by design --- that's what all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.

Perhaps there is a glitch in the tuple validity checking logic for HEAP_MOVED_OFF/HEAP_MOVED_IN? Anyone see it?

Given that this was on an Alpha, it could be a 64-bit-platform-dependency kind of bug...

regards, tom lane
Tom Lane wrote:
> Adriaan Joubert <a.joubert@albourne.com> writes:
> > Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> > during a vacuum verbose analyze. Ended up with duplicate rows of
> > everything.
>
> Really!? The reference count failure doesn't surprise me a whole lot,
> given the refcount bugs that I fixed a couple months ago (no, those
> fixes are not in 6.5.* :-(). But VACUUM is supposed to be guaranteed
> proof against generating duplicate tuples by design --- that's what
> all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.
>
> Perhaps there is a glitch in the tuple validity checking logic for
> HEAP_MOVED_OFF/HEAP_MOVED_IN? Anyone see it?
>
> Given that this was on an Alpha, it could be a 64-bit-platform-dependency
> kind of bug...

This is not the first time that I've ended up with duplicate tuples: I even have a standard mechanism to deal with them :-(! Initially I thought this was due to tables getting corrupted by having index entries that were too large, but that has been fixed (and has caused no problems since the fix you sent -- thanks again!), and this still happens.

It seems to happen most frequently when there have been a very large number of changes to the tables between vacuums.

Adriaan
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
>
> Adriaan Joubert <a.joubert@albourne.com> writes:
> > Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> > during a vacuum verbose analyze. Ended up with duplicate rows of
> > everything.
>
> Really!? The reference count failure doesn't surprise me a whole lot,
> given the refcount bugs that I fixed a couple months ago (no, those
> fixes are not in 6.5.* :-(). But VACUUM is supposed to be guaranteed
> proof against generating duplicate tuples by design --- that's what
> all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.
>
> Perhaps there is a glitch in the tuple validity checking logic for
> HEAP_MOVED_OFF/HEAP_MOVED_IN? Anyone see it?
>

I committed the following change to REL tree after 6.5.2.
It might be too late for Adriaan.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** xact.c.orig Wed Jan 12 17:53:19 2000
--- xact.c      Tue Oct 19 11:54:39 1999
***************
*** 733,741 ****
      /*
       * Have the transaction access methods record the status of
       * this transaction id in the pg_log relation. We skip it
!      * if no one shared buffer was changed by this transaction.
       */
!     if (SharedBufferChanged)
          TransactionIdAbort(xid);

      ResetBufferPool();
--- 733,742 ----
      /*
       * Have the transaction access methods record the status of
       * this transaction id in the pg_log relation. We skip it
!      * if no one shared buffer was changed by this transaction
!      * or this transaction has been committed already.
       */
!     if (SharedBufferChanged && !TransactionIdDidCommit(xid))
          TransactionIdAbort(xid);

      ResetBufferPool();
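The effect of the added `!TransactionIdDidCommit(xid)` test can be illustrated with a toy model of the transaction status log. This is only a sketch under stated assumptions: `toy_pg_log`, `toy_commit`, `toy_did_commit`, `toy_abort_old` and `toy_abort_new` are hypothetical names, and the real pg_log access methods are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction states; the zero value means "in progress". */
typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } ToyXactStatus;

#define TOY_MAX_XID 16

/* Toy stand-in for pg_log: one status slot per transaction id. */
static ToyXactStatus toy_pg_log[TOY_MAX_XID];

void toy_commit(unsigned xid)
{
    assert(xid < TOY_MAX_XID);
    toy_pg_log[xid] = XACT_COMMITTED;
}

bool toy_did_commit(unsigned xid)
{
    assert(xid < TOY_MAX_XID);
    return toy_pg_log[xid] == XACT_COMMITTED;
}

/* Abort path before the patch: records "aborted" unconditionally. */
void toy_abort_old(unsigned xid, bool shared_buffer_changed)
{
    assert(xid < TOY_MAX_XID);
    if (shared_buffer_changed)
        toy_pg_log[xid] = XACT_ABORTED;   /* may overwrite a commit */
}

/* Abort path after the patch: an already committed xid is left alone. */
void toy_abort_new(unsigned xid, bool shared_buffer_changed)
{
    assert(xid < TOY_MAX_XID);
    if (shared_buffer_changed && !toy_did_commit(xid))
        toy_pg_log[xid] = XACT_ABORTED;
}
```

In this model, `toy_commit(3)` followed by `toy_abort_old(3, true)` leaves transaction 3 marked as aborted, whereas the same sequence with `toy_abort_new` keeps it committed.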
* Tom Lane <tgl@sss.pgh.pa.us> [000112 00:56] wrote:
> Adriaan Joubert <a.joubert@albourne.com> writes:
> > Hmmm, I got the following this morning on version 6.5.2 on DEC Alpha
> > during a vacuum verbose analyze. Ended up with duplicate rows of
> > everything.
>
> Really!? The reference count failure doesn't surprise me a whole lot,
> given the refcount bugs that I fixed a couple months ago (no, those
> fixes are not in 6.5.* :-(). But VACUUM is supposed to be guaranteed
> proof against generating duplicate tuples by design --- that's what
> all the HEAP_MOVED_OFF and HEAP_MOVED_IN foofaraw is about.
>
> Perhaps there is a glitch in the tuple validity checking logic for
> HEAP_MOVED_OFF/HEAP_MOVED_IN? Anyone see it?
>
> Given that this was on an Alpha, it could be a 64-bit-platform-dependency
> kind of bug...

We've seen this on PostgreSQL 6.5.3 on i386 + FreeBSD 4.0. The only way I was able to fix it was by dumping the entire table, running sort on it and re-importing it.

Btw, I'd be interested in your opinion on the issues I recently brought up with libpq when you have the time.

-Alfred

>
> regards, tom lane
Thanks Hiroshi, I will patch my database and see whether that helps. Guess I really ought to upgrade to 6.5.3, but I had some compile problems on Alpha which I haven't looked at closely yet.

thanks again,

Adriaan
> -----Original Message-----
> From: a.joubert@albourne.com [mailto:a.joubert@albourne.com]
>
> Thanks Hiroshi, I will patch my database and see whether that helps. Guess
> I really ought to upgrade to 6.5.3, but I had some compile problems on
> Alpha which I haven't looked at closely yet.
>

Unfortunately the patch could neither recover your current status nor prevent the occurrence of the BlowAwayRelationBuffers error. It may only prevent the occurrence of inconsistency after the error.

BlowAwayRelationBuffers is called immediately before truncation of the target relation file in VACUUM. Without applying my patch, HEAP_MOVED_OFF tuples would revive after a BlowAwayRelationBuffers error.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
> > -----Original Message-----
> > From: a.joubert@albourne.com [mailto:a.joubert@albourne.com]
> >
> > Thanks Hiroshi, I will patch my database and see whether that helps. Guess
> > I really ought to upgrade to 6.5.3, but I had some compile problems on
> > Alpha which I haven't looked at closely yet.
> >
>
> Unfortunately the patch could neither recover your current status
> nor prevent the occurrence of the BlowAwayRelationBuffers error.
> It may only prevent the occurrence of inconsistency after
> the error.
>
> BlowAwayRelationBuffers is called immediately before truncation of
> the target relation file in VACUUM. Without applying my patch,
> HEAP_MOVED_OFF tuples would revive after a BlowAwayRelationBuffers
> error.

Wow, our team is really getting good at understanding this low-level code.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > I commited the following change to REL tree after 6.5.2. > It might be late for Adriaan. > ! if (SharedBufferChanged) > TransactionIdAbort(xid); > ! if (SharedBufferChanged && !TransactionIdDidCommit(xid)) > TransactionIdAbort(xid); OK, I guess the point is that if VACUUM aborts at some time after it's done its internal commit, this code would have un-done the commit, thereby allowing HEAP_MOVED_OFF tuples to spring back to life? I was trying to figure out if this change might fix the duplicate- tuples-after-failed-VACUUM problems that we've just been hearing about. Certainly there is plenty of stuff going on in VACUUM after its internal commit, so plenty of places that could elog(ERROR). But it looks like the very first thing that happens after commit is a scan to commit HEAP_MOVED_IN tuples and kill HEAP_MOVED_OFF tuples, so this couldn't help much unless the failure happened during that scan. Which doesn't seem really likely... regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I committed the following change to REL tree after 6.5.2.
> > It might be too late for Adriaan.
>
> > ! if (SharedBufferChanged)
> >       TransactionIdAbort(xid);
>
> > ! if (SharedBufferChanged && !TransactionIdDidCommit(xid))
> >       TransactionIdAbort(xid);
>
> OK, I guess the point is that if VACUUM aborts at some time after
> it's done its internal commit, this code would have un-done the
> commit, thereby allowing HEAP_MOVED_OFF tuples to spring back to
> life?
>

Yes.

> I was trying to figure out if this change might fix the duplicate-
> tuples-after-failed-VACUUM problems that we've just been hearing
> about. Certainly there is plenty of stuff going on in VACUUM after
> its internal commit, so plenty of places that could elog(ERROR).
> But it looks like the very first thing that happens after commit
> is a scan to commit HEAP_MOVED_IN tuples and kill HEAP_MOVED_OFF

Certainly, by the time BlowAwayRelationBuffers() is called, the commit of the HEAP_MOVED_IN(OFF) tuples has already been completed. However, it seems that the pages which are about to be truncated are not touched.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
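The scenario discussed above can be sketched with a toy visibility check. Everything here is an assumption made for illustration: the flag names, `ToyTuple`, `toy_tuple_visible` and `toy_count_visible` are hypothetical, and the real tuple validity logic has many more cases. The model only captures three things. A moved-off copy counts as dead once the VACUUM transaction is seen as committed. A moved-in copy counts as live in that case. A copy already marked by the post-commit scan no longer consults the transaction status at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real t_infomask layout differs. */
enum {
    TOY_MOVED_OFF      = 1 << 0,  /* old copy that VACUUM moved away */
    TOY_MOVED_IN       = 1 << 1,  /* new copy that VACUUM created */
    TOY_XMIN_COMMITTED = 1 << 2,  /* hint: already known to be live */
    TOY_XMIN_INVALID   = 1 << 3   /* hint: already known to be dead */
};

typedef struct {
    int key;         /* row identity, used to spot duplicates */
    unsigned flags;  /* combination of the TOY_* bits above */
} ToyTuple;

/* Is this copy visible, given the recorded status of the VACUUM xid? */
bool toy_tuple_visible(const ToyTuple *tup, bool vacuum_committed)
{
    if (tup->flags & TOY_XMIN_COMMITTED)
        return true;                /* hint wins, status not consulted */
    if (tup->flags & TOY_XMIN_INVALID)
        return false;
    if (tup->flags & TOY_MOVED_OFF)
        return !vacuum_committed;   /* old copy dies with the commit */
    if (tup->flags & TOY_MOVED_IN)
        return vacuum_committed;    /* new copy lives with the commit */
    return true;                    /* ordinary tuple in this toy model */
}

/* Count the visible copies of one row. */
int toy_count_visible(const ToyTuple *tups, int n, int key, bool vacuum_committed)
{
    int i, count = 0;

    for (i = 0; i < n; i++) {
        if (tups[i].key == key && toy_tuple_visible(&tups[i], vacuum_committed))
            count++;
    }
    return count;
}
```

Take one row with an untouched moved-off copy (flags `TOY_MOVED_OFF`) on a page that was due to be truncated, and a moved-in copy already marked `TOY_MOVED_IN | TOY_XMIN_COMMITTED`. While the VACUUM transaction is recorded as committed, one copy is visible. If the status is later flipped to aborted, both copies are visible, which in this model is how duplicate rows would appear.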