Discussion: Fwd: Postgres update


Fwd: Postgres update

From: Denis Perchine
Date:
Hello,

What does this mean?????

----------  Forwarded Message  ----------
Subject: Postgres update
Date: Fri, 28 Jul 2000 04:00:09 -0500 (EST)
From: webmail@webmailstation.com (WebmailStation User)


CREATE
DROP
ALTER
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  --Relation pg_largeobject--
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Pages 577: Changed 0, reaped 549, Empty 0, New 0; Tup 105038: Vac 12554, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 36, MaxLen 36; Re-using: Free/Avail. Space 469760/465060; EndEmpty/Avail. Pages 0/548. CPU 0.00s/0.38u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Index pg_largeobject_loid_index: Pages 209; Tuples 105038: Deleted 0. CPU 0.01s/0.50u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Rel pg_largeobject: Pages: 577 --> 515; Tuple(s) moved: 10696. CPU 0.11s/1.08u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Index pg_largeobject_loid_index: Pages 250; Tuples 105038: Deleted 10696. CPU 0.01s/0.55u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally before or while processing the request.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: connection to server was lost
-------------------------------------------------------

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------


Re: Fwd: Postgres update

From: Tom Lane
Date:
Denis Perchine <dyp@perchine.com> writes:
> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2

Hmm, there's another report of that in the archives.  You've got a
buffer that has a positive reference count even though (presumably)
no one is using it.  VACUUM is quitting out of paranoia that maybe
some other backend is accessing the table --- there's unlikely to be
any actual data corruption here, just (over?) caution.

You can get back to a state where VACUUM will work on the table by
restarting the postmaster, but to fix the real problem we need to figure
out how the system got into this state in the first place.  Can you
produce a repeatable example of a series of queries that gets you into
this state?
        regards, tom lane
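
A minimal, self-contained C sketch of the check described here, using invented names (BufSketch, flush_relation_buffers) rather than the real PostgreSQL bufmgr structures: before VACUUM compacts a relation, every shared buffer holding one of its pages must be unpinned, otherwise the routine bails out with -2 and VACUUM aborts.

/*
 * Sketch only: BufSketch and flush_relation_buffers are illustrative
 * stand-ins, not the actual bufmgr code.
 */
#include <stdio.h>

typedef struct BufSketch {
    int rel_id;          /* relation the cached page belongs to   */
    int block_num;       /* block number within that relation     */
    int private_refcnt;  /* pins held by the current backend      */
    int global_refcnt;   /* pins held across all backends         */
} BufSketch;

/* Returns 0 when all of the relation's buffers are unpinned, -2 otherwise
 * (the path behind "FlushRelationBuffers returned -2"). */
static int flush_relation_buffers(const BufSketch *bufs, int nbufs, int rel_id)
{
    for (int i = 0; i < nbufs; i++) {
        if (bufs[i].rel_id != rel_id)
            continue;
        if (bufs[i].private_refcnt > 0 || bufs[i].global_refcnt > 0) {
            fprintf(stderr,
                    "NOTICE:  block %d is referenced (private %d, global %d)\n",
                    bufs[i].block_num,
                    bufs[i].private_refcnt, bufs[i].global_refcnt);
            return -2;   /* leave the table alone rather than risk corruption */
        }
        /* ...a real implementation would write out and release the buffer... */
    }
    return 0;
}

int main(void)
{
    /* One clean buffer and one with a stale global pin, as in the report. */
    BufSketch bufs[] = {
        { 1, 100, 0, 0 },
        { 1, 504, 0, 1 },
    };
    printf("flush_relation_buffers returned %d\n",
           flush_relation_buffers(bufs, 2, 1));
    return 0;
}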


Re: Fwd: Postgres update

From: Denis Perchine
Date:
> > NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> > FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
> 
> Hmm, there's another report of that in the archives.  You've got a
> buffer that has a positive reference count even though (presumably)
> no one is using it.  VACUUM is quitting out of paranoia that maybe
> some other backend is accessing the table --- there's unlikely to be
> any actual data corruption here, just (over?) caution.
> 
> You can get back to a state where VACUUM will work on the table by
> restarting the postmaster, but to fix the real problem we need to figure
> out how the system got into this state in the first place.  Can you
> produce a repeatable example of a series of queries that gets you into
> this state?

I get this after the following:
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: NOTICE:  !!! write error seems permanent !!!
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: NOTICE:  !!! now kill all backends and reset postmaster !!!
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally before or while processing the request.
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: connection to server was lost

This was the command that should have created a unique index. Something happened and
the index became corrupted. After that postgres started to eat up memory and I killed it.
I realized that this happened on an update of the table on which the index was built, and
that the update uses the index.

It is hard to reproduce this...
I would like to give you the binary data, but unfortunately I was forced
to rebuild the index ASAP and finish the investigation later...

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------


Re: Fwd: Postgres update

From: Tom Lane
Date:
Denis Perchine <dyp@perchine.com> writes:
>>>> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
>>>> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2

> I get this after the following:

> NOTICE:  !!! write error seems permanent !!!
> NOTICE:  !!! now kill all backends and reset postmaster !!!
> ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
> pqReadData() -- backend closed the channel unexpectedly.

Oh, that's interesting.  The NOTICEs are coming out of AbortBufferIO()
which is invoked during error processing (in other words, I bet the
ERROR actually happened first.  It's a libpq artifact that the NOTICEs
are presented first on the client side.  If you are keeping the
postmaster log output you could confirm the sequence of events by
looking in the log).  The backend shutdown is then forced by
AbortBufferIO().

AbortBufferIO() seems rather badly designed, but given that it forces
a database-wide restart, I'm not sure how this could relate to the
later FlushRelationBuffers problem.  The restart should get rid of the
old buffers anyway.

> This was the command which should create unique index.

Was the index on the same table that FlushRelationBuffers later had
trouble with (ie, "pg_largeobject")?

What version are you running, anyway?  There is no "pg_largeobject"
in either 6.5 or current AFAIK.
        regards, tom lane


RE: Fwd: Postgres update

From: "Hiroshi Inoue"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org] On
> Behalf Of Tom Lane
> 
> Denis Perchine <dyp@perchine.com> writes:
> >>>> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> >>>> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
> 
> > I get this after the following:
> 
> > NOTICE:  !!! write error seems permanent !!!
> > NOTICE:  !!! now kill all backends and reset postmaster !!!
> > ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
> > pqReadData() -- backend closed the channel unexpectedly.
> 
> Oh, that's interesting.  The NOTICEs are coming out of AbortBufferIO()
> which is invoked during error processing (in other words, I bet the
> ERROR actually happened first.  It's a libpq artifact that the NOTICEs
> are presented first on the client side.  If you are keeping the
> postmaster log output you could confirm the sequence of events by
> looking in the log).  The backend shutdown is then forced by
> AbortBufferIO().
> 
> AbortBufferIO() seems rather badly designed, but given that it forces
> a database-wide restart, I'm not sure how this could relate to the

It was me who introduced the xxxxBufferIO routines to avoid io_in_progress
spinlock freezing. Unfortunately I didn't think of any elegant way to
recover from a permanent write error at the time. I judge the error to be
permanent when the write error for the buffer occurs twice.
Because you changed bufmgr before 7.0, we may be able to avoid a
database-wide restart.  However, don't we have to leave the buffer
still dirty even after your change?  If I understand correctly, we couldn't
know which backends dirtied the buffer.  Even if we could know that,
we couldn't know whether the transactions that dirtied the buffer are still
running.

Regards.

Hiroshi Inoue
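
A rough C sketch of the "permanent after two failures" rule described above, with invented names (BufState, try_flush) and a simulated smgr write: the first failure leaves the buffer dirty for a retry; a second failure is treated as permanent, and the buffer keeps BM_DIRTY so that transactions that modified it fail at commit instead of silently losing changes.

/* Sketch only: not the actual bufmgr code; write_block() simulates a
 * failing disk to exercise the error path. */
#include <stdbool.h>
#include <stdio.h>

#define BM_DIRTY    0x01
#define BM_IO_ERROR 0x02

typedef struct BufState {
    unsigned flags;
    int      write_failures;   /* consecutive failed write attempts */
} BufState;

/* Stand-in for the real disk write; always fails here. */
static bool write_block(int block)
{
    (void) block;
    return false;
}

/* Returns true if the buffer was flushed; false if it is still dirty.
 * After the second failure the error is considered permanent. */
static bool try_flush(BufState *buf, int block)
{
    if (write_block(block)) {
        buf->flags &= ~(unsigned) (BM_DIRTY | BM_IO_ERROR);
        buf->write_failures = 0;
        return true;
    }
    buf->flags |= BM_DIRTY;                 /* keep the data around */
    if (++buf->write_failures >= 2) {
        buf->flags |= BM_IO_ERROR;          /* "write error seems permanent" */
        fprintf(stderr, "NOTICE:  !!! write error seems permanent !!!\n");
    }
    return false;
}

int main(void)
{
    BufState buf = { BM_DIRTY, 0 };
    try_flush(&buf, 175);   /* first failure: retry later */
    try_flush(&buf, 175);   /* second failure: permanent  */
    return 0;
}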




Re: Fwd: Postgres update

From: Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> It was me who introduced the xxxxBufferIO routines to avoid io_in_progress
> spinlock freezing. Unfortunately I didn't think of any elegant way to
> recover from a permanent write error at the time. I judge the error to be
> permanent when the write error for the buffer occurs twice.
> Because you changed bufmgr before 7.0, we may be able to avoid a
> database-wide restart.  However, don't we have to leave the buffer
> still dirty even after your change?  If I understand correctly, we couldn't
> know which backends dirtied the buffer.  Even if we could know that,
> we couldn't know whether the transactions that dirtied the buffer are still
> running.

Right, that's why we can't just mark the buffer broken and recycle it.
If there is another transaction in progress that's dirtied that same
buffer, it would come back, find the disk page no longer present in that
buffer, conclude that its changes were successfully written, and go
ahead and commit.  No good, it has to fail.  So, leave the buffer
sitting there marked BM_DIRTY + BM_IO_ERROR.  When the other xact wants
to commit, it will try to write the buffer, fail, and abort.  (Or,
if perchance it doesn't fail, that's OK too.)

The only way to discard the buffer short of a postmaster reset would be
if we knew no current transaction could have dirtied the buffer.  Hmm,
I wonder if we could have VACUUM do that before it starts the main
vacuuming process?  If we have exclusive lock on the table, then
presumably there can be no other xacts with pending changes.  So maybe
the first step of VACUUM could be to sweep the buffer table for buffers
of that relation that are marked BM_DIRTY + BM_IO_ERROR, and discard
them?  Not sure about this.

Also, I think deleting the relation that owns the broken buffer will
clean it up correctly in current sources --- ReleaseRelationBuffers
doesn't care if the buffer is dirty or not.
        regards, tom lane
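
A C sketch of the sweep suggested above, again with illustrative names (BufEntry, discard_broken_buffers) rather than the real bufmgr API: once VACUUM holds an exclusive lock on the table, no other transaction can have pending changes to it, so its buffers stuck in BM_DIRTY | BM_IO_ERROR could simply be invalidated before the main vacuuming starts.

/* Sketch only: intended to run as the first step of VACUUM, with the
 * exclusive lock on the relation already held. */
#define BM_DIRTY    0x01
#define BM_IO_ERROR 0x02

typedef struct BufEntry {
    int      valid;    /* nonzero if the slot currently holds a page */
    int      rel_id;   /* relation the page belongs to               */
    unsigned flags;
} BufEntry;

void discard_broken_buffers(BufEntry *bufs, int nbufs, int rel_id)
{
    for (int i = 0; i < nbufs; i++) {
        if (!bufs[i].valid || bufs[i].rel_id != rel_id)
            continue;
        if ((bufs[i].flags & (BM_DIRTY | BM_IO_ERROR)) ==
            (BM_DIRTY | BM_IO_ERROR)) {
            /* Safe to drop: the exclusive lock guarantees no concurrent
             * transaction still needs this page to reach disk. */
            bufs[i].valid = 0;
            bufs[i].flags = 0;
        }
    }
}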


Re: Fwd: Postgres update

From: Denis Perchine
Date:
Hello Tom,

> >>>> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> >>>> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
> 
> > I get this after the following:
> 
> > NOTICE:  !!! write error seems permanent !!!
> > NOTICE:  !!! now kill all backends and reset postmaster !!!
> > ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
> > pqReadData() -- backend closed the channel unexpectedly.
> 
> Oh, that's interesting.  The NOTICEs are coming out of AbortBufferIO()
> which is invoked during error processing (in other words, I bet the
> ERROR actually happened first.  It's a libpq artifact that the NOTICEs
> are presented first on the client side.  If you are keeping the
> postmaster log output you could confirm the sequence of events by
> looking in the log).  The backend shutdown is then forced by
> AbortBufferIO().
> 
> AbortBufferIO() seems rather badly designed, but given that it forces
> a database-wide restart, I'm not sure how this could relate to the
> later FlushRelationBuffers problem.  The restart should get rid of the
> old buffers anyway.
> 
> > This was the command which should create unique index.
> 
> Was the index on the same table that FlushRelationBuffers later had
> trouble with (ie, "pg_largeobject")?
> 
> What version are you running, anyway?  There is no "pg_largeobject"
> in either 6.5 or current AFAIK.

:-))) Sorry. Just to put the pieces together...
I use a modified 7.0.2. I applied my patch for large objects (the one with files in hash dirs).
That's why you can see pg_largeobject. But this is not the issue here: that patch modifies
only large-object-related stuff.

I got the vacuum error first on pg_largeobject. Later the index was automatically recreated
(I have a cron job) and everything was fine again.

And when I replied to your mail I got an error in the queue table. It started when I noticed
that postmaster had begun to eat up memory. I shut it down and looked at the log. The last query
was an update on the queue table. I tried to vacuum the table and got the same error as last time.
Then I dropped the index and recreated it and everything was fine again.

When I later went through the cron reports (I drop and recreate the indices each day)
I found the error message from recreating the index for this table. That is all.

-- 
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------


RE: Fwd: Postgres update

From: "Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

> > Because you changed bufmgr before 7.0, we may be able to avoid a
> > database-wide restart.  However, don't we have to leave the buffer
> > still dirty even after your change?  If I understand correctly, we couldn't
> > know which backends dirtied the buffer.  Even if we could know that,
> > we couldn't know whether the transactions that dirtied the buffer are still
> > running.
> 
> Right, that's why we can't just mark the buffer broken and recycle it.
> If there is another transaction in progress that's dirtied that same
> buffer, it would come back, find the disk page no longer present in that
> buffer, conclude that its changes were successfully written, and go
> ahead and commit.  No good, it has to fail.  So, leave the buffer
> sitting there marked BM_DIRTY + BM_IO_ERROR.  When the other xact wants
> to commit, it will try to write the buffer, fail, and abort.  (Or,
> if perchance it doesn't fail, that's OK too.)
>

Agreed.
However, BM_DIRTY + BM_IO_ERROR buffers seem to have to be
excluded from the freelist of buffers.  Then I/O errors would be limited
to backends which really need to update those buffers.

> The only way to discard the buffer short of a postmaster reset would be
> if we knew no current transaction could have dirtied the buffer.  Hmm,
> I wonder if we could have VACUUM do that before it starts the main
> vacuuming process?  If we have exclusive lock on the table, then
> presumably there can be no other xacts with pending changes.  So maybe

You are right.

> the first step of VACUUM could be to sweep the buffer table for buffers
> of that relation that are marked BM_DIRTY + BM_IO_ERROR, and discard
> them?  Not sure about this.
>

Sounds reasonable.
Regards.

Hiroshi Inoue


Re: Fwd: Postgres update

From: Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> However, BM_DIRTY + BM_IO_ERROR buffers seem to have to be
> excluded from the freelist of buffers.  Then I/O errors would be limited
> to backends which really need to update those buffers.

Good idea, that will prevent unrelated transactions from getting in
trouble by trying to flush and re-use the buffer.
        regards, tom lane
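
A C sketch of the refinement agreed on here, with invented names (FreeBuf, get_free_buffer): when a backend searches the freelist for a victim buffer, slots marked BM_DIRTY | BM_IO_ERROR are skipped, so only backends that actually need that broken page ever touch it and hit the I/O error.

/* Sketch only: a simplified freelist scan, not the real buffer manager. */
#define BM_DIRTY    0x01
#define BM_IO_ERROR 0x02

typedef struct FreeBuf {
    unsigned flags;
    int      next;   /* index of the next freelist entry, or -1 at the end */
} FreeBuf;

/* Returns the index of a usable free buffer, or -1 if none is available. */
int get_free_buffer(const FreeBuf *bufs, int head)
{
    for (int i = head; i >= 0; i = bufs[i].next) {
        if ((bufs[i].flags & (BM_DIRTY | BM_IO_ERROR)) ==
            (BM_DIRTY | BM_IO_ERROR))
            continue;   /* keep the broken buffer out of general recycling */
        return i;
    }
    return -1;
}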