Thread: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

From
"Shinji Nakajima"
Date:
The following bug has been logged online:

Bug reference:      5507
Logged by:          Shinji Nakajima
Email address:      sinakaj@jops.co.jp
PostgreSQL version: 8.3.8
Operating system:   Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Description:        missing chunk number 0 for toast value XXXXX in
pg_toast_XXXXX
Details:

A "missing chunk number" error occurred when I selected a specific column
of a specific table.
I had not updated this record; it suddenly ended up in this state.
Someone else seems to have seen a similar problem:
http://www.ruizs.org/archives/138
After I deleted the record the system recovered, but the root cause is unknown.
Could this be a bug in the database?

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

From
Tom Lane
Date:
"Shinji Nakajima" <sinakaj@jops.co.jp> writes:
> A "missing chunk number" error occurred when I selected a specific column
> of a specific table.

This might indicate that the toast table's index was corrupted.

> After I deleted the record the system recovered, but the root cause is unknown.
> Could this be a bug in the database?

Perhaps, but there's not a lot we can do without a lot more information...

            regards, tom lane

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

From
"Kevin Grittner"
Date:
"Shinji Nakajima" <sinakaj@jops.co.jp> wrote:

> A "missing chunk number" error occurred when I selected a specific
> column of a specific table.

> After I deleted the record the system recovered, but the root cause
> is unknown.  Could this be a bug in the database?

Errors like this are usually caused by hardware problems.  I think
the second-most common cause is running in a configuration with
fsync = off or full_page_writes = off, and suffering a power outage
or OS crash.  I would recommend that you check your configuration
for these unsafe settings and schedule a check of your hardware and
drivers.

-Kevin
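
The configuration check suggested above can be sketched as follows. This is a minimal illustration, not an official tool: the function name `find_unsafe_settings` is hypothetical, the parser only handles simple `name = value` lines, and it ignores include directives and command-line overrides. Running `SHOW fsync;` and `SHOW full_page_writes;` in psql reports the values the server is actually using.

```python
import re

# Settings named in the message above that are unsafe when turned off.
RISKY = ("fsync", "full_page_writes")

# Spellings treated as "off" in this sketch (an assumption; abbreviated
# spellings that the server may accept are not handled here).
OFF_VALUES = {"off", "false", "no", "0"}

# name, optional "=", optionally quoted value; "#" starts a comment.
_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*'?([^'#\s]*)'?")


def find_unsafe_settings(conf_text):
    """Return the risky settings whose last assignment in conf_text is off."""
    last = {}
    for line in conf_text.splitlines():
        m = _LINE.match(line)
        if m and m.group(1).lower() in RISKY:
            # A later assignment overrides an earlier one.
            last[m.group(1).lower()] = m.group(2).lower()
    return sorted(name for name, value in last.items() if value in OFF_VALUES)
```

Pass it the contents of `postgresql.conf`; an empty list means neither setting is explicitly turned off in that file.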

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

From
Greg Stark
Date:
On Mon, Jun 14, 2010 at 11:28 AM, Shinji Nakajima <sinakaj@jops.co.jp> wrote:
> PostgreSQL version: 8.3.8
> Description:        missing chunk number 0 for toast value XXXXX in
> pg_toast_XXXXX
>
> After I deleted the record the system recovered, but the root cause is unknown.
> Could this be a bug in the database?

Probably. Or possibly bad hardware. Assuming you didn't manually go in
and delete that record from the toast table, which would be a strange
thing to do.

The problem is it could have happened a long time ago and you just
discovered it now. Have you had any other significant events on this
machine? Any system crashes or power failures? Any drive crashes or
signs of bad memory?

In the postgres logs are there any instances of unusual error messages
or warnings?

--
greg

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

From
中嶋 信二
Date:
Thank you for a reply, everybody.


> On Mon, Jun 14, 2010 at 11:28 AM, Shinji Nakajima <sinakaj@jops.co.jp>
> wrote:
> > PostgreSQL version: 8.3.8
> > Description:        missing chunk number 0 for toast value XXXXX in
> > pg_toast_XXXXX
> >
> > After I deleted the record the system recovered, but the root cause is unknown.
> > Could this be a bug in the database?
> 
> Probably. Or possibly bad hardware. Assuming you didn't manually go in
> and delete that record from the toast table, which would be a strange
> thing to do.
>
That table has recovered.
However, when I checked the other tables, I found more that are affected.
In one of them a primary key value is duplicated, and a similar error
message appeared when I selected the whole table.


> The problem is it could have happened a long time ago and you just
> discovered it now. Have you had any other significant events on this
> machine? Any system crashes or power failures? Any drive crashes or
> signs of bad memory?
>
PostgreSQL runs in a redundant (duplex) configuration.
Red Hat Cluster Suite monitors the process of each service.
PGDATA is on shared storage.

At one point the standby server started.
The cluster initiated a failover because it judged postgres to have stopped.

I do not understand why both nodes, which refer to the same PGDATA, were able to start.
I confirmed that I could run SQL with psql on both nodes.
No log was being written at that time.


> In the postgres logs are there any instances of unusual error messages
> or warnings?
> --
> greg
The following error keeps occurring:
"could not read block 17 of relation 1663/16872/2840: read only 0 of 8192 bytes"

A data file seems to be broken...

If the cause was a double start, why were two postgres instances
sharing the same PGDATA able to start?
Is there any precedent for this?
Could that be what damaged the data file?

Regards,
Nakajima
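
The error text quoted above carries enough detail to locate the failing read. The sketch below parses it, assuming the three numbers are tablespace OID, database OID, and relfilenode, in that order, and that the block size is the 8192 bytes shown in the message. `parse_read_error` is a hypothetical helper name, not part of PostgreSQL.

```python
import re

# Pattern for the message format quoted in this thread.
_ERR = re.compile(
    r"could not read block (\d+) of relation (\d+)/(\d+)/(\d+): "
    r"read only (\d+) of (\d+) bytes"
)


def parse_read_error(message):
    """Split the error text into its numeric parts, or return None."""
    m = _ERR.search(message)
    if m is None:
        return None
    block, spc, db, rel, got, want = (int(g) for g in m.groups())
    return {
        "tablespace_oid": spc,   # assumed field order, see lead-in
        "database_oid": db,
        "relfilenode": rel,
        "block": block,
        "byte_offset": block * want,  # where the failed read started
        "bytes_read": got,
    }


info = parse_read_error(
    "could not read block 17 of relation 1663/16872/2840: "
    "read only 0 of 8192 bytes"
)
print(info["byte_offset"])  # 17 * 8192 = 139264
```

A read that returns 0 bytes at offset 139264 suggests the file on disk ends at or before that offset, that is, it is shorter than the server expected. For the default tablespace the file would typically be `base/16872/2840` under PGDATA.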



Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

From
"Kevin Grittner"
Date:
中嶋 信二 <sinakaj@jops.co.jp> wrote:

> PostgreSQL runs in a redundant (duplex) configuration.
> Red Hat Cluster Suite monitors the process of each service.
> PGDATA is on shared storage.
>
> At one point the standby server started.
> The cluster initiated a failover because it judged postgres to
> have stopped.
>
> I do not understand why both nodes, which refer to the same
> PGDATA, were able to start.
> I confirmed that I could run SQL with psql on both nodes.
> No log was being written at that time.

> If the cause was a double start, why were two postgres instances
> sharing the same PGDATA able to start?
> Is there any precedent for this?
> Could that be what damaged the data file?

I'm not sure I totally understand, but it sounds like you had two
postmasters running against a single data directory.  If so, that
could cause all kinds of corruption.  It's hard to see how that
could happen unless you deleted a PostgreSQL data directory, or at
least the postmaster.pid file, while an instance was running.

I would start by capturing "ps auxf" output, to be able to
understand what postgres processes were running and when they
started.  Then I would probably make sure they all got stopped.
Then I would be seriously looking at restoring from backup, unless
this was a development database which could just be recreated from
scratch.

-Kevin
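
As a complement to inspecting `ps` output, the `postmaster.pid` file mentioned above can be checked directly. This is a minimal sketch, assuming the first line of the file holds the postmaster's PID as a positive integer and that the check runs on a POSIX host; `postmaster_pid_status` is a hypothetical helper. One limitation matters for this thread: a PID is only meaningful on the host that wrote the file, so on shared storage a second node cannot use this check to detect a postmaster running elsewhere.

```python
import os


def postmaster_pid_status(pid_file):
    """Return (pid, alive) for the PID on the first line of pid_file.

    Returns None if the file cannot be read or does not start with a
    positive integer (other values are not handled in this sketch).
    """
    try:
        with open(pid_file) as f:
            pid = int(f.readline().strip())
    except (OSError, ValueError):
        return None
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return pid, False  # stale file: no such process on this host
    except PermissionError:
        return pid, True   # process exists but belongs to another user
    return pid, True
```

A result of `(pid, False)` points to a stale lock file on the local host; it says nothing about processes on other cluster nodes.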