Thread: FATAL: could not open relation xxx: No such file or directory

FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"
Hello all

my struggle with the database continues (see earlier thread titled "too many trigger records found for relation xyz").

Today I created yet another table in the same database. Everything went OK, no errors or anything, but when I checked the pg_tables view I saw two tables with the same name. I immediately queried pg_class and yes, there were again two tables with the same oid. I dropped the table before anything more serious could happen, but then postgres started to complain of "cache lookup failed for relation ...". I disconnected my psql session and tried to reconnect, but failed to do so:

2008-04-09 16:39:25 EEST [18984]: [1-1] FATAL:  could not open relation 1663/16386/544592: No such file or directory

Indeed, there is no such file in that directory. I'm guessing that file is connected to the table I just dropped. Now, is there anything to do to get the database back online? I can still connect to other databases in the same instance.
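
For readers following along: the three numbers in the error are the tablespace OID, the database OID and the relation's file node, in that order (1663 is the OID of the default tablespace, pg_default). The sketch below is only an illustration of how the log line can be taken apart and turned into a catalog lookup. The function names are made up for this example, and the generated query is only useful when a connection to the affected database can still be opened, which was not the case here.

```python
import re

# Path as printed in the log line quoted above:
# <tablespace OID>/<database OID>/<relfilenode>
RELPATH = re.compile(r"could not open relation (\d+)/(\d+)/(\d+)")

def parse_relation_path(log_line):
    """Return (tablespace_oid, database_oid, relfilenode), or None if absent."""
    m = RELPATH.search(log_line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

def lookup_query(relfilenode):
    """Build a query (hypothetical helper) to run inside the affected database."""
    return (
        "SELECT oid::regclass AS relation, relkind "
        "FROM pg_class WHERE relfilenode = %d;" % relfilenode
    )

line = ("2008-04-09 16:39:25 EEST [18984]: [1-1] FATAL:  could not open "
        "relation 1663/16386/544592: No such file or directory")
spc, db, node = parse_relation_path(line)
print(spc, db, node)
print(lookup_query(node))
```

If the lookup returns no row, the catalog no longer knows the file node at all, which is itself a hint that the catalog and the files on disk have drifted apart.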

Regards

Mikko


Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Wed, Apr 9, 2008 at 4:47 PM, Mikko Partio <mpartio@gmail.com> wrote:
> 2008-04-09 16:39:25 EEST [18984]: [1-1] FATAL:  could not open relation 1663/16386/544592: No such file or directory
>
> Indeed, there is no such file in that directory. I'm guessing that file is connected to the table I just dropped. Now, is there anything to do to get the database back online? I can still connect to other databases in the same instance.

The cure was to create file 1663/16386/54459 8K in size with dd. The file in question was in fact the oid index on pg_class -- I had issued a REINDEX on pg_class just a moment before, and apparently something went wrong and the system lost track of the index. There were also two entries in pg_index for the index pg_class_oid_index. After I removed the extra entry and reindexed pg_class and pg_index, everything seems to be working OK. All the symptoms suggest that an xid wraparound may have happened, but there is no such warning in the logs and age(datfrozenxid) never went higher than, say, 250,000,000. Does anybody have a clue what might have happened?
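
The post does not give the exact dd invocation, so the following is only a rough equivalent of "create an 8K file": it writes one zero-filled block of 8192 bytes (PostgreSQL's default page size) so that the server has something to open before the index is rebuilt. The function name and the demo path are hypothetical, and the demo writes into a scratch directory rather than a real data directory. On a real cluster this is a last resort that belongs on a copy of the data.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default page size; the post says "8K"

def create_zero_filled_file(path, blocks=1, block_size=BLOCK_SIZE):
    """Create `path` holding `blocks` zeroed blocks and return its size."""
    # Mode "xb" raises FileExistsError instead of overwriting an existing file
    with open(path, "xb") as f:
        f.write(b"\0" * (blocks * block_size))
    return os.path.getsize(path)

# Demonstration in a scratch directory, not in a real data directory
scratch = tempfile.mkdtemp()
demo_path = os.path.join(scratch, "placeholder_relfile")  # hypothetical name
size = create_zero_filled_file(demo_path)
print(size)
```

Refusing to overwrite is a deliberate choice: if the file does exist after all, zeroing it would destroy whatever the relation still holds.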

Regards

Mikko



Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Tue, Apr 15, 2008 at 9:36 AM, Mikko Partio <mpartio@gmail.com> wrote:
> The cure was to create file 1663/16386/54459 8K in size with dd. The file in question was in fact the oid index on pg_class -- I had issued a REINDEX on pg_class just a moment before and apparently something went wrong and the system lost track of the index.
>
> Does anybody have a clue what might have happened?

And now it has happened again. A CLUSTER operation completed successfully on a table, but afterwards, when trying to access the table, I get the error

2008-04-17 13:05:30 EEST [8435]: [32-1] ERROR:  could not open relation 1663/16386/359232: No such file or directory

It seems to me that VACUUM FULL, REINDEX and CLUSTER change the filename of a table and/or index and then fail to record the new name in the system catalogues. Is this a known deficiency? What can I do to stop this behaviour?
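
One way to test the suspicion above is to compare what the catalog claims with what is actually on disk. The sketch below assumes the relations live in the default tablespace, where each one is stored as a file named after its relfilenode under base/<database OID> in the data directory. The function name and the sample rows are invented for the illustration, and the catalog rows would have to be fetched separately, for example with the query shown in the docstring.

```python
import os
import tempfile

def missing_relation_files(catalog_rows, db_dir):
    """Return the (relname, relfilenode) pairs that have no file in db_dir.

    catalog_rows: iterable of (relname, relfilenode), e.g. the result of
    SELECT relname, relfilenode FROM pg_class WHERE relkind IN ('r', 'i', 't');
    db_dir: the database's directory, e.g. <data directory>/base/16386
    """
    missing = []
    for relname, relfilenode in catalog_rows:
        if not os.path.exists(os.path.join(db_dir, str(relfilenode))):
            missing.append((relname, relfilenode))
    return missing

# Demonstration with made-up rows and a scratch directory
db_dir = tempfile.mkdtemp()
open(os.path.join(db_dir, "16400"), "wb").close()        # this file exists
rows = [("some_table", 16400), ("some_index", 359232)]   # hypothetical names
print(missing_relation_files(rows, db_dir))
```

Any pair it reports is a relation whose catalog entry points at a file that is not there, which matches the symptom in the error message.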

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: "Pavan Deolasee"
On Thu, Apr 17, 2008 at 3:38 PM, Mikko Partio <mpartio@gmail.com> wrote:

>
> 2008-04-17 13:05:30 EEST [8435]: [32-1] ERROR:  could not open relation
> 1663/16386/359232: No such file or directory
>

Looks like a corrupt index to me. Did you try REINDEX on the table?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Thu, Apr 17, 2008 at 1:36 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Looks like a corrupt index to me. Did you try REINDEX on the table?


Hi Pavan and thanks for your reply.

I tried to reindex the individual indexes in the table:

# reindex index xxx_idx;
ERROR:  could not open relation 1663/16386/359232: No such file or directory

Since I thought the trouble may lie in the system catalogue indexes I issued a REINDEX SYSTEM db, which went through with no errors. After that I tried to remove indexes from the table in question:

# drop index xxx_idx;
ERROR:  could not read block 0 of relation 1663/16386/2673: read only 0 of 8192 bytes

Hmm.. this is a different oid

# select 2673::regclass;
         regclass
--------------------------
 pg_depend_depender_index
(1 row)

But I just reindexed it!

# reindex table pg_depend;
WARNING:  could not remove relation 1663/16386/2673: No such file or directory
REINDEX

When I fire pg_dump to take a last minute backup I see this error:

pg_dump: Error message from server: ERROR:  could not open relation 1663/16386/544529: No such file or directory
pg_dump: The command was: SELECT tgname, tgfoid::pg_catalog.regproc as tgfname, tgtype, tgnargs, tgargs, tgenabled, tgisconstraint, tgconstrname, tgdeferrable, tgconstrrelid, tginitdeferred, tableoid, oid, tgconstrrelid::pg_catalog.regclass as tgconstrrelname from pg_catalog.pg_trigger t where tgrelid = '294134'::pg_catalog.oid and tgconstraint = 0

# reindex table pg_catalog.pg_trigger;
WARNING:  could not remove relation 1663/16386/544529: No such file or directory
REINDEX

Seems like the whole db is falling apart.

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: Tom Lane
"Mikko Partio" <mpartio@gmail.com> writes:
> Seems like the whole db is falling apart.

I think you've got really serious filesystem-level problems.  Have you
tried running any hardware diagnostics?  Are you sure you're using a
stable kernel version?

            regards, tom lane

Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think you've got really serious filesystem-level problems.  Have you
> tried running any hardware diagnostics?  Are you sure you're using a
> stable kernel version?

I ran fsck on the filesystem (GFS) -- no problems found. The disks are from a SAN and the diagnostic programs say there's nothing wrong. I also have other db clusters running on different filesystems (also GFS) and I have never had any problems with them.

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: Tom Lane
"Mikko Partio" <mpartio@gmail.com> writes:
> On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think you've got really serious filesystem-level problems.  Have you
>> tried running any hardware diagnostics?  Are you sure you're using a
>> stable kernel version?

> I ran fsck on the filesystem (GFS) -- no problems found. The disks are from
> a SAN and the diagnostic programs say there's nothing wrong. I also have
> other db clusters running on different filesystems (also GFS) and I have
> never had any problems with them.

Some RAM checks wouldn't be out of place either.

            regards, tom lane

Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Thu, Apr 17, 2008 at 6:59 PM, Mikko Partio <mpartio@gmail.com> wrote:
> On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Are you sure you're using a stable kernel version?
>
> I ran fsck on the filesystem (GFS) -- no problems found. The disks are from a SAN and the diagnostic programs say there's nothing wrong.

Oh yeah and the kernel version is 2.6.18-53.1.14.el5.

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Thu, Apr 17, 2008 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Some RAM checks wouldn't be out of place either.

Hmm didn't think of that, will do that asap (tomorrow). Thanks for your help.

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Thu, Apr 17, 2008 at 7:08 PM, Mikko Partio <mpartio@gmail.com> wrote:
> On Thu, Apr 17, 2008 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Some RAM checks wouldn't be out of place either.


Memtest86+ has now been running for 20+ hours and no errors have been found. I was also unable to reproduce this problem, but it only happened after a few days of constant activity anyway, so I guess it's not so easy to replicate. Any other pointers on where to look? Your help is much appreciated.

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: Michael Monnerie
On Thursday, 17 April 2008 Mikko Partio wrote:
> I ran fsck on the filesystem (GFS) -- no problems found. The disks
> are from a SAN and the diagnostic programs say there's nothing wrong.
> I also have other db clusters running on different filesystems (also
> GFS) and I have never had any problems with them.

A bit OT, but maybe related: I have similar strangeness with a Linux box
with Areca controller. On this box, the reiserfs filesystem starts
getting seriously damaged after some time. Memtest showed no problems,
and everything looks fine. Today we will replace the mainboard, it
could have an internal problem (transport from memory to controller
broken?).

What I had twice (at different customers, once SCSI, once SATA) is that a
broken hard disk reports no errors, but delivers different data than
what was written before. Very nasty, as the RAID controller doesn't see
any problem, and destroys even the good hard disk's data after the next
write, because the read data is already broken.

HTH, good luck.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4


Re: FATAL: could not open relation xxx: No such file or directory

From: "Mikko Partio"


On Tue, Apr 22, 2008 at 12:02 PM, Michael Monnerie <michael.monnerie@it-management.at> wrote:
> What I had twice (at different customers, once SCSI, once SATA) is that a
> broken hard disk reports no errors, but delivers different data than
> what was written before. Very nasty, as the RAID controller doesn't see
> any problem, and destroys even the good hard disk's data after the next
> write, because the read data is already broken.

How have you recognized such a hard disk?

Regards

Mikko

Re: FATAL: could not open relation xxx: No such file or directory

From: Michael Monnerie
On Tuesday, 22 April 2008 Mikko Partio wrote:
> How have you recognized such a hard disk?

With "badblocks", which writes some patterns and re-reads them. But it's
of course annoyingly slow. With these servers I was lucky. Both had "only"
73GB disks in a RAID-1, so only 2 small drives to check. With a
RAID of 8x750GB disks, it will take a *long* time to check, if you
cannot simply replace all disks at once. At today's customer, I
would have to take one drive, check it, replace it, let the RAID rebuild
CRC, take the next... a new mainboard is less work, so I try this
first.
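
To make the write-then-re-read idea concrete, here is a small sketch of the same principle applied to an ordinary file. It uses the four patterns that badblocks writes in its destructive write mode (0xaa, 0x55, 0xff, 0x00). The function name is made up. It is not a substitute for badblocks: the real tool works on the raw device, whereas reads from a regular file are usually served from the operating system's cache, so this version would not catch a disk that silently returns wrong data.

```python
import os
import tempfile

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # same byte patterns as badblocks -w

def write_and_verify(path, size, chunk=1 << 16, patterns=PATTERNS):
    """Fill `path` with each pattern in turn, read it back, and return a
    list of (pattern, offset) for every chunk that differs from what was written."""
    bad = []
    for p in patterns:
        block = bytes([p]) * chunk
        # Write pass: fill the file with the current pattern
        with open(path, "wb") as f:
            written = 0
            while written < size:
                n = min(chunk, size - written)
                f.write(block[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())
        # Read pass: compare every chunk against the pattern
        with open(path, "rb") as f:
            offset = 0
            while offset < size:
                n = min(chunk, size - offset)
                if f.read(n) != block[:n]:
                    bad.append((p, offset))
                offset += n
    return bad

scratch_file = os.path.join(tempfile.mkdtemp(), "scratch.bin")
result = write_and_verify(scratch_file, 200_000)
print(result)
```

Every byte is written and read once per pattern, so the time grows with disk size times the number of patterns, which is why the check is so slow on large arrays.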

mfg zmi
