Re: Postgres error: could not open relation base/xxxxx/yyyyy

Поиск
Список
Период
Сортировка
От Achilleas Mantzios
Тема Re: Postgres error: could not open relation base/xxxxx/yyyyy
Дата
Msg-id 201011161031.19910.achill@matrix.gatewaynet.com
обсуждение исходный текст
Ответ на Postgres error: could not open relation base/xxxxx/yyyyy  (Pablo Delgado Díaz-Pache <delgadop@gmail.com>)
Список pgsql-admin
Also you should check for any CPU malfunction, overheating, etc..
and for motherboard problems as well, power supply, etc...

Anyway, your problem is not postgresql (postgresql does not delete its own files).
Your problem is below your OS and everything that comes underneath.
What do the linux logs look like? Any hangups? Any crashes at OS level?

Στις Monday 15 November 2010 21:04:40 γράψατε:
> I found the memory is indeed ECC
> http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=biz&cs=05&sku=a2257179
>
> Type: DRAM
> Technology: DDR2 SDRAM
> Form Factor: FB-DIMM 240-pin
> Memory Speed: 667 MHz ( PC2-5300 )
> Data Integrity Check: ECC
> RAM Features: Fully buffered
>
> 2010/11/15 Pablo Delgado DΓ­az-Pache <delgadop@gmail.com>
>
> > Thanks again. We will go to the datacenter and run the both the memory and
> > fs check.
> >
> > Could you think something besides the memory hw problem could be causing
> > this postgres problem? or you think it's a 99% hardware problem?
> >
> > In the meantime and in case it helps here it goes the memory info of the
> > server:
> >
> > # cat /proc/meminfo
> > MemTotal:     28818256 kB
> > MemFree:      10277776 kB
> > Buffers:        554956 kB
> > Cached:       17388880 kB
> > SwapCached:          0 kB
> > Active:       14245932 kB
> > Inactive:      3791952 kB
> > HighTotal:           0 kB
> > HighFree:            0 kB
> > LowTotal:     28818256 kB
> > LowFree:      10277776 kB
> > SwapTotal:     4192924 kB
> > SwapFree:      4192924 kB
> > Dirty:             348 kB
> > Writeback:         124 kB
> > AnonPages:       93824 kB
> > Mapped:        5699404 kB
> > Slab:           448360 kB
> > PageTables:      16808 kB
> > NFS_Unstable:        0 kB
> > Bounce:              0 kB
> > CommitLimit:  18602052 kB
> > Committed_AS: 15344816 kB
> > VmallocTotal: 34359738367 kB
> > VmallocUsed:    265116 kB
> > VmallocChunk: 34359473063 kB
> > HugePages_Total:     0
> > HugePages_Free:      0
> > HugePages_Rsvd:      0
> > Hugepagesize:     2048 kB
> >
> > and more information ...
> >
> > # dmidecode --type 17
> > # dmidecode 2.10
> > SMBIOS 2.4 present.
> >
> > Handle 0x1100, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 2048 MB
> >         Form Factor: FB-DIMM
> >         Set: 1
> >         Locator: DIMM1
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 80AD808980AD
> >         Serial Number: 0394080B
> >         Asset Tag: 010802
> >         Part Number: HYMP525F72CP4N3-Y5
> >         Rank: 2
> >
> > Handle 0x1101, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 2048 MB
> >         Form Factor: FB-DIMM
> >         Set: 1
> >         Locator: DIMM2
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 80AD808980AD
> >         Serial Number: 0394040C
> >         Asset Tag: 010802
> >         Part Number: HYMP525F72CP4N3-Y5
> >         Rank: 2
> >
> > Handle 0x1102, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 4096 MB
> >         Form Factor: FB-DIMM
> >         Set: 2
> >         Locator: DIMM3
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 830B7FB3830B
> >         Serial Number: 151A2A5F
> >         Asset Tag: 0D0938
> >         Part Number: NT4GT72U4ND2BD-3C
> >         Rank: 2
> >
> > Handle 0x1103, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 4096 MB
> >         Form Factor: FB-DIMM
> >         Set: 2
> >         Locator: DIMM4
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 830B7FB3830B
> >         Serial Number: 6D1D2A5F
> >         Asset Tag: 0D0938
> >         Part Number: NT4GT72U4ND2BD-3C
> >         Rank: 2
> >
> > Handle 0x1104, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 4096 MB
> >         Form Factor: FB-DIMM
> >         Set: 3
> >         Locator: DIMM5
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 830B7FB3830B
> >         Serial Number: C43D2B5C
> >         Asset Tag: 0D0934
> >         Part Number: NT4GT72U4ND2BD-3C
> >         Rank: 2
> >
> > Handle 0x1105, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 4096 MB
> >         Form Factor: FB-DIMM
> >         Set: 3
> >         Locator: DIMM6
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 830B7FB3830B
> >         Serial Number: 9F472B5F
> >         Asset Tag: 0D0934
> >         Part Number: NT4GT72U4ND2BD-3C
> >         Rank: 2
> >
> > Handle 0x1106, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 4096 MB
> >         Form Factor: FB-DIMM
> >         Set: 4
> >         Locator: DIMM7
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 830B7FB3830B
> >         Serial Number: C51E2A5F
> >         Asset Tag: 0D0938
> >         Part Number: NT4GT72U4ND2BD-3C
> >         Rank: 2
> >
> > Handle 0x1107, DMI type 17, 28 bytes
> > Memory Device
> >         Array Handle: 0x1000
> >         Error Information Handle: Not Provided
> >         Total Width: 72 bits
> >         Data Width: 64 bits
> >         Size: 4096 MB
> >         Form Factor: FB-DIMM
> >         Set: 4
> >         Locator: DIMM8
> >         Bank Locator: Not Specified
> >         Type: DDR2 FB-DIMM
> >         Type Detail: Synchronous
> >         Speed: 667 MHz
> >         Manufacturer: 830B7FB3830B
> >         Serial Number: 1E1E2A5B
> >         Asset Tag: 0D0938
> >         Part Number: NT4GT72U4ND2BD-3C
> >         Rank: 2
> >
> >
> > On Mon, Nov 15, 2010 at 2:05 PM, Achilleas Mantzios <
> > achill@matrix.gatewaynet.com> wrote:
> >
> >> Whoever did the memory installation, should also have taken care of
> >> checking it's working ok.
> >> A good idea is to use ECC memory.
> >> Does fsck on the remote system run with reporting problems as well?? (it
> >> might report problems such as incorrect block counts, missing inodes etc...)
> >>
> >> Also check for disk, controller problems.
> >>
> >> Στις Monday 15 November 2010 14:42:32 ΞΏ/Ξ· Pablo Delgado DΓ­az-Pache έγραψΡ:
> >> > Hi Achilleas,
> >> >
> >> > Thanks for your quick response.
> >> > We do have backups and a couple of spare servers running in parallel so
> >> > we're safe in that sense. Thanks for your advice anyway.
> >> >
> >> > To run *memtest86 *we have to go to the datacenter and that will take us
> >> a
> >> > few days. Is there anything we can do remotely?
> >> >
> >> > Could it be anything different than memory HW problem?
> >> > Btw, we are not using that server at the moment :)
> >> >
> >> > Thanks
> >> >
> >> > On Mon, Nov 15, 2010 at 12:03 PM, Achilleas Mantzios <
> >> > achill@matrix.gatewaynet.com> wrote:
> >> >
> >> > > Stop using the system immediately, since many things inserted to the
> >> DB
> >> > > might simply be garbage.
> >> > > Inspect your memory with memtest86.
> >> > >
> >> > > I would even suggest moving to a new HW if available, and start
> >> working
> >> > > into two parallel directions:
> >> > >
> >> > > a) try to bring your DB into a sane state
> >> > > b) try to fix your HW server, which apparently has problems.
> >> > >
> >> > > Files are not "disappearing" like that, in normal situations.
> >> > > You cannot trust your current HW to run postgresql. Its a guarantee
> >> method
> >> > > to lose your data, if you
> >> > > go on by this start/stop/restore/pray circle.
> >> > >
> >> > > IMPORTANT: Handle your old (pre-memory upgrade) backups like expensive
> >> > > chinese vases.
> >> > > Start from those good and trustworthy backups to do any database
> >> "replay"
> >> > > (or similar) work which
> >> > > would lead to a viable situation.
> >> > >
> >> > > Στις Monday 15 November 2010 11:55:38 ΞΏ/Ξ· Pablo Delgado DΓ­az-Pache
> >> έγραψΡ:
> >> > > > Hi all,
> >> > > >
> >> > > > We've been using postgres for 9 years without a problem until now!
> >> Two
> >> > > > problems in a very short time!
> >> > > > The first one is described in
> >> > > >
> >> > >
> >>
http://postgresql.1045698.n5.nabble.com/Autovacuum-seems-to-block-database-WARNING-worker-took-too-long-to-start-td3264261.html
> >> > > > This is another one (not related I think) ...
> >> > > >
> >> > > > Postgres server is usually working fine. All of a sudden we start
> >> getting
> >> > > > these errors ...
> >> > > >
> >> > > > * **2010-11-09 11:49:15.320
> >> > > >
> >> CET|2|database1|10.19.0.51(18895)|20929|SELECT|4cd926fd.51c1|2010-11-09
> >> > > > 11:48:29 CET|10/417796|1390150|postgres| LOG:  duration: 1518.422 ms
> >> > > >  execute <unnamed>: SELECT id_token_fk,xxxxxxxxx ORDER BY avadate*
> >> > > > * **2010-11-09 11:52:25.364
> >> > > >
> >> CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> >> > > > 11:51:59 CET|7/430041|0|postgres| ERROR:  could not open relation
> >> > > > base/273198960/273198979: No such file or directory*
> >> > > > * **2010-11-09 11:52:25.364
> >> > > >
> >> CET|2|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> >> > > > 11:51:59 CET|7/430041|0|postgres| STATEMENT:  SELECT
> >> > > id_token_fkxxxxxxxxxxx
> >> > > >  ORDER BY avadate*
> >> > > > * **2010-11-09 11:52:29.981
> >> > > >
> >> CET|3|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> >> > > > 11:51:59 CET|7/430049|0|postgres| ERROR:  could not open relation
> >> > > > base/273198960/273199235: No such file or directory*
> >> > > > * **2010-11-09 11:52:30.988
> >> > > >
> >> CET|6|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> >> > > > 11:51:59 CET|7/430050|0|postgres| STATEMENT:  SELECT max(avadate)
> >> > > xxxxxxxx
> >> > > > 32036)*
> >> > > > * **2010-11-09 11:53:36.346
> >> > > >
> >> CET|16|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> >> > > > 11:53:02 CET|94/516004|0|postgres| STATEMENT:  SELECT * FROM
> >> "photos"
> >> > > > xxxxxxxxxxxxxx LIMIT 1*
> >> > > > * **2010-11-09 11:53:37.956
> >> > > >
> >> CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> >> > > > 11:53:02 CET|94/516025|0|postgres| ERROR:  could not open relation
> >> > > > base/271253899/271254075: No such file or directory*
> >> > > > * **................*
> >> > > > * **................*
> >> > > > * **2010-11-09 11:53:55.560
> >> CET|111|||26090||4cc6e970.65ea|2010-10-26
> >> > > > 16:45:04 CEST||0|| ERROR:  could not open relation
> >> > > base/273198960/273199235:
> >> > > > No such file or directory*
> >> > > > * **2010-11-09 11:53:55.560
> >> CET|112|||26090||4cc6e970.65ea|2010-10-26
> >> > > > 16:45:04 CEST||0|| CONTEXT:  writing block 8866 of relation
> >> > > > base/273198960/273199235*
> >> > > > * **2010-11-09 11:53:55.560
> >> CET|113|||26090||4cc6e970.65ea|2010-10-26
> >> > > > 16:45:04 CEST||0|| WARNING:  could not write block 8866 of
> >> > > > base/273198960/273199235*
> >> > > > * **2010-11-09 11:53:55.560
> >> CET|114|||26090||4cc6e970.65ea|2010-10-26
> >> > > > 16:45:04 CEST||0|| DETAIL:  Multiple failures --- write error might
> >> be
> >> > > > permanent.*
> >> > > > * **2010-11-09 11:53:56.590
> >> CET|115|||26090||4cc6e970.65ea|2010-10-26
> >> > > > 16:45:04 CEST||0|| ERROR:  could not open relation
> >> > > base/273198960/273199235:
> >> > > > No such file or directory*
> >> > > > * **2010-11-09 11:53:56.590
> >> CET|116|||26090||4cc6e970.65ea|2010-10-26
> >> > > > 16:45:04 CEST||0|| CONTEXT:  writing block 8866 of relation
> >> > > > base/273198960/273199235*
> >> > > >
> >> > > > See that there are 2 different databases involved. (database1 and
> >> > > > database2).
> >> > > >
> >> > > > Looking for distinct errors (among the many we have in the log) I
> >> find
> >> > > there
> >> > > > are only 4 involved ...
> >> > > >
> >> > > > * **base/271253899/271254075*
> >> > > > * **base/273198960/273198979*
> >> > > > * **base/273198960/273199235*
> >> > > > * **base/273198960/273199253*
> >> > > >
> >> > > > and those files are not in the postgres base directory.
> >> > > >
> >> > > > To fix it we have no option but to restart postgres (which restarts
> >> fine
> >> > > > with a /etc/init.d/postgresql stop & start)
> >> > > > However, once we restarted postgres some data was corrupted. Tables
> >> that
> >> > > > used to have 4,5 million rows had only 60 rows. As a consequence we
> >> had
> >> > > to
> >> > > > restore from file system backup.
> >> > > > Once we did that, it worked fine for a few days until it happened
> >> again.
> >> > > > We're worried it can happen again!
> >> > > >
> >> > > > Could this error be a hardware problem?
> >> > > > We recently increased the memory from 8GB to 28GB, although it was
> >> > > working
> >> > > > fine for more than 3 weeks.
> >> > > > We also recently upgraded from postgres 8.3.6 to 8.4.5, althought it
> >> also
> >> > > > worked fine for a few months.
> >> > > > Upgrading to postgres 9 is easy for us. however, not sure that would
> >> > > help.
> >> > > >
> >> > > > Some info of our server:
> >> > > >
> >> > > > OS: Centos 5.5
> >> > > > Kernel: 2.6.18-194.1.el5
> >> > > > Postgres version: 8.4.5 (installation out-of-the-box using yum)
> >> > > > Server memory: 28GB
> >> > > >
> >> > > >
> >> > > > Any help would be appreciated
> >> > > >
> >> > > > Pablo
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Achilleas Mantzios
> >> > >
> >> > > --
> >> > > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> >> > > To make changes to your subscription:
> >> > > http://www.postgresql.org/mailpref/pgsql-admin
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Achilleas Mantzios
> >>
> >> --
> >> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-admin
> >>
> >
> >
>



--
Achilleas Mantzios

В списке pgsql-admin по дате отправления:

Предыдущее
От: Scott Marlowe
Дата:
Сообщение: Re: ignoring data type
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Autovacuum seems to block database: WARNING worker took too long to start