Re: Postgres crash? could not write to log file: No space left on device

From: Yuri Levinsky
Subject: Re: Postgres crash? could not write to log file: No space left on device
Msg-id: B72526FA2066E344AFD09734A487318103E93431@falcon1.celltick.com
In reply to: Re: Postgres crash? could not write to log file: No space left on device (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-bugs
Dear All,
I managed to find a PostgreSQL note in which a similar issue is
attributed to the NFS mount option 'intr'. It recommended using
'hard,nointr' on Solaris NFS, or not using NFS at all. The default on
Sun is 'intr', while on Linux it is 'nointr' even when not specified.
The issue does not happen on any of my Linux installations. According
to that (very long) note, the issue occurs under heavy load, when
PostgreSQL gets no answer for some time. By the way, adding the 'noac'
mount option prevented the "buggy kernel" error, which I had
successfully reproduced and which was also followed by a PostgreSQL
crash on this particular system. All this leads me to conclude that
something is fundamentally wrong: either these mount options should be
documented somewhere as the recommended settings for NFS, or the log
and background writers should somehow take this into account.
Can this be confirmed as a bug? Could it be fixed in the near future?



2013-07-02 01:37:58 GMTXX000PANIC:  could not write to file "pg_xlog/xlogtemp.24205": Interrupted system call
2013-07-02 01:40:52 GMT00000LOG:  WAL writer process (PID 24205) was terminated by signal 6
2013-07-02 01:40:52 GMT00000LOG:  terminating any other active server processes
2013-07-02 01:40:52 GMT57P03FATAL:  the database system is in recovery mode

2013-07-02 02:17:41 GMT57P03FATAL:  the database system is in recovery mode
2013-07-02 02:17:41 GMT00000LOG:  autovacuum launcher started
2013-07-02 02:17:41 GMT00000LOG:  database system is ready to accept connections
2013-07-02 02:44:50 GMTXX000PANIC:  could not write to file "pg_xlog/xlogtemp.14855": Interrupted system call
2013-07-02 02:48:02 GMT00000LOG:  WAL writer process (PID 14855) was terminated by signal 6
2013-07-02 02:48:02 GMT00000LOG:  terminating any other active server processes
2013-07-02 02:48:02 GMT57P03FATAL:  the database system is in recovery mode

2013-07-02 04:15:49 GMTXX000PANIC:  could not open file "pg_xlog/00000001000000B9000000C9" (log file 185, segment 201): Interrupted system call
2013-07-02 04:18:55 GMT00000LOG:  WAL writer process (PID 2296) was terminated by signal 6
2013-07-02 04:18:55 GMT00000LOG:  terminating any other active server processes
2013-07-02 04:18:55 GMT57P03FATAL:  the database system is in recovery mode
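The trailing text of each PANIC line above is the C library's message for errno, so "Interrupted system call" means the write() or open() failed with EINTR, which is a different condition from the ENOSPC ("No space left on device") in the thread's subject. A minimal C sketch of that mapping, assuming a glibc-style strerror(); both helper names are hypothetical and are not PostgreSQL functions:

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: EINTR means the call was
 * interrupted by a signal before any data was transferred and can be
 * retried; anything else (ENOSPC, EIO, ...) is treated as a real failure.
 */
bool write_error_is_retryable(int err)
{
    return err == EINTR;
}

/* The trailing message in the PANIC lines comes from strerror(errno). */
const char *write_error_text(int err)
{
    return strerror(err);
}
```

On glibc, write_error_text(EINTR) yields "Interrupted system call" and write_error_text(ENOSPC) yields "No space left on device", matching the two messages seen in this thread.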





Sincerely yours,


Yuri Levinsky, DBA
Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel
Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

-----Original Message-----
From: Andres Freund [mailto:andres@2ndquadrant.com]
Sent: Wednesday, June 26, 2013 4:04 PM
To: Heikki Linnakangas
Cc: Greg Stark; Tom Lane; Jeff Davis; Yuri Levinsky; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Postgres crash? could not write to log file: No space left on device

On 2013-06-26 15:40:08 +0300, Heikki Linnakangas wrote:
> On 26.06.2013 15:21, Andres Freund wrote:
> >On 2013-06-26 13:14:37 +0100, Greg Stark wrote:
> >>On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> >>>  (Though if it is, it's not apparent why such failures would only
> >>>be manifesting on the pg_xlog files and not for anything else.)
> >>
> >>Well data files are only ever written to in 8k chunks. Maybe these
> >>errors are only occurring on >8k xlog records such as records with
> >>multiple full page images. I'm not sure how much we write for other
> >>types of files but they won't be written to as frequently as xlog or
> >>data files and might not cause errors that are as noticeable.
> >
> >We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
> >well...
>
> Actually, XLogWrite() writes multiple pages at once. If all
> wal_buffers are dirty, it can try to write them all in one write() call.

Oh. Misremembered that.

> We've discussed retrying short writes before, and IIRC Tom has argued
> that it shouldn't be necessary when writing to disk. Nevertheless, I
> think we should retry in XLogWrite(). It can write much bigger chunks
> than most write() calls, so there's more room for a short write to
> happen there if it can happen at all. Secondly, it PANICs on failure,
> so it would be nice to try a bit harder to avoid that.
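A retry loop of the kind Heikki describes might look like the following. This is a hedged sketch, not the actual XLogWrite() code: write_all() is a hypothetical helper that keeps calling write() after a short count or an EINTR failure and gives up only on other errors. Treating a zero-byte write as ENOSPC is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write all `len` bytes,
 * retrying after short writes and after EINTR.
 * Returns len on success, or -1 with errno set for any other error.
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = write(fd, p + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return -1;          /* real error (e.g. ENOSPC): caller reports it */
        }
        if (n == 0)
        {
            errno = ENOSPC;     /* no progress at all; assumed to mean "full" */
            return -1;
        }
        done += (size_t) n;     /* short write: advance and write the rest */
    }
    return (ssize_t) done;
}
```

Retrying on EINTR is safe here because POSIX specifies that write() fails with EINTR only when the signal arrived before any data was written; an interruption after partial progress is reported as a short count, which the loop also handles.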

At the very least we should log the number of bytes actually written if
it was a short write, to make it possible to distinguish that case from
a direct ENOSPC response.
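The logging suggested above can be sketched as a single write() whose outcome is reported together with the byte count. write_and_describe() is a hypothetical helper; the point it illustrates is that a short write returns a non-negative count rather than -1, so errno carries no information about it and only the count distinguishes it from a genuine ENOSPC.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: perform one write() and describe the outcome.
 * Returns 0 if all bytes were written, 1 for a short write, and -1 if
 * write() failed (the only case in which errno is meaningful).
 */
int write_and_describe(int fd, const void *buf, size_t len,
                       char *msg, size_t msglen)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0)
    {
        snprintf(msg, msglen, "write failed: %s", strerror(errno));
        return -1;
    }
    if ((size_t) n < len)
    {
        /* short write: report the count instead of a possibly stale errno */
        snprintf(msg, msglen, "short write: wrote %zd of %zu bytes", n, len);
        return 1;
    }
    snprintf(msg, msglen, "wrote %zu bytes", len);
    return 0;
}
```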

This might also be caused by the fact that until recently the SIGALRM
handler didn't set SA_RESTART... If a backend decided to write out the
xlog directly it very well might have an active alarm...
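The SA_RESTART point can be demonstrated in isolation. The sketch below uses a pipe rather than an NFS-backed file (an assumption made so the effect is reproducible anywhere): it blocks in read(), fires a one-shot SIGALRM via setitimer(), and reports whether the interrupted call failed with EINTR or was transparently restarted. read_survives_alarm() and its handler are hypothetical names, not PostgreSQL code.

```c
#define _XOPEN_SOURCE 700  /* exposes SA_RESTART and setitimer() in strict modes */

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Write end of the demo pipe; the SIGALRM handler feeds one byte into it. */
static volatile sig_atomic_t demo_wfd = -1;

static void on_alarm(int signo)
{
    char b = 'x';
    ssize_t r;

    (void) signo;
    r = write(demo_wfd, &b, 1);    /* write() is async-signal-safe */
    (void) r;
}

/*
 * Hypothetical demo: block in read() on an empty pipe and let a one-shot
 * SIGALRM (200 ms) interrupt it.
 * Returns 1 if read() was restarted and received the handler's byte,
 * 0 if read() failed with EINTR, and -1 on any other outcome or setup error.
 */
int read_survives_alarm(int use_sa_restart)
{
    int fds[2];
    struct sigaction sa;
    struct itimerval t;
    char c;
    ssize_t n;
    int saved_errno;

    if (pipe(fds) != 0)
        return -1;
    demo_wfd = fds[1];

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;

    memset(&t, 0, sizeof t);
    t.it_value.tv_usec = 200000;   /* fire once, 200 ms from now */

    if (sigaction(SIGALRM, &sa, NULL) != 0 ||
        setitimer(ITIMER_REAL, &t, NULL) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    n = read(fds[0], &c, 1);       /* blocks until the alarm fires */
    saved_errno = errno;

    close(fds[0]);
    close(fds[1]);

    if (n == 1)
        return 1;                  /* restarted: got the handler's byte */
    return (n < 0 && saved_errno == EINTR) ? 0 : -1;
}
```

On Linux, read_survives_alarm(0) is expected to return 0 (the blocked read() fails with EINTR), while read_survives_alarm(1) is expected to return 1, because the kernel restarts read() after the handler has put a byte into the pipe. The demo is timing-based: under extreme load the alarm could fire before read() starts blocking.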

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

