Re: Postgres, fsync, and OSs (specifically linux)

From Bruce Momjian
Subject Re: Postgres, fsync, and OSs (specifically linux)
Date
Msg-id 20180427233830.GB32605@momjian.us
In reply to Re: Postgres, fsync, and OSs (specifically linux)  (Andres Freund <andres@anarazel.de>)
Responses Re: Postgres, fsync, and OSs (specifically linux)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Apr 27, 2018 at 04:10:43PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-04-27 19:04:47 -0400, Bruce Momjian wrote:
> > On Fri, Apr 27, 2018 at 03:28:42PM -0700, Andres Freund wrote:
> > > - We need more aggressive error checking on close(), for ENOSPC and
> > >   EIO. In both cases afaics we'll have to trigger a crash recovery
> > >   cycle. It's entirely possible to end up in a loop on NFS etc, but I
> > >   don't think there's a way around that.
> > 
> > If the no-space or write failures are persistent, as you mentioned
> > above, what is the point of going into crash recovery --- why not just
> > shut down?
> 
> Well, I mentioned that as an alternative in my email. But for one we
> don't really have cases where we do that right now, for another we can't
> really differentiate between a transient and non-transient state. It's
> entirely possible that the admin on the system that ran out of space
> fixes things, clearing up the problem.

True, but if we get a no-space error, odds are it will not be fixed at
the time we are failing.  Wouldn't the administrator check that the
server is still running after they free the space?

> > Also, since we can't guarantee that we can write any persistent state
> > to storage, we have no way of preventing infinite crash recovery
> > loops, which, based on inconsistent writes, might make things worse.
> 
> How would it make things worse?

Uh, I can imagine some writes working and some not, and getting things
more inconsistent.  I would say at least that we don't know.

> > An additional feature we have talked about is running some kind of
> > notification shell script to inform administrators, similar to
> > archive_command.  We need this too when sync replication fails.
> 
> To me that seems like a feature independent of this thread.

Well, if we are introducing new panic-and-not-restart behavior, we might
need this new feature.
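
As a rough illustration of what such a notification hook could look like, here is a sketch modeled loosely on the way archive_command expands %-placeholders into a shell command. No such setting exists today; the placeholders %e (event) and %m (message) and both function names are invented for this example. Unlike archive_command, the sketch shell-quotes the substituted values, which is a choice made here and not existing behavior.

```python
import shlex
import subprocess


def expand_template(template, values):
    """Expand %x placeholders from values; %% yields a literal percent sign.

    Unknown placeholders are left untouched.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "%":
                out.append("%")
                i += 2
                continue
            if nxt in values:
                # Quote so that spaces or metacharacters stay one argument.
                out.append(shlex.quote(values[nxt]))
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


def notify(template, event, message):
    """Run the expanded command through the shell; True on exit status 0."""
    command = expand_template(template, {"e": event, "m": message})
    return subprocess.run(command, shell=True).returncode == 0
```

For example, notify("logger -t postgres %e: %m", "panic", "no space left on device") would hand the event to syslog on a system that has logger installed. Since the hook would fire exactly when storage is misbehaving, the command should not depend on writing to the failing filesystem.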

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

