Re: Hard limit on WAL space used (because PANIC sucks)

From: Andres Freund
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Msg-id: 20130608183326.GB28471@awork2.anarazel.de
In reply to: Re: Hard limit on WAL space used (because PANIC sucks)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers

On 2013-06-07 12:02:57 +0300, Heikki Linnakangas wrote:
> On 07.06.2013 00:38, Andres Freund wrote:
> >On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote:
> >>* Heikki Linnakangas wrote:
> >>
> >>>The current situation is that if you run out of disk space while writing
> >>>WAL, you get a PANIC, and the server shuts down. That's awful. We can
> >>
> >>>So we need to somehow stop new WAL insertions from happening, before
> >>>it's too late.
> >>
> >>>A naive idea is to check if there's enough preallocated WAL space, just
> >>>before inserting the WAL record. However, it's too late to check that in
> >>
> >>There is a database engine, Microsoft's "Jet Blue" aka the Extensible
> >>Storage Engine, that just keeps some preallocated log files around,
> >>specifically so it can get consistent and halt cleanly if it runs out of
> >>disk space.
> >>
> >>In other words, the idea is not to check over and over again that there is
> >>enough already-reserved WAL space, but to make sure there always is by
> >>having a preallocated segment that is never used outside a disk space
> >>emergency.
> >
> >That's not a bad technique. I wonder how reliable it would be in
> >postgres.
> 
> That's no different from just having a bit more WAL space in the first
> place. We need a mechanism to stop backends from writing WAL, before you run
> out of it completely. It doesn't matter if the reservation is done by
> stashing away a WAL segment for emergency use, or by a variable in shared
> memory. Either way, backends need to stop using it up, by blocking or
> throwing an error before they enter the critical section.

Well, if you have 16 or 32MB of reserved WAL space available, you don't
need to judge all that precisely how much space is left.

So we can just sprinkle some EnsureXLogHasSpace() calls on XLogInsert()
callsites like heap_insert(), but we can do that outside of the critical
sections, and we can do it without locks, since it takes quite a lot of
write activity to overrun the reserved space. Anything that desperately
needs to write stuff, like the end-of-recovery checkpoint, can just not
call EnsureXLogHasSpace() and rely on the reserved space.
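
To make the shape of that concrete, here is a rough toy model in plain C11.
It is only a sketch and not PostgreSQL code: every name in it
(wal_free_bytes, ensure_xlog_has_space(), wal_write(), ordinary_insert(),
emergency_insert()) is hypothetical, and the 16MB reserve is just the
segment size mentioned above. The check is a single relaxed atomic read
taken outside the (modelled) critical section; it may be slightly stale,
and the reserve is what absorbs that imprecision.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
#define WAL_SEGMENT_SIZE   (16u * 1024 * 1024)  /* one 16MB segment */
#define WAL_RESERVED_BYTES WAL_SEGMENT_SIZE     /* emergency reserve (assumed size) */

/* WAL space still available, including the emergency reserve. */
static _Atomic size_t wal_free_bytes;

static void wal_init(size_t total_bytes)
{
    atomic_store(&wal_free_bytes, total_bytes);
}

/*
 * Cheap, lock-free check meant to run before entering a critical section.
 * Returns false once only the reserve is left. The value read may be
 * slightly out of date; the reserve is there to cover that slack.
 */
static bool ensure_xlog_has_space(void)
{
    size_t free_now = atomic_load_explicit(&wal_free_bytes,
                                           memory_order_relaxed);
    return free_now > WAL_RESERVED_BYTES;
}

/*
 * Stand-in for the write done inside the critical section. Returns false
 * where a real server would have to PANIC (no space left at all).
 */
static bool wal_write(size_t len)
{
    size_t cur = atomic_load(&wal_free_bytes);
    do {
        if (cur < len)
            return false;
    } while (!atomic_compare_exchange_weak(&wal_free_bytes, &cur, cur - len));
    return true;
}

/* Ordinary caller (think heap_insert()): check first, then write. */
static bool ordinary_insert(size_t len)
{
    if (!ensure_xlog_has_space())
        return false;   /* would be a plain ERROR, raised outside the critical section */
    return wal_write(len);
}

/* Must-succeed caller (think end-of-recovery checkpoint): skips the check. */
static bool emergency_insert(size_t len)
{
    return wal_write(len);
}
```

In this model an ordinary insert starts failing cleanly as soon as free
space drops to the reserve, while emergency_insert() can keep going until
the reserve itself is used up.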

Seems like 90% of the solution for 30% of the complexity or so.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


