Re: Hard limit on WAL space used (because PANIC sucks)

From: Simon Riggs
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Msg-id: CA+U5nM+ipFyK_cNkP=NdF72mTpMzyv=hsFaN-Si1Wx=85PmP+A@mail.gmail.com
In reply to: Re: Hard limit on WAL space used (because PANIC sucks) (Tom Lane <tgl@sss.pgh.pa.us>)
Replies: Re: Hard limit on WAL space used (because PANIC sucks) (Greg Stark <stark@mit.edu>)
List: pgsql-hackers
On 21 January 2014 18:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 6 June 2013 16:00, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>>> The current situation is that if you run out of disk space while writing
>>> WAL, you get a PANIC, and the server shuts down. That's awful.
>
>> I don't see we need to prevent WAL insertions when the disk fills. We
>> still have the whole of wal_buffers to use up. When that is full, we
>> will prevent further WAL insertions because we will be holding the
>> WALWriteLock to clear more space. So the rest of the system will lock
>> up nicely, like we want, apart from read-only transactions.
>
> I'm not sure that "all writing transactions lock up hard" is really so
> much better than the current behavior.

Lock up momentarily, until the situation clears. But my proposal would
allow the situation to fully clear, i.e. all WAL files could be
deleted as soon as replication/archiving has caught up. The current
behaviour doesn't automatically correct itself as this proposal would.
My proposal is also fully safe with synchronous replication, and it
adds zero performance overhead to mainline processing.

> My preference would be that we simply start failing writes with ERRORs
> rather than PANICs.

Yes, that is what I am proposing, amongst other points.

> I'm not real sure ATM why this has to be a PANIC
> condition.  Probably the cause is that it's being done inside a critical
> section, but could we move that?

Yes, I think so.

>> Instead of PANICing, we should simply signal the checkpointer to
>> perform a shutdown checkpoint.
>
> And if that fails for lack of disk space?

I proposed a way to ensure it wouldn't fail for that, at least on pg_xlog space.

> In any case, what you're
> proposing sounds like a lot of new complication in a code path that
> is necessarily never going to be terribly well tested.

It's the smallest change proposed so far. I agree about the
danger of untested code.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


