Re: Hard limit on WAL space used (because PANIC sucks)

From Jeff Janes
Subject Re: Hard limit on WAL space used (because PANIC sucks)
Date
Msg-id CAMkU=1wR3R_P2s6=J6qmL+V6ox62UBSAWscyycii-soU6YfHMQ@mail.gmail.com
In response to Re: Hard limit on WAL space used (because PANIC sucks)  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Hard limit on WAL space used (because PANIC sucks)  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: Hard limit on WAL space used (because PANIC sucks)  (Daniel Farina <daniel@heroku.com>)
List pgsql-hackers
On Thursday, June 6, 2013, Josh Berkus wrote:
> Let's talk failure cases.
>
> There's actually three potential failure cases here:
>
> - One Volume: WAL is on the same volume as PGDATA, and that volume is
> completely out of space.
>
> - XLog Partition: WAL is on its own partition/volume, and fills it up.
>
> - Archiving: archiving is failing or too slow, causing the disk to fill
> up with waiting log segments.
>
> I'll argue that these three cases need to be dealt with in three
> different ways, and no single solution is going to work for all three.
>
> Archiving
> ---------
>
> In some ways, this is the simplest case.  Really, we just need a way to
> know when the available WAL space has become 90% full, and abort
> archiving at that stage.  Once we stop attempting to archive, we can
> clean up the unneeded log segments.

I would oppose that as the solution, either unconditionally or as the default of a configurable option.  Those segments are not unneeded.  I need them.  That is why I set up archiving in the first place.  If you need to shut down the database rather than violate my established retention policy, then shut down the database.

 
> What we need is a better way for the DBA to find out that archiving is
> falling behind when it first starts to fall behind.  Tailing the log and
> examining the rather cryptic error messages we give out isn't very
> effective.

The archive command can be made a shell script (or for that matter a compiled program) which can do anything it wants upon failure, including emailing people.  Of course maybe whatever causes the archive to fail will also cause the delivery of the message to fail, but I don't see a real solution to this that doesn't start down an infinite regress.  If it is not failing outright, but merely falling behind, then I don't really know how to go about detecting that, either in archive_command or through tailing the PostgreSQL log.  I guess archive_command, each time it is invoked, could count the files in the pg_xlog directory and warn if it thinks the number is unreasonable.
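Something along these lines is what I have in mind -- just a sketch in Python, where the archive directory, warning threshold, and notification addresses are all made up rather than anything PostgreSQL provides:

#!/usr/bin/env python
# Hypothetical archive_command wrapper, e.g.
#   archive_command = '/usr/local/bin/archive_wal.py "%p" "%f"'
# Sketch only: paths, threshold, and addresses are assumptions.
import os, shutil, smtplib, sys
from email.mime.text import MIMEText

ARCHIVE_DIR = "/mnt/archive/wal"                # assumed archive destination
PG_XLOG_DIR = "/var/lib/pgsql/data/pg_xlog"     # assumed data directory layout
WARN_SEGMENTS = 500                             # "unreasonable" backlog, site-specific

def notify(subject, body):
    # Best effort: if the mail server is down too, there is little more we can do.
    try:
        msg = MIMEText(body)
        msg["Subject"] = subject
        msg["From"] = "postgres@example.com"
        msg["To"] = "dba@example.com"
        s = smtplib.SMTP("localhost")
        s.sendmail(msg["From"], [msg["To"]], msg.as_string())
        s.quit()
    except Exception:
        pass

def main(wal_path, wal_name):
    # Warn (but do not fail) if pg_xlog is accumulating segments.
    backlog = len([f for f in os.listdir(PG_XLOG_DIR)
                   if len(f) == 24 and all(c in "0123456789ABCDEF" for c in f)])
    if backlog > WARN_SEGMENTS:
        notify("WAL archiving falling behind", "%d segments in pg_xlog" % backlog)

    dest = os.path.join(ARCHIVE_DIR, wal_name)
    if os.path.exists(dest):
        return 1                    # never silently overwrite an archived segment
    try:
        shutil.copy2(wal_path, dest)
        # (a production version would also fsync the copy before reporting success)
        return 0                    # zero tells the server the segment is safe to recycle
    except Exception as e:
        notify("WAL archiving failed", "%s: %s" % (wal_name, e))
        return 1                    # non-zero makes the server retry later

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))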

 

> XLog Partition
> --------------
>
> As Heikki pointed out, a full dedicated WAL drive is hard to fix once
> it gets full, since there's nothing you can safely delete to clear
> space, even enough for a checkpoint record.

Although the DBA probably wouldn't know it from reading the manual, it is almost always safe to delete the oldest WAL file (after copying it to a different partition just in case something goes wrong--that should be possible, since if WAL is on its own partition it is hard to imagine you can't scrounge up 16MB on a different one), because PostgreSQL keeps two complete checkpoints' worth of WAL around.  I think the only reason you would not be able to recover after removing the oldest file is if the control file (pg_control) is damaged such that the most recent checkpoint record cannot be found, and recovery has to fall back to the previous one.  Or at least, this is my understanding.
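To make that concrete, the manual procedure might look something like the sketch below (Python, with made-up paths; run it only against a stopped cluster, and only with the caveat about the previous checkpoint in mind):

#!/usr/bin/env python
# Illustration only: free one segment's worth of space on a full WAL partition
# by copying the *oldest* segment elsewhere before deleting it.  Paths are
# hypothetical; this is emergency surgery, not a supported tool.
import os, re, shutil

PG_XLOG_DIR = "/pg_wal_partition/pg_xlog"   # assumed dedicated WAL partition
RESCUE_DIR = "/var/tmp/wal_rescue"          # somewhere with 16MB to spare

# Plain WAL segment names are 24 hex digits; skip .history, .backup,
# and archive_status entries.
segment_re = re.compile(r"^[0-9A-F]{24}$")
segments = sorted(f for f in os.listdir(PG_XLOG_DIR) if segment_re.match(f))
if not segments:
    raise SystemExit("no plain WAL segments found")

oldest = segments[0]          # for a single timeline, lexical order is chronological
src = os.path.join(PG_XLOG_DIR, oldest)

if not os.path.isdir(RESCUE_DIR):
    os.makedirs(RESCUE_DIR)
shutil.copy2(src, os.path.join(RESCUE_DIR, oldest))   # keep a copy, just in case
os.remove(src)                                        # now there is room to breathe
print("moved %s out of pg_xlog" % oldest)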
 

> On the other hand, it should be easy to prevent full status; we could
> simply force a non-spread checkpoint whenever the available WAL space
> gets 90% full.  We'd also probably want to be prepared to switch to a
> read-only mode if we get full enough that there's only room for the
> checkpoint records.

I think that last sentence could be applied without modification to the "one volume" case as well.

So what would that look like?  Before accepting a (non-checkpoint) WAL insert that would fill the current segment to the point that a checkpoint record no longer fits, the backend must first verify that a recycled file exists, or if not, it must successfully init a new file.

If that init fails, then it must do what?  Signal for a checkpoint, release its locks, and then ERROR out?  That would be better than a PANIC, but can it do better?  Enter a retry loop, so that once the checkpoint has finished, and assuming it has freed up enough WAL files for recycling or removal, it can try the original WAL insert again?
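As a toy model of that flow (Python standing in for pseudocode -- none of the names or numbers below correspond to actual backend code), I'm picturing something like:

# Toy model of the proposed behaviour.  The idea: verify there will still be
# room for a checkpoint record *before* accepting a WAL insert, and if a new
# segment cannot be created, request a checkpoint and retry (or ERROR)
# instead of PANICking.
import random

SEGMENT_SIZE = 16 * 1024 * 1024     # 16MB segments
CHECKPOINT_RECORD_SIZE = 112        # stand-in size reserved for a checkpoint record
MAX_RETRIES = 3

free_in_segment = 4096              # toy state: bytes left in the current segment
recycled_segments = 0               # toy state: segments ready for reuse

class OutOfWALSpace(Exception):
    pass

def init_new_segment():
    # Pretend the filesystem may or may not have room for another 16MB file.
    return random.random() < 0.5

def request_checkpoint_and_wait():
    # Pretend a checkpoint frees some old segments for recycling.
    global recycled_segments
    recycled_segments += 2

def wal_insert(record_size):
    global free_in_segment, recycled_segments
    for attempt in range(MAX_RETRIES + 1):
        # Room for this record *and* a later checkpoint record?  Just write it.
        if record_size + CHECKPOINT_RECORD_SIZE <= free_in_segment:
            free_in_segment -= record_size
            return

        # Need a fresh segment: prefer a recycled one, else try to create one.
        if recycled_segments > 0:
            recycled_segments -= 1
            free_in_segment = SEGMENT_SIZE
            continue
        if init_new_segment():
            free_in_segment = SEGMENT_SIZE
            continue

        # No segment available.  Ask for a checkpoint and retry rather than PANIC.
        # (A real backend would also have to release its locks before waiting here.)
        if attempt == MAX_RETRIES:
            raise OutOfWALSpace("ERROR, not PANIC: give up on this insert only")
        request_checkpoint_and_wait()

wal_insert(8192)   # toy usage: insert one 8kB record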


Cheers,

Jeff
