Re: Hard limit on WAL space used (because PANIC sucks)

From: Bernd Helmle
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Msg-id: F3D04ABBF3FB984BCC98C06D@apophis.local
In reply to: Re: Hard limit on WAL space used (because PANIC sucks)  (Josh Berkus <josh@agliodbs.com>)
List: pgsql-hackers

On 6 June 2013 16:25:29 -0700, Josh Berkus <josh@agliodbs.com> wrote:

> Archiving
> ---------
>
> In some ways, this is the simplest case.  Really, we just need a way to
> know when the available WAL space has become 90% full, and abort
> archiving at that stage.  Once we stop attempting to archive, we can
> clean up the unneeded log segments.
>
> What we need is a better way for the DBA to find out that archiving is
> falling behind when it first starts to fall behind.  Tailing the log and
> examining the rather cryptic error messages we give out isn't very
> effective.

Slightly OT, but I always wondered whether we could create a function, say

pg_last_xlog_removed()

for example, returning a value suitable to be used to calculate the 
distance to the current position. An increasing value could be used to 
instruct monitoring to throw a warning if a certain threshold is exceeded.
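
To make the distance calculation concrete, here is a rough sketch of what a monitoring check might do. It assumes the proposed pg_last_xlog_removed() (which does not exist) would return a location in the same "hi/lo" hex text format as pg_current_xlog_location(); the function names and the threshold parameter below are made up for illustration.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual xlog location such as '16/B374D848' to a byte position."""
    hi, lo = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) + int(lo, 16)


def retained_wal_bytes(current_lsn: str, last_removed_lsn: str) -> int:
    """Distance in bytes between the current position and the last removed xlog."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(last_removed_lsn)


def check_retained_wal(current_lsn: str, last_removed_lsn: str, warn_bytes: int):
    """Return ('WARNING' or 'OK', distance); warn_bytes is a site-specific threshold."""
    retained = retained_wal_bytes(current_lsn, last_removed_lsn)
    status = "WARNING" if retained > warn_bytes else "OK"
    return status, retained
```

A monitoring job would feed it the two values fetched from the server and alert when the status is WARNING.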

I've also seen people create monitoring scripts that look into 
archive_status, do a simple count of the .ready files, and give a warning 
if that exceeds an expected maximum value.
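
Such a script could look roughly like the following sketch. The directory is passed in by the caller (typically pg_xlog/archive_status under the data directory), and the function names and the max_ready threshold are my own placeholders, not anything the server provides.

```python
import os


def count_ready_segments(archive_status_dir: str) -> int:
    """Count the .ready files, i.e. segments still waiting to be archived."""
    return sum(
        1 for name in os.listdir(archive_status_dir) if name.endswith(".ready")
    )


def check_archive_backlog(archive_status_dir: str, max_ready: int):
    """Return ('WARNING' or 'OK', count); max_ready is the expected maximum."""
    ready = count_ready_segments(archive_status_dir)
    status = "WARNING" if ready > max_ready else "OK"
    return status, ready
```

The script needs read access to the data directory, so it usually has to run as the postgres OS user.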

I haven't looked at the code very deeply, but I think we already store the 
position of the last removed xlog in shared memory, so maybe this can 
be used somehow. AFAIK, we do cleanup only during checkpoints, so this all 
has too much delay...

-- 
Thanks
Bernd


