Re: warning message in standby

From Heikki Linnakangas
Subject Re: warning message in standby
Date
Msg-id 4C16174E.6020004@enterprisedb.com
In response to Re: warning message in standby (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 14/06/10 13:16, Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> On 12/06/10 04:19, Bruce Momjian wrote:
>>> Robert Haas wrote:
>>>>> If my streaming replication stops working, I want to know about it as
>>>>> soon as possible. WARNING just doesn't cut it.
>>>>>
>>>>> This needs some better thought.
>>>>>
>>>>> If we PANIC, then surely it will PANIC again when we restart unless we
>>>>> do something. So we can't do that. But we need to do something better
>>>>> than
>>>>>
>>>>> WARNING there is a bug that will likely cause major data loss
>>>>> HINT you'll be sacked if you miss this message
>>>>
>>>> +1.  I was making this same argument (less eloquently) upthread.
>>>> I particularly like the errhint().
>>>
>>> I am wondering what action would be most likely to get the
>>> administrator's attention.
>>
>> I've committed the patch to disconnect the SR connection in that case.
>> If the message needs improvement, let's do that separately once we
>> figure out what to do.
>>
>> Seems like we need something like WARNING that doesn't cause the process
>> to die, but more alarming like ERROR/FATAL/PANIC. Or maybe just adding a
>> hint to the warning will do. How about
>>
>> WARNING:  invalid record length at 0/4005330
>> HINT: An invalid record was streamed from master. That can be a sign of
>> corruption in the master, or inconsistency between master and standby
>> state. The record will be re-fetched, but that is unlikely to fix the
>> problem. You may have to restore standby from base backup.
>
> I am thinking about log monitoring tools like Nagios.  I am afraid
> they are never going to pick up something tagged WARNING, no matter
> what the wording is.

One idea is for the startup process to signal the walreceiver process to
commit suicide with FATAL, instead of just dying silently like it does
now. So you'd get a WARNING explaining how the record was corrupt,
followed by a FATAL from the walreceiver process:

WARNING:  invalid record length at 0/4005330
FATAL: walreceiver killed because of error in WAL stream
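
To illustrate why the extra FATAL line matters for monitoring, here is a minimal sketch of a severity-based log check. It assumes a checker that alerts only on ERROR, FATAL and PANIC, and that each log line starts with the severity tag as in the samples above (a real server log usually carries a log_line_prefix first). The names find_alerts, ALERT_LEVELS and SEVERITY_RE are hypothetical and not taken from Nagios or any existing plugin.

```python
import re

# Severities this hypothetical checker alerts on; WARNING is deliberately absent.
ALERT_LEVELS = {"ERROR", "FATAL", "PANIC"}

# Matches a severity tag at the start of a line, e.g. "WARNING:  invalid ..."
SEVERITY_RE = re.compile(
    r"^(DEBUG|INFO|NOTICE|LOG|WARNING|ERROR|FATAL|PANIC):\s*(.*)$"
)


def find_alerts(log_text):
    """Return (severity, message) pairs for lines the checker would alert on."""
    alerts = []
    for line in log_text.splitlines():
        match = SEVERITY_RE.match(line)
        if match and match.group(1) in ALERT_LEVELS:
            alerts.append((match.group(1), match.group(2)))
    return alerts


# Log output with only the WARNING, as discussed upthread.
warning_only = "WARNING:  invalid record length at 0/4005330\n"

# Log output with the proposed FATAL from walreceiver appended.
with_fatal = (
    warning_only
    + "FATAL: walreceiver killed because of error in WAL stream\n"
)

print(find_alerts(warning_only))
print(find_alerts(with_fatal))
```

Under these assumptions the WARNING alone produces no alert, while the WARNING followed by the walreceiver FATAL produces one.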

>  Crazy idea, but can we force a fatal error line
> into the logs with something like "WARNING ...\nFATAL: ...".

Yeah, that's crazy.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

