Re: Logging corruption error codes

From: Andrey Borodin
Subject: Re: Logging corruption error codes
Msg-id: 28DE958A-DB3B-4266-B960-596B0092FF8E@yandex-team.ru
In reply to: Re: Logging corruption error codes (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: Logging corruption error codes (Michael Paquier <michael@paquier.xyz>)
List: pgsql-bugs

> On 20 June 2019, at 22:09, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Jun-20, Andrey Borodin wrote:
>
>> Hi!
>>
>> We are fine-tuning our data corruption monitoring and found out that many
>> corruption cases do not report a proper error code.
>> This forces an automatic log analyzer to be far too smart a program.
>> We think that corruption error codes should be given in cases when B-tree or
>> TOAST does not know how to interpret the data.
>> PFA a patch with cases that we have found in logs and consider evidence of
>> corruption.
>
> This is not totally insane -- other similar messages such as 'corrupted
> page pointers' in bufpage.c get the same errcode.
On master there is only
elog(ERROR, "incorrect index offsets supplied");
in bufpage.c. But this indicates misuse, not corrupted data on disk.
The others already use ERRCODE_DATA_CORRUPTED.
>
> I would like to have a separate marking for messages indicating a
> system-level permanent problem rather than user error ("table/column X
> does not exist"), retryable condition ("serializability violation"), or
> resource exhaustion ("out of memory", "too many clients"),
Good idea, but there must be standards we need to comply with?

> but that's
> probably a separate patch: things like "could not open/read/write file"
> for a data file, or "xlog flush request XYZ not satisfied", and so on,
> which also indicate a kind of corruption.
I believe we should not report hardware problems as corruption. But this worries us (YC) too. Do you think this
problem deserves a patch?
If we introduce a new error code, that is, in a sense, a new feature. Should I send it to pgsql-hackers?

>  As you say, currently we have
> to have much too smart programs to weed out the serious errors that
> ought to show up in an alerting system from run-of-the-mill problems.
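
To illustrate the triage discussed above, here is a minimal sketch of how a log analyzer might bucket errors by SQLSTATE once corruption cases carry a proper error code. It assumes the SQLSTATE is available in the log (for example via the %e escape in log_line_prefix). The enum and the function name classify_sqlstate are hypothetical names made up for this example, not part of PostgreSQL. The codes themselves are the documented ones: XX001 (data_corrupted), XX002 (index_corrupted), XX000 (internal_error, which a plain elog(ERROR, ...) reports), class 53 (insufficient resources) and class 40 (transaction rollback).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical severity buckets for an alerting system (not a PostgreSQL API). */
typedef enum
{
    ALERT_CORRUPTION,   /* XX001 data_corrupted, XX002 index_corrupted */
    ALERT_INTERNAL,     /* XX000 internal_error, the default for elog(ERROR, ...) */
    ALERT_RESOURCE,     /* class 53: out of memory, too many connections, ... */
    ALERT_RETRYABLE,    /* class 40: e.g. 40001 serialization_failure */
    ALERT_OTHER         /* everything else, including ordinary user errors */
} alert_class;

/* Map a five-character SQLSTATE string to one of the buckets above. */
alert_class
classify_sqlstate(const char *sqlstate)
{
    assert(sqlstate != NULL);

    /* A SQLSTATE is always exactly five characters long. */
    if (strlen(sqlstate) != 5)
        return ALERT_OTHER;

    if (strcmp(sqlstate, "XX001") == 0 || strcmp(sqlstate, "XX002") == 0)
        return ALERT_CORRUPTION;
    if (strcmp(sqlstate, "XX000") == 0)
        return ALERT_INTERNAL;

    /* The first two characters of a SQLSTATE identify its class. */
    if (strncmp(sqlstate, "53", 2) == 0)
        return ALERT_RESOURCE;
    if (strncmp(sqlstate, "40", 2) == 0)
        return ALERT_RETRYABLE;

    return ALERT_OTHER;
}
```

With such a mapping, a corruption message that is raised through a plain elog() falls into the ALERT_INTERNAL bucket together with every other internal error, which is why the analyzer currently has to inspect message texts instead of codes.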

Thanks!

Best regards, Andrey Borodin.

