> On 20 June 2019, at 22:09, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Jun-20, Andrey Borodin wrote:
>
>> Hi!
>>
>> We are fine-tuning our data corruption monitoring and found out that many
>> corruption cases do not report a proper error code.
>> This forces the automatic log analyzer to be far too smart a program.
>> We think that corruption error codes should be given in cases when B-tree or
>> TOAST do not know how to interpret data.
>> PFA patch with cases that we have found in logs and consider evidence of corruption.
>
> This is not totally insane -- other similar messages such as 'corrupted
> page pointers' in bufpage.c get the same errcode.
On master there is only
elog(ERROR, "incorrect index offsets supplied");
in bufpage.c. But this indicates misuse, not corrupted data on disk.
The others already use ERRCODE_DATA_CORRUPTED.
>
> I would like to have a separate marking for messages indicating a
> system-level permanent problem rather than user error ("table/column X
> does not exist"), retryable condition ("serializability violation"), or
> resource exhaustion ("out of memory", "too many clients"),
Good idea, but aren't there standards we have to comply with?
> but that's
> probably a separate patch: things like "could not open/read/write file"
> for a data file, or "xlog flush request XYZ not satisfied", and so on,
> which also indicate a kind of corruption.
I believe we should not report hardware problems as corruption. But this worries us (YC) too.
Do you think this problem deserves a patch?
Introducing a new error code is, in a way, a new feature. Should I send it to pgsql-hackers?
> As you say, currently we have
> to have much too smart programs to weed out the serious errors that
> ought to show up in an alerting system from run-of-the-mill problems.
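To illustrate, here is a minimal sketch of what the analyzer could shrink to if the error codes were reliable. It assumes the log line prefix includes the SQLSTATE (for example via %e in log_line_prefix). The names alert_class and classify_sqlstate are hypothetical and are not part of PostgreSQL or of the patch. Only the SQLSTATE values are real: XX001 (data_corrupted), XX002 (index_corrupted), class 40 (transaction rollback) and class 53 (insufficient resources).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buckets for an alerting system; not part of PostgreSQL. */
typedef enum
{
    ALERT_NONE,         /* user errors and anything unclassified */
    ALERT_RETRYABLE,    /* class 40, e.g. 40001 serialization_failure */
    ALERT_RESOURCES,    /* class 53, e.g. 53200 out_of_memory */
    ALERT_CORRUPTION    /* XX001 data_corrupted, XX002 index_corrupted */
} alert_class;

/* Map a five-character SQLSTATE taken from a log line to a bucket. */
alert_class
classify_sqlstate(const char *sqlstate)
{
    if (sqlstate == NULL || strlen(sqlstate) != 5)
        return ALERT_NONE;

    if (strcmp(sqlstate, "XX001") == 0 || strcmp(sqlstate, "XX002") == 0)
        return ALERT_CORRUPTION;

    if (strncmp(sqlstate, "40", 2) == 0)
        return ALERT_RETRYABLE;

    if (strncmp(sqlstate, "53", 2) == 0)
        return ALERT_RESOURCES;

    /* Plain elog(ERROR, ...) is logged as XX000 (internal_error) and ends up here. */
    return ALERT_NONE;
}
```

With such a mapping, a corruption message raised through plain elog(ERROR, ...) is indistinguishable from any other internal error, which is why the analyzer currently has to match on message text instead.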
Thanks!
Best regards, Andrey Borodin.