Обсуждение: Strange error in new 003_cic_2pc.pl test

Поиск
Список
Период
Сортировка

Strange error in new 003_cic_2pc.pl test

От
Tom Lane
Дата:
tadarida just reported this [1]:

2021-11-11 14:40:40.040 CET [29620:388] 023_cic_2pc.pl ERROR:  could not read two-phase state from WAL at 0/1716C68

That error message is remarkably unhelpful, and I think unnecessarily so.
Why the heck isn't XlogReadTwoPhaseData passing on the error string
that it just got from XLogReadRecord?

Now, XLogReadRecord's API spec does say it might return NULL errormsg
if "the page read callback already reported the error".  But there's
no sign of that in tadarida's log.  Anyway, I have in mind to do

        ereport(ERROR,
                (errcode_for_file_access(),
-                errmsg("could not read two-phase state from WAL at %X/%X",
-                       LSN_FORMAT_ARGS(lsn))));
+                errmsg("could not read two-phase state from WAL at %X/%X: %s",
+                       LSN_FORMAT_ARGS(lsn),
+                       errormsg ? errormsg : "(no details available)")));

But first, I'm going to look around for other XLogReadRecord
callers that might be equally slipshod.

(Identifying the actual underlying bug, if there is one, is a different
problem.  This test being so new, I'd not be surprised if it is showing
us a real-but-intermittent issue.  But perhaps fixing this error report
will yield more info.)

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-11%2013%3A29%3A58



Re: Strange error in new 003_cic_2pc.pl test

От
Andrey Borodin
Дата:

> tadarida just reported this [1]:
> 
> 2021-11-11 14:40:40.040 CET [29620:388] 023_cic_2pc.pl ERROR: could not read two-phase state from WAL at 0/1716C68
> 
> That error message is remarkably unhelpful, and I think unnecessarily so.
> Why the heck isn't XlogReadTwoPhaseData passing on the error string
> that it just got from XLogReadRecord?
> 
> Now, XLogReadRecord's API spec does say it might return NULL errormsg
> if "the page read callback already reported the error". But there's
> no sign of that in tadarida's log. Anyway, I have in mind to do
> 
> ereport(ERROR,
> (errcode_for_file_access(),
> - errmsg("could not read two-phase state from WAL at %X/%X",
> - LSN_FORMAT_ARGS(lsn))));
> + errmsg("could not read two-phase state from WAL at %X/%X: %s",
> + LSN_FORMAT_ARGS(lsn),
> + errormsg ? errormsg : "(no details available)")));
> 

Noah recently proposed the same [0]. But responders like me started to complain about other places without error
reportingaround xlog reading.
 

Probably I had to be more clear on the matter. Error logging certainly worth doing, at least this.

Thanks!


Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/20211107013157.GB790288%40rfd.leadboat.com



Re: Strange error in new 003_cic_2pc.pl test

От
Tom Lane
Дата:
Andrey Borodin <x4mmm@yandex-team.ru> writes:
>> That error message is remarkably unhelpful, and I think unnecessarily so.
>> Why the heck isn't XlogReadTwoPhaseData passing on the error string
>> that it just got from XLogReadRecord?

> Noah recently proposed the same [0]. But responders like me started to complain about other places without error
reportingaround xlog reading. 

Ah, I missed that thread.  Now that we've seen another instance of the
same failure, I agree it's urgent to get something committed.  Noah's
patch matches the style of some other callers, so let's go with that.

(And yeah, why are some of the other callers ignoring errormsg?  But
that can wait.)

            regards, tom lane