Обсуждение: Odd usage of errmsg_internal in bufmgr.c

Поиск

Список

Период

Сортировка

Odd usage of errmsg_internal in bufmgr.c

От

Chao Li

Дата:

12 февраля, 03:37:54

Hi,

The relevant code looks like this:
```
        msg_one = _("invalid page in block %u of relation \"%s\””);

       ereport(elevel,
           errcode(ERRCODE_DATA_CORRUPTED),
           errmsg_internal(msg_one, first + first_off, rpath.str) :
```

Here, the string is first translated via _() and stored in msg_one, and then passed to errmsg_internal(). However,
accordingto the header comment of errmsg_internal(): 
```
/*
* errmsg_internal --- add a primary error message text to the current error
*
* This is exactly like errmsg() except that strings passed to errmsg_internal
* are not translated, and are customarily left out of the
* internationalization message dictionary. This should be used for "can't
* happen" cases that are probably not worth spending translation effort on.
* We also use this for certain cases where we *must* not try to translate
* the message because the translation would fail and result in infinite
* error recursion.
*/
int
errmsg_internal(const char *fmt,...)
```

errmsg_internal() is explicitly intended for non-translatable, internal messages. Passing an already translated string
toit feels inconsistent with its documented purpose. 

In bufmgr.c, these corruption-related messages are clearly user-facing, so it seems more appropriate to use errmsg()
hereinstead of errmsg_internal(). If my understanding is correct, the attached patch fixes the usages in bufmgr.c. 

This patch doesn't affect runtime behavior, but it avoids confusion for future readers, and helps prevent similar
misuseelsewhere. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Вложения

v1-0001-bufmgr-Fix-wrong-usage-of-errmsg_internal-in-buff.patch

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

12 февраля, 04:26:00

Hi,

On February 11, 2026 4:37:54 PM PST, Chao Li <li.evan.chao@gmail.com> wrote:
>Hi,
>
>The relevant code looks like this:
>```
>        msg_one = _("invalid page in block %u of relation \"%s\””);
>
>       ereport(elevel,
>           errcode(ERRCODE_DATA_CORRUPTED),
>           errmsg_internal(msg_one, first + first_off, rpath.str) :
>```
>
>Here, the string is first translated via _() and stored in msg_one, and then passed to errmsg_internal(). However,
accordingto the header comment of errmsg_internal(): 
>```
>/*
>* errmsg_internal --- add a primary error message text to the current error
>*
>* This is exactly like errmsg() except that strings passed to errmsg_internal
>* are not translated, and are customarily left out of the
>* internationalization message dictionary. This should be used for "can't
>* happen" cases that are probably not worth spending translation effort on.
>* We also use this for certain cases where we *must* not try to translate
>* the message because the translation would fail and result in infinite
>* error recursion.
>*/
>int
>errmsg_internal(const char *fmt,...)
>```
>
>errmsg_internal() is explicitly intended for non-translatable, internal messages. Passing an already translated string
toit feels inconsistent with its documented purpose. 

I don't think it is - you want to translate just once. There are other places using that pattern.


>In bufmgr.c, these corruption-related messages are clearly user-facing, so it seems more appropriate to use errmsg()
hereinstead of errmsg_internal(). If my understanding is correct, the attached patch fixes the usages in bufmgr.c. 
>
>This patch doesn't affect runtime behavior, but it avoids confusion for future readers, and helps prevent similar
misuseelsewhere. 

You can't do that, because then the strings won't be recognized as translatable by the translation machinery.  The
argumentsto functions that take to-be-translated strings (like errmsg(..., ) or _(...) have to constants, so the
stringscan be extracted to then be translated. 

Greetings,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Odd usage of errmsg_internal in bufmgr.c

От

Tom Lane

Дата:

12 февраля, 04:34:50

Chao Li <li.evan.chao@gmail.com> writes:
> The relevant code looks like this:

>         msg_one = _("invalid page in block %u of relation \"%s\””);

>        ereport(elevel,
>            errcode(ERRCODE_DATA_CORRUPTED),
>            errmsg_internal(msg_one, first + first_off, rpath.str) :

I agree that that isn't great code, but what I don't like about it
is the separation between where the format string is defined and
where it is used.  It'd be very easy for the %-escapes to get out of
sync with the types of the actual parameters, and if they did, the
compiler would not warn you.  I think we ought to try to recast this
into the normal usage pattern where the format is literal within the
errmsg call.  I see the comment about avoiding code duplication, but
to my mind this is a terrible solution.

> errmsg_internal() is explicitly intended for non-translatable, internal messages. Passing an already translated
stringto it feels inconsistent with its documented purpose. 

[ shrug... ] We do that in some other places, I believe, the idea
being just to not waste cycles by passing a message through
translation twice.  (In theory perhaps you could even end with the
wrong message, though that'd take a highly-unlikely match of one
translated message to some other untranslated message.)  I don't like
this code for the reason I gave above, but I don't think its use of
errmsg_internal is problematic per se.  Maybe we should amend the
comment of errmsg_internal to mention such usage.

> This patch doesn't affect runtime behavior, but it avoids confusion for future readers, and helps prevent similar
misuseelsewhere. 

Well, actually, it breaks translation of these messages completely.
With this coding there is nothing that will cue xgettext to pick up
these strings as translatable.  You could use gettext_noop() instead
of _() to do that, but that doesn't fix my beef about the separation
between format string and format arguments.

            regards, tom lane

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

12 февраля, 05:19:32

Hi,

On 2026-02-11 20:34:50 -0500, Tom Lane wrote:
> Chao Li <li.evan.chao@gmail.com> writes:
> > The relevant code looks like this:
> 
> >         msg_one = _("invalid page in block %u of relation \"%s\””);
> 
> >        ereport(elevel,
> >            errcode(ERRCODE_DATA_CORRUPTED),
> >            errmsg_internal(msg_one, first + first_off, rpath.str) :
> 
> I agree that that isn't great code, but what I don't like about it
> is the separation between where the format string is defined and
> where it is used.  It'd be very easy for the %-escapes to get out of
> sync with the types of the actual parameters, and if they did, the
> compiler would not warn you.  I think we ought to try to recast this
> into the normal usage pattern where the format is literal within the
> errmsg call.  I see the comment about avoiding code duplication, but
> to my mind this is a terrible solution.

The amount of code duplication it avoids is rather substantial. Yes, it's not
great to loose the compiler checking for format codes, but each of the
branches is actually, so the likelihood of that not being noticed doesn't seem
significant.

Greetings,

Andres Freund

Re: Odd usage of errmsg_internal in bufmgr.c

От

Tom Lane

Дата:

12 февраля, 06:13:24

Andres Freund <andres@anarazel.de> writes:
> On 2026-02-11 20:34:50 -0500, Tom Lane wrote:
>> I agree that that isn't great code, but what I don't like about it
>> is the separation between where the format string is defined and
>> where it is used.  It'd be very easy for the %-escapes to get out of
>> sync with the types of the actual parameters, and if they did, the
>> compiler would not warn you.  I think we ought to try to recast this
>> into the normal usage pattern where the format is literal within the
>> errmsg call.  I see the comment about avoiding code duplication, but
>> to my mind this is a terrible solution.

> The amount of code duplication it avoids is rather substantial.

Really?  By my count it's strictly fewer lines to do it the
straightforward way.  Yes, I'm counting removal of the comments
defending doing it in the convoluted way, but on the other hand the
attached patch adds quite a few extra line breaks for readability,
and still comes out 4 lines shorter.  Not to mention less fragile.
I do not see a reason why the existing code is good.

            regards, tom lane

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index d1babaff023..64220f8a72f 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -8374,83 +8374,79 @@ buffer_readv_report(PgAioResult result, const PgAioTargetData *td,
     uint8        zeroed_or_error_count,
                 checkfail_count,
                 first_off;
-    uint8        affected_count;
-    const char *msg_one,
-               *msg_mult,
-               *det_mult,
-               *hint_mult;

     buffer_readv_decode_error(result, &zeroed_any, &ignored_any,
                               &zeroed_or_error_count,
                               &checkfail_count,
                               &first_off);

-    /*
-     * Treat a read that had both zeroed buffers *and* ignored checksums as a
-     * special case, it's too irregular to be emitted the same way as the
-     * other cases.
-     */
     if (zeroed_any && ignored_any)
     {
-        Assert(zeroed_any && ignored_any);
         Assert(nblocks > 1);    /* same block can't be both zeroed and ignored */
         Assert(result.status != PGAIO_RS_ERROR);
-        affected_count = zeroed_or_error_count;
-
         ereport(elevel,
                 errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("zeroing %u page(s) and ignoring %u checksum failure(s) among blocks %u..%u of relation
\"%s\"",
-                       affected_count, checkfail_count, first, last, rpath.str),
-                affected_count > 1 ?
+                       zeroed_or_error_count, checkfail_count, first, last, rpath.str),
+                zeroed_or_error_count > 1 ?
                 errdetail("Block %u held the first zeroed page.",
                           first + first_off) : 0,
                 errhint_plural("See server log for details about the other %d invalid block.",
                                "See server log for details about the other %d invalid blocks.",
-                               affected_count + checkfail_count - 1,
-                               affected_count + checkfail_count - 1));
-        return;
+                               zeroed_or_error_count + checkfail_count - 1,
+                               zeroed_or_error_count + checkfail_count - 1));
     }
-
-    /*
-     * The other messages are highly repetitive. To avoid duplicating a long
-     * and complicated ereport(), gather the translated format strings
-     * separately and then do one common ereport.
-     */
-    if (result.status == PGAIO_RS_ERROR)
+    else if (result.status == PGAIO_RS_ERROR)
     {
         Assert(!zeroed_any);    /* can't have invalid pages when zeroing them */
-        affected_count = zeroed_or_error_count;
-        msg_one = _("invalid page in block %u of relation \"%s\"");
-        msg_mult = _("%u invalid pages among blocks %u..%u of relation \"%s\"");
-        det_mult = _("Block %u held the first invalid page.");
-        hint_mult = _("See server log for the other %u invalid block(s).");
+        ereport(elevel,
+                errcode(ERRCODE_DATA_CORRUPTED),
+                zeroed_or_error_count == 1 ?
+                errmsg("invalid page in block %u of relation \"%s\"",
+                       first + first_off, rpath.str) :
+                errmsg("%u invalid pages among blocks %u..%u of relation \"%s\"",
+                       zeroed_or_error_count, first, last, rpath.str),
+                zeroed_or_error_count > 1 ?
+                errdetail("Block %u held the first invalid page.",
+                          first + first_off) : 0,
+                zeroed_or_error_count > 1 ?
+                errhint("See server log for the other %u invalid block(s).",
+                        zeroed_or_error_count - 1) : 0);
     }
     else if (zeroed_any && !ignored_any)
     {
-        affected_count = zeroed_or_error_count;
-        msg_one = _("invalid page in block %u of relation \"%s\"; zeroing out page");
-        msg_mult = _("zeroing out %u invalid pages among blocks %u..%u of relation \"%s\"");
-        det_mult = _("Block %u held the first zeroed page.");
-        hint_mult = _("See server log for the other %u zeroed block(s).");
+        ereport(elevel,
+                errcode(ERRCODE_DATA_CORRUPTED),
+                zeroed_or_error_count == 1 ?
+                errmsg("invalid page in block %u of relation \"%s\"; zeroing out page",
+                       first + first_off, rpath.str) :
+                errmsg("zeroing out %u invalid pages among blocks %u..%u of relation \"%s\"",
+                       zeroed_or_error_count, first, last, rpath.str),
+                zeroed_or_error_count > 1 ?
+                errdetail("Block %u held the first zeroed page.",
+                          first + first_off) : 0,
+                zeroed_or_error_count > 1 ?
+                errhint("See server log for the other %u zeroed block(s).",
+                        zeroed_or_error_count - 1) : 0);
     }
     else if (!zeroed_any && ignored_any)
     {
-        affected_count = checkfail_count;
-        msg_one = _("ignoring checksum failure in block %u of relation \"%s\"");
-        msg_mult = _("ignoring %u checksum failures among blocks %u..%u of relation \"%s\"");
-        det_mult = _("Block %u held the first ignored page.");
-        hint_mult = _("See server log for the other %u ignored block(s).");
+        ereport(elevel,
+                errcode(ERRCODE_DATA_CORRUPTED),
+                checkfail_count == 1 ?
+                errmsg("ignoring checksum failure in block %u of relation \"%s\"",
+                       first + first_off, rpath.str) :
+                errmsg("ignoring %u checksum failures among blocks %u..%u of relation \"%s\"",
+                       checkfail_count, first, last, rpath.str),
+                checkfail_count > 1 ?
+                errdetail("Block %u held the first ignored page.",
+                          first + first_off) : 0,
+                checkfail_count > 1 ?
+                errhint("See server log for the other %u ignored block(s).",
+                        checkfail_count - 1) : 0);
     }
     else
         pg_unreachable();
-
-    ereport(elevel,
-            errcode(ERRCODE_DATA_CORRUPTED),
-            affected_count == 1 ?
-            errmsg_internal(msg_one, first + first_off, rpath.str) :
-            errmsg_internal(msg_mult, affected_count, first, last, rpath.str),
-            affected_count > 1 ? errdetail_internal(det_mult, first + first_off) : 0,
-            affected_count > 1 ? errhint_internal(hint_mult, affected_count - 1) : 0);
 }

 static void

Re: Odd usage of errmsg_internal in bufmgr.c

От

Chao Li

Дата:

12 февраля, 10:49:38


> On Feb 12, 2026, at 11:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
>> On 2026-02-11 20:34:50 -0500, Tom Lane wrote:
>>> I agree that that isn't great code, but what I don't like about it
>>> is the separation between where the format string is defined and
>>> where it is used.  It'd be very easy for the %-escapes to get out of
>>> sync with the types of the actual parameters, and if they did, the
>>> compiler would not warn you.  I think we ought to try to recast this
>>> into the normal usage pattern where the format is literal within the
>>> errmsg call.  I see the comment about avoiding code duplication, but
>>> to my mind this is a terrible solution.
>
>> The amount of code duplication it avoids is rather substantial.
>
> Really?  By my count it's strictly fewer lines to do it the
> straightforward way.  Yes, I'm counting removal of the comments
> defending doing it in the convoluted way, but on the other hand the
> attached patch adds quite a few extra line breaks for readability,
> and still comes out 4 lines shorter.  Not to mention less fragile.
> I do not see a reason why the existing code is good.
>
> regards, tom lane
>
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index d1babaff023..64220f8a72f 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -8374,83 +8374,79 @@ buffer_readv_report(PgAioResult result, const PgAioTargetData *td,
> uint8 zeroed_or_error_count,
> checkfail_count,
> first_off;
> - uint8 affected_count;
> - const char *msg_one,
> -   *msg_mult,
> -   *det_mult,
> -   *hint_mult;
>
> buffer_readv_decode_error(result, &zeroed_any, &ignored_any,
>  &zeroed_or_error_count,
>  &checkfail_count,
>  &first_off);
>
> - /*
> - * Treat a read that had both zeroed buffers *and* ignored checksums as a
> - * special case, it's too irregular to be emitted the same way as the
> - * other cases.
> - */
> if (zeroed_any && ignored_any)
> {
> - Assert(zeroed_any && ignored_any);
> Assert(nblocks > 1); /* same block can't be both zeroed and ignored */
> Assert(result.status != PGAIO_RS_ERROR);
> - affected_count = zeroed_or_error_count;
> -
> ereport(elevel,
> errcode(ERRCODE_DATA_CORRUPTED),
> errmsg("zeroing %u page(s) and ignoring %u checksum failure(s) among blocks %u..%u of relation \"%s\"",
> -   affected_count, checkfail_count, first, last, rpath.str),
> - affected_count > 1 ?
> +   zeroed_or_error_count, checkfail_count, first, last, rpath.str),
> + zeroed_or_error_count > 1 ?
> errdetail("Block %u held the first zeroed page.",
>  first + first_off) : 0,
> errhint_plural("See server log for details about the other %d invalid block.",
>   "See server log for details about the other %d invalid blocks.",
> -   affected_count + checkfail_count - 1,
> -   affected_count + checkfail_count - 1));
> - return;
> +   zeroed_or_error_count + checkfail_count - 1,
> +   zeroed_or_error_count + checkfail_count - 1));
> }
> -
> - /*
> - * The other messages are highly repetitive. To avoid duplicating a long
> - * and complicated ereport(), gather the translated format strings
> - * separately and then do one common ereport.
> - */
> - if (result.status == PGAIO_RS_ERROR)
> + else if (result.status == PGAIO_RS_ERROR)
> {
> Assert(!zeroed_any); /* can't have invalid pages when zeroing them */
> - affected_count = zeroed_or_error_count;
> - msg_one = _("invalid page in block %u of relation \"%s\"");
> - msg_mult = _("%u invalid pages among blocks %u..%u of relation \"%s\"");
> - det_mult = _("Block %u held the first invalid page.");
> - hint_mult = _("See server log for the other %u invalid block(s).");
> + ereport(elevel,
> + errcode(ERRCODE_DATA_CORRUPTED),
> + zeroed_or_error_count == 1 ?
> + errmsg("invalid page in block %u of relation \"%s\"",
> +   first + first_off, rpath.str) :
> + errmsg("%u invalid pages among blocks %u..%u of relation \"%s\"",
> +   zeroed_or_error_count, first, last, rpath.str),
> + zeroed_or_error_count > 1 ?
> + errdetail("Block %u held the first invalid page.",
> +  first + first_off) : 0,
> + zeroed_or_error_count > 1 ?
> + errhint("See server log for the other %u invalid block(s).",
> + zeroed_or_error_count - 1) : 0);
> }
> else if (zeroed_any && !ignored_any)
> {
> - affected_count = zeroed_or_error_count;
> - msg_one = _("invalid page in block %u of relation \"%s\"; zeroing out page");
> - msg_mult = _("zeroing out %u invalid pages among blocks %u..%u of relation \"%s\"");
> - det_mult = _("Block %u held the first zeroed page.");
> - hint_mult = _("See server log for the other %u zeroed block(s).");
> + ereport(elevel,
> + errcode(ERRCODE_DATA_CORRUPTED),
> + zeroed_or_error_count == 1 ?
> + errmsg("invalid page in block %u of relation \"%s\"; zeroing out page",
> +   first + first_off, rpath.str) :
> + errmsg("zeroing out %u invalid pages among blocks %u..%u of relation \"%s\"",
> +   zeroed_or_error_count, first, last, rpath.str),
> + zeroed_or_error_count > 1 ?
> + errdetail("Block %u held the first zeroed page.",
> +  first + first_off) : 0,
> + zeroed_or_error_count > 1 ?
> + errhint("See server log for the other %u zeroed block(s).",
> + zeroed_or_error_count - 1) : 0);
> }
> else if (!zeroed_any && ignored_any)
> {
> - affected_count = checkfail_count;
> - msg_one = _("ignoring checksum failure in block %u of relation \"%s\"");
> - msg_mult = _("ignoring %u checksum failures among blocks %u..%u of relation \"%s\"");
> - det_mult = _("Block %u held the first ignored page.");
> - hint_mult = _("See server log for the other %u ignored block(s).");
> + ereport(elevel,
> + errcode(ERRCODE_DATA_CORRUPTED),
> + checkfail_count == 1 ?
> + errmsg("ignoring checksum failure in block %u of relation \"%s\"",
> +   first + first_off, rpath.str) :
> + errmsg("ignoring %u checksum failures among blocks %u..%u of relation \"%s\"",
> +   checkfail_count, first, last, rpath.str),
> + checkfail_count > 1 ?
> + errdetail("Block %u held the first ignored page.",
> +  first + first_off) : 0,
> + checkfail_count > 1 ?
> + errhint("See server log for the other %u ignored block(s).",
> + checkfail_count - 1) : 0);
> }
> else
> pg_unreachable();
> -
> - ereport(elevel,
> - errcode(ERRCODE_DATA_CORRUPTED),
> - affected_count == 1 ?
> - errmsg_internal(msg_one, first + first_off, rpath.str) :
> - errmsg_internal(msg_mult, affected_count, first, last, rpath.str),
> - affected_count > 1 ? errdetail_internal(det_mult, first + first_off) : 0,
> - affected_count > 1 ? errhint_internal(hint_mult, affected_count - 1) : 0);
> }
>
> static void

Thank you, Tom and Andres, for clarifying the translation mechanism, that helped me understand it better.

Given that, the patch is no longer about any misuse of errmsg_internal(), but turns to be a refactoring. I’ve prepared
av2 accordingly. It incorporates the diff Tom attached, and as Tom suggested earlier, I’ve also added a brief
clarificationto the header comment of errmsg_internal(). 

I also noticed that Tom’s diff removes a redundant Assert(), so it ends up doing slightly more than a pure refactoring.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Вложения

v2-0001-bufmgr-Inline-error-messages-and-clarify-errmsg_i.patch

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

12 февраля, 20:12:03

Hi,

On 2026-02-11 22:13:24 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2026-02-11 20:34:50 -0500, Tom Lane wrote:
> >> I agree that that isn't great code, but what I don't like about it
> >> is the separation between where the format string is defined and
> >> where it is used.  It'd be very easy for the %-escapes to get out of
> >> sync with the types of the actual parameters, and if they did, the
> >> compiler would not warn you.  I think we ought to try to recast this
> >> into the normal usage pattern where the format is literal within the
> >> errmsg call.  I see the comment about avoiding code duplication, but
> >> to my mind this is a terrible solution.
> 
> > The amount of code duplication it avoids is rather substantial.
> 
> Really?  By my count it's strictly fewer lines to do it the
> straightforward way.  Yes, I'm counting removal of the comments
> defending doing it in the convoluted way, but on the other hand the
> attached patch adds quite a few extra line breaks for readability,
> and still comes out 4 lines shorter.  Not to mention less fragile.
> I do not see a reason why the existing code is good.

I think the code after the proposed change is considerably harder to read.

Greetings,

Andres Freund

Re: Odd usage of errmsg_internal in bufmgr.c

От

Álvaro Herrera

Дата:

12 февраля, 20:53:34

On 2026-Feb-12, Andres Freund wrote:

> I think the code after the proposed change is considerably harder to read.

Perhaps gratuitously so ...  For instance, AFAICS the first block could
be:

    if (result.status == PGAIO_RS_ERROR)
    {
        Assert(!zeroed_any);    /* can't have invalid pages when zeroing them */
        if (zeroed_or_error_count == 1)
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("invalid page in block %u of relation \"%s\"",
                           first + first_off, rpath.str));
        else
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("%u invalid pages among blocks %u..%u of relation \"%s\"",
                           zeroed_or_error_count, first, last, rpath.str),
                    errdetail("Block %u held the first invalid page.",
                              first + first_off),
                    errhint("See server log for the other %u invalid block(s).",
                            zeroed_or_error_count - 1));
    }

which does the same, is easier to read, contains no duplicate messages,
and still allows the whole sequence fit in one screenful.


I'm confused about the meaning of "Block [first+first_off] held the
first invalid page.", though ... what exactly does that mean?  What does
(first+first_off) represent?  Block of what?  How is (first) different
from (first+first_off)?  The comments on buffer_readv_decode_error() and
buffer_readv_encode_error() leave me none the wiser.  Is the segment
size relevant to how I must interpret that number?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Los trabajadores menos efectivos son sistematicamente llevados al lugar
donde pueden hacer el menor daño posible: gerencia."  (El principio Dilbert)

Re: Odd usage of errmsg_internal in bufmgr.c

От

Zsolt Parragi

Дата:

12 февраля, 21:34:49

> Perhaps gratuitously so ...  For instance, AFAICS the first block could
> be:


I also tried to play with a few variants for this, and I reached the
same conclusion: the cleanest version is when we simply repeat the
messages.
If we unify the messages a bit (the zeroing singular form is a bit
different compared to the others), we could introduce a helper
function but it doesn't seem that readable to me.

Also:

  if (result.status == PGAIO_RS_ERROR)
  {
  Assert(!zeroed_any); /* can't have invalid pages when zeroing them */
- affected_count = zeroed_or_error_count;
- msg_one = _("invalid page in block %u of relation \"%s\"");


* The comment after assert seems wrong, isn't it backwards? Can't have
zeroed pages with failed reads
* Is it really okay to classify read errors as "invalid pages"? This
is a read failure, so I imagine this can happen if we lost a disk
because of faulty/flaky hardware, or a user unmounting a partition, or
moving the file, or anything like that, so we weren't able to actually
try to read it. Shouldn't the error say something about read failure
or unreadable pages instead?

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

12 февраля, 22:12:14

Hi,

On 2026-02-12 18:53:34 +0100, Álvaro Herrera wrote:
> I'm confused about the meaning of "Block [first+first_off] held the
> first invalid page.", though ... what exactly does that mean?  What does
> (first+first_off) represent?  Block of what?  How is (first) different
> from (first+first_off)?  The comments on buffer_readv_decode_error() and
> buffer_readv_encode_error() leave me none the wiser.  Is the segment
> size relevant to how I must interpret that number?

It's relevant for IOs that are larger than one block. E.g. you could have an
IO reading in three blocks, where there are checksum errors on the second and
third block, but the first block is OK. Or just one checksum error in a 16
block IO.

Because the error can be detected e.g. in an IO worker, we have to encode
information about the error in a dense way, since it has to be forwarded
through shared memory. Reporting the number of invalid pages / checksum
failures / zeroed pages, and the first affected block seemed a decent
compromise between space and precision.

Greetings,

Andres Freund

Re: Odd usage of errmsg_internal in bufmgr.c

От

Álvaro Herrera

Дата:

12 февраля, 22:16:52

On 2026-Feb-12, Zsolt Parragi wrote:

> > Perhaps gratuitously so ...  For instance, AFAICS the first block could
> > be:
> 
> 
> I also tried to play with a few variants for this, and I reached the
> same conclusion: the cleanest version is when we simply repeat the
> messages.

But you know what's strange?  There are actually *no* messages that
appear more than once.  The ereport calls all use different messages.
The only thing that appears multiple times is
errcode(ERRCODE_DATA_CORRUPTED).  This is because there's either one
errmsg with no detail/hint (in the single page case), or the other
errmsg() with detail and hint (when multiple pages are affected).

I don't understand why the hint says "see server log for the other
blocks", though ... what part of the code emits that information?
[ ... some cscope jumping later ... ]  Is it md_readv_report?


    if (result.status == PGAIO_RS_ERROR)
    {
        Assert(!zeroed_any);    /* can't have invalid pages when zeroing them */
        if (zeroed_or_error_count == 1)
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("invalid page in block %u of relation \"%s\"",
                           first + first_off, rpath.str));
        else
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("%u invalid pages among blocks %u..%u of relation \"%s\"",
                           zeroed_or_error_count, first, last, rpath.str),
                    errdetail("Block %u held the first invalid page.",
                              first + first_off),
                    errhint("See server log for the other %u invalid block(s).",
                            zeroed_or_error_count - 1));
    }
    else if (zeroed_any && !ignored_any)
    {
        if (zeroed_or_error_count == 1)
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("invalid page in block %u of relation \"%s\"; zeroing out page",
                           first + first_off, rpath.str));
        else
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("zeroing out %u invalid pages among blocks %u..%u of relation \"%s\"",
                           zeroed_or_error_count, first, last, rpath.str),
                    errdetail("Block %u held the first zeroed page.",
                              first + first_off),
                    errhint("See server log for the other %u zeroed block(s).",
                            zeroed_or_error_count - 1));
    }
    else if (!zeroed_any && ignored_any)
    {
        if (checkfail_count == 1)
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("ignoring checksum failure in block %u of relation \"%s\"",
                           first + first_off, rpath.str));
        else
            ereport(elevel,
                    errcode(ERRCODE_DATA_CORRUPTED),
                    errmsg("ignoring %u checksum failures among blocks %u..%u of relation \"%s\"",
                           checkfail_count, first, last, rpath.str),
                    errdetail("Block %u held the first ignored page.",
                              first + first_off),
                    errhint("See server log for the other %u ignored block(s).",
                            checkfail_count - 1));
    }
    else
        pg_unreachable();

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

12 февраля, 22:17:34

Hi,

On 2026-02-12 18:34:49 +0000, Zsolt Parragi wrote:
> Also:
> 
>   if (result.status == PGAIO_RS_ERROR)
>   {
>   Assert(!zeroed_any); /* can't have invalid pages when zeroing them */
> - affected_count = zeroed_or_error_count;
> - msg_one = _("invalid page in block %u of relation \"%s\"");
> 
> 
> * The comment after assert seems wrong, isn't it backwards? Can't have
> zeroed pages with failed reads

No, I don't think so. This is just about errors on the bufmgr layer. If
zero_damaged_pages is on, all errors on the bufmgr layer are handled by
zeroing the page.

> * Is it really okay to classify read errors as "invalid pages"? This
> is a read failure, so I imagine this can happen if we lost a disk
> because of faulty/flaky hardware, or a user unmounting a partition, or
> moving the file, or anything like that, so we weren't able to actually
> try to read it.

As mentioned above, this is just about errors on the bufmgr layer. If you have
the entire IO fail, it'll be reported via md.c:md_readv_report(). That case is
simpler, since you can't have a case where parts of the IO had an IO error,
but others worked ok.

Greetings,

Andres Freund

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

12 февраля, 22:51:20

Hi

On 2026-02-12 20:16:52 +0100, Álvaro Herrera wrote:
> But you know what's strange?  There are actually *no* messages that
> appear more than once.  The ereport calls all use different messages.
> The only thing that appears multiple times is
> errcode(ERRCODE_DATA_CORRUPTED).

Not the messages, but the emission of it does have plenty repetition, with a
branch for zeroed_or_error_count == 1 and otherwise, the ereport etc.


I apparently may be alone in this, but I find 6 repetitions of ereports, with
differently indented messages and arguments, depending on whether it's an
errmsg, errdetails, errhint way harder to scan and modify than something that
just shows the different messages with consistent indentation.


> This is because there's either one
> errmsg with no detail/hint (in the single page case), or the other
> errmsg() with detail and hint (when multiple pages are affected).
> 
> I don't understand why the hint says "see server log for the other
> blocks", though ... what part of the code emits that information?

buffer_readv_complete_one(), which does the per-buffer processing for,
possibly larger, IO:

        /*
         * Immediately log a message about the invalid page, but only to the
         * server log. The reason to do so immediately is that this may be
         * executed in a different backend than the one that originated the
         * request. The reason to do so immediately is that the originator
         * might not process the query result immediately (because it is busy
         * doing another part of query processing) or at all (e.g. if it was
         * cancelled or errored out due to another IO also failing). The
         * definer of the IO will emit an ERROR or WARNING when processing the
         * IO's results
         *
         * To avoid duplicating the code to emit these log messages, we reuse
         * buffer_readv_report().
         */

(hm, there's a repeated "The reason to do so immediately is" that should just
be an "and")


> [ ... some cscope jumping later ... ]  Is it md_readv_report?

No, that's for the IO failing in its entirety.

Greetings,

Andres Freund

Re: Odd usage of errmsg_internal in bufmgr.c

От

Zsolt Parragi

Дата:

12 февраля, 23:25:19

> No, I don't think so. This is just about errors on the bufmgr layer.

I see. Looks like I misinterpreted the comment in md.c where it sets
this flag when it reads 0 blocks.

> I apparently may be alone in this, but I find 6 repetitions of ereports, with
> differently indented messages and arguments, depending on whether it's an
> errmsg, errdetails, errhint way harder to scan and modify than something that
> just shows the different messages with consistent indentation.

Is changing the messages to follow the same pattern an option?

For example the error messages:

"read error in block %u of relation \"%s\": %s"
"%u read errors among blocks %u..%u of relation \"%s\": %s"

When the last string is conditional:

* invalid page(s)
* zeroing out invalid page(s)
* ignoring checksum error(s)

or maybe

"read error: %s in block %u of relation \"%s\""

errdetail would use a generic description (first faulty page?)
errhint could stay the same (maybe with errhint_plural?)

And that would simplify the ifs:

if (result.status == PGAIO_RS_ERROR)
{
affected_count = zeroed_or_error_count;
msg_action = affected_count > 1 ? "invalid pages" : "invalid page";
}

Re: Odd usage of errmsg_internal in bufmgr.c

От

Tom Lane

Дата:

12 февраля, 23:44:27

Andres Freund <andres@anarazel.de> writes:
> I apparently may be alone in this, but I find 6 repetitions of ereports, with
> differently indented messages and arguments, depending on whether it's an
> errmsg, errdetails, errhint way harder to scan and modify than something that
> just shows the different messages with consistent indentation.

I was about to make a similar comment: I don't find anything
particularly attractive about the output that this code is trying
to generate, nor does it seem helpful to have several variations
of what's fundamentally the same message.

I'd rip it all out in favor of a generic message that covers both
the single- and multi-page cases.

            regards, tom lane

Re: Odd usage of errmsg_internal in bufmgr.c

От

Andres Freund

Дата:

13 февраля, 00:02:37

Hi,

On 2026-02-12 20:25:19 +0000, Zsolt Parragi wrote:
> > No, I don't think so. This is just about errors on the bufmgr layer.
> 
> I see. Looks like I misinterpreted the comment in md.c where it sets
> this flag when it reads 0 blocks.

With AIO an IO can fail at multiple levels, so the error stored in the handle
is associated with the layer at which the error was "diagnosed".

If we e.g. were to support am smgr implementation other than md.c it might
need very different IO error conditions (e.g. network failures if a networked
smgr implementation) than md.c, but the errors at the level of bufmgr.c would
stay the same as today.

It's also conceivable that a higher layer could just ignore the error by a
lower layer and would thereby just "override" the lower layer's error.  E.g. a
non-existant freespacemap could be recreated by freespacemap.c after a failure
to read, instead of having to check how large the on-disk FSM is before trying
to read it (check fsm_readbuf() for how it currently works).

> > I apparently may be alone in this, but I find 6 repetitions of ereports, with
> > differently indented messages and arguments, depending on whether it's an
> > errmsg, errdetails, errhint way harder to scan and modify than something that
> > just shows the different messages with consistent indentation.
> 
> Is changing the messages to follow the same pattern an option?
> 
> For example the error messages:
> 
> "read error in block %u of relation \"%s\": %s"
> "%u read errors among blocks %u..%u of relation \"%s\": %s"
> 
> When the last string is conditional:
> 
> * invalid page(s)
> * zeroing out invalid page(s)
> * ignoring checksum error(s)

That gets complicated with translation, because you need to translate the %s
arguments need to be translated separately, as the translation cannot be done
together with the error/detail message. Doable, but it's not an no-cost win.

> or maybe
> 
> "read error: %s in block %u of relation \"%s\""

That's not really following our message guidelines...  I'm not sure I believe
in all the tenants stated in our guidelines, but we do try to follow them...

Greetings,

Andres Freund

Re: Odd usage of errmsg_internal in bufmgr.c

От

Chao Li

Дата:

13 февраля, 03:31:55


> On Feb 13, 2026, at 05:02, Andres Freund <andres@anarazel.de> wrote:
>
>
> Hi,
>
> On 2026-02-12 20:25:19 +0000, Zsolt Parragi wrote:
>>> No, I don't think so. This is just about errors on the bufmgr layer.
>>
>> I see. Looks like I misinterpreted the comment in md.c where it sets
>> this flag when it reads 0 blocks.
>
> With AIO an IO can fail at multiple levels, so the error stored in the handle
> is associated with the layer at which the error was "diagnosed".
>
> If we e.g. were to support am smgr implementation other than md.c it might
> need very different IO error conditions (e.g. network failures if a networked
> smgr implementation) than md.c, but the errors at the level of bufmgr.c would
> stay the same as today.
>
> It's also conceivable that a higher layer could just ignore the error by a
> lower layer and would thereby just "override" the lower layer's error.  E.g. a
> non-existant freespacemap could be recreated by freespacemap.c after a failure
> to read, instead of having to check how large the on-disk FSM is before trying
> to read it (check fsm_readbuf() for how it currently works).
>
>
>>> I apparently may be alone in this, but I find 6 repetitions of ereports, with
>>> differently indented messages and arguments, depending on whether it's an
>>> errmsg, errdetails, errhint way harder to scan and modify than something that
>>> just shows the different messages with consistent indentation.
>>
>> Is changing the messages to follow the same pattern an option?
>>
>> For example the error messages:
>>
>> "read error in block %u of relation \"%s\": %s"
>> "%u read errors among blocks %u..%u of relation \"%s\": %s"
>>
>> When the last string is conditional:
>>
>> * invalid page(s)
>> * zeroing out invalid page(s)
>> * ignoring checksum error(s)
>
> That gets complicated with translation, because you need to translate the %s
> arguments need to be translated separately, as the translation cannot be done
> together with the error/detail message. Doable, but it's not an no-cost win.
>
>
>
>> or maybe
>>
>> "read error: %s in block %u of relation \"%s\""
>
> That's not really following our message guidelines...  I'm not sure I believe
> in all the tenants stated in our guidelines, but we do try to follow them...
>
> Greetings,
>
> Andres Freund

The discussion has gone quite a bit beyond the original patch intention. I always appreciate the insights from these
exchanges.Since I started the thread, let me try to follow up with an updated version. 

PFA v3:

    * I tried to simplify the logging logic in buffer_readv_report() by leveraging errmsg_plural() and introducing a
function-scopedmacro to reduce the repetitive ereport() structure. I kept in mind that we should not use “%s” to inject
thefailure type (e.g., “invalid page”), as that would break translation. But I am not entirely certain whether using a
macroin this way could have also broken translation; if that turns out to be a concern, I am happy to rework it. 
    * I also removed the repeated wording in the comment that Andres pointed out.

At this point, the only change that relates directly to the original patch topic is the header comment adjustment for
errmsg_internal(),which is otherwise unrelated to the refactoring in bufmgr.c. I am fine with either splitting that out
intoa separate patch or dropping it altogether, whichever is preferred. 

Please let me know if this direction makes sense, or if further simplification would be better.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Вложения

v3-0001-Simplify-buffer_readv_report-error-message-genera.patch

Re: Odd usage of errmsg_internal in bufmgr.c

От

Zsolt Parragi

Дата:

13 февраля, 11:31:50

> That gets complicated with translation, because you need to translate the %s
> arguments need to be translated separately, as the translation cannot be done
> together with the error/detail message. Doable, but it's not an no-cost win.

I am aware of this, but the translation can be again at one place at the end.
As for the cost, if I understand correctly in this error path the
server has already generated possibly many log messages already, is
one more translation really a significant cost that we have to worry
about?

I would also probably add the first slightly different if into the
same message, by adding the two counts together, and mentioning
"invalid pages or checksum errors" at the end - since we already
suggest checking the server logs for more details, is it useful to
display 2 separate exact counts for that?

And that way we have only 1 error message for everything.


> and introducing a function-scoped macro to reduce the repetitive ereport() structure

I tried something similar initially, but I didn't find this too readable

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Odd usage of errmsg_internal in bufmgr.c

Вложения

Вложения

Вложения