Обсуждение: Avoid overflow (src/backend/utils/adt/formatting.c)

Поиск
Список
Период
Сортировка

Avoid overflow (src/backend/utils/adt/formatting.c)

От
Ranier Vilela
Дата:
Hi.

Per Coverity.

Coverity raised the follow report:

CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
37. overflow_const: Expression pattern_len - 1ULL, where pattern_len is known to be equal to 0, underflows the type of pattern_len - 1ULL, which is type unsigned long long.

This is because the function *pg_mbstrlen* can return zero.
And the comment is clear, *just in case there are MB chars*.

patch attached.

best regards,
Ranier Vilela
Вложения

Re: Avoid overflow (src/backend/utils/adt/formatting.c)

От
Bryan Green
Дата:
On 11/2/2025 8:59 AM, Ranier Vilela wrote:
> Hi.
> 
> Per Coverity.
> 
> Coverity raised the follow report:
> 
> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
> 37. overflow_const: Expression pattern_len - 1ULL, where pattern_len is 
> known to be equal to 0, underflows the type of pattern_len - 1ULL, which 
> is type unsigned long long.
> 
> This is because the function *pg_mbstrlen* can return zero.
> And the comment is clear, *just in case there are MB chars*.
> 
> patch attached.
> 
> best regards,
> Ranier Vilela

Thanks for finding and patching this.  I think you're absolutely
right that there's a latent bug here, and your fix is appropriate.
However, I wanted to share some thoughts on why this has probably
never been reported in the wild, and why we should apply the patch
anyway.

The crash requires a specific and rather unlikely combination of
circumstances:

1. You need a locale where lconv->thousands_sep is present but
   empty.  Most locales either provide a real separator (",", ".",
   or " ") or return NULL (in which case NUM_prepare_locale's
   fallback logic provides a default).  The comments mention
   "broken glibc pt_BR" as an example, but even that's been fixed
   in most glibc versions for years now.

2. You need a format string containing 'G', which requests
   insertion of the locale's thousands separator.

Here's the thing: if your locale has no thousands separator, why
would you write a format string asking for one?  Consider:

    -- In a locale with thousands_sep = ","
    SELECT to_char(1234567.89, '9G999G999.99');
    -- Result: "1,234,567.89"

    -- In a locale with empty thousands_sep
    SELECT to_char(1234567.89, '9G999G999.99');
    -- Result: "1234567.89"  (the G's insert nothing)

If you're working in a locale that doesn't have a thousands
separator, you're most likely culturally aware of that fact,
and you simply wouldn't include 'G' in your format strings.
It would be like writing code to format something your locale
doesn't have a concept of.

So I think the bug has survived this long because:
  a) Very few locales have truly empty thousands_sep strings
  b) Users of those locales naturally avoid the 'G' format code
  c) Even accidental use would require the right fill-mode
     settings

That said, in my opinion we should definitely apply your patch,
for several reasons:

First, it's undefined behavior.  Backing up the pointer via
"ptr += -1" when ptr is at the start of allocated memory is a
serious bug, regardless of how unlikely the trigger condition is.
We don't want that lurking in the codebase.

Second, even if this hasn't been triggered in 20+ years, it's
exactly the kind of thing that AFL or similar tools might find.
Better to fix it before we get a CVE filing.

Third, it's a trivial fix.  The performance impact is nil, and it
makes the code more obviously correct.  There's really no
downside.

I'd suggest one minor adjustment to your patch: add a comment
explaining why we're checking for empty pattern, since it's
non-obvious:

    /*
     * Guard against empty separator (could happen with some
     * locales)
     */
    if (pattern_len > 0)
    {
        ...
    }

That'll help future maintainers understand this isn't just
defensive programming paranoia.

Good catch on finding this. We'll have to see if others agree...

Bryan Green



Re: Avoid overflow (src/backend/utils/adt/formatting.c)

От
Bryan Green
Дата:
On 11/2/2025 10:09 AM, Bryan Green wrote:
> On 11/2/2025 8:59 AM, Ranier Vilela wrote:
>> Hi.
>>
>> Per Coverity.
>>
>> Coverity raised the follow report:
>>
>> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
>> 37. overflow_const: Expression pattern_len - 1ULL, where pattern_len is 
>> known to be equal to 0, underflows the type of pattern_len - 1ULL, which 
>> is type unsigned long long.
>>
>> This is because the function *pg_mbstrlen* can return zero.
>> And the comment is clear, *just in case there are MB chars*.
>>
>> patch attached.
>>
>> best regards,
>> Ranier Vilela
> 
> Thanks for finding and patching this.  I think you're absolutely
> right that there's a latent bug here, and your fix is appropriate.
> However, I wanted to share some thoughts on why this has probably
> never been reported in the wild, and why we should apply the patch
> anyway.
> 
> The crash requires a specific and rather unlikely combination of
> circumstances:
> 
> 1. You need a locale where lconv->thousands_sep is present but
>    empty.  Most locales either provide a real separator (",", ".",
>    or " ") or return NULL (in which case NUM_prepare_locale's
>    fallback logic provides a default).  The comments mention
>    "broken glibc pt_BR" as an example, but even that's been fixed
>    in most glibc versions for years now.
> 
> 2. You need a format string containing 'G', which requests
>    insertion of the locale's thousands separator.
> 
> Here's the thing: if your locale has no thousands separator, why
> would you write a format string asking for one?  Consider:
> 
>     -- In a locale with thousands_sep = ","
>     SELECT to_char(1234567.89, '9G999G999.99');
>     -- Result: "1,234,567.89"
> 
>     -- In a locale with empty thousands_sep
>     SELECT to_char(1234567.89, '9G999G999.99');
>     -- Result: "1234567.89"  (the G's insert nothing)
> 
> If you're working in a locale that doesn't have a thousands
> separator, you're most likely culturally aware of that fact,
> and you simply wouldn't include 'G' in your format strings.
> It would be like writing code to format something your locale
> doesn't have a concept of.
> 
> So I think the bug has survived this long because:
>   a) Very few locales have truly empty thousands_sep strings
>   b) Users of those locales naturally avoid the 'G' format code
>   c) Even accidental use would require the right fill-mode
>      settings
> 
> That said, in my opinion we should definitely apply your patch,
> for several reasons:
> 
> First, it's undefined behavior.  Backing up the pointer via
> "ptr += -1" when ptr is at the start of allocated memory is a
> serious bug, regardless of how unlikely the trigger condition is.
> We don't want that lurking in the codebase.
> 
> Second, even if this hasn't been triggered in 20+ years, it's
> exactly the kind of thing that AFL or similar tools might find.
> Better to fix it before we get a CVE filing.
> 
> Third, it's a trivial fix.  The performance impact is nil, and it
> makes the code more obviously correct.  There's really no
> downside.
> 
> I'd suggest one minor adjustment to your patch: add a comment
> explaining why we're checking for empty pattern, since it's
> non-obvious:
> 
>     /*
>      * Guard against empty separator (could happen with some
>      * locales)
>      */
>     if (pattern_len > 0)
>     {
>         ...
>     }
> 
> That'll help future maintainers understand this isn't just
> defensive programming paranoia.
> 
> Good catch on finding this. We'll have to see if others agree...
> 
> Bryan Green
I did a bit more analysis to double check myself and realized I missed
an important check:

NUM_prepare_locale() has a safety check:

    if (lconv->thousands_sep && *lconv->thousands_sep)
        Np->L_thousands_sep = lconv->thousands_sep;
    else if (strcmp(Np->decimal, ",") != 0)
        Np->L_thousands_sep = ",";
    else
        Np->L_thousands_sep = ".";

The "*lconv->thousands_sep" test specifically prevents empty
strings from being used, so Np->L_thousands_sep can never be empty in
production.

The patch is still trivial and it probably should still be applied.

Bryan Green



Re: Avoid overflow (src/backend/utils/adt/formatting.c)

От
Ranier Vilela
Дата:
Hi.

Em dom., 2 de nov. de 2025 às 13:09, Bryan Green <dbryan.green@gmail.com> escreveu:
On 11/2/2025 8:59 AM, Ranier Vilela wrote:
> Hi.
>
> Per Coverity.
>
> Coverity raised the follow report:
>
> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
> 37. overflow_const: Expression pattern_len - 1ULL, where pattern_len is
> known to be equal to 0, underflows the type of pattern_len - 1ULL, which
> is type unsigned long long.
>
> This is because the function *pg_mbstrlen* can return zero.
> And the comment is clear, *just in case there are MB chars*.
>
> patch attached.
>
> best regards,
> Ranier Vilela

Thanks for finding and patching this.  I think you're absolutely
right that there's a latent bug here, and your fix is appropriate.
However, I wanted to share some thoughts on why this has probably
never been reported in the wild, and why we should apply the patch
anyway.

The crash requires a specific and rather unlikely combination of
circumstances:

1. You need a locale where lconv->thousands_sep is present but
   empty.  Most locales either provide a real separator (",", ".",
   or " ") or return NULL (in which case NUM_prepare_locale's
   fallback logic provides a default).  The comments mention
   "broken glibc pt_BR" as an example, but even that's been fixed
   in most glibc versions for years now.

2. You need a format string containing 'G', which requests
   insertion of the locale's thousands separator.

Here's the thing: if your locale has no thousands separator, why
would you write a format string asking for one?  Consider:

    -- In a locale with thousands_sep = ","
    SELECT to_char(1234567.89, '9G999G999.99');
    -- Result: "1,234,567.89"

    -- In a locale with empty thousands_sep
    SELECT to_char(1234567.89, '9G999G999.99');
    -- Result: "1234567.89"  (the G's insert nothing)

If you're working in a locale that doesn't have a thousands
separator, you're most likely culturally aware of that fact,
and you simply wouldn't include 'G' in your format strings.
It would be like writing code to format something your locale
doesn't have a concept of.

So I think the bug has survived this long because:
  a) Very few locales have truly empty thousands_sep strings
  b) Users of those locales naturally avoid the 'G' format code
  c) Even accidental use would require the right fill-mode
     settings

That said, in my opinion we should definitely apply your patch,
for several reasons:

First, it's undefined behavior.  Backing up the pointer via
"ptr += -1" when ptr is at the start of allocated memory is a
serious bug, regardless of how unlikely the trigger condition is.
We don't want that lurking in the codebase.

Second, even if this hasn't been triggered in 20+ years, it's
exactly the kind of thing that AFL or similar tools might find.
Better to fix it before we get a CVE filing.

Third, it's a trivial fix.  The performance impact is nil, and it
makes the code more obviously correct.  There's really no
downside.

I'd suggest one minor adjustment to your patch: add a comment
explaining why we're checking for empty pattern, since it's
non-obvious:

    /*
     * Guard against empty separator (could happen with some
     * locales)
     */
    if (pattern_len > 0)
    {
        ...
    }

That'll help future maintainers understand this isn't just
defensive programming paranoia.

Good catch on finding this. We'll have to see if others agree...
Thanks for taking a look.

I really appreciate your suggestion for the comment,
but I believe it makes it harder to understand.

best regards,
Ranier Vilela

Re: Avoid overflow (src/backend/utils/adt/formatting.c)

От
Ranier Vilela
Дата:


Em dom., 2 de nov. de 2025 às 13:22, Bryan Green <dbryan.green@gmail.com> escreveu:
On 11/2/2025 10:09 AM, Bryan Green wrote:
> On 11/2/2025 8:59 AM, Ranier Vilela wrote:
>> Hi.
>>
>> Per Coverity.
>>
>> Coverity raised the follow report:
>>
>> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
>> 37. overflow_const: Expression pattern_len - 1ULL, where pattern_len is
>> known to be equal to 0, underflows the type of pattern_len - 1ULL, which
>> is type unsigned long long.
>>
>> This is because the function *pg_mbstrlen* can return zero.
>> And the comment is clear, *just in case there are MB chars*.
>>
>> patch attached.
>>
>> best regards,
>> Ranier Vilela
>
> Thanks for finding and patching this.  I think you're absolutely
> right that there's a latent bug here, and your fix is appropriate.
> However, I wanted to share some thoughts on why this has probably
> never been reported in the wild, and why we should apply the patch
> anyway.
>
> The crash requires a specific and rather unlikely combination of
> circumstances:
>
> 1. You need a locale where lconv->thousands_sep is present but
>    empty.  Most locales either provide a real separator (",", ".",
>    or " ") or return NULL (in which case NUM_prepare_locale's
>    fallback logic provides a default).  The comments mention
>    "broken glibc pt_BR" as an example, but even that's been fixed
>    in most glibc versions for years now.
>
> 2. You need a format string containing 'G', which requests
>    insertion of the locale's thousands separator.
>
> Here's the thing: if your locale has no thousands separator, why
> would you write a format string asking for one?  Consider:
>
>     -- In a locale with thousands_sep = ","
>     SELECT to_char(1234567.89, '9G999G999.99');
>     -- Result: "1,234,567.89"
>
>     -- In a locale with empty thousands_sep
>     SELECT to_char(1234567.89, '9G999G999.99');
>     -- Result: "1234567.89"  (the G's insert nothing)
>
> If you're working in a locale that doesn't have a thousands
> separator, you're most likely culturally aware of that fact,
> and you simply wouldn't include 'G' in your format strings.
> It would be like writing code to format something your locale
> doesn't have a concept of.
>
> So I think the bug has survived this long because:
>   a) Very few locales have truly empty thousands_sep strings
>   b) Users of those locales naturally avoid the 'G' format code
>   c) Even accidental use would require the right fill-mode
>      settings
>
> That said, in my opinion we should definitely apply your patch,
> for several reasons:
>
> First, it's undefined behavior.  Backing up the pointer via
> "ptr += -1" when ptr is at the start of allocated memory is a
> serious bug, regardless of how unlikely the trigger condition is.
> We don't want that lurking in the codebase.
>
> Second, even if this hasn't been triggered in 20+ years, it's
> exactly the kind of thing that AFL or similar tools might find.
> Better to fix it before we get a CVE filing.
>
> Third, it's a trivial fix.  The performance impact is nil, and it
> makes the code more obviously correct.  There's really no
> downside.
>
> I'd suggest one minor adjustment to your patch: add a comment
> explaining why we're checking for empty pattern, since it's
> non-obvious:
>
>     /*
>      * Guard against empty separator (could happen with some
>      * locales)
>      */
>     if (pattern_len > 0)
>     {
>         ...
>     }
>
> That'll help future maintainers understand this isn't just
> defensive programming paranoia.
>
> Good catch on finding this. We'll have to see if others agree...
>
> Bryan Green
I did a bit more analysis to double check myself and realized I missed
an important check:

NUM_prepare_locale() has a safety check:

    if (lconv->thousands_sep && *lconv->thousands_sep)
        Np->L_thousands_sep = lconv->thousands_sep;
    else if (strcmp(Np->decimal, ",") != 0)
        Np->L_thousands_sep = ",";
    else
        Np->L_thousands_sep = ".";

The "*lconv->thousands_sep" test specifically prevents empty
strings from being used, so Np->L_thousands_sep can never be empty in
production.
The question is that the *pattern* is not empty.
But that can not contain MB characters.
In this case, not contain any MB chars, pg_mbstrlen
will return zero and overflow can happen.

best regards,
Ranier Vilela

Re: Avoid overflow (src/backend/utils/adt/formatting.c)

От
Bryan Green
Дата:
On 11/2/2025 6:34 PM, Ranier Vilela wrote:
> 
> 
> Em dom., 2 de nov. de 2025 às 13:22, Bryan Green <dbryan.green@gmail.com
> <mailto:dbryan.green@gmail.com>> escreveu:
> 
>     On 11/2/2025 10:09 AM, Bryan Green wrote:
>     > On 11/2/2025 8:59 AM, Ranier Vilela wrote:
>     >> Hi.
>     >>
>     >> Per Coverity.
>     >>
>     >> Coverity raised the follow report:
>     >>
>     >> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
>     >> 37. overflow_const: Expression pattern_len - 1ULL, where
>     pattern_len is
>     >> known to be equal to 0, underflows the type of pattern_len -
>     1ULL, which
>     >> is type unsigned long long.
>     >>
>     >> This is because the function *pg_mbstrlen* can return zero.
>     >> And the comment is clear, *just in case there are MB chars*.
>     >>
>     >> patch attached.
>     >>
>     >> best regards,
>     >> Ranier Vilela
>     >
>     > Thanks for finding and patching this.  I think you're absolutely
>     > right that there's a latent bug here, and your fix is appropriate.
>     > However, I wanted to share some thoughts on why this has probably
>     > never been reported in the wild, and why we should apply the patch
>     > anyway.
>     >
>     > The crash requires a specific and rather unlikely combination of
>     > circumstances:
>     >
>     > 1. You need a locale where lconv->thousands_sep is present but
>     >    empty.  Most locales either provide a real separator (",", ".",
>     >    or " ") or return NULL (in which case NUM_prepare_locale's
>     >    fallback logic provides a default).  The comments mention
>     >    "broken glibc pt_BR" as an example, but even that's been fixed
>     >    in most glibc versions for years now.
>     >
>     > 2. You need a format string containing 'G', which requests
>     >    insertion of the locale's thousands separator.
>     >
>     > Here's the thing: if your locale has no thousands separator, why
>     > would you write a format string asking for one?  Consider:
>     >
>     >     -- In a locale with thousands_sep = ","
>     >     SELECT to_char(1234567.89, '9G999G999.99');
>     >     -- Result: "1,234,567.89"
>     >
>     >     -- In a locale with empty thousands_sep
>     >     SELECT to_char(1234567.89, '9G999G999.99');
>     >     -- Result: "1234567.89"  (the G's insert nothing)
>     >
>     > If you're working in a locale that doesn't have a thousands
>     > separator, you're most likely culturally aware of that fact,
>     > and you simply wouldn't include 'G' in your format strings.
>     > It would be like writing code to format something your locale
>     > doesn't have a concept of.
>     >
>     > So I think the bug has survived this long because:
>     >   a) Very few locales have truly empty thousands_sep strings
>     >   b) Users of those locales naturally avoid the 'G' format code
>     >   c) Even accidental use would require the right fill-mode
>     >      settings
>     >
>     > That said, in my opinion we should definitely apply your patch,
>     > for several reasons:
>     >
>     > First, it's undefined behavior.  Backing up the pointer via
>     > "ptr += -1" when ptr is at the start of allocated memory is a
>     > serious bug, regardless of how unlikely the trigger condition is.
>     > We don't want that lurking in the codebase.
>     >
>     > Second, even if this hasn't been triggered in 20+ years, it's
>     > exactly the kind of thing that AFL or similar tools might find.
>     > Better to fix it before we get a CVE filing.
>     >
>     > Third, it's a trivial fix.  The performance impact is nil, and it
>     > makes the code more obviously correct.  There's really no
>     > downside.
>     >
>     > I'd suggest one minor adjustment to your patch: add a comment
>     > explaining why we're checking for empty pattern, since it's
>     > non-obvious:
>     >
>     >     /*
>     >      * Guard against empty separator (could happen with some
>     >      * locales)
>     >      */
>     >     if (pattern_len > 0)
>     >     {
>     >         ...
>     >     }
>     >
>     > That'll help future maintainers understand this isn't just
>     > defensive programming paranoia.
>     >
>     > Good catch on finding this. We'll have to see if others agree...
>     >
>     > Bryan Green
>     I did a bit more analysis to double check myself and realized I missed
>     an important check:
> 
>     NUM_prepare_locale() has a safety check:
> 
>         if (lconv->thousands_sep && *lconv->thousands_sep)
>             Np->L_thousands_sep = lconv->thousands_sep;
>         else if (strcmp(Np->decimal, ",") != 0)
>             Np->L_thousands_sep = ",";
>         else
>             Np->L_thousands_sep = ".";
> 
>     The "*lconv->thousands_sep" test specifically prevents empty
>     strings from being used, so Np->L_thousands_sep can never be empty in
>     production.
> 
> The question is that the *pattern* is not empty.
> But that can not contain MB characters.
> In this case, not contain any MB chars, pg_mbstrlen
> will return zero and overflow can happen.
> 
> best regards,
> Ranier Vilela

You're right, and I was wrong. I got focused on whether the string could
be empty at the byte level and completely missed your actual point.

pg_mbstrlen() returns character count, not byte count. So even though
NUM_prepare_locale() ensures L_thousands_sep isn't an empty string, if
those bytes aren't valid in the database encoding, pg_mbstrlen() will
return 0. Then we do:

    Np->inout_p += pattern_len - 1;

which is undefined behavior when pattern_len is 0.

I tried to reproduce this with various locale configurations and couldn't
trigger it. Tested Portuguese_Brazil, German_Germany, different encoding
combinations - everything worked fine. Modern locales use ASCII-safe
separators like "," or "." that work everywhere. You'd need a genuinely
broken locale to get bytes that are invalid UTF-8.

Still, undefined behavior is undefined behavior. The fix is simple,
makes the code more robust, quietens Coverity. I think we
should apply it on principle even if nobody's likely to hit it in
practice. If someone's got locale data that broken, they've got bigger
problems anyway.

I'd call this a code hardening fix rather than an urgent bug. Not
something that needs immediate backpatching, but reasonable to include
in normal maintenance releases?

Bryan



Re: Avoid overflow (src/backend/utils/adt/formatting.c)

От
Ranier Vilela
Дата:


Em seg., 3 de nov. de 2025 às 07:45, Bryan Green <dbryan.green@gmail.com> escreveu:
On 11/2/2025 6:34 PM, Ranier Vilela wrote:
>
>
> Em dom., 2 de nov. de 2025 às 13:22, Bryan Green <dbryan.green@gmail.com
> <mailto:dbryan.green@gmail.com>> escreveu:
>
>     On 11/2/2025 10:09 AM, Bryan Green wrote:
>     > On 11/2/2025 8:59 AM, Ranier Vilela wrote:
>     >> Hi.
>     >>
>     >> Per Coverity.
>     >>
>     >> Coverity raised the follow report:
>     >>
>     >> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
>     >> 37. overflow_const: Expression pattern_len - 1ULL, where
>     pattern_len is
>     >> known to be equal to 0, underflows the type of pattern_len -
>     1ULL, which
>     >> is type unsigned long long.
>     >>
>     >> This is because the function *pg_mbstrlen* can return zero.
>     >> And the comment is clear, *just in case there are MB chars*.
>     >>
>     >> patch attached.
>     >>
>     >> best regards,
>     >> Ranier Vilela
>     >
>     > Thanks for finding and patching this.  I think you're absolutely
>     > right that there's a latent bug here, and your fix is appropriate.
>     > However, I wanted to share some thoughts on why this has probably
>     > never been reported in the wild, and why we should apply the patch
>     > anyway.
>     >
>     > The crash requires a specific and rather unlikely combination of
>     > circumstances:
>     >
>     > 1. You need a locale where lconv->thousands_sep is present but
>     >    empty.  Most locales either provide a real separator (",", ".",
>     >    or " ") or return NULL (in which case NUM_prepare_locale's
>     >    fallback logic provides a default).  The comments mention
>     >    "broken glibc pt_BR" as an example, but even that's been fixed
>     >    in most glibc versions for years now.
>     >
>     > 2. You need a format string containing 'G', which requests
>     >    insertion of the locale's thousands separator.
>     >
>     > Here's the thing: if your locale has no thousands separator, why
>     > would you write a format string asking for one?  Consider:
>     >
>     >     -- In a locale with thousands_sep = ","
>     >     SELECT to_char(1234567.89, '9G999G999.99');
>     >     -- Result: "1,234,567.89"
>     >
>     >     -- In a locale with empty thousands_sep
>     >     SELECT to_char(1234567.89, '9G999G999.99');
>     >     -- Result: "1234567.89"  (the G's insert nothing)
>     >
>     > If you're working in a locale that doesn't have a thousands
>     > separator, you're most likely culturally aware of that fact,
>     > and you simply wouldn't include 'G' in your format strings.
>     > It would be like writing code to format something your locale
>     > doesn't have a concept of.
>     >
>     > So I think the bug has survived this long because:
>     >   a) Very few locales have truly empty thousands_sep strings
>     >   b) Users of those locales naturally avoid the 'G' format code
>     >   c) Even accidental use would require the right fill-mode
>     >      settings
>     >
>     > That said, in my opinion we should definitely apply your patch,
>     > for several reasons:
>     >
>     > First, it's undefined behavior.  Backing up the pointer via
>     > "ptr += -1" when ptr is at the start of allocated memory is a
>     > serious bug, regardless of how unlikely the trigger condition is.
>     > We don't want that lurking in the codebase.
>     >
>     > Second, even if this hasn't been triggered in 20+ years, it's
>     > exactly the kind of thing that AFL or similar tools might find.
>     > Better to fix it before we get a CVE filing.
>     >
>     > Third, it's a trivial fix.  The performance impact is nil, and it
>     > makes the code more obviously correct.  There's really no
>     > downside.
>     >
>     > I'd suggest one minor adjustment to your patch: add a comment
>     > explaining why we're checking for empty pattern, since it's
>     > non-obvious:
>     >
>     >     /*
>     >      * Guard against empty separator (could happen with some
>     >      * locales)
>     >      */
>     >     if (pattern_len > 0)
>     >     {
>     >         ...
>     >     }
>     >
>     > That'll help future maintainers understand this isn't just
>     > defensive programming paranoia.
>     >
>     > Good catch on finding this. We'll have to see if others agree...
>     >
>     > Bryan Green
>     I did a bit more analysis to double check myself and realized I missed
>     an important check:
>
>     NUM_prepare_locale() has a safety check:
>
>         if (lconv->thousands_sep && *lconv->thousands_sep)
>             Np->L_thousands_sep = lconv->thousands_sep;
>         else if (strcmp(Np->decimal, ",") != 0)
>             Np->L_thousands_sep = ",";
>         else
>             Np->L_thousands_sep = ".";
>
>     The "*lconv->thousands_sep" test specifically prevents empty
>     strings from being used, so Np->L_thousands_sep can never be empty in
>     production.
>
> The question is that the *pattern* is not empty.
> But that can not contain MB characters.
> In this case, not contain any MB chars, pg_mbstrlen
> will return zero and overflow can happen.
>
> best regards,
> Ranier Vilela

You're right, and I was wrong. I got focused on whether the string could
be empty at the byte level and completely missed your actual point.

pg_mbstrlen() returns character count, not byte count. So even though
NUM_prepare_locale() ensures L_thousands_sep isn't an empty string, if
those bytes aren't valid in the database encoding, pg_mbstrlen() will
return 0. Then we do:

    Np->inout_p += pattern_len - 1;

which is undefined behavior when pattern_len is 0.

I tried to reproduce this with various locale configurations and couldn't
trigger it. Tested Portuguese_Brazil, German_Germany, different encoding
combinations - everything worked fine. Modern locales use ASCII-safe
separators like "," or "." that work everywhere. You'd need a genuinely
broken locale to get bytes that are invalid UTF-8.
Yeah. Or a broken locale maliciously done.
 

Still, undefined behavior is undefined behavior. The fix is simple,
makes the code more robust, quietens Coverity. I think we
should apply it on principle even if nobody's likely to hit it in
practice. If someone's got locale data that broken, they've got bigger
problems anyway.

I'd call this a code hardening fix rather than an urgent bug. Not
something that needs immediate backpatching, but reasonable to include
in normal maintenance releases?
I agree that backpatching is not necessary.
 
Thank you for your attention and details.

best regards,
Ranier Vilela