Обсуждение: Likely undefined behavior with some flexible arrays

Поиск
Список
Период
Сортировка

Likely undefined behavior with some flexible arrays

От
Andres Freund
Дата:
Hi,

I got a - I thought - spurious warning in a development patch. A simplified
reproducer of the warning is [1], which triggers:

<source>: In function 'trigger_warning':
<source>:19:9: warning: array subscript 'struct foo[0]' is partly outside array bounds of 'unsigned char[13]'
[-Warray-bounds=]
   19 |     foop->len = len;
      |         ^~
<source>:18:12: note: object of size 13 allocated by 'allocme'
   18 |     foop = allocme(offsetof(struct foo, data) + sizeof(char) * len);
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compiler returned: 0


This is a pattern that we have in quite a few places.


Note that the warning vanishes if the "flag" element is not present.  The
reason for that difference is that when the flag element is present, sizeof(
struct foo) is 16 on a 64bit platform, but we allocate only 13 bytes, as
offsetof(struct foo, data) is 12.


I asked the gcc folks whether they think the warning is spurious or whether
the code is broken. They think it's the code. Their argument is that it always
has to be legal to assign a struct foo * to a struct foo (i.e. do struct foo a
= *foop), and that that may access the trailing padding bytes.

Unfortunately, after staring at the C99 draft, I think the gcc folks are
right, and this undefined behavior. Although it's really hard to tell with the
C99 formulation.

I also looked at C23, and the examples (e.g. § 6.7.3.2, paragraph 28) do lend
some additional credence to the position of the gcc folks.


Which isn't great, given how widely we use offsetof(structtype, flex_member)
as the base allocation size.

The historic reason we do that is support for systems that didn't support real
flexible arrays, where we used one element arrays instead. In that case
sizeof(structtype) would obviously return too big a value.



I don't know how likely doing the wrong thing here is to lead to wrong code
generation, but relying on that not to happen doesn't seem great.


I've attached a list of structs that have a flexible array and padding,
generated with
  pahole -a --padding_ge=1 --with_flexible_array src/backend/postgres contrib/*/*.so
which of course is not exhaustive. And will differ between platforms.


One example of where we do this is gtrgm_alloc().


It's not entirely obvious to me what to do here. I think the minimum ought to
to not introduce more uses of offsetof() for flexible-array allocation sizing
in the future.  But just leaving the existing uses as-is seems a bit
scary. Fixign them all seems like a lot of fiddly work with some compatibility
hazards :(


Greetings,

Andres Freund

PS: I'm fairly sure that we also are entering UB territory when allocating
space for just the smallest member of a union, as we do a *lot* in numeric.c,
due to all the allocations using NUMERIC_HDRSZ_SHORT. But there's a lot more
standardese to parse to figure out the requirements.


[1]

See also https://godbolt.org/z/98jchG4Ex

/* compiled with gcc -O2 -Warray-bounds */
#include <stddef.h>

struct foo
{
    size_t len;
    /* warning is only emitted if the "flag" struct member is present */
    int flag;
    char data[];
};

extern void *allocme(size_t sz) __attribute__((alloc_size(1)));

struct foo* trigger_warning()
{
    size_t len = 1;
    struct foo *foop;

    foop = allocme(offsetof(struct foo, data) + sizeof(char) * len);
    foop->len = len;

    return foop;
}

Вложения

Re: Likely undefined behavior with some flexible arrays

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I got a - I thought - spurious warning in a development patch. A simplified
> reproducer of the warning is [1], which triggers:

> <source>: In function 'trigger_warning':
> <source>:19:9: warning: array subscript 'struct foo[0]' is partly outside array bounds of 'unsigned char[13]'
[-Warray-bounds=]
>    19 |     foop->len = len;
>       |         ^~
> <source>:18:12: note: object of size 13 allocated by 'allocme'
>    18 |     foop = allocme(offsetof(struct foo, data) + sizeof(char) * len);
>       |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Compiler returned: 0

Considering that palloc() is going to round up the request to a
maxalign boundary, I think the chances of actual trouble are
precisely zero.  However, if we start getting such warnings on
common compilers, maybe the way to fix it is to put the maxaligns
into the calls?

            regards, tom lane



Re: Likely undefined behavior with some flexible arrays

От
Andres Freund
Дата:
Hi,

On 2026-01-21 17:07:04 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I got a - I thought - spurious warning in a development patch. A simplified
> > reproducer of the warning is [1], which triggers:
> 
> > <source>: In function 'trigger_warning':
> > <source>:19:9: warning: array subscript 'struct foo[0]' is partly outside array bounds of 'unsigned char[13]'
[-Warray-bounds=]
> >    19 |     foop->len = len;
> >       |         ^~
> > <source>:18:12: note: object of size 13 allocated by 'allocme'
> >    18 |     foop = allocme(offsetof(struct foo, data) + sizeof(char) * len);
> >       |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > Compiler returned: 0
> 
> Considering that palloc() is going to round up the request to a
> maxalign boundary, I think the chances of actual trouble are
> precisely zero.

I am more worried about mis-optimizations than actually overflowing an
allocation or such. Although I guess we might eventually see things like
spurious valgrind overflow warnings, if the compiler decides to write to the
padding bytes for efficiency reasons.


> However, if we start getting such warnings on common compilers, maybe the
> way to fix it is to put the maxaligns into the calls?

The only reason we're not getting them widely right now is that we're
effectively hiding the allocation sizes from the compiler, because the
compiler doesn't currently know that palloc() allocates. It'd be nice to teach
the compile that palloc allocates, to a) get compiler warnings for things like
use-after-free b) warnings for things like access-beyond-allocation.

Greetings,

Andres Freund



Re: Likely undefined behavior with some flexible arrays

От
Andrey Borodin
Дата:

> On 22 Jan 2026, at 06:56, Andres Freund <andres@anarazel.de> wrote:
>
> It'd be nice to teach
> the compile that palloc allocates, to a) get compiler warnings for things like
> use-after-free b) warnings for things like access-beyond-allocation.

Is there any chance to teach a compiler about short lived memory contexts?


Best regards, Andrey Borodin.


Re: Likely undefined behavior with some flexible arrays

От
Andres Freund
Дата:
Hi,

On 2026-01-22 11:09:37 +0500, Andrey Borodin wrote:
> > On 22 Jan 2026, at 06:56, Andres Freund <andres@anarazel.de> wrote:
> > 
> > It'd be nice to teach
> > the compile that palloc allocates, to a) get compiler warnings for things like
> > use-after-free b) warnings for things like access-beyond-allocation.
> 
> Is there any chance to teach a compiler about short lived memory contexts?

I doubt that we can teach static analysis that anytime soon - I think you'd
need a compiler plugin for that. However I'd already be happy with getting
warnings for obvious stuff like using variables after being pfreed (even
indirectly) or running off the end of an allocation.

We certainly could improve the sanitizer integration with memory contexts, but
that obviously requires reaching the relevant paths in a problematic scenario
to be effective.

Greetings,

Andres Freund