Thread: Fix and improve allocation formulas

Fix and improve allocation formulas

From
Bertrand Drouvot
Date:
Hi hackers,

Two allocation formulas have been fixed recently in 3f83de20ba2 and 06761b6096b,
so I looked for potential others with a coccinelle script [1].

It found two formulas that are technically correct, but for which GBT_VARKEY
and char would be the semantically appropriate types (see 0001 attached).

Also, to make this safer, instead of:

"
var = palloc(sizeof(T) * count)
"

we could do:

"
var = palloc(sizeof(*var) * count)
"

That way, the size computation stays correct even if the variable's type
changes, which makes it less prone to errors and bugs.
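
A minimal sketch of the difference, with malloc standing in for palloc;
counter_t and the function names are hypothetical and only serve the
illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type; imagine it was int32_t before a refactoring. */
typedef int64_t counter_t;

/* Size spelled with a type name: silently stale after the refactoring. */
static size_t
bytes_with_type_name(size_t count)
{
    return sizeof(int32_t) * count;
}

/* Size taken from the variable: follows whatever counter_t currently is. */
static size_t
bytes_with_variable(size_t count)
{
    counter_t  *counters;       /* only used as a sizeof operand */

    return sizeof(*counters) * count;
}

/* Allocation written in the sizeof(*var) style (malloc stands in for palloc). */
static counter_t *
alloc_counters(size_t count)
{
    counter_t  *counters = malloc(sizeof(*counters) * count);

    return counters;
}
```

With the stale type name the request is half of what the array needs, while
the variable-based form needs no edit when the typedef changes.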

That would give something like in 0002 (produced with [2]).

Note that:

- 0002 is a very large patch. I think that it provides added value as mentioned
above but I'm not sure it is worth the noise. Anyway it is done, so sharing
here to get your thoughts.

- sizeof(*var) does not evaluate its operand (the size is determined at compile
time, except for variable-length arrays), so it is safe even with
uninitialized pointers

- this is the preferred form for the Linux kernel (see "Allocating memory" in the
coding style doc [3])

- when a cast is involved, it might look odd to keep the cast but not compute
the size from the type. I have mixed feelings about those cases, even though I
think that having a consistent approach is right.
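
To illustrate the point about uninitialized pointers, here is a small sketch
(struct item and the function name are hypothetical). The pointer is never
read or dereferenced; only its declared type matters:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct used only for this illustration. */
struct item
{
    int         id;
    double      weight;
};

/* The operand of sizeof is not evaluated, so no dereference happens here. */
static size_t
item_size_from_uninitialized_pointer(void)
{
    struct item *p;             /* deliberately left uninitialized */

    return sizeof(*p);
}

/* The same fact checked at compile time, using a null pointer as operand. */
_Static_assert(sizeof(*(struct item *) 0) == sizeof(struct item),
               "sizeof(*ptr) depends only on the pointed-to type");
```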

Remarks:

- the patch does not touch the "test" files to reduce the noise
- we could do the same for:

"
var = palloc_array(T, count)
"

to

"
var = palloc_array(*var, count)
"

but that would not work because palloc_array is defined as:

#define palloc_array(type, count) ((type *) palloc(sizeof(type) * (count)))

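and the cast would fail: palloc_array(*var, count) would expand to a cast of
the form ((*var *) ...), which is not valid C. We could use typeof() in
palloc_array(), but that leads to the same discussion as in [4].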
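
A sketch of why the cast breaks and what a typeof-based variant could look
like. The names demo_alloc, demo_array and demo_array_of are hypothetical,
malloc stands in for palloc, and __typeof__ is the GCC/Clang spelling of the
extension, so this is an assumption about the toolchain, not a proposal:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for palloc(); hypothetical helper for this sketch only. */
static void *
demo_alloc(size_t size)
{
    return malloc(size);
}

/* Same shape as the palloc_array definition quoted above. */
#define demo_array(type, count) \
    ((type *) demo_alloc(sizeof(type) * (count)))

/* Hypothetical typeof-based variant: the cast type is derived from var. */
#define demo_array_of(var, count) \
    ((__typeof__(var)) demo_alloc(sizeof(*(var)) * (count)))

static int *
make_ints(size_t n)
{
    int        *v = demo_array(int, n);

    /*
     * demo_array(*v, n) would expand to
     * ((*v *) demo_alloc(sizeof(*v) * (n))), and "(*v *)" is not a type.
     */
    return v;
}

static double *
make_doubles(size_t n)
{
    double     *v = demo_array_of(v, n);    /* v appears only in typeof/sizeof */

    return v;
}
```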

Thoughts?

[1]: https://github.com/bdrouvot/coccinelle_on_pg/blob/main/misc/detect_sizeof_bugs.cocci
[2]: https://github.com/bdrouvot/coccinelle_on_pg/blob/main/misc/use_var_in_sizeof.cocci
[3]: https://www.kernel.org/doc/html/latest/process/coding-style.html
[4]: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Fix and improve allocation formulas

From
Andres Freund
Date:
Hi,

On 2025-12-11 13:27:56 +0000, Bertrand Drouvot wrote:
> - 0002 is a very large patch. I think that it provides added value as mentioned
> above but I'm not sure it is worth the noise. Anyway it is done, so sharing
> here to get your thoughts.

I find the recent trend of sending auto-generated huge patches to the list
... not great. I think there's practically zero chance of them getting applied
and it takes away mental bandwidth from stuff that has a chance.

I tend to agree that what you propose is the better style, but I seriously
doubt that

a) changing over everything at once is worth the backpatch hazard and review
   pain
b) that to judge whether we should do this a 277kB patch is useful
c) that changing the existing code should be the first step; if we want to
   make this the new style, we should first document that the sizeof(*var)
   approach is preferred.

Greetings,

Andres Freund



Re: Fix and improve allocation formulas

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I tend to agree that what you propose is the better style, but I seriously
> doubt that

> a) changing over everything at once is worth the backpatch hazard and review
>    pain
> b) that to judge whether we should do this a 277kB patch is useful
> c) that changing the existing code should be the first step; if we want to
>    make this the new style, we should first document that the sizeof(*var)
>    approach is preferred.

And before that, you'd have to get consensus that sizeof(*var) *is*
the preferred style.  I for one don't like it a bit.  IMO what it
mostly accomplishes is to remove a cue as to what we are allocating.
I don't agree that it removes a chance for error, either.  Sure,
if you write

    foo = palloc(sizeof(typeA))

when foo is of type typeB*, you made a mistake --- but we know how
to get the compiler to warn about such mistakes, and indeed the
main point of the palloc_object() changes was to catch those.
However, suppose you write

    foo = palloc(sizeof(*bar))

I claim that's about an equally credible typo, and there is
nothing that will detect it.
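
A sketch of the two cases, under stated assumptions: SmallThing, BigThing and
demo_object are hypothetical, and malloc stands in for palloc. The
wrong-variable form compiles without any diagnostic even though it requests
too few bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types of clearly different sizes. */
typedef struct SmallThing
{
    char        tag;
} SmallThing;

typedef struct BigThing
{
    char        tag;
    long        payload[8];
} BigThing;

/* Cast-carrying macro in the style of palloc_object(). */
#define demo_object(type) ((type *) malloc(sizeof(type)))

static BigThing *
make_big(void)
{
    /*
     * Writing demo_object(SmallThing) here would draw an
     * incompatible-pointer-types diagnostic from the compiler.
     */
    BigThing   *foo = demo_object(BigThing);

    return foo;
}

/* Bytes missing when the wrong variable is named inside sizeof. */
static size_t
shortfall_from_typo(void)
{
    BigThing   *foo;            /* sizeof operands only, never dereferenced */
    SmallThing *bar;
    size_t      needed = sizeof(*foo);
    size_t      requested = sizeof(*bar);   /* the typo: bar instead of foo */

    return needed - requested;
}
```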

            regards, tom lane



Re: Fix and improve allocation formulas

From
Álvaro Herrera
Date:
On 2025-Dec-11, Andres Freund wrote:

> a) changing over everything at once is worth the backpatch hazard and review
>    pain

The other issue with these giant patches is that they cause many largish
patches waiting in the commitfest process to require rebases, which are
sometimes not trivial to do.  Also, all the Postgres forks will
require tedious merges later on.

I share part of the blame for having committed the mass change to
XLogRecPtrIsValid in a2b02293bc65.  I'm starting to regret that now.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Digital and video cameras have this adjustment and film cameras don't for the
same reason dogs and cats lick themselves: because they can."   (Ken Rockwell)



Re: Fix and improve allocation formulas

From
Bertrand Drouvot
Date:
Hi,

On Thu, Dec 11, 2025 at 10:39:55AM -0500, Andres Freund wrote:
> Hi,
> 
> On 2025-12-11 13:27:56 +0000, Bertrand Drouvot wrote:
> > - 0002 is a very large patch. I think that it provides added value as mentioned
> > above but I'm not sure it is worth the noise. Anyway it is done, so sharing
> > here to get your thoughts.
> 
> I find the recent trend of sending auto-generated huge patches to the list
> ... not great. I think there's practically zero chance of them getting applied
> and it takes away mental bandwidth from stuff that has a chance.
> 
> I tend to agree that what you propose is the better style, but I seriously
> doubt that
> 
> a) changing over everything at once is worth the backpatch hazard and review
>    pain
> b) that to judge whether we should do this a 277kB patch is useful

Yeah, I agree that it's almost impossible to review such big patches. The idea
was to show the impact rather than to have it applied as is.

That said, when a patch needs to modify a large amount of code and that is
worth it (not saying it is the case in the current thread), we could think of an
approach like modifying 20 files per patch and applying, say, 10 such patches at
a rate of one per month.

I think that most of the time those patches are mainly about refactoring to improve
the code so I don't think that's an issue if it takes a year or so to have all the
sub-patches applied.

We could discuss the approach in more depth if another use case shows up (the
approach would probably also depend on the use case).

Regards,

-- 
Bertrand Drouvot



Re: Fix and improve allocation formulas

From
Bertrand Drouvot
Date:
Hi,

On Thu, Dec 11, 2025 at 05:56:13PM +0100, Álvaro Herrera wrote:
> I share part of the blame for having committed the mass change to
> XLogRecPtrIsValid in a2b02293bc65.  I'm starting to regret that now.

After reflecting on it, I agree that this one was probably not worth the mass
change.

Regards,

-- 
Bertrand Drouvot



Re: Fix and improve allocation formulas

From
Bertrand Drouvot
Date:
Hi,

On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I tend to agree that what you propose is the better style, but I seriously
> > doubt that
> 
> > a) changing over everything at once is worth the backpatch hazard and review
> >    pain
> > b) that to judge whether we should do this a 277kB patch is useful
> > c) that changing the existing code should be the first step; if we want to
> >    make this the new style, we should first document that the sizeof(*var)
> >    approach is preferred.
> 
> And before that, you'd have to get consensus that sizeof(*var) *is*
> the preferred style.  I for one don't like it a bit.  IMO what it
> mostly accomplishes is to remove a cue as to what we are allocating.
> I don't agree that it removes a chance for error, either.  Sure,
> if you write
> 
>     foo = palloc(sizeof(typeA))
> 
> when foo is of type typeB*, you made a mistake --- but we know how
> to get the compiler to warn about such mistakes, and indeed the
> main point of the palloc_object() changes was to catch those.

Right, thanks to the cast in palloc_object()/palloc_array(), which produces
-Wincompatible-pointer-types or -Wpointer-sign warnings in most cases.

Still, that does not protect against mistakes that are only semantically wrong
(the two types below are both typedefs of uint32, so the compiler sees no
mismatch), say:

TransactionId *xids = palloc_array(CommandId, 100);

That's not a major concern though.
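
A sketch of why the cast cannot help in that case. The typedefs below are
stand-ins that mirror how both types are 32-bit unsigned integers in
PostgreSQL; demo_array is a hypothetical copy of the macro shape with malloc
in place of palloc:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in typedefs: both names alias the same underlying integer type. */
typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* Same shape as palloc_array(), with malloc standing in for palloc. */
#define demo_array(type, count) ((type *) malloc(sizeof(type) * (count)))

/* To the compiler, CommandId * and TransactionId * are the same type. */
_Static_assert(_Generic((CommandId *) 0, TransactionId *: 1, default: 0),
               "typedef aliases are not distinct types");

static TransactionId *
alloc_xids_with_wrong_type(size_t n)
{
    /* Semantically wrong element type, yet no diagnostic is possible. */
    TransactionId *xids = demo_array(CommandId, n);

    return xids;
}
```

The allocation happens to have the right size here, which is why the mistake
is only a readability problem as long as the two typedefs stay identical.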

> However, suppose you write
> 
>     foo = palloc(sizeof(*bar))

We could imagine a macro like:

#define palloc_set_var(var, count) \
    ((var) = palloc((count) * sizeof(*(var))))

to prevent those typos, but that's useless if we remove all those palloc
calls and adopt palloc_object() and palloc_array() usage instead.
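
For illustration, the macro above could be used as follows. This is only a
sketch: demo_set_var mirrors palloc_set_var with malloc standing in for
palloc, and Point and make_points are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for palloc(); hypothetical helper for this sketch only. */
static void *
demo_alloc(size_t size)
{
    return malloc(size);
}

/* Same shape as palloc_set_var: the variable is named exactly once. */
#define demo_set_var(var, count) \
    ((var) = demo_alloc((count) * sizeof(*(var))))

typedef struct Point
{
    double      x;
    double      y;
} Point;

static Point *
make_points(size_t n)
{
    Point      *pts;

    /* No second variable name to mistype inside sizeof. */
    demo_set_var(pts, n);
    return pts;
}
```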

Regards,

-- 
Bertrand Drouvot



Re: Fix and improve allocation formulas

From
Michael Paquier
Date:
On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote:
> And before that, you'd have to get consensus that sizeof(*var) *is*
> the preferred style.  I for one don't like it a bit.  IMO what it
> mostly accomplishes is to remove a cue as to what we are allocating.
> I don't agree that it removes a chance for error, either.  Sure,
> if you write
>
>     foo = palloc(sizeof(typeA))
>
> when foo is of type typeB*, you made a mistake --- but we know how
> to get the compiler to warn about such mistakes, and indeed the
> main point of the palloc_object() changes was to catch those.
> However, suppose you write
>
>     foo = palloc(sizeof(*bar))
>
> I claim that's about an equally credible typo, and there is
> nothing that will detect it.

Yeah, I'd prefer something where we keep track of the type, with the
extra layer that enforces a cast to the type of the variable like
palloc_object/array macros.  The latter style of specifying a variable
pointer within the sizeof is more error-prone long-term, so it's not
something I think we should encourage.
--
Michael
