Discussion: Fix and improve allocation formulas
Hi hackers,

Two allocation formulas have been fixed recently in 3f83de20ba2 and 06761b6096b, so I looked for potential others with a coccinelle script [1].

It found two formulas that are technically correct, but for which GBT_VARKEY and char are the semantically appropriate choices (see 0001 attached).

Also, to make this safer, instead of:

    var = palloc(sizeof(T) * count)

we could do:

    var = palloc(sizeof(*var) * count)

That way the size computation stays correct even if the variable's type changes, which makes it less prone to errors and bugs.

That would give something like 0002 (produced with [2]).

Note that:

- 0002 is a very large patch. I think that it provides added value as mentioned above but I'm not sure it is worth the noise. Anyway it is done, so sharing here to get your thoughts.
- sizeof(*var) is evaluated at compile time so that's safe even with uninitialized pointers
- this is the preferred form for the Linux kernel (see "Allocating memory" in the coding style doc [3])
- when there is casting involved, it might look weird to have the cast and not compute the size on the "type". So, I have mixed feelings about those even if I think it's right to have a consistent approach.

Remarks:

- the patch does not touch the "test" files to reduce the noise
- we could do the same for:

      var = palloc_array(T, count)

  to

      var = palloc_array(*var, count)

  but that would not work because palloc_array is defined as:

      #define palloc_array(type, count) ((type *) palloc(sizeof(type) * (count)))

  and the cast would fail. We could use typeof() in palloc_array() but that leads to the same discussion as in [4].

Thoughts?
[1]: https://github.com/bdrouvot/coccinelle_on_pg/blob/main/misc/detect_sizeof_bugs.cocci
[2]: https://github.com/bdrouvot/coccinelle_on_pg/blob/main/misc/use_var_in_sizeof.cocci
[3]: https://www.kernel.org/doc/html/latest/process/coding-style.html
[4]: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
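To make the two forms from the message above concrete, here is a minimal sketch. It assumes plain malloc() as a stand-in for palloc() and uses a hypothetical Item struct; neither is taken from the attached patches. It only illustrates that sizeof(*var) follows the declared type of the variable and does not read the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type, used only for this illustration. */
typedef struct Item
{
    int64_t key;
    int64_t value;
} Item;

/* Form 1: the element type is spelled out and must be kept in sync by hand. */
Item *
alloc_items_by_type(size_t count)
{
    /* malloc() stands in for palloc() here (an assumption of this sketch). */
    Item *items = malloc(sizeof(Item) * count);

    return items;
}

/* Form 2: the size follows whatever type "items" is declared with. */
Item *
alloc_items_by_var(size_t count)
{
    Item *items;    /* deliberately not initialized at this point */

    /* sizeof does not evaluate its operand here, so *items is never read. */
    assert(sizeof(*items) == sizeof(Item));
    items = malloc(sizeof(*items) * count);

    return items;
}
```

If the declaration of items were later changed to another pointer type, only the second function would keep requesting the right number of bytes without a further edit.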
Hi,

On 2025-12-11 13:27:56 +0000, Bertrand Drouvot wrote:
> - 0002 is a very large patch. I think that it provides added value as mentioned
> above but I'm not sure it is worth the noise. Anyway it is done, so sharing
> here to get your thoughts.

I find the recent trend to send auto-generated huge patches to the list ... not great. I think there's practically zero chance of them getting applied and it takes away mental bandwidth from stuff that has a chance.

I tend to agree that what you propose is the better style, but I seriously doubt that

a) changing over everything at once is worth the backpatch hazard and review pain
b) that to judge whether we should do this a 277kB patch is useful
c) that changing the existing code should be the first thing, if we want to make this the new style, we should first document the sizeof(*var) approach to be preferred.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I tend to agree that what you propose is the better style, but I seriously
> doubt that
> a) changing over everything at once is worth the backpatch hazard and review
> pain
> b) that to judge whether we should do this a 277kB patch is useful
> c) that changing the existing code should be the first thing, if we want to
> make this the new style, we should first document the sizeof(*var) approach to
> be preferred.
And before that, you'd have to get consensus that sizeof(*var) *is*
the preferred style. I for one don't like it a bit. IMO what it
mostly accomplishes is to remove a cue as to what we are allocating.
I don't agree that it removes a chance for error, either. Sure,
if you write
foo = palloc(sizeof(typeA))
when foo is of type typeB*, you made a mistake --- but we know how
to get the compiler to warn about such mistakes, and indeed the
main point of the palloc_object() changes was to catch those.
However, suppose you write
foo = palloc(sizeof(*bar))
I claim that's about an equally credible typo, and there is
nothing that will detect it.
regards, tom lane
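The contrast drawn in the message above can be sketched as follows. This is only an illustration under stated assumptions: my_alloc() is a hypothetical stand-in for palloc(), my_alloc_object() is a hypothetical macro with the same shape as the palloc_array definition quoted upthread, and Small and Large are made-up types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types of clearly different sizes. */
typedef struct Small { char tag; } Small;
typedef struct Large { double values[8]; } Large;

/* Stand-in for palloc(); an assumption, not PostgreSQL's allocator. */
void *
my_alloc(size_t size)
{
    return malloc(size);
}

/* Hypothetical typed wrapper: the cast ties the result to the named type. */
#define my_alloc_object(type) ((type *) my_alloc(sizeof(type)))

Large *
alloc_large_checked(void)
{
    /*
     * Writing "Large *foo = my_alloc_object(Small);" instead would assign
     * a Small * to a Large *, which compilers diagnose as an incompatible
     * pointer type.
     */
    return my_alloc_object(Large);
}

/* Returns the number of bytes requested by the mistyped allocation. */
size_t
typo_request_size(const Small *bar)
{
    Large *foo;

    /* The typo: sizeof(*foo) was meant, and sizeof(*bar) is too small. */
    assert(sizeof(*bar) < sizeof(*foo));

    /* void * converts to Large * silently, so type checking says nothing. */
    foo = my_alloc(sizeof(*bar));
    free(foo);      /* never dereferenced: the block cannot hold a Large */

    return sizeof(*bar);
}
```

In the first function the type name appears once and the cast checks it against the destination; in the second, the only link between the size and the destination is the variable name typed inside sizeof.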
On 2025-Dec-11, Andres Freund wrote:
> a) changing over everything at once is worth the backpatch hazard and review
> pain

The other issue with these giant patches is that they cause many largish patches waiting in the commitfest process to require rebases, which are sometimes not trivial to do. Also, all the Postgres forks will require tedious merges later on.

I have my part of blame for having committed the mass change to XLogRecPtrIsValid in a2b02293bc65. I'm starting to regret that now.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Digital and video cameras have this adjustment and film cameras don't for the same reason dogs and cats lick themselves: because they can." (Ken Rockwell)
Hi,

On Thu, Dec 11, 2025 at 10:39:55AM -0500, Andres Freund wrote:
> Hi,
>
> On 2025-12-11 13:27:56 +0000, Bertrand Drouvot wrote:
> > - 0002 is a very large patch. I think that it provides added value as mentioned
> > above but I'm not sure it is worth the noise. Anyway it is done, so sharing
> > here to get your thoughts.
>
> I find the recent trend to send auto-generated huge patches to the list
> ... not great. I think there's practically zero chance of them getting applied
> and it takes away mental bandwidth from stuff that has a chance.
>
> I tend to agree that what you propose is the better style, but I seriously
> doubt that
>
> a) changing over everything at once is worth the backpatch hazard and review
> pain
> b) that to judge whether we should do this a 277kB patch is useful

Yeah I agree that it's almost impossible to review such big patches. The idea was more to show the impact rather than thinking it would be applied as it is.

That said, when a patch needs to modify a large amount of code and when that's worth it (not saying it is the case in the current thread) we could think of an approach like modifying 20 files per patch and applying, say, the 10 patches at a frequency of one per month. I think that most of the time those patches are mainly about refactoring to improve the code so I don't think that's an issue if it takes a year or so to have all the sub-patches applied.

We could discuss the approach more in depth if another use case shows up (the approach would probably also depend on the use case).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,

On Thu, Dec 11, 2025 at 05:56:13PM +0100, Álvaro Herrera wrote:
> I have my part of blame for having committed the mass change to
> XLogRecPtrIsValid in a2b02293bc65. I'm starting to regret that now.

After reflecting on this one, I do agree that this one was probably not worth the mass changes.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I tend to agree that what you propose is the better style, but I seriously
> > doubt that
>
> > a) changing over everything at once is worth the backpatch hazard and review
> > pain
> > b) that to judge whether we should do this a 277kB patch is useful
> > c) that changing the existing code should be the first thing, if we want to
> > make this the new style, we should first document the sizeof(*var) approach to
> > be preferred.
>
> And before that, you'd have to get consensus that sizeof(*var) *is*
> the preferred style. I for one don't like it a bit. IMO what it
> mostly accomplishes is to remove a cue as to what we are allocating.
> I don't agree that it removes a chance for error, either. Sure,
> if you write
>
> foo = palloc(sizeof(typeA))
>
> when foo is of type typeB*, you made a mistake --- but we know how
> to get the compiler to warn about such mistakes, and indeed the
> main point of the palloc_object() changes was to catch those.
Right, thanks to the cast in palloc_object()/palloc_array() that produces
-Wincompatible-pointer-types or -Wpointer-sign warnings for most cases.
Still that does not protect against the ones that are semantically wrong, say:
TransactionId *xids = palloc_array(CommandId, 100);
That's not a major concern though.
> However, suppose you write
>
> foo = palloc(sizeof(*bar))
We could imagine a macro like:
#define palloc_set_var(var, count) \
((var) = palloc((count) * sizeof(*(var))))
to prevent those typos, but that's useless if we remove all those palloc
calls and adopt palloc_object() and palloc_array() usage instead.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
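The two points in the message above can be tried out with the following sketch. It is an illustration under stated assumptions, not PostgreSQL code: my_alloc() stands in for palloc(), my_alloc_array() mirrors the palloc_array definition quoted upthread, alloc_set_var() follows the palloc_set_var idea, and XactId and CmdId are hypothetical stand-ins, assuming TransactionId and CommandId are both typedefs of the same 32-bit unsigned integer type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for two distinct names of one underlying type. */
typedef uint32_t XactId;
typedef uint32_t CmdId;

/* Stand-in for palloc(); an assumption, not PostgreSQL's allocator. */
void *
my_alloc(size_t size)
{
    return malloc(size);
}

/* Same shape as the palloc_array definition quoted upthread. */
#define my_alloc_array(type, count) \
    ((type *) my_alloc(sizeof(type) * (count)))

/* Sketch of the palloc_set_var idea: the variable is named only once. */
#define alloc_set_var(var, count) \
    ((var) = my_alloc((count) * sizeof(*(var))))

XactId *
alloc_xids_with_wrong_name(size_t count)
{
    /*
     * Semantically wrong, yet it compiles without a diagnostic: CmdId and
     * XactId are the same type to the compiler, so the cast matches.
     */
    XactId *xids = my_alloc_array(CmdId, count);

    /* The size happens to be right only because both are aliases. */
    assert(sizeof(CmdId) == sizeof(XactId));

    return xids;
}

double *
alloc_doubles(size_t count)
{
    double *values;

    /* The assignment and the sizeof operand cannot name different variables. */
    alloc_set_var(values, count);

    return values;
}
```

The first function shows the limit of the cast-based check; the second shows that a macro taking the variable itself closes the sizeof(*bar) typo, at the cost of no longer naming the allocated type at the call site.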
On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote:
> And before that, you'd have to get consensus that sizeof(*var) *is*
> the preferred style. I for one don't like it a bit. IMO what it
> mostly accomplishes is to remove a cue as to what we are allocating.
> I don't agree that it removes a chance for error, either. Sure,
> if you write
>
> foo = palloc(sizeof(typeA))
>
> when foo is of type typeB*, you made a mistake --- but we know how
> to get the compiler to warn about such mistakes, and indeed the
> main point of the palloc_object() changes was to catch those.
> However, suppose you write
>
> foo = palloc(sizeof(*bar))
>
> I claim that's about an equally credible typo, and there is
> nothing that will detect it.

Yeah, I'd prefer something where we keep track of the type, with the extra layer that enforces a cast to the type of the variable like palloc_object/array macros. The latter style of specifying a variable pointer within the sizeof is more error-prone long-term, so it's not something I think we should encourage.

--
Michael