Обсуждение: Fix and improve allocation formulas
Hi hackers, Two allocation formulas have been fixed recently in 3f83de20ba2 and 06761b6096b, so I looked for potential others with a coccinelle script [1]. It found two formulas that are technically correct, but using GBT_VARKEY and char are the semantically appropriate choices (see 0001 attached). Also, to make this safer, instead of: " var = palloc(sizeof(T) * count) " we could do: " var = palloc(sizeof(*var) * count) " that way the size computation is correct even if the variable's type changes ( less prone to errors and bugs then). That would give something like in 0002 (produced with [2]). Note that: - 0002 is a very large patch. I think that it provides added value as mentioned above but I'm not sure it is worth the noise. Anyway it is done, so sharing here to get your thoughts. - sizeof(*var) is evaluated at compile time so that's safe even with uninitialized pointers - this is the preferred form for the Linux kernel (see "Allocating memory" in the coding style doc [3]) - when there is casting involved, that might look weird to have the cast and not computing the size on the "type". So, I've a mixed feeling about those even if I think that's right to have a consistent approach. Remarks: - the patch does not touch the "test" files to reduce the noise - we could do the same for: " var = palloc_array(T, count) " to " var = palloc_array(*var, count) " but that would not work because palloc_array is defined as: #define palloc_array(type, count) ((type *) palloc(sizeof(type) * (count))) and the cast would fail. We could use typeof() in palloc_array() but that leads to the same discussion as in [4]. Thoughts? [1]: https://github.com/bdrouvot/coccinelle_on_pg/blob/main/misc/detect_sizeof_bugs.cocci [2]: https://github.com/bdrouvot/coccinelle_on_pg/blob/main/misc/use_var_in_sizeof.cocci [3]: https://www.kernel.org/doc/html/latest/process/coding-style.html [4]: https://www.postgresql.org/message-id/flat/CAGECzQR21OnnKiZO_1rLWO0-16kg1JBxnVq-wymYW0-_1cUNtg%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
Hi, On 2025-12-11 13:27:56 +0000, Bertrand Drouvot wrote: > - 0002 is a very large patch. I think that it provides added value as mentioned > above but I'm not sure it is worth the noise. Anyway it is done, so sharing > here to get your thoughts. I find the recent trend to sent auto-generated huge patches to the list ... not great. I think there's practially zero chance of them getting applied and it takes away mental bandwidth from stuff that has a chance. I tend to agree that what you propose is the better style, but I seriously doubt that a) changing over everything at once is worth the backpatch hazard and review pain b) that to judge whether we should do this a 277kB patch is useful c) that changing the existing code should be the first thing, if we want to make this the new style, we should first document the sizeof(*var) approach to be preferred. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I tend to agree that what you propose is the better style, but I seriously
> doubt that
> a) changing over everything at once is worth the backpatch hazard and review
> pain
> b) that to judge whether we should do this a 277kB patch is useful
> c) that changing the existing code should be the first thing, if we want to
> make this the new style, we should first document the sizeof(*var) approach to
> be preferred.
And before that, you'd have to get consensus that sizeof(*var) *is*
the preferred style. I for one don't like it a bit. IMO what it
mostly accomplishes is to remove a cue as to what we are allocating.
I don't agree that it removes a chance for error, either. Sure,
if you write
foo = palloc(sizeof(typeA))
when foo is of type typeB*, you made a mistake --- but we know how
to get the compiler to warn about such mistakes, and indeed the
main point of the palloc_object() changes was to catch those.
However, suppose you write
foo = palloc(sizeof(*bar))
I claim that's about an equally credible typo, and there is
nothing that will detect it.
regards, tom lane
On 2025-Dec-11, Andres Freund wrote: > a) changing over everything at once is worth the backpatch hazard and review > pain The other issue with these giant patches is that they cause many largish patches waiting in the commitfest process to require rebases, which are sometimes not trivial to do. Also, all the Postgres forks will require tedious merges later on. I have my part of blame for having committed the mass change to XLogRecPtrIsValid in a2b02293bc65. I'm starting to regret that now. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Digital and video cameras have this adjustment and film cameras don't for the same reason dogs and cats lick themselves: because they can." (Ken Rockwell)
Hi, On Thu, Dec 11, 2025 at 10:39:55AM -0500, Andres Freund wrote: > Hi, > > On 2025-12-11 13:27:56 +0000, Bertrand Drouvot wrote: > > - 0002 is a very large patch. I think that it provides added value as mentioned > > above but I'm not sure it is worth the noise. Anyway it is done, so sharing > > here to get your thoughts. > > I find the recent trend to sent auto-generated huge patches to the list > ... not great. I think there's practially zero chance of them getting applied > and it takes away mental bandwidth from stuff that has a chance. > > I tend to agree that what you propose is the better style, but I seriously > doubt that > > a) changing over everything at once is worth the backpatch hazard and review > pain > b) that to judge whether we should do this a 277kB patch is useful Yeah I agree that it's almost impossible to review such big patches. The idea was more to show the impact rather than thinking it would be applied as it is. That said, when a patch needs to modify a large amount of code and when that's worth it (not saying it is the case in the current thread) we could think of an approach like modifying 20 files per patch and applying, say the 10 patches at a frequency of one per month. I think that most of the time those patches are mainly about refactoring to improve the code so I don't think that's an issue if it takes a year or so to have all the sub-patches applied. We could discuss the approach more in depth if another use case shows up (the approach would probably also depend of the use case). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Dec 11, 2025 at 05:56:13PM +0100, Álvaro Herrera wrote: > I have my part of blame for having committed the mass change to > XLogRecPtrIsValid in a2b02293bc65. I'm starting to regret that now. After reflecting on this one, I do agree that this one was probably not worth the mass changes. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi,
On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I tend to agree that what you propose is the better style, but I seriously
> > doubt that
>
> > a) changing over everything at once is worth the backpatch hazard and review
> > pain
> > b) that to judge whether we should do this a 277kB patch is useful
> > c) that changing the existing code should be the first thing, if we want to
> > make this the new style, we should first document the sizeof(*var) approach to
> > be preferred.
>
> And before that, you'd have to get consensus that sizeof(*var) *is*
> the preferred style. I for one don't like it a bit. IMO what it
> mostly accomplishes is to remove a cue as to what we are allocating.
> I don't agree that it removes a chance for error, either. Sure,
> if you write
>
> foo = palloc(sizeof(typeA))
>
> when foo is of type typeB*, you made a mistake --- but we know how
> to get the compiler to warn about such mistakes, and indeed the
> main point of the palloc_object() changes was to catch those.
Right, thanks to the cast in palloc_object()/palloc_array() that produces
-Wincompatible-pointer-types or -Wpointer-sign warnings for most cases.
Still that does not protect against the ones that are semantically wrong, say:
TransactionId *xids = palloc_array(CommandId, 100);
That's not a major concern though.
> However, suppose you write
>
> foo = palloc(sizeof(*bar))
We could imagine a macro like:
#define palloc_set_var(var, count) \
((var) = palloc((count) * sizeof(*(var))))
to prevent those typos, but that's useless if we remove all those palloc
calls and adopt palloc_object() and palloc_array() usage instead.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote: > And before that, you'd have to get consensus that sizeof(*var) *is* > the preferred style. I for one don't like it a bit. IMO what it > mostly accomplishes is to remove a cue as to what we are allocating. > I don't agree that it removes a chance for error, either. Sure, > if you write > > foo = palloc(sizeof(typeA)) > > when foo is of type typeB*, you made a mistake --- but we know how > to get the compiler to warn about such mistakes, and indeed the > main point of the palloc_object() changes was to catch those. > However, suppose you write > > foo = palloc(sizeof(*bar)) > > I claim that's about an equally credible typo, and there is > nothing that will detect it. Yeah, I'd prefer something where we keep track of the type, with the extra layer that enforces a cast to the type of the variable like palloc_object/array macros. The latter style of specifying a variable pointer within the sizeof is more error-prone long-term, so it's not something I think we should encourage. -- Michael
Вложения
On 12.12.25 10:53, Michael Paquier wrote:
> On Thu, Dec 11, 2025 at 11:43:27AM -0500, Tom Lane wrote:
>> And before that, you'd have to get consensus that sizeof(*var) *is*
>> the preferred style. I for one don't like it a bit. IMO what it
>> mostly accomplishes is to remove a cue as to what we are allocating.
>> I don't agree that it removes a chance for error, either. Sure,
>> if you write
>>
>> foo = palloc(sizeof(typeA))
>>
>> when foo is of type typeB*, you made a mistake --- but we know how
>> to get the compiler to warn about such mistakes, and indeed the
>> main point of the palloc_object() changes was to catch those.
>> However, suppose you write
>>
>> foo = palloc(sizeof(*bar))
>>
>> I claim that's about an equally credible typo, and there is
>> nothing that will detect it.
>
> Yeah, I'd prefer something where we keep track of the type, with the
> extra layer that enforces a cast to the type of the variable like
> palloc_object/array macros. The latter style of specifying a variable
> pointer within the sizeof is more error-prone long-term, so it's not
> something I think we should encourage.
The original proposal that led to palloc_object() etc.[0] contained a
function palloc_ptrtype() that you would use like
foo = palloc_ptrtype(foo)
but people didn't like that for all these reasons.
[0]:
https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
On Thu, Dec 11, 2025 at 01:27:56PM +0000, Bertrand Drouvot wrote:
> It found two formulas that are technically correct, but using GBT_VARKEY and char
> are the semantically appropriate choices (see 0001 attached).
Bertrand has just reminded me offline that this two have not been
discussed, and that they're a a separate thing. Looking again at
them.
--- a/contrib/btree_gist/btree_utils_var.c
+++ b/contrib/btree_gist/btree_utils_var.c
@@ -476,7 +476,7 @@ gbt_var_picksplit(const GistEntryVector *entryvec, GIST_SPLITVEC *v,
v->spl_nleft = 0;
v->spl_nright = 0;
- sv = palloc(sizeof(bytea *) * (maxoff + 1));
+ sv = palloc(sizeof(GBT_VARKEY *) * (maxoff + 1));
/* Sort entries */
This one is indeed inconsistent, so this needs to be switched as
proposed.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c9b24df7c05..1cd5fa791c0 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -1007,7 +1007,7 @@ verify_tar_file(verifier_context *context, char *relpath, char *fullpath,
return;
}
- buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+ buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(char));
/* Perform the reads */
while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
This one reads the same to me, still it seems to me that the intention
is to deal with a byte array. Doesn't this point to the fact that
using uint8 is more adapted for the astreamer code, following changes
like b28c59a6cd08? That would be a more invasive change, of course.
--
Michael
Вложения
Hi,
On Wed, Dec 17, 2025 at 05:19:33PM +0900, Michael Paquier wrote:
> On Thu, Dec 11, 2025 at 01:27:56PM +0000, Bertrand Drouvot wrote:
> diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
> index c9b24df7c05..1cd5fa791c0 100644
> --- a/src/bin/pg_verifybackup/pg_verifybackup.c
> +++ b/src/bin/pg_verifybackup/pg_verifybackup.c
> @@ -1007,7 +1007,7 @@ verify_tar_file(verifier_context *context, char *relpath, char *fullpath,
> return;
> }
>
> - buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
> + buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(char));
>
> /* Perform the reads */
> while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
>
> This one reads the same to me, still it seems to me that the intention
> is to deal with a byte array. Doesn't this point to the fact that
> using uint8 is more adapted for the astreamer code, following changes
> like b28c59a6cd08? That would be a more invasive change, of course.
While I think that would make sense to follow the b28c59a6cd08 approach, that would
create a huge cascade of changes. For example, astreamer_recovery_injector_content()
passes mystreamer->recoveryconfcontents->data to astreamer_content() with recoveryconfcontents
being a PQExpBuffer and:
typedef struct PQExpBufferData
{
char *data;
...
I'd vote for just changing the palloc() like proposed in v1.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Dec 17, 2025 at 09:24:31AM +0000, Bertrand Drouvot wrote: > I'd vote for just changing the palloc() like proposed in v1. It does not seem that bad if we treat the uint8 rule for actual bytes, but I'm going to leave that for others to comment for now. By the way, while checking again the whole, David has proposed something different for btree_utils_var.c still a bit incorrect: https://www.postgresql.org/message-id/4ab4a12b-49e9-4ebf-9055-225c8055fed0@gmail.com I had my eyes on it a couple of days ago and noticed the bytea vs GBT_VARKEY business, but discarded it to look at the rest first because there were a lot of patterns to look at. His solution used palloc_array(), with a "bytea *". Okay, that's actually the same as btree_utils_var.h tells that bytea and GBT_VARKEY are the same thing, but there is also a point in being consistent in the code with what the header wants. So I have reused the palloc_array() with GBT_VARKEY, meaning that the solution of 5cf03552fbb4 is a mix of what both of you have proposed, except that it's consistent with the declaration. -- Michael
Вложения
On Thu, Dec 11, 2025 at 05:56:13PM +0100, Álvaro Herrera wrote:
> On 2025-Dec-11, Andres Freund wrote:
>
> > a) changing over everything at once is worth the backpatch hazard and review
> > pain
>
> The other issue with these giant patches is that they cause many largish
> patches waiting in the commitfest process to require rebases, which are
> sometimes not trivial to do. Also, all the Postgres forks will
> require tedious merges later on.
>
> I have my part of blame for having committed the mass change to
> XLogRecPtrIsValid in a2b02293bc65. I'm starting to regret that now.
I think the bigger issue is that these patches bypass the normal
workflow of implementing changes in Postgres --- specifically, asking
about Desirability first:
https://wiki.postgresql.org/wiki/Todo
Desirability -> Design -> Implement -> Test -> Review -> Commit
It would have been much cleaner to discuss the desirability of this
change on its own.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.