Обсуждение: Consistently use palloc_object() and palloc_array()

Поиск
Список
Период
Сортировка

Consistently use palloc_object() and palloc_array()

От
David Geier
Дата:
Hi hackers,

I've changed all code to use the "new" palloc_object(), palloc_array(),
palloc0_object(), palloc0_array, repalloc_array() and repalloc0_array()
macros. This makes the code more readable and more consistent.

The patch is pretty big but potential merge conflicts should be easy to
resolve. If preferred, I can also further split up the patch, e.g.
directory by directory or high impact files first.

The patch is passing "meson test" and I've additionally wrote a script
that parses the patch file and verifies that every two corresponding +
and - lines match (e.g. palloc0() replaced by palloc0_array() or
palloc0_object(), the same for palloc() and repalloc(), additionally
some checks to make sure the conversion to the _array() variant is
correct).

--
David Geier
Вложения

Re: Consistently use palloc_object() and palloc_array()

От
Chao Li
Дата:

> On Nov 27, 2025, at 06:09, David Geier <geidav.pg@gmail.com> wrote:
>
> Hi hackers,
>
> I've changed all code to use the "new" palloc_object(), palloc_array(),
> palloc0_object(), palloc0_array, repalloc_array() and repalloc0_array()
> macros. This makes the code more readable and more consistent.
>
> The patch is pretty big but potential merge conflicts should be easy to
> resolve. If preferred, I can also further split up the patch, e.g.
> directory by directory or high impact files first.
>
> The patch is passing "meson test" and I've additionally wrote a script
> that parses the patch file and verifies that every two corresponding +
> and - lines match (e.g. palloc0() replaced by palloc0_array() or
> palloc0_object(), the same for palloc() and repalloc(), additionally
> some checks to make sure the conversion to the _array() variant is
> correct).
>
> --
> David Geier<v1-0001-Consistently-use-palloc_object-and-palloc_array.patch>

This is a large patch, I just take a quick look, and found that:

```
-        *phoned_word = palloc(sizeof(char) * strlen(word) + 1);
+        *phoned_word = palloc_array(char, strlen(word) + 1);
```

And

```
-        params = (const char **) palloc(sizeof(char *));
+        params = palloc_object(const char *);
```

Applying palloc_array and palloc_object to char type doesn’t seem to improve anything.

Best reagards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Consistently use palloc_object() and palloc_array()

От
Michael Paquier
Дата:
On Wed, Nov 26, 2025 at 11:09:31PM +0100, David Geier wrote:
> I've changed all code to use the "new" palloc_object(), palloc_array(),
> palloc0_object(), palloc0_array, repalloc_array() and repalloc0_array()
> macros. This makes the code more readable and more consistent.
>
> The patch is pretty big but potential merge conflicts should be easy to
> resolve. If preferred, I can also further split up the patch, e.g.
> directory by directory or high impact files first.

The backpatching extra-pain argument indeed comes into mind first when
it comes to such changes, but perhaps we should just bite the bullet
and encourage the new allocation styles across the tree, as you are
doing here.  I'm not completely sure if it would make sense to split
things up, if we do I would do it on a subdirectory basis like to
suggest, perhaps, like contrib/, src/backend/executor/, etc. to
balance the blast damage.  Did you use some kind of automation to find
all of these?  If yes, what did you use?

> The patch is passing "meson test" and I've additionally wrote a script
> that parses the patch file and verifies that every two corresponding +
> and - lines match (e.g. palloc0() replaced by palloc0_array() or
> palloc0_object(), the same for palloc() and repalloc(), additionally
> some checks to make sure the conversion to the _array() variant is
> correct).

It may be an idea to share that as well, so as your checks could be
replicated rather than partially re-guessed.
--
Michael

Вложения

Re: Consistently use palloc_object() and palloc_array()

От
Thomas Munro
Дата:
On Thu, Nov 27, 2025 at 11:10 AM David Geier <geidav.pg@gmail.com> wrote:
> I've changed all code to use the "new" palloc_object(), palloc_array(),
> palloc0_object(), palloc0_array, repalloc_array() and repalloc0_array()
> macros. This makes the code more readable and more consistent.

I wondered about this in the context of special alignment
requirements[1].  palloc() aligns to MAXALIGN, which we artificially
constrain for various reasons that we can't easily change (at least
not without splitting on-disk MAXALIGN from allocation MAXALIGN, and
if we do that we'll waste more memory).  That's less than
alignof(max_align_t) on common systems, so then we have to do some
weird stuff to handle __int128 that doesn't fit too well into modern
<stdalign.h> thinking and also disables optimal codegen.

This isn't a fully-baked thought, just a thought that occurred to me
while looking into that:  If palloc_object(Int128AggState) were smart
enough to detect that alignof(T) > MAXALIGN and redirect to
palloc_aligned(sizeof(T), alignof(T), ...) at compile time, then
Int128AggState would naturally propagate the layout requirements of
its __int128 member, and we wouldn't need to do that weird stuff, and
it wouldn't be error-prone if usage of __int128 spreads to more
structs.  That really only makes sense if we generalise
palloc_object() as a programming style and consider direct use of
palloc() to be a rarer low-level interface that triggers human
reviewers to think about alignment, I guess.  I think you'd also want
a variant that can deal with structs ending in a flexible array
member, but that seems doable...  palloc_flexible_object(T,
flexible_member, flexible_elements) or whatever.  But I might also be
missing some parts of that puzzle, for example it wouldn't make sense
if __int128 is ever stored.

[1] https://www.postgresql.org/message-id/CA%2BhUKGLQUivg-NC7dHdbRAPmG0Hapg1gGnygM5KgDfDM2a_TMg%40mail.gmail.com



Re: Consistently use palloc_object() and palloc_array()

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> This isn't a fully-baked thought, just a thought that occurred to me
> while looking into that:  If palloc_object(Int128AggState) were smart
> enough to detect that alignof(T) > MAXALIGN and redirect to
> palloc_aligned(sizeof(T), alignof(T), ...) at compile time, then
> Int128AggState would naturally propagate the layout requirements of
> its __int128 member, and we wouldn't need to do that weird stuff, and
> it wouldn't be error-prone if usage of __int128 spreads to more
> structs.  That really only makes sense if we generalise
> palloc_object() as a programming style and consider direct use of
> palloc() to be a rarer low-level interface that triggers human
> reviewers to think about alignment, I guess.

Hmm ... I had the same doubts as Michael about whether this change
could possibly be worth the ensuing back-patching pain.  But if
it leads to an improvement in type-safety, that'd be a reason to
take on the work.

            regards, tom lane



Re: Consistently use palloc_object() and palloc_array()

От
Michael Paquier
Дата:
On Wed, Nov 26, 2025 at 10:25:12PM -0500, Tom Lane wrote:
> Hmm ... I had the same doubts as Michael about whether this change
> could possibly be worth the ensuing back-patching pain.  But if
> it leads to an improvement in type-safety, that'd be a reason to
> take on the work.

Yeah, that sounds like a reason convincing enough to do the jump.  I
didn't really feel strongly against the original proposal to begin
with as it improves the code style, FWIW.

The backpatching bits like these are always annoying, of course.  Now,
these fancy palloc() routines exist since v16 so that would just mean
two years and two branches that would need special handling.  Even
better, why not just backpatch into v15 and v14 the macros in palloc.h
and fe_memutils.h to avoid backpatch hazards due to some new code?
--
Michael

Вложения

Re: Consistently use palloc_object() and palloc_array()

От
David Geier
Дата:
Hi!

Thanks for taking a look.

On 27.11.2025 00:03, Chao Li wrote:
> 
> This is a large patch, I just take a quick look, and found that:
> 
> ```
> -        *phoned_word = palloc(sizeof(char) * strlen(word) + 1);
> +        *phoned_word = palloc_array(char, strlen(word) + 1);
> ```
> 
> And
> 
> ```
> -        params = (const char **) palloc(sizeof(char *));
> +        params = palloc_object(const char *);
> ```
> 
> Applying palloc_array and palloc_object to char type doesn’t seem to improve anything.
> 

You mean because sizeof(char) is always 1 and hence we could instead
simply write:

*phoned_word = palloc(strlen(word) + 1);
params = palloc(1);

I think the _array and _object variants are more expressive and for sure
don't make the code less readable.

--
David Geier



Re: Consistently use palloc_object() and palloc_array()

От
Tom Lane
Дата:
David Geier <geidav.pg@gmail.com> writes:
> On 27.11.2025 00:03, Chao Li wrote:
>> This is a large patch, I just take a quick look, and found that:
>> -        *phoned_word = palloc(sizeof(char) * strlen(word) + 1);
>> +        *phoned_word = palloc_array(char, strlen(word) + 1);
>> And
>> -        params = (const char **) palloc(sizeof(char *));
>> +        params = palloc_object(const char *);
>> Applying palloc_array and palloc_object to char type doesn’t seem to improve anything.

> You mean because sizeof(char) is always 1 and hence we could instead
> simply write:
>         *phoned_word = palloc(strlen(word) + 1);
>         params = palloc(1);
> I think the _array and _object variants are more expressive and for sure
> don't make the code less readable.

Yeah, I agree these particular changes seem fine.  When you're doing
address arithmetic for a memcpy or such, it may be fine to wire in an
assumption that sizeof(char) == 1, but I think doing that in other
contexts is not particularly good style.

Another thing to note is that the proposed patch effectively changes
the expression evaluation order:

-        *phoned_word = palloc((sizeof(char) * strlen(word)) + 1);
+        *phoned_word = palloc(sizeof(char) * (strlen(word) + 1));

Now, there's not actually any difference because sizeof(char) is 1,
but if it hypothetically weren't, the new version is likely more
correct.  Presumably the +1 is meant to allow room for a trailing \0,
which is a char.

It'd be a good idea to review the patch to see if there are any
places where semantics are changed in a less benign fashion...

            regards, tom lane



Re: Consistently use palloc_object() and palloc_array()

От
David Geier
Дата:
Hi Michael!

On 27.11.2025 01:24, Michael Paquier wrote:
> On Wed, Nov 26, 2025 at 11:09:31PM +0100, David Geier wrote:
>> I've changed all code to use the "new" palloc_object(), palloc_array(),
>> palloc0_object(), palloc0_array, repalloc_array() and repalloc0_array()
>> macros. This makes the code more readable and more consistent.
>>
>> The patch is pretty big but potential merge conflicts should be easy to
>> resolve. If preferred, I can also further split up the patch, e.g.
>> directory by directory or high impact files first.
> 
> The backpatching extra-pain argument indeed comes into mind first when
> it comes to such changes, but perhaps we should just bite the bullet
> and encourage the new allocation styles across the tree, as you are
> doing here.  I'm not completely sure if it would make sense to split
> things up, if we do I would do it on a subdirectory basis like to
> suggest, perhaps, like contrib/, src/backend/executor/, etc. to
> balance the blast damage.  Did you use some kind of automation to find
> all of these?  If yes, what did you use?
I thought again about splitting up the patch. I'm not sure how useful
this really is. If a single committer takes this on, then it doesn't
really matter. He can apply the patch but then commit directory by
directory or in any other way he deems best. If we want to divide the
work among multiple committers splitting might be more useful.

Just tell me what you prefer and I'll provide the patch accordingly.

>> The patch is passing "meson test" and I've additionally wrote a script
>> that parses the patch file and verifies that every two corresponding +
>> and - lines match (e.g. palloc0() replaced by palloc0_array() or
>> palloc0_object(), the same for palloc() and repalloc(), additionally
>> some checks to make sure the conversion to the _array() variant is
>> correct).
> 
> It may be an idea to share that as well, so as your checks could be
> replicated rather than partially re-guessed.
I've attached the script. You can run it via

python3 verify_palloc_pairs.py patch_file

Disclaimer: The script is a bit of a mess. It doesn't check repalloc()
and it still reports five conversions as erroneous while they're
actually correct. I checked them manually. I left it at that because the
vast majority of changes it processes correctly and all tests pass. The
repalloc() occurrences I also checked manually.

--
David Geier
Вложения

Re: Consistently use palloc_object() and palloc_array()

От
David Geier
Дата:
Hi Thomas!

On 27.11.2025 03:53, Thomas Munro wrote:
> On Thu, Nov 27, 2025 at 11:10 AM David Geier <geidav.pg@gmail.com> wrote:
>> I've changed all code to use the "new" palloc_object(), palloc_array(),
>> palloc0_object(), palloc0_array, repalloc_array() and repalloc0_array()
>> macros. This makes the code more readable and more consistent.
> 
> I wondered about this in the context of special alignment
> requirements[1].  palloc() aligns to MAXALIGN, which we artificially
> constrain for various reasons that we can't easily change (at least
> not without splitting on-disk MAXALIGN from allocation MAXALIGN, and
> if we do that we'll waste more memory).  That's less than
> alignof(max_align_t) on common systems, so then we have to do some
> weird stuff to handle __int128 that doesn't fit too well into modern
> <stdalign.h> thinking and also disables optimal codegen.
> 
> This isn't a fully-baked thought, just a thought that occurred to me
> while looking into that:  If palloc_object(Int128AggState) were smart
> enough to detect that alignof(T) > MAXALIGN and redirect to
> palloc_aligned(sizeof(T), alignof(T), ...) at compile time, then
> Int128AggState would naturally propagate the layout requirements of
> its __int128 member, and we wouldn't need to do that weird stuff, and
> it wouldn't be error-prone if usage of __int128 spreads to more
> structs.  That really only makes sense if we generalise
> palloc_object() as a programming style and consider direct use of
> palloc() to be a rarer low-level interface that triggers human
> reviewers to think about alignment, I guess.  I think you'd also want
> a variant that can deal with structs ending in a flexible array
> member, but that seems doable...  palloc_flexible_object(T,
> flexible_member, flexible_elements) or whatever.  But I might also be
> missing some parts of that puzzle, for example it wouldn't make sense
> if __int128 is ever stored.
> 
> [1] https://www.postgresql.org/message-id/CA%2BhUKGLQUivg-NC7dHdbRAPmG0Hapg1gGnygM5KgDfDM2a_TMg%40mail.gmail.com

These are some interesting ideas but I would consider them for now as
possible follow-up work, once this refactoring is merged.

--
David Geier



Re: Consistently use palloc_object() and palloc_array()

От
David Geier
Дата:
On 28.11.2025 22:28, Tom Lane wrote:
> David Geier <geidav.pg@gmail.com> writes:
>> On 27.11.2025 00:03, Chao Li wrote:
>>> This is a large patch, I just take a quick look, and found that:
>>> -        *phoned_word = palloc(sizeof(char) * strlen(word) + 1);
>>> +        *phoned_word = palloc_array(char, strlen(word) + 1);
>>> And
>>> -        params = (const char **) palloc(sizeof(char *));
>>> +        params = palloc_object(const char *);
>>> Applying palloc_array and palloc_object to char type doesn’t seem to improve anything.
> 
>> You mean because sizeof(char) is always 1 and hence we could instead
>> simply write:
>>         *phoned_word = palloc(strlen(word) + 1);
>>         params = palloc(1);
>> I think the _array and _object variants are more expressive and for sure
>> don't make the code less readable.
> 
> Yeah, I agree these particular changes seem fine.  When you're doing
> address arithmetic for a memcpy or such, it may be fine to wire in an
> assumption that sizeof(char) == 1, but I think doing that in other
> contexts is not particularly good style.
> 
> Another thing to note is that the proposed patch effectively changes
> the expression evaluation order:
> 
> -        *phoned_word = palloc((sizeof(char) * strlen(word)) + 1);
> +        *phoned_word = palloc(sizeof(char) * (strlen(word) + 1));
> 
> Now, there's not actually any difference because sizeof(char) is 1,
> but if it hypothetically weren't, the new version is likely more
> correct.  Presumably the +1 is meant to allow room for a trailing \0,
> which is a char.

Oh right. Good catch!

> It'd be a good idea to review the patch to see if there are any
> places where semantics are changed in a less benign fashion...

I intentionally tried to avoid any semantic changes but it's of course
possible something slipped through by accident.

It's some ~1700 occurrences in the patch. If it takes ~10 seconds to
check a single, it would take ~4.7h to review the entire patch.

If no one is willing to take this on, I could split up the patch in 4 to
5 that each can be reviewed by someone else.

--
David Geier



Re: Consistently use palloc_object() and palloc_array()

От
Thomas Munro
Дата:
On Sat, Nov 29, 2025 at 10:47 AM David Geier <geidav.pg@gmail.com> wrote:
> I intentionally tried to avoid any semantic changes but it's of course
> possible something slipped through by accident.

Do you expect the generated code to be identical?  Is it?



Re: Consistently use palloc_object() and palloc_array()

От
David Geier
Дата:
On 28.11.2025 23:31, Thomas Munro wrote:
> On Sat, Nov 29, 2025 at 10:47 AM David Geier <geidav.pg@gmail.com> wrote:
>> I intentionally tried to avoid any semantic changes but it's of course
>> possible something slipped through by accident.
> 
> Do you expect the generated code to be identical?  Is it?

In the majority of cases yes. However, there are a few cases where a
small change in the C code can yield to differences in the generated code:

I used the following bash script to create the disassembly of all object
files in the build directory. I ran this script twice, once on master
and once on the patched branch. The directory needs to be adapted
accordingly in the script.

find . -name "*.o" -print0 | while IFS= read -r -d '' file; do
    mkdir -p ~/Desktop/master/"$(dirname "$file")"

    if objdump -drwC -Mintel "$file" > ~/Desktop/master/"$file".dis
2>/dev/null; then
        echo "  ✓ Disassembled to ~/Desktop/master/$file.dis"
    else
        echo "  ✗ Failed to disassemble $file"
    fi
done

There are 54 files that show changes in the generated code. I didn't
look through all files but the changes are largely caused by the same
reasons:

1) If the refactoring introduces a change in the number of lines, the
__LINE__ macro used by elog.h will cause a change in the disassembly.
This is the reason for the majority of changes.

2) fuzzystrmatch.c: contains a functional change. We allocate 1 byte
more now but that is safe, see [1].

3) pg_buffercache_pages.c: Functional change. Type used in
palloc_array() changed from uint64 to int, as the rest of the code only
uses an integer.

4) trgm_op.c: Contains arithmetic expressions in the calls to
palloc_array() where now the brackets are placed differently, e.g.
sizeof(type) * a * b vs sizeof(type) * (a * b). That's the reason for
many differences.

So reviewing this patch can now be done by only going through all files
that have changes in the disassembly. This is only 54 out of which most
are because of changes in the number of LOC or where the brackets are
placed.

[1] https://www.postgresql.org/message-id/524587.1764365323%40sss.pgh.pa.us

--
David Geier



Re: Consistently use palloc_object() and palloc_array()

От
Michael Paquier
Дата:
On Tue, Dec 02, 2025 at 04:13:01PM +0100, David Geier wrote:
> So reviewing this patch can now be done by only going through all files
> that have changes in the disassembly. This is only 54 out of which most
> are because of changes in the number of LOC or where the brackets are
> placed.

It may be a good idea to split the patch into two parts, at least:
- One for the bulk of the changes, for the straight-forward changes.
Most of what you are suggesting are that for palloc_object and
palloc_array which are dropped-in replacements.  Checking that these
assemble the same before and after offers one extra layer of
confidence.
- Second one for the more dubious changes.

It does not change that all these need to be looked with human eyes.
For the first one, splitting things based on the code area is simpler
With more than 1.7k places changed, splitting by area and checking
them individually would be the best course, at least for me when it
comes to such mechanical changes.  It comes down with dealing with
individual doses that are not so large that they cause one's head to
spin in the middle of checking the diffs (did that a few times in the
past for this code tree, splitting and dose balance helps a lot).

I cannot say for the others, but I find the type-safety argument
mentioned upthread good enough to do a switch and encourage more the
new style moving forward.
--
Michael

Вложения

Re: Consistently use palloc_object() and palloc_array()

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> It may be a good idea to split the patch into two parts, at least:
> - One for the bulk of the changes, for the straight-forward changes.
> Most of what you are suggesting are that for palloc_object and
> palloc_array which are dropped-in replacements.  Checking that these
> assemble the same before and after offers one extra layer of
> confidence.
> - Second one for the more dubious changes.

Yeah, I was thinking the same.  Some of those might perhaps be bugs
that we want to back-patch, so they need to be looked at in a
different way.

            regards, tom lane



Re: Consistently use palloc_object() and palloc_array()

От
Peter Eisentraut
Дата:
On 27.11.25 03:53, Thomas Munro wrote:
> I wondered about this in the context of special alignment
> requirements[1].  palloc() aligns to MAXALIGN, which we artificially
> constrain for various reasons that we can't easily change (at least
> not without splitting on-disk MAXALIGN from allocation MAXALIGN, and
> if we do that we'll waste more memory).  That's less than
> alignof(max_align_t) on common systems, so then we have to do some
> weird stuff to handle __int128 that doesn't fit too well into modern
> <stdalign.h> thinking and also disables optimal codegen.

On macOS ARM, I have MAXALIGN == alignof(max_align_t) == 8, but 
alignof(__int128) == 16.  (macOS Intel has 16/16.)  Also, as a 
consequence of that, the result of malloc() is not guaranteed to be 
aligned sufficiently for __int128 (need to use aligned_alloc()).  So it 
seems to me that the current behavior of palloc() is pretty consistent 
with that.