Обсуждение: Fix uninitialized xl_running_xacts padding
Hi, While looking at the generated WAL, I've found out that RUNNING_XACTS records contain data from uninitialized padding bytes. This can be seen by generating a simple WAL with "SELECT pg_switch_wal(); CHECKPOINT;" Finding the position of the running_xacts record with pg_waldump: rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn: 0/02D001D0, prev 0/02D00198, desc: RUNNING_XACTS nextXid 803 latestCompletedXid 801 oldestRunningXid 802; 1 xacts: 802 And getting the content of the running xacts record, skipping the 24 bytes of record header: hexdump -C -s $((0x1d0 + 24)) -n 30 00000001000000000000002D Which yields the following: ff 1c 01 00 00 00 00 00 00 00 00 ca ce 9b 23 03 00 00 22 03 00 00 21 03 00 00 22 03 00 00 Looking at the xl_running_xacts, structure, we have the following: id: ff length: 1c xcnt: 01 00 00 00 subxcnt: 00 00 00 00 subxid_overflow: 00 padding: ca ce 9b nextXid: 00 00 22 03 ... The 3 bytes of padding after subxid_overflow were left uninitialized, leading to the random 'ca ce 9b' data being written in the WAL. The attached patch fixes the issue by zeroing the xl_running_xacts structure in LogCurrentRunningXacts using MemSet. Regards, Anthonin Bonnefoy
Вложения
On Fri, Feb 13, 2026 at 10:39:14AM +0100, Anthonin Bonnefoy wrote: > The 3 bytes of padding after subxid_overflow were left uninitialized, > leading to the random 'ca ce 9b' data being written in the WAL. The > attached patch fixes the issue by zeroing the xl_running_xacts > structure in LogCurrentRunningXacts using MemSet. This uninitialized padding exists for as long as this code exists, down to efc16ea52067. No objection here to clean up that on HEAD. -- Michael
Вложения
Hi, On Fri, Feb 13, 2026 at 06:50:08PM +0900, Michael Paquier wrote: > On Fri, Feb 13, 2026 at 10:39:14AM +0100, Anthonin Bonnefoy wrote: > > The 3 bytes of padding after subxid_overflow were left uninitialized, > > leading to the random 'ca ce 9b' data being written in the WAL. The > > attached patch fixes the issue by zeroing the xl_running_xacts > > structure in LogCurrentRunningXacts using MemSet. > > This uninitialized padding exists for as long as this code exists, > down to efc16ea52067. No objection here to clean up that on HEAD. It's not as important as when a struct which is used as an hash key has padding bytes uninitialized (and byte comparisons are done on the key) but I'm also +1 to make it "cleaner". Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> On Feb 13, 2026, at 18:08, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Feb 13, 2026 at 06:50:08PM +0900, Michael Paquier wrote: >> On Fri, Feb 13, 2026 at 10:39:14AM +0100, Anthonin Bonnefoy wrote: >>> The 3 bytes of padding after subxid_overflow were left uninitialized, >>> leading to the random 'ca ce 9b' data being written in the WAL. The >>> attached patch fixes the issue by zeroing the xl_running_xacts >>> structure in LogCurrentRunningXacts using MemSet. >> >> This uninitialized padding exists for as long as this code exists, >> down to efc16ea52067. No objection here to clean up that on HEAD. > > It's not as important as when a struct which is used as an hash key has padding > bytes uninitialized (and byte comparisons are done on the key) but I'm also > +1 to make it "cleaner". > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com > I have no objection on cleanup the padding bytes. As the structure is small, maybe we can use {0} initializer: ``` xl_running_xacts xlrec = {0}; ``` That will allow compilers to optimize the initialization. Anyway, that’s not a big deal, no strong opinion here. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
On Fri, Feb 13, 2026 at 10:39 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> The 3 bytes of padding after subxid_overflow were left uninitialized,
> leading to the random 'ca ce 9b' data being written in the WAL. The
> attached patch fixes the issue by zeroing the xl_running_xacts
> structure in LogCurrentRunningXacts using MemSet.
Nitpick: the so-called universal zero initialiser syntax (var = {0})
is a nicer way to do this and generally preferred in new code AFAIK.
But in this case, it seems we don't actually worry about initialising
WAL padding bytes in general. valgrind.supp has an entry to prevent
warnings about it. Should we?
On Mon, Feb 16, 2026 at 01:17:56PM +1300, Thomas Munro wrote:
> Nitpick: the so-called universal zero initialiser syntax (var = {0})
> is a nicer way to do this and generally preferred in new code AFAIK.
My memory on the matter may be fuzzy, of course, but the initializer
does not guarantee that the padding bytes are initialized to zero
because the padding bytes are not associated to a member in the
structure. A memset(0), however, makes sure that the padding bytes
are full of zeros by taking into account the full size of the
structure. We could couple a {0} with some dummy fields in
xl_running_xacts, of course. But actually, there may be an even
smarter move in this case: LogCurrentRunningXacts() uses
MinSizeOfXactRunningXacts to store the data of a xl_running_xacts,
based on an offset of xl_running_xacts.xids. So we could move
subxid_overflow at the end of xl_running_xacts before xids, shaving
these padding bytes away while inserting the record's data.
> But in this case, it seems we don't actually worry about initialising
> WAL padding bytes in general. valgrind.supp has an entry to prevent
> warnings about it. Should we?
True about the initialization part, mostly I guess, still we tend to
worry about eliminating padding because these are wasted bytes in the
WAL records. For example, xlhp_freeze_plans has two bytes of padding,
that we eliminate while inserting its record by splitting the
FLEXIBLE_ARRAY_MEMBER part.
--
Michael
Вложения
Hi,
On Mon, Feb 16, 2026 at 10:10:58AM +0900, Michael Paquier wrote:
> On Mon, Feb 16, 2026 at 01:17:56PM +1300, Thomas Munro wrote:
> > Nitpick: the so-called universal zero initialiser syntax (var = {0})
> > is a nicer way to do this and generally preferred in new code AFAIK.
>
> My memory on the matter may be fuzzy, of course, but the initializer
> does not guarantee that the padding bytes are initialized to zero
> because the padding bytes are not associated to a member in the
> structure. A memset(0), however, makes sure that the padding bytes
> are full of zeros by taking into account the full size of the
> structure.
That's also what I recall, and what we followed in [1].
> > But in this case, it seems we don't actually worry about initialising
> > WAL padding bytes in general. valgrind.supp has an entry to prevent
> > warnings about it. Should we?
>
> True about the initialization part, mostly I guess, still we tend to
> worry about eliminating padding because these are wasted bytes in the
> WAL records. For example, xlhp_freeze_plans has two bytes of padding,
> that we eliminate while inserting its record by splitting the
> FLEXIBLE_ARRAY_MEMBER part.
But in the case of this thread it's in the middle of the struct, so I'm not
sure the "wasted" bytes would be elminated, would it?
Regards,
[1]: https://postgr.es/m/CAGECzQS37h0twutb=kkS6v0rSnQ0vWxhVncqVNYoOTsv6gOmcw@mail.gmail.com
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, Feb 13, 2026 at 11:08 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> It's not as important as when a struct which is used as an hash key has padding
> bytes uninitialized (and byte comparisons are done on the key) but I'm also
> +1 to make it "cleaner".
Yeah, there's no direct issue of having those uninitialized. The only
impact I can think of is reducing compression efficiency of the WAL
due to the random padding bytes.
On Mon, Feb 16, 2026 at 8:29 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> > My memory on the matter may be fuzzy, of course, but the initializer
> > does not guarantee that the padding bytes are initialized to zero
> > because the padding bytes are not associated to a member in the
> > structure. A memset(0), however, makes sure that the padding bytes
> > are full of zeros by taking into account the full size of the
> > structure.
>
> That's also what I recall, and what we followed in [1].
I think that depends on the C standard used. With C99, there's no rule
for the padding bytes initialization.
With C11, in 6.7.9 Initialization of the standard: "the remainder of
the aggregate shall be initialized implicitly the same as objects that
have static storage duration", and with static storage will "every
member is initialized (recursively) according to these rules, and any
padding is initialized to zero bits".
So if I read this correctly, '{0}' will set padding bytes to 0 when
using C11. But given Postgres is using C99, that's not something we
can rely on?
> > True about the initialization part, mostly I guess, still we tend to
> > worry about eliminating padding because these are wasted bytes in the
> > WAL records. For example, xlhp_freeze_plans has two bytes of padding,
> > that we eliminate while inserting its record by splitting the
> > FLEXIBLE_ARRAY_MEMBER part.
>
> But in the case of this thread it's in the middle of the struct, so I'm not
> sure the "wasted" bytes would be elminated, would it?
Moving subxid_overflow before xids, wouldn't you have 3 bytes of
padding at the end of the struct for the whole struct alignment?
Hi,
On Mon, Feb 16, 2026 at 12:02:33PM +0100, Anthonin Bonnefoy wrote:
> On Fri, Feb 13, 2026 at 11:08 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > It's not as important as when a struct which is used as an hash key has padding
> > bytes uninitialized (and byte comparisons are done on the key) but I'm also
> > +1 to make it "cleaner".
>
> Yeah, there's no direct issue of having those uninitialized. The only
> impact I can think of is reducing compression efficiency of the WAL
> due to the random padding bytes.
Yeah, good point about the compression.
> >
> > That's also what I recall, and what we followed in [1].
>
> I think that depends on the C standard used. With C99, there's no rule
> for the padding bytes initialization.
> With C11, in 6.7.9 Initialization of the standard: "the remainder of
> the aggregate shall be initialized implicitly the same as objects that
> have static storage duration", and with static storage will "every
> member is initialized (recursively) according to these rules, and any
> padding is initialized to zero bits".
Thanks for the research!
> So if I read this correctly, '{0}' will set padding bytes to 0 when
> using C11. But given Postgres is using C99, that's not something we
> can rely on?
C11 is required as of f5e0186f865c so it looks like we could make use of
{0} instead.
> > > True about the initialization part, mostly I guess, still we tend to
> > > worry about eliminating padding because these are wasted bytes in the
> > > WAL records. For example, xlhp_freeze_plans has two bytes of padding,
> > > that we eliminate while inserting its record by splitting the
> > > FLEXIBLE_ARRAY_MEMBER part.
> >
> > But in the case of this thread it's in the middle of the struct, so I'm not
> > sure the "wasted" bytes would be elminated, would it?
>
> Moving subxid_overflow before xids, wouldn't you have 3 bytes of
> padding at the end of the struct for the whole struct alignment?
Yeah, we'd go from:
/* offset | size */ type = struct xl_running_xacts {
/* 0 | 4 */ int xcnt;
/* 4 | 4 */ int subxcnt;
/* 8 | 1 */ _Bool subxid_overflow;
/* XXX 3-byte hole */
/* 12 | 4 */ TransactionId nextXid;
/* 16 | 4 */ TransactionId oldestRunningXid;
/* 20 | 4 */ TransactionId latestCompletedXid;
/* 24 | 0 */ TransactionId xids[];
/* total size (bytes): 24 */
}
to
/* offset | size */ type = struct xl_running_xacts {
/* 0 | 4 */ int xcnt;
/* 4 | 4 */ int subxcnt;
/* 8 | 4 */ TransactionId nextXid;
/* 12 | 4 */ TransactionId oldestRunningXid;
/* 16 | 4 */ TransactionId latestCompletedXid;
/* 20 | 1 */ _Bool subxid_overflow;
/* XXX 3-byte hole */
/* 24 | 0 */ TransactionId xids[];
/* total size (bytes): 24 */
}
By moving subxid_overflow before the flexible array.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Feb 16, 2026 at 1:18 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> But in this case, it seems we don't actually worry about initialising
> WAL padding bytes in general. valgrind.supp has an entry to prevent
> warnings about it. Should we?
Most of the xlog records are zeroed, so having different behaviour for
some records feels inconsistent. For context, I've been trying to
write an ImHex's pattern for WAL files[0], and stumbling upon random
values was definitely confusing.
On Mon, Feb 16, 2026 at 12:17 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> > So if I read this correctly, '{0}' will set padding bytes to 0 when
> > using C11. But given Postgres is using C99, that's not something we
> > can rely on?
>
> C11 is required as of f5e0186f865c so it looks like we could make use of
> {0} instead.
Ha good point, I've missed the switch to C11. So using '{0}' seems
like an option. I've run some additional tests to check what was the
generated machine code[1] (sorry for the long godbolt link...):
- x86 clang: Calls memset
- x86 gcc: Sets 128 bits + 64 bits to 0
- arm clang: Sets 64 bits * 3 to 0
- arm gcc: Sets 128 bits + 64 bits to 0
So it looks like '{0}' does zero the padding on this example. For
clang, the generated machine code is the exact same as calling memset.
I've found 2 other places where padding bytes aren't set: heap inplace
and invalidation messages. I've updated the patch:
- Initialize xl_running_xacts, xl_invalidations and xl_heap_inplace using '{0}'.
- Use MemoryContextAllocZero instead of MemoryContextAlloc for shared
invalidation messages
[0]: https://github.com/bonnefoa/ImHex-Patterns/blob/pg_xlog/patterns/postgres_xlog.hexpat
[1]:
https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lIAHQ5gByrdqtrFMaYulJL6yAngGVG6ADCqLQAriwMIABMpAEAMngMtuEARpjEIJJclqgKhL4MwWER0Tl5PgIJSTap6ZnZni4VDEIETMQEReGRMY3eBa3tBFXJLGkZWR5tHV0lvdPDiaPj9QCUHqihxMjsHACkGgCCe1EAzInIYVgA1HungQoE%2BIIAdAh32AfHZxdXmLf3R7oFKoEJvD5fE7nBiXUI3O4PAjERLAcGnT5HKG/OH/BFA3xojHHI4EACeFkwWCo10exFC3mujmMdIYDBRxlUTG8CkhAHYAEJfa7XRIEYXi1TIQR3QVHYWi8XChShFKS6WnWWHYUgkKK5WqvDoYyoABu6SotFQAHcZULrqFRacosYxUlVAQABqG21y%2B2O51ikJYR4AJVCrJRXvQPq1fsETpd1wMBEwj2CLAs9BT6CjMbtDvjAcZhp5AFZBaWACJ53mVxm0ZnhtlGDlcgg8jWQo4m1DekmpgjGRKECA9w2rPmag4ATiZLObwFb3PrzmQALrewFGk31c7mNrXcOY%2Bj/ceQ4Y%2BBNhtCYnPI%2BPE/3msVc6b7M5y8cq/XtwFLzVYp3HWGikLcvqKsKLz6gBP4gWBsYQdcUEqqohrGmaxAWtaP5UGISikHaiEvG6nqGrBBHgRBLxBgOYYRkYUbkYRVHJgO6aZpg2aMUB1zbgee7HAemLdr2J6HCmZ5sCwSgEKOomPoJz4Sg287vm2Cgri4MaKlJMkQCcABsX4uKBcF5AAXpgqBUBAxnIKsE4CTuHDrLQnClrwkQcFopCoCAdkkO4BCaC56wANYgKcBkvFFsUGXFsW8tIbkcJIvAsCAvIaC807xVwvJRAAHBokilk6paFaQXk%2BX5HC8AoIAgcF3kuaQcCwEgaAZnQ6TkJQXUWD1GTIJchjAFEXBcCBNC0CmxANRAKQhaQKSJO0pKcDwK1rcQpIAPIpNoTSbbwXVsIIe0MLQG0taQWApKEwCBGItANdwvBYCwY3iLd%2BCrj4ZpvT5mCqC4oQpid5CCJgKU%2BbQeApMQ63BFgy1IngGXvaQGEgkolaYF9Rjw0YIXrBaTDAAoABqeCYFae0Ul5W38IIIhiOwUgyIIigqOot26MlY0gKYxjmPDKQNZA6yoBYzRvQAtECQHIJNvCmukyLBvA6x9M0/gXrMPTSPESw1GMdTSBYuT5AIhvRJb1vNCMZsrNIusDAsdtRG7MNNB7QzO7UExTEMXtuwsgfmxMOubNsHOue5nnLbVjKFQZ8sGZI1yjUY1wTS8XAvBo1wQLghAkLcZxcKsvDNVo6wQJ1qDdfQZAUBAA1DSgwBcKWMQzXNC1Lbdq3MLtkOj%2BtB1Hd4kNnYwBCXddy33Y9z20K9kOfd9uw%2BX9fuA8tINgxDWOijDy3i0ju0o7vtfIpjW047kmD44TwDE6ALVkwYlM03TDNGCQxZsIUQ4hOYgJ5moZaugYgGBJiLMWCNJYQGlrLAoCslanErCrLa6tiCa1TNrDwvt%2Bh%2BAgAEO22QTbVCDnoK25QChULKDbBgkcVgNFIc0QYMwQjdD0O7AQPDFi0KjgIz2fCSgNAjqbOh1cNhbB2BIBOHAPJVWTpwVO6dM7XGACNa4vcXhRBLmXIgxBK6nGrrXUmbUm4t16u3TurcQDtBYCaQq8sRrwPGpNaadBB6UGHj5Se48sYhP2odY6WN54XSujdPemAHpPRem9La28iZ3zungf6eBD63WPsgcGuwtrn1hrwK%2ByMMCZPRo/Xgz88YE2%2BiiGx5M/603pozYBsg2bgOkJApQ0D%2BZ6G8cLMw%2BhkHEJlnLTgisnjK1Vr5DChCUE6y4QUfWQRJE9EqjQ5YdQposOaF7SqDDWHsP2SBQRLQJHFG2SQrw3CZGiI4Zcm5/Ciohw6OcjIU0Y6KPjvoRO6jbop1ce4%2BWTBs7eLzoXQuxdS74DMRYqxpA66tUbigZug1W59Q7liruohiAsEzj3PufjZrpCHstcJE8doRJns1LaMTF5xJXokteKSt6NIySvbJB8iH5NBoU0%2BJToZlPGYjSpqNbo1MhvU1%2B3KP7NO/nwX%2B1N2mAKZrwEBPSOZ9NkFAvmPldCnH0ELRBEqVm%2BXQQITBczsG4LVksw0RCpb3L9uQyhWyJAxF2S7Oo3tDlMO9VwGIpynayLEYGq5wiqG9DWUIp5eyMjRreVI%2BYAdI0rG9n8uOyjAWqKTiCzRhwQwAFkdEAHFAiBAMaWIxRcTGIorlCFFaKHKkAivFF48Ve0JV7aWZKnA0rApqpweqjVUWkxUVEItY66pTu/usDCeQ/CSCAA%3D%3D%3D
Вложения
Hi,
On 2026-02-16 12:02:33 +0100, Anthonin Bonnefoy wrote:
> I think that depends on the C standard used. With C99, there's no rule
> for the padding bytes initialization.
> With C11, in 6.7.9 Initialization of the standard: "the remainder of
> the aggregate shall be initialized implicitly the same as objects that
> have static storage duration", and with static storage will "every
> member is initialized (recursively) according to these rules, and any
> padding is initialized to zero bits".
I don't think that rule applies for things like xl_running_xacts, as it does
not have static storage duration.
> So if I read this correctly, '{0}' will set padding bytes to 0 when
> using C11. But given Postgres is using C99, that's not something we
> can rely on?
We use C11, but the guarantee doesn't help us, due to the static storage
duration restriction. However, in C23, this has been fixed:
6.7.10 Initialization, point 11:
If an object that has automatic storage duration is initialized with an empty
initializer, its value is the same as the initialization of a static storage
duration object. Otherwise, if an object that has automatic storage duration
is not initialized explicitly, its representation is indeterminate. [...]
This notably includes being able to initialize everything to default with {}.
But C23 won't help us for a while :(
> > > True about the initialization part, mostly I guess, still we tend to
> > > worry about eliminating padding because these are wasted bytes in the
> > > WAL records. For example, xlhp_freeze_plans has two bytes of padding,
> > > that we eliminate while inserting its record by splitting the
> > > FLEXIBLE_ARRAY_MEMBER part.
> >
> > But in the case of this thread it's in the middle of the struct, so I'm not
> > sure the "wasted" bytes would be elminated, would it?
>
> Moving subxid_overflow before xids, wouldn't you have 3 bytes of
> padding at the end of the struct for the whole struct alignment?
Yes. I'm a bit doubtful the space wastage argument is strong for most of the
record types with padding, for a lot of them the waste through the padding is
such a small amount compared to the record type that it won't matter.
I don't think it makes a whole lot of sense to tackle this specifically for
xl_running_xacts. Until now we just accepted that WAL insertions can contain
random padding. If we don't want that, we should go around and make sure that
there is no padding (or padding is initialized) for *all* WAL records,
document that as the rule, and remove the relevant valgrind suppressions.
A lot of the WAL structs have holes. At least
- xl_brin_update
- xl_btree_mark_page_halfdead
- xl_btree_unlink_page
- xl_hash_vacuum_one_page
- xl_heap_inplace
- xl_heap_multi_insert
- xl_heap_rewrite_mapping
- xl_heap_truncate
- xl_invalidations
- xl_logical_message
- xl_multixact_create
- xl_running_xacts
- xl_xact_prepare
- xlhp_freeze_plan (not a toplevel type)
- xlhp_freeze_plans (not a toplevel type)
I didn't check how many WAL record have trailing padding that we don't avoid
with
offsetoff(structname, last_field) + sizeof(last_field_type)
style hackery.
Greetings,
Andres Freund
> Until now we just accepted that WAL insertions can contain > random padding. If we don't want that, we should go around and make sure that > there is no padding (or padding is initialized) for *all* WAL records, > document that as the rule, and remove the relevant valgrind suppressions. While this would be a nice requirement, I don't think we can enforce it for extensions, only for the core, as C has no capabilities to add a rule for this. But we could enforce it for the core code, what do you think about a script that similarly to headercheck detects WAL record issues automatically? That's also good for detecting the current issues, see attached script and the results I get when executing it on the current master branch. Notes: * It tries to find the related SizeOf macros, and if that exists, accepts trailing padding if it's correctly calculated. But it currently doesn't verify that the SizeOf macro is currently used everywhere (that also seems doable with some greps) * It also has a whitelist for non wal structs in these headers and 2 cases where we explicitly document that padding is ok (not sure if this latter should be really in the whitelist or not)
Вложения
On 16/02/2026 21:10, Andres Freund wrote: > I don't think it makes a whole lot of sense to tackle this specifically for > xl_running_xacts. Until now we just accepted that WAL insertions can contain > random padding. If we don't want that, we should go around and make sure that > there is no padding (or padding is initialized) for *all* WAL records, > document that as the rule, and remove the relevant valgrind suppressions. That's not random, that's server memory, right? Probably not another Heartbleed, but I'd rather initialize a few locals than find out. Happy to see this being worked on, these uninitialized WAL records are a major obstacle to enabling MemorySanitizer. I ran into this again today and this is how I found this thread. Unfortunately the MemorySanitizer can't even use the same suppressions as Valgrind, because the suppression architecture is different (can only remove the checks from a given function, not all stack traces that have this function like Valgrind does). Best regards Alexander Kuzmenkov TigerData
On 10/03/2026 23:51, Alexander Kuzmenkov wrote: > On 16/02/2026 21:10, Andres Freund wrote: >> I don't think it makes a whole lot of sense to tackle this >> specifically for >> xl_running_xacts. Until now we just accepted that WAL insertions can >> contain >> random padding. If we don't want that, we should go around and make >> sure that >> there is no padding (or padding is initialized) for *all* WAL records, >> document that as the rule, and remove the relevant valgrind suppressions. > > That's not random, that's server memory, right? Probably not another > Heartbleed, but I'd rather initialize a few locals than find out. > > Happy to see this being worked on, these uninitialized WAL records are a > major obstacle to enabling MemorySanitizer. I ran into this again today > and this is how I found this thread. Unfortunately the MemorySanitizer > can't even use the same suppressions as Valgrind, because the > suppression architecture is different (can only remove the checks from a > given function, not all stack traces that have this function like > Valgrind does). +1 for initializing all padding in WAL records. In fact I thought that we already did that. (Except in this case, apparently) - Heikki
On Tue, Mar 10, 2026 at 11:09 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> +1 for initializing all padding in WAL records. In fact I thought that
> we already did that. (Except in this case, apparently)
I found 42 exceptions like this. See the attached patch, it
initializes some WAL records and removes the WAL-related Valgrind
suppressions. The regression tests pass under Valgrind with these
changes.
As discussed above, I used memset instead of = { 0 }. I could observe
the latter to not initialize the padding on some configurations.
Вложения
On Wed, Mar 11, 2026 at 11:45 AM Alexander Kuzmenkov <akuzmenkov@tigerdata.com> wrote: > > On Tue, Mar 10, 2026 at 11:09 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > +1 for initializing all padding in WAL records. In fact I thought that > > we already did that. (Except in this case, apparently) > > I found 42 exceptions like this. See the attached patch, it > initializes some WAL records and removes the WAL-related Valgrind > suppressions. The regression tests pass under Valgrind with these > changes. I think I'm making some unneeded changes here though. For example in ginxlogInsertListPage for a two-int struct with no padding. I'll need to check them again one by one.
On 11/03/2026 13:07, Alexander Kuzmenkov wrote: > On Wed, Mar 11, 2026 at 11:45 AM Alexander Kuzmenkov > <akuzmenkov@tigerdata.com> wrote: >> >> On Tue, Mar 10, 2026 at 11:09 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> +1 for initializing all padding in WAL records. In fact I thought that >>> we already did that. (Except in this case, apparently) >> >> I found 42 exceptions like this. See the attached patch, it >> initializes some WAL records and removes the WAL-related Valgrind >> suppressions. The regression tests pass under Valgrind with these >> changes. > > I think I'm making some unneeded changes here though. For example in > ginxlogInsertListPage for a two-int struct with no padding. I'll need > to check them again one by one. I experimented with this a little more. Valgrind complained about one more place on 'master': the xl_multixact_create got padding, when MultiXactOffset was widened to 64 bits. That could be fixed by swapping the fields. Another thing I did to find possible initializations: I ran 'pahole bin/postgres' and search for all the "xl_*" structs with padding, and then looked at where they're initialized. Attached patch (0003) shows a few places that look suspicious to me. I don't think I caught all structs used in WAL records, though, like the ginxlogInsertListPage thing mentioned. I wish we could just mark all WAL record structs with pg_attribute_packed(). Unfortunately pg_attribute_packed() is not available on all compilers we support. - Heikki
Вложения
The functions in the "0003" patch haven't surfaced in my "make installcheck-parallel" runs with Valgrind, or the "make check" with MemorySanitizer. However, I could hit most of them with some fuzzing. The only exception was `xl_hash_vacuum_one_page` but that's probably also triggerable.
I noticed that we also use `sizeof` in some WAL functions, so probably the tail padding can also be written to WAL? For example, consider this:
(gdb) ptype/o gistxlogPageSplit
type = struct gistxlogPageSplit {
/* 0 | 4 */ BlockNumber origrlink;
/* XXX 4-byte hole */
/* 8 | 8 */ GistNSN orignsn;
/* 16 | 1 */ _Bool origleaf;
/* XXX 1-byte hole */
/* 18 | 2 */ uint16 npage;
/* 20 | 1 */ _Bool markfollowright;
/* XXX 3-byte padding */
/* total size (bytes): 24 */
And then we do XLogRegisterData((char *) &xlrec, sizeof(gistxlogPageSplit));
In general, I'm wondering what our approach to this should be. Several potential improvements were mentioned, but I think for now we could focus on removing the Valgrind suppression. This is a meaningful improvement that uses the existing test tools. Do we want to defensively zero-initialize every case that seems to be potentially affected, i.e. written to WAL and has holes/tail padding? That sounds cheap and simple and probably even backportable. In the "0001" patch, there are several cases where no padding goes into WAL, I can remove these. For example, the use of xl_brin_createidx in brinbuild() does not have this problem.
I noticed that we also use `sizeof` in some WAL functions, so probably the tail padding can also be written to WAL? For example, consider this:
(gdb) ptype/o gistxlogPageSplit
type = struct gistxlogPageSplit {
/* 0 | 4 */ BlockNumber origrlink;
/* XXX 4-byte hole */
/* 8 | 8 */ GistNSN orignsn;
/* 16 | 1 */ _Bool origleaf;
/* XXX 1-byte hole */
/* 18 | 2 */ uint16 npage;
/* 20 | 1 */ _Bool markfollowright;
/* XXX 3-byte padding */
/* total size (bytes): 24 */
}
In general, I'm wondering what our approach to this should be. Several potential improvements were mentioned, but I think for now we could focus on removing the Valgrind suppression. This is a meaningful improvement that uses the existing test tools. Do we want to defensively zero-initialize every case that seems to be potentially affected, i.e. written to WAL and has holes/tail padding? That sounds cheap and simple and probably even backportable. In the "0001" patch, there are several cases where no padding goes into WAL, I can remove these. For example, the use of xl_brin_createidx in brinbuild() does not have this problem.
On 12/03/2026 20:23, Alexander Kuzmenkov wrote:
> The functions in the "0003" patch haven't surfaced in my "make
> installcheck-parallel" runs with Valgrind, or the "make check" with
> MemorySanitizer. However, I could hit most of them with some fuzzing.
> The only exception was `xl_hash_vacuum_one_page` but that's probably
> also triggerable.
Cool. It would be nice to have test coverage for every WAL record type.
Could you add tests to the test suite to hit those cases?
> I noticed that we also use `sizeof` in some WAL functions, so probably
> the tail padding can also be written to WAL? For example, consider this:
> (gdb) ptype/o gistxlogPageSplit
> type = struct gistxlogPageSplit {
> /* 0 | 4 */ BlockNumber origrlink;
> /* XXX 4-byte hole */
> /* 8 | 8 */ GistNSN orignsn;
> /* 16 | 1 */ _Bool origleaf;
> /* XXX 1-byte hole */
> /* 18 | 2 */ uint16 npage;
> /* 20 | 1 */ _Bool markfollowright;
> /* XXX 3-byte padding */
>
> /* total size (bytes): 24 */
> }
>
> And then we do XLogRegisterData((char *) &xlrec,
> sizeof(gistxlogPageSplit));
Yep.
> In general, I'm wondering what our approach to this should be. Several
> potential improvements were mentioned, but I think for now we could
> focus on removing the Valgrind suppression. This is a meaningful
> improvement that uses the existing test tools.
+1. I think it's a good goal that no uninitialized bytes reach the WAL.
It's not a security issue or anything, but just seems like good hygiene.
> Do we want to defensively zero-initialize every case that seems to
> be potentially affected, i.e. written to WAL and has holes/tail
> padding? That sounds cheap and simple and probably even
> backportable. In the "0001" patch, there are several cases where no
> padding goes into WAL, I can remove these. For example, the use of
> xl_brin_createidx in brinbuild() does not have this problem.
Sounds good to me.
- Heikki
I have removed the unnecessary memsets (for structs with no padding). With these changes, and removing the two WAL-related suppressions, the make installcheck under Valgrind passes. The second patch is a small addition to the hash index test that exercises the "vacuum one page" path we discussed above.
Вложения
Hi,
On Mon, Mar 16, 2026 at 05:14:10PM +0100, Alexander Kuzmenkov wrote:
> I have removed the unnecessary memsets (for structs with no padding). With
> these changes, and removing the two WAL-related suppressions, the make
> installcheck under Valgrind passes.
Thanks for the patch!
Without the memset part of the patch, I got valgrind's things like:
214 heap_multi_insert (heapam.c:2425)
149 heap_inplace_update_and_unlock (heapam.c:6592)
5 palloc (mcxt.c:1411)
3 _bt_getroot (nbtpage.c:348)
2 log_heap_prune_and_freeze (pruneheap.c:2171)
2 LogCurrentRunningXacts (standby.c:1356)
1 vacuumRedirectAndPlaceholder (spgvacuum.c:495)
1 _bt_set_cleanup_info (nbtpage.c:234)
1 ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
and none with the patch applied. So the proposed changes look good to me.
One comment regarding the new memset(s) in the patch, I wonder if we should:
1/ Add a comment on top of them explaining why we are doing this and why
we don't use {0} (cf. Andres's point about C23 up-thread)
or
2/ Create a new macro, say INITIALIZE_PADDING or such with the comment on
top of it. That way, we could do things like:
+ INITIALIZE_PADDING(xlrec);
instead of
+ memset(&xlrec, 0, sizeof(xlrec));
I think that it would make the intent more clear and we could switch to {0} in a
single place (if we feel the need) once C23 is required.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hello! I'd like to propose a different approach: instead of relying on valgrind and runtime detection of the issue, why don't we (also) add specific static analysis rules to detect the situation at compile time? There are several threads when I had the same idea: maybe I should write a postgres specific clang-tidy checker, and ask what everyone thinks about integrating that into the build process in an optional way? I attached a WIP patch that addresses this, specifically for the xlog padding problem for now. src/tools/pg-tidy contains a basic custom clang-tidy plugin that works based on two annotations (and helper macros that resolve to "" for normal compilation): * PG_NOPADDING can be used to mark that a struct doesn't have any padding. If this annotation is added to a struct, but it has padding, it will generate clang-tidy warnings. This is basically "-Wpadded", but specifically for selected types. * A separate check rule requires all PG_NOPADDING structs to be always zero-initialized, meaning we don't have to rely on memset at all * PG_REQUIRE_NOPADDING can be used to mark function arguments. If an argument is marked with this, then the underlying type of its parameter has to be either a primitive type, or a struct annotated with PG_NOPADDING Possible alternatives: * I could simplify this by removing PG_NOPADDING, and instead checking the requirement at every call site of a function with PG_REQUIRE_NOPADDING. That would also mean that it could only enforce zero-initialization when it's clearly visible in the same function. I choose the two annotation approach for increased reliability * I could simplify this to only check for end padding, enforce memset instead of zero initialization, and build upon Alexander's previous patch. 0001 implements and integrates pg-tidy (I only added it to the meson build, if there's interest I can also add it to make. clang-tidy integration works similarly to the llvm bitcode patch, so it is properly parallelized/incremental) 0002 adds basic helper macros 0003 marks the data in XLogRegisterData with PG_REQUIRE_NOPADDING, all related structs with PG_NOPADDING, and then fixes the padding issues by adding explicit padding data instead of the compiler autogenerated padding. We also cast a few arrays to char* (or alternatively we could use nolint to suppress the check), because I didn't want to everywhere zero initialize types like RelFileLocator. 0004 removes the now trivial SizeOf macros Compared to only using valgrind, clang-tidy: * works at compile time, guaranteed for every type used with xlog * in theory should work with extensions (if we want to, e.g. by integrating it into pgxs), without requiring extension developers to add proper test workflows using valgrind * valgrind should still work What do you think? I'm interested in opinions about both the specific case, and the generic idea of using custom clang-tidy checks for various postgres-specific checks. As I mentioned at the beginning of the message I think this could be useful for other things and doesn't always require custom annotations, in several cases it could work without any C code change.
Вложения
On Tue, Mar 17, 2026 at 06:45:46PM +0000, Zsolt Parragi wrote: > What do you think? I'm interested in opinions about both the specific > case, and the generic idea of using custom clang-tidy checks for > various postgres-specific checks. As I mentioned at the beginning of > the message I think this could be useful for other things and doesn't > always require custom annotations, in several cases it could work > without any C code change. That's an interesting idea to be more aggressive in terms of the checks done, but the invasiveness and the footprint this involves in the WAL insertion code paths makes it a no-go for me. Valgrind has proved to be quite useful over the years. Sure, it takes more time to run it, but for this specific issue I don't see why we should not continue relying on it, not reinventing the wheel, and it's served us pretty well. While removing padding is a nice practice on clean ground to make WAL records reproducible, that would mean forcing the rule even for custom WAL RMGRs. Some could say that they're OK to live with some padding, and that we don't have to be strictly aggressive at the code level. Different opinions are of course welcome, that's just my feeling on the matter about your proposal. -- Michael
Вложения
Thank you for the feedback! > but the invasiveness and the footprint this involves in > the WAL insertion code paths makes it a no-go for me. Invasiveness is an option I choose, not a requirement. In an alternative version, this could work in a "less strict" mode, on top of Alexander's memset patch, verifying that: if we see a function that uses XLogRegisterData, and the variable passed to it is defined in the same function/translation unit (which is most of the case), we require that variable to be well initialized - either all fields have to be specified by hand, or it needs an initializer block/memset at the beginning -- or if it has compiler generated padding inside, it requires memset, as that's the only thing guaranteed to initialize it. Similarly instead of requiring explicit padding added to the end of the struct, it could instead verify that 1. the SizeOf macros are correctly defined, refer to the proper size 2. if a SizeOf macro is defined, the struct is properly memset at every location where it it used In that version, there would be little or no change over Alexander's previous patch, other than adding pg-tidy itself to the build. I can also create a version with that approach, it should be relatively simply as I won't have to modify the WAL structs/calls like in this version. > Valgrind has proved to be quite useful over the years. Sure, it takes > more time to run it, but for this specific issue I don't see why we > should not continue relying on it I'm not saying that we should rely on valgrind, it is a good tool and it possibly catches things this wouldn't. This would be an additional tool, offering the advantage of being quick and integrated into the build. (Valgrind is also integrated, but it is also slow, I don't think everyone runs it regularly as part of normal development)
On Mon, Mar 16, 2026 at 05:14:10PM +0100, Alexander Kuzmenkov wrote:
> +-- Test insert-driven cleanup of dead index tuples (_hash_vacuum_one_page).
> +TRUNCATE hash_cleanup_heap;
> +INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 1000) as i;
> +DELETE FROM hash_cleanup_heap
> + WHERE ctid IN ('(0,5)','(0,10)','(0,15)','(0,20)','(0,25)',
> + '(0,30)','(0,35)','(0,40)','(0,45)','(0,50)');
> +SET enable_seqscan = off;
> +SET enable_bitmapscan = off;
> +SELECT count(*) FROM hash_cleanup_heap WHERE keycol = 1;
> +INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 200) as i;
> +RESET enable_seqscan;
> +RESET enable_bitmapscan;
> +
> -- Clean up.
> DROP TABLE hash_cleanup_heap;
Hmm. If I take this SQL sequence independently or with an
installcheck, the one-page VACUUM path is taken during the final
INSERT, but that's not the case of a `make check`. Could this be made
more stable? I have not spent a lot of time on it, so I may be
missing something obvious, of course.
--
Michael
Вложения
On Wed, Mar 18, 2026 at 7:59 AM Michael Paquier <michael@paquier.xyz> wrote:
Hmm. If I take this SQL sequence independently or with an
installcheck, the one-page VACUUM path is taken during the final
INSERT, but that's not the case of a `make check`. Could this be made
more stable? I have not spent a lot of time on it, so I may be
missing something obvious, of course.
I think this might be caused by "make check" running many tests in parallel, so the deleting transaction is visible to some snapshots, and the cleanup is not done. Not sure what's the best way to improve this.
On 18/03/2026 12:42, Alexander Kuzmenkov wrote: > On Wed, Mar 18, 2026 at 7:59 AM Michael Paquier <michael@paquier.xyz > <mailto:michael@paquier.xyz>> wrote: > > Hmm. If I take this SQL sequence independently or with an > installcheck, the one-page VACUUM path is taken during the final > INSERT, but that's not the case of a `make check`. Could this be made > more stable? I have not spent a lot of time on it, so I may be > missing something obvious, of course. > > > I think this might be caused by "make check" running many tests in > parallel, so the deleting transaction is visible to some snapshots, and > the cleanup is not done. Not sure what's the best way to improve this. I think if you use "BEGIN; INSERT ...; ROLLBACK;" to generate the dead tuples instead of DELETE, it will not be sensitive to concurrent snapshots like that. - Heikki
On Wed, Mar 18, 2026 at 02:18:45PM +0200, Heikki Linnakangas wrote: > I think if you use "BEGIN; INSERT ...; ROLLBACK;" to generate the dead > tuples instead of DELETE, it will not be sensitive to concurrent snapshots > like that. Nice trick, this makes the test stable. I have reused it and applied this part to take care of the coverage hole. -- Michael