Discussion: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andreas Karlsson
Date:
Hi,

Andres pointed out this possible optimization on Discord so I hacked up a
quick patch which avoids taking a lock when reading the LSN from a page on
architectures where we can be sure not to get a torn value. It is always
nice to remove a lock from a reasonably hot code path.

I thought about using our functions for atomics but did not do so since I
did not want to introduce any extra overhead on platforms which do not
support 64-bit atomic operations.

I decided to just remove the struct to make the code simpler and more
consistent, but I can also see an argument for keeping it to get some
degree of type safety.

I have not properly benchmarked it yet but plan to do so when I am back
from my vacation.

I have also included a cleanup patch where I change a macro into an inline
function, which I think improves code readability. Feel free to ignore
that one if you want.

--
Andreas
Percona
Attachments
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Peter Geoghegan
Date:
On Sun, Nov 23, 2025 at 6:10 PM Andreas Karlsson <andreas@proxel.se> wrote:
> Andres pointed out this possible optimization on Discord so I hacked up
> a quick patch which avoids taking a lock when reading the LSN from a
> page on architectures where we can be sure to not get a torn value. It
> is always nice to remove a lock from a reasonably hot code path.

This seems very useful.

One reason why I'm interested in this work is that it will facilitate other
work (from Tomas Vondra and myself) on the proposed new amgetbatch interface,
which enables optimizations such as index prefetching. That work will replace
the use of the amgettuple interface with an index page/batch oriented
amgetbatch interface. It generalizes nbtree's dropPin optimization
(originally added under a different name by commit 2ed5b87f) to work with any
index AM that uses the new amgetbatch interface -- which will include both
nbtree and hash in the initial version (and possibly other index AMs in the
future).

This means that there'll be quite a few new BufferGetLSNAtomic calls during
scans of hash indexes, where before there were none -- we need to stash an
LSN so that we have an alternative way of detecting unsafe concurrent TID
recycling when LP_DEAD-marking index tuples as a batch is released (since
there'll have been no buffer pin held on the index leaf page while we
accessed the heap when the dropPin optimization is applied).

With page-level checksums enabled, on the recently posted v4 of our patch
set, I find that there's a regression of about 5% of transaction throughput
with a variant of pgbench SELECT that uses hash indexes. But with your patch
applied on top of our own, that regression completely goes away.

FWIW the improvement I see with regular nbtree index scans is still visible,
but not quite as good as with hash indexes + the amgetbatch patch set.

It should be easy enough to see this for yourself. Standard pgbench SELECT
should work well enough -- just make sure that you test with page-level
checksums enabled.

--
Peter Geoghegan
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andreas Karlsson
Date:
On 11/24/25 12:10 AM, Andreas Karlsson wrote:
> Andres pointed out this possible optimization on Discord so I hacked up
> a quick patch which avoids taking a lock when reading the LSN from a
> page on architectures where we can be sure to not get a torn value. It
> is always nice to remove a lock from a reasonably hot code path.
>
> I thought about using our functions for atomics but did not do so since
> I did not want to introduce any extra overhead on platforms which do not
> support 64-bit atomic operations.
>
> I decided to just remove the struct to make the code simpler and more
> consistent but I can also see an argument for keeping it to get some
> degree of type safety.
Noticed that the pageinspect tests in the CI did not like that I
increased the alignment requirement of PageHeaderData from 4 to 8 bytes,
presumably due to a bytea's data being 4 byte aligned. I will need to
think about how best to fix this. Maybe copying the struct is fine in the
pageinspect code, because being able to check for this alignment seems
like a nice property for code which is not in pageinspect.
It gives us the following backtrace:
../src/include/storage/bufpage.h:397:16: runtime error: member access
within misaligned address 0x55d6721e69bc for type 'const struct
PageHeaderData', which requires 8 byte alignment
0x55d6721e69bc: note: pointer points here
10 80 00 00 00 00 00 00 00 00 00 00 00 00 04 00 1c 00 e0 1f 00 20
04 20 00 00 00 00 e0 9f 40 00
^
#0 0x7fb39deb9b07 in PageGetMaxOffsetNumber
../src/include/storage/bufpage.h:397
#1 0x7fb39debac3e in heap_page_items
../contrib/pageinspect/heapfuncs.c:168
#2 0x55d63ff127c7 in ExecMakeTableFunctionResult
../src/backend/executor/execSRF.c:234
#3 0x55d63ff42aa7 in FunctionNext
../src/backend/executor/nodeFunctionscan.c:94
#4 0x55d63ff14157 in ExecScanFetch
../src/include/executor/execScan.h:134
#5 0x55d63ff14157 in ExecScanExtended
../src/include/executor/execScan.h:195
#6 0x55d63ff14157 in ExecScan ../src/backend/executor/execScan.c:59
#7 0x55d63ff427b6 in ExecFunctionScan
../src/backend/executor/nodeFunctionscan.c:269
#8 0x55d63ff0c38c in ExecProcNodeFirst
../src/backend/executor/execProcnode.c:469
#9 0x55d63fef5cee in ExecProcNode
../src/include/executor/executor.h:319
#10 0x55d63fef6465 in ExecutePlan
../src/backend/executor/execMain.c:1707
#11 0x55d63fef7c00 in standard_ExecutorRun
../src/backend/executor/execMain.c:366
#12 0x55d63fef7c60 in ExecutorRun
../src/backend/executor/execMain.c:303
#13 0x55d640322295 in PortalRunSelect ../src/backend/tcop/pquery.c:916
#14 0x55d64032534e in PortalRun ../src/backend/tcop/pquery.c:760
#15 0x55d64031e34a in exec_simple_query
../src/backend/tcop/postgres.c:1279
#16 0x55d6403215ce in PostgresMain ../src/backend/tcop/postgres.c:4775
#17 0x55d640316c6c in BackendMain
../src/backend/tcop/backend_startup.c:124
#18 0x55d6401b2711 in postmaster_child_launch
../src/backend/postmaster/launch_backend.c:268
#19 0x55d6401b7b19 in BackendStartup
../src/backend/postmaster/postmaster.c:3598
#20 0x55d6401ba061 in ServerLoop
../src/backend/postmaster/postmaster.c:1713
#21 0x55d6401bb972 in PostmasterMain
../src/backend/postmaster/postmaster.c:1403
#22 0x55d63ffde07b in main ../src/backend/main/main.c:231
#23 0x7fb39e033ca7 (/lib/x86_64-linux-gnu/libc.so.6+0x29ca7)
(BuildId: def5460e3cee00bfee25b429c97bcc4853e5b3a8)
#24 0x7fb39e033d64 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x29d64) (BuildId:
def5460e3cee00bfee25b429c97bcc4853e5b3a8)
#25 0x55d63fb70260 in _start
(/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8b0260)
(BuildId: d2e585ffc320b6de4bfce11bb801bb923e7d3484)
Links
- https://cirrus-ci.com/task/5353983260229632
-
https://api.cirrus-ci.com/v1/artifact/task/5353983260229632/testrun/build/testrun/pageinspect/regress/log/postmaster.log
Andreas
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andres Freund
Date:
Hi,
On 2025-11-24 00:10:03 +0100, Andreas Karlsson wrote:
> Andres pointed out this possible optimization on Discord so I hacked up a
> quick patch which avoids taking a lock when reading the LSN from a page on
> architectures where we can be sure to not get a torn value. It is always
> nice to remove a lock from a reasonably hot code path.
Nice.
> static inline XLogRecPtr
> PageXLogRecPtrGet(PageXLogRecPtr val)
> {
> - return (uint64) val.xlogid << 32 | val.xrecoff;
> + return val;
> }
>
> #define PageXLogRecPtrSet(ptr, lsn) \
> - ((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
> + ((ptr) = (lsn))
> +
> +#else
> +
> +static inline XLogRecPtr
> +PageXLogRecPtrGet(volatile PageXLogRecPtr val)
> +{
> + return (val << 32) | (val >> 32);
> +}
A volatile on a non-pointer won't do you much good, I'm afraid. You need to
make sure that the underlying value is read as a single 8 byte read, I don't
see how this guarantees that, unfortunately.
Greetings,
Andres Freund
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andreas Karlsson
Date:
On 1/2/26 4:27 PM, Andres Freund wrote:
> A volatile on a non-pointer won't do you much good, I'm afraid. You need to
> make sure that the underlying value is read as a single 8 byte read, I don't
> see how this guarantees that, unfortunately.

Yeah, that was a quite big thinko. I have attached a patch with the thinko
fixed, but I am still not happy with it. I think I will try to use atomics.h
and see if that makes the code nicer to read.

Also will after that see what I can do about pageinspect.

Andreas
Attachments
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Peter Geoghegan
Date:
On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:
> Yeah, that was a quite big thinko. I have attached a patch with the
> thinko fixed but I am still not happy with it. I think I will try to use
> atomics.h and see if that makes the code nicer to read.
>
> Also will after that see what I can do about pageinspect.

I believe that pageinspect's heap_page_items function needs to use
get_page_from_raw -- see commit 14e9b18fed.

Attached patch 0002-* shows what's required. I'm including your original
0001-* patch from your v2 here, to keep CFTester happy.

We (Tomas Vondra and I) are treating this as a dependency for our index
prefetching patch. It'd be good to get this done soon.

--
Peter Geoghegan
Attachments
On 2/5/26 16:38, Peter Geoghegan wrote:
> On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:
>> Yeah, that was a quite big thinko. I have attached a patch with the
>> thinko fixed but I am still not happy with it. I think I will try to use
>> atomics.h and see if that makes the code nicer to read.
>>
>> Also will after that see what I can do about pageinspect.
>
> I believe that pageinspect's heap_page_items function needs to use
> get_page_from_raw -- see commit 14e9b18fed.
>
> Attached patch 0002-* shows what's required. I'm including your
> original 0001-* patch from your v2 here, to keep CFTester happy.
>
> We (Tomas Vondra and I) are treating this as a dependency for our
> index prefetching patch. It'd be good to get this done soon.
> --

Here's a v4 with some very minor adjustments and an updated commit message.
Barring objections, I'm going to push this shortly - in a day or two, to get
it out of the way for the main index prefetch patch.

The only real change is about the asserts in BufferGetLSNAtomic() - the new
"ifdef" branch only called PageGetLSN(), so it lost the checks that the
buffer is valid and pinned. Which seems not desirable.

In fact, looking at the existing code, shouldn't the asserts be before the
"fastpath" exit (for cases without hint bits or local buffers)?

The v4 changes both points. It adds the asserts to the new ifdef block, and
moves it up in the existing one.

regards

--
Tomas Vondra
Attachments
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andres Freund
Date:
Hi,

On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:
> On 2/5/26 16:38, Peter Geoghegan wrote:
> > On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:
> >> Yeah, that was a quite big thinko. I have attached a patch with the
> >> thinko fixed but I am still not happy with it. I think I will try to use
> >> atomics.h and see if that makes the code nicer to read.
> >>
> >> Also will after that see what I can do about pageinspect.
> >
> > I believe that pageinspect's heap_page_items function needs to use
> > get_page_from_raw -- see commit 14e9b18fed.

Maybe I'm slow today, but how's that related to the patch at hand?

Regardless of that, it seems a bit confusing to fold the pageinspect changes
into the main commit.

> The only real change is about the asserts in BufferGetLSNAtomic() - the new
> "ifdef" branch only called PageGetLSN(), so it lost the checks that the
> buffer is valid and pinned. Which seems not desirable.
>
> In fact, looking at the existing code, shouldn't the asserts be before
> the "fastpath" exit (for cases without hint bits or local buffers)?
>
> The v4 changes both points. It adds the asserts to the new ifdef block,
> and moves it up in the existing one.

> From 4d6479b37e60015cc4cf55579a8ec51337675996 Mon Sep 17 00:00:00 2001
> From: Andreas Karlsson <andreas@proxel.se>
> Date: Fri, 21 Nov 2025 10:15:06 +0100
> Subject: [PATCH v4] Do not lock in BufferGetLSNAtomic() on archs with 8 byte
>  atomic reads
>
> On platforms where we can read or write the whole LSN atomically, we do
> not need to lock the buffer header to prevent torn LSNs. We can do this
> only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
> pd_lsn field is properly aligned.
>
> For historical reasons the PageXLogRecPtr was defined as a struct with
> two uint32 fields. This replaces it with a single uint64 value, to make
> the intent clearer.
>
> Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
> Peter Geoghegan. Minor tweaks by me.
>
> Author: Andreas Karlsson <andreas@proxel.se>
> Author: Peter Geoghegan <pg@bowt.ie>
> Reviewed-by: Andres Freund <andres@anarazel.de>
> Reviewed-by: Tomas Vondra <tomas@vondra.me>
> Discussion: https://postgr.es/m/b6610c3b-3f59-465a-bdbb-8e9259f0abc4@proxel.se

> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 5f3d083e938..efcf64169aa 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -4628,33 +4628,42 @@ BufferIsPermanent(Buffer buffer)
>
>  /*
>   * BufferGetLSNAtomic
> - *	Retrieves the LSN of the buffer atomically using a buffer header lock.
> - *	This is necessary for some callers who may not have an exclusive lock
> - *	on the buffer.
> + *	Retrieves the LSN of the buffer atomically.
> + *
> + *	On platforms without 8 byte atomic reads/write we need to take a
> + *	header lock. This is necessary for some callers who may not have an
> + *	exclusive lock on the buffer.
>   */
>  XLogRecPtr
>  BufferGetLSNAtomic(Buffer buffer)
>  {
> +#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
> +	Assert(BufferIsValid(buffer));
> +	Assert(BufferIsPinned(buffer));
> +
> +	return PageGetLSN(BufferGetPage(buffer));
> +#else
>  	char	   *page = BufferGetPage(buffer);
>  	BufferDesc *bufHdr;
>  	XLogRecPtr	lsn;
>
> +	/* Make sure we've got a real buffer, and that we hold a pin on it. */
> +	Assert(BufferIsValid(buffer));
> +	Assert(BufferIsPinned(buffer));
> +

Seems a bit silly to have these asserts duplicated. I'd probably just put the
body of the #else into an {} and then have the asserts before the ifdef.

> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
> index 92a6bb9b0c0..e994526ca52 100644
> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -91,24 +91,27 @@ typedef uint16 LocationIndex;
>
>
>  /*
> - * For historical reasons, the 64-bit LSN value is stored as two 32-bit
> - * values.
> + * For historical reasons, the storage of 64-bit LSN values depends on CPU
> + * endianness; PageXLogRecPtr used to be a struct consisting of two 32-bit
> + * values. When reading (and writing) the pd_lsn field from page headers, the
> + * caller must convert from (and convert to) the platform's native endianness.
>   */
> -typedef struct
> -{
> -	uint32		xlogid;			/* high bits */
> -	uint32		xrecoff;		/* low bits */
> -} PageXLogRecPtr;
> +typedef uint64 PageXLogRecPtr;

I think this should explain why we are storing it as a 64bit value (i.e. that
we need to be able to read it without tearing).

I suspect this should continue to be a struct (just containing a 64bit uint),
otherwise it'll be way way too easy to omit the conversion, due to C's weak
typedefs.

> -static inline XLogRecPtr
> -PageXLogRecPtrGet(PageXLogRecPtr val)
> +/*
> + * Convert a  pd_lsn taken from a page header into its native
> + * uint64/PageXLogRecPtr representation
> + */

Odd double space before pd_lsn.

> +static inline PageXLogRecPtr
> +PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
>  {
> -	return (uint64) val.xlogid << 32 | val.xrecoff;
> +#ifdef WORDS_BIGENDIAN
> +	return pd_lsn;
> +#else
> +	return (pd_lsn << 32) | (pd_lsn >> 32);
> +#endif
>  }
>
> -#define PageXLogRecPtrSet(ptr, lsn) \
> -	((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
> -
>  /*
>   * disk page organization
>   *
> @@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
>  {
>  	return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
>  }

I think this may need to actually use a volatile to force the read to happen
as the 64bit value, otherwise the compiler would be entirely free to implement
it as two 32bit reads.

>  static inline void
>  PageSetLSN(Page page, XLogRecPtr lsn)
>  {
> -	PageXLogRecPtrSet(((PageHeader) page)->pd_lsn, lsn);
> +	((PageHeader) page)->pd_lsn = PageXLogRecPtrGet(lsn);
>  }

Dito.

Seems also a bit misleading to document PageXLogRecPtrSet as "Convert a pd_lsn
taken from a page header into its native ..." when we use it for both
directions.

I think this probably also ought to assert that the page this is being set on
is properly aligned.

Greetings,

Andres Freund
On 3/10/26 21:41, Andres Freund wrote:
> Hi,
>
> On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:
>> On 2/5/26 16:38, Peter Geoghegan wrote:
>>> On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:
>>>> Yeah, that was a quite big thinko. I have a attached a patch with the
>>>> thinko fixed but I am still not happy with it. I think I will try to use
>>>> atomics.h and see if that makes the code nicer to read.
>>>>
>>>> Also will after that see what I can do about pageinspect.
>>>
>>> I believe that pageinspect's heap_page_items function needs to use
>>> get_page_from_raw -- see commit 14e9b18fed.
>
> Maybe I'm slow today, but how's that related to the patch at hand?
>
> Regardless of that, it seems a bit confusing to fold the pageinspect changes
> into the main commit.
>
That's because of alignment requirements, per this comment:
* On machines with MAXALIGN = 8, the payload of a bytea is not
* maxaligned, since it will start 4 bytes into a palloc'd value.
* PageHeaderData requires 8 byte alignment, so always use this
* function when accessing page header fields from a raw page bytea.
AFAIK proper alignment is required to make the load atomic.
>> XLogRecPtr
>> BufferGetLSNAtomic(Buffer buffer)
>> {
>> +#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
>> + Assert(BufferIsValid(buffer));
>> + Assert(BufferIsPinned(buffer));
>> +
>> + return PageGetLSN(BufferGetPage(buffer));
>> +#else
>> char *page = BufferGetPage(buffer);
>> BufferDesc *bufHdr;
>> XLogRecPtr lsn;
>>
>> + /* Make sure we've got a real buffer, and that we hold a pin on it. */
>> + Assert(BufferIsValid(buffer));
>> + Assert(BufferIsPinned(buffer));
>> +
>
> Seems a bit silly to have these asserts duplicated. I'd probably just put the
> body of the #else into an {} and then have the asserts before the ifdef.
>
OK, WFM.
>
>> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
>> index 92a6bb9b0c0..e994526ca52 100644
>> --- a/src/include/storage/bufpage.h
>> +++ b/src/include/storage/bufpage.h
>> @@ -91,24 +91,27 @@ typedef uint16 LocationIndex;
>>
>>
>> /*
>> - * For historical reasons, the 64-bit LSN value is stored as two 32-bit
>> - * values.
>> + * For historical reasons, the storage of 64-bit LSN values depends on CPU
>> + * endianness; PageXLogRecPtr used to be a struct consisting of two 32-bit
>> + * values. When reading (and writing) the pd_lsn field from page headers, the
>> + * caller must convert from (and convert to) the platform's native endianness.
>> */
>> -typedef struct
>> -{
>> - uint32 xlogid; /* high bits */
>> - uint32 xrecoff; /* low bits */
>> -} PageXLogRecPtr;
>> +typedef uint64 PageXLogRecPtr;
>
> I think this should explain why we are storing it as a 64bit value (i.e. that
> we need to be able to read it without tearing).
>
>
> I suspect this should continue to be a struct (just containing a 64bit uint),
> otherwise it'll be way way too easy to omit the conversion, due to C's weak
> typedefs.
>
I agree. It'd be trivial to call it with plain XLogRecPtr (especially
now, with a function handling both directions).
>
>> -static inline XLogRecPtr
>> -PageXLogRecPtrGet(PageXLogRecPtr val)
>> +/*
>> + * Convert a pd_lsn taken from a page header into its native
>> + * uint64/PageXLogRecPtr representation
>> + */
>
> Odd double space before pd_lsn.
>
>
>> +static inline PageXLogRecPtr
>> +PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
>> {
>> - return (uint64) val.xlogid << 32 | val.xrecoff;
>> +#ifdef WORDS_BIGENDIAN
>> + return pd_lsn;
>> +#else
>> + return (pd_lsn << 32) | (pd_lsn >> 32);
>> +#endif
>> }
>>
>> -#define PageXLogRecPtrSet(ptr, lsn) \
>> - ((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
>> -
>> /*
>> * disk page organization
>> *
>> @@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
>> {
>> return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
>> }
>
> I think this may need to actually use a volatile to force the read to happen
> as the 64bit value, otherwise the compiler would be entirely free to implement
> it as two 32bit reads.
>
I did that in v5 (I think). But at the same time I'm now rather confused
about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
well-aligned load/store of 8B can be implemented as two 32-bit reads
(with a "sequence point" in between), then how is that atomic?
I'm also a bit ... unsure about "volatile" in general. Is that actually
something the specification says is needed here, or is it just an
attempt to "disable optimizations" (like the split into two stores)?
>
>> static inline void
>> PageSetLSN(Page page, XLogRecPtr lsn)
>> {
>> - PageXLogRecPtrSet(((PageHeader) page)->pd_lsn, lsn);
>> + ((PageHeader) page)->pd_lsn = PageXLogRecPtrGet(lsn);
>> }
>
> Dito.
>
> Seems also a bit misleading to document PageXLogRecPtrSet as "Convert a pd_lsn
> taken from a page header into its native ..." when we use it for both
> directions.
>
I agree, I found that confusing too. In the attached v5 I went back to
the separate get/set functions from v3. I think that's better/clearer.
regards
--
Tomas Vondra
Attachments
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andres Freund
Date:
Hi,
On 2026-03-11 01:00:34 +0100, Tomas Vondra wrote:
> On 3/10/26 21:41, Andres Freund wrote:
> > On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:
> >> On 2/5/26 16:38, Peter Geoghegan wrote:
> >>> On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:
> >>>> Yeah, that was a quite big thinko. I have a attached a patch with the
> >>>> thinko fixed but I am still not happy with it. I think I will try to use
> >>>> atomics.h and see if that makes the code nicer to read.
> >>>>
> >>>> Also will after that see what I can do about pageinspect.
> >>>
> >>> I believe that pageinspect's heap_page_items function needs to use
> >>> get_page_from_raw -- see commit 14e9b18fed.
> >
> > Maybe I'm slow today, but how's that related to the patch at hand?
> >
> > Regardless of that, it seems a bit confusing to fold the pageinspect changes
> > into the main commit.
> >
>
> That's because of alignment requirements, per this comment:
>
> * On machines with MAXALIGN = 8, the payload of a bytea is not
> * maxaligned, since it will start 4 bytes into a palloc'd value.
> * PageHeaderData requires 8 byte alignment, so always use this
> * function when accessing page header fields from a raw page bytea.
I guess we are now reading 8 bytes as one.
> AFAIK proper alignment is required to make the load atomic.
Sure, but it's a copy of the page, so that by itself would not matter.
> >> +static inline PageXLogRecPtr
> >> +PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
> >> {
> >> - return (uint64) val.xlogid << 32 | val.xrecoff;
> >> +#ifdef WORDS_BIGENDIAN
> >> + return pd_lsn;
> >> +#else
> >> + return (pd_lsn << 32) | (pd_lsn >> 32);
> >> +#endif
> >> }
> >>
> >> -#define PageXLogRecPtrSet(ptr, lsn) \
> >> - ((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
> >> -
> >> /*
> >> * disk page organization
> >> *
> >> @@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
> >> {
> >> return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
> >> }
> >
> > I think this may need to actually use a volatile to force the read to happen
> > as the 64bit value, otherwise the compiler would be entirely free to implement
> > it as two 32bit reads.
> >
>
> I did that in v5 (I think). But at the same time I'm now rather confused
> about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
> well-aligned load/store of 8B can be implemented as two 32-bit reads
> (with a "sequence point" in between), then how is that atomic?
All PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY says is that the hardware can do 8
byte reads/writes without tearing. But that still requires 8 bytes
reads/writes to be done as one, not multiple instructions.
> I'm also a bit ... unsure about "volatile" in general. Is that actually
> something the specification says is needed here, or is it just an
> attempt to "disable optimizations" (like the split into two stores)?
It's definitely suboptimal. We should use something like C11's
atomic_load()/atomic_store(). But we can't rely on that yet (it's not enabled
without extra flags in at least msvc).
The volatile does prevent the compiler from just doing a read/write twice or
never, just because it'd be nicer for code generation. But it doesn't
actually guarantee that the right instructions for reading/writing are
generated :(
Greetings,
Andres Freund
On 3/11/26 01:21, Andres Freund wrote:
> Hi,
>
> On 2026-03-11 01:00:34 +0100, Tomas Vondra wrote:
>> On 3/10/26 21:41, Andres Freund wrote:
>>> On 2026-03-10 19:41:54 +0100, Tomas Vondra wrote:
>>>> On 2/5/26 16:38, Peter Geoghegan wrote:
>>>>> On Wed, Jan 14, 2026 at 1:31 AM Andreas Karlsson <andreas@proxel.se> wrote:
>>>>>> Yeah, that was a quite big thinko. I have a attached a patch with the
>>>>>> thinko fixed but I am still not happy with it. I think I will try to use
>>>>>> atomics.h and see if that makes the code nicer to read.
>>>>>>
>>>>>> Also will after that see what I can do about pageinspect.
>>>>>
>>>>> I believe that pageinspect's heap_page_items function needs to use
>>>>> get_page_from_raw -- see commit 14e9b18fed.
>>>
>>> Maybe I'm slow today, but how's that related to the patch at hand?
>>>
>>> Regardless of that, it seems a bit confusing to fold the pageinspect changes
>>> into the main commit.
>>>
>>
>> That's because of alignment requirements, per this comment:
>>
>> * On machines with MAXALIGN = 8, the payload of a bytea is not
>> * maxaligned, since it will start 4 bytes into a palloc'd value.
>> * PageHeaderData requires 8 byte alignment, so always use this
>> * function when accessing page header fields from a raw page bytea.
>
> I guess we are now reading 8 bytes as one.
>
Not sure I understand. Are you saying it's OK or not?
>
>> AFAIK proper alignment is required to make the load atomic.
>
> Sure, but it's a copy of the page, so that by itself would not matter.
>
It does seem a bit unnecessary, in the sense that there shouldn't be
anything changing the data.
>
>>>> +static inline PageXLogRecPtr
>>>> +PageXLogRecPtrGet(PageXLogRecPtr pd_lsn)
>>>> {
>>>> - return (uint64) val.xlogid << 32 | val.xrecoff;
>>>> +#ifdef WORDS_BIGENDIAN
>>>> + return pd_lsn;
>>>> +#else
>>>> + return (pd_lsn << 32) | (pd_lsn >> 32);
>>>> +#endif
>>>> }
>>>>
>>>> -#define PageXLogRecPtrSet(ptr, lsn) \
>>>> - ((ptr).xlogid = (uint32) ((lsn) >> 32), (ptr).xrecoff = (uint32) (lsn))
>>>> -
>>>> /*
>>>> * disk page organization
>>>> *
>>>> @@ -387,10 +390,11 @@ PageGetLSN(const PageData *page)
>>>> {
>>>> return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
>>>> }
>>>
>>> I think this may need to actually use a volatile to force the read to happen
>>> as the 64bit value, otherwise the compiler would be entirely free to implement
>>> it as two 32bit reads.
>>>
>>
>> I did that in v5 (I think). But at the same time I'm now rather confused
>> about the meaning of PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY. Because if a
>> well-aligned load/store of 8B can be implemented as two 32-bit reads
>> (with a "sequence point" in between), then how is that atomic?
>
> All PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY says is that the hardware can do 8
> byte reads/writes without tearing. But that still requires 8 bytes
> reads/writes to be done as one, not multiple instructions.
>
OK
>
>> I'm also a bit ... unsure about "volatile" in general. Is that actually
>> something the specification says is needed here, or is it just an
>> attempt to "disable optimizations" (like the split into two stores)?
>
> It's definitely suboptimal. We should use something like C11's
> atomic_load()/atomic_store(). But we can't rely on that yet (it's not enabled
> without extra flags in at least msvc).
>
> The volatile does prevent the compiler from just doing a read/write twice or
> never, just because it'd be nicer for code generation. But it doesn't
> actually guarantee that the right instructions for reading/writing are
> generated :(
>
That sounds a bit scary, TBH.
regards
--
Tomas Vondra
Re: Remove header lock BufferGetLSNAtomic() on architectures with 64 bit atomic operations
From:
Andres Freund
Date:
Hi,
On 2026-03-11 01:43:02 +0100, Tomas Vondra wrote:
> >>> Maybe I'm slow today, but how's that related to the patch at hand?
> >>>
> >>> Regardless of that, it seems a bit confusing to fold the pageinspect changes
> >>> into the main commit.
> >>>
> >>
> >> That's because of alignment requirements, per this comment:
> >>
> >> * On machines with MAXALIGN = 8, the payload of a bytea is not
> >> * maxaligned, since it will start 4 bytes into a palloc'd value.
> >> * PageHeaderData requires 8 byte alignment, so always use this
> >> * function when accessing page header fields from a raw page bytea.
> >
> > I guess we are now reading 8 bytes as one.
> >
>
> Not sure I understand. Are you saying it's OK or not?
I mean that this adds an 8 byte read, which alignment-picky hardware could
complain about, in theory. So I guess I kinda see the point of the change
now.
> >> I'm also a bit ... unsure about "volatile" in general. Is that actually
> >> something the specification says is needed here, or is it just an
> >> attempt to "disable optimizations" (like the split into two stores)?
> >
> > It's definitely suboptimal. We should use something like C11's
> > atomic_load()/atomic_store(). But we can't rely on that yet (it's not enabled
> > without extra flags in at least msvc).
> >
> > The volatile does prevent the compiler from just doing a read/write twice or
> > never, just because it'd be nicer for code generation. But it doesn't
> > actually guarantee that the right instructions for reading/writing are
> > generated :(
> >
>
> That sounds a bit scary, TBH.
Indeed. It's how atomic reads / writes have worked for quite a while though,
so I think we'll just have to live with it until we can rely on C11 atomics on
msvc.
> From 794fb844266dcd8d97217da09b61c5c9cafc01d7 Mon Sep 17 00:00:00 2001
> From: Andreas Karlsson <andreas@proxel.se>
> Date: Fri, 21 Nov 2025 10:15:06 +0100
> Subject: [PATCH v5] Do not lock in BufferGetLSNAtomic() on archs with 8 byte
> atomic reads
>
> On platforms where we can read or write the whole LSN atomically, we do
> not need to lock the buffer header to prevent torn LSNs. We can do this
> only on platforms with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY, and when the
> pd_lsn field is properly aligned.
>
> For historical reasons the PageXLogRecPtr was defined as a struct with
> two uint32 fields. This replaces it with a single uint64 value, to make
> the intent clearer.
>
> Idea by Andres Freund. Initial patch by Andreas Karlsson, improved by
> Peter Geoghegan. Minor tweaks by me.
I'd add a note explaining why heapfuncs is updated.
> /*
> * disk page organization
> @@ -385,12 +411,13 @@ PageGetMaxOffsetNumber(const PageData *page)
> static inline XLogRecPtr
> PageGetLSN(const PageData *page)
> {
> - return PageXLogRecPtrGet(((const PageHeaderData *) page)->pd_lsn);
> + return PageXLogRecPtrGet(&((const volatile PageHeaderData *) page)->pd_lsn);
> }
I don't think this volatile is needed, the one in PageXLogRecPtrGet's
declaration should do the work.
Greetings,
Andres Freund