Обсуждение: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

Поиск
Список
Период
Сортировка

Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
Hi,

Shutdown may be indefinitely stuck under the following circumstances:
- Data checksum is enabled (needed to generate FPI_FOR_HINT record)
- A logical replication walsender is running
- A select in an explicit ongoing transaction pruned a heap page and
logged a FPI_FOR_HINT record. This record is likely going to be a
contrecord and start a new page.

Starting the shutdown will kill this ongoing transaction. Since the
transaction doesn't have an allocated xid, the FPI_FOR_HINT record
will be left unflushed.

When the checkpointer calls ShutdownXLOG(), all walsenders will be
notified to stop. However, the logical replication walsender will be
stuck in an infinite loop, trying to read this unflushed record and
never reaching the stop state, blocking the whole shutdown sequence.

This can be reproduced with the following script (this assumes
`pgbench -i` was run to create pgbench_accounts and a running logical
replication walsender):

TRUNCATE pgbench_accounts;
-- Completely fill the first heap page
INSERT INTO pgbench_accounts SELECT *, *, *, '' FROM generate_series(0, 62);
-- This should tag the page's metadata as full
BEGIN;
UPDATE pgbench_accounts SET bid=4 where aid=1;
ROLLBACK;
-- Force checkpoint so next change will be a FPW
CHECKPOINT;
-- Open an explicit transaction
BEGIN;
-- Select will do an opportunistic pruning, find nothing to prune but
will still unset the page full flag, writing a FPI_FOR_HINT
SELECT ctid, * FROM pgbench_accounts WHERE aid=2;

Then shutdown the database with 'pg_ctl stop' with the transaction
left opened. The shutdown will be stuck and the logical replication
walsender will be stuck at 100% CPU.

I've managed to reproduce this issue on 14 and the current HEAD.

The attached (tentative) patch fixes the issue by flushing all records
before signaling walsenders to stop. At that point, all backends
should have been killed, so flushing leftover records felt like a
correct approach.

Regards,
Anthonin Bonnefoy

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
Some additional informations:

XLogSendLogical is stuck in the following infinite loop:
- It attempt to read the next record with XLogReadAhead + XLogDecodeNextRecord
- The page with the record header is read
- It has the record header, it goes back to XLogDecodeNextRecord
- tot_len > len, the record needs to be reassembled
- The next page containing the rest of the record is read with
ReadPageInternal. It fails since this page was never written.
- It jumps to the err label, XLogReaderInvalReadState(state) is called
and reset the reader state
- It goes back to the start of WalSndLoop's loop

There are some attempts done by the walsender to flush the WAL using
XLogBackgroundFlush:
  /*
   * If we're shutting down, trigger pending WAL to be written out,
   * otherwise we'd possibly end up waiting for WAL that never gets
   * written, because walwriter has shut down already.
   */
  if (got_STOPPING)
    XLogBackgroundFlush();

However, XLogBackgroundFlush only writes completed blocks or the
latest async xact known. With the issue triggered, I have the
following state:

XLogCtl->LogwrtRqst: (Write = 39128056, Flush = 39124992)
LogwrtResult: (Write = 39124992, Flush = 39124992)
XLogCtl->asyncXactLSN: 39119776

There are 3064 bytes (39128056 - 39124992) that contain the next page
with the rest of the cont record that still needs to be written.
However, XLogBackgroundFlush backs off to the previous page boundary:
  /* back off to last completed page boundary */
  WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
Meaning WriteRqst.Write will be 39124992, which is already written and
flushed and asyncXactLSN is behind both write and flush.

So, it looks like the root issue is more that the async LSN isn't
updated when a transaction without xid is rollbacked.
When going through CommitTransaction, such a transaction would still
go through XLogSetAsyncXactLSN.

I've updated the patch with this new approach: XLogSetAsyncXactLSN is
now called in RecordTransactionAbort even when a xid wasn't assigned.
With this, the logical walsender is able to force the flush of the
last partial page using XLogBackgroundFlush.

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
And here's a script reproducing the issue. It creates the clusters,
sets up the logical replication and runs the necessary query to leave
FPI_FOR_HINT as the last written record.

If successful, the script should have pg_stop stuck with 'waiting for
server to shut down.......', with the walsender stuck at 100% CPU.

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
> Shutdown may be indefinitely stuck under the following circumstances:

Thanks for reporting this issue and for the patch!

I was able to reproduce the problem on master.


On Tue, Feb 24, 2026 at 6:47 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> So, it looks like the root issue is more that the async LSN isn't
> updated when a transaction without xid is rollbacked.
> When going through CommitTransaction, such a transaction would still
> go through XLogSetAsyncXactLSN.
>
> I've updated the patch with this new approach: XLogSetAsyncXactLSN is
> now called in RecordTransactionAbort even when a xid wasn't assigned.
> With this, the logical walsender is able to force the flush of the
> last partial page using XLogBackgroundFlush.

I'm a bit concerned that this approach could cause walwriter to attempt
WAL writes more aggressively in some cases, adding overhead during
normal processing. If the goal is only to prevent shutdown from getting stuck,
instead, would it be sufficient for ShutdownXLOG() to call
XLogSetAsyncXactLSN() just before WalSndWaitStopping()?

Regards,

--
Fujii Masao



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
On Thu, Feb 26, 2026 at 5:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> Thanks for reporting this issue and for the patch!
>
> I was able to reproduce the problem on master.

Thanks for the confirmation!

> I'm a bit concerned that this approach could cause walwriter to attempt
> WAL writes more aggressively in some cases, adding overhead during
> normal processing. If the goal is only to prevent shutdown from getting stuck,
> instead, would it be sufficient for ShutdownXLOG() to call
> XLogSetAsyncXactLSN() just before WalSndWaitStopping()?

I did something similar with the first version using
XLogFlush(WriteRqstPtr), and it indeed fixes the shutdown issue.

Though I was under the impression that calling XLogSetAsyncXactLSN()
in RecordTransactionAbort was a better approach, as it is similar to
reporting the latest async abort. You may have a large backlog of
records, like selects pruning a lot of pages and timing out, which the
next commit will have to flush. Notifying the walwriter allows it to
flush those records it can.

Maybe for the context of a backport, patching ShutdownXLOG has the
benefit of minimising the amount of changes and risks?

I've updated the RecordTransactionAbort approach for now, with a small
optimization. XactLastRecEnd may be 0 if nothing was written, and we
can skip the unnecessary spinlocks in this case.

Regards,
Anthonin Bonnefoy

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
Here's a small updated version of the patch, with the 2 different approaches.

- 0001: This is the XLogSetAsyncXactLSN call in RecordTransactionAbort
approach. The small difference with v3 is that the 'XactLastRecEnd !=
0' condition is now merged with !isSubXact:
+if (!isSubXact && XactLastRecEnd != 0)
+{
+        XLogSetAsyncXactLSN(XactLastRecEnd);
         XactLastRecEnd = 0;
+}

- 0002: This is the ShutdownXLOG approach. I've used
XLogFlush(WriteRqstPtr) instead of updating the async LSN. It feels
like if we're going to stop the walsenders, we may as well flush
everything and get the WAL in a good state.
The spinlock to access XLogCtl->LogwrtRqst.Write is probably
unnecessary since we're at a point where no additional WAL records
should be written, but it doesn't hurt to keep consistency.

Regards,
Anthonin Bonnefoy

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Wed, Mar 4, 2026 at 2:11 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> Here's a small updated version of the patch, with the 2 different approaches.

Thanks for the patches!


> - 0001: This is the XLogSetAsyncXactLSN call in RecordTransactionAbort
> approach. The small difference with v3 is that the 'XactLastRecEnd !=
> 0' condition is now merged with !isSubXact:
> +if (!isSubXact && XactLastRecEnd != 0)
> +{
> +        XLogSetAsyncXactLSN(XactLastRecEnd);
>          XactLastRecEnd = 0;
> +}

The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
more like an improvement than a bug fix. Since it changes
RecordTransactionAbort(), it could have unintended impact on the system.

It may be a reasonable idea (though I'm not certain yet), but for a bug fix
I believe we should first apply the minimal change necessary to resolve
the issue. If needed, this approach could then be proposed later separately as
an improvement for the next major version.


> - 0002: This is the ShutdownXLOG approach. I've used
> XLogFlush(WriteRqstPtr) instead of updating the async LSN. It feels
> like if we're going to stop the walsenders, we may as well flush
> everything and get the WAL in a good state.
> The spinlock to access XLogCtl->LogwrtRqst.Write is probably
> unnecessary since we're at a point where no additional WAL records
> should be written, but it doesn't hurt to keep consistency.

As a simpler alternative, would it make sense for walsender to call
XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
If there isn't a clear reason for that choice, directly calling XLogFlush()
might be the simpler solution. Thought?

Regards,

--
Fujii Masao



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
> more like an improvement than a bug fix. Since it changes
> RecordTransactionAbort(), it could have unintended impact on the system.
>
> It may be a reasonable idea (though I'm not certain yet), but for a bug fix
> I believe we should first apply the minimal change necessary to resolve
> the issue. If needed, this approach could then be proposed later separately as
> an improvement for the next major version.

Agreed, that's definitely a change that can have a large impact. I
will open a separate thread later.

> As a simpler alternative, would it make sense for walsender to call
> XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
> shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
> If there isn't a clear reason for that choice, directly calling XLogFlush()
> might be the simpler solution. Thought?

That sounds like a good solution. I've tried it and it fixes the
issue. And this only changes the shutdown behaviour in the walsender.

The use of XLogBackgroundFlush() has been introduced with
c6c333436491, but there's no mention why it was specifically used. I
guess the assumption was that a change would either be flushed with a
commit, or tracked by async LSN through rollback, so
XLogBackgroundFlush() would always write pending records. But this
turns out to be false in the case of this bug.

I've updated the patch with this approach.

Regards,
Anthonin Bonnefoy

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Wed, Mar 4, 2026 at 5:52 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> > The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
> > more like an improvement than a bug fix. Since it changes
> > RecordTransactionAbort(), it could have unintended impact on the system.
> >
> > It may be a reasonable idea (though I'm not certain yet), but for a bug fix
> > I believe we should first apply the minimal change necessary to resolve
> > the issue. If needed, this approach could then be proposed later separately as
> > an improvement for the next major version.
>
> Agreed, that's definitely a change that can have a large impact. I
> will open a separate thread later.
>
> > As a simpler alternative, would it make sense for walsender to call
> > XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
> > shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
> > If there isn't a clear reason for that choice, directly calling XLogFlush()
> > might be the simpler solution. Thought?
>
> That sounds like a good solution. I've tried it and it fixes the
> issue. And this only changes the shutdown behaviour in the walsender.
>
> The use of XLogBackgroundFlush() has been introduced with
> c6c333436491, but there's no mention why it was specifically used. I
> guess the assumption was that a change would either be flushed with a
> commit, or tracked by async LSN through rollback, so
> XLogBackgroundFlush() would always write pending records. But this
> turns out to be false in the case of this bug.
>
> I've updated the patch with this approach.

Thanks for updating the patch!

Barring any objections, I will commit the patch and backpatch it to
all supported branches.

Regards,

--
Fujii Masao



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Alexander Lakhin
Дата:
Hello Anthonin and Masao-san,

04.03.2026 19:46, Fujii Masao wrote:
> On Wed, Mar 4, 2026 at 5:52 PM Anthonin Bonnefoy
> <anthonin.bonnefoy@datadoghq.com> wrote:
>> I've updated the patch with this approach.
> Thanks for updating the patch!
>
> Barring any objections, I will commit the patch and backpatch it to
> all supported branches.

Thank you for working on this!

It looks like the same issue was discovered and discussed before, but that
time without a final fix: [1]. I tried v5 patch with my
099_walsender_stop.pl test and it executed 100 internal iterations
successfully, while without the patch it failed for me on iterations 6, 8, 5.

[1] https://www.postgresql.org/message-id/flat/f15d665f-4cd1-4894-037c-afdbe369287e%40gmail.com

Best regards,
Alexander



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
Hi Alexander,

On Thu, Mar 5, 2026 at 7:30 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> Hello Anthonin and Masao-san,
> Thank you for working on this!
>
> It looks like the same issue was discovered and discussed before, but that
> time without a final fix: [1]. I tried v5 patch with my
> 099_walsender_stop.pl test and it executed 100 internal iterations
> successfully, while without the patch it failed for me on iterations 6, 8, 5.
>
> [1] https://www.postgresql.org/message-id/flat/f15d665f-4cd1-4894-037c-afdbe369287e%40gmail.com

Thanks for the tests and the additional context.

Looking at the the thread, the latest patch provided a similar solution using:
+ XLogFlush(GetInsertRecPtr());

So it was relying on GetInsertRecPtr() instead of
GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
only returns the position of the last full xlog page, meaning it
doesn't fix the issue we have where the last partial page contains a
continuation record.

Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
get the shutdown stuck issue.

Using GetXLogInsertRecPtr() is required to make sure the last partial
page is correctly flushed.

Regards,
Anthonin Bonnefoy



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> So it was relying on GetInsertRecPtr() instead of
> GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> only returns the position of the last full xlog page, meaning it
> doesn't fix the issue we have where the last partial page contains a
> continuation record.
>
> Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> get the shutdown stuck issue.
>
> Using GetXLogInsertRecPtr() is required to make sure the last partial
> page is correctly flushed.

Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
almost nothing during recovery, I added a !RecoveryInProgress() check
as follows. I've attached the latest version of the patch and updated
the commit message.

- if (got_STOPPING)
- XLogBackgroundFlush();
+ if (got_STOPPING && !RecoveryInProgress())
+ XLogFlush(GetXLogInsertRecPtr());

Regards,

--
Fujii Masao

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Fri, Mar 6, 2026 at 8:46 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
> <anthonin.bonnefoy@datadoghq.com> wrote:
> > So it was relying on GetInsertRecPtr() instead of
> > GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> > only returns the position of the last full xlog page, meaning it
> > doesn't fix the issue we have where the last partial page contains a
> > continuation record.
> >
> > Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> > get the shutdown stuck issue.
> >
> > Using GetXLogInsertRecPtr() is required to make sure the last partial
> > page is correctly flushed.
>
> Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
> almost nothing during recovery, I added a !RecoveryInProgress() check
> as follows. I've attached the latest version of the patch and updated
> the commit message.
>
> - if (got_STOPPING)
> - XLogBackgroundFlush();
> + if (got_STOPPING && !RecoveryInProgress())
> + XLogFlush(GetXLogInsertRecPtr());

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Andres Freund
Дата:
Hi,

On 2026-03-06 16:48:06 +0900, Fujii Masao wrote:
> On Fri, Mar 6, 2026 at 8:46 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Thu, Mar 5, 2026 at 5:40 PM Anthonin Bonnefoy
> > <anthonin.bonnefoy@datadoghq.com> wrote:
> > > So it was relying on GetInsertRecPtr() instead of
> > > GetXLogInsertRecPtr(). As mentioned in the thread, GetInsertRecPtr()
> > > only returns the position of the last full xlog page, meaning it
> > > doesn't fix the issue we have where the last partial page contains a
> > > continuation record.
> > >
> > > Testing the XLogFlush(GetInsertRecPtr()) patch with my script, I still
> > > get the shutdown stuck issue.
> > >
> > > Using GetXLogInsertRecPtr() is required to make sure the last partial
> > > page is correctly flushed.
> >
> > Since GetXLogInsertRecPtr() returns a bogus LSN and XLogFlush() does
> > almost nothing during recovery, I added a !RecoveryInProgress() check
> > as follows. I've attached the latest version of the patch and updated
> > the commit message.
> >
> > - if (got_STOPPING)
> > - XLogBackgroundFlush();
> > + if (got_STOPPING && !RecoveryInProgress())
> > + XLogFlush(GetXLogInsertRecPtr());
> 
> I've pushed the patch. Thanks!

I'm pretty sure this is not correct as-is, it suffers from the same issue as
https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx%40e4rvwavtjswo
I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
flush to, due to page boundaries.

Greetings,

Andres Freund



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
On Tue, Mar 10, 2026 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:
> I'm pretty sure this is not correct as-is, it suffers from the same issue as
> https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx%40e4rvwavtjswo
> I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
> flush to, due to page boundaries.

I've managed to reproduce this issue by ensuring the FPI_FOR_HINT
record finishes at the end of a page with the following script (might
need some adjustment if the record sizes are different):

DROP TABLE IF EXISTS test_insert_rec_ptr;
CREATE TABLE test_insert_rec_ptr(aid int, data text) WITH
(autovacuum_enabled = false);
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 100) FROM
generate_series(0, 57);
-- This should tag the page as full
BEGIN; UPDATE test_insert_rec_ptr SET aid=2 where aid=1; ROLLBACK;
CHECKPOINT;
-- Start with a fresh file
SELECT pg_switch_wal();
-- Our FPI_FOR_HINT writes 8193 bytes
-- With the long header,  the first  page has 8152 bytes available
-- With the short header, the second page has 8168 bytes available
-- We want our FPI_FOR_HINT to finish at the end of the second page
(+/- 8 bytes of alignment)
-- We need to write the first 25 bytes (or 32 with alignment) in the first page
-- For that, we need to write 8120 bytes of WAL records
BEGIN;
-- 264 bytes of FPW
INSERT INTO test_insert_rec_ptr VALUES(1);
-- 74 * 104 bytes
INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 44) FROM
generate_series(1, 74);
-- 108 bytes
INSERT INTO test_insert_rec_ptr VALUES(1, repeat('a', 48));
-- 46 bytes
COMMIT;
-- 264 + 74 * 104 + 46 + 108 = 8114 bytes, which will round up to 8120
with alignment
-- FPI_FOR_HINT record should be at 0x1FE0
BEGIN; SELECT * FROM test_insert_rec_ptr WHERE aid=2; ROLLBACK;

As far as I can tell, the only impact it has is to complain about the
write request being too far:
LOG:  request to flush past end of generated WAL; request 0/01604018,
current position 0/01604000
ERROR:  xlog flush request 0/01604018 is not satisfied --- flushed
only to 0/01604000

To avoid this issue, it sounds like we need something to use
XLogBytePosToEndRecPtr instead of XLogBytePosToRecPtr to convert the
byte position? With XLogBytePosToRecPtr(), the flush request would
stop at 01604000 instead of going to the next page with 01604018.

In the attached patch, I've added a GetXLogInsertEndRecPtr() function
which is similar to GetXLogInsertRecPtr(), except it uses
XLogBytePosToEndRecPtr() to stop at the page boundary.
There was also another XLogFlush(GetXLogWriteRecPtr()) call in
syncutils.c, so I replaced both calls with
XLogFlush(GetXLogInsertEndRecPtr()).

Regards,
Anthonin Bonnefoy

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Thu, Mar 12, 2026 at 11:08 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> On Tue, Mar 10, 2026 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:
> > I'm pretty sure this is not correct as-is, it suffers from the same issue as
> > https://postgr.es/m/vf4hbwrotvhbgcnknrqmfbqlu75oyjkmausvy66ic7x7vuhafx%40e4rvwavtjswo
> > I.e. it is not safe to use GetXLogInsertRecPtr() to determine up to where to
> > flush to, due to page boundaries.

Thanks for the report!


> I've managed to reproduce this issue by ensuring the FPI_FOR_HINT
> record finishes at the end of a page with the following script (might
> need some adjustment if the record sizes are different):
>
> DROP TABLE IF EXISTS test_insert_rec_ptr;
> CREATE TABLE test_insert_rec_ptr(aid int, data text) WITH
> (autovacuum_enabled = false);
> INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 100) FROM
> generate_series(0, 57);
> -- This should tag the page as full
> BEGIN; UPDATE test_insert_rec_ptr SET aid=2 where aid=1; ROLLBACK;
> CHECKPOINT;
> -- Start with a fresh file
> SELECT pg_switch_wal();
> -- Our FPI_FOR_HINT writes 8193 bytes
> -- With the long header,  the first  page has 8152 bytes available
> -- With the short header, the second page has 8168 bytes available
> -- We want our FPI_FOR_HINT to finish at the end of the second page
> (+/- 8 bytes of alignment)
> -- We need to write the first 25 bytes (or 32 with alignment) in the first page
> -- For that, we need to write 8120 bytes of WAL records
> BEGIN;
> -- 264 bytes of FPW
> INSERT INTO test_insert_rec_ptr VALUES(1);
> -- 74 * 104 bytes
> INSERT INTO test_insert_rec_ptr SELECT *, repeat('a', 44) FROM
> generate_series(1, 74);
> -- 108 bytes
> INSERT INTO test_insert_rec_ptr VALUES(1, repeat('a', 48));
> -- 46 bytes
> COMMIT;
> -- 264 + 74 * 104 + 46 + 108 = 8114 bytes, which will round up to 8120
> with alignment
> -- FPI_FOR_HINT record should be at 0x1FE0
> BEGIN; SELECT * FROM test_insert_rec_ptr WHERE aid=2; ROLLBACK;
>
> As far as I can tell, the only impact it has is to complain about the
> write request being too far:
> LOG:  request to flush past end of generated WAL; request 0/01604018,
> current position 0/01604000
> ERROR:  xlog flush request 0/01604018 is not satisfied --- flushed
> only to 0/01604000
>
> To avoid this issue, it sounds like we need something to use
> XLogBytePosToEndRecPtr instead of XLogBytePosToRecPtr to convert the
> byte position? With XLogBytePosToRecPtr(), the flush request would
> stop at 01604000 instead of going to the next page with 01604018.
>
> In the attached patch, I've added a GetXLogInsertEndRecPtr() function
> which is similar to GetXLogInsertRecPtr(), except it uses
> XLogBytePosToEndRecPtr() to stop at the page boundary.
> There was also another XLogFlush(GetXLogWriteRecPtr()) call in
> syncutils.c, so I replaced both calls with
> XLogFlush(GetXLogInsertEndRecPtr()).

Thanks for investigating the issue and making the patch!
It looks good to me.

Andres,

Do you have any comments on the proposed patch?

Regards,

--
Fujii Masao



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Fri, Mar 13, 2026 at 2:24 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> Thanks for investigating the issue and making the patch!
> It looks good to me.

Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
I updated the patch to use it. Patch attached.
Barring any objections, I will commit it.

-       XLogFlush(GetXLogWriteRecPtr());
+       XLogFlush(GetXLogInsertEndRecPtr());

I excluded the above change from the patch because it seems like a separate
issue. I also wonder whether this code could cause an error in XLogFlush()
even when GetXLogWriteRecPtr() is used.

Regards,

--
Fujii Masao

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
On Mon, Mar 16, 2026 at 6:39 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
> I updated the patch to use it. Patch attached.
> Barring any objections, I will commit it.
>
> -       XLogFlush(GetXLogWriteRecPtr());
> +       XLogFlush(GetXLogInsertEndRecPtr());

Thanks for the updated patch! I've run my test script against the
patch and there's no more "xlog flush request xxx is not satisfied"
errors reported.

> I excluded the above change from the patch because it seems like a separate
> issue. I also wonder whether this code could cause an error in XLogFlush()
> even when GetXLogWriteRecPtr() is used.

Ha right, I've mixed Insert and Write and thought that
FinishSyncWorker was also doing a XlogFlush(GetXLogInsertRecPtr())
when writing the patch. If I try to trigger the partial record issue,
GetXLogWriteRecPtr() points at the end of the WAL page containing the
beginning of the FPI_FOR_HINT, there's no attempt to flush in the
future. So FinishSyncWorker doesn't seem impacted by the issue.

Regards,
Anthonin Bonnefoy



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Fujii Masao
Дата:
On Mon, Mar 16, 2026 at 6:45 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> On Mon, Mar 16, 2026 at 6:39 AM Fujii Masao <masao.fujii@gmail.com> wrote:
> > Since Tomas added GetXLogInsertEndRecPtr() in commit b1f14c96720,
> > I updated the patch to use it. Patch attached.
> > Barring any objections, I will commit it.
> >
> > -       XLogFlush(GetXLogWriteRecPtr());
> > +       XLogFlush(GetXLogInsertEndRecPtr());
>
> Thanks for the updated patch! I've run my test script against the
> patch and there's no more "xlog flush request xxx is not satisfied"
> errors reported.

Thanks for the test! I've pushed the patch.


> > I excluded the above change from the patch because it seems like a separate
> > issue. I also wonder whether this code could cause an error in XLogFlush()
> > even when GetXLogWriteRecPtr() is used.
>
> Ha right, I've mixed Insert and Write and thought that
> FinishSyncWorker was also doing a XlogFlush(GetXLogInsertRecPtr())
> when writing the patch. If I try to trigger the partial record issue,
> GetXLogWriteRecPtr() points at the end of the WAL page containing the
> beginning of the FPI_FOR_HINT, there's no attempt to flush in the
> future. So FinishSyncWorker doesn't seem impacted by the issue.

Understood. Thanks for the investigation!

Regards,

--
Fujii Masao



Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Michael Paquier
Дата:
On Tue, Mar 17, 2026 at 08:16:08AM +0900, Fujii Masao wrote:
> On Mon, Mar 16, 2026 at 6:45 PM Anthonin Bonnefoy
> <anthonin.bonnefoy@datadoghq.com> wrote:
>> Thanks for the updated patch! I've run my test script against the
>> patch and there's no more "xlog flush request xxx is not satisfied"
>> errors reported.
>
> Thanks for the test! I've pushed the patch.

This stuff seems sensible enough that I think we should at least have
a test, no?  It does not have to be absolutely perfect in terms of
reproducibility, just good enough to be able to detect it across the
buildfarm.  We already do various things with page boundaries in WAL
during recovery, and a shutdown could be perhaps timed to increase the
reproducibility rate of the issues discussed?
--
Michael

Вложения

Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

От
Anthonin Bonnefoy
Дата:
On Tue, Mar 17, 2026 at 12:26 AM Michael Paquier <michael@paquier.xyz> wrote:
> This stuff seems sensible enough that I think we should at least have
> a test, no?  It does not have to be absolutely perfect in terms of
> reproducibility, just good enough to be able to detect it across the
> buildfarm.  We already do various things with page boundaries in WAL
> during recovery, and a shutdown could be perhaps timed to increase the
> reproducibility rate of the issues discussed?

I initially thought that there was no easy way to trigger this issue
reliably in a test: the script I've been using won't work as soon as
there are changes in the record sizes. Then I remembered that
pg_logical_emit_message existed and could be used to write a WAL
record of a specific size, without allocating a xid and without
flushing the record.

With this, the test can be simplified to:
SELECT pg_switch_wal();
BEGIN;
SELECT pg_logical_emit_message(false, '', repeat('a', 16265), false);
ROLLBACK;

Any change in WAL short header, long header or xl_logical_message
struct will "break" the test since the record won't be at the exact
end of the page boundary. This also assumes that we have an 8 byte
alignment. 32 bits machine will have the WAL record ends at 3FF0, so
not exactly the end, but that should be fine to test different
conditions.

A word of caution about this test: While running it on my machine,
I've managed to trigger some weird WAL corruption. The new segment
after the switch had 1 or 2 excessive bytes at the start of the
segment just before the xlog page magic, shifting the whole file. The
first time it happened, I thought I'd messed something up and added
the bytes myself while looking at the WAL with imhex. The second time,
I've only run the script, and the new segment had a 1.1MB size shortly
after, so I'm pretty sure I didn't do anything that could have
introduced those excessive bytes.

I'm still trying to understand the trigger conditions (some race
condition between the switch and the walwriter?), but if this test is
merged, it may trigger this WAL corruption issue on the buildfarm.

Regards,
Anthonin Bonnefoy

Вложения