Discussion: BUG: Former primary node might get stuck when started as a standby
Hello hackers,

[ reporting this bug here due to limitations of the bug reporting form ]

When a node that acted as a primary becomes a standby, as in the following
script:

[ ... some WAL-logged activity ... ]
$primary->teardown_node;
$standby->promote;
$primary->enable_streaming($standby);
$primary->start;

it might not go online, due to the error:
new timeline N forked off current database system timeline M before current recovery point X/X

A complete TAP test is attached. I put it in src/test/recovery/t, run it as
follows:

for i in `seq 100`; do echo "iteration $i"; timeout 60 make -s check -C src/test/recovery PROVE_TESTS="t/099*" || break; done

and get:
...
iteration 7
# +++ tap check in src/test/recovery +++
t/099_change_roles.pl .. ok
All tests successful.
Files=1, Tests=20, 14 wallclock secs ( 0.01 usr 0.00 sys + 4.20 cusr 4.75 csys = 8.96 CPU)
Result: PASS
iteration 8
# +++ tap check in src/test/recovery +++
t/099_change_roles.pl .. 9/? make: *** [Makefile:23: check] Terminated

With wal_debug enabled (and log_min_messages=DEBUG2, log_statement=all), I
see the following in the _node1.log:
2024-01-18 15:21:02.258 UTC [663701] 099_change_roles.pl LOG: INSERT @ 0/304DBF0: - Transaction/COMMIT: 2024-01-18 15:21:02.258739+00
2024-01-18 15:21:02.258 UTC [663701] 099_change_roles.pl STATEMENT: INSERT INTO t VALUES (10, 'inserted on node1');
2024-01-18 15:21:02.258 UTC [663701] 099_change_roles.pl LOG: xlog flush request 0/304DBF0; write 0/0; flush 0/0
2024-01-18 15:21:02.258 UTC [663701] 099_change_roles.pl STATEMENT: INSERT INTO t VALUES (10, 'inserted on node1');
2024-01-18 15:21:02.258 UTC [663671] node2 DEBUG: write 0/304DBF0 flush 0/304DB78 apply 0/304DB78 reply_time 2024-01-18 15:21:02.2588+00
2024-01-18 15:21:02.258 UTC [663671] node2 DEBUG: write 0/304DBF0 flush 0/304DBF0 apply 0/304DB78 reply_time 2024-01-18 15:21:02.258809+00
2024-01-18 15:21:02.258 UTC [663671] node2 DEBUG: write 0/304DBF0 flush 0/304DBF0 apply 0/304DBF0 reply_time 2024-01-18 15:21:02.258864+00
2024-01-18 15:21:02.259 UTC [663563] DEBUG: server process (PID 663701) exited with exit code 0
2024-01-18 15:21:02.260 UTC [663563] DEBUG: forked new backend, pid=663704 socket=8
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl LOG: statement: INSERT INTO t VALUES (1000 * 1 + 608, 'background activity');
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl LOG: INSERT @ 0/304DC40: - Heap/INSERT: off: 12, flags: 0x00
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl STATEMENT: INSERT INTO t VALUES (1000 * 1 + 608, 'background activity');
2024-01-18 15:21:02.261 UTC [663563] DEBUG: postmaster received shutdown request signal
2024-01-18 15:21:02.261 UTC [663563] LOG: received immediate shutdown request
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl LOG: INSERT @ 0/304DC68: - Transaction/COMMIT: 2024-01-18 15:21:02.261828+00
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl STATEMENT: INSERT INTO t VALUES (1000 * 1 + 608, 'background activity');
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl LOG: xlog flush request 0/304DC68; write 0/0; flush 0/0
2024-01-18 15:21:02.261 UTC [663704] 099_change_roles.pl STATEMENT: INSERT INTO t VALUES (1000 * 1 + 608, 'background activity');
...
2024-01-18 15:21:02.262 UTC [663563] LOG: database system is shut down
...
2024-01-18 15:21:02.474 UTC [663810] LOG: starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
...
2024-01-18 15:21:02.478 UTC [663816] LOG: REDO @ 0/304DBC8; LSN 0/304DBF0: prev 0/304DB78; xid 898; len 8 - Transaction/COMMIT: 2024-01-18 15:21:02.258739+00
2024-01-18 15:21:02.478 UTC [663816] LOG: REDO @ 0/304DBF0; LSN 0/304DC40: prev 0/304DBC8; xid 899; len 3; blkref #0: rel 1663/5/16384, blk 1 - Heap/INSERT: off: 12, flags: 0x00
2024-01-18 15:21:02.478 UTC [663816] LOG: REDO @ 0/304DC40; LSN 0/304DC68: prev 0/304DBF0; xid 899; len 8 - Transaction/COMMIT: 2024-01-18 15:21:02.261828+00
...
2024-01-18 15:21:02.481 UTC [663819] LOG: fetching timeline history file for timeline 20 from primary server
2024-01-18 15:21:02.481 UTC [663819] LOG: started streaming WAL from primary at 0/3000000 on timeline 19
...
2024-01-18 15:21:02.481 UTC [663819] DETAIL: End of WAL reached on timeline 19 at 0/304DBF0.
...
2024-01-18 15:21:02.481 UTC [663816] LOG: new timeline 20 forked off current database system timeline 19 before current recovery point 0/304DC68

In this case, node1 wrote record 0/304DC68 to its WAL, but sent to node2
only up to record 0/304DBF0; then node2, being promoted to primary, forked
the next timeline from there, but when node1 was started as a standby, it
first replayed 0/304DC68 from its own WAL, and then could not switch to the
new timeline starting from the earlier position.

Reproduced on REL_12_STABLE .. master.

Best regards,
Alexander
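The failing check can be sketched as a toy Python model (function and variable names are mine, not PostgreSQL's): the restarted standby compares the fork point of the new timeline, taken from the timeline history file fetched from the primary, with how far its own local replay has already progressed.

```python
def check_new_timeline(fork_lsn: int, recovery_point: int) -> None:
    """Toy model of the startup process's timeline sanity check.

    fork_lsn: LSN where the new timeline branched off (from the timeline
    history file fetched from the primary).
    recovery_point: how far local WAL replay has already progressed.
    Both are plain integers here standing in for 64-bit LSNs.
    """
    if fork_lsn < recovery_point:
        # Mirrors the "new timeline N forked off ... before current
        # recovery point" error: we already replayed past the fork.
        raise RuntimeError(
            f"new timeline forked off at {fork_lsn:#x} "
            f"before current recovery point {recovery_point:#x}"
        )

# The scenario from the log: node2 forked timeline 20 at 0/304DBF0,
# but node1 had already replayed its own WAL up to 0/304DC68.
try:
    check_new_timeline(0x304DBF0, 0x304DC68)
    print("switched to new timeline")
except RuntimeError as err:
    print("stuck:", err)

# Had node1 not replayed past the fork point, the check would pass.
check_new_timeline(0x304DBF0, 0x304DBF0)
```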
Attachments
Hi,

> it might not go online, due to the error:
> new timeline N forked off current database system timeline M before current recovery point X/X
> [...]
> In this case, node1 wrote record 0/304DC68 to its WAL, but sent to node2
> only record 0/304DBF0, then node2, being promoted to primary, forked a next
> timeline from it, but when node1 was started as a standby, it first
> replayed 0/304DC68 from WAL, and then could not switch to the new timeline
> starting from the previous position.

Unless I'm missing something, this is just the right behavior of the system.
node1 has no way of knowing the history of node1/node2/nodeN promotion. It
sees that it has more data and/or an inconsistent timeline with another node
and refuses to proceed until the DBA intervenes. What else can node1 do,
drop the data? That's not how things are done in Postgres :) What if this is
very important data and node2 was promoted mistakenly, either manually or by
a buggy script.

It's been a while since I seriously played with replication, but if memory
serves, the proper way to switch node1 to replica mode would be to use
pg_rewind on it first.

--
Best regards,
Aleksander Alekseev
Hi Aleksander,

19.01.2024 14:45, Aleksander Alekseev wrote:
>> it might not go online, due to the error:
>> new timeline N forked off current database system timeline M before current recovery point X/X
>> [...]
>> In this case, node1 wrote record 0/304DC68 to its WAL, but sent to node2
>> only record 0/304DBF0, then node2, being promoted to primary, forked a next
>> timeline from it, but when node1 was started as a standby, it first
>> replayed 0/304DC68 from WAL, and then could not switch to the new timeline
>> starting from the previous position.
> Unless I'm missing something, this is just the right behavior of the system.

Thank you for the answer!

> node1 has no way of knowing the history of node1/node2/nodeN
> promotion. It sees that it has more data and/or inconsistent timeline
> with another node and refuses to process further until DBA will
> intervene.

But node1 knows that it's a standby now and is expected to get all the WAL
records from the primary, doesn't it? Maybe it could redo as few records as
possible from its own WAL before requesting records from the authoritative
source... Or is replaying local WAL first supposed to be more
performance-efficient (not on the first restart, but on later ones)?

> What else can node1 do, drop the data? That's not how
> things are done in Postgres :)

In case no other option exists (this behavior is really correct and the
only one possible), maybe the server should just stop? Can a DBA intervene
somehow to make the server proceed without stopping it?

> It's been a while since I seriously played with replication, but if
> memory serves, a proper way to switch node1 to a replica mode would be
> to use pg_rewind on it first.

Perhaps that's true generally, but as we can see, without the extra records
replayed, this scenario works just fine. Moreover, existing tests rely on
it, e.g., 009_twophase.pl or 012_subtransactions.pl (in fact, my research
of the issue was initiated by a test failure).

Best regards,
Alexander
Hi,

> But node1 knows that it's a standby now and it's expected to get all the
> WAL records from the primary, doesn't it?

Yes, but node1 doesn't know if it always was a standby or not. What if
node1 was always a standby, node2 was a primary, then node2 died and node3
is a new primary. If node1 sees an inconsistency in the WAL records, it
should report it and stop doing anything, since it doesn't have all the
information needed to resolve the inconsistencies in all the possible
cases. Only the DBA has this information.

>> It's been a while since I seriously played with replication, but if
>> memory serves, a proper way to switch node1 to a replica mode would be
>> to use pg_rewind on it first.
>
> Perhaps that's true generally, but as we can see, without the extra
> records replayed, this scenario works just fine. Moreover, existing tests
> rely on it, e.g., 009_twophase.pl or 012_subtransactions.pl (in fact, my
> research of the issue was initiated per a test failure).

I suggest focusing on particular flaky tests then and how to fix them.

--
Best regards,
Aleksander Alekseev
Hi Aleksander,

[ I'm writing this off-list to minimize noise, but we can continue the discussion in -hackers, if you wish ]

22.01.2024 14:00, Aleksander Alekseev wrote:
> Hi,
>
>> But node1 knows that it's a standby now and it's expected to get all the
>> WAL records from the primary, doesn't it?
> Yes, but node1 doesn't know if it always was a standby or not. What if
> node1 was always a standby, node2 was a primary, then node2 died and
> node3 is a new primary.

Excuse me, but I still can't understand what could go wrong in this case.
Let's suppose node1 has WAL with the following contents before start:
CPLOC | TL1R1 | TL1R2 | TL1R3 |
while node2's WAL contains:
TL1R1 | TL2R1 | TL2R2 | ...
where CPLOC is a checkpoint location and TLxRy is record y on timeline x.
I assume that requesting all WAL records from node2 without redoing the
local records should be the right thing. And even in the situation you
propose:
CPLOC | TL2R5 | TL2R6 | TL2R7 |
while node3's WAL contains:
TL2R5 | TL3R1 | TL3R2 | ...
I see no issue with applying records from node3...

> If node1 sees inconsistency in the WAL
> records, it should report it and stop doing anything, since it doesn't
> has all the information needed to resolve the inconsistencies in all
> the possible cases. Only DBA has this information.

I still wonder what can be considered an inconsistency in this situation.
Doesn't exactly the redo of all the local WAL records create the
inconsistency here? For me, it's a question of an authoritative source, and
if we had such a source, we should trust its records only.

Or in other words, what if the record TL1R3, which node1 wrote to its WAL
but didn't send to node2, happened to have an incorrect checksum (due to a
partial write, for example)? If I understand correctly, node1 will just
stop redoing WAL at that position, receive all the following records from
node2, and move forward without reporting the inconsistency (an extra WAL
record).

Best regards,
Alexander
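The diagrams above can be reduced to a toy computation (Python; everything here is illustrative, not PostgreSQL code): find the point where the new primary's history diverges from the restarted node's own WAL, and list the records the restarted node holds beyond it. Those are exactly the records that, once replayed, block the timeline switch.

```python
# WAL records are modeled as (timeline, seqno) tuples; the checkpoint
# location (CPLOC) is not a record stream entry and is left out.

def extra_records(local_wal, new_primary_wal):
    """Return the records the restarted node holds past the point where
    the new primary's history diverges from its own."""
    common = 0
    for ours, theirs in zip(local_wal, new_primary_wal):
        if ours != theirs:
            break
        common += 1
    return local_wal[common:]

# node1: CPLOC | TL1R1 | TL1R2 | TL1R3, node2: TL1R1 | TL2R1 | TL2R2 ...
node1 = [(1, 1), (1, 2), (1, 3)]
node2 = [(1, 1), (2, 1), (2, 2)]
print(extra_records(node1, node2))   # TL1R2 and TL1R3 are the "extra" records

# The node3 scenario from the text: same shape, one timeline later.
node1b = [(2, 5), (2, 6), (2, 7)]
node3 = [(2, 5), (3, 1), (3, 2)]
print(extra_records(node1b, node3))  # TL2R6 and TL2R7
```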
Hello,
22.01.2024 13:00, Aleksander Alekseev wrote:
>> Perhaps that's true generally, but as we can see, without the extra
>> records replayed, this scenario works just fine. Moreover, existing tests
>> rely on it, e.g., 009_twophase.pl or 012_subtransactions.pl (in fact, my
>> research of the issue was initiated per a test failure).
> I suggest focusing on particular flaky tests then and how to fix them.
For the record: skink produced the same failure with 009_twophase.pl
yesterday [1]:
366/366 postgresql:recovery / recovery/009_twophase TIMEOUT 15000.31s killed by signal 15 SIGTERM
[15:11:30.591](0.807s) ok 11 - Restore of prepared transaction on promoted standby
### Enabling streaming replication for node "london"
### Starting node "london"
# Running: pg_ctl --wait --pgdata /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/009_twophase/data/t_009_twophase_london_data/pgdata --log /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/009_twophase/log/009_twophase_london.log --options --cluster-name=london start
waiting for server to start....... done
server started
# Postmaster PID for node "london" is 4181556
death by signal at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 181.
# Postmaster PID for node "london" is 4181556
### Stopping node "london" using mode immediate
# Running: pg_ctl --pgdata /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/009_twophase/data/t_009_twophase_london_data/pgdata --mode immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "london"
# Postmaster PID for node "paris" is 4147391
### Stopping node "paris" using mode immediate
# Running: pg_ctl --pgdata /home/bf/bf-build/skink-master/HEAD/pgsql.build/testrun/recovery/009_twophase/data/t_009_twophase_paris_data/pgdata --mode immediate stop
waiting for server to shut down.... done
server stopped
# No postmaster PID for node "paris"
[19:19:46.999](14896.408s) # Tests were run but no plan was declared and done_testing() was not seen.
[19:19:46.999](0.000s) # Looks like your test exited with 4 just after 11.
pgsql.build/testrun/recovery/009_twophase/log/009_twophase_london.log
2026-02-04 15:11:35.675 CET [4181556][postmaster][:0] LOG: starting PostgreSQL 19devel on x86_64-linux, compiled by gcc-15.2.0, 64-bit
2026-02-04 15:11:35.692 CET [4181556][postmaster][:0] LOG: listening on Unix socket "/tmp/IzuUID03TM/.s.PGSQL.22911"
2026-02-04 15:11:35.904 CET [4183900][startup][:0] LOG: database system was interrupted; last known up at 2026-02-04 15:11:14 CET
2026-02-04 15:11:36.017 CET [4183900][startup][:0] LOG: entering standby mode
2026-02-04 15:11:36.019 CET [4183900][startup][:0] LOG: database system was not properly shut down; automatic recovery in progress
2026-02-04 15:11:36.104 CET [4183900][startup][38/0:0] LOG: redo starts at 0/03021600
2026-02-04 15:11:36.206 CET [4183900][startup][38/0:0] LOG: invalid record length at 0/030220B8: expected at least 24, got 0
2026-02-04 15:11:36.209 CET [4183900][startup][38/0:0] LOG: consistent recovery state reached at 0/030220B8
2026-02-04 15:11:36.210 CET [4181556][postmaster][:0] LOG: database system is ready to accept read-only connections
2026-02-04 15:11:36.919 CET [4184142][walreceiver][:0] LOG: fetching timeline history file for timeline 2 from primary server
2026-02-04 15:11:36.997 CET [4184142][walreceiver][:0] LOG: started streaming WAL from primary at 0/03000000 on timeline 1
2026-02-04 15:11:37.076 CET [4184142][walreceiver][:0] LOG: replication terminated by primary server
2026-02-04 15:11:37.076 CET [4184142][walreceiver][:0] DETAIL: End of WAL reached on timeline 1 at 0/03022070.
2026-02-04 15:11:37.098 CET [4183900][startup][38/0:0] LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/030220B8
2026-02-04 15:11:37.103 CET [4184142][walreceiver][:0] LOG: restarted WAL streaming at 0/03000000 on timeline 1
2026-02-04 15:11:37.107 CET [4184142][walreceiver][:0] LOG: replication terminated by primary server
...
[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2026-02-04%2013%3A36%3A40&stg=recovery-check
Best regards,
Alexander
RE: BUG: Former primary node might get stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alexander, Aleksander
Is it enough to wait till all WAL records are replayed before the promotion?
Like attached.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments
Dear Kuroda-san,
09.02.2026 04:38, Hayato Kuroda (Fujitsu) wrote:
> Dear Alexander, Aleksander
> Is it enough to wait till all WAL records are replayed before the promotion?
> Like attached.
Thank you for paying attention to this!
I tried to add
$primary->wait_for_replay_catchup($standby);
in my test (see attached), but that doesn't help -- it still fails as below:
iteration 61
# +++ tap check in src/test/recovery +++
t/099_change_roles.pl .. 1/? make: *** [Makefile:28: check] Terminated
src/test/recovery/tmp_check/log/099_change_roles_node1.log
2026-02-12 20:30:42.953 EET [2716149] LOG: new timeline 4 forked off current database system timeline 3 before current recovery point 0/3025B48
Best regards,
Alexander
Вложения
RE: BUG: Former primary node might get stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alexander,
I checked your test and reproduced the issue with it.
Was it possible that an INSERT happened in between wait_for_replay_catchup
and teardown_node? In that case we cannot ensure that WAL records generated
in that time window reached the standby, right?
Similar stuff won't happen in 009_twophase.pl because it does not have the
background activities.
Obtained log around here is shown in [1].
[1]
```
2026-02-13 10:34:56.751 JST client backend[521764] 099_change_roles.pl LOG: statement: INSERT INTO t VALUES (1000 * 1 + 2812, 'background activity');
2026-02-13 10:34:56.753 JST client backend[521762] 099_change_roles.pl LOG: statement: SELECT '0/030ED4F8' <= replay_lsn AND state = 'streaming'
	FROM pg_catalog.pg_stat_replication
	WHERE application_name IN ('node2', 'walreceiver')
	* from wait_for_replay_catchup
2026-02-13 10:34:56.762 JST client backend[521767] 099_change_roles.pl LOG: statement: INSERT INTO t VALUES (1000 * 1 + 2813, 'background activity');
	* do INSERT from background activities
2026-02-13 10:34:56.764 JST client backend[521768] 099_change_roles.pl LOG: statement: INSERT INTO t VALUES (1000 * 2 + 2629, 'background activity');
2026-02-13 10:34:56.765 JST postmaster[521622] LOG: received immediate shutdown request
	* from teardown_node
2026-02-13 10:34:56.767 JST postmaster[521622] LOG: database system is shut down
2026-02-13 10:34:56.996 JST postmaster[521844] LOG: starting PostgreSQL 19devel on x86_64-linux, compiled by gcc-11.5.0, 64-bit
....
2026-02-13 10:34:57.124 JST walreceiver[521864] LOG: replication terminated by primary server
2026-02-13 10:34:57.124 JST walreceiver[521864] DETAIL: End of WAL reached on timeline 153 at 0/030ED6D8.
2026-02-13 10:34:57.124 JST startup[521855] LOG: new timeline 154 forked off current database system timeline 153 before current recovery point 0/030ED750
2026-02-13 10:34:57.124 JST walreceiver[521864] LOG: restarted WAL streaming at 0/03000000 on timeline 153
```
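The window described above can be written down as a toy timeline (Python; LSNs are plain integers, everything here is illustrative): the catchup check passes, then more WAL is flushed after the check, then the immediate shutdown prevents that WAL from ever being sent.

```python
# Toy timeline of the race between wait_for_replay_catchup and teardown_node.
flushed = 0x30ED4F8      # primary's flushed WAL at the moment of the check
replayed = 0x30ED4F8     # standby has replayed everything so far

# 1. wait_for_replay_catchup passes: the standby is caught up right now.
assert replayed >= flushed

# 2. Background activity (an INSERT, or bgwriter's RUNNING_XACTS) writes
#    more WAL after the check has already succeeded...
flushed = 0x30ED750

# 3. ...and teardown_node (immediate shutdown) kills the primary before the
#    walsender ships it.  The standby promotes and forks its new timeline at
#    what it received, which is now behind the old primary's recovery point.
fork_point = replayed
assert fork_point < flushed
print(f"timeline forks at {fork_point:#x}, "
      f"old primary will replay its own WAL to {flushed:#x}")
```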
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Dear Kuroda-san,
13.02.2026 04:03, Hayato Kuroda (Fujitsu) wrote:
> Dear Alexander,
> I checked your test and reproduced the issue with it. Was it possible that
> an INSERT happened in between wait_for_replay_catchup and teardown_node?
> In this case we may not ensure WAL records generated in the time window
> were reached, right? Similar stuff won't happen in 009_twophase.pl because
> it does not have the bg activities.
From my old records, 009_twophase.pl failed exactly due to background
(namely, bgwriter's) activity.
I modified bgwriter.c to reproduce the failure easier:
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -67,7 +67,7 @@ int BgWriterDelay = 200;
* Interval in which standby snapshots are logged into the WAL stream, in
* milliseconds.
*/
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 1
/*
* LSN and timestamp at which we last issued a LogStandbySnapshot(), to avoid
@@ -306,7 +306,7 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
*/
rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
- BgWriterDelay /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
+ 1 /* ms */ , WAIT_EVENT_BGWRITER_MAIN);
/*
* If no latch event and BgBufferSync says nothing's happening, extend
@@ -339,6 +339,5 @@ BackgroundWriterMain(const void *startup_data, size_t startup_data_len)
StrategyNotifyBgWriter(-1);
}
- prev_hibernate = can_hibernate;
}
}
multiplied the test to increase probability of the failure:
for i in {1..20}; do cp -r src/test/recovery/ src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|" -i src/test/recovery_$i/Makefile; done
and executed it in a loop:
for i in {1..100}; do echo "ITERATION $i"; parallel --halt now,fail=1 -j20 --linebuffer --tag PROVE_TESTS="t/009*" NO_TEMP_INSTALL=1 timeout 60 make check -s -C src/test/recovery_{} ::: `seq 20` || break; done
It failed for me on iterations 27, 4, 22 as below:
ITERATION 22
...
18 t/009_twophase.pl .. ok
18 All tests successful.
18 Files=1, Tests=30, 12 wallclock secs ( 0.01 usr 0.01 sys + 0.26 cusr 0.58 csys = 0.86 CPU)
18 Result: PASS
5 make: *** wait: No child processes. Stop.
5 make: *** Waiting for unfinished jobs....
5 make: *** wait: No child processes. Stop.
parallel: This job failed:
PROVE_TESTS=t/009* NO_TEMP_INSTALL=1 timeout 60 make check -s -C src/test/recovery_5
src/test/recovery_5/tmp_check/log/009_twophase_london.log contains:
2026-02-13 21:03:28.248 EET [3987222] LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3029190
...
(Without "timeout 60", the test just hangs — we can see the same in [1],
the test was killed with SIGTERM after 15000 seconds...)
[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=skink&dt=2026-02-04%2013%3A36%3A40&stg=recovery-check
Best regards,
Alexander
RE: BUG: Former primary node might get stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alexander,

> From my old records, 009_twophase.pl failed exactly due to background (
> namely, bgwriter's) activity.

Okay, so I think there are two reasons why the test could fail.

1) The old primary shut down before all changes were replicated. This can
be avoided by adding wait_for_replay_catchup() before the tearing down.
2) bgwriter on the old primary generated the RUNNING_XACTS record and the
node shut down before sending it.

... and you mentioned case 2), right?

I recalled that an injection point "skip-log-running-xacts" can be used to
suppress generating the WAL record, see 035_standby_logical_decoding.pl. My
idea is to attach the injection point before the switchover and avoid
adding the record. The attached patch implements the idea. How do you feel?

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments
Dear Kuroda-san,
16.02.2026 05:10, Hayato Kuroda (Fujitsu) wrote:
>> From my old records, 009_twophase.pl failed exactly due to background (
>> namely, bgwriter's) activity.
> Okay, so I think there are two reasons why the test could fail. 1) The old
> primary shut down before all changes were replicated. This can be avoided
> by adding wait_for_replay_catchup() before the tearing down. 2) bgwriter
> on the old primary generated the RUNNING_XACTS record and the node shut
> down before sending it. ... and you mentioned case 2), right? I recalled
> that an injection point "skip-log-running-xacts" can be used to suppress
> generating the WAL record, see 035_standby_logical_decoding.pl. My idea is
> to attach the injection point before the switchover and avoid adding the
> record. The attached patch implements the idea. How do you feel?
Unfortunately, the testing procedure I shared above still produces failures
with the patched 009_twophase.pl. With
wal_debug = on
in /tmp/temp.config, I can see:
...
ITERATION 3
...
11 t/009_twophase.pl .. ok
11 All tests successful.
11 Files=1, Tests=30, 10 wallclock secs ( 0.02 usr 0.00 sys + 0.27 cusr 0.63 csys = 0.92 CPU)
11 Result: PASS
20 make: *** [Makefile:28: check] Terminated
parallel: This job failed:
TEMP_CONFIG=/tmp/temp.config PROVE_TESTS=t/009* NO_TEMP_INSTALL=1 timeout 60 make check -s -C src/test/recovery_20
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: statement: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: INSERT @ 0/030227F8: - Transaction/PREPARE: gid xact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG: statement: SELECT pg_current_wal_flush_lsn()
2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG: statement: SELECT '0/030227F8' <= replay_lsn AND state = 'streaming'
FROM pg_catalog.pg_stat_replication
WHERE application_name IN ('paris', 'walreceiver')
2026-02-17 07:06:44.337 EET postmaster[754306] LOG: received immediate shutdown request
2026-02-17 07:06:44.339 EET postmaster[754306] LOG: database system is shut down
...
2026-02-17 07:06:44.566 EET startup[755326] LOG: REDO @ 0/03022618; LSN 0/03022650: prev 0/030225E0; xid 0; len 24 - Standby/RUNNING_XACTS: nextXid 789 latestCompletedXid 788 oldestRunningXid 789
2026-02-17 07:06:44.566 EET startup[755326] LOG: REDO @ 0/03022650; LSN 0/030226A0: prev 0/03022618; xid 789; len 3; blkref #0: rel 1663/5/16384, blk 0 - Heap/INSERT: off: 21, flags: 0x00
2026-02-17 07:06:44.566 EET startup[755326] LOG: REDO @ 0/030226A0; LSN 0/030226F0: prev 0/03022650; xid 790; len 3; blkref #0: rel 1663/5/16384, blk 0 - Heap/INSERT: off: 22, flags: 0x00
2026-02-17 07:06:44.566 EET startup[755326] LOG: REDO @ 0/030226F0; LSN 0/030227F8: prev 0/030226A0; xid 789; len 232 - Transaction/PREPARE: gid xact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
2026-02-17 07:06:44.566 EET startup[755326] LOG: REDO @ 0/030227F8; LSN 0/03022838: prev 0/030226F0; xid 0; len 32 - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
2026-02-17 07:06:44.566 EET startup[755326] LOG: invalid record length at 0/03022838: expected at least 24, got 0
2026-02-17 07:06:44.566 EET startup[755326] LOG: consistent recovery state reached at 0/03022838
2026-02-17 07:06:44.566 EET postmaster[755283] LOG: database system is ready to accept read-only connections
2026-02-17 07:06:44.569 EET walreceiver[755338] LOG: fetching timeline history file for timeline 2 from primary server
2026-02-17 07:06:44.569 EET walreceiver[755338] LOG: started streaming WAL from primary at 0/03000000 on timeline 1
2026-02-17 07:06:44.569 EET walreceiver[755338] LOG: replication terminated by primary server
2026-02-17 07:06:44.569 EET walreceiver[755338] DETAIL: End of WAL reached on timeline 1 at 0/030227F8.
2026-02-17 07:06:44.570 EET startup[755326] LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/03022838
...
Regarding the approach with "skip-log-running-xacts", I wonder if there is
a strong guarantee that no other WAL write can happen in the same window
(e.g., from autovacuum)?
Best regards,
Alexander
RE: BUG: Former primary node might get stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alexander,
> Unfortunately, the testing procedure I shared above still produces failures
> with the patched 009_twophase.pl.
Hmm, I ran the test for hours, but I could not reproduce the failure. But
let me analyze based on your log.
> With wal_debug = on
> in /tmp/temp.config, I can see:
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: statement: PREPARE TRANSACTION
'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: INSERT @ 0/030227F8: - Transaction/PREPARE: gid xact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
> 2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG: statement: SELECT pg_current_wal_flush_lsn()
> 2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG: statement: SELECT '0/030227F8' <= replay_lsn AND state = 'streaming'
> FROM pg_catalog.pg_stat_replication
> WHERE application_name IN ('paris', 'walreceiver')
> 2026-02-17 07:06:44.337 EET postmaster[754306] LOG: received immediate shutdown request
> 2026-02-17 07:06:44.339 EET postmaster[754306] LOG: database system is shut down
I have little experience with reading the wal_debug output, but the
background writer seems to generate the RUNNING_XACTS record. That differs
from my expectation. To confirm, did you really enable injection points?
For now 009_twophase can work without `-Dinjection_points=true`, but it
should be set to avoid random failures.

Another possibility is that bgwriter had already passed
`IS_INJECTION_POINT_ATTACHED("skip-log-running-xacts")` before the
injection point was attached. In this case the background writer would
generate the RUNNING_XACTS record at the bad timing. One possible
workaround is to attach the injection point a bit earlier; it does not
solve the root cause though. The attached patch applies this band-aid.
> Regarding the approach with "skip-log-running-xacts", I wonder if there is
> a strong guarantee that no other WAL write can happen in the same window
> (e.g., from autovacuum)?
Good point, autovacuum and checkpoints should also be avoided. IIUC that
can be done by setting GUCs.
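For reference, the GUC-controllable WAL sources could be silenced with something along these lines in the test's temp config (a sketch only; note that bgwriter's standby-snapshot logging is driven by the compile-time LOG_SNAPSHOT_INTERVAL_MS and has no GUC, which is why the injection point is still needed for that one):

```
autovacuum = off            # keep autovacuum from writing WAL in the window
checkpoint_timeout = 1d     # the maximum; avoids timed checkpoints mid-test
```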
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments
Dear Kuroda-san,
19.02.2026 05:50, Hayato Kuroda (Fujitsu) wrote:
> Dear Alexander,
>
>> Unfortunately, the testing procedure I shared above still produces failures
>> with the patched 009_twophase.pl.
> Hmm, I ran the test for hours, but I could not reproduce the failure. But let me analyze
> based on your log.
Please look at the attached self-contained script. It works for me (failed
on iterations 6, 12, 2 right now, on my workstation with Ryzen 7900X) --
probably you could adjust the number of parallel jobs to reproduce it on your
hardware.
> I have little experience with reading the wal_debug output, but the background writer seems to
> generate the RUNNING_XACTS record. That differs from my expectation. To confirm,
> did you really enable the injection point? For now 009_twophase can work without
> the `-Dinjection_points=true` but it should be set to avoid random failures.
I think it failed before the injection was set. My log contains:
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write
0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791
latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
As far as I can see, it corresponds to this place in the test:
SAVEPOINT s1;
INSERT INTO t_009_tbl VALUES (22, 'issued to ${cur_primary_name}');
PREPARE TRANSACTION 'xact_009_10';");
+$cur_primary->wait_for_replay_catchup($cur_standby);
$cur_primary->teardown_node;
$cur_standby->promote;
And as we found out before, wait_for_replay_catchup() before teardown
doesn't help... I can't say for sure, but from my experiments, the test
didn't fail with $cur_primary->stop instead of $cur_primary->teardown_node.
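The stop-versus-teardown observation above can be modeled in a few lines. The following Python sketch is illustrative only -- the function name and the integer LSNs are made up, this is not PostgreSQL code -- but it captures why a torn-down primary can end up past the promoted standby's fork point, while a fast shutdown cannot:

```python
# Conceptual model of the timeline-fork check behind the error
# "new timeline N forked off current database system timeline M
#  before current recovery point X/X".

def can_follow_new_timeline(local_recovery_point: int, fork_lsn: int) -> bool:
    """The restarted node can follow the promoted standby's timeline only
    if its own replay position does not go past the point where the new
    timeline forked off the old one."""
    return local_recovery_point <= fork_lsn

# teardown_node(): the primary is killed while still holding WAL (e.g. a
# bgwriter RUNNING_XACTS record) that the standby never received.
standby_received = 100          # standby's last received LSN
primary_flushed = 108           # old primary flushed a bit more locally
fork_lsn = standby_received     # standby promotes here, starting a new TLI
# On restart as a standby, the old primary first replays its local WAL,
# ending past the fork point, so it cannot join the new timeline:
assert not can_follow_new_timeline(primary_flushed, fork_lsn)

# stop() (fast shutdown): the walsender drains everything first, so the
# standby's received LSN equals the primary's flushed LSN.
primary_flushed = standby_received = fork_lsn = 108
assert can_follow_new_timeline(primary_flushed, fork_lsn)
```

The model is consistent with both scenarios in this thread: the abrupt teardown leaves the restarted node ahead of the fork point, while the clean stop keeps both positions equal.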
Best regards,
Alexander
Attachments
Dear Kuroda-san,
[ back to -hackers after a short off-list discussion ]
20.02.2026 04:00, Alexander Lakhin wrote:
> Please look at the attached self-contained script. It works for me (failed
> on iterations 6, 12, 2 right now, on my workstation with Ryzen 7900X) --
> probably you could adjust the number of parallel jobs to reproduce it on your
> hardware.
As it turned out, the v2 patch works as expected, but the test may still fail
when the build is configured without injection points. Maybe this test (and
similar ones) should be skipped in this case; not sure...
Best regards,
Alexander
On Mon, Mar 02, 2026 at 09:00:00AM +0200, Alexander Lakhin wrote:
> As it turned out, the v2 patch works as expected, but the test may still fail
> when the build is configured without injection points. Maybe this test (and
> similar ones) should be skipped in this case, not sure...
Exactly, I don't see what else we can do here except skip the sequences of
the test that we know may fail if injection points are not enabled in the
build.
Looking at v2, the patch ought to comment on the reason *why* these tests are
skipped and *why* an injection point is used. The reader should have more
details than just a small hint about a set of "random failures". I'd suggest
documenting that once with the first injection point attached, and having the
other blocks hold comments telling the reader to look at the first block.
--
Michael
Attachments
RE: BUG: Former primary node might stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alexander, Michael,
> On Mon, Mar 02, 2026 at 09:00:00AM +0200, Alexander Lakhin wrote:
> > As it turned out, v2 patch works as expected, but the test may still fail
> > when the build is configured without injection points. Maybe this test
> > (and similar one(s)) should be skipped in this case, not sure...
>
> Exactly, I don't see what else we can do here except skip the
> sequences of the test that we know may fail if injection points are
> not enabled in the build.
I had a concern that some BF animals have not enabled injection points yet,
so coverage might decrease for them. But it's OK for me to fix it.
> Looking at v2, the patch ought to comment the reason *why* these tests
> are skipped and *why* an injection point is used. The reader should
> have more details than just a small hint about a set of "random
> failures". I'd suggest to document that once with the first injection
> point attached, and have the other blocks hold comments telling to
> look at the first block.
I preferred to add the descriptions at the place checking
enable_injection_points. See the updated version.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments
On Tue, Mar 03, 2026 at 04:02:53AM +0000, Hayato Kuroda (Fujitsu) wrote:
> I had a concern that some BF animals have not enable the injection point yet
> thus coverage might be decreased for them. But it's OK for me to fix
> it.
Requiring injection points to be enabled so that we have strict control over
the standby snapshot records does not strike me as a bad requirement in
itself. Most of the animals use the switch these days. It's a bit sad if
this is not entirely stable in pre-v16 branches, but a stable post-v17
behavior would always be better than an unstable behavior everywhere.
> I preferred to add descriptions at the place checking enable_injection_points.
> See the updated version.
+ autovacuum = off
+ checkpoint_timeout = 1h
Why do we need these? An explanation seems in order in the shape of a
comment, or these should be removed.
Is there a different trick than the one posted at [1] to check the stability
of the proposal? I am wondering if I am missing something, or if that's all.
Alexander?
[1]: https://www.postgresql.org/message-id/e1cf52d2-c344-4dfd-a918-e5f20ff04fa2@gmail.com
--
Michael
Attachments
Hello Michael,
03.03.2026 08:13, Michael Paquier wrote:
> + autovacuum = off
> + checkpoint_timeout = 1h
>
> Why do we need these? An explanation seems in order in the shape of a
> comment, or these should be removed. Is there a different trick than the
> one posted at [1] to check the stability of the proposal? I am wondering
> if I am missing something, or if that's all. Alexander?
>
> [1]: https://www.postgresql.org/message-id/e1cf52d2-c344-4dfd-a918-e5f20ff04fa2@gmail.com
There was also a separate test (independent of bgwriter, autovacuum,
checkpointer,...) at the top of the thread: [1].
FWIW, 004_timeline_switch also failed the past month in the same way [2].
I've just tested 012_subtransactions.pl with the script I proposed at [3]
and it passed 100 iterations (because it performs $node_primary->stop;
before $node_standby->promote;, I suppose).
[1] https://www.postgresql.org/message-id/b0102688-6d6c-c86a-db79-e0e91d245b1a%40gmail.com
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2026-02-07%2006%3A33%3A20
[3] https://www.postgresql.org/message-id/045cab6f-4738-417e-b551-01adba44d6c3%40gmail.com
Best regards,
Alexander
RE: BUG: Former primary node might stuck when started as a standby
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Michael,
> Requiring injection points to be enabled so as we have a strict
> control over the standby snapshot records does not strike me as a bad
> requirement in itself. Most of the animals use the switch these days.
> It's a bit sad if this is not entirely stable in pre-v16 branches, but
> a stable post-v17 behavior would always be better than an unstable
> behavior everywhere.
OK, thanks for the clarification. I will create patches for the back branches
after it is stabilized on HEAD. (The current one might be applicable as-is,
not sure.)
> + autovacuum = off
> + checkpoint_timeout = 1h
>
> Why do we need these? An explanation seems in order in the shape of a
> comment, or these should be removed.
I think they are needed to stabilize the test. If the checkpointer and/or
autovacuum generate WAL during the switchover, the test can fail for the same
reason as with bgwriter's activity. I added comments for it.
> Is there a different trick than the one posted at [1] to check the
> stability of the proposal? I am wondering if I am missing something,
> or if that's all. Alexander?
>
> [1]:
> https://www.postgresql.org/message-id/e1cf52d2-c344-4dfd-a918-e5f20ff04fa2@gmail.com
To clarify, I'm not aware of one.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments
RE: BUG: Former primary node might stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Alexander,
> FWIW, 004_timeline_switch also failed the past month in the same way [2].
Thanks for the info. So I can provide the patch after the issue for
009_twophase.pl is fixed. For better understanding we may be able to fork a
new thread.
> I've just tested 012_subtransactions.pl with the script I proposed at [3]
> and it passed 100 iterations (because it performs $node_primary->stop;
> before $node_standby->promote;, I suppose).
I guess so. Cluster::stop does the `pg_ctl stop -m fast` command. In this
case the walsender waits until there is nothing left to be sent, see
WalSndLoop(). Do let me know if you have observed a similar failure here.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Tue, Mar 03, 2026 at 09:17:16AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Thanks for the info. So I can provide the patch after the issue for 009_twophase.pl
> is fixed. For better understanding we may be able to fork a new
> thread.
Regarding your posted v4, I am actually not convinced that there is a need
for injection points and disabling standby snapshots, for the three sequences
of tests proposed.
While the first wait_for_replay_catchup() can be useful before the
teardown_node() of the primary in the "Check that prepared transactions can
be committed on promoted standby" sequence, it still has a limited impact.
It looks like we could have other parasite records as well, depending on how
slowly the primary is stopped? I think that we should switch to a plain
stop() of the primary; the test wants to check that prepared transactions
can be committed on a standby, and stopping the primary abruptly does not
matter for this sequence.
For the second wait_for_replay_catchup(), after the PREPARE of xact_009_11:
I may be missing something, but how does it change things? A plain stop() of
the primary means that the standby would have received all the WAL records
from the primary on disk in its pg_wal, no? Upon restart, it should replay
everything it finds in pg_wal/. I don't see a change required here.
For the third wait_for_replay_catchup(), after the PREPARE of xact_009_12,
same dance. The primary is cleanly stopped first. All the WAL records of
the primary should have been flushed to the standby.
As a whole, it looks like we should just switch the teardown_node() call to
a stop() call in the first test with xact_009_10, backpatch it, and call it
a day. No need for injection points and no need for GUC tweaks. I have not
looked at 004_timeline_switch yet.
> I guess so. cluster::stop does the `pg_ctl stop -m fast` command. In this case
> the walsender waits till there are nothing to be sent, see WalSndLoop().
> Do let me know if you have observed the similar failure here.
Exactly.
Doing a clean stop of the primary offers a strong guarantee here. We are
sure that the standby will have received all the records from the primary.
Timeline forking is impossible in 012_subtransactions.pl based on how the
switchover from the primary to the standby happens. I don't see a need for
tweaking this test at all. Or perhaps you did see a failure of some kind in
this test, Alexander?
--
Michael
Attachments
On Wed, Mar 04, 2026 at 02:31:29PM +0900, Michael Paquier wrote:
> As a whole, it looks like we should just switch the teardown() call to
> a stop() call in the first test with xact_009_10, backpatch it, and
> call it a day. No need for injection points and no need for GUC
> tweaks.
With a little bit more patience, I have reproduced the same failure as
Alexander using the bgwriter trick, -DWAL_DEBUG and his reproducer script
with parallel runs of the 009 recovery test. The attached patch is also
proving to work. The failure happens at the 2nd~3rd iteration without the
fix, and the tests are able to last more than 50 iterations with the fix.
As far as I can see by scanning the history of the test, this is a
copy-pasto coming from 30820982b295, where the tests were initially
introduced and teardown_node() was copied across the test sequences. As we
want to check that a promoted standby is able to commit the 2PC transactions
issued on the primary, a plain stop() will work equally well.
I'll push this fix shortly, taking care of one instability. Nice
investigation on this one, Alexander, by the way.
--
Michael
Attachments
On Wed, Mar 04, 2026 at 03:29:35PM +0900, Michael Paquier wrote:
> I'll push this fix shortly, taking care of one instability. Nice
> investigation on this one, Alexander, by the way.
With the patch attached, for reference.
--
Michael
Attachments
RE: BUG: Former primary node might stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Michael,
Thanks for the smarter solution. Actually, I had misunderstood: I thought
teardown_node was used everywhere and was necessary. Your analysis looks
completely correct.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Hello Michael,
04.03.2026 07:31, Michael Paquier wrote:
>> I guess so. Cluster::stop does the `pg_ctl stop -m fast` command. In this
>> case the walsender waits until there is nothing left to be sent, see
>> WalSndLoop(). Do let me know if you have observed a similar failure here.
>
> Exactly. Doing a clean stop of the primary offers a strong guarantee here.
> We are sure that the standby will have received all the records from the
> primary. Timeline forking is impossible in 012_subtransactions.pl based on
> how the switchover from the primary to the standby happens. I don't see a
> need for tweaking this test at all. Or perhaps you did see a failure of
> some kind in this test, Alexander?
Yes, 012_subtransactions doesn't fail with aggressive bgwriter, as I noted
before. I mentioned it exactly to show that stop does matter here. But if
we recognize teardown_node in this context as risky, maybe it would make
sense to also review other tests in recovery/. I already wrote about
004_timeline_switch, but probably there are more. E.g., 028_pitr_timelines
(I haven't tested it intensively yet) does:
$node_primary->stop('immediate');
# Promote the standby, and switch WAL so that it archives a WAL segment
# that contains all the INSERTs, on a new timeline.
$node_standby->promote;
Best regards,
Alexander
On Wed, Mar 04, 2026 at 10:00:00AM +0200, Alexander Lakhin wrote:
> Yes, 012_subtransactions doesn't fail with aggressive bgwriter, as I noted
> before. I mentioned it exactly to show that stop does matter here. But if
> we recognize teardown_node in this context as risky, maybe it would make
> sense to review also other tests in recovery/. I already wrote about
> 004_timeline_switch, but probably there are more. E.g., 028_pitr_timelines
> (I haven't tested it intensively yet) does:
> $node_primary->stop('immediate');
>
> # Promote the standby, and switch WAL so that it archives a WAL segment
> # that contains all the INSERTs, on a new timeline.
> $node_standby->promote;
I think that your take about 004 is actually right, looking at it more
closely. By tearing down the primary, it could be possible that
standby_2 receives more records than standby_1. Then, when we try to
reconnect standby_2 to the promoted standby_1, the TLI could fork, in
theory. The fix would be the same: by switching to stop(), we'd make
sure that both standby_1 and standby_2 have received all the records
from the primary. We can also remove the wait_for_catchup() before
the primary is stopped, this offers no protection for standby_2
receiving more records from the primary than standby_1.
It is not surprising that this failure with a three-node scenario is
much harder to reproduce. I have run the same loop as 009 but things
are super stable even after 50-ish iterations. By reading the code,
I agree that the failure is possible to reach in theory, though. Some
hardcoded sleeps would do the trick (make bgwriter aggressive, patch
the checkpointer so as we do not send the last standby snapshot
records to standby_2, only to standby_1, etc.).
Did you find any buildfarm failures involving 028? I cannot get
excited about changing tests where nothing has happened, and this test
looks OK as we don't do a switchover. For 004, we have at least one
failure recorded based on what you said. That's a fact sufficient for
me to fix things, for 004.
--
Michael
Attachments
04.03.2026 10:23, Michael Paquier wrote:
> Did you find any buildfarm failures involving 028? I cannot get
> excited in changing tests where nothing has happened, and this test
> looks OK as we don't do a switchover. For 004, we have at least one
> failure recorded based on what you said. That's a fact sufficient for
> me to fix things, for 004.
No, I didn't. I agree that eliminating already-known failures makes more
sense. Anyway, we'll have this thread, which explains why such failures can
happen.
Best regards,
Alexander
On Wed, Mar 04, 2026 at 11:00:00AM +0200, Alexander Lakhin wrote:
> No, I didn't. I agree that eliminating already-known failures makes more
> sense. Anyway, we'll have this thread, which explains why such failures
> can happen.
Fixed this second instability now with the same trick as 009, down to v14,
as there is no strong requirement for an immediate stop of the primary to
check that a standby is able to do a TLI jump when reconnecting to the
promoted standby.
--
Michael
Attachments
RE: BUG: Former primary node might stuck when started as a standby
From:
"Hayato Kuroda (Fujitsu)"
Date:
Dear Michael, Alexander,
Thanks for pushing the fixes. I checked the buildfarm animals and I could
not find failures related to it (configure failures might mask them,
though). We can keep an eye on it and come back if the tests are still not
robust. Thanks!
Best regards,
Hayato Kuroda
FUJITSU LIMITED