Обсуждение: Instability in postgres_fdw regression tests
Several BF animals have intermittently shown this regression diff:
diff -U3 /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out
/home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
--- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-12-29
19:48:22.661603936+0100
+++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
2026-02-1000:31:31.856460156 +0100
@@ -6519,6 +6519,7 @@
UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
------+----+-----+----+----+----+------------+----
+ 2010 | 0 | bar | | | | ft2 |
2001 | 1 | bar | | | | ft2 |
2002 | 2 | bar | | | | ft2 |
2003 | 3 | bar | | | | ft2 |
@@ -6528,7 +6529,6 @@
2007 | 7 | bar | | | | ft2 |
2008 | 8 | bar | | | | ft2 |
2009 | 9 | bar | | | | ft2 |
- 2010 | 0 | bar | | | | ft2 |
(10 rows)
EXPLAIN (verbose, costs off)
The above is from culicidae [1], and greenfly has shown it a few times
[2], and here's one from scorpion [3], and crake on a back branch [4],
and there are a few more in the past 90 days.
It's pretty clear what is happening: the rows we are looking at are
being returned by a seqscan, and they were just inserted a few lines
above into a table that has been modified multiple times already.
So the test is reliant on them being inserted in sequence at the
end of the table, yet sometimes the last row is going into free
space someplace earlier.
One's first instinct is to blame autovacuum, but the test already
goes out of its way to disable that:
-- Disable autovacuum for these tables to avoid unexpected effects of that
ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
After experimenting for awhile I think I have (part of) the answer.
All of the failing animals are using meson, which means that this
"installcheck" test case is probably running in parallel with other
test scripts in other databases in the same cluster. I've not
reproduced the exact symptom seen in the buildfarm, but I can easily
make the postgres_fdw test put these rows at different ctids if
I leave a transaction sitting open in a different database. So I
think we are seeing some effect of opportunistic page pruning behaving
differently depending on whether there is a concurrent transaction.
It's not clear to me that it's worth running this to ground in any
more detail than that. The behavior is not wrong; it's the test's
fault to assume that these rows will be returned in a deterministic
order. So I think the right fix is to adjust the test query,
along the lines of
-UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
+WITH cte AS (
+ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *
+) SELECT * FROM cte ORDER BY c1;
Thoughts, better ideas?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2026-02-09%2023%3A21%3A25
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=greenfly&dt=2026-01-20%2004%3A32%3A29
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2025-12-09%2015%3A23%3A53
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-12-04%2017%3A50%3A38
On Tue, Feb 10, 2026 at 12:06:02PM -0500, Tom Lane wrote: > It's not clear to me that it's worth running this to ground in any > more detail than that. The behavior is not wrong; it's the test's > fault to assume that these rows will be returned in a deterministic > order. So I think the right fix is to adjust the test query, > along the lines of > > -UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; > +WITH cte AS ( > + UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING * > +) SELECT * FROM cte ORDER BY c1; +1. I faintly recall looking into this a while ago and, for some reason, I was worried that this would become a game of Whac-A-Mole, so apparently I didn't follow through. But fixing this query is still an improvement over the status quo. -- nathan
Hello Tom, 10.02.2026 19:06, Tom Lane wrote: > Several BF animals have intermittently shown this regression diff: > > diff -U3 /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out > --- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-12-29 19:48:22.661603936+0100 > +++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 2026-02-1000:31:31.856460156 +0100 > @@ -6519,6 +6519,7 @@ > UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; > c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 > ------+----+-----+----+----+----+------------+---- > + 2010 | 0 | bar | | | | ft2 | > 2001 | 1 | bar | | | | ft2 | > 2002 | 2 | bar | | | | ft2 | > 2003 | 3 | bar | | | | ft2 | > @@ -6528,7 +6529,6 @@ > 2007 | 7 | bar | | | | ft2 | > 2008 | 8 | bar | | | | ft2 | > 2009 | 9 | bar | | | | ft2 | > - 2010 | 0 | bar | | | | ft2 | > (10 rows) > > EXPLAIN (verbose, costs off) > > The above is from culicidae [1], and greenfly has shown it a few times > [2], and here's one from scorpion [3], and crake on a back branch [4], > and there are a few more in the past 90 days. FWIW, all the failures of this ilk are tracked at [1]. > It's pretty clear what is happening: the rows we are looking at are > being returned by a seqscan, and they were just inserted a few lines > above into a table that has been modified multiple times already. > So the test is reliant on them being inserted in sequence at the > end of the table, yet sometimes the last row is going into free > space someplace earlier. > > One's first instinct is to blame autovacuum, but the test already > goes out of its way to disable that: > > -- Disable autovacuum for these tables to avoid unexpected effects of that > ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false'); I reproduced and reported the failure before, please look at [2] — probably some information there could be helpful... [1] https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#postgres_fdw.sql_might_fail_due_to_autovacuum [2] https://www.postgresql.org/message-id/flat/867266ef-3dd1-44a9-a203-27cb5d2be58d%40gmail.com Best regards, Alexander
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Feb 10, 2026 at 12:06:02PM -0500, Tom Lane wrote:
>> It's not clear to me that it's worth running this to ground in any
>> more detail than that. The behavior is not wrong; it's the test's
>> fault to assume that these rows will be returned in a deterministic
>> order. So I think the right fix is to adjust the test query,
>> along the lines of
>>
>> -UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
>> +WITH cte AS (
>> + UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *
>> +) SELECT * FROM cte ORDER BY c1;
> +1. I faintly recall looking into this a while ago and, for some reason, I
> was worried that this would become a game of Whac-A-Mole, so apparently I
> didn't follow through. But fixing this query is still an improvement over
> the status quo.
Yeah, it's certainly fair to wonder where else we have
even-lower-probability test interactions. But I don't think
getting rid of the interaction is realistic, especially given
Alexander's results (which I confess to having forgotten about)
that show that autovacuum is involved in this somehow despite
being disabled on this particular table. So the answer has to
be to make the test case more robust against such things.
regards, tom lane
On Wed, Feb 11, 2026 at 3:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nathan Bossart <nathandbossart@gmail.com> writes: > > On Tue, Feb 10, 2026 at 12:06:02PM -0500, Tom Lane wrote: > >> It's not clear to me that it's worth running this to ground in any > >> more detail than that. The behavior is not wrong; it's the test's > >> fault to assume that these rows will be returned in a deterministic > >> order. So I think the right fix is to adjust the test query, > >> along the lines of > >> > >> -UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; > >> +WITH cte AS ( > >> + UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING * > >> +) SELECT * FROM cte ORDER BY c1; > > > +1. I faintly recall looking into this a while ago and, for some reason, I > > was worried that this would become a game of Whac-A-Mole, so apparently I > > didn't follow through. But fixing this query is still an improvement over > > the status quo. > > Yeah, it's certainly fair to wonder where else we have > even-lower-probability test interactions. But I don't think > getting rid of the interaction is realistic, especially given > Alexander's results (which I confess to having forgotten about) > that show that autovacuum is involved in this somehow despite > being disabled on this particular table. So the answer has to > be to make the test case more robust against such things. +1 for that. I noticed this problem because of Alexander's report, but I completely forgot about it... Thanks! Best regards, Etsuro Fujita