On Mon, Apr 29, 2024 at 09:12:48AM +0800, Richard Guo wrote:
> Does anyone have any clue to this failure?
>
> FWIW, after another run of this test, the failure just disappears. Does
> it suggest that the test case is flaky?
If you grep the source tree, you'd notice that a prepared transaction
named gxid only exists in the 2PC tests of ECPG, in
src/interfaces/ecpg/test/sql/twophase.pgc. So the origin of the
failure comes from a race condition due to test parallelization,
because the scan of pg_prepared_xacts affects all databases with
installcheck, and in your case it means that the scan of
pg_prepared_xacts was running in parallel of the ECPG tests with an
installcheck.
The only location in the whole tree where we want to do predictible
scans of pg_prepared_xacts is prepared_xacts.sql, so rather than
playing with 2PC transactions across a bunch of tests, I think that we
should do two things, both touching prepared_xacts.sql:
- The 2PC transactions run in the main regression test suite should
use names that would be unlikely used elsewhere.
- Limit the scans of pg_prepared_xacts on these name patterns to avoid
interferences.
See for example the attached with both expected outputs updated
depending on the value set for max_prepared_transactions in the
backend. There may be an argument in back-patching that, but I don't
recall seeing this failure in the CI, so perhaps that's not worth
bothering with. What do you think?
--
Michael