05.01.2024 02:48, Michael Paquier wrote:
> On Thu, Jan 04, 2024 at 04:00:01PM +0300, Alexander Lakhin wrote:
>> Reproduced here.
> Did you just make the run slow enough to show the failure with
> valgrind?
Yes, I just run several test instances (with no extra modifications) under
Valgrind with parallel as follows:
for i in `seq 20`; do cp -r src/test/recovery/ src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|"
-i
src/test/recovery_$i/Makefile; done
for i in `seq 20`; do echo "iteration $i"; parallel --halt now,fail=1 -j20 --linebuffer --tag PROVE_TESTS="t/035*"
NO_TEMP_INSTALL=1 make check -s -C src/test/recovery_{} PROVE_FLAGS="--timer" ::: `seq 20` || break; done
The test run fails for me on iterations 9, 8, 14 like so:
...
iteration 9
...
5
5 # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
5 # at t/035_standby_logical_decoding.pl line 224.
5
5 # Failed test 'activeslot slot invalidation is logged with vacuum on pg_authid'
5 # at t/035_standby_logical_decoding.pl line 229.
13 [07:35:53] t/035_standby_logical_decoding.pl .. ok 432930 ms ( 0.02 usr 0.00 sys + 292.08 cusr 13.05 csys =
305.15 CPU)
13 [07:35:53]
13 All tests successful.
...
I've also reproduced it without Valgrind in a VM with CPU slowed down to
5% (on iterations 2, 5, 4), where average test duration is about 800 sec:
4
4 # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
4 # at t/035_standby_logical_decoding.pl line 222.
4
4 # Failed test 'activeslot slot invalidation is logged with vacuum on pg_authid'
4 # at t/035_standby_logical_decoding.pl line 227.
6 [15:16:53] t/035_standby_logical_decoding.pl .. ok 763168 ms ( 0.68 usr 0.10 sys + 19.55 cusr 102.59 csys =
122.92 CPU)
>
>> As I can see in the failure logs you referenced, the first problem is:
>> # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
>> # at t/035_standby_logical_decoding.pl line 224.
>> It reminded me of:
>> https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com
>>
>> It seems that it's not something new, and maybe that my analysis is still
>> valid. If so, VACUUM FREEZE/autovacuum = off should fix the issue.
> Not sure about that yet.
>
Your suspicion was proved right. After
git show c161ab74f src/test/recovery/t/035_standby_logical_decoding.pl | git apply -R
20 iterations with 20 tests in parallel performed successfully for me
(twice).
So it looks like c161ab74f really made the things worse.
Best regards,
Alexander