Re: Testing autovacuum wraparound (including failsafe)

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: Testing autovacuum wraparound (including failsafe)
Дата
Msg-id 5811175c-1a31-4869-032f-7af5e3e4506a@gmail.com
обсуждение исходный текст
Ответ на Re: Testing autovacuum wraparound (including failsafe)  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
Hello,

30.11.2023 10:35, Masahiko Sawada wrote:
>
>> I've attached new version patches (0002 and 0003 are unchanged except
>> for the commit message). I'll push them, barring any objections.
>>
> Pushed.

I've discovered that the test 001_emergency_vacuum.pl can fail due to a
race condition. I can't see the server log at [1], but I reproduced the
failure locally and with additional logging and log_min_messages = DEBUG3,
the log shows:
...
2024-05-22 11:46:28.125 UTC [21256:2853] DEBUG:  SlruScanDirectory invoking callback on pg_xact/0690
2024-05-22 11:46:28.125 UTC [21256:2854] DEBUG:  transaction ID wrap limit is 2147484396, limited by database with OID
5
2024-05-22 11:46:28.126 UTC [21256:2855] LOG: !!!SendPostmasterSignal| PMSIGNAL_START_AUTOVAC_LAUNCHER
2024-05-22 11:46:28.135 UTC [14871:20077] DEBUG:  postmaster received pmsignal signal
2024-05-22 11:46:28.137 UTC [21257:1] DEBUG:  autovacuum launcher started
2024-05-22 11:46:28.137 UTC [21257:2] DEBUG:  InitPostgres
2024-05-22 11:46:28.138 UTC [21257:3] LOG:  !!!AutoVacLauncherMain| !AutoVacuumingActive() && !ShutdownRequestPending;

before do_start_worker()
2024-05-22 11:46:28.138 UTC [21257:4] LOG:  !!!do_start_worker| return quickly when there are no free workers
2024-05-22 11:46:28.138 UTC [21257:5] DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:6] DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:7] DEBUG:  proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.138 UTC [21257:8] DEBUG:  exit(0)
2024-05-22 11:46:28.138 UTC [21257:9] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:10] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:11] DEBUG:  proc_exit(-1): 0 callbacks to make
2024-05-22 11:46:28.146 UTC [21256:2856] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 5
2024-05-22 11:46:28.146 UTC [21256:2857] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2024-05-22 11:46:28.146 UTC [21256:2858] DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2859] DEBUG:  shmem_exit(0): 7 on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2860] DEBUG:  proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2861] DEBUG:  exit(0)
2024-05-22 11:46:28.147 UTC [21256:2862] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2863] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2864] DEBUG:  proc_exit(-1): 0 callbacks to make
2024-05-22 11:46:28.151 UTC [14871:20078] DEBUG:  forked new backend, pid=21258 socket=8
2024-05-22 11:46:28.171 UTC [14871:20079] DEBUG:  server process (PID 21256) exited with exit code 0
2024-05-22 11:46:28.152 UTC [21258:1] [unknown] LOG:  connection received: host=[local]
2024-05-22 11:46:28.171 UTC [21258:2] [unknown] DEBUG:  InitPostgres
2024-05-22 11:46:28.172 UTC [21258:3] [unknown] LOG:  connection authenticated: user="vagrant" method=trust 

(/pgtest/postgresql.git/src/test/modules/xid_wraparound/tmp_check/t_001_emergency_vacuum_main_data/pgdata/pg_hba.conf:117)
2024-05-22 11:46:28.172 UTC [21258:4] [unknown] LOG:  connection authorized: user=vagrant database=postgres 
application_name=001_emergency_vacuum.pl
2024-05-22 11:46:28.175 UTC [21258:5] 001_emergency_vacuum.pl LOG: statement: INSERT INTO small(data) SELECT 1

That is, autovacuum worker (21256) sent PMSIGNAL_START_AUTOVAC_LAUNCHER,
postmaster started autovacuum launcher, which could not start new
autovacuum worker due to the process 21256 not exited yet.

The failure can be reproduced easily with the sleep added inside
SetTransactionIdLimit():
         if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
                 IsUnderPostmaster && !InRecovery)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
+pg_usleep(10000L);

By the way I also discovered that rather resource-intensive xid_wraparound
tests executed twice during the buildfarm "make" run (on dodo and perentie
(see [2]) animals), at stage module-xid_wraparound-check and then at stage
testmodules-install-check-C.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-19%2006%3A33%3A34
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=perentie&dt=2024-05-22%2000%3A02%3A19

Best regards,
Alexander



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: State of pg_createsubscriber
Следующее
От: torikoshia
Дата:
Сообщение: Re: First draft of PG 17 release notes