Force the old transactions logs cleanup even if checkpoint is skipped

From Zakhlystov, Daniil (Nebius)
Subject Force the old transactions logs cleanup even if checkpoint is skipped
Msg-id AM9P190MB12346310F38B3FAF9287D1FFB5D6A@AM9P190MB1234.EURP190.PROD.OUTLOOK.COM
Replies Re: Force the old transactions logs cleanup even if checkpoint is skipped  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Re: Force the old transactions logs cleanup even if checkpoint is skipped  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi, hackers!

I've stumbled into an interesting problem. Currently, if Postgres has nothing to write, it skips the checkpoint that
the checkpoint_timeout setting would otherwise trigger. However, we might face a temporary archiving problem (for
example, some network issues) that leaves a pile of WAL files stuck in pg_wal. After the temporary issue is gone, the
files get archived, but we are still unable to remove them: old WAL is cleaned up only as part of a checkpoint, and
we effectively skip the checkpoint because we have nothing to write.

That might lead to a problem: suppose you've run out of disk space because of a temporary failure of the archiver.
After this temporary failure is gone, Postgres is unable to recover from it automatically and requires human
attention to initiate a CHECKPOINT call.

I suggest changing this behavior by trying to clean up the old WAL even if we skip the main checkpoint routine. I've
attached a patch that does exactly that.

What do you think?

To reproduce the issue, you might repeat the following steps:

1. Init Postgres:
pg_ctl initdb -D /Users/usernamedt/test_archiver

2. Add the archiver script to simulate failure:
➜  ~ cat /Users/usernamedt/command.sh
#!/bin/bash

false

3. Then alter the PostgreSQL conf:

archive_mode = on
checkpoint_timeout = 30s
archive_command = '/Users/usernamedt/command.sh'
log_min_messages = debug1

4. Then start Postgres:
/usr/local/pgsql/bin/pg_ctl -D /Users/usernamedt/test_archiver -l logfile start

5. Insert some data:
pgbench -i -s 30 -d postgres

6. Trigger checkpoint to flush all data:
psql -c "checkpoint;"

7. Alter the archiver script to simulate the end of archiver issues:
➜  ~ cat /Users/usernamedt/command.sh
#!/bin/bash

true

8. Check that the WAL files are actually archived but not removed:
➜  ~ ls -lha /Users/usernamedt/test_archiver/pg_wal/archive_status | head
total 0
drwx------@ 48 usernamedt  LD\Domain Users   1.5K Oct 17 17:44 .
drwx------@ 50 usernamedt  LD\Domain Users   1.6K Oct 17 17:43 ..
-rw-------@  1 usernamedt  LD\Domain Users     0B Oct 17 17:42 000000010000000000000040.done
...
-rw-------@  1 usernamedt  LD\Domain Users     0B Oct 17 17:43 00000001000000000000006D.done

Meanwhile, the server log keeps reporting that the checkpoint is skipped:

2023-10-17 18:03:44.621 +04 [71737] DEBUG:  checkpoint skipped because system is idle
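
To watch this state without reading the full listing, the .ready and .done markers in archive_status can be counted
from the shell. This is only a rough sketch: the function name count_archive_status is hypothetical, and it assumes a
find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/bin/bash

# Hypothetical helper: count WAL segments by archive status.
# $1 is the path to pg_wal/archive_status.
count_archive_status() {
    local dir=$1
    local ready done_count

    # *.ready: segments still waiting for the archiver
    ready=$(find "$dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' ')
    # *.done: segments already archived and eligible for removal at the next checkpoint
    done_count=$(find "$dir" -maxdepth 1 -type f -name '*.done' | wc -l | tr -d ' ')

    echo "ready=$ready done=$done_count"
}
```

With the setup above it could be called as
count_archive_status /Users/usernamedt/test_archiver/pg_wal/archive_status. A .done count that stays high while the
log keeps printing "checkpoint skipped because system is idle" is the situation described in this message.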

Thanks,

Daniil Zakhlystov