Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file

From: Waka Ranai
Msg-id: CAP8Vo=9o0FE6gzqZJ3XdeGPNqi=eNV3cM_6v-thE640YcYoWog@mail.gmail.com
Replies: Re: Bug report - pg_upgrade tool seems to have a race condition when trying to delete a pg_wal file
List: pgsql-bugs

Hello,

I tested the pg_upgrade tool many times on different servers (always Windows Server 2019; the exact build may differ) while upgrading an existing database from PostgreSQL 9.6 to PostgreSQL 15 (I tried both 15.4.2 and 15.7), and almost every time I hit this issue during the step “Setting next transaction ID and epoch for new cluster”.

Here is the version of one of the servers on which it failed at least three times:

[Screenshot of the server's version information: image.png]

The command I ran, after setting PGPASSWORD to the correct password, was:

"C:\Program Files\PostgreSQL\15\bin\pg_upgrade.exe" -d "C:\Program Files\PostgreSQL\9.6\data" -D "C:\Program Files\PostgreSQL\15\data" -b "C:\Program Files\PostgreSQL\9.6\bin" -B "C:\Program Files\PostgreSQL\15\bin" -U postgres

The error was either “pg_resetwal: error: could not delete file "pg_wal/000000010000000000000001": Permission denied” or, sometimes, a message saying that the file could not be found instead of "Permission denied". When I watch the directory during execution, I can see that the file is there beforehand, and it is always gone after pg_upgrade crashes. I used Process Explorer to check which processes had the file open: they were always postgres processes, and only one right after a fresh install of PostgreSQL 15, but during pg_upgrade I sometimes saw two processes using it.

I suspect a race condition: one process sees that the file exists, does something with it, and deletes it, while another process also saw the file but, by the time it tries to delete it, finds it already gone. Looking at the code, I believe this happens in the function KillExistingXLOG, starting at line 973 of pg_resetwal.c (https://github.com/postgres/postgres/blob/master/src/bin/pg_resetwal/pg_resetwal.c#L973), though I cannot be entirely sure of the cause.

Attached are the logs produced by pg_upgrade, run with the verbose option.

Thanks in advance for looking into this. I hope to better understand the problem and, ideally, see a fix soon, as it is complicating the deployment of a major upgrade of our product.

Have a great day,

Thomas
