Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath
From | Michael Paquier |
---|---|
Subject | Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath |
Date | |
Msg-id | aOdsBLhCdFaRX8GI@paquier.xyz |
In reply to | Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath (Kevin K Biju <kevinkbiju@gmail.com>) |
Responses | Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath |
List | pgsql-hackers |
On Tue, Sep 30, 2025 at 05:21:05PM +0530, Kevin K Biju wrote:
> We have encountered a few instances where logical replication errors out
> during SaveSlotToPath() after creating the state.tmp file, but before it
> was renamed (due to ENOSPC, for example). In these cases, since state.tmp
> is not cleaned up and is created with the O_EXCL flag, further invocations
> of SaveSlotToPath() for this slot will error out on OpenTransientFile()
> with EEXIST, completely blocking slot metadata persistence. The only
> explicit cleanup for state.tmp occurs during server startup as part of
> RestoreSlotFromDisk().

Ah, you are referring to the window between a CloseTransientFile() completing and the rename(). This is not the first time this has been reported. I have found two earlier references to the same error as yours, one of them pointing to a discussion about O_EXCL vs O_TRUNC:
https://www.postgresql.org/message-id/08bbfab1-a61d-3750-fc18-4ab2c1aa7f09@postgrespro.ru
https://www.postgresql.org/message-id/3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net

> It doesn't seem that this function relies on data written to state.tmp
> previously, so O_EXCL is unnecessary. Attaching a patch that swaps O_EXCL
> for O_TRUNC, ensuring a fresh state.tmp is available for writing.

Using O_TRUNC has been discussed and discarded, because O_EXCL is more protective in this specific code path; see the argument here:
https://www.postgresql.org/message-id/20191202161222.sazl2omhhk5pl3nl@alap3.anarazel.de

An alternative fix is to unlink() the temporary file when reaching these error code paths, so that future accesses work correctly. This was suggested as a second solution, besides the O_TRUNC approach that was objected to. One thing to make sure of is that the unlinks are done while holding the LWLock for the I/O in progress. So something like the attached should also solve your problem.

Any thoughts or comments from the others? I'd like to backpatch this all the way down, 6 years too late. But later is better than never, right?
--
Michael
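The failure mode described in the quoted report can be reproduced outside PostgreSQL with plain POSIX calls. This is an illustrative sketch, not code from slot.c: it assumes the temporary file is opened with O_CREAT | O_EXCL | O_WRONLY (roughly what SaveSlotToPath() passes to OpenTransientFile()), and the helpers open_tmp() and retry_after_leftover(), as well as the /tmp path used when calling them, are hypothetical. A file left behind by a failed save makes the next O_EXCL open fail with EEXIST, while an O_TRUNC open simply reuses the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (and immediately close) path with
 * O_CREAT | O_WRONLY plus extra_flags.
 * Returns 0 on success or the errno value on failure.
 */
static int
open_tmp(const char *path, int extra_flags)
{
    int fd = open(path, O_CREAT | O_WRONLY | extra_flags, 0600);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/*
 * Hypothetical helper: simulate a first save that creates path and then
 * fails before rename(), leaving the file behind, followed by a second
 * save attempt. Returns the result of the second open (0 or an errno
 * value), or -1 if even the first open failed.
 */
static int
retry_after_leftover(const char *path, int extra_flags)
{
    int err;

    unlink(path);                           /* start from a clean slate */
    if (open_tmp(path, extra_flags) != 0)   /* first save creates the file */
        return -1;
    /* ...write() fails with ENOSPC here: no unlink(), no rename()... */
    err = open_tmp(path, extra_flags);      /* next save of the same slot */
    unlink(path);                           /* tidy up after the demo */
    return err;
}
```

Calling retry_after_leftover() with O_EXCL yields EEXIST, and with O_TRUNC yields 0. That only shows why O_TRUNC makes the retry succeed; it does not capture the reasons O_EXCL was preferred in the linked discussion.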
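The unlink()-on-error alternative can be sketched in the same style. This is a minimal sketch under stated assumptions, not the attached patch: a pthread mutex stands in for the slot's I/O-in-progress LWLock, errors are returned instead of reported through ereport(), the directory fsync that a real implementation needs is omitted, and save_state() and the fail_write fault-injection flag are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>      /* rename() */
#include <string.h>
#include <unistd.h>

/* Assumption: stands in for the per-slot LWLock that serializes slot I/O. */
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical fault injection: when set, the write step fails with ENOSPC. */
static int fail_write = 0;

/*
 * Write data to tmppath, fsync it, and rename() it over path.
 * Returns 0 on success. On failure returns -1 with errno set, and a
 * tmppath created by this call has been removed, so the next call's
 * O_EXCL open does not hit EEXIST.
 */
static int
save_state(const char *tmppath, const char *path, const char *data)
{
    size_t len = strlen(data);
    int save_errno;
    int fd;
    int rc;

    pthread_mutex_lock(&io_lock);

    fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
    {
        /* We did not create the file, so we must not unlink it. */
        save_errno = errno;
        pthread_mutex_unlock(&io_lock);
        errno = save_errno;
        return -1;
    }

    errno = 0;
    if (fail_write || write(fd, data, len) != (ssize_t) len)
    {
        /* A short write without errno is treated as "out of disk space". */
        save_errno = (fail_write || errno == 0) ? ENOSPC : errno;
        goto cleanup;
    }

    if (fsync(fd) != 0)
    {
        save_errno = errno;
        goto cleanup;
    }

    rc = close(fd);
    fd = -1;
    if (rc != 0 || rename(tmppath, path) != 0)
    {
        save_errno = errno;
        goto cleanup;
    }

    pthread_mutex_unlock(&io_lock);
    return 0;

cleanup:
    if (fd >= 0)
        close(fd);
    /* The proposed fix: remove the leftover temp file while still holding the lock. */
    unlink(tmppath);
    pthread_mutex_unlock(&io_lock);
    errno = save_errno;
    return -1;
}
```

After a simulated ENOSPC failure the temporary file no longer exists, so the next save_state() call succeeds. Doing the unlink() before releasing the lock keeps the cleanup serialized with any other process saving the same slot, so that process cannot race with it (for example, by hitting a spurious EEXIST on a file that is about to be removed).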
Attachments