whether to unlink the existing state.tmp file in SaveSlotToPath

From: Sergei Kornilov
Subject: whether to unlink the existing state.tmp file in SaveSlotToPath
Date:
Msg-id: 3559061693910326@qy4q4a6esb2lebnz.sas.yp-c.yandex.net
List: pgsql-hackers
Hello
I ran into a very "lucky" logical decoding error on the publisher:

2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] LOG:  starting logical decoding for slot "pubsub"
2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] DETAIL:  Streaming transactions committing after 0/16AD5F8, reading WAL from 0/16AD5F8.
2023-09-05 09:58:38.955 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] STATEMENT:  START_REPLICATION SLOT "pubsub" LOGICAL 0/16AD5F8 (proto_version '4', origin 'any', publication_names '"testpub"')
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] LOG:  logical decoding found consistent point at 0/16AD5F8
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] DETAIL:  There are no running transactions.
2023-09-05 09:58:38.956 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] STATEMENT:  START_REPLICATION SLOT "pubsub" LOGICAL 0/16AD5F8 (proto_version '4', origin 'any', publication_names '"testpub"')
2023-09-05 09:58:39.187 UTC 28316 melkij@postgres from [local] [vxid:3/0 txid:0] [START_REPLICATION] ERROR:  could not create file "pg_replslot/pubsub/state.tmp": File exists
 

As I found out, the disk holding the database ran out of space, but by luck PostgreSQL did not go into crash
recovery. Doubly lucky, the logical walsender was able to create state.tmp but could not write its contents, and got
"ERROR: could not write to file "pg_replslot/pubsub/state.tmp": No space left on device". The empty state.tmp remained
on disk. After the free-space problem was solved, the publication remained inoperative. To fix it, one needs to
restart the database (RestoreSlotFromDisk always deletes state.tmp) or delete state.tmp manually.
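
The failure mode can be shown outside PostgreSQL with a minimal sketch. It assumes the temp file is created with O_CREAT | O_EXCL, which the "File exists" error suggests but which should be checked against slot.c. The function names and the file path are made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create a file exclusively (assumption: this matches how state.tmp is
 * created). Returns 0 on success, or the errno value on failure.
 */
static int
create_exclusive(const char *tmppath)
{
    int fd = open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/*
 * Replay the reported sequence: the first save creates the file and then
 * "fails" before writing (as with ENOSPC), leaving an empty file behind.
 * Returns the result of the second save attempt.
 */
static int
second_attempt_errno(const char *tmppath)
{
    int first;
    int second;

    unlink(tmppath);                     /* start from a clean state */
    first = create_exclusive(tmppath);   /* file created, write never happens */
    assert(first == 0);
    second = create_exclusive(tmppath);  /* retry after disk space was freed */
    unlink(tmppath);                     /* clean up the demo file */
    return second;
}
```

Calling second_attempt_errno("demo_state.tmp") is expected to return EEXIST: the leftover empty file blocks every later exclusive create until something removes it.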
 

Maybe SaveSlotToPath (src/backend/replication/slot.c) should also delete state.tmp if it already exists? All
operations are performed under an LWLock, so there should be no parallel access.
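
A rough standalone sketch of that idea follows. It uses plain POSIX calls, not the actual PostgreSQL code or a patch; open_fresh_tmp is a hypothetical helper name, and the sketch relies on the caller already holding whatever lock serializes writers, as described above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove any stale temp file left by an earlier failed
 * save, then create it exclusively. Returns an open file descriptor, or -1
 * with errno set. Only safe if no other process can be writing the same
 * path concurrently.
 */
static int
open_fresh_tmp(const char *tmppath)
{
    assert(tmppath != NULL);

    /* A missing file is the normal case, so ENOENT is not an error. */
    if (unlink(tmppath) < 0 && errno != ENOENT)
        return -1;

    return open(tmppath, O_CREAT | O_EXCL | O_WRONLY, 0600);
}
```

With this ordering, a second call on the same path succeeds even when the first call left a file behind, so a failed write no longer blocks later saves.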
 

PS: I reproduced the error on HEAD by adding a pg_usleep call to SaveSlotToPath before the write to the file. During
that pause, I filled up the virtual disk.
 

regards, Sergei


