Re: logical replication: could not create file "state.tmp": File exists

From Dmitry Vasiliev
Subject Re: logical replication: could not create file "state.tmp": File exists
Msg-id CANCe5h0su4Jn7giDhWs0He=QSSXGEAWzijApet5K2PMSO9j5dQ@mail.gmail.com
In reply to Re: logical replication: could not create file "state.tmp": File exists (Grigory Smolkin <g.smolkin@postgrespro.ru>)
List pgsql-bugs
Here's what happened from the publisher and subscriber point of view:

publisher:  (some query) ERROR: could not write to tuplestore temporary file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply worker for subscription "<name>_sub" has started
publisher:  LOG: received replication command: IDENTIFY_SYSTEM
publisher:  LOG: received replication command: START_REPLICATION SLOT "<name>_sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names '"<name>_pub"')
publisher:  ERROR: could not create file "pg_replslot/<name>_sub/state.tmp": File exists

I think some publisher logs may be missing because of the out-of-space condition.

On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <g.smolkin@postgrespro.ru> wrote:

On 12/2/19 7:12 PM, Andres Freund wrote:
> Hi,
>
> On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
>> One of my colleagues encountered an out of space condition, which broke his
>> logical replication setup.
>> It manifested itself with the following errors:
>>
>> ERROR:  could not receive data from WAL stream: ERROR:  could not create
>> file "pg_replslot/some_sub/state.tmp": File exists
> Hm. What was the log output leading to this state? Some cases of this
> would end up in a PANIC, which'd remove the .tmp file during
> recovery. But there are some where we won't - it seems the right fix for
> this would be to unlink the tmp file in that case?
>
>
>> I dug a bit into this problem, and it turned out that in
>> SaveSlotToPath() the temp file for the replication slot is opened with the
>> 'O_CREAT | O_EXCL' flags, which makes this routine not very reentrant.
>>
>> Since an exclusive lock is taken before temp file creation, I think it
>> should be safe to replace O_EXCL with O_TRUNC.
> I'm very doubtful about this. I think it's a good safety measure to
> ensure that there's no previous state file that we're somehow
> overwriting.
Is that possible with an exclusive lock taken before that?
>
>
>> Script to reproduce and patch are attached.
> Well:
>
>> # Imitate out_of_space/write_operation_error
>> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> Doesn't really replicate how we got into this state...

But it replicates exactly the same state we would get if write() to the
temp file had failed with out of space.
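
To make the failure mode concrete, here is a minimal standalone sketch. It is
not the PostgreSQL code: save_state_tmp() and its two flag parameters are
hypothetical names invented for illustration, and the write failure is
simulated rather than caused by a full disk. It only shows that a temp file
left behind by a failed write makes the next 'O_CREAT | O_EXCL' open fail with
EEXIST, and that unlinking the temp file on the error path avoids this while
keeping O_EXCL.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the temp-file step of a state save.
 *
 * simulate_write_failure: nonzero means behave as if write() failed with
 *                         ENOSPC after the temp file was created.
 * cleanup_on_failure:     nonzero means unlink the temp file on error.
 *
 * Returns 0 on success, or an errno value on failure.
 */
static int
save_state_tmp(const char *tmp_path, int simulate_write_failure,
               int cleanup_on_failure)
{
    const char payload[] = "slot state";
    int        err = 0;
    int        fd;

    assert(tmp_path != NULL);

    /* With O_EXCL, open() fails with EEXIST if the file already exists. */
    fd = open(tmp_path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return errno;

    if (simulate_write_failure)
        err = ENOSPC;
    else
    {
        errno = 0;
        if (write(fd, payload, sizeof(payload)) != (ssize_t) sizeof(payload))
            err = errno ? errno : ENOSPC;   /* short write: assume no space */
    }

    close(fd);

    if (err != 0)
    {
        /* Without this unlink, the stale temp file blocks every retry. */
        if (cleanup_on_failure)
            unlink(tmp_path);
        return err;
    }

    /*
     * A real implementation would fsync and rename() the temp file into
     * place here. The sketch just removes it so the temp name is free again.
     */
    unlink(tmp_path);
    return 0;
}
```

Calling save_state_tmp(path, 1, 0) and then save_state_tmp(path, 0, 0) on the
same path returns ENOSPC followed by EEXIST, which matches the sequence in the
logs above. With cleanup_on_failure set on the first call, the retry succeeds.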


>
> Greetings,
>
> Andres Freund
>
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


