Re: logical replication: could not create file "state.tmp": Fileexists
От | Grigory Smolkin |
---|---|
Тема | Re: logical replication: could not create file "state.tmp": Fileexists |
Дата | |
Msg-id | 14b3454f-2d68-c637-68e4-2b42ff976168@postgrespro.ru обсуждение исходный текст |
Ответ на | Re: logical replication: could not create file "state.tmp": Fileexists (Andres Freund <andres@anarazel.de>) |
Список | pgsql-bugs |
On 12/2/19 7:12 PM, Andres Freund wrote: > Hi, > > On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote: >> One of my colleagues encountered an out of space condition, which broke his >> logical replication setup. >> It`s manifested with the following errors: >> >> ERROR: could not receive data from WAL stream: ERROR: could not create >> file "pg_replslot/some_sub/state.tmp": File exists > Hm. What was the log output leading to this state? Some cases of this > would end up in a PANIC, which'd remove the .tmp file during > recovery. But there's some where we won't - it seems the right fix for > this would be to unlink the tmp file in that case? > > >> I`ve digged a bit into this problem, and it`s turned out that in >> SaveSlotToPath() temp file for replication slot is opened with 'O_CREAT | >> O_EXCL' flags, which makes this routine as not very reentrant. >> >> Since an exclusive lock is taken before temp file creation, I think it >> should be safe to replace O_EXCL with O_TRUNC. > I'm very doubtful about this. I think it's a good safety measure to > ensure that there's no previous state file that we're somehow > overwriting. > > >> Script to reproduce and patch are attached. > Well: > >> # Imitate out_of_space/write_operation_error >> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp > Doesn't really replicate how we got into this state... I`ve managed to reproduce the issue using the attached script: 2019-12-02 20:32:12.547 MSK [4180] ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device 2019-12-02 20:32:12.548 MSK [4178] ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4180.tmp": No space left on device 2019-12-02 20:32:12.557 MSK [4645] LOG: logical replication apply worker for subscription "mysub" has started 2019-12-02 20:32:12.562 MSK [4091] LOG: background worker "logical replication worker" (PID 4178) exited with exit code 1 2019-12-02 20:32:12.569 MSK [4648] LOG: starting logical decoding for slot "mysub" 2019-12-02 20:32:12.569 MSK [4648] DETAIL: Streaming transactions committing after 0/160E9A28, reading WAL from 0/1566A2D8. 2019-12-02 20:32:12.569 MSK [4648] LOG: logical decoding found consistent point at 0/1566A2D8 2019-12-02 20:32:12.569 MSK [4648] DETAIL: Logical decoding will begin using saved snapshot. 2019-12-02 20:32:12.672 MSK [4648] ERROR: could not write to file "pg_replslot/mysub/state.tmp": No space left on device 2019-12-02 20:32:12.672 MSK [4645] ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_replslot/mysub/state.tmp": No space left on device 2019-12-02 20:32:12.672 MSK [4091] LOG: background worker "logical replication worker" (PID 4645) exited with exit code 1 2019-12-02 20:32:17.684 MSK [4653] LOG: logical replication apply worker for subscription "mysub" has started 2019-12-02 20:32:17.693 MSK [4654] LOG: starting logical decoding for slot "mysub" 2019-12-02 20:32:17.693 MSK [4654] DETAIL: Streaming transactions committing after 0/160E9A28, reading WAL from 0/1566A2D8. 2019-12-02 20:32:17.693 MSK [4654] LOG: logical decoding found consistent point at 0/1566A2D8 2019-12-02 20:32:17.693 MSK [4654] DETAIL: Logical decoding will begin using saved snapshot. 2019-12-02 20:32:17.847 MSK [4654] ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device 2019-12-02 20:32:17.847 MSK [4653] ERROR: could not receive data from WAL stream: ERROR: could not write to file "pg_logical/snapshots/0-160E9A28.snap.4654.tmp": No space left on device 2019-12-02 20:32:17.848 MSK [4091] LOG: background worker "logical replication worker" (PID 4653) exited with exit code 1 2019-12-02 20:32:22.859 MSK [4658] LOG: logical replication apply worker for subscription "mysub" has started 2019-12-02 20:32:22.868 MSK [4659] LOG: starting logical decoding for slot "mysub" 2019-12-02 20:32:22.868 MSK [4659] DETAIL: Streaming transactions committing after 0/160E9A28, reading WAL from 0/1566A2D8. 2019-12-02 20:32:22.868 MSK [4659] LOG: logical decoding found consistent point at 0/1566A2D8 2019-12-02 20:32:22.868 MSK [4659] DETAIL: Logical decoding will begin using saved snapshot. 2019-12-02 20:32:22.868 MSK [4659] ERROR: could not create file "pg_replslot/mysub/state.tmp": File exists 2019-12-02 20:32:22.869 MSK [4658] ERROR: could not receive data from WAL stream: ERROR: could not create file "pg_replslot/mysub/state.tmp": File exists > > Greetings, > > Andres Freund -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Dmitry VasilievДата:
Сообщение: Re: logical replication: could not create file "state.tmp": File exists
Следующее
От: Thomas MunroДата:
Сообщение: Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost whenextracting epoch