Re: BUG #16039: PANIC when activating replication slots in Postgres12.0 64bit under Windows
From | Andres Freund |
---|---|
Subject | Re: BUG #16039: PANIC when activating replication slots in Postgres12.0 64bit under Windows |
Date | |
Msg-id | 20191004200605.yqcmn75otebwcvyj@alap3.anarazel.de |
In reply to | BUG #16039: PANIC when activating replication slots in Postgres 12.0 64bit under Windows (PG Bug reporting form <noreply@postgresql.org>) |
Responses |
Re: BUG #16039: PANIC when activating replication slots in Postgres12.0 64bit under Windows
(Andres Freund <andres@anarazel.de>)
Re: BUG #16039: PANIC when activating replication slots in Postgres12.0 64bit under Windows (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-bugs |
Hi,

Thanks for the report!

On 2019-10-04 19:28:28 +0000, PG Bug reporting form wrote:
> We just moved our production cluster from pg 11.5 to pg 12.0 under windows
> using pg_dump/initdb/pg_restore.
>
> When we activated the replication slots by
>
> SELECT * FROM pg_create_physical_replication_slot('sam_repli_3');
>
> and tried restarting the server, we got a PANIC in error log:
>
> CPS PRD 2019-10-04 19:10:07 CEST 00000 1:> LOG: database system was shut down at 2019-10-04 19:10:02 CEST
> CPS PRD 2019-10-04 19:10:07 CEST XX000 2:> PANIC: could not fsync file "pg_replslot/sam_repli_3/state": Bad file descriptor
> CPS PRD 2019-10-04 19:10:07 CEST 00000 6:> LOG: startup process (PID 4028) was terminated by exception 0xC0000409
> CPS PRD 2019-10-04 19:10:07 CEST 00000 7:> HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
> CPS PRD 2019-10-04 19:10:07 CEST 00000 8:> LOG: aborting startup due to startup process failure
> CPS PRD 2019-10-04 19:10:07 CEST 00000 9:> LOG: database system is shut down
>
> We use the EDB distribution from the website under Windows Server 2019
> (September 2019 patch level).
>
> select version ();
>                           version
> ------------------------------------------------------------
>  PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
> (1 Zeile)
>
> This seems to me like a fatal bug which makes the streaming replication
> unusable under Windows x64 /pg12.
>
> The same configuration worked flawlessly under pg 11.x until pg 11.5
>
> By searching on google we encountered a similar error from 2015 under pg
> 9.4.1 reported under BUG #13287:
>
> https://www.postgresql.org/message-id/flat/20150514105514.2691.67352%40wrigleys.postgresql.org

Uh, Michael? You just reintroduced this bug in

commit 82a5649fb9dbef12d04cd24799be6bf298d889a6
Author: Michael Paquier <michael@paquier.xyz>
Date:   2019-03-09 08:50:55 +0900

    Tighten use of OpenTransientFile and CloseTransientFile

    This fixes two sets of issues related to the use of transient files in
    the backend:
    1) OpenTransientFile() has been used in some code paths with read-write
    flags while read-only is sufficient, so switch those calls to be
    read-only where necessary. These have been reported by Joe Conway.

You pretty much entirely reverted:

commit dfbaed459754e71e01bb0cc90a12802bba3f9786
Author: Andres Freund <andres@anarazel.de>
Date:   2015-04-28 00:12:38 +0200

    Use a fd opened for read/write when syncing slots during startup.

    Some operating systems, including the reporter's windows, return EBADFD
    or similar when fsync() is invoked on a O_RDONLY file descriptor.
    Unfortunately RestoreSlotFromDisk() does exactly that; which causes
    failures after restarts in at least some scenarios.

    If you hit the bug the error message will be something like
    ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor

    Simply use O_RDWR instead of O_RDONLY when opening the relevant file
    descriptor to fix the bug. Unfortunately I have no way of verifying the
    fix, but we've seen similar problems in the past.

    This bug goes back to 9.4 where slots were introduced. Backpatch
    accordingly.

    Reported-By: Patrice Drolet
    Bug: #13143:
    Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org

I realize I perhaps should have added a comment explaining this, but
this is far from the only location that knows we have to open fds r/w
to be able to fsync them. What were you even trying to fix by changing
this?

Seems also pretty clear that we need a few animals running with fsync
enabled. Not sure how we best can write test infrastructure to make it
easy to set that for all tests. Guess I best start a thread about it on
-hackers.

Greetings,

Andres Freund
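
The rule stated above (open the descriptor read/write if it is going to be fsync'ed) can be sketched outside PostgreSQL roughly as follows. This is a minimal POSIX C sketch under stated assumptions, not the code from either commit: `fsync_existing_file` is a hypothetical helper name, and it uses plain `open()`/`close()` rather than PostgreSQL's `OpenTransientFile()`/`CloseTransientFile()`.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: flush an existing file to disk.
 *
 * The descriptor is opened O_RDWR rather than O_RDONLY because some
 * platforms reject fsync() on a read-only descriptor with a
 * "Bad file descriptor" style error.
 *
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
fsync_existing_file(const char *path)
{
    int fd = open(path, O_RDWR);
    int saved_errno;

    if (fd < 0)
        return -1;

    if (fsync(fd) != 0)
    {
        /* Preserve the fsync() error across the cleanup close(). */
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    return close(fd);
}
```

Opening with `O_RDONLY` in this helper would still work on many platforms, which is presumably why such a regression is easy to miss when tests run with fsync disabled.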