Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot

From: Alvaro Herrera
Subject: Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot
Msg-id: 20150427144447.GT4369@alvh.no-ip.org
In response to: Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot  (Andres Freund <andres@anarazel.de>)
Responses: Re: BUG #13143: Cannot stop and restart a streaming server with a replication slot  (Andres Freund <andres@anarazel.de>)
List: pgsql-bugs

Andres Freund wrote:

> On 2015-04-24 10:10:06 +0000, pdrolet@infodata.ca wrote:

> > 2015-04-24 04:47:12 EDT LOG:  database system was shut down at
> > 2015-04-24 04:44:37 EDT
> > 2015-04-24 04:47:12 EDT PANIC:  could not fsync file
> > "pg_replslot/node_win2012sec/state": Bad file descriptor
> > 2015-04-24 04:47:12 EDT LOG:  startup process (PID 23180) exited with
> > exit code 3
> > 2015-04-24 04:47:12 EDT LOG:  aborting startup due to startup process
> > failure
> >
> > To restart the server, I have to manually delete the folder in pg_replslot.
> > But then I need to rebuild the slave. Not very practical for a
> > multi-gigabyte database.
>
> Obviously that's not how it's supposed to be. I don't have access to a
> Windows system, much less a French one, unfortunately.

I think this is failing in the fsync_fname() call in slot.c line 1045
(REL9_4_STABLE).  Notice it's in a critical section (hence PANIC) and
isdir=false.  This happens just after the rename() from tmppath to path;
maybe the file is "busy" and could not be renamed?  Anyway the rename
itself didn't fail, and the file (under the new name) could be opened by
fd.c, otherwise the error would say "could not open" instead of "could
not fsync".
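
To make the sequence concrete, here is a minimal POSIX sketch of the
write-to-temp, rename, reopen, fsync pattern discussed above.  It is not the
slot.c code: save_state_atomically() and the SaveResult values are
hypothetical names made up for illustration, and it uses plain open()/fsync()
rather than PostgreSQL's fd.c wrappers.  The point is only that each step
reports its own failure, so an fsync error after a successful rename and
reopen can be told apart from a failed open.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Which step of the save sequence failed (hypothetical names). */
typedef enum
{
    SAVE_OK = 0,
    SAVE_ERR_OPEN_TMP,
    SAVE_ERR_WRITE,
    SAVE_ERR_FSYNC_TMP,
    SAVE_ERR_RENAME,
    SAVE_ERR_REOPEN,
    SAVE_ERR_FSYNC_FINAL
} SaveResult;

/*
 * Write data to "<path>.tmp", flush it, rename it over path, then reopen
 * the file under its final name and flush it again.
 */
SaveResult
save_state_atomically(const char *path, const void *data, size_t len)
{
    char    tmppath[1024];
    int     n;
    int     fd;

    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return SAVE_ERR_OPEN_TMP;       /* path too long for the buffer */

    fd = open(tmppath, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return SAVE_ERR_OPEN_TMP;

    if (write(fd, data, len) != (ssize_t) len)
    {
        close(fd);
        return SAVE_ERR_WRITE;
    }

    if (fsync(fd) != 0)
    {
        close(fd);
        return SAVE_ERR_FSYNC_TMP;
    }
    close(fd);

    /* Atomically replace the old state file with the new one. */
    if (rename(tmppath, path) != 0)
        return SAVE_ERR_RENAME;

    /* Reopen under the final name; a failure here is an "open" error. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return SAVE_ERR_REOPEN;

    /* A failure here is an "fsync" error on a file that did open. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return SAVE_ERR_FSYNC_FINAL;
    }
    close(fd);

    return SAVE_OK;
}
```

In terms of this sketch, the reported PANIC would correspond to
SAVE_ERR_FSYNC_FINAL rather than SAVE_ERR_RENAME or SAVE_ERR_REOPEN.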

There are many other callers of rename() and none of them seem to have
special cases for WIN32 specifically; they all assume it works.  (Some
of them are in turn special cases related to link/unlink).

The vast majority of callers of fsync_fname() are related to logical
decoding, so it seems fair game to assume that that code is missing a
trick or two.

> 2) Check that it's unrelated to any anti-virus software running?

It seems likely that anti-virus software, or something like it, is involved.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
