Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown |
Date | |
Msg-id | 52F3425F.5050101@vmware.com |
In reply to | BUG #9118: WAL Sender does not disconnect replication clients during shutdown (jhedden@apple.com) |
Responses | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-bugs |
On 02/06/2014 05:08 AM, jhedden@apple.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      9118
> Logged by:          Joel Hedden
> Email address:      jhedden@apple.com
> PostgreSQL version: 9.3.2
> Operating system:   Mac OS X 10.9.1
> Description:
>
> I connect a pg_receivexlog instance and have "hot_standby" archiving
> enabled, with "archive_command" defined correctly. When the WAL Sender
> process receives a SIGUSR2 from the postmaster (or me), it fails to shut
> down and pg_receivexlog remains connected. Upon inspection, it looks like
> the test for "sentPtr == MyWalSnd->flush" is always false at
> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should be
> shutting down. Replication and archiving seem to be working otherwise.
> Killing pg_receivexlog allows for the WAL Sender to terminate.

Hmm. Before exiting, walsender waits until the client has flushed all the
WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to
the server, so the server waits forever.

The first question is, why does pg_receivexlog not send its "flush" pointer
back to the server? It *does* fsync the files to disk. However, it currently
fsyncs only when closing a full segment. When shutting down, the last segment
would not be full, so to fix this issue it should be taught to fsync partial
segments as well.

The second question is, does it make sense for the server to wait for all
replication clients to flush the WAL? It seems reasonable for a standby
server: after shutting down, the standby has all the WAL safely on disk,
although if a standby is not connected at the moment, all bets are off. It
also seems reasonable for pg_receivexlog, except for the fact that
pg_receivexlog never sends a flush pointer back.

> This didn't affect 9.2.4 for me.

Yeah, the waiting was introduced by commit
bee4a4d361c054c531c3a27024f9ff3efef3635b, and wasn't included in 9.2.4. It
is in 9.2.5, however.
Quoting the original email Fujii posted about this
(http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_VFwRe3oJWnTYsyBDUbRYa6rr0DQ@mail.gmail.com):

> But there is one problem: though walsender tries to send all the outstanding
> WAL records, it doesn't wait for them to be replicated to the standby. IOW,
> walsender closes the replication connection as soon as it sends WAL records.
> Then, before receiving all the WAL records, walreceiver can detect
> the closure of connection and exit. We cannot guarantee that there is no
> missing WAL in the standby after clean shutdown of the master. In this case,
> backup from new master is required when restarting the stopped master as
> new standby. I have experienced this case several times, especially when
> enabling WAL archiving.

Fujii-san, how can walreceiver detect the closure of the connection before
reading all the buffered WAL from the TCP connection? What kind of log
messages do you get when it happens?

I tried to reproduce that with commit
bee4a4d361c054c531c3a27024f9ff3efef3635b reverted, but couldn't. However,
this was with master and standby running on the same laptop, and this is
essentially a race condition, so it's possible that I just didn't get the
timing right to make it happen.

- Heikki