Re: Streaming replication and a disk full in primary
From | Heikki Linnakangas |
---|---|
Subject | Re: Streaming replication and a disk full in primary |
Date | |
Msg-id | 4B58605F.8090908@enterprisedb.com |
In reply to | Streaming replication and a disk full in primary (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: Streaming replication and a disk full in primary
(Fujii Masao <masao.fujii@gmail.com>)
|
List | pgsql-hackers |
Fujii Masao wrote:
> If the primary has a connected standby, the WAL files required for
> the standby cannot be deleted. So if it has fallen too far behind
> for some reason, a disk full failure might occur on the primary.
> This is one of the problems that should be fixed for v9.0.
>
> We can cope with that case by carefully monitoring the standby lag.
> In addition to this, I think that we should put an upper limit on
> the number of WAL files held in pg_xlog for the standby (i.e.,
> the maximum delay of the standby) as a safeguard against a disk
> full error.
>
> The attached patch introduces new GUC 'replication_lag_segments'
> which specifies the maximum number of WAL files held in pg_xlog
> to send to the standby. The replication to the standby which
> falls more than the upper limit behind is automatically terminated,
> which would avoid a disk full error on the primary.

Thanks!

I don't think we should do the check in XLogWrite(). There's really no
reason to kill the standby connections before the next checkpoint, when
the old WAL files are recycled. XLogWrite() is in the critical path of
normal operations, too.

There's another important reason for that: If archiving is not working
for some reason, the standby can't obtain the old segments from the
archive either. If we refuse to stream such old segments, and they're
not getting archived, the standby has no way to catch up until archiving
is fixed. Allowing streaming of such old segments is free wrt. disk
space, because we're keeping the files around anyway.

Walreceiver will get an error if it tries to open a segment that's been
deleted or recycled already. The dangerous situation we need to avoid is
when walreceiver holds a file open while bgwriter recycles it.
Walreceiver will merrily continue streaming data from it, even though
it's been overwritten by new data already.

A straightforward fix is to keep a "newest recycled XLogRecPtr" in
shared memory that RemoveOldXlogFiles() updates. Walreceiver checks it
right after read()ing from a file, before sending it to the client, and
throws an error if the data it read() was already recycled.

Or you could do it entirely in walreceiver, by calling fstat() on the
open file instead of checking the variable in shared memory. If the
filename isn't what you expect, indicating that it's been recycled,
throw an error. But that needs an extra fstat() call for every read().

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
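
For illustration, the "newest recycled XLogRecPtr" check described above could be sketched roughly as follows. This is not code from the patch or from PostgreSQL: WalPos is a simplified 64-bit stand-in for XLogRecPtr, and RecycleState, note_recycled() and check_after_read() are hypothetical names. The real variable would live in shared memory and need a lock or atomic access, which the sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a flat 64-bit WAL byte position standing in for XLogRecPtr. */
typedef uint64_t WalPos;

/* Hypothetical shared state. All WAL strictly before newest_recycled
 * belongs to segments that have already been recycled. */
typedef struct
{
    WalPos newest_recycled;
} RecycleState;

typedef enum
{
    SEND_OK,
    SEND_ERR_RECYCLED
} SendResult;

/* Plays the role of the update done by RemoveOldXlogFiles(): record that
 * every segment ending at or before seg_end has been recycled. The value
 * only ever moves forward. */
void
note_recycled(RecycleState *st, WalPos seg_end)
{
    assert(st != NULL);
    if (seg_end > st->newest_recycled)
        st->newest_recycled = seg_end;
}

/* Called by the streaming process right after read() and before sending.
 * read_start is the WAL position of the first byte that was just read.
 * Checking after the read is the point: if recycling happened while the
 * file was open, the buffer may hold new data and must not be sent. */
SendResult
check_after_read(const RecycleState *st, WalPos read_start)
{
    assert(st != NULL);
    if (read_start < st->newest_recycled)
        return SEND_ERR_RECYCLED;
    return SEND_OK;
}
```

With a 16 MB segment recycled, a read that started at byte 100 would be rejected, while a read starting at the 16 MB boundary would still be allowed. The cost per read() is one comparison against shared state, as opposed to the extra system call of the fstat() variant.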