Thread: How can you get "WAL segment has already been removed" when doing synchronous replication ?!

How can you get "WAL segment has already been removed" when doing synchronous replication ?!

From
hubert depesz lubaczewski
Date:
We are seeing a situation like this:
1. 9.2.4 database
2. Master settings:
           name            |    setting
---------------------------+---------------
 fsync                     | on
 synchronize_seqscans      | on
 synchronous_commit        | remote_write
 synchronous_standby_names | *
 wal_sync_method           | open_datasync
(5 rows)

Yet, every now and then we're getting:
FATAL:  requested WAL segment * has already been removed

Assuming no part of the system is issuing "set synchronous_commit
= off", how can we end up in such a situation?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/
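
When tracking how often this error occurs and which segments are affected, it can help to pull the segment name out of the log line and split it into its three 8-hex-digit fields (timeline, log id, segment number). The following is a minimal sketch, not part of the original report: the sample log line and its segment name are hypothetical, since the report above elides the name with "*".

```python
import re

# Matches the walsender error quoted above; a WAL segment file name
# is 24 uppercase hex digits.
PATTERN = re.compile(
    r"requested WAL segment ([0-9A-F]{24}) has already been removed"
)


def parse_removed_segment(log_line):
    """Return (timeline, log_id, seg_no) for a matching line, else None."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    name = match.group(1)
    # Three consecutive 8-digit hex fields: timeline, log id, segment number.
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


# Hypothetical log line; the segment name is made up for illustration.
sample = (
    "FATAL:  requested WAL segment 000000010000000A000000FE "
    "has already been removed"
)
print(parse_removed_segment(sample))  # (1, 10, 254)
```

Comparing the parsed segment number against the segment the master is currently writing gives a rough idea of how far behind the requester was when the error fired.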



On Thu, Jul 11, 2013 at 11:18 PM, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> We are seeing situation like this:
> 1. 9.2.4 database
> 2. Master settings:
>            name            |    setting
> ---------------------------+---------------
>  fsync                     | on
>  synchronize_seqscans      | on
>  synchronous_commit        | remote_write
>  synchronous_standby_names | *
>  wal_sync_method           | open_datasync
> (5 rows)
>
> Yet, every now and then we're getting:
> FATAL:  requested WAL segment * has already been removed
>
> Assuming no part of the system is issuing "set synchronous_commit
> = off", how can we get in such situation?
>
> Best regards,
>
> depesz


Have you tried increasing wal_keep_segments?

---
Regards,
Raghavendra
EnterpriseDB Corporation

Re: How can you get "WAL segment has already been removed" when doing synchronous replication ?!

From
hubert depesz lubaczewski
Date:
On Thu, Jul 11, 2013 at 11:29:24PM +0530, Raghavendra wrote:
> On Thu, Jul 11, 2013 at 11:18 PM, hubert depesz lubaczewski <
> depesz@depesz.com> wrote:
>
> > We are seeing situation like this:
> > 1. 9.2.4 database
> > 2. Master settings:
> >            name            |    setting
> > ---------------------------+---------------
> >  fsync                     | on
> >  synchronize_seqscans      | on
> >  synchronous_commit        | remote_write
> >  synchronous_standby_names | *
> >  wal_sync_method           | open_datasync
> > (5 rows)
> >
> > Yet, every now and then we're getting:
> > FATAL:  requested WAL segment * has already been removed
> >
> > Assuming no part of the system is issuing "set synchronous_commit
> > = off", how can we get in such situation?
> >
> > Best regards,
> >
> > depesz
> >
> >
> Increasing the wal_keep_segments ?

I know that I can increase wal_keep_segments to "solve" it, but
shouldn't this be *impossible* with synchronous replication?
After all, every commit should wait for the slave to be 100% up to date!

Best regards,

depesz
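
For context on what raising wal_keep_segments buys, the arithmetic can be sketched as follows. This is only an illustration: it assumes the default 16 MB WAL segment size, and the setting value and WAL generation rate used in the example are hypothetical, not taken from this thread.

```python
SEGMENT_SIZE_MB = 16  # default WAL segment size; assumes a default build


def retained_wal_mb(wal_keep_segments):
    """Minimum amount of past WAL (in MB) the master keeps in pg_xlog."""
    return wal_keep_segments * SEGMENT_SIZE_MB


def seconds_of_slack(wal_keep_segments, wal_mb_per_second):
    """Roughly how long a standby may stall before it needs a removed segment."""
    return retained_wal_mb(wal_keep_segments) / wal_mb_per_second


# Hypothetical numbers: 128 kept segments, master writing 4 MB of WAL per second.
print(retained_wal_mb(128))        # 2048
print(seconds_of_slack(128, 4.0))  # 512.0
```

This only widens the window; it does not explain why a synchronous standby would fall outside it in the first place, which is the question being asked.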



On Thu, Jul 11, 2013 at 11:31 PM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
> On Thu, Jul 11, 2013 at 11:29:24PM +0530, Raghavendra wrote:
>> On Thu, Jul 11, 2013 at 11:18 PM, hubert depesz lubaczewski <
>> depesz@depesz.com> wrote:
>>
>> > We are seeing situation like this:
>> > 1. 9.2.4 database
>> > 2. Master settings:
>> >            name            |    setting
>> > ---------------------------+---------------
>> >  fsync                     | on
>> >  synchronize_seqscans      | on
>> >  synchronous_commit        | remote_write
>> >  synchronous_standby_names | *
>> >  wal_sync_method           | open_datasync
>> > (5 rows)
>> >
>> > Yet, every now and then we're getting:
>> > FATAL:  requested WAL segment * has already been removed
>> >
>> > Assuming no part of the system is issuing "set synchronous_commit
>> > = off", how can we get in such situation?
>> >
>> > Best regards,
>> >
>> > depesz
>> >
>> >
>> Increasing the wal_keep_segments ?
>
> I know that I can increase wal_keep_segments to "solve" it, but
> shouldn't it be *impossible* to happen with synchronous replication?
> After all - all commits should wait for slave to be 100% up to date!
>

Could xlog recycling have caused this, with a segment being recycled
before it was archived or shipped? I remember something of that sort.
Check this discussion:

http://www.postgresql.org/message-id/51779B3B.1020003@lab.ntt.co.jp

Is this logged on the master or a standby?

--
Amit Langote


Re: How can you get "WAL segment has already been removed" when doing synchronous replication ?!

From
hubert depesz lubaczewski
Date:
On Fri, Jul 12, 2013 at 12:30:22PM +0530, Amit Langote wrote:
> >> Increasing the wal_keep_segments ?
> > I know that I can increase wal_keep_segments to "solve" it, but
> > shouldn't it be *impossible* to happen with synchronous replication?
> > After all - all commits should wait for slave to be 100% up to date!
> Is it possible that xlog recycling might have caused this wherein the
> xlog segment which is yet to be archived/shipped is recycled? I

As far as I know, PostgreSQL will not recycle a WAL segment before it
has been archived. Otherwise we wouldn't be able to have archives.

> remember something of that sort. Check this discussion:
> http://www.postgresql.org/message-id/51779B3B.1020003@lab.ntt.co.jp
> Is this logged on the master or a standby?

master.

Best regards,

depesz



On Thu, Jul 11, 2013 at 11:01 AM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
> On Thu, Jul 11, 2013 at 11:29:24PM +0530, Raghavendra wrote:
>> On Thu, Jul 11, 2013 at 11:18 PM, hubert depesz lubaczewski <
>> depesz@depesz.com> wrote:
>>
>> >
>> > Yet, every now and then we're getting:
>> > FATAL:  requested WAL segment * has already been removed
>> >
>> > Assuming no part of the system is issuing "set synchronous_commit
>> > = off", how can we get in such situation?
>> >
>> > Best regards,
>> >
>> > depesz
>> >
>> >
>> Increasing the wal_keep_segments ?
>
> I know that I can increase wal_keep_segments to "solve" it, but
> shouldn't it be *impossible* to happen with synchronous replication?

If a single transaction spans both log switch boundaries and
checkpoint boundaries (at least two of the latter, I think), it is
possible for a file to be recycled before the commit, and hence before
any attempt to sync to the standby has occurred.

> After all - all commits should wait for slave to be 100% up to date!

But if the file isn't there on the sending end, no amount of waiting can help.

It looks like what is needed is to invoke the SyncRepWaitForLSN code
just before log file recycle, as well as upon transaction commit.
I'm not sure why that isn't already done indirectly.  Doesn't the
checkpointer insert a WAL record upon completion of a checkpoint
indicating that completion, before any recycling is attempted?  Surely
the LSN of that record is higher than that in any file becoming
eligible for recycling.  But I guess that that record is not a commit
record, so does not trigger the sync rep.

Cheers,

Jeff
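
The scenario Jeff describes can be illustrated with a toy model. This is a sketch under stated assumptions, not PostgreSQL's actual code: one long transaction writes WAL segments without committing (so no synchronous-replication wait ever happens), a checkpoint completes every few segments, and at each checkpoint the master may drop segments older than the previous checkpoint's segment, less whatever wal_keep_segments protects. The master never consults the standby's position. All function names and numbers are hypothetical.

```python
def oldest_retained_segment(segments_written, checkpoint_every, wal_keep_segments):
    """Toy model: oldest segment left on the master after one long,
    uncommitted transaction has written `segments_written` segments."""
    oldest = 0
    prev_checkpoint_seg = None
    for seg in range(segments_written):
        # A checkpoint completes after every `checkpoint_every` segments.
        if (seg + 1) % checkpoint_every == 0:
            if prev_checkpoint_seg is not None:
                # Segments before the previous checkpoint are removable,
                # except those protected by wal_keep_segments.
                cutoff = min(prev_checkpoint_seg, seg - wal_keep_segments)
                oldest = max(oldest, cutoff)
            prev_checkpoint_seg = seg
    return oldest


def standby_request_fails(standby_segment, oldest_segment):
    """True if the standby asks for a segment the master already removed."""
    return standby_segment < oldest_segment


# Hypothetical run: 12 segments written, checkpoint every 3 segments,
# standby still fetching segment 6 while the transaction is open.
no_keep = oldest_retained_segment(12, 3, wal_keep_segments=0)
with_keep = oldest_retained_segment(12, 3, wal_keep_segments=8)
print(no_keep, standby_request_fails(6, no_keep))      # 8 True
print(with_keep, standby_request_fails(6, with_keep))  # 3 False
```

In this model the failure needs the transaction to span at least two checkpoints, which matches Jeff's guess, and the commit-time wait cannot help because removal happens before any commit record is written.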