Re: Bug in walreceiver

From: Fujii Masao
Subject: Re: Bug in walreceiver
Message-ID: AANLkTi=dKkPCKZEB29VG0qc8RdUupav4j_avq16i=U4a@mail.gmail.com
In reply to: Re: Bug in walreceiver  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses: Re: Bug in walreceiver  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-hackers

On Thu, Jan 13, 2011 at 5:59 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 13.01.2011 10:28, Fujii Masao wrote:
>>
>> When the master shuts down or crashes, there seems to be
>> the case where walreceiver exits without flushing WAL which
>> has already been written. This might lead startup process to
>> replay un-flushed WAL and break a Write-Ahead-Logging rule.
>
> Hmm, that can happen at a crash even with no replication involved. If you
> "kill -9 postmaster", and some WAL had been written but not fsync'd, on
> crash recovery we will happily recover the unsynced WAL.

Right. If the postmaster restarts immediately after kill -9, WAL which has
not reached the disk might be replayed. If the server then crashes while
the minimum recovery point points into such unsynced WAL, the database
would get corrupted.

As you say, that is not just about replication. But it is more likely to
happen on the standby, because unsynced WAL appears while recovery is in
progress. I think this is one of the reasons why walreceiver doesn't let
the startup process know that new WAL has arrived before flushing it.

So I believe that the patch is somewhat worth applying.

BTW, another good point of the patch is that we can track the last WAL
receive location correctly. Since WalRcv->receivedUpto is updated after
a WAL flush, without the patch the location of WAL received just before
walreceiver exits might not be saved in WalRcv->receivedUpto.
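
To illustrate the ordering discussed here, the following is a minimal
Python sketch, not PostgreSQL code. The class WalReceiverSketch, its
methods, and the field names are hypothetical; received_upto only stands
in for WalRcv->receivedUpto. It shows a receiver that advances its
published position after fsync and flushes once more before exiting, so
data written just before exit is both synced and reflected in the position.

```python
import os
import tempfile


class WalReceiverSketch:
    """Toy model: the published position advances only after fsync."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.written_upto = 0   # bytes handed to the OS via write()
        self.received_upto = 0  # bytes flushed; stand-in for receivedUpto

    def write(self, data):
        # write() may leave data in the OS cache, so the published
        # position is deliberately not touched here.
        offset = 0
        while offset < len(data):
            offset += os.write(self.fd, data[offset:])
        self.written_upto += len(data)

    def flush(self):
        # Publish the new position only after fsync has returned.
        if self.received_upto < self.written_upto:
            os.fsync(self.fd)
            self.received_upto = self.written_upto

    def shutdown(self):
        # Flush anything written but not yet synced before exiting.
        try:
            self.flush()
        finally:
            os.close(self.fd)


def demo_flush_on_exit():
    with tempfile.TemporaryDirectory() as tmp:
        rcv = WalReceiverSketch(os.path.join(tmp, "segment"))
        rcv.write(b"record-1")
        before = rcv.received_upto  # written, but not yet flushed
        rcv.shutdown()              # flushes before exiting
        return before, rcv.received_upto
```

In this sketch, skipping the flush in shutdown() would leave
received_upto behind written_upto, which mirrors the lost receive
location described above.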

> We could prevent
> that by fsyncing all WAL before applying it - presumably fsyncing a file
> that has already been flushed is quick. But is it worth the trouble?

No. It looks like overkill, though it would completely prevent the problem.
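
For reference, the "fsync everything before applying it" idea could be
sketched roughly as below. This is a hedged Python illustration, not how
PostgreSQL recovery works; the function name sync_before_replay and the
file layout are assumptions made up for the example. It makes no claim
about how cheap a repeated fsync is in practice.

```python
import os
import tempfile


def sync_before_replay(path):
    """Force a file to stable storage, then return its bytes for replay."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Sync first, so nothing is replayed that could vanish in a crash.
        os.fsync(fd)
        chunks = []
        while True:
            chunk = os.read(fd, 8192)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def demo_sync_before_replay():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment")
        with open(path, "wb") as f:
            f.write(b"wal-bytes")  # closed without an explicit fsync
        return sync_before_replay(path)
```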

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

