Re: Unarchived WALs deleted after crash

From: Fujii Masao
Subject: Re: Unarchived WALs deleted after crash
Message-ID: CAHGQGwEEKbyo7=BHvOHF1QB=XrtdPq8P_SF1+ckSHp3Ei-ESCQ@mail.gmail.com
In reply to: Re: Unarchived WALs deleted after crash  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: Unarchived WALs deleted after crash  (Heikki Linnakangas <hlinnakangas@vmware.com>)
           Re: Unarchived WALs deleted after crash  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Fri, Feb 15, 2013 at 11:31 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 14.02.2013 17:45, Jehan-Guillaume de Rorthais wrote:
>>
>> I am facing an unexpected behavior on a 9.2.2 cluster that I can
>> reproduce on current HEAD.
>>
>> On a cluster with archive enabled but failing, after a crash of
>> postmaster, the checkpoint occurring before leaving the recovery mode
>> deletes any additional WALs, even those waiting to be archived.
>
>> ...
>> Is it expected ?
>
> No, it's a bug. Ouch. It was introduced in 9.2, by commit
> 5286105800c7d5902f98f32e11b209c471c0c69c:

Oh, sorry for my mistake.

>
>> -  /*
>> -   * Normally we don't delete old XLOG files during recovery to
>> -   * avoid accidentally deleting a file that looks stale due to a
>> -   * bug or hardware issue, but in fact contains important data.
>> -   * During streaming recovery, however, we will eventually fill the
>> -   * disk if we never clean up, so we have to. That's not an issue
>> -   * with file-based archive recovery because in that case we
>> -   * restore one XLOG file at a time, on-demand, and with a
>> -   * different filename that can't be confused with regular XLOG
>> -   * files.
>> -   */
>> -   if (WalRcvInProgress() || XLogArchiveCheckDone(xlde->d_name))
>> +   if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>          [ delete the file ]
>
>
> With that commit, we started to keep WAL segments restored from the archive
> in pg_xlog, so we needed to start deleting old segments during archive
> recovery, even when streaming replication was not active. But the above
> change was too broad; we started to delete old segments also during crash
> recovery.
>
> The above should check InArchiveRecovery, i.e. only delete old files when in
> archive recovery, not when in crash recovery. But there's one little
> complication: InArchiveRecovery is currently only valid in the startup
> process, so we'll need to also share it in shared memory, so that the
> checkpointer process can access it.
>
> I propose the attached patch to fix it.
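
The three conditions discussed in the quoted text can be compared side by side. What follows is only a standalone sketch, not the code in xlog.c: the struct and the three function names are hypothetical, and each boolean field merely stands in for the server call or flag named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the real checks consult. */
typedef struct WalCleanupState
{
    bool walrcv_in_progress;    /* stands in for WalRcvInProgress() */
    bool recovery_in_progress;  /* stands in for RecoveryInProgress() */
    bool in_archive_recovery;   /* stands in for InArchiveRecovery */
    bool archive_done;          /* stands in for XLogArchiveCheckDone(segment) */
} WalCleanupState;

/* Condition before the commit: clean up only while streaming. */
bool
may_delete_before_commit(const WalCleanupState *s)
{
    assert(s != NULL);
    return s->walrcv_in_progress || s->archive_done;
}

/* Condition after the commit: any kind of recovery allows deletion. */
bool
may_delete_after_commit(const WalCleanupState *s)
{
    assert(s != NULL);
    return s->recovery_in_progress || s->archive_done;
}

/* Proposed condition: only archive recovery allows deletion. */
bool
may_delete_proposed(const WalCleanupState *s)
{
    assert(s != NULL);
    return s->in_archive_recovery || s->archive_done;
}
```

For the reported case (crash recovery, archiving enabled but failing), recovery_in_progress is true while the other three fields are false, so only the post-commit condition allows the segment to be deleted.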

At least in 9.2, when an archived file is restored into pg_xlog, its xxx.done
archive status file is created. So we don't need to check InArchiveRecovery
when deleting old WAL files; checking whether xxx.done exists is enough.
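
To illustrate the idea of relying on the status file alone, here is a minimal sketch of such a check. It is an assumption-laden stand-in, not the server's XLogArchiveCheckDone(), which does more than this: the function name segment_marked_done, the fixed path buffer, and passing the status directory as a parameter are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns true if <status_dir>/<segname>.done exists
 * as a regular file, i.e. the segment is marked as already archived.
 */
bool
segment_marked_done(const char *status_dir, const char *segname)
{
    char        path[1024];
    struct stat st;
    int         n;

    assert(status_dir != NULL && segname != NULL);

    n = snprintf(path, sizeof(path), "%s/%s.done", status_dir, segname);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;           /* path too long: be safe, keep the segment */

    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

A cleanup loop built on this would remove an old segment only when segment_marked_done() returns true, so a segment still waiting to be archived is kept regardless of which kind of recovery is running.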

Unfortunately, in HEAD the xxx.done file is not created when an archived file
is restored, because the patch that does so is absent there. We need to
implement that first.

Regards,

-- 
Fujii Masao


