Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From: Fujii Masao
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date:
Msg-id: CAHGQGwE3wOCr8Z_G6nHu8a6+ZoZriFUnRr0NPbgkdVZ5EM_CiQ@mail.gmail.com
In reply to: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)  (Michael Paquier <michael.paquier@gmail.com>)
Replies: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
List: pgsql-hackers
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>
>
> On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> The additional process at promotion sounds like a good idea; I'll try to
>> get a patch done tomorrow. This would also result in removing the
>> XLogArchiveForceDone stuff. Either way, now that I have been able to
>> reproduce the problem manually, things can be clearly solved.
>
> Please find attached two patches aimed at fixing this issue and improving the
> situation:
> - 0001 prevents the appearance of those phantom WAL segment files by ensuring
> that when a node is in recovery it removes them regardless of their status in
> archive_status. This patch is the real fix, and should be applied down to
> 9.2.
> - 0002 is a patch implementing Heikki's idea of forcing the status of all the
> segment files present in pg_xlog to .done, marking them for removal. When
> looking at the code, I finally concluded that Fujii-san's point, about always
> marking as .done the segment files that have been fully streamed, actually
> makes more sense, so as not to go backward. This patch is not mandatory for
> back-patching, but it makes the process more robust IMO.

Thanks for the patches!

I found one problem in the 0002 patch. The patch changes recovery so that
it creates a .done file for every WAL file that exists in the pg_xlog directory
at the end of recovery. But WAL files that will have to be archived later
can also exist in pg_xlog at that moment: for example, the latest WAL file,
recycled ones, and fully-written-but-not-yet-archived ones (i.e., possibly
having .ready files). The patch wrongly prevents them from ever being archived.
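To make the hazard concrete, here is a minimal Python model of the archive_status bookkeeping, assuming the usual convention that a segment with a .ready file still awaits archive_command and one with a .done file has been archived. The dictionary representation and the function names `mark_all_done` and `awaiting_archive` are hypothetical illustrations, not PostgreSQL code; the model only shows what happens when every segment present at the end of recovery is marked .done.

```python
from typing import Dict, List, Optional

# Hypothetical model of pg_xlog/archive_status: segment name -> status.
# "ready" = a .ready file exists (archive_command has not succeeded yet),
# "done"  = a .done file exists (archived, safe to recycle),
# None    = no status file yet (e.g. the latest, still-being-written segment).
ArchiveStatus = Dict[str, Optional[str]]


def mark_all_done(status: ArchiveStatus) -> ArchiveStatus:
    """Model of the end-of-recovery pass described for 0002: every segment
    present in pg_xlog gets a .done file, whatever its current status."""
    return {seg: "done" for seg in status}


def awaiting_archive(status: ArchiveStatus) -> List[str]:
    """Segments that still need archiving: flagged .ready, or with no
    status file yet (they would get .ready once completed)."""
    return sorted(seg for seg, st in status.items() if st != "done")


if __name__ == "__main__":
    pg_xlog: ArchiveStatus = {
        "000000010000000000000001": "done",   # already archived
        "000000010000000000000002": "ready",  # fully written, not archived yet
        "000000010000000000000003": None,     # latest segment, still open
    }
    print("before:", awaiting_archive(pg_xlog))
    print("after: ", awaiting_archive(mark_all_done(pg_xlog)))
```

Before the pass, segments 2 and 3 are still owed to the archive; after it, nothing is, so the archiver would never pick them up.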

ISTM that the 0001 patch has a similar problem. Please imagine the following
scenario.

1. There are many unarchived WAL files in pg_xlog because of the continuous
   failure of archive_command, for example, and then the server unfortunately
   crashes because of corruption of the database itself.

2. The DBA restores the backup onto the server and copies all the WAL files
   from the old pg_xlog to the new one. Then he or she prepares for archive
   recovery.

3. The DBA starts the server and archive recovery starts.

4. After all the archived WAL files are replayed, the server tries to replay
   the WAL files in pg_xlog. Since there are many WAL files in pg_xlog,
   more than one restartpoint happens while they are being replayed.

In this case, the patch seems to make the restartpoint recycle even WAL files
that have .ready files and will have to be archived later. Thoughts?
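The scenario above can be sketched as a small Python model of the recycling decision at a restartpoint. It assumes, as a simplification, that a restartpoint considers every segment older than its redo segment for removal and that the copied pg_xlog still carries its .ready files; segment names are reduced to integers, and `recycle_ignoring_status`, `recycle_if_done`, and `lost_segments` are hypothetical names, not PostgreSQL functions. It contrasts removing old segments whatever their archive status with removing only those already marked .done.

```python
from typing import Dict, List, Optional

# Hypothetical model: segment number -> archive status
# ("ready", "done", or None when no status file exists).
Segments = Dict[int, Optional[str]]


def recycle_ignoring_status(segs: Segments, redo_seg: int) -> List[int]:
    """Remove every segment older than the restartpoint's redo segment,
    whatever its archive status (the behaviour the reply is worried about)."""
    return sorted(n for n in segs if n < redo_seg)


def recycle_if_done(segs: Segments, redo_seg: int) -> List[int]:
    """Remove only old segments whose archiving is known to be complete."""
    return sorted(n for n, st in segs.items() if n < redo_seg and st == "done")


def lost_segments(segs: Segments, removed: List[int]) -> List[int]:
    """Removed segments that were never archived."""
    return [n for n in removed if segs[n] != "done"]


if __name__ == "__main__":
    # Step 1 of the scenario: archive_command kept failing, so the copied
    # pg_xlog holds several segments still flagged .ready.
    segs: Segments = {1: "done", 2: "done", 3: "ready", 4: "ready", 5: "ready", 6: None}
    redo = 5  # a restartpoint reached while replaying the pg_xlog contents
    print(lost_segments(segs, recycle_ignoring_status(segs, redo)))
    print(lost_segments(segs, recycle_if_done(segs, redo)))
```

With these inputs the status-blind rule discards segments 3 and 4 before they were ever archived, while the .done-only rule loses nothing.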

Regards,

-- 
Fujii Masao


