Re: Bogus WAL segments archived after promotion

From: Michael Paquier
Subject: Re: Bogus WAL segments archived after promotion
Msg-id: CAB7nPqSDdF0heotQU3gsepgqx+9c+6KjLd3R6aNYH7KKfDd2ig@mail.gmail.com
In reply to: Re: Bogus WAL segments archived after promotion  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: [HACKERS] Bogus WAL segments archived after promotion  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers

On Mon, Apr 13, 2015 at 11:57 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 04/01/2015 07:12 PM, Bruce Momjian wrote:
>>
>> On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote:
>>>
>>> On 12/19/2014 02:55 PM, Heikki Linnakangas wrote:
>>>>
>>>> I'm thinking that we should add a step to promotion, where we scan
>>>> pg_xlog for any segments higher than the timeline switch point, and
>>>> remove them, or mark them with .done so that they are not archived.
>>>> There might be some real WAL that was streamed from the primary, but not
>>>> yet applied, but such WAL is of no interest to that server anyway, after
>>>> it's been promoted. It's a bit disconcerting to zap WAL that's valid,
>>>> even if doesn't belong to the current server's timeline history, because
>>>> as a general rule it's good to avoid destroying evidence that might be
>>>> useful in debugging. There isn't much difference between removing them
>>>> immediately and marking them as .done, though, because they will
>>>> eventually be removed/recycled anyway if they're marked as .done.
>>>
>>>
>>> This is what I came up with. This patch removes the suspect segments
>>> at timeline switch. The alternative of creating .done files for them
>>> would preserve more evidence for debugging, but OTOH it would also
>>> be very confusing to have valid-looking WAL segments in pg_xlog,
>>> with .done files, that in fact contain garbage.
>>>
>>> The patch is a bit longer than it otherwise would be, because I
>>> moved the code to remove a single file from RemoveOldXlogFiles() to
>>> a new function. I think that makes it more readable in any case,
>>> simply because it was so deeply indented in RemoveOldXlogFiles.
>>
>>
>> Where are we on this?
>
>
> I didn't hear any better ideas, so committed this now.

Finally looking at that... The commit log of b2a5545 is a bit
misleading. Segment files that were recycled during archive recovery
are not necessarily removed; they could also be recycled during
promotion on the new timeline, in line with what RemoveOldXlogFiles()
does. Hence I think that the comment on top of
RemoveNonParentXlogFiles() should be updated to reflect that, as in
the attached patch.
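
To make it concrete which segments are affected, here is a minimal
standalone sketch of the file name test. It is not the committed code:
the function names are made up for illustration, and it only relies on
WAL segment names being 24 uppercase hex digits (8 for the timeline,
16 for the position). A file qualifies when it sits on an older
timeline than the switch segment but at a later position; whether it
then gets unlinked or recycled is a separate decision that this sketch
does not model.

```c
#include <stdbool.h>
#include <string.h>

#define XLOG_FNAME_LEN 24       /* 8 hex digits timeline + 16 hex digits position */

/* True if name is exactly 24 uppercase hex digits, like a WAL segment. */
static bool
looks_like_wal_segment(const char *name)
{
    return strlen(name) == XLOG_FNAME_LEN &&
        strspn(name, "0123456789ABCDEF") == XLOG_FNAME_LEN;
}

/*
 * Hypothetical helper: true if name is on an older timeline than
 * switchseg (first 8 characters compare lower) and at a later segment
 * position (remaining 16 characters compare higher). Fixed-width
 * uppercase hex sorts the same way as the numbers it encodes.
 */
static bool
is_stale_after_switch(const char *name, const char *switchseg)
{
    if (!looks_like_wal_segment(name) || !looks_like_wal_segment(switchseg))
        return false;

    return strncmp(name, switchseg, 8) < 0 &&
        strcmp(name + 8, switchseg + 8) > 0;
}
```

For example, with a switch segment of 000000020000000000000005, the
file 000000010000000000000006 qualifies, while
000000010000000000000005 and anything on timeline 2 do not.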

Something minor: perhaps we could refactor xlogarchive.c to have
XLogArchiveCheckDone() and XLogArchiveIsBusy() use the new
XLogArchiveIsReady().
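
Roughly the shape I have in mind, as a standalone sketch rather than
a patch. Everything here is a simplified stand-in with made-up names
(sketch_*, status_file_exists); the real functions in xlogarchive.c do
more than this, and with neither status file present this sketch
simply answers "not done" and "not busy". The only point is that both
callers go through one shared .ready check instead of each building
the path and calling stat() themselves.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: does <status_dir>/<xlog><suffix> exist on disk? */
static bool
status_file_exists(const char *status_dir, const char *xlog,
                   const char *suffix)
{
    char        path[1024];
    struct stat st;
    int         n;

    n = snprintf(path, sizeof(path), "%s/%s%s", status_dir, xlog, suffix);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;           /* path too long: treat as absent */
    return stat(path, &st) == 0;
}

/* Stand-in for the shared check: is the segment marked .ready? */
static bool
sketch_is_ready(const char *status_dir, const char *xlog)
{
    return status_file_exists(status_dir, xlog, ".ready");
}

static bool
sketch_is_done(const char *status_dir, const char *xlog)
{
    return status_file_exists(status_dir, xlog, ".done");
}

/* Simplified "has archiving finished?" check built on the helpers. */
static bool
sketch_check_done(const char *status_dir, const char *xlog)
{
    if (sketch_is_done(status_dir, xlog))
        return true;
    if (sketch_is_ready(status_dir, xlog))
        return false;
    /* Look again: .ready may have been renamed to .done in between. */
    return sketch_is_done(status_dir, xlog);
}

/* Simplified "is archiving still pending?" check built on the helpers. */
static bool
sketch_is_busy(const char *status_dir, const char *xlog)
{
    if (sketch_is_done(status_dir, xlog))
        return false;
    return sketch_is_ready(status_dir, xlog);
}
```

Called as sketch_is_busy("pg_xlog/archive_status", segname), for
instance; the directory is passed in only to keep the sketch
self-contained.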

Regards,
--
Michael
