Thread: Bogus WAL segments archived after promotion

Bogus WAL segments archived after promotion

From
Heikki Linnakangas
Date:
When streaming replication was introduced in 9.0, we started to recycle 
old WAL segments in archive recovery, like we do during normal 
operation. The WAL segments are recycled on the current timeline. There 
is no guarantee that they will be useful if the current timeline changes, 
either because we switch to recovering another timeline after that or 
because the standby is promoted, but that was thought to be harmless.

However, consider what happens after a server is promoted, and WAL 
archiving is enabled. The server's pg_xlog directory will look something 
like this:

> -rw------- 1 heikki heikki 16777216 Dec 19 14:22 000000010000000000000005
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000006
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000007
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000008
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000009
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 00000001000000000000000A
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 00000001000000000000000B
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 00000001000000000000000C
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 00000001000000000000000D
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 00000001000000000000000E
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 00000001000000000000000F
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000010
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000011
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000012
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000013
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000014
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000015
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000016
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000017
> -rw------- 1 heikki heikki 16777216 Dec 19 14:23 000000010000000000000018
> -rw------- 1 heikki heikki 16777216 Dec 19 14:24 000000010000000000000019
> -rw------- 1 heikki heikki 16777216 Dec 19 14:22 00000001000000000000001A
> -rw------- 1 heikki heikki 16777216 Dec 19 14:22 00000001000000000000001B
> -rw------- 1 heikki heikki 16777216 Dec 19 14:22 00000001000000000000001C
> -rw------- 1 heikki heikki 16777216 Dec 19 14:24 000000020000000000000019
> -rw------- 1 heikki heikki 16777216 Dec 19 14:24 00000002000000000000001A
> -rw------- 1 heikki heikki       42 Dec 19 14:24 00000002.history

The files on timeline 1, up to 000000010000000000000019, are valid 
segments, streamed from the primary or restored from the WAL archive. 
The segments 00000001000000000000001A and 00000001000000000000001B are 
recycled segments that haven't been reused yet. Their contents are not 
valid (they contain records from some earlier point in WAL, but it might 
as well be garbage).

The server was promoted within the segment 19, and a new timeline was 
started. Segments 000000020000000000000019 and 00000002000000000000001A 
contain valid WAL on the new timeline.
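To make the file names in the listing easier to read, here is a minimal sketch of how a segment name breaks down into a timeline ID and a segment number. It assumes the 16 MB segment size visible in the listing (16777216 bytes); the function name parse_wal_name is made up for illustration and is not part of PostgreSQL.

```python
# Sketch only: decode a 24-hex-digit WAL segment file name.
# Assumes 16 MB segments, as in the directory listing above.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)  # 256 segments per "log" id


def parse_wal_name(name):
    """Return (timeline_id, segment_number) for a WAL segment file name."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    tli = int(name[0:8], 16)    # first 8 hex digits: timeline ID
    log = int(name[8:16], 16)   # middle 8 hex digits: high part of the position
    seg = int(name[16:24], 16)  # last 8 hex digits: segment within that part
    return tli, log * SEGMENTS_PER_LOG + seg


print(parse_wal_name("000000010000000000000019"))  # (1, 25)
print(parse_wal_name("00000002000000000000001A"))  # (2, 26)
```

So 000000010000000000000019 and 000000020000000000000019 cover the same WAL position, segment 0x19, on timelines 1 and 2 respectively.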

Now, after enough time passes that the bogus 00000001000000000000001A 
and 00000001000000000000001B segments become old enough to be recycled, 
the system will see that there is no .ready or .done file for them, and 
will create .ready files so that they are archived. And they are 
archived. That's bogus, because the files are bogus. Worse, if the 
primary server from which this server was forked off continues running 
and creates the genuine 00000001000000000000001A and 
00000001000000000000001B segments, it can fail to archive them if the 
standby has already archived the bogus segments with the same names.
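The failure mode described above can be modelled roughly as follows. This is a toy model of the behaviour as described in this message, not the server code: the function name and the set of status files are assumptions for illustration, with status files named after the segment plus a .ready or .done suffix.

```python
# Toy model: an old segment with neither a .ready nor a .done status file
# gets a .ready file, which queues it for archiving.
def mark_unarchived_ready(segments, status_files):
    """Return the .ready file names that would be created."""
    created = []
    for seg in segments:
        has_status = (seg + ".ready" in status_files
                      or seg + ".done" in status_files)
        if not has_status:
            created.append(seg + ".ready")
    return created


# Assumed state for illustration: segment 19 already has a status file,
# while the recycled 1A and 1B have none.
status = {"000000010000000000000019.done"}
old = [
    "000000010000000000000019",
    "00000001000000000000001A",
    "00000001000000000000001B",
]
print(mark_unarchived_ready(old, status))
```

In this model the two recycled segments are the ones that get queued, even though their contents were never valid WAL for those positions.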

We must somehow prevent the recycled, but not yet used, segments from 
being archived. One idea is to not create them in the first place, i.e. 
don't recycle old segments during recovery, just delete them and have 
new ones be created on demand. That's simple, but would hurt performance.

I'm thinking that we should add a step to promotion, where we scan 
pg_xlog for any segments higher than the timeline switch point, and 
remove them, or mark them with .done so that they are not archived. 
There might be some real WAL that was streamed from the primary, but not 
yet applied, but such WAL is of no interest to that server anyway, after 
it's been promoted. It's a bit disconcerting to zap WAL that's valid, 
even if it doesn't belong to the current server's timeline history, because 
as a general rule it's good to avoid destroying evidence that might be 
useful in debugging. There isn't much difference between removing them 
immediately and marking them as .done, though, because they will 
eventually be removed/recycled anyway if they're marked as .done.
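As a rough sketch of the scan proposed above, the selection could look like this. It is not the patch itself: the function name is hypothetical, 16 MB segments are assumed, and "timeline lower than the new one" is used as a simplification of "not part of the new timeline's history".

```python
# Sketch: pick segments on older timelines that lie beyond the
# timeline switch point. Assumes 16 MB segments (256 per "log" id).
def suspect_segments(filenames, new_tli, switch_segno):
    """Return names of segments past the switch point on older timelines."""
    suspects = []
    for name in filenames:
        if len(name) != 24:
            continue  # skip .history files and anything else
        try:
            tli = int(name[:8], 16)
            segno = int(name[8:16], 16) * 256 + int(name[16:], 16)
        except ValueError:
            continue  # not a hexadecimal segment name
        if tli < new_tli and segno > switch_segno:
            suspects.append(name)
    return sorted(suspects)


listing = [
    "000000010000000000000018",
    "000000010000000000000019",
    "00000001000000000000001A",
    "00000001000000000000001B",
    "00000001000000000000001C",
    "000000020000000000000019",
    "00000002000000000000001A",
    "00000002.history",
]
# Promotion happened within segment 0x19, starting timeline 2.
for name in suspect_segments(listing, new_tli=2, switch_segno=0x19):
    print(name)
```

Whether the selected files are then removed or given a .done file is the open question discussed above; the selection is the same either way.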

The archival behaviour at promotion is a bit inconsistent and weird 
in any case; even valid, streamed WAL is marked as .done and not 
archived, except for the last partial segment. We're discussing that in 
the other thread (Streaming replication and WAL archive interactions, 
http://www.postgresql.org/message-id/689EB259-44C2-4820-B901-4F6B1C55A1E4@simply.name), 
but it would be good to have a small, back-patchable fix to prevent bogus 
segments from being archived.

- Heikki



Re: Bogus WAL segments archived after promotion

From
Heikki Linnakangas
Date:
On 12/19/2014 02:55 PM, Heikki Linnakangas wrote:
> I'm thinking that we should add a step to promotion, where we scan
> pg_xlog for any segments higher than the timeline switch point, and
> remove them, or mark them with .done so that they are not archived.
> There might be some real WAL that was streamed from the primary, but not
> yet applied, but such WAL is of no interest to that server anyway, after
> it's been promoted. It's a bit disconcerting to zap WAL that's valid,
> even if it doesn't belong to the current server's timeline history, because
> as a general rule it's good to avoid destroying evidence that might be
> useful in debugging. There isn't much difference between removing them
> immediately and marking them as .done, though, because they will
> eventually be removed/recycled anyway if they're marked as .done.

This is what I came up with. This patch removes the suspect segments at
timeline switch. The alternative of creating .done files for them would
preserve more evidence for debugging, but OTOH it would also be very
confusing to have valid-looking WAL segments in pg_xlog, with .done
files, that in fact contain garbage.

The patch is a bit longer than it otherwise would be, because I moved
the code to remove a single file from RemoveOldXlogFiles() to a new
function. I think that makes it more readable in any case, simply
because it was so deeply indented in RemoveOldXlogFiles.

Thoughts?

- Heikki

Attachments

Re: Bogus WAL segments archived after promotion

From
Bruce Momjian
Date:
On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote:
> On 12/19/2014 02:55 PM, Heikki Linnakangas wrote:
> >I'm thinking that we should add a step to promotion, where we scan
> >pg_xlog for any segments higher than the timeline switch point, and
> >remove them, or mark them with .done so that they are not archived.
> >There might be some real WAL that was streamed from the primary, but not
> >yet applied, but such WAL is of no interest to that server anyway, after
> >it's been promoted. It's a bit disconcerting to zap WAL that's valid,
> >even if it doesn't belong to the current server's timeline history, because
> >as a general rule it's good to avoid destroying evidence that might be
> >useful in debugging. There isn't much difference between removing them
> >immediately and marking them as .done, though, because they will
> >eventually be removed/recycled anyway if they're marked as .done.
> 
> This is what I came up with. This patch removes the suspect segments
> at timeline switch. The alternative of creating .done files for them
> would preserve more evidence for debugging, but OTOH it would also
> be very confusing to have valid-looking WAL segments in pg_xlog,
> with .done files, that in fact contain garbage.
> 
> The patch is a bit longer than it otherwise would be, because I
> moved the code to remove a single file from RemoveOldXlogFiles() to
> a new function. I think that makes it more readable in any case,
> simply because it was so deeply indented in RemoveOldXlogFiles.

Where are we on this?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Bogus WAL segments archived after promotion

From
Heikki Linnakangas
Date:
On 04/01/2015 07:12 PM, Bruce Momjian wrote:
> On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote:
>> On 12/19/2014 02:55 PM, Heikki Linnakangas wrote:
>>> I'm thinking that we should add a step to promotion, where we scan
>>> pg_xlog for any segments higher than the timeline switch point, and
>>> remove them, or mark them with .done so that they are not archived.
>>> There might be some real WAL that was streamed from the primary, but not
>>> yet applied, but such WAL is of no interest to that server anyway, after
>>> it's been promoted. It's a bit disconcerting to zap WAL that's valid,
>>> even if it doesn't belong to the current server's timeline history, because
>>> as a general rule it's good to avoid destroying evidence that might be
>>> useful in debugging. There isn't much difference between removing them
>>> immediately and marking them as .done, though, because they will
>>> eventually be removed/recycled anyway if they're marked as .done.
>>
>> This is what I came up with. This patch removes the suspect segments
>> at timeline switch. The alternative of creating .done files for them
>> would preserve more evidence for debugging, but OTOH it would also
>> be very confusing to have valid-looking WAL segments in pg_xlog,
>> with .done files, that in fact contain garbage.
>>
>> The patch is a bit longer than it otherwise would be, because I
>> moved the code to remove a single file from RemoveOldXlogFiles() to
>> a new function. I think that makes it more readable in any case,
>> simply because it was so deeply indented in RemoveOldXlogFiles.
>
> Where are we on this?

I didn't hear any better ideas, so committed this now.

- Heikki




Re: Bogus WAL segments archived after promotion

From
Michael Paquier
Date:
On Mon, Apr 13, 2015 at 11:57 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 04/01/2015 07:12 PM, Bruce Momjian wrote:
>>
>> On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote:
>>>
>>> On 12/19/2014 02:55 PM, Heikki Linnakangas wrote:
>>>>
>>>> I'm thinking that we should add a step to promotion, where we scan
>>>> pg_xlog for any segments higher than the timeline switch point, and
>>>> remove them, or mark them with .done so that they are not archived.
>>>> There might be some real WAL that was streamed from the primary, but not
>>>> yet applied, but such WAL is of no interest to that server anyway, after
>>>> it's been promoted. It's a bit disconcerting to zap WAL that's valid,
>>>> even if it doesn't belong to the current server's timeline history, because
>>>> as a general rule it's good to avoid destroying evidence that might be
>>>> useful in debugging. There isn't much difference between removing them
>>>> immediately and marking them as .done, though, because they will
>>>> eventually be removed/recycled anyway if they're marked as .done.
>>>
>>>
>>> This is what I came up with. This patch removes the suspect segments
>>> at timeline switch. The alternative of creating .done files for them
>>> would preserve more evidence for debugging, but OTOH it would also
>>> be very confusing to have valid-looking WAL segments in pg_xlog,
>>> with .done files, that in fact contain garbage.
>>>
>>> The patch is a bit longer than it otherwise would be, because I
>>> moved the code to remove a single file from RemoveOldXlogFiles() to
>>> a new function. I think that makes it more readable in any case,
>>> simply because it was so deeply indented in RemoveOldXlogFiles.
>>
>>
>> Where are we on this?
>
>
> I didn't hear any better ideas, so committed this now.

Finally looking at that... The commit log of b2a5545 is a bit
misleading. Segment files that were recycled during archive recovery
are not necessarily removed, they could be recycled as well during
promotion on the new timeline in line with what RemoveOldXlogFiles()
does. Hence I think that the comment on top of
RemoveNonParentXlogFiles() should be updated to reflect that like in
the patch attached.

Something minor: perhaps we could refactor xlogarchive.c to have
XLogArchiveCheckDone() and XLogArchiveIsBusy() use the new
XLogArchiveIsReady().
Regards,
--
Michael

Attachments

Re: [HACKERS] Bogus WAL segments archived after promotion

From
Bruce Momjian
Date:
On Thu, Apr 23, 2015 at 02:57:59PM +0900, Michael Paquier wrote:
> Finally looking at that... The commit log of b2a5545 is a bit
> misleading. Segment files that were recycled during archive recovery
> are not necessarily removed, they could be recycled as well during
> promotion on the new timeline in line with what RemoveOldXlogFiles()
> does. Hence I think that the comment on top of
> RemoveNonParentXlogFiles() should be updated to reflect that like in
> the patch attached.
> 
> Something minor: perhaps we could refactor xlogarchive.c to have
> XLogArchiveCheckDone() and XLogArchiveIsBusy() use the new
> XLogArchiveIsReady().
> Regards,

Old patch, but still valid, so applied to master, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.