Re: Add timeline to partial WAL segments

From David Steele
Subject Re: Add timeline to partial WAL segments
Date
Msg-id 44698ea5-085d-18e6-e9e0-249a742d7960@pgmasters.net
In response to Re: Add timeline to partial WAL segments (Michael Paquier <michael@paquier.xyz>)
Responses Re: Add timeline to partial WAL segments (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi Michael,

On 12/10/18 8:43 PM, Michael Paquier wrote:
> On Mon, Dec 10, 2018 at 10:21:23AM -0500, David Steele wrote:
>> We recommend that archive commands not overwrite an existing segment.
>> Some backup tools will compare the contents and succeed if they are
>> equal, but in this case that will still often fail because recycled WAL
>> segments will have different bytes at the end on the primary and
>> standby.  The files may not even be logically the same because B may not
>> have received all WAL from A.
>
> This is not a new problem, the last, partial segment generated
> post-promotion of a timeline needs to be archived.  Since the
> introduction of .partial within the segment name in 9.5, we also assume
> that the OP would be smart enough to rename the segment to replay up to
> the end of the past timeline if need be for PITR.

It's not a new problem, but I do think it was only partially solved.
There are easy-to-reproduce cases where more than one .partial is
generated and I think we should handle that gracefully.
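
To make the quoted recommendation concrete, here is a minimal sketch of an
archive step that never overwrites an existing segment and only reports
success on a duplicate when the bytes match.  The function name
archive_segment and the directory layout are assumptions made for
illustration, not any particular backup tool's behavior, and the sketch is
not safe against two archivers running concurrently.

```python
import filecmp
import shutil
from pathlib import Path


def archive_segment(src: Path, archive_dir: Path) -> bool:
    """Copy src into archive_dir without overwriting an existing file.

    Returns True if the segment is safely archived (newly copied, or already
    present with identical bytes) and False on a content conflict.
    """
    dest = archive_dir / src.name
    if dest.exists():
        # shallow=False forces a byte-by-byte comparison of the two files.
        return filecmp.cmp(src, dest, shallow=False)
    # Copy under a temporary name and rename afterwards, so a crash does not
    # leave a half-written segment under its final name.
    tmp = dest.with_name(dest.name + ".tmp")
    shutil.copyfile(src, tmp)
    tmp.rename(dest)
    return True
```

With a recycled segment the trailing bytes differ between primary and
standby, so the second server's .partial of the same name hits the False
branch even when the useful WAL is identical.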

>> However, there is still a race condition here.  Since the
>> 000000010000000100000001.partial is archived first the 00000002.history
>> file might not make it to the archive before B crashes.  In that case A
>> will pick timeline 2 and still be stuck.  However, I'm thinking it would
>> be easy to teach pgarch_readyXlog() to return any .history files it
>> finds first (in order, of course).
>
> Still the .ready file of the partial segment would be generated before
> the history file, right?  In what does that help?

It looks to me like the history file is written first, and then the
.partial.  But because we process WAL alphabetically the .partial ends
up being pushed first.
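
A quick way to see the ordering, assuming the archiver simply takes the
alphabetically lowest name among the pending files (the list below is just
the two names from the example upthread):

```python
# Names with pending .ready files after B promotes, per the example upthread.
ready = ["00000002.history", "000000010000000100000001.partial"]

# A plain string sort: the first seven characters are all "0", and the
# eighth is "1" for the .partial versus "2" for the history file.
archive_order = sorted(ready)
print(archive_order)
# ['000000010000000100000001.partial', '00000002.history']
```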

The idea is to stake a claim to the timeline as quickly as possible so
nobody else claims it.  pgarch_readyXlog() reads through all the files,
so it would be easy to return the history files first, then the
.partial, then the new timeline files, regardless of the order in which
the .ready files were written.
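
Roughly, the selection could look like the sketch below.  This is Python
rather than the real C code, and archive_priority and next_to_archive are
hypothetical names used only to illustrate the ordering, not functions in
the tree.

```python
def archive_priority(name: str) -> tuple:
    """Sort key: history files first, then .partial, then everything else.

    Ties within a class fall back to plain name order.
    """
    if name.endswith(".history"):
        rank = 0
    elif name.endswith(".partial"):
        rank = 1
    else:
        rank = 2
    return (rank, name)


def next_to_archive(ready):
    """Return the next file to archive, or None if nothing is pending."""
    return min(ready, key=archive_priority) if ready else None


ready = [
    "000000020000000100000001",
    "000000010000000100000001.partial",
    "00000002.history",
]
print(sorted(ready, key=archive_priority))
```

For the three names above, the history file comes out first, then the
.partial, then the first segment on timeline 2.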

The history file is also small, so it will be faster to copy and less
subject to latency concerns.  We want to reduce the window in which
another potential primary can end up with a duplicate timeline.

>> Another option would be to immediately archive the first WAL segment on
>> timeline 2 and forgo the .partial file entirely.  In this case the
>> archiver will archive the 00000002.history file before
>> 000000020000000100000001 and we avoid the race condition above.  That
>> also means we could recover A and promote without a conflict on the
>> .partial.  Or we could recover A along timeline 2.
>
> This breaks the definition of IsPartialXLogFileName() in
> xlog_internal.h, and the current naming convention of using only dots as
> field separators.

Good point.  Interesting that missing this didn't break any tests.
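
For reference, as I understand the macro it accepts exactly 24 uppercase hex
characters followed by ".partial".  The sketch below mirrors that check in
Python.  The second name is a purely hypothetical layout with an extra
timeline field, not necessarily what the patch produces, and is there only
to show that any additional field fails the length test.

```python
XLOG_FNAME_LEN = 24  # 8 hex digits each for timeline, log, and segment
HEX_DIGITS = set("0123456789ABCDEF")


def is_partial_xlog_file_name(fname: str) -> bool:
    """Python analogue of the IsPartialXLogFileName() check (an approximation)."""
    suffix = ".partial"
    return (
        len(fname) == XLOG_FNAME_LEN + len(suffix)
        and all(c in HEX_DIGITS for c in fname[:XLOG_FNAME_LEN])
        and fname[XLOG_FNAME_LEN:] == suffix
    )


current_name = "000000010000000100000001.partial"
# Hypothetical name carrying a second timeline ID, for illustration only.
extended_name = "000000010000000100000001.00000002.partial"

print(is_partial_xlog_file_name(current_name))   # True
print(is_partial_xlog_file_name(extended_name))  # False
```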

> Another more tricky problem is that this is
> inconsistent with the way pg_receivewal.c behaves for non-completed
> segments, which is a reason behind using .partial for the last partial
> segment on the backend side as well.  So this proposal makes things more
> inconsistent.

I think that we might need to apply the same logic to pg_receivewal.

>> I have attached a patch that adds the timeline to the .partial file.
>> This passes check-world.
>>
>> I think we should consider back-patching some set of these changes since
>> this causes real pain in current production HA configurations.
>>
>> Thoughts?
>
> So you basically append the new timeline ID to the segment name which
> still uses the old timeline ID in the first 8 characters of its name.
> Logically I find this proposal weird as the segment refers to contents
> which are part of the past, and the backend is not going to use the
> contents of this segment when jumping to a new timeline, but the
> contents of the segment which has the same contents up to the point WAL
> forked, with the name of the new timeline.

.partial files are rarely used in general, but we decided not to throw
them away because they *might* contain valuable information.  However,
in their current form they are more of a nuisance than a help.

What we're really looking for here is a sensible way to version .partial
files so that a) they don't conflict in the repo and b) their name
indicates where they came from.

If anyone can think of a naming scheme that makes more sense, I'm all ears.

> It seems to me that this is quite a change for a low-probability
> problem, as this assumes that the promotion of two different servers
> happens on exactly the same segment and that both would finish by
> archiving the same last partial segment.

It's actually a common (i.e. daily) problem at scale, especially when
archiving to high-latency storage like S3.  Patroni is especially likely
to show the issue as it favors uptime over preserving data in its
default configuration.

Thanks,
--
-David
david@pgmasters.net

