Re: Incremental backup from a streaming replication standby fails

From David Steele
Subject Re: Incremental backup from a streaming replication standby fails
Date
Msg-id a4391db7-d308-4814-ba6b-7c4e5ed59dc6@pgmasters.net
In reply to Re: Incremental backup from a streaming replication standby fails  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Incremental backup from a streaming replication standby fails
List pgsql-hackers
On 7/19/24 21:52, Robert Haas wrote:
> On Mon, Jul 15, 2024 at 11:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> On Sat, 2024-06-29 at 07:01 +0200, Laurenz Albe wrote:
>>> I played around with incremental backup yesterday and tried $subject
>>>
>>> The WAL summarizer is running on the standby server, but when I try
>>> to take an incremental backup, I get an error that I understand to mean
>>> that WAL summarizing hasn't caught up yet.
>>>
>>> I am not sure if that is working as designed, but if it is, I think it
>>> should be documented.
>>
>> I played with this some more.  Here is the exact error message:
>>
>> ERROR:  manifest requires WAL from final timeline 1 ending at 0/1967C260, but this backup starts at 0/1967C190
>>
>> By trial and error I found that when I run a CHECKPOINT on the primary,
>> taking an incremental backup on the standby works.
>>
>> I couldn't fathom the cause of that, but I think that that should either
>> be addressed or documented before v17 comes out.
> 
> I had a feeling this was going to be confusing. I'm not sure what to
> do about it, but I'm open to suggestions.
> 
> Suppose you take a full backup F; replay of that backup will begin
> with a checkpoint CF. Then you try to take an incremental backup I;
> replay will begin from a checkpoint CI. For the incremental backup to
> be valid, it must include all blocks modified after CF and before CI.
> But when the backup is taken on a standby, no new checkpoint is
> possible. Hence, CI will be the most recent restartpoint on the
> standby that has occurred before the backup starts. So, if F is taken
> on the primary and then I is immediately taken on the standby without
> the standby having done a new restartpoint, or if both F and I are
> taken on the standby and no restartpoint intervenes, then CF=CI. In
> that scenario, an incremental backup is pretty much pointless: every
> single incremental file would contain 0 blocks. You might as well just
> use the backup you already have, unless one of the non-relation files
> has changed. So, except in that unusual corner case, the fact that the
> backup fails isn't really costing you anything. In fact, there's a
> decent chance that it's saving you from taking a completely useless
> backup.
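
As an illustration of the condition described above, here is a minimal Python sketch of the comparison that the error message implies. It is not the server's code: the function names `parse_lsn`, `format_lsn`, and `check_incremental_start` are hypothetical, and the rule "the prior manifest's final WAL range must end at or before the new backup's start LSN" is inferred from the wording of the error. When CF=CI, the new backup starts at the same restartpoint as the previous one, which lies before the point where the previous backup ended, so the check fails.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'hi/lo' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format a 64-bit integer back into 'hi/lo' hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def check_incremental_start(timeline: int, manifest_end: str, backup_start: str) -> None:
    """Hypothetical check: the new backup must start at or after the point
    where the prior backup's required WAL ends on the final timeline."""
    end = parse_lsn(manifest_end)
    start = parse_lsn(backup_start)
    if end > start:
        raise ValueError(
            f"manifest requires WAL from final timeline {timeline} "
            f"ending at {format_lsn(end)}, "
            f"but this backup starts at {format_lsn(start)}"
        )


if __name__ == "__main__":
    try:
        # LSNs taken from the error message reported earlier in the thread.
        check_incremental_start(1, "0/1967C260", "0/1967C190")
    except ValueError as err:
        print(f"ERROR:  {err}")
```

With the LSNs from the report, the check raises because 0/1967C260 is later than 0/1967C190. Once the standby replays a newer checkpoint and performs a restartpoint, the backup start LSN moves past the manifest's end and the same check passes.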

<snip>

> I think I'm a little too close to this to really know what the best
> thing to do is, so I'm happy to hear suggestions from you and others.

I think it would be enough just to add a hint such as:

HINT: this is possible when making a standby backup with little or no 
activity.

My guess is in production environments this will be uncommon.

For example, over the years we (pgBackRest) have gotten numerous bug 
reports that time-targeted PITR does not work. In every case we found 
that the user was just testing procedures and the database had no 
activity between backups -- therefore recovery had no commit timestamps 
to use to end recovery. Test environments sometimes produce weird results.
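
The PITR case can be sketched the same way. The following is a simplified Python model, not pgBackRest or PostgreSQL code: `find_stop_commit` is a hypothetical helper, and it assumes only that time-targeted recovery needs a commit record whose timestamp is later than the target in order to know where to stop.

```python
from datetime import datetime
from typing import Optional, Sequence


def find_stop_commit(commit_times: Sequence[datetime], target: datetime) -> Optional[datetime]:
    """Return the first commit timestamp later than the target, or None
    when the WAL contains no such commit (an idle test database)."""
    for ts in sorted(commit_times):
        if ts > target:
            return ts
    return None


if __name__ == "__main__":
    target = datetime(2024, 7, 20, 12, 0)
    # All activity happened before the target time; nothing afterwards.
    idle_wal = [datetime(2024, 7, 20, 9, 0)]
    print(find_stop_commit(idle_wal, target))
```

In the idle case the helper returns `None`: there is no commit timestamp for recovery to stop on, which matches the reports described above.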

Having said that, I think it would be better if it worked even if it 
does produce an empty backup. An empty backup wastes some disk space, but 
if it produces less friction and saves an admin from having to intervene, 
then it is probably worth it. I don't immediately see how to do that in a 
reliable way, though, and in any case it seems like something to 
consider for PG18.

Regards,
-David


