Re: trying again to get incremental backup

From: Andres Freund
Subject: Re: trying again to get incremental backup
Date:
Msg-id: 20231030184645.txm3qtkusivvfjm5@awork3.anarazel.de
In response to: Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

On 2023-10-30 10:45:03 -0400, Robert Haas wrote:
> On Tue, Oct 24, 2023 at 12:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Note that whether to remove summaries is a separate question from
> > whether to generate them in the first place. Right now, I have
> > wal_summarize_mb controlling whether they get generated in the first
> > place, but as I noted in another recent email, that isn't an entirely
> > satisfying solution.
> 
> I did some more research on this. My conclusion is that I should
> remove wal_summarize_mb and just have a GUC summarize_wal = on|off
> that controls whether the summarizer runs at all. There will be one
> summary file per checkpoint, no matter how far apart checkpoints are
> or how large the summary gets. Below I'll explain the reasoning; let
> me know if you disagree.
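
(For illustration, that'd presumably boil down to something like

  ALTER SYSTEM SET summarize_wal = on;
  SELECT pg_reload_conf();

with the GUC name taken from the quoted proposal - whether it's reloadable etc. is of course up to the patch.)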

> What I describe above would be a bad plan if it were realistically
> possible for a summary file to get so large that it might run the
> machine out of memory either when producing it or when trying to make
> use of it for an incremental backup. This seems to be a somewhat
> difficult scenario to create. So far, I haven't been able to generate
> WAL summary files more than a few tens of megabytes in size, even when
> summarizing 50+ GB of WAL per summary file. One reason why it's hard
> to produce large summary files is because, for a single relation fork,
> the WAL summary size converges to 1 bit per modified block when the
> number of modified blocks is large. This means that, even if you have
> a terabyte sized relation, you're looking at no more than perhaps 20MB
> of summary data no matter how much of it gets modified. Now, somebody
> could have a 30TB relation and then if they modify the whole thing
> they could have the better part of a gigabyte of summary data for that
> relation, but if you've got a 30TB table you probably have enough
> memory that that's no big deal.
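
(FWIW, assuming the default 8kB block size that math checks out: 1TB is ~134 million blocks, i.e. ~16MB at one bit per modified block, and 30TB works out to roughly half a gigabyte.)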

I'm not particularly worried about the rewriting-30TB-table case - that'd also
generate >= 30TB of WAL most of the time, which realistically is going to
trigger a few checkpoints, even on very big instances.


> But, what if you have multiple relations? I initialized pgbench with a
> scale factor of 30000 and also with 30000 partitions and did a 1-hour
> run. I got 4 checkpoints during that time and each one produced an
> approximately 16MB summary file.
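
(That's presumably an initialization along the lines of pgbench -i -s 30000 --partitions=30000, i.e. on the order of 450GB of data spread across 30000 partitions - exact flags are a guess.)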

Hm, I assume the pgbench run will be fairly massively bottlenecked on IO, due
to having to read data from disk, lots of full page writes, and having to write
out lots of data?  I.e. we won't do all that many transactions during the 1h?


> To get a 1GB+ WAL summary file, you'd need to modify millions of relation
> forks, maybe tens of millions, and most installations aren't even going to
> have that many relation forks, let alone be modifying them all frequently.

I tried to find bad cases for a bit - and I am not worried. I wrote a pgbench
script that creates 10k single-row relations per script execution, ran that with
96 clients, checkpointed, and then ran a pgbench script that updates the single
row in each table.
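
Roughly, the scripts amount to something like the following (naming, and the per-client pg_backend_pid() scheme, are just illustrative):

-- create.sql, run once per client, e.g. pgbench -n -c 96 -t 1 -f create.sql:
-- each client creates 10k single-row tables
DO $$
DECLARE
  i int;
BEGIN
  FOR i IN 1..10000 LOOP
    EXECUTE format('CREATE TABLE t_%s_%s (v int)', pg_backend_pid(), i);
    EXECUTE format('INSERT INTO t_%s_%s VALUES (1)', pg_backend_pid(), i);
  END LOOP;
END
$$;

-- update.sql: update the single row in each of those tables
DO $$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT relname FROM pg_class
           WHERE relkind = 'r' AND relname LIKE 't\_%'
  LOOP
    EXECUTE format('UPDATE %I SET v = v + 1', r.relname);
  END LOOP;
END
$$;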

After creation of the relations, the WAL summarizer uses:
LOG:  level: 1; Wal Summarizer: 378433680 total in 43 blocks; 5628936 free (66 chunks); 372804744 used
and creates a 26MB summary file.

After the checkpoint & updates, the WAL summarizer uses:
LOG:  level: 1; Wal Summarizer: 369205392 total in 43 blocks; 5864536 free (26 chunks); 363340856 used
and creates a 26MB summary file.

Sure, 350MB ain't nothing, but simply executing \dt in the database created by
this makes the backend use 260MB afterwards - which isn't going away, whereas
the WAL summarizer drops its memory usage soon after.


> But I think that's sufficiently niche that the current patch shouldn't
> concern itself with such cases. If we find that they're common enough
> to worry about, we might eventually want to do something to mitigate
> them, but whether that thing looks anything like wal_summarize_mb
> seems pretty unclear. So I conclude that it's a mistake to include
> that GUC as currently designed and propose to replace it with a
> Boolean as described above.

After playing with this for a while, I don't see a reason for wal_summarize_mb
from a memory usage POV at least.

I wonder if there are use cases that might like to consume WAL summaries
before the next checkpoint? For those, wal_summarize_mb likely wouldn't be a
good control, but they might want a way to request that a summary file be
created at some point?

Greetings,

Andres Freund


