Thread: [HACKERS] Hooks to track changed pages for backup purposes

[HACKERS] Hooks to track changed pages for backup purposes

From:
Andrey Borodin
Date:
Hi hackers!

Here is the patch with hooks that I consider sufficient for implementation of incremental backup with page tracking as an extension.

Recently I was posting these things to the thread "Adding hook in BufferSync for backup purposes" [0], but here I start a separate thread since the Subject of the previous discussion is technically wrong.

Currently, various incremental backup tools can use one of these methods to take a diff of a cluster since some LSN:
1. Check the LSN of every page
2. Scan the WAL and collect block numbers of changed pages

I propose adding hooks:
1. When a buffer is registered in WAL insertion.
This hook is supposed to place block numbers in temporary storage, like a backend-local static array.
2. When a WAL record insertion is started and finished, to transfer block numbers to more concurrency-protected storage.
3. When the WAL segment is switched, to initiate an async transfer of the accumulated block numbers to durable storage.

When we have accumulated diff block numbers for most segments, we can significantly speed up the WAL-scanning method. If we have block numbers for all segments, we can skip WAL scanning entirely.

I think that these proposed hooks can enable more efficient backups. What do you think?

Any ideas will be appreciated. This patch is influenced by the code of PTRACK (Yury Zhuravlev and Postgres
Professional).

Best regards, Andrey Borodin.


[0]
https://www.postgresql.org/message-id/flat/20051502087457%40webcorp01e.yandex-team.ru#20051502087457@webcorp01e.yandex-team.ru


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Michael Paquier
Date:
On Thu, Aug 31, 2017 at 3:02 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here is the patch with hooks that I consider sufficient for implementation of incremental backup with page tracking as an extension.
>
> Recently I was posting these things to the thread "Adding hook in BufferSync for backup purposes" [0], but here I start a separate thread since the Subject of the previous discussion is technically wrong.
>
> Currently, various incremental backup tools can use one of these methods to take a diff of a cluster since some LSN:
> 1. Check the LSN of every page
> 2. Scan the WAL and collect block numbers of changed pages
>
> I propose adding hooks:
> 1. When a buffer is registered in WAL insertion.
> This hook is supposed to place block numbers in temporary storage, like a backend-local static array.
> 2. When a WAL record insertion is started and finished, to transfer block numbers to more concurrency-protected storage.
> 3. When the WAL segment is switched, to initiate an async transfer of the accumulated block numbers to durable storage.
>
> When we have accumulated diff block numbers for most segments, we can significantly speed up the WAL-scanning method. If we have block numbers for all segments, we can skip WAL scanning entirely.
>
> I think that these proposed hooks can enable more efficient backups. What do you think?
>
> Any ideas will be appreciated. This patch is influenced by the code of PTRACK (Yury Zhuravlev and Postgres Professional).

+                if (xlog_insert_buffer_hook)
+                    for(nblock = 0; nblock < xlogreader->max_block_id; nblock++)
+                    {
+                        if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
+                        {
+                            xlog_insert_buffer_hook(xlogreader->blocks[nblock].blkno,
+                                                    xlogreader->blocks[nblock].rnode,
+                                                    true);
+                        }
+                    }
[...]
+    if(xlog_begin_insert_hook)
+        xlog_begin_insert_hook();
Such things are not Postgres-C like. The first one should use
brackets, and the second one proper spacing after "if".

I don't understand what xlog_begin_insert_hook() is good for. There
are no arguments fed to this hook, so modules would not be able to
analyze things in this context, except shared memory and process
state?

Those hooks are put in hot code paths, and could impact performance of
WAL insertion itself. So you basically move the cost of scanning WAL
segments for those blocks from any backup solution to the WAL
insertion itself. Really, wouldn't it be more simple to let for
example the archiver process to create this meta-data if you just want
to take faster backups with a set of segments? Even better, you could
do a scan after archiving N segments, and then use M jobs to do this
work more quickly. (A set of background workers could do this job as
well).

In the backup/restore world, backups can be allowed to be taken at a
slow pace, what matters is to be able to restore them quickly. In
short, anything moving performance from an external backup code path
to a critical backend code path looks like a bad design to begin with.
So I am dubious that what you are proposing here is a good idea.
-- 
Michael



Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Andrey Borodin
Date:
Thank you for your reply, Michael! Your comments are valuable, especially in the world of backups.

> On 31 Aug 2017, at 19:44, Michael Paquier <michael.paquier@gmail.com> wrote:
> Such things are not Postgres-C like.
Will be fixed.

> I don't understand what xlog_begin_insert_hook() is good for.
It memsets the control structures and the arrays of block numbers and relfilenodes, of size XLR_MAX_BLOCK_ID.

> There
> are no arguments fed to this hook, so modules would not be able to
> analyze things in this context, except shared memory and process
> state?

>
> Those hooks are put in hot code paths, and could impact performance of
> WAL insertion itself.
I do not think sending a few bytes to a cached array is comparable to the disk write of an XLog record. Checking the function pointer is even cheaper with correct branch prediction.

> So you basically move the cost of scanning WAL
> segments for those blocks from any backup solution to the WAL
> insertion itself. Really, wouldn't it be more simple to let for
> example the archiver process to create this meta-data if you just want
> to take faster backups with a set of segments? Even better, you could
> do a scan after archiving N segments, and then use M jobs to do this
> work more quickly. (A set of background workers could do this job as
> well).
I like the idea of doing this during archiving. It is a different trade-off between OLTP performance and backup performance. Essentially, it is parsing WAL some time before taking the backup. The best thing about it is using CPUs that usually spin in an idle loop on backup machines.

> In the backup/restore world, backups can be allowed to be taken at a
> slow pace, what matters is to be able to restore them quickly.
Backups are taken much more often than restored.

> In short, anything moving performance from an external backup code path
> to a critical backend code path looks like a bad design to begin with.
> So I am dubious that what you are proposing here is a good idea.
I will think about it more. This proposal takes a vanishingly small part of backend performance, but, indeed, a nonzero part.

Again, thank you for your time and comments.

Best regards, Andrey Borodin.


Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Tomas Vondra
Date:
Hi,

On 09/01/2017 08:13 AM, Andrey Borodin wrote:
> Thank you for your reply, Michael! Your comments are valuable,
> especially in the world of backups.
>
>> On 31 Aug 2017, at 19:44, Michael Paquier <michael.paquier@gmail.com> wrote:
>> Such things are not Postgres-C like.
> Will be fixed.
>

A few more comments:

* The patch defines wal_switch_hook, but it's never called.

* I see there are conditions like this:
   if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)

Why is it enough to restrict the block-tracking code to main fork?
Aren't we interested in all relation forks?

>> I don't understand what xlog_begin_insert_hook() is good for.
> It memsets the control structures and the arrays of block numbers and
> relfilenodes, of size XLR_MAX_BLOCK_ID.
>

Why can't that happen in the other hooks? I guess you'll have to explain
what the implementation of the hooks is supposed to do, and why these
locations for hook calls are the right ones. It's damn impossible to
validate the patch without that information.

Assuming you still plan to use the hook approach ...

>> There
>> are no arguments fed to this hook, so modules would not be able to
>> analyze things in this context, except shared memory and process
>> state?
>
>>
>> Those hooks are put in hot code paths, and could impact performance of
>> WAL insertion itself.
> I do not think sending a few bytes to a cached array is comparable to the
> disk write of an XLog record. Checking the function pointer is even cheaper
> with correct branch prediction.
>

That seems somewhat suspicious, for two reasons. Firstly, I believe we
only insert the XLOG records into WAL buffer here, so why should there
be any disk write related? Or do you mean the final commit?

But more importantly, doesn't this kind of information require some
durability guarantees? I mean, if it gets lost during server crashes or
restarts, doesn't that mean the incremental backups might miss some
buffers? I'd guess the hooks will have to do some sort of I/O, to
achieve that, no?

In any case, claims like this probably deserve some benchmarking.

>> So you basically move the cost of scanning WAL
>> segments for those blocks from any backup solution to the WAL
>> insertion itself. Really, wouldn't it be more simple to let for
>> example the archiver process to create this meta-data if you just want
>> to take faster backups with a set of segments? Even better, you could
>> do a scan after archiving N segments, and then use M jobs to do this
>> work more quickly. (A set of background workers could do this job as
>> well).
> I like the idea of doing this during archiving. It is a different
> trade-off between OLTP performance and backup performance. Essentially,
> it is parsing WAL some time before taking the backup. The best thing
> about it is using CPUs that usually spin in an idle loop on backup
> machines.
>
>> In the backup/restore world, backups can be allowed to be taken at a
>> slow pace, what matters is to be able to restore them quickly.
> Backups are taken much more often than restored.
>

I agree. The ability to take backups quickly (and often) is just as
important as fast recovery.

>> In short, anything moving performance from an external backup code path
>> to a critical backend code path looks like a bad design to begin with.
>> So I am dubious that what you are proposing here is a good idea.
> I will think about it more. This proposal takes a vanishingly small part
> of backend performance, but, indeed, a nonzero part.
>

But I think this is not necessarily just about code paths - moving it to
the archiver or a bunch of background workers may easily have just as
negative an effect (or even worse), as it's using the same CPUs and has to
actually re-read the WAL segments from disk (which may be a lot of I/O,
particularly when done from multiple processes).

From this POV, the idea to collect this information on the backup system
(WAL archive) by pre-processing the arriving WAL segments seems like the
most promising. It moves the work to another system, the backup system
can make it as durable as the WAL segments, etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Andrey Borodin
Date:
Hi Tomas! Thank you for looking into that patch.

On 8 Sep 2017, at 1:53, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> A few more comments:
>
> * The patch defines wal_switch_hook, but it's never called.
That call was missing; that's a bug, thanks for spotting it.

> * I see there are conditions like this:
>
>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>
> Why is it enough to restrict the block-tracking code to main fork?
> Aren't we interested in all relation forks?
fsm, vm and others are small enough to take them 

> I guess you'll have to explain
> what the implementation of the hooks is supposed to do, and why these
> locations for hook calls are the right ones. It's damn impossible to
> validate the patch without that information.
>
> Assuming you still plan to use the hook approach ...
Yes, I still think hooking is a good idea, but you are right - I need a prototype first. I'll mark the patch as Returned with Feedback until a prototype implementation is ready.


>>> There
>>> are no arguments fed to this hook, so modules would not be able to
>>> analyze things in this context, except shared memory and process
>>> state?
>>>
>>> Those hooks are put in hot code paths, and could impact performance of
>>> WAL insertion itself.
>> I do not think sending a few bytes to a cached array is comparable to the
>> disk write of an XLog record. Checking the function pointer is even cheaper
>> with correct branch prediction.
>
> That seems somewhat suspicious, for two reasons. Firstly, I believe we
> only insert the XLOG records into WAL buffer here, so why should there
> be any disk write related? Or do you mean the final commit?
Yes, I mean that in the end we will be waiting for the disk. A hundred empty pointer checks are negligible in comparison with the disk.

> But more importantly, doesn't this kind of information require some
> durability guarantees? I mean, if it gets lost during server crashes or
> restarts, doesn't that mean the incremental backups might miss some
> buffers? I'd guess the hooks will have to do some sort of I/O, to
> achieve that, no?
We need durability only at the level of one segment. If we do not have the info for a segment, we can just rescan it.
If we send a segment to S3 as one file, we are sure of its integrity. But this I/O can be async.

PTRACK, in its turn, switches bits in the fork's buffers, which are written by the checkpointer and, well, recovered during recovery, by the usual WAL replay.


> From this POV, the idea to collect this information on the backup system
> (WAL archive) by pre-processing the arriving WAL segments seems like the
> most promising. It moves the work to another system, the backup system
> can make it as durable as the WAL segments, etc.

Well, in some not-so-rare cases users encrypt backups and send them to S3, and there is no system with CPUs that could handle that WAL parsing. Currently, I'm considering mocking up a prototype for wal-g, which works exactly this way.

Your comments were very valuable, thank you for looking into the patch and joining the discussion.

Best regards, Andrey Borodin.


Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Ants Aasma
Date:
On Thu, Aug 31, 2017 at 9:02 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> When we have accumulated diff block numbers for most segments, we can
> significantly speed up the WAL-scanning method. If we have block numbers
> for all segments, we can skip WAL scanning entirely.

Have you measured that the WAL scanning is actually a significant
issue? As a quick experiment I hacked up pg_waldump to just dump block
references to stdout in binary format. It scanned 2.8GB of WAL in 3.17
seconds, outputting 9.3M block refs per second. WAL was generated with
pgbench, synchronous commit off, using 4 cores for 10 minutes - making
the ratio of work from generating WAL to parsing it be about 750:1.

Regards,
Ants Aasma



Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Andrey Borodin
Date:
Hi!
Thank you for your interest and experiment results.
> On 13 Sep 2017, at 15:43, Ants Aasma <ants.aasma@eesti.ee> wrote:
>
> On Thu, Aug 31, 2017 at 9:02 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> When we have accumulated diff block numbers for most segments, we can
>> significantly speed up the WAL-scanning method. If we have block numbers
>> for all segments, we can skip WAL scanning entirely.
>
> Have you measured that the WAL scanning is actually a significant
> issue? As a quick experiment I hacked up pg_waldump to just dump block
> references to stdout in binary format. It scanned 2.8GB of WAL in 3.17
> seconds, outputting 9.3M block refs per second. WAL was generated with
> pgbench, synchronous commit off, using 4 cores for 10 minutes - making
> the ratio of work from generating WAL to parsing it be about 750:1.
>

No, I have not done this measurement myself. Sure, parsing WAL, when it is in RAM, is not very expensive. Though it can be even cheaper before formatting the WAL.
I just want to figure out the best place for this, if the backup executable is sharing CPUs with the postmaster.

Best regards, Andrey Borodin.




Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Tomas Vondra
Date:

On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>> * I see there are conditions like this:
>>
>>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>
>> Why is it enough to restrict the block-tracking code to main fork?
>> Aren't we interested in all relation forks?
> fsm, vm and others are small enough to take them 
> 

That seems like an optimization specific to your backup solution, not
necessarily to others and/or to other possible use cases.

>> I guess you'll have to explain
>> what the implementation of the hooks is supposed to do, and why these
>> locations for hook calls are the right ones. It's damn impossible to
>> validate the patch without that information.
>>
>> Assuming you still plan to use the hook approach ...
> Yes, I still think hooking is a good idea, but you are right - I need
> a prototype first. I'll mark the patch as Returned with Feedback until
> a prototype implementation is ready.
> 

OK

>>
>>>> There
>>>> are no arguments fed to this hook, so modules would not be able to
>>>> analyze things in this context, except shared memory and process
>>>> state?
>>>
>>>>
>>>> Those hooks are put in hot code paths, and could impact performance of
>>>> WAL insertion itself.
>>> I do not think sending a few bytes to a cached array is comparable to the
>>> disk write of an XLog record. Checking the function pointer is even cheaper
>>> with correct branch prediction.
>>>
>>
>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>> only insert the XLOG records into WAL buffer here, so why should there
>> be any disk write related? Or do you mean the final commit?
> Yes, I mean that in the end we will be waiting for the disk. A hundred empty
> pointer checks are negligible in comparison with the disk.

Aren't we doing these calls while holding XLog locks? IIRC there was
quite a significant performance improvement after Heikki reduced the
amount of code executed while holding the locks.

>>
>> But more importantly, doesn't this kind of information require some
>> durability guarantees? I mean, if it gets lost during server crashes or
>> restarts, doesn't that mean the incremental backups might miss some
>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>> achieve that, no?
> We need durability only at the level of one segment. If we do not have
> the info for a segment, we can just rescan it.
> If we send a segment to S3 as one file, we are sure of its integrity. But
> this I/O can be async.
> 
> PTRACK, in its turn, switches bits in the fork's buffers, which are written
> by the checkpointer and, well, recovered during recovery, by the usual WAL
> replay.
> 

But how do you do that from the hooks, if they only store the data into
a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and
then the system crashes and reboots. How do you know you have incomplete
information from the WAL segment?

Although, that's probably what wal_switch_hook() might do - sync the
data whenever the WAL segment is switched. Right?

> 
>> From this POV, the idea to collect this information on the backup system
>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>> most promising. It moves the work to another system, the backup system
>> can make it as durable as the WAL segments, etc.
> 
> Well, in some not-so-rare cases users encrypt backups and send them to S3,
> and there is no system with CPUs that could handle that WAL parsing.
> Currently, I'm considering mocking up a prototype for wal-g, which works
> exactly this way.
> 

Why couldn't there be a system with enough CPU power? Sure, if you want
to do this, you'll need a more powerful system, but regular CPUs can do
more than 1 GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the
database as part of archive_command, before the encryption, of course.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Hooks to track changed pages for backup purposes

From:
Daniel Gustafsson
Date:
> On 13 Sep 2017, at 15:01, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>>> * I see there are conditions like this:
>>>
>>>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>>
>>> Why is it enough to restrict the block-tracking code to main fork?
>>> Aren't we interested in all relation forks?
>> fsm, vm and others are small enough to take them
>
> That seems like an optimization specific to your backup solution, not
> necessarily to others and/or to other possible use cases.
>
>>> I guess you'll have to explain
>>> what the implementation of the hooks is supposed to do, and why these
>>> locations for hook calls are the right ones. It's damn impossible to
>>> validate the patch without that information.
>>>
>>> Assuming you still plan to use the hook approach ...
>> Yes, I still think hooking is good idea, but you are right - I need
>> prototype first. I'll mark patch as Returned with feedback before
>> prototype implementation.
>
> OK
>
>>>>> There
>>>>> are no arguments fed to this hook, so modules would not be able to
>>>>> analyze things in this context, except shared memory and process
>>>>> state?
>>>>
>>>>>
>>>>> Those hooks are put in hot code paths, and could impact performance of
>>>>> WAL insertion itself.
>>>> I do not think sending few bytes to cached array is comparable to disk
>>>> write of XLog record. Checking the func ptr is even cheaper with correct
>>>> branch prediction.
>>>
>>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>>> only insert the XLOG records into WAL buffer here, so why should there
>>> be any disk write related? Or do you mean the final commit?
>> Yes, I mean finally we will be waiting for disk. Hundred empty ptr
>> checks are neglectable in comparision with disk.
>
> Aren't we doing these calls while holding XLog locks? IIRC there was
> quite a significant performance improvement after Heikki reduced the
> amount of code executed while holding the locks.
>
>>> But more importantly, doesn't this kind of information require some
>>> durability guarantees? I mean, if it gets lost during server crashes or
>>> restarts, doesn't that mean the incremental backups might miss some
>>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>>> achieve that, no?
>> We need durability only on the level of one segment. If we do not have
>> info from segment we can just rescan it.
>> If we send segment to S3 as one file, we are sure in it's integrity. But
>> this IO can by async.
>>
>> PTRACK in it's turn switch bits in fork's buffers which are written in
>> checkpointer and..well... recovered during recovery. By usual WAL replay
>> of recovery.
>
> But how do you do that from the hooks, if they only store the data into
> a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and
> then the system crashes and reboots. How do you know you have incomplete
> information from the WAL segment?
>
> Although, that's probably what wal_switch_hook() might do - sync the
> data whenever the WAL segment is switched. Right?
>
>>> From this POV, the idea to collect this information on the backup system
>>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>>> most promising. It moves the work to another system, the backup system
>>> can make it as durable as the WAL segments, etc.
>>
>> Well, in some not so rare cases users encrypt backups and send to S3.
>> And there is no system with CPUs that can handle that WAL parsing.
>> Currently, I'm considering mocking prototype for wal-g, which works
>> exactly this.
>
> Why couldn't there be a system with enough CPU power? Sure, if you want
> to do this, you'll need a more powerful system, but regular CPUs can do
> more than 1 GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the
> database as part of archive_command, before the encryption, of course.

Based on unanswered questions in the discussion in this thread, and the fact that no new
version of the patch has been submitted, I'm marking this Returned with
Feedback.  Please re-submit the patch in a future commitfest when ready for new
review.

cheers ./daniel
