Re: [HACKERS] Hooks to track changed pages for backup purposes

From: Michael Paquier
Subject: Re: [HACKERS] Hooks to track changed pages for backup purposes
Date:
Msg-id: CAB7nPqTVNKDTt+h+MZ+0B+vW8=B9dLhoygMFjujRNho2OYU14g@mail.gmail.com
In response to: [HACKERS] Hooks to track changed pages for backup purposes  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: [HACKERS] Hooks to track changed pages for backup purposes  (Andrey Borodin <x4mmm@yandex-team.ru>)
List: pgsql-hackers
On Thu, Aug 31, 2017 at 3:02 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> Here is the patch with hooks that I consider sufficient for implementation of incremental backup with page tracking as an extension.
>
> Recently I was posting these things to the thread "Adding hook in BufferSync for backup purposes" [0], but here I start a separate thread since the Subj field of the previous discussion is technically wrong.
>
> Currently various incremental backup tools can use one of these methods to take a diff of a cluster since some LSN:
> 1. Check the LSN of every page
> 2. Scan WAL and collect block numbers of changed pages
>
> I propose adding hooks:
> 1. When a buffer is registered in WAL insertion.
> This hook is supposed to place block numbers in temporary storage, like a backend-local static array.
> 2. When a WAL record insertion starts and finishes, to transfer block numbers to more concurrency-protected storage.
> 3. When the WAL segment is switched, to initiate async transfer of the accumulated block numbers to durable storage.
>
> When we have accumulated diff block numbers for most segments, we can significantly speed up the WAL scanning method. If we have block numbers for all segments, we can skip WAL scanning altogether.
>
> I think that these proposed hooks can enable more efficient backups. What do you think?
>
> Any ideas will be appreciated. This patch is influenced by the code of PTRACK (Yury Zhuravlev and Postgres
Professional).

+                if (xlog_insert_buffer_hook)
+                    for(nblock = 0; nblock < xlogreader->max_block_id; nblock++)
+                    {
+                        if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
+                        {
+                            xlog_insert_buffer_hook(xlogreader->blocks[nblock].blkno,
+                                                    xlogreader->blocks[nblock].rnode, true);
+                        }
+                    }
[...]
+    if(xlog_begin_insert_hook)
+        xlog_begin_insert_hook();
Such things are not Postgres-C like. The first one should use
brackets, and the second one proper spacing after "if".

I don't understand what xlog_begin_insert_hook() is good for. There
are no arguments fed to this hook, so modules would not be able to
analyze things in this context, except shared memory and process
state?

Those hooks are put in hot code paths, and could impact performance of
WAL insertion itself. So you basically move the cost of scanning WAL
segments for those blocks from any backup solution to the WAL
insertion itself. Really, wouldn't it be simpler to let, for
example, the archiver process create this metadata if you just want
to take faster backups with a set of segments? Even better, you could
do a scan after archiving N segments, and then use M jobs to do this
work more quickly. (A set of background workers could do this job as
well).

In the backup/restore world, backups can be allowed to be taken at a
slow pace, what matters is to be able to restore them quickly. In
short, anything moving performance from an external backup code path
to a critical backend code path looks like a bad design to begin with.
So I am dubious that what you are proposing here is a good idea.
-- 
Michael


