Re: Tracking of page changes for backup purposes. PTRACK [POC]

From Andrey Borodin
Subject Re: Tracking of page changes for backup purposes. PTRACK [POC]
Msg-id 1D5FDB76-D7AC-46BB-B684-50C90E0E7BBE@yandex-team.ru
In response to Re: Tracking of page changes for backup purposes. PTRACK [POC]  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Hi!

> On 21 Dec 2017, at 5:51, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> On Thu, Dec 21, 2017 at 7:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Dec 20, 2017 at 3:45 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>>> Isn't more effective hold this info in Postgres than in backup sw?
>>>> Then any backup sw can use this implementation
>> [Skipped]
>> I agree with all of that.
>
> +1. This summarizes a bunch of concerns about all kinds of backend
> implementations proposed. Scanning for a list of blocks modified via
> streaming gives more availability, but knowing that you will need to
> switch to a new segment anyway when finishing a backup, does it really
> matter? Doing it once a segment has finished would be cheap enough,
> and you can even do it in parallel with a range of segments.
>
> Also, since 9.4 and the introduction of the new WAL API to track
> modified blocks, you don't need to know about the record types to know
> which blocks are being changed. Here is an example of tool I hacked up
> in a couple of hours that does actually what you are looking for, aka
> a scanner of the blocks modified per record for a given WAL segment
> using xlogreader.c:
> https://github.com/michaelpq/pg_plugins/tree/master/pg_wal_blocks
>
> You could just use that and shape the data in the way you want and you
> would be good to go.

Michael, that's almost what I want. I've even filed a GSoC proposal for this [0].
But can we have something like this in Postgres?
The tool I'm hacking on is written in Go, so I cannot just embed a bunch of Postgres C code into it. That is why an API
like PTRACK suits my needs better. Not because it uses any superior mechanics, but because it is an API, ready for
external 3rd-party backup software (having backup software in Pg would be even better).


Anastasia, I've implemented PTRACK support in WAL-G.
First, there are a few minor issues with the patch:
1. There is a malformed comment
2. The function pg_ptrack_version() is absent

Then, I think that the API is far from perfect: pg_ptrack_get_and_clear() changes the global ptrack_clear_lsn, which
introduces some weakness (for the paranoid). Maybe use something like "pg_ptrack_get_and_clear(oid,oid,previous_lsn)",
which would fail if previous_lsn does not match? Also, pg_ptrack_get_and_clear() does not return an error when there is
no table with the given oid. Finally, I had to interpret any empty map as the absence of a map. From my POV, the
function must fail on errors like: invalid oid passed, no table found, no PTRACK map exists, etc.
I use external file-tracking mechanics, so the function pg_ptrack_init_get_and_clear() was of no use to me.
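
To make the proposed semantics concrete, here is a minimal Go sketch of a "get and clear" call that refuses to run when the caller's previous_lsn is stale, and that reports a missing map as an error rather than as an empty result. It is an in-memory model only, not PTRACK code: the tracker type, the error values and the single-oid signature are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// LSN is a WAL position; a plain integer is enough for this sketch.
type LSN uint64

// Hypothetical error values: the point is that failures are reported
// explicitly instead of being folded into an empty map.
var (
	ErrLSNMismatch = errors.New("ptrack: previous LSN does not match")
	ErrNoMap       = errors.New("ptrack: no map for relation")
)

// tracker is a hypothetical in-memory stand-in for the server-side state.
type tracker struct {
	clearLSN LSN               // position of the last successful clear
	maps     map[uint32][]byte // relation oid -> change bitmap
}

// getAndClear returns a copy of the map for oid and resets it, but only
// if prev equals the stored clear position. A mismatch means someone else
// cleared the map in between, so the caller's delta would be incomplete.
func (t *tracker) getAndClear(oid uint32, prev, now LSN) ([]byte, error) {
	if prev != t.clearLSN {
		return nil, ErrLSNMismatch
	}
	m, ok := t.maps[oid]
	if !ok {
		return nil, ErrNoMap
	}
	out := append([]byte(nil), m...)
	for i := range m {
		m[i] = 0
	}
	t.clearLSN = now
	return out, nil
}

func main() {
	t := &tracker{clearLSN: 100, maps: map[uint32][]byte{16384: {0x05}}}

	_, err := t.getAndClear(16384, 90, 200) // stale previous_lsn
	fmt.Println(err)

	m, err := t.getAndClear(16384, 100, 200) // matching previous_lsn
	fmt.Println(m, err)

	_, err = t.getAndClear(1, 200, 300) // unknown oid
	fmt.Println(err)
}
```

With this shape, a backup tool that remembers the LSN of its own last clear would notice immediately if another client had consumed the map in the meantime.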

Last, but most important for me: my tests showed lost page updates. Probably it is a bug or paranoia in my test
software. But may I ask you to check this code [1], which converts a PTRACK map to a list of block numbers? Do I get
the meaning of the PTRACK map right? Thank you very much.
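
For readers without the repository at hand, the kind of conversion in question can be sketched as below. This is not the WAL-G code at [1]. It assumes the map is a plain bitmap in which bit i of byte j (least significant bit first) marks block j*8+i as changed; whether PTRACK really uses that layout is exactly the open question. The function name blocksFromBitmap is hypothetical.

```go
package main

import "fmt"

// blocksFromBitmap lists the block numbers whose bit is set in bitmap.
// Assumption: bit i of byte j, counting from the least significant bit,
// stands for block j*8+i. The real PTRACK layout may differ.
func blocksFromBitmap(bitmap []byte) []uint32 {
	var blocks []uint32
	for j, b := range bitmap {
		for i := 0; i < 8; i++ {
			if b&(1<<uint(i)) != 0 {
				blocks = append(blocks, uint32(j*8+i))
			}
		}
	}
	return blocks
}

func main() {
	// 0x05 sets bits 0 and 2 of byte 0; 0x80 sets bit 7 of byte 1.
	fmt.Println(blocksFromBitmap([]byte{0x05, 0x80})) // prints [0 2 15]
}
```

If the bit order within a byte were reversed, or the map carried a header before the bitmap, a decoder like this would report the wrong blocks, which would look like lost page updates in a test.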

[0] https://wiki.postgresql.org/index.php?title=GSoC_2018#WAL-G_delta_backups_with_WAL_scanning_.282018.29
[1] https://github.com/wal-g/wal-g/blob/ptrack/pagefile.go#L167-L173


