Re: finding changed blocks using WAL scanning

From: Robert Haas
Subject: Re: finding changed blocks using WAL scanning
Msg-id: CA+TgmobjopcytHN35czB9PG1vqwHcW3mwzoTwF7HMVdH+7WU9Q@mail.gmail.com
In reply to: Re: finding changed blocks using WAL scanning  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: finding changed blocks using WAL scanning  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

On Fri, Apr 19, 2019 at 8:39 PM Stephen Frost <sfrost@snowman.net> wrote:
> While I do think we should at least be thinking about the load caused
> from scanning the WAL to generate a list of blocks that are changed, the
> load I was more concerned with in the other thread is the effort
> required to actually merge all of those changes together over a large
> amount of WAL.  I'm also not saying that we couldn't have either of
> those pieces done as a background worker, just that it'd be really nice
> to have an external tool (or library) that can be used on an independent
> system to do that work.

Oh.  Well, I already explained my algorithm for doing that upthread,
which I believe would be quite cheap.

1. When you generate the .modblock files, stick all the block
references into a buffer.  qsort().  Dedup.  Write out in sorted
order.

2. When you want to use a bunch of .modblock files, do the same thing
MergeAppend does, or what merge-sort does when it does a merge pass.
Read the first 1MB of each file (or whatever amount).  Repeatedly pull
an item from whichever file has the lowest remaining value, using a
binary heap.  When no buffered data remains for a particular file,
read another chunk from that file.
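
A minimal sketch of the two steps above, in Python for brevity. The on-disk
layout is an assumption made only for illustration: each block reference is a
hypothetical fixed-size record of three big-endian 32-bit integers (relation
id, fork number, block number), and the names write_modblock, read_refs and
merge_modblocks are invented here. Dropping duplicates that appear in more
than one file is also an assumption about what the consumer wants.

```python
import heapq
import struct

# Hypothetical record layout: (relation id, fork number, block number).
REC = struct.Struct(">III")


def write_modblock(path, refs):
    """Step 1: buffer the block references, sort, dedup, write in sorted order."""
    with open(path, "wb") as f:
        for ref in sorted(set(refs)):
            f.write(REC.pack(*ref))


def read_refs(path, buf_bytes):
    """Yield records from one file, refilling a buffer of buf_bytes at a time."""
    # Round down to a whole number of records, but never below one record.
    chunk = max(REC.size, buf_bytes - buf_bytes % REC.size)
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                return
            yield from REC.iter_unpack(data)


def merge_modblocks(paths, buf_bytes=128 * 1024):
    """Step 2: k-way merge driven by a binary heap, one buffer per input file."""
    streams = [read_refs(p, buf_bytes) for p in paths]
    heap = []
    for i, stream in enumerate(streams):
        first = next(stream, None)
        if first is not None:
            heap.append((first, i))
    heapq.heapify(heap)

    last = None
    while heap:
        ref, i = heap[0]  # lowest remaining value across all files
        if ref != last:   # skip references already emitted from another file
            yield ref
            last = ref
        nxt = next(streams[i], None)
        if nxt is None:
            heapq.heappop(heap)  # this file is exhausted
        else:
            heapq.heapreplace(heap, (nxt, i))
```

Memory use is roughly len(paths) * buf_bytes, since each input only ever
holds one buffer's worth of records at a time.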

If each .modblock file covers 1GB of WAL, you could merge the data from
across 1TB of WAL using only 1GB of memory, and that's assuming you
have a 1MB buffer for each .modblock file.  You probably don't need
such a large buffer.  If you use, say, a 128kB buffer, you could merge
the data from across 8TB of WAL using 1GB of memory.  And if you have
8TB of WAL and you can't spare 1GB for the task of computing which
blocks need to be included in your incremental backup, it's time for a
hardware upgrade.
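
The arithmetic behind those figures can be checked with a short Python
sketch. It assumes binary units (1GB = 1024MB) and one read buffer per
.modblock file; wal_covered is a name made up for this example.

```python
KB, MB, GB, TB = 1024, 1024**2, 1024**3, 1024**4


def wal_covered(memory, buf_per_file, wal_per_file=GB):
    """Amount of WAL whose .modblock files can be merged in one pass."""
    files = memory // buf_per_file  # one buffer per open .modblock file
    return files * wal_per_file


one_mb_buffers = wal_covered(GB, MB)        # 1024 files, 1GB of WAL each
small_buffers = wal_covered(GB, 128 * KB)   # 8192 files, 1GB of WAL each
```

With 1MB buffers the result is 1TB of WAL; with 128kB buffers it is 8TB,
matching the numbers above.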

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company