Re: [RFC] Incremental backup v3: incremental PoC

From: Robert Haas
Subject: Re: [RFC] Incremental backup v3: incremental PoC
Date:
Msg-id CA+TgmoaG4QVdVMVdaK0r-=8B2Z=Os7qfXcr1vob=-jry1pNvsg@mail.gmail.com
In reply to: [RFC] Incremental backup v3: incremental PoC  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
Responses: Re: [RFC] Incremental backup v3: incremental PoC  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
           Re: [RFC] Incremental backup v3: incremental PoC  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
List: pgsql-hackers
On Tue, Oct 14, 2014 at 1:17 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> I would like to replace the getMaxLSN function with a more-or-less persistent
> structure which contains the maxLSN for each data segment.
>
> To make it work I would hook into the ForwardFsyncRequest() function in
> src/backend/postmaster/checkpointer.c and update an in memory hash every
> time a block is going to be fsynced. The structure could be persisted on
> disk at some time (probably on checkpoint).
>
> I think a good key for the hash would be a BufferTag with blocknum
> "rounded" to the start of the segment.
>
> I'm here asking for comments and advice on how to implement this in an
> acceptable way.

I'm afraid this is going to be quite tricky to implement.  There's no
way to make the in-memory hash table large enough that it can
definitely contain all of the entries for the entire database.  Even
if it's big enough at a certain point in time, somebody can create
100,000 new tables and now it's not big enough any more.  This is not
unlike the problem we had with the visibility map and free space map
before 8.4 (and you probably remember how much fun that was).

I suggest leaving this out altogether for the first version.  I can
think of three possible ways that we can determine which blocks need
to be backed up.  One, just read every block in the database and look
at the LSN of each one.  Two, maintain a cache of LSN information on a
per-segment (or smaller) basis, as you suggest here.  Three, scan the
WAL generated since the incremental backup and summarize it into a
list of blocks that need to be backed up.  This last idea could either
be done when the backup is requested, or it could be done as the WAL
is generated and used to populate the LSN cache.  In the long run, I
think some variant of approach #3 is likely best, but in the short
run, approach #1 (scan everything) is certainly easiest.  While it
doesn't optimize I/O, it still gives you the benefit of reducing the
amount of data that needs to be transferred and stored, and that's not
nothing.  If we get that much working, we can improve things more
later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


