Re: Streaming a base backup from master

From Greg Stark
Subject Re: Streaming a base backup from master
Date
Msg-id AANLkTimVrLsH=ox4=WnxwYAsy4LSKYRjAKkmRW=nFOJ8@mail.gmail.com
In response to Re: Streaming a base backup from master  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Streaming a base backup from master  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Sep 5, 2010 at 4:51 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:

> If you're working from a known good version of the database at some
> point, yes you are right you have more interesting options. If you
> don't you want something that will fix it.

Sure, in that case you want to restore from backup. Whatever you use
to do that, the net result is the same. I'm not sure rsync is actually
going to be much faster, though, since it still has to read all of the
existing database, which a normal restore doesn't have to do. If the
database has changed significantly, that's a lot of extra I/O, and
you're probably on a local network with a lot of bandwidth available.

What I'm talking about is how you *take* backups. Currently you have
to take a full backup, which could be a big job if you have a large
data warehouse. If only a small percentage of the database is changing,
you could use rsync to reduce the network bandwidth needed to transfer
your backup, but you still have to read the entire database and write
out the entire backup.

Incremental backups mean being able to read just the data blocks that
have been modified and write out a backup file with just those blocks.
When it comes time to restore, you restore the last full backup,
then any incremental backups since then, then replay any logs needed
to bring it to a consistent state.
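
As a rough illustration of that restore order, here is a toy sketch in
Python. It assumes a backup can be modelled as a mapping from block
number to block contents; the names `BlockMap` and `restore` are
hypothetical, not anything PostgreSQL provides, and the final WAL
replay step is left out.

```python
from typing import Dict, Iterable

# Hypothetical model: a backup is a map of block number -> block contents.
BlockMap = Dict[int, bytes]


def restore(full_backup: BlockMap, incrementals: Iterable[BlockMap]) -> BlockMap:
    """Start from the last full backup and overlay incrementals, oldest first."""
    restored = dict(full_backup)
    for incremental in incrementals:
        # A later incremental overwrites earlier versions of the same block.
        restored.update(incremental)
    return restored


full = {0: b"a0", 1: b"b0", 2: b"c0"}
inc1 = {1: b"b1"}              # only block 1 changed
inc2 = {1: b"b2", 2: b"c2"}    # blocks 1 and 2 changed later
result = restore(full, [inc1, inc2])
```

Block 0 comes from the full backup, while blocks 1 and 2 come from the
newest incremental that contains them. Log replay would then run on top
of this state.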

I think that description pretty much settles the question in my mind.
The implementation choice of scanning the WAL to find all the changed
blocks is more relevant to the use cases where incremental backups are
useful. If you still have to read the entire database, then there's not
all that much to be gained except storage space. If you scan the WAL,
then you can avoid reading most of your large data warehouse to
generate the incremental and only read the busy portion.
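
A minimal sketch of the WAL-scanning idea, in Python. It assumes a
heavily simplified WAL in which each record just names the relation and
block it touched; the record layout and the functions `changed_blocks`
and `take_incremental` are made up for illustration and do not reflect
the real WAL format.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, simplified WAL record: (relation name, block number).
BlockRef = Tuple[str, int]


def changed_blocks(wal: Iterable[BlockRef]) -> Set[BlockRef]:
    """Collect every block referenced by the WAL since the last backup."""
    return set(wal)


def take_incremental(database: Dict[BlockRef, bytes],
                     wal: Iterable[BlockRef]) -> Dict[BlockRef, bytes]:
    """Read only the blocks the WAL says were modified."""
    return {ref: database[ref] for ref in sorted(changed_blocks(wal))}


# A large, mostly idle table plus a small busy one.
database = {("warehouse", n): b"cold" for n in range(1000)}
database[("orders", 0)] = b"hot0"
database[("orders", 1)] = b"hot1"

wal = [("orders", 0), ("orders", 1), ("orders", 0)]
incremental = take_incremental(database, wal)
```

Here the incremental holds two blocks, and none of the 1000 warehouse
blocks are read to produce it.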

In the use case where the database is extremely busy but is writing and
rewriting the same small number of blocks over and over, even scanning
the WAL might not be ideal. For that use case it might be more useful
to generate, at every checkpoint, a kind of WAL summary which lists all
the blocks touched since the previous checkpoint. But that could be a
later optimization.
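
To make the summary idea concrete, here is a toy sketch in Python. It
assumes the WAL is a stream of (relation, block) references with a
checkpoint marker in between; the `CHECKPOINT` marker and the
`summarize` function are hypothetical names used only for this example.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical, simplified WAL record: (relation name, block number).
BlockRef = Tuple[str, int]
CHECKPOINT = None  # hypothetical marker for a checkpoint in the stream


def summarize(wal: Iterable[Optional[BlockRef]]) -> List[Set[BlockRef]]:
    """Emit one set of touched blocks per checkpoint."""
    summaries: List[Set[BlockRef]] = []
    touched: Set[BlockRef] = set()
    for record in wal:
        if record is CHECKPOINT:
            summaries.append(touched)  # close the summary at the checkpoint
            touched = set()
        else:
            touched.add(record)        # repeated writes collapse to one entry
    return summaries


wal = ([("hot", 7)] * 1000 + [("hot", 8)] * 500 + [CHECKPOINT]
       + [("hot", 7)] * 2000 + [CHECKPOINT])
summaries = summarize(wal)
blocks_to_back_up = set().union(*summaries)
```

The 3500 WAL records collapse into two short summaries, so a backup
would only need to read the summaries to learn that two blocks changed.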


-- 
greg

