WIP patch for parallel pg_dump

From: Joachim Wieland
Subject: WIP patch for parallel pg_dump
Date:
Msg-id: AANLkTin27_TOVU5KF90Ou3qGnT+d76JPgaDbrDLZBaxV@mail.gmail.com
Replies: Re: WIP patch for parallel pg_dump  (Joachim Wieland <joe@mcknight.de>)
List: pgsql-hackers
This is the second patch for parallel pg_dump, now with the part that actually
parallelizes the whole thing. More precisely, it adds parallel backup/restore
to pg_dump/pg_restore for the directory archive format and keeps the parallel
restore support of the custom archive format. Since my directory archive format
patch also includes a prototype of liblzf compression, that compression can be
combined with any of the backup/restore scenarios just mentioned. This patch
applies on top of the previous directory patch.

You would run a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname

In previous discussions there was a request to add support for multiple
directories, which I have done as well, so that you can also run

$ pg_dump -j 4 -Fd -f dir1:dir2:dir3 dbname

to distribute the data equally among those three directories (the syntax is
still open for discussion; I am not entirely happy with the colon either...)
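To illustrate the idea of spreading the data evenly, here is a minimal sketch in Python of one way such balancing could work: assign each object, largest first, to the directory with the least data so far. This is only an illustration of the general technique; the names and the exact strategy the patch uses are assumptions, not taken from the patch itself.

```python
def distribute(objects, dirs):
    """Greedy size balancing: walk objects from largest to smallest and
    assign each one to the directory with the smallest total so far."""
    load = {d: 0 for d in dirs}          # bytes assigned to each directory
    assignment = {}
    for name, size in sorted(objects, key=lambda o: o[1], reverse=True):
        target = min(dirs, key=lambda d: load[d])  # least-loaded directory
        assignment[name] = target
        load[target] += size
    return assignment

# Hypothetical objects with sizes; distribute them over three directories.
objs = [("big", 100), ("mid", 60), ("small", 30), ("tiny", 10)]
print(distribute(objs, ["dir1", "dir2", "dir3"]))
```

Greedy assignment like this keeps the directories roughly balanced without needing to know all sizes up front in any particular order.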

The dump always starts with the largest objects, as estimated from the
relpages column of pg_class, which should give a good approximation. The order
in which objects are restored is determined by the dependencies among them
(the same mechanism already used by the parallel restore of the custom archive
format).
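The restore-side scheduling described above can be sketched as a topological sort that, among the objects whose dependencies are already satisfied, picks the largest one next. This is a simplified illustration under assumed data structures, not the actual pg_restore implementation.

```python
import heapq

def restore_order(sizes, deps):
    """Order objects for restore: every entry in deps[x] must come before x,
    and among the currently ready objects the largest is restored first."""
    pending = {obj: set(d) for obj, d in deps.items()}
    for obj in sizes:
        pending.setdefault(obj, set())
    # Max-heap on size, implemented by negating the size for heapq.
    ready = [(-sizes[o], o) for o, d in pending.items() if not d]
    heapq.heapify(ready)
    order = []
    while ready:
        _, obj = heapq.heappop(ready)
        order.append(obj)
        # Mark this object done; anything whose last dependency it was
        # becomes ready.
        for other, d in pending.items():
            if obj in d:
                d.remove(obj)
                if not d:
                    heapq.heappush(ready, (-sizes[other], other))
    return order

# Hypothetical example: an index that depends on its table.
sizes = {"table_a": 500, "table_b": 200, "index_a": 50}
deps = {"index_a": ["table_a"]}
print(restore_order(sizes, deps))
```

Here index_a cannot appear before table_a, and among the independent tables the larger one is scheduled first.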

The file test.sh includes some example commands that I have run here as a kind
of regression test that should give you an impression of how to call it from the
command line.

One thing that is currently missing is proper support for Windows; this is the
next thing I will be working on. Also, this version still emits quite a bit of
debug information about what the processes are doing, so don't try to pipe the
pg_dump output anywhere (even when not running in parallel); it will probably
just not work...

The missing part that would make parallel pg_dump work with no strings attached
is snapshot synchronization. As long as there are no synchronized snapshots,
you would need to stop writing to your database before starting the parallel
pg_dump. However, it turns out that most often, when you are especially
concerned about a fast dump, you have shut down your applications anyway (which
is why you are so concerned about speed in the first place). Typical cases are
database migrations from one host or platform to another, or database upgrades
without pg_migrator.


Joachim
