directory archive format for pg_dump

From: Joachim Wieland
Subject: directory archive format for pg_dump
Msg-id: AANLkTimUELTXwRSQDQNwxik_k1y3YcH1u-9NgHZqpi9e@mail.gmail.com
Replies: Re: directory archive format for pg_dump  (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
List: pgsql-hackers

This is the first of two patches for parallel pg_dump. In particular, this
patch adds a new pg_dump archive type which can save pg_dump data to a
directory, with each table/blob being a file so that several processes can
write to different files in parallel.
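
To illustrate why one file per table makes parallel writing straightforward,
here is a minimal Python sketch. It is not code from the patch: the table
contents, the ".dat" file naming and the helper names are assumptions made up
for the example, and it uses threads where the patch talks about processes.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for table contents; a real dump would stream rows
# from the server instead.
TABLES = {
    "customers": ["1\talice\n", "2\tbob\n"],
    "orders": ["10\t1\t9.99\n"],
    "items": [],
}


def dump_table(directory, name, rows):
    """Write one table's rows to its own file; return (name, size in bytes)."""
    path = os.path.join(directory, name + ".dat")  # naming is an assumption
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(rows)
    return name, os.path.getsize(path)


def dump_all(directory, tables, workers=2):
    """Dump every table concurrently; no two workers share an output file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(dump_table, directory, name, rows)
            for name, rows in tables.items()
        ]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(dump_all(d, TABLES))
```

Because every worker owns its output file, the workers need no locking
around the writes.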

Since compression currently lives entirely in the custom format backup code,
the first thing I did was refactor the compression functions into a separate
file. While at it, I added support for liblzf compression.
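
The general shape of such a refactoring, where callers pick an algorithm by
name and no longer care which library does the work, might look like the
following Python sketch. The class and function names are hypothetical and do
not mirror the patch's C code, and lzf is left out because the Python standard
library has no binding for it.

```python
import zlib


class NoCompression:
    """Pass data through unchanged."""

    def compress(self, data):
        return data

    def decompress(self, data):
        return data


class ZlibCompression:
    """Compress with zlib at a configurable level (1 = fastest, 9 = smallest)."""

    def __init__(self, level=6):
        self.level = level

    def compress(self, data):
        return zlib.compress(data, self.level)

    def decompress(self, data):
        return zlib.decompress(data)


# Registry of available algorithms; adding one means adding one entry here.
COMPRESSORS = {"none": NoCompression, "zlib": ZlibCompression}


def get_compressor(name, **options):
    """Look up a compressor by name and construct it with the given options."""
    try:
        factory = COMPRESSORS[name]
    except KeyError:
        raise ValueError(f"unsupported compression: {name!r}") from None
    return factory(**options)


if __name__ == "__main__":
    payload = b"some table data\n" * 1000
    c = get_compressor("zlib", level=1)
    packed = c.compress(payload)
    print(len(payload), len(packed), c.decompress(packed) == payload)
```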

Writing the backup to a directory has the disadvantage that your backup now
consists of a bunch of files, and you must make sure not to lose files or mix
up files from different backup sets. Therefore, I have added a -k switch that
checks whether a directory backup set is complete. To make this possible, every
backup has a different id (basically a random md5sum) which is copied into
every file (both TOC and data files). The TOC also records the size of each
data file, so it can detect whether a file has been truncated for some reason.
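
The check described above can be sketched in Python as follows. Everything
about the on-disk layout here is an assumption for illustration (a JSON file
named toc.json, a 32-character id at the start of each data file, ".dat"
file names); the patch's actual file format is not shown in this message.

```python
import json
import os
import tempfile
import uuid

HEADER_LEN = 32  # length of the hex backup id stored at the start of each file


def write_backup_set(directory, tables):
    """Write one data file per table plus a TOC; return the backup id."""
    backup_id = uuid.uuid4().hex  # 32 hex chars, stand-in for the random md5sum
    toc = {"backup_id": backup_id, "files": {}}
    for name, payload in tables.items():
        filename = name + ".dat"
        path = os.path.join(directory, filename)
        with open(path, "wb") as f:
            f.write(backup_id.encode("ascii"))  # id goes into every data file
            f.write(payload)
        toc["files"][filename] = os.path.getsize(path)
    with open(os.path.join(directory, "toc.json"), "w", encoding="utf-8") as f:
        json.dump(toc, f)
    return backup_id


def check_backup_set(directory):
    """Return a list of problems; an empty list means the set looks complete."""
    with open(os.path.join(directory, "toc.json"), encoding="utf-8") as f:
        toc = json.load(f)
    problems = []
    for name, expected_size in toc["files"].items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            problems.append(f"{name}: missing")
            continue
        with open(path, "rb") as f:
            file_id = f.read(HEADER_LEN).decode("ascii", errors="replace")
        if file_id != toc["backup_id"]:
            problems.append(f"{name}: belongs to a different backup set")
        elif os.path.getsize(path) != expected_size:
            problems.append(f"{name}: size differs from TOC (truncated?)")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_backup_set(d, {"a": b"hello", "b": b"world!"})
        print(check_backup_set(d))
```

Storing the id in the data files as well as the TOC is what lets the check
catch a file that has the right name but comes from another backup run.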

Regarding lzf compression, the last discussion was here:

http://archives.postgresql.org/pgsql-hackers/2010-04/msg00442.php

I have included it so that there are actually multiple compression algorithms
to build a framework for, and so that people can just compile and run it and
see what they get. In my tests, when I run a backup with lzf compression, the
postgres backend uses 100% of one CPU and pg_dump uses 15% of another CPU.
Running with zlib, however, gives me 100% for pg_dump and 70% for postgres.
Specifying the fastest zlib compression level of 1 gives me 50% for pg_dump and
100% for postgres. The lzf compression can be taken out of the code in about
two minutes since it's all in #ifdef's, so please see lzf just as an optional
addition to the directory patch rather than as a main feature.

I am also submitting a WIP patch that shows the parallel version of pg_dump,
which is a patch on top of this one. It is not completely ready yet, but I am
releasing it as a WIP patch so you can see the overall picture and can already
play with it. Hopefully I can also get some feedback on whether I am heading
in the right direction.

There is a small shell script included (test.sh) that lists some of the
commands, to give people a quick overview of how to call it.


Joachim
