directory archive format for pg_dump

From	Joachim Wieland
Subject	directory archive format for pg_dump
Date	14 November 2010, 23:18:25
Msg-id	AANLkTimUELTXwRSQDQNwxik_k1y3YcH1u-9NgHZqpi9e@mail.gmail.com
Replies	Re: directory archive format for pg_dump
List	pgsql-hackers

This is the first of two patches for parallel pg_dump. In particular, this
patch adds a new pg_dump archive type which can save pg_dump data to a
directory, with each table/blob being a file so that several processes can
write to different files in parallel.
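A minimal sketch of why a directory makes parallel dumping easy: each worker owns exactly one output file, so the workers never need to coordinate while writing. This is a hypothetical Python illustration, not code from the patch. The `<table>.dat` file naming, the `dump_to_directory` helper, and the use of threads as stand-ins for pg_dump's worker processes are all assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_table(out_dir: str, table: str, rows: list[str]) -> str:
    """Write one table's data to its own file; no other worker touches it."""
    path = os.path.join(out_dir, f"{table}.dat")  # hypothetical naming scheme
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")
    return path


def dump_to_directory(out_dir: str, tables: dict[str, list[str]], workers: int = 4) -> list[str]:
    """Dump every table into out_dir, one file per table, several at a time."""
    os.makedirs(out_dir, exist_ok=True)
    # Threads stand in for worker processes here; since each task writes
    # a different file, no locking is required.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(write_table, out_dir, name, rows)
                   for name, rows in tables.items()]
        return [fut.result() for fut in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = dump_to_directory(d, {"customers": ["1\tAlice"], "orders": ["10\t1\t9.99"]})
        print(sorted(os.path.basename(p) for p in paths))
```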

Since all of the compression code currently lives in the custom-format backup
code, the first thing I did was refactor the compression functions into a
separate file. While at it, I added support for liblzf compression.
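The refactoring idea, a single interface behind which several compression algorithms can sit, might look roughly like this. It is a hypothetical Python illustration rather than the patch's C code: `Compressor` and `make_compressor` are made-up names, and because liblzf has no Python standard-library binding, only "none" and "zlib" are wired up.

```python
import zlib
from typing import Callable, NamedTuple


class Compressor(NamedTuple):
    """A pair of functions that the archive writer/reader can call blindly."""
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


def make_compressor(algorithm: str, level: int = 6) -> Compressor:
    """Pick an implementation by name; callers never see algorithm details."""
    if algorithm == "none":
        return Compressor(lambda b: b, lambda b: b)
    if algorithm == "zlib":
        return Compressor(lambda b: zlib.compress(b, level), zlib.decompress)
    # An "lzf" entry would only be registered when built with liblzf,
    # analogous to keeping that code inside #ifdefs.
    raise ValueError(f"unsupported compression algorithm: {algorithm!r}")
```

Adding another algorithm then means adding one branch (or one table entry) rather than touching every archive format that compresses data.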

Writing the backup to a directory has the disadvantage that your backup now
consists of a bunch of files, and you have to make sure not to lose any of them
or mix up files from different backup sets. Therefore, I have added a -k switch
that checks whether a directory backup set is complete. To do this, every
backup gets its own id (basically a random md5sum), which is copied into every
file (both the TOC and the data files). The TOC also records the size of each
data file, so it can detect whether a file has been truncated.
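The completeness check described above can be sketched as follows: stamp a random per-backup id into every file, record each data file's size in the TOC, then verify both. This is a hypothetical Python illustration, not the patch's implementation; the `BACKUP-ID` header line, the plain-text `toc.dat` layout, and the helper names `write_backup` and `check_backup` are all assumptions.

```python
import hashlib
import os

HEADER_PREFIX = b"BACKUP-ID "  # hypothetical header format


def write_backup(out_dir: str, tables: dict[str, bytes]) -> str:
    """Write one data file per table plus a TOC; all share the same backup id."""
    os.makedirs(out_dir, exist_ok=True)
    backup_id = hashlib.md5(os.urandom(16)).hexdigest()  # random per-backup id
    header = HEADER_PREFIX + backup_id.encode() + b"\n"
    sizes = {}
    for name, data in tables.items():
        fname = f"{name}.dat"
        with open(os.path.join(out_dir, fname), "wb") as f:
            f.write(header + data)
        sizes[fname] = len(header) + len(data)  # full file size, header included
    with open(os.path.join(out_dir, "toc.dat"), "wb") as f:
        f.write(header)
        for fname, size in sizes.items():
            f.write(f"{fname} {size}\n".encode())
    return backup_id


def check_backup(out_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the set looks complete."""
    problems = []
    with open(os.path.join(out_dir, "toc.dat"), "rb") as f:
        header = f.readline()
        entries = [line.decode().split() for line in f if line.strip()]
    for fname, size in entries:
        path = os.path.join(out_dir, fname)
        if not os.path.exists(path):
            problems.append(f"{fname}: missing")
            continue
        with open(path, "rb") as f:
            if f.readline() != header:  # id differs: file from another backup
                problems.append(f"{fname}: belongs to a different backup")
                continue
        if os.path.getsize(path) != int(size):
            problems.append(f"{fname}: size mismatch (truncated?)")
    return problems
```

Running the check on a directory where one data file was truncated, deleted, or copied in from another backup reports exactly that file, which is the kind of mistake the switch is meant to catch.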

Regarding lzf compression, the last discussion was here:

http://archives.postgresql.org/pgsql-hackers/2010-04/msg00442.php

I have included it mainly so that there is more than one compression algorithm
to build the framework around, and so that people can simply compile and run it
and see what they get. In my tests, when I run a backup with lzf compression,
the postgres backend uses 100% of one CPU and pg_dump uses 15% of another.
With zlib, however, I get 100% for pg_dump (zlib) and 70% for postgres.
Specifying the fastest zlib compression level, 1, gives me 50% pg_dump and
100% postgres. lzf compression can be taken out of the code in about two
minutes, since it is all inside #ifdefs, so please see lzf just as an optional
addition to the directory patch rather than as a main feature.
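A rough way to see the CPU-versus-size trade-off between zlib levels on your own data is to time compression at a few levels. This is a hypothetical Python sketch, not how the numbers above were measured; the synthetic tab-separated sample and the `measure` helper are assumptions, and absolute timings will vary by machine.

```python
import time
import zlib


def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (seconds spent compressing, compressed/original size ratio)."""
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return elapsed, len(out) / len(data)


if __name__ == "__main__":
    # Synthetic rows loosely resembling COPY output (hypothetical data).
    sample = b"".join(
        f"{i}\tcustomer_{i % 500}\t{i * 3.14:.2f}\n".encode() for i in range(200_000)
    )
    for level in (1, 6, 9):  # 6 is zlib's default level
        secs, ratio = measure(sample, level)
        print(f"level {level}: {secs * 1000:.1f} ms, ratio {ratio:.3f}")
```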

I am also submitting a WIP patch, on top of this one, that implements the
parallel version of pg_dump. It is not completely finished yet, but I am
releasing it as a WIP patch so that you can see the overall picture and already
play with it. Hopefully I can also get some feedback on whether I am heading in
the right direction.

A small shell script (test.sh) is included that lists some of the commands, to
give people a quick overview of how to call it.

Joachim

Attachments

pg_dump-directory.diff