Re: WIP patch for parallel pg_dump

From: Andrew Dunstan
Subject: Re: WIP patch for parallel pg_dump
Date:
Msg-id: 4CF8EA53.5040101@dunslane.net
In reply to: Re: WIP patch for parallel pg_dump  (Joachim Wieland <joe@mcknight.de>)
Responses: Re: WIP patch for parallel pg_dump  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On 12/02/2010 11:44 PM, Joachim Wieland wrote:
> On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> In particular, this issue *has* been discussed before, and there was a
>> consensus that preserving dump consistency was a requirement.  I don't
>> think that Joachim gets to bypass that decision just by submitting a
>> patch that ignores it.
> I am not trying to bypass anything here :)  Regarding the locking
> issue I probably haven't done sufficient research, at least I managed
> to miss the emails that mentioned it. Anyway, that seems to be solved
> now fortunately, I'm going to implement your idea over the weekend.
>
> Regarding snapshot cloning and dump consistency, I brought this up
> already several months ago and asked if the feature is considered
> useful even without snapshot cloning. And actually it was you who
> motivated me to work on it even without having snapshot consistency...
>
> http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php
>
> In my patch pg_dump emits a warning when called with -j, if you feel
> better with an extra option
> --i-know-that-i-have-no-synchronized-snapshots, fine with me :-)
>
> In the end we provide a tool with limitations, it might not serve all
> use cases but there are use cases that would benefit a lot. I
> personally think this is better than to provide no tool at all...
>

I think Tom's statement there:

> I think migration to a new server version (that's too incompatible for
> PITR or pg_migrate migration) is really the only likely use case.

is just wrong. Say you have a site that's open 24/7, but there is a 
window of, say, six hours each day when it's almost, but not quite, 
quiet. You want to be able to make your disaster recovery dump within 
that window, and the low level of traffic means you can afford the 
degraded performance that might result from a parallel dump. Or say you 
have a hot standby machine from which you want to make the dump, but 
want to set max_standby_*_delay as low as possible. These are both cases 
where you might want parallel dump and yet need dump consistency. I have 
a client currently considering the latter setup, and the timing 
tolerances are a little tricky. The times at which the system is in a 
state we want dumped are fixed, and we want to be sure the dump is 
finished by the next time such a window rolls around. (This is a system 
that in effect makes one giant state change at a time.) If we can't 
complete the dump in that window, a delay is introduced into the 
system's critical path. Parallel dump will be very useful in helping us 
avoid that situation, but only if it's properly consistent.

I think Josh Berkus' comments in the thread you mentioned are correct:

> Actually, I'd say that there's a broad set of cases of people who want
> to do a parallel pg_dump while their system is active.  Parallel pg_dump
> on a stopped system will help some people (for migration, particularly)
> but parallel pg_dump with snapshot cloning will help a lot more people.



cheers

andrew



