Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

Поиск
Список
Период
Сортировка
От Andrew Dunstan
Тема Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Дата
Msg-id 544ECDD7.1040500@dunslane.net
обсуждение исходный текст
Ответ на Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Список pgsql-hackers
On 10/27/2014 05:58 PM, Tomas Vondra wrote:
> On 27.10.2014 17:24, Heikki Linnakangas wrote:
>> On 10/27/2014 03:46 PM, Tom Lane wrote:
>>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>>> On 10/27/2014 03:21 PM, Tomas Vondra wrote:
>>>>> Thinking about this a bit more, do we really need a full checkpoint?
>>>>> That
>>>>> is a checkpoint of all the databases in the cluster? Why
>>>>> checkpointing the
>>>>> source database is not enough?
>>>> A full checkpoint ensures that you always begin recovery *after* the
>>>> DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
>>>> crash recovery (except when you crash before the transaction commits, in
>>>> which case it doesn't matter if the new database's directory is borked).
>>> Yeah.  After re-reading the 2005 thread, I wonder if we shouldn't just
>>> bite the bullet and redesign CREATE DATABASE as you suggest, ie, WAL-log
>>> all the copied files instead of doing a "cp -r"-equivalent directory
>>> copy.
>>> That would fix a number of existing replay hazards as well as making it
>>> safe to do what Tomas wants.  In the small scale this would cause more
>>> I/O
>>> (2 copies of the template database's data) but in production situations
>>> we might well come out ahead by avoiding a forced checkpoint of the rest
>>> of the cluster.  Also I guess we could skip WAL-logging if WAL archiving
>>> is off, similarly to the existing optimization for CREATE INDEX etc.
>> That would be a nasty surprise for anyone who's using CREATE DATABASE
>> as a fast way to clone a large database. But I would be OK with that,
>> at least if we can skip the WAL-logging with wal_level=minimal.
> That's true. Sadly, I can't think of a solution that would address both
> use cases at the same time :-(
>
> The only thing I can think of is having two CREATE DATABASE "flavors".
> One keeping the current approach (suitable for fast cloning) and one
> with the WAL logging (minimizing the CREATE DATABASE duration the impact
> on other backends).
>
> It will probably make the code significantly more complex, which is not
> exactly desirable, I guess. Also, if we keep the current code (even if
> only as a special case) it won't eliminate the existing replay hazards
> (which was one of the Tom's arguments for biting the bullet).
>
> I'm also thinking that for wal_level=archive and large databases, this
> won't really eliminate the checkpoint as it will likely generate enough
> WAL to hit checkpoint_segments and trigger a checkpoint anyway. No?
>
> That being said, our CREATE DATABASE docs currently say this
>
>      Although it is possible to copy a database other than template1 by
>      specifying its name as the template, this is not (yet) intended as
>      a general-purpose "COPY DATABASE" facility. The principal
>      limitation is that no other sessions can be connected to the
>      template database while it is being copied. CREATE DATABASE will
>      fail if any other connection exists when it starts; otherwise, new
>      connections to the template database are locked out until CREATE
>      DATABASE completes. See Section 21.3 for more information.
>
> I think that this limitation pretty much means no one should use CREATE
> DATABASE for cloning live databases in production environment (because
> of the locking).
>
> It also seems to me the "general-purpose COPY DATABASE" described in the
> docs is what we're describing in this thread.
>


Notwithstanding what the docs say, I have seen CREATE DATABASE used 
plenty of times, and quite effectively, to clone databases. I don't 
think making it do twice the IO in the general case is going to go down 
well.

cheers

andrew




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: pgcrypto: PGP signatures
Следующее
От: Andres Freund
Дата:
Сообщение: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT