Discussion: Re: Methods to quickly spin up copies of an existing database

Re: Methods to quickly spin up copies of an existing database

From:
Kevin Wilkinson
Date:

if you are able/willing to use ZFS (rather than ext4, xfs, ...) to store your database, then it might work for you. ZFS is copy-on-write so it can very quickly clone a database.
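
For example, a minimal sketch of what that could look like (the pool/dataset names and the port are made up):

    # snapshot the dataset holding the template cluster, then clone it
    zfs snapshot tank/pgdata@template
    zfs clone tank/pgdata@template tank/pgdata_clone_1234

    # the clone mounts at /tank/pgdata_clone_1234 by default; start a second
    # postgres there on its own port (it runs crash recovery if the template
    # was live when snapshotted)
    pg_ctl -D /tank/pgdata_clone_1234 -o "-p 5433" start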

kevin

On 3/1/2019 12:08 PM, Arjun Ranade wrote:
I'm working on a project that requires on-demand creation of a fresh database as quick as possible (seconds).  Essentially, this is a build server that will require a cloned instance of Postgres to run unit tests on.  So the pattern of use would be:

  • Build is triggered
  • New postgres instance is created and connection details are provided to build server
  • Build runs unit tests that connect to the database
  • Build completes, database is no longer needed and can be thrown away
The "template" database will be approximately 50gb in size so my question is what is the fastest way to spin up new instances of this database?  I've thought of using "CREATE DATABASE WITH TEMPLATE..." but that takes too long.  I've also thought about cloning the $PGDATA directory, changing the port number in postgresql.conf, and starting a new instance of postgres pointing to the cloned data directory.

Both of these methods take longer than I'd like, so I'm wondering if anyone else has this use case and what methods they've used to solve this.

Thanks,
Arjun

Re: Methods to quickly spin up copies of an existing database

From:
Kenneth Marshall
Date:
On Fri, Mar 01, 2019 at 11:57:30AM -0800, Kevin Wilkinson wrote:
> if you are able/willing to use ZFS (rather than ext4, xfs, ...) to
> store your database, then it might work for you. ZFS is
> copy-on-write so it can very quickly clone a database.
> 
> kevin

Hi Arjun,

Red Hat 7 does have LVM snapshots, which do something similar. Kevin is
correct: COW is the secret.
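
A rough sketch of that, assuming the template cluster's data directory
lives on its own logical volume (VG/LV names, size, and paths are made up):

    # create a COW snapshot of the logical volume holding $PGDATA
    lvcreate --snapshot --size 10G --name pgdata_snap /dev/vg_data/lv_pgdata

    # mount the snapshot (add -o nouuid if the filesystem is XFS) and start
    # a second postgres on another port
    mkdir -p /mnt/pgdata_snap
    mount /dev/vg_data/pgdata_snap /mnt/pgdata_snap
    pg_ctl -D /mnt/pgdata_snap -o "-p 5433" start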

Regards,
Ken


Methods to quickly spin up copies of an existing database

From:
Arjun Ranade
Date:
I'm working on a project that requires on-demand creation of a fresh database as quickly as possible (seconds).  Essentially, this is a build server that will require a cloned instance of Postgres to run unit tests on.  So the pattern of use would be:

  • Build is triggered
  • New postgres instance is created and connection details are provided to build server
  • Build runs unit tests that connect to the database
  • Build completes, database is no longer needed and can be thrown away
The "template" database will be approximately 50gb in size so my question is what is the fastest way to spin up new instances of this database?  I've thought of using "CREATE DATABASE WITH TEMPLATE..." but that takes too long.  I've also thought about cloning the $PGDATA directory, changing the port number in postgresql.conf, and starting a new instance of postgres pointing to the cloned data directory.

Both of these methods take longer than I'd like, so I'm wondering if anyone else has this use case and what methods they've used to solve this.

Thanks,
Arjun

Re: Methods to quickly spin up copies of an existing database

From:
Jerry Sievers
Date:
Kenneth Marshall <ktm@rice.edu> writes:

> On Fri, Mar 01, 2019 at 11:57:30AM -0800, Kevin Wilkinson wrote:
>
>> if you are able/willing to use ZFS (rather than ext4, xfs, ...) to
>> store your database, then it might work for you. ZFS is
>> copy-on-write so it can very quickly clone a database.
>> 
>> kevin
>
> Hi Arjun
>
> Redhat 7 does have LVM snapshots that does something similar. Kevin is
> correct, COW is the secret.

Going a bit further...

Any sort of storage backend that can support *atomic* snapshots across
*all* volumes (in case multiple tablespaces are involved) can be used to
get near-instantaneous cloning, where "instantaneous" means the time for
the actual snapshot plus crash recovery.

Even without *atomic* snaps, taking separate snaps very quickly and
combining them with PITR can let clones of high-churn systems sized in
TBs (as in our use case) be provisioned in about 1 minute.

Nothing but the most trivial system can be cloned rapidly, and perhaps
any number of times in succession, without employing thin provisioning,
copy-on-write (as mentioned already), etc.

Virtual copy becomes more and more compelling as physical size, or more
precisely *physical* copy time, grows.

HTH



>
> Regards,
> Ken
>
>

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net


Re: Methods to quickly spin up copies of an existing database

From:
Bruce Klein
Date:
Apologies for the low-tech suggestion, but if this really is a clone of a previously existing template, could the clone operation just be done ahead of time? I.e., have the build server keep X copies ready for use and generate additional copies as those are consumed, so that cloning is no longer on the critical path?
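
One hypothetical way to picture that pool (all names and the pool size are
invented, and this assumes some fast clone mechanism such as reflink copies):

    # keep N pre-made clones of the template data directory ready to claim
    POOL=/var/lib/pgsql/pool
    TEMPLATE=/var/lib/pgsql/template_data
    while [ "$(ls -1 "$POOL" | wc -l)" -lt 5 ]; do
        cp -a --reflink=auto "$TEMPLATE" "$POOL/clone_$(date +%s%N)"
    done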

On Fri, Mar 1, 2019 at 11:09 AM Jerry Sievers <gsievers19@comcast.net> wrote:
Kenneth Marshall <ktm@rice.edu> writes:

> On Fri, Mar 01, 2019 at 11:57:30AM -0800, Kevin Wilkinson wrote:
>
>> if you are able/willing to use ZFS (rather than ext4, xfs, ...) to
>> store your database, then it might work for you. ZFS is
>> copy-on-write so it can very quickly clone a database.
>>
>> kevin
>
> Hi Arjun
>
> Redhat 7 does have LVM snapshots that does something similar. Kevin is
> correct, COW is the secret.

Going a bit further...

Any sort of storage backend that can support *atomic* snapshots across
*all* volumes (in case multiple tablespaces ar involved), can be used to
permit $instantaneous cloning where instantaneous relates to the actual
snapshot time and crash recovery.

Inability to make *atomic* snaps but perhaps seperate snaps very
quickly, combined with PITR can result in clones of high-churn systems
sized in TBs (as in our use case) to be provisioned in about 1 minute.

Nothing but the most trivial system can be cloned rapidly and perhaps
any number of times in succession without employment of
thin-provisioning, copy-on-write (as mentioned already), etc.

                   Virtual copy is more and more compelling as physical
                   size, or more precisely, *physical* copy time grow.

HTH



>
> Regards,
> Ken
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net

Re: Methods to quickly spin up copies of an existing database

From:
Arjun Ranade
Date:
Pre-copying is not really an option since we could potentially need 1-X instances, so it needs to be scalable.  XFS also supports "cp --reflink", which I could run on a PGDATA directory before changing the port number.  That's probably the method I'll try first.
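
A sketch of that reflink route (paths and port are placeholders; the
template instance should be stopped, or the copy taken from a snapshot, so
the clone is consistent):

    # reflink copy: shares data blocks on XFS (made with reflink=1) or Btrfs,
    # so the "copy" is nearly instant regardless of database size
    cp -a --reflink=always /var/lib/pgsql/template_data /var/lib/pgsql/clone_1234
    rm -f /var/lib/pgsql/clone_1234/postmaster.pid

    # give the clone its own port and start it
    pg_ctl -D /var/lib/pgsql/clone_1234 -o "-p 5434" start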

We do use barman, but again a barman recover operation is pretty much just an rsync. 

I will first try XFS, then ZFS, and if those don't work, I'll look into a SAN option.



On Fri, Mar 1, 2019 at 4:29 PM Bruce Klein <brucek@gmail.com> wrote:
Apologies for the low tech suggestion, but if this really is a clone of a previously existing template, could the clone operation just be done ahead of time? I.e., have the build server keep X copies ready for use and generate additional copies as those are consumed, so that the cloning is no longer on the critical path?

On Fri, Mar 1, 2019 at 11:09 AM Jerry Sievers <gsievers19@comcast.net> wrote:
Kenneth Marshall <ktm@rice.edu> writes:

> On Fri, Mar 01, 2019 at 11:57:30AM -0800, Kevin Wilkinson wrote:
>
>> if you are able/willing to use ZFS (rather than ext4, xfs, ...) to
>> store your database, then it might work for you. ZFS is
>> copy-on-write so it can very quickly clone a database.
>>
>> kevin
>
> Hi Arjun
>
> Redhat 7 does have LVM snapshots that does something similar. Kevin is
> correct, COW is the secret.

Going a bit further...

Any sort of storage backend that can support *atomic* snapshots across
*all* volumes (in case multiple tablespaces ar involved), can be used to
permit $instantaneous cloning where instantaneous relates to the actual
snapshot time and crash recovery.

Inability to make *atomic* snaps but perhaps seperate snaps very
quickly, combined with PITR can result in clones of high-churn systems
sized in TBs (as in our use case) to be provisioned in about 1 minute.

Nothing but the most trivial system can be cloned rapidly and perhaps
any number of times in succession without employment of
thin-provisioning, copy-on-write (as mentioned already), etc.

                   Virtual copy is more and more compelling as physical
                   size, or more precisely, *physical* copy time grow.

HTH



>
> Regards,
> Ken
>
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net

how best to specify table constraints with "create table as"

From:
Kevin Wilkinson
Date:
i want to recluster an immutable table without locking the table and then add the table as a partition of a parent table. my plan was:
create table tbl_cpy as select * from tbl order by c1 asc;
alter table tbl_cpy alter column c1 set not null;
alter table tbl_cpy add constraint tbl_cpy_c1_range check (c1 >= lo and c1 < hi);
alter table tbl_parent attach partition tbl_cpy for values from (lo) to (hi);
i want this to run quickly. so, i populate the table with a parallel query (using create table as ...). i do NOT want attach partition to rescan the table (which it does to validate the value range) so i have to explicitly add constraints to tbl_cpy. but adding these constraints itself causes a scan of the table.

so, i want to specify the constraints when the table is created. but that seems not possible. if i first create an empty table with constraints, then i cannot populate it with a parallel query (since insert into is not parallel).

am i missing something? any ideas?

thanks,

kevin

Re: Methods to quickly spin up copies of an existing database

From:
Karsten Hilbert
Date:
On Fri, Mar 01, 2019 at 04:51:32PM -0500, Arjun Ranade wrote:

> Pre-copying is not really an option since we could potentially need 1-X
> instances so it needs to be scalable.

Would it work to have a PG instance with a ready-to-use
cluster/DB inside a VM, and make a copy of that as needed?
The copies then get thrown away.

Here, too, pre-copying a bunch of VMs would help. Disk space
is (apparently) cheaper than time (for your use case).
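
For instance, with libvirt (VM names are made up):

    # clone a stopped template VM; --auto-clone copies its disks and
    # generates new MAC addresses, so clones can be prepared ahead of time
    virt-clone --original pg_template_vm --name pg_clone_1234 --auto-clone
    virsh start pg_clone_1234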

Karsten
-- 
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B