Thread: Fastest way to duplicate a quite large database


Fastest way to duplicate a quite large database

From:
Edson Richter
Date:
Hi!

I have a database "Customer" with about 60 GB of data.
I know I can backup and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest"
as fast as possible (even faster than backup/restore) - better if in one
operation (something like "copy database A to B")?
I would like to run this every day, overnight, with minimal impact, to
prepare a test environment based on production data.

Thanks,

--
Best regards,

Edson Carlos Ericksson Richter



Re: Fastest way to duplicate a quite large database

From:
John R Pierce
Date:
On 4/12/2016 7:25 AM, Edson Richter wrote:
>
> I have a database "Customer" with about 60Gb of data.
> I know I can backup and restore, but this seems too slow.
>
> Is there any other option to duplicate this database as "CustomerTest"
> as fast as possible (even faster than backup/restore) - better if in
> one operation (something like "copy database A to B")?
> I would like to run this every day, overnight, with minimal impact to
> prepare a test environment based on production data.


copy to the same machine, or copy to a different test server? different
answers.



--
john r pierce, recycling bits in santa cruz



Re: Fastest way to duplicate a quite large database

From:
Edson Richter
Date:
Same machine, same cluster - just different database name.

Best regards,

Edson Carlos Ericksson Richter

On 12/04/2016 11:46, John R Pierce wrote:
> On 4/12/2016 7:25 AM, Edson Richter wrote:
>>
>> I have a database "Customer" with about 60Gb of data.
>> I know I can backup and restore, but this seems too slow.
>>
>> Is there any other option to duplicate this database as
>> "CustomerTest" as fast as possible (even fastar than backup/restore)
>> - better if in one operation (something like "copy database A to B")?
>> I would like to run this everyday, overnight, with minimal impact to
>> prepare a test environment based on production data.
>
>
> copy to the same machine, or copy to a different test server?
> different answers.
>
>
>



Re: Fastest way to duplicate a quite large database

From:
John McKown
Date:
On Tue, Apr 12, 2016 at 9:25 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
Hi!

I have a database "Customer" with about 60Gb of data.
I know I can backup and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest" as fast as possible (even faster than backup/restore) - better if in one operation (something like "copy database A to B")?
I would like to run this every day, overnight, with minimal impact to prepare a test environment based on production data.

Hum, I don't know exactly how to do it, but on Linux, you could put the "Customer" database in a tablespace which resides on a BTRFS filesystem. BTRFS can do a quick "snapshot" of the filesystem, and you can then set things up for "incremental backup", as described here: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . From some reading, though, btrfs is a performance dog compared to other filesystems.
An interesting take on using various filesystems for PostgreSQL: http://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs
<quote from above>
...
So we managed to fork a 15 GB database in 6 seconds with only a small hiccup in performance. We are ready to start up the forked database.
...
</quote>

I got a number of hits searching on "postgresql btrfs" using Google search.
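
For what it's worth, the snapshot/incremental-send part from that wiki page looks roughly like this (a sketch only; the subvolume names are made up for illustration):

    # read-only snapshot of the subvolume holding the tablespace (send requires read-only)
    btrfs subvolume snapshot -r /srv/pg_space /srv/pg_space.today
    # the first run is a full transfer; later runs send only the delta against an earlier snapshot
    btrfs send -p /srv/pg_space.yesterday /srv/pg_space.today | btrfs receive /mnt/backup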


Thanks,

--
Best regards,

Edson Carlos Ericksson Richter


--
How many surrealists does it take to screw in a lightbulb? One to hold the griffon and one to fill the bathtub with brightly colored LEDs.

Maranatha! <><
John McKown

Re: Fastest way to duplicate a quite large database

From:
Louis Battuello
Date:
> On Apr 12, 2016, at 10:51 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
>
> Same machine, same cluster - just different database name.
>
> Best regards,
>
> Edson Carlos Ericksson Richter
>
> On 12/04/2016 11:46, John R Pierce wrote:
>> On 4/12/2016 7:25 AM, Edson Richter wrote:
>>>
>>> I have a database "Customer" with about 60Gb of data.
>>> I know I can backup and restore, but this seems too slow.
>>>
>>> Is there any other option to duplicate this database as "CustomerTest" as fast as possible (even faster than backup/restore) - better if in one operation (something like "copy database A to B")?
>>> I would like to run this every day, overnight, with minimal impact to prepare a test environment based on production data.
>>
>>
>> copy to the same machine, or copy to a different test server? different answers.
>>
>>
>>
>
>
>


Not sure how fast is “fast” for your system. You could try:

create database customer_test with template customer;

I’m able to duplicate a 20 GB database in a couple of minutes with the above command.

A couple of caveats:

1. No active connections to customer are allowed during the create.
2. You’ll likely have to recreate the search_path and reissue connect grants to the newly created database.
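
Putting that together with caveat 1, a nightly refresh could look roughly like this. It's only a sketch: the names "customer" and "customer_test", and the choice to forcibly terminate lingering sessions, are assumptions you'd adapt to your own setup.

    -- CREATE DATABASE ... TEMPLATE fails if anyone else is connected to the
    -- template database, so disconnect lingering sessions first
    SELECT pg_terminate_backend(pid)
      FROM pg_stat_activity
     WHERE datname = 'customer' AND pid <> pg_backend_pid();

    -- drop yesterday's copy and clone a fresh one
    DROP DATABASE IF EXISTS customer_test;
    CREATE DATABASE customer_test TEMPLATE customer;

Connect grants and database-level settings (caveat 2) still need to be reissued against customer_test afterwards.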



Re: Fastest way to duplicate a quite large database

From:
Adrian Klaver
Date:
On 04/12/2016 07:51 AM, Edson Richter wrote:
> Same machine, same cluster - just different database name.

Hmm, running tests against the same cluster you are running the
production database on would seem to be a performance hit against the
production database, and potentially dangerous should the tests trip a
bug that crashes the server.

>
> Best regards,
>
> Edson Carlos Ericksson Richter
>
> On 12/04/2016 11:46, John R Pierce wrote:
>> On 4/12/2016 7:25 AM, Edson Richter wrote:
>>>
>>> I have a database "Customer" with about 60Gb of data.
>>> I know I can backup and restore, but this seems too slow.
>>>
>>> Is there any other option to duplicate this database as
>>> "CustomerTest" as fast as possible (even fastar than backup/restore)
>>> - better if in one operation (something like "copy database A to B")?
>>> I would like to run this everyday, overnight, with minimal impact to
>>> prepare a test environment based on production data.
>>
>>
>> copy to the same machine, or copy to a different test server?
>> different answers.
>>
>>
>>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Fastest way to duplicate a quite large database

From:
John R Pierce
Date:
On 4/12/2016 7:55 AM, John McKown wrote:
> Hum, I don't know exactly how to do it, but on Linux, you could put
> the "Customer" database in a tablespace which resides on a BTRFS
> filesystem. BTRFS can do a quick "snapshot" of the filesystem....

except, tablespaces aren't standalone, and there's no provision for
importing the contents of the tablespace. all the metadata remains
in the default tablespace, which leaves all sorts of room for problems
if you do this.

the /best/ way to achieve what the OP is asking for would likely be to
run the tests on a separate server (or at least a separate postgres
instance aka cluster), and use pg_basebackup to rebuild this test instance.
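
to give the flavor of that, a rebuild script could look something like the following. only a sketch: the data directory, port and the "replication_user" role are assumptions (pg_basebackup needs a role with REPLICATION privilege and a matching pg_hba.conf entry).

    # stop and wipe the previous test instance (assumed to live in /var/lib/pgsql/test)
    pg_ctl -D /var/lib/pgsql/test stop -m fast
    rm -rf /var/lib/pgsql/test
    # clone the running production cluster, streaming the WAL needed for a consistent copy
    pg_basebackup -h localhost -U replication_user -D /var/lib/pgsql/test -X stream --checkpoint=fast --progress
    # give the copy its own port, then start it as a separate instance
    echo "port = 5433" >> /var/lib/pgsql/test/postgresql.conf
    pg_ctl -D /var/lib/pgsql/test start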




--
john r pierce, recycling bits in santa cruz



Re: Fastest way to duplicate a quite large database

From:
John McKown
Date:
On Tue, Apr 12, 2016 at 10:14 AM, John R Pierce <pierce@hogranch.com> wrote:
On 4/12/2016 7:55 AM, John McKown wrote:
Hum, I don't know exactly how to do it, but on Linux, you could put the "Customer" database in a tablespace which resides on a BTRFS filesystem. BTRFS can do a quick "snapshot" of the filesystem....

except, tablespaces aren't standalone, and there's no provision for importing the contents of the tablespace. all the metadata remains in the default tablespace, which leaves all sorts of room for problems if you do this.

the /best/ way to achieve what the OP is asking for would likely be to run the tests on a separate server (or at least a separate postgres instance aka cluster), and use pg_basebackup to rebuild this test instance.

Good point. The reference, http://www.cybertec.at/2015/01/forking-databases-the-art-of-copying-without-copying/ , actually makes the copy using btrfs facilities, and the copy is then used in a _different_ PostgreSQL server instance on the same machine. Actually, the database _name_ would stay "Customer", but it would reside in a separate PostgreSQL server's PGDATA area instead of the production server's. Basically, any production programs or scripts could be used, as is, with the test data simply by using the test server's port or unix socket instead of the production server's. Not quite as good as being on a separate Linux image, but not too shabby either. IMO.
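
For anyone curious, the mechanics in that article boil down to something like this (subvolume paths and port are invented for illustration; the article itself has the real procedure and caveats):

    # snapshot the production PGDATA, assumed to live on its own btrfs subvolume
    btrfs subvolume snapshot /data/pg_prod /data/pg_test
    # the snapshot carries over the production postmaster.pid; remove it
    rm -f /data/pg_test/postmaster.pid
    # run the copy as a second instance on its own port; on first start it
    # performs crash recovery, since the snapshot is only crash-consistent
    echo "port = 5433" >> /data/pg_test/postgresql.conf
    pg_ctl -D /data/pg_test start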

 

--
john r pierce, recycling bits in santa cruz



--
How many surrealists does it take to screw in a lightbulb? One to hold the griffon and one to fill the bathtub with brightly colored LEDs.

Maranatha! <><
John McKown

Re: Fastest way to duplicate a quite large database

From:
Louis Battuello
Date:
> On Apr 12, 2016, at 11:14 AM, John R Pierce <pierce@hogranch.com> wrote:
>
> On 4/12/2016 7:55 AM, John McKown wrote:
>> Hum, I don't know exactly how to do it, but on Linux, you could put the "Customer" database in a tablespace which resides on a BTRFS filesystem. BTRFS can do a quick "snapshot" of the filesystem....
>
> except, tablespaces aren't standalone, and there's no provision for importing the contents of the tablespace. all the metadata remains in the default tablespace, which leaves all sorts of room for problems if you do this.
>
> the /best/ way to achieve what the OP is asking for would likely be to run the tests on a separate server (or at least a separate postgres instance aka cluster), and use pg_basebackup to rebuild this test instance.
>
>
>
>
> --
> john r pierce, recycling bits in santa cruz
>
>
>

I agree with John’s post. I should have mentioned that my template database is never production. It’s an obfuscated
copy of the production data on separate hardware. I use the "create with template” to spin up copies for
developers/testers to provide a representative data set (not identical to production). And, since the create doesn’t
copy table statistics, I have to kick off a post-copy background process to gather them:

nohup vacuumdb --analyze-only --quiet --dbname=${DATABASE} &>/dev/null &

Even with all that, users are able to drop and recreate a test database within a coffee break.