Re: CREATE DATABASE with filesystem cloning

From Andres Freund
Subject Re: CREATE DATABASE with filesystem cloning
Msg-id 20231009234827.k5t2iz4bss7dwanp@awork3.anarazel.de
In response to CREATE DATABASE with filesystem cloning  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: CREATE DATABASE with filesystem cloning
List pgsql-hackers

Hi,

On 2023-10-07 18:51:45 +1300, Thomas Munro wrote:
> It should be a lot faster, and use less physical disk, than the two
> existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
> APFS (= macOS), and it could in theory be extended to other systems
> that invented different system calls for this with more work (Solaris,
> Windows).  Then extra physical disk space will be consumed only as the
> two clones diverge.

> It's just like the old strategy=file_copy, except it asks the OS to do
> its best copying trick.  If you try it on a system that doesn't
> support copy-on-write, then copy_file_range() should fall back to
> plain old copy, but it might still be better than we could do, as it
> can push copy commands to network storage or physical storage.
> 
> Therefore, the usual caveats from strategy=file_copy also apply here.
> Namely that it has to perform checkpoints which could be very
> expensive, and there are some quirks/brokenness about concurrent
> backups and PITR.  Which makes me wonder if it's worth pursuing this
> idea.  Thoughts?

I think it'd be interesting to have. For the regression tests we do end up
spending a lot of disk throughput on contents duplicated between
template0/template1/postgres.  And I've spent plenty of time copying huge
template databases, to have a reproducible starting point for some benchmark
that's expensive to initialize.

If we do this, I think we should consider creating template0, template1 with
the new strategy, so that a new initdb cluster ends up with deduplicated data.


FWIW, I experimented with using cp -c on macOS for the initdb template, and
that provided some further gain. I suspect that that gain would increase if
template0/template1/postgres were deduplicated.


> diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
> index e04bc3941a..8c963ff548 100644
> --- a/src/backend/storage/file/copydir.c
> +++ b/src/backend/storage/file/copydir.c
> @@ -19,14 +19,21 @@
>  #include "postgres.h"
>  
>  #include <fcntl.h>
> +#include <limits.h>
>  #include <unistd.h>
>  
> +#ifdef HAVE_COPYFILE_H
> +#include <copyfile.h>
> +#endif

We already have code around this in src/bin/pg_upgrade/file.c, seems we ought
to move it somewhere in src/port?
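
To make the idea of a shared helper concrete, here is a minimal sketch of what a
"clone if possible, otherwise copy" routine could look like. This is not the
code in pg_upgrade or in the proposed patch; the names clone_or_copy_file() and
plain_copy() are hypothetical, and error reporting is reduced to errno. It
assumes macOS copyfile() with COPYFILE_CLONE, and copy_file_range() as provided
by Linux and FreeBSD elsewhere.

```c
#define _GNU_SOURCE             /* exposes copy_file_range() in glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef __APPLE__
#include <copyfile.h>

/* macOS: COPYFILE_CLONE tries an APFS clone and falls back to a copy. */
int
clone_or_copy_file(const char *src, const char *dst)
{
    return copyfile(src, dst, NULL, COPYFILE_CLONE);
}

#else

/* Plain read/write loop, continuing from the current file offsets. */
static int
plain_copy(int src_fd, int dst_fd)
{
    char        buf[65536];

    for (;;)
    {
        ssize_t     nread = read(src_fd, buf, sizeof(buf));
        ssize_t     off = 0;

        if (nread == 0)
            return 0;           /* end of file */
        if (nread < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        while (off < nread)
        {
            ssize_t     nw = write(dst_fd, buf + off, (size_t) (nread - off));

            if (nw < 0 && errno != EINTR)
                return -1;
            if (nw > 0)
                off += nw;
        }
    }
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int
clone_or_copy_file(const char *src, const char *dst)
{
    int         src_fd, dst_fd, save_errno, rc = 0;

    if ((src_fd = open(src, O_RDONLY)) < 0)
        return -1;
    /* Refuse to overwrite an existing destination. */
    if ((dst_fd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0)
    {
        save_errno = errno;
        close(src_fd);
        errno = save_errno;
        return -1;
    }

    for (;;)
    {
        /* The kernel picks the best available method (reflink, offload, copy). */
        ssize_t     n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                        (size_t) 1 << 30, 0);

        if (n == 0)
            break;              /* end of file */
        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            /* Not usable for this pair of files: copy the rest by hand. */
            if (errno == ENOSYS || errno == EXDEV ||
                errno == EINVAL || errno == EOPNOTSUPP)
                rc = plain_copy(src_fd, dst_fd);
            else
                rc = -1;
            break;
        }
    }

    save_errno = errno;
    close(src_fd);
    if (close(dst_fd) < 0 && rc == 0)
    {
        rc = -1;
        save_errno = errno;
    }
    errno = save_errno;
    return rc;
}
#endif
```

A caller would invoke clone_or_copy_file(fromfile, tofile) once per relation
file. Whether blocks actually end up shared depends entirely on the filesystem,
so the caller cannot tell a clone from an ordinary copy by the return value.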

Greetings,

Andres Freund


