Re: CREATE DATABASE with filesystem cloning

From	Andres Freund
Subject	Re: CREATE DATABASE with filesystem cloning
Date	10 October 2023, 02:48:27
Msg-id	20231009234827.k5t2iz4bss7dwanp@awork3.anarazel.de
In reply to	CREATE DATABASE with filesystem cloning (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: CREATE DATABASE with filesystem cloning
List	pgsql-hackers

Hi,

On 2023-10-07 18:51:45 +1300, Thomas Munro wrote:
> It should be a lot faster, and use less physical disk, than the two
> existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS,
> APFS (= macOS), and it could in theory be extended to other systems
> that invented different system calls for this with more work (Solaris,
> Windows).  Then extra physical disk space will be consumed only as the
> two clones diverge.

> It's just like the old strategy=file_copy, except it asks the OS to do
> its best copying trick.  If you try it on a system that doesn't
> support copy-on-write, then copy_file_range() should fall back to
> plain old copy, but it might still be better than we could do, as it
> can push copy commands to network storage or physical storage.
> 
> Therefore, the usual caveats from strategy=file_copy also apply here.
> Namely that it has to perform checkpoints which could be very
> expensive, and there are some quirks/brokenness about concurrent
> backups and PITR.  Which makes me wonder if it's worth pursuing this
> idea.  Thoughts?
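
To make the fallback behaviour described in the quoted text concrete, here is
a minimal sketch (not the patch's code) of a copy loop that asks the kernel to
do the copy with copy_file_range() and drops to a plain read()/write() loop
when the call is unavailable or refuses the file pair. The names
copy_file_with_offload(), copy_fd_contents() and COPY_CHUNK are made up for
illustration, the set of errno values treated as "unsupported" is an
assumption, and the Linux-only guard is a simplification.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE				/* glibc exposes copy_file_range() under this */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_CHUNK (1024 * 1024)	/* hypothetical per-call request size */

/* Plain read()/write() loop: copies from the current offsets until EOF. */
static int
copy_with_read_write(int src_fd, int dst_fd)
{
	char		buf[65536];

	for (;;)
	{
		ssize_t		nread = read(src_fd, buf, sizeof(buf));
		ssize_t		off = 0;

		if (nread == 0)
			return 0;			/* EOF */
		if (nread < 0)
		{
			if (errno == EINTR)
				continue;
			return -1;
		}
		while (off < nread)
		{
			ssize_t		nwritten = write(dst_fd, buf + off, nread - off);

			if (nwritten < 0)
			{
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += nwritten;
		}
	}
}

/* Try kernel-side copying first, then finish (or redo) with read()/write(). */
static int
copy_fd_contents(int src_fd, int dst_fd)
{
#if defined(__linux__)
	for (;;)
	{
		/* NULL offsets: both descriptors' file positions advance together. */
		ssize_t		n = copy_file_range(src_fd, NULL, dst_fd, NULL,
										COPY_CHUNK, 0);

		if (n > 0)
			continue;
		if (n == 0)
			break;				/* EOF; the loop below just confirms it */
		if (errno == EINTR)
			continue;
		/* Assumed "not supported here" errors: fall back to a plain copy. */
		if (errno == ENOSYS || errno == EXDEV ||
			errno == EINVAL || errno == EOPNOTSUPP)
			break;
		return -1;
	}
#endif
	return copy_with_read_write(src_fd, dst_fd);
}

/* Returns 0 on success, -1 with errno set. dst_path must not exist yet. */
int
copy_file_with_offload(const char *src_path, const char *dst_path)
{
	int			src_fd, dst_fd, rc, save_errno;

	assert(src_path != NULL && dst_path != NULL);

	src_fd = open(src_path, O_RDONLY);
	if (src_fd < 0)
		return -1;
	dst_fd = open(dst_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (dst_fd < 0)
	{
		save_errno = errno;
		close(src_fd);
		errno = save_errno;
		return -1;
	}
	rc = copy_fd_contents(src_fd, dst_fd);
	save_errno = errno;
	close(src_fd);
	if (close(dst_fd) < 0 && rc == 0)
	{
		rc = -1;
		save_errno = errno;
	}
	errno = save_errno;
	return rc;
}
```

Whether the kernel shares blocks or merely copies them internally depends on
the filesystem; the caller cannot tell from the return value alone.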

I think it'd be interesting to have. For the regression tests we do end up
spending a lot of disk throughput on contents duplicated between
template0/template1/postgres.  And I've spent plenty of time copying huge
template databases, to have a reproducible starting point for some benchmark
that's expensive to initialize.

If we do this, I think we should consider creating template0, template1 with
the new strategy, so that a new initdb cluster ends up with deduplicated data.

FWIW, I experimented with using cp -c on macOS for the initdb template, and
that provided some further gain. I suspect that that gain would increase if
template0/template1/postgres were deduplicated.

> diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
> index e04bc3941a..8c963ff548 100644
> --- a/src/backend/storage/file/copydir.c
> +++ b/src/backend/storage/file/copydir.c
> @@ -19,14 +19,21 @@
>  #include "postgres.h"
>  
>  #include <fcntl.h>
> +#include <limits.h>
>  #include <unistd.h>
>  
> +#ifdef HAVE_COPYFILE_H
> +#include <copyfile.h>
> +#endif

We already have code around this in src/bin/pg_upgrade/file.c, seems we ought
to move it somewhere in src/port?
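
As a rough sketch of what a shared helper in src/port could look like: the
function below attempts a forced clone and reports failure instead of falling
back, leaving that policy to the caller. port_clone_file() is a hypothetical
name, not an existing PostgreSQL function. The sketch assumes copyfile() with
COPYFILE_CLONE_FORCE on macOS and the FICLONE ioctl on Linux, which, to the
best of my knowledge, are the calls the existing pg_upgrade code relies on.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#if defined(__APPLE__)
#include <copyfile.h>
#elif defined(__linux__)
#include <sys/ioctl.h>
#include <linux/fs.h>			/* FICLONE */
#endif

/*
 * Hypothetical helper: clone src to dst using the filesystem's
 * copy-on-write support.  Returns 0 on success, -1 with errno set if
 * cloning is not possible.  dst must not already exist, and no partial
 * dst file is left behind on failure.
 */
int
port_clone_file(const char *src, const char *dst)
{
	assert(src != NULL && dst != NULL);

#if defined(__APPLE__) && defined(COPYFILE_CLONE_FORCE)
	/* Fails instead of silently copying if the volume cannot clone. */
	return copyfile(src, dst, NULL, COPYFILE_CLONE_FORCE) < 0 ? -1 : 0;
#elif defined(__linux__) && defined(FICLONE)
	{
		int			src_fd, dst_fd, rc, save_errno;

		src_fd = open(src, O_RDONLY);
		if (src_fd < 0)
			return -1;
		dst_fd = open(dst, O_RDWR | O_CREAT | O_EXCL, 0600);
		if (dst_fd < 0)
		{
			save_errno = errno;
			close(src_fd);
			errno = save_errno;
			return -1;
		}

		/* Share all of src's extents with dst (reflink). */
		rc = ioctl(dst_fd, FICLONE, src_fd);
		save_errno = errno;
		close(src_fd);
		close(dst_fd);
		if (rc < 0)
		{
			unlink(dst);		/* remove the empty file we created */
			errno = save_errno;
			return -1;
		}
		return 0;
	}
#else
	(void) src;
	(void) dst;
	errno = ENOSYS;				/* no cloning primitive known here */
	return -1;
#endif
}
```

A caller that wants "clone if possible, else copy" would check for -1 and then
run its ordinary copy path.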

Greetings,

Andres Freund
