CREATE DATABASE with filesystem cloning

From: Thomas Munro
Subject: CREATE DATABASE with filesystem cloning
Msg-id: CA+hUKGLM+t+SwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w@mail.gmail.com
Responses:
Re: CREATE DATABASE with filesystem cloning  (Thomas Munro <thomas.munro@gmail.com>)
Re: CREATE DATABASE with filesystem cloning  (Andrew Dunstan <andrew@dunslane.net>)
Re: CREATE DATABASE with filesystem cloning  (Andres Freund <andres@anarazel.de>)
Re: CREATE DATABASE with filesystem cloning  (Peter Eisentraut <peter@eisentraut.org>)
Re: CREATE DATABASE with filesystem cloning  ("Dan Langille" <dan@langille.org>)
List: pgsql-hackers

Hello hackers,

Here is an experimental POC of fast/cheap database cloning.  For
clones from little template databases, no one cares much, but it might
be useful to be able to create a snapshot or fork of a very large
database for testing/experimentation, like this:

  create database foodb_snapshot20231007 template=foodb strategy=file_clone

It should be a lot faster, and use less physical disk, than the two
existing strategies (wal_log and file_copy) on recent-ish XFS, BTRFS,
very recent OpenZFS, and APFS (= macOS).  With more work, it could in
theory be extended to other systems that invented different system
calls for this (Solaris, Windows).  Extra physical disk space will
then be consumed only as the two clones diverge.

It's just like the old strategy=file_copy, except it asks the OS to do
its best copying trick.  If you try it on a system that doesn't
support copy-on-write, then copy_file_range() should fall back to a
plain old copy, but it might still be better than we could do ourselves,
as it can push copy commands down to network storage or physical storage.
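
To illustrate the "ask the OS first, fall back to a plain copy" idea,
here is a minimal Python sketch.  It is not the code from the patch
(which is C inside PostgreSQL); the function name copy_with_offload and
the 1 MiB chunk size are made up for the example.  It assumes a Python
build that exposes os.copy_file_range (Linux), and otherwise goes
straight to a read/write loop.

```python
import os
import shutil

CHUNK = 1 << 20  # bytes requested per call; arbitrary value for this sketch


def copy_with_offload(src_path, dst_path):
    """Copy src_path to dst_path, preferring copy_file_range().

    Hypothetical helper, for illustration only.  Returns the name of the
    method that ended up doing the copy.
    """
    # Unbuffered files, so the descriptor offsets are the only file positions.
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        if hasattr(os, "copy_file_range"):
            try:
                remaining = os.fstat(src.fileno()).st_size
                while remaining > 0:
                    # The kernel may clone blocks, offload the copy, or
                    # copy the bytes itself; the caller cannot tell which.
                    n = os.copy_file_range(src.fileno(), dst.fileno(),
                                           min(remaining, CHUNK))
                    if n == 0:
                        break  # source shrank underneath us
                    remaining -= n
                return "copy_file_range"
            except OSError:
                # Not supported for this pair of files: start over.
                src.seek(0)
                dst.seek(0)
                dst.truncate()
        # Plain old copy through user space.
        shutil.copyfileobj(src, dst)
        return "read/write"
```

The return value only says which call was used, not whether blocks
ended up shared on disk; that is decided by the filesystem.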

Therefore, the usual caveats from strategy=file_copy also apply here:
it has to perform checkpoints, which could be very expensive, and
there are some quirks/brokenness around concurrent backups and PITR.
That makes me wonder whether it's worth pursuing this idea.  Thoughts?

I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl
vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's a very
new feature that is still being rolled out.  The system call succeeds
either way, but that setting controls whether the new database
initially shares blocks on disk or gets new copies.  I also tested on
a Mac.  In both cases I could clone large databases in a fraction of a
second.
