pg_verifybackup: TAR format backup verification

Поиск
Список
Период
Сортировка
От Amul Sul
Тема pg_verifybackup: TAR format backup verification
Дата
Msg-id CAAJ_b94StvLWrc_p4q-f7n3OPfr6GhL8_XuAg2aAaYZp1tF-nw@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
Hi,

Currently, pg_verifybackup only works with plain (directory) format backups.
This proposal aims to support tar-format backups too.  We will read the tar
files from start to finish and verify each file inside against the
backup_manifest information, similar to how it verifies plain files.

We are introducing new options to pg_verifybackup: -F, --format=p|t and -Z,
--compress=METHOD, which allow users to specify the backup format and
compression type, similar to the options available in pg_basebackup. If these
options are not provided, the backup format and compression type will be
automatically detected.  To determine the format, we will search for PG_VERSION
file in the backup directory — if found, it indicates a plain backup;
otherwise, it
is a tar-format backup.  For the compression type, we will check the extension
of base.tar.xxx file of tar-format backup. Refer to patch 0008 for the details.

The main challenge is to structure the code neatly.  For plain-format backups,
we verify bytes directly from the files.  For tar-format backups, we read bytes
from the tar file of the specific file we care about.  We need an abstraction
to handle both formats smoothly, without using many if statements or special
cases.

To achieve this goal, we need to reuse existing infrastructure without
duplicating code, and for that, the major work involved here is the code
refactoring. Here is a breakdown of the work:

1. BBSTREAMER Rename and Relocate:
BBSTREAMER, currently used in pg_basebackup for reading and decompressing TAR
files; can also be used for pg_verifybackup. In the future, it could support
other tools like pg_combinebackup for merging TAR backups without extraction,
and pg_waldump for verifying WAL files from the tar backup.  For that
accessibility,
BBSTREAMER needs to be relocated to a shared directory.

Moreover, renaming BBSTREAMER to ASTREAMER (short for Archive Streamer) would
better indicate its general application across multiple tools. Moving it to
src/fe_utils directory is appropriate, given its frontend infrastructure use.

2. pg_verifybackup Code Refactoring:
The existing code for plain backup verification will be split into separate
files or functions, so it can also be reused for tar backup verification.

3. Adding TAR Backup Verification:
Finally, patches will be added to implement TAR backup verification, along with
tests and documentation.

Patches 0001-0003 focus on renaming and relocating BBSTREAMER, patches
0004-0007 on splitting the existing verification code, and patches 0008-0010 on
adding TAR backup verification capabilities, tests, and documentation. The last
set could be a single patch but is split to make the review easier.

Please take a look at the attached patches and share your comments,
suggestions, or any ways to enhance them. Your feedback is greatly
appreciated.

Thank you !

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kartyshov Ivan
Дата:
Сообщение: Re: [HACKERS] make async slave to wait for lsn to be replayed
Следующее
От: Amit Langote
Дата:
Сообщение: Re: Address the -Wuse-after-free warning in ATExecAttachPartition()