Re: pg_verifybackup: TAR format backup verification

Поиск
Список
Период
Сортировка
От Sravan Kumar
Тема Re: pg_verifybackup: TAR format backup verification
Дата
Msg-id CA+=NbjjkFADANQoyvgTQfKH8ZerxpmsJD70R17V8ZtfaoyRYwg@mail.gmail.com
обсуждение исходный текст
Ответ на pg_verifybackup: TAR format backup verification  (Amul Sul <sulamul@gmail.com>)
Ответы Re: pg_verifybackup: TAR format backup verification
Список pgsql-hackers
Hi Amul,
     thanks for working on this.

+ file_name_len = strlen(relpath);
+ if (file_name_len < file_extn_len ||
+ strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0)
+ {
+ if (compress_algorithm == PG_COMPRESSION_NONE)
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting tar file",
+ relpath);
+ else
+ report_backup_error(context,
+ "\"%s\" is not a valid file, expecting \"%s\" compressed tar file",
+ relpath,
+ get_compress_algorithm_name(compress_algorithm));
+ return;
+ }

I believe pg_verifybackup needs to exit after reporting a failure here since it could not figure out a streamer to allocate.

Also, v1-0002 removes #include "pqexpbuffer.h" from astreamer.h and adds it to the new .h file and in v1-0004 it
reverts the change. So this can be avoided altogether. 

On Tue, Jul 9, 2024 at 3:24 PM Amul Sul <sulamul@gmail.com> wrote:
Hi,

Currently, pg_verifybackup only works with plain (directory) format backups.
This proposal aims to support tar-format backups too.  We will read the tar
files from start to finish and verify each file inside against the
backup_manifest information, similar to how it verifies plain files.

We are introducing new options to pg_verifybackup: -F, --format=p|t and -Z,
--compress=METHOD, which allow users to specify the backup format and
compression type, similar to the options available in pg_basebackup. If these
options are not provided, the backup format and compression type will be
automatically detected.  To determine the format, we will search for PG_VERSION
file in the backup directory — if found, it indicates a plain backup;
otherwise, it
is a tar-format backup.  For the compression type, we will check the extension
of base.tar.xxx file of tar-format backup. Refer to patch 0008 for the details.

The main challenge is to structure the code neatly.  For plain-format backups,
we verify bytes directly from the files.  For tar-format backups, we read bytes
from the tar file of the specific file we care about.  We need an abstraction
to handle both formats smoothly, without using many if statements or special
cases.

To achieve this goal, we need to reuse existing infrastructure without
duplicating code, and for that, the major work involved here is the code
refactoring. Here is a breakdown of the work:

1. BBSTREAMER Rename and Relocate:
BBSTREAMER, currently used in pg_basebackup for reading and decompressing TAR
files; can also be used for pg_verifybackup. In the future, it could support
other tools like pg_combinebackup for merging TAR backups without extraction,
and pg_waldump for verifying WAL files from the tar backup.  For that
accessibility,
BBSTREAMER needs to be relocated to a shared directory.

Moreover, renaming BBSTREAMER to ASTREAMER (short for Archive Streamer) would
better indicate its general application across multiple tools. Moving it to
src/fe_utils directory is appropriate, given its frontend infrastructure use.

2. pg_verifybackup Code Refactoring:
The existing code for plain backup verification will be split into separate
files or functions, so it can also be reused for tar backup verification.

3. Adding TAR Backup Verification:
Finally, patches will be added to implement TAR backup verification, along with
tests and documentation.

Patches 0001-0003 focus on renaming and relocating BBSTREAMER, patches
0004-0007 on splitting the existing verification code, and patches 0008-0010 on
adding TAR backup verification capabilities, tests, and documentation. The last
set could be a single patch but is split to make the review easier.

Please take a look at the attached patches and share your comments,
suggestions, or any ways to enhance them. Your feedback is greatly
appreciated.

Thank you !

--
Regards,
Amul Sul
EDB: http://www.enterprisedb.com


--
Thanks & Regards,
Sravan Velagandula
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Slow catchup of 2PC (twophase) transactions on replica in LR