Re: WIP/PoC for parallel backup

Поиск
Список
Период
Сортировка
От Asif Rehman
Тема Re: WIP/PoC for parallel backup
Дата
Msg-id CADM=JegEJSuNguL6EenCOM3mTzyuDnxMGnBotdGcfA7Jsz6fsg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP/PoC for parallel backup  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Список pgsql-hackers


On Wed, Nov 27, 2019 at 1:38 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).

Review comments on these patches:

1.
+    XLogRecPtr    wal_location;

Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.

2.
+    int32        size;

Should we use size_t here?

3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?

4.
+        else if (
+#ifndef WIN32
+                 S_ISLNK(statbuf.st_mode)
+#else
+                 pgwin32_is_junction(pathbuf)
+#endif
+            )
+        {
+            /*
+             * If symlink, write it as a directory. file symlinks only allowed
+             * in pg_tblspc
+             */
+            statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+            _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+        }

In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?

Yeah going through the code again, I found it a little bit inconsistent. In fact
SendBackupFiles function is supposed to send the files that were requested of
it. However, currently is performing these tasks:

1) If the requested file were to be a directory, it will return a TAR directory entry.
2) If the requested files were to be symlink inside pg_tblspc, it will return the link path.
3) and as you pointed out above, if the requested files were a symlink outside pg_tblspc 
and inside PGDATA then it will return TAR directory entry.

I think that this function should not take care of any of the above. Instead, it should
be the client (i.e. pg_basebackup) managing it. The SendBackupFiles should only send the
regular files and ignore the request of any other kind, be it a directory or symlink.

Any thoughts?


5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c

6.
+        /*
+         * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+         * 'base/1/1245/32683', ...) [options]
+         */

Please update these comments as we fetch one file at a time.

7.
+backup_file:
+            SCONST                            { $$ = (Node *) makeString($1); }
+            ;
+

Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.

8.
Please indent code within 80 char limit at all applicable places.

9.
Please fix following typos:

identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories


=====

The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.

Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?

This was discussed earlier in the thread, and as Robert suggested, it would complicate the
code to no real benefit.


When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.

Yes, I think it should. Since the full backup will have a manifest file, there is no
reason for parallel backup to not support it.

I'll share the updated patch in the next couple of days.
 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeremy Finzel
Дата:
Сообщение: Re: Index corruption / planner issue with one table in my pg 11.6 instance
Следующее
От: Asif Rehman
Дата:
Сообщение: Re: WIP/PoC for parallel backup