[HACKERS] WIP: Restricting pg_rewind to data/wal dirs

From Chris Travers
Subject [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Msg-id CAN-RpxDPE4baiMMJ6TLd6AiUvrG=YrC05tGxrgp4aUutH9j5TQ@mail.gmail.com
Responses Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs  (Chris Travers <chris.travers@adjust.com>)
List pgsql-hackers
Hi;

There are still some cleanup bits needed here but I wanted to get feedback on my general direction.

I hope to submit this to a commitfest soon if the general feedback is good.  Tests are passing (with adjustments in one test script to reflect the intended change of behaviour).  I want to note that Crimson Thompson (also with Adjust) has provided some valuable discussion on this code.

The Problem:

pg_rewind makes no real guarantees regarding the final state of non-data files or directories.  It checks whether the timeline has incremented (and therefore guarantees that, if successful, the data directories are on the same timeline), but non-data files are clobbered if we rewind and left intact if not.  These other files include postgresql.auto.conf and replication slots, and can include log files.

Copying logs over to the new slave is something one probably never wants to do (same with replication slots), and the behaviours here can obscure why a particular master failed, cause WAL segments to build up, change automatic settings, and lead to other undesirable behaviours.  Together these make pg_rewind very difficult to use properly and push tasks onto replication management tooling that is not in a good position to handle them correctly.

Designing the Fix:

Two fixes were considered and one selected.  The first was to whitelist directories and rewind only those.  The other was to allow shell globs to be provided that could be used to exclude files.  The whitelisting solution was chosen for a number of reasons.

When pg_rewind "succeeds" but not quite correctly, the result is usually a corrupted installation which requires a base backup to replace it anyway.  In a recovery situation, pressure sometimes renders human judgment less effective, and therefore glob exclusion strikes me as something which would probably do more harm than good, but maybe I don't understand the use case (comments as to why some people see the other solution as preferable would be welcome).

In going with the whitelisting solution, we chose to include all directories with WAL-related information.  This allows more predictable interaction with other parts of the replication chain.  Consequently we copy not only pg_wal and pg_xact but also commit timestamps and so forth.

The Solution:

The solution is a whitelist of directories; only the directories on it are synchronised.  The relevant part of this patch is:

+/* List of directories to synchronize:
+ * base data dirs (and tablespaces)
+ * wal/transaction data
+ * and that is it.
+ *
+ * This array is null-terminated to make
+ * it easy to expand
+ */
+
+const char *rewind_dirs[] = {
+    "base",
+    "global",
+    "pg_commit_ts",
+    "pg_logical",
+    "pg_multixact",
+    "pg_serial",
+    "pg_subtrans",
+    "pg_tblspc",
+    "pg_twophase",
+    "pg_wal",
+    "pg_xact",
+    NULL
+};


From there we iterate over this array for directory-based approaches in copy_fetch.c and issue one query per directory in libpq_fetch.c.  This also means shifting the basic interface from PQexec to PQexecParams.  It would be possible to move to binary formats too, but this was not done for now in order to simplify code review (that could be a separate, independent patch at a later time).
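To make the iteration concrete, here is a minimal sketch of how a NULL-terminated whitelist like the one above can be walked to decide whether a path should be synchronised.  The helper name path_is_whitelisted is hypothetical and is not taken from the patch, and the array is declared static here only to keep the example self-contained; the actual code in copy_fetch.c and libpq_fetch.c may be structured quite differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same entries as the patch excerpt; the NULL entry marks the end. */
static const char *rewind_dirs[] = {
    "base", "global", "pg_commit_ts", "pg_logical", "pg_multixact",
    "pg_serial", "pg_subtrans", "pg_tblspc", "pg_twophase", "pg_wal",
    "pg_xact", NULL
};

/*
 * Hypothetical helper (an assumption, not part of the patch): return true
 * if relpath, given relative to the data directory, is one of the
 * whitelisted directories or lies beneath one of them.
 */
static bool
path_is_whitelisted(const char *relpath)
{
    for (const char **dir = rewind_dirs; *dir != NULL; dir++)
    {
        size_t len = strlen(*dir);

        /* Accept "pg_wal" and "pg_wal/...", but not "pg_wal_old". */
        if (strncmp(relpath, *dir, len) == 0 &&
            (relpath[len] == '\0' || relpath[len] == '/'))
            return true;
    }
    return false;
}
```

With this sketch, a path such as base/1/1259 is accepted, while postgresql.auto.conf, anything under pg_replslot, and log files are skipped, which matches the behaviour described in the problem statement.  Adding a directory only requires a new entry before the NULL terminator.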


Testing Done:

The extra-files tests correctly exercise this change in behaviour.  Those tests have been modified, and their diagnostics on failure expanded.  The other tests provide good coverage of whether pg_rewind does what it is supposed to do.

Cleanup still required:

I accidentally left Carp::Always in the Perl module (.pm).  It will be fixed.

I renamed one of the functions to give it a more descriptive name but have not yet removed the old function.


Feedback is very welcome.  pg_rewind is a very nice piece of software.  I am hoping that these sorts of changes will help ensure that it is easier to use and provides more predictable results.
--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin
