Re: Data-only pg_rewind, take 2

Поиск
Список
Период
Сортировка
От Chris Travers
Тема Re: Data-only pg_rewind, take 2
Дата
Msg-id CAN-RpxC2Fn2eZnpa6J9DB-Bgt_8K-z816Ys7B=j3zf1Rx30GqQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Data-only pg_rewind, take 2  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Data-only pg_rewind, take 2
Список pgsql-hackers


On Mon, Mar 18, 2019 at 4:09 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> I also added test cases and some docs.  I don't know if the docs are
> sufficient.  Feedback is appreciated.

To be honest, I don't think that this approach is a good idea per the
same reasons as mentioned the last time, as this can cause pg_rewind
to break if any newly-added folder in the data directory has
non-replaceable data which is needed at the beginning of recovery and
cannot be automatically rebuilt.  So that's one extra maintenance
burden to worry about.

Actually I think this is safe.  Let me go through the cases not handled in the current behavior at all:

1.  For rpms we distribute, we clobber db logs, which means we overwrite application logs on the failed system with copes of logs from a replica.  This means that after you rewind, you lose the ability to figure out what went wrong.  This is an exquisitely bad idea unless you like data loss, and since this location is configurable you can't just say "well we put our logs here so we are excluding them."  Making this configured per rewind run strikes me as error-prone and something that will may lead to hidden interference with postmortems in the future, and postmortems are vitally important in terms of running database clusters with any sort of reliability guarantees.

2.  With the PostgreSQL.conf.auto now having recovery.conf info, you have some very significant failure cases with regard to replication and accidentally clobbering these.

On to the corner cases with --data-only enabled and the implications as I see them since this preserves files on the old master but does not copy them from the replica:

1.  If the changes are not wal logged (let's say CSVs created using a file foreign data wrapper), then deleting the files on rewind is where you can lose data, and --data-only avoids this, so here you *avoid* data loss where you put state files on the systems and do not rewind them because they were not wal-logged.  However the files still exist on the old master and are not deleted, so the data can easily be restored at that point.  Now, we can say, probably, that putting data files in $PGDATA that are not wal-logged is a bad idea.  But even if you put them somewhere else, pg_rewind isn't going to magically move them over to the replica for you.

2.  If the changes *are* wal-logged, then you have a problem with --data-only which is not present without it, namely that files can get out of sync with their wal-logged updates.  So in this case, --data-dir is *not* safe.

So here I think we have to issue a choice.  For now I don't feel comfortable changing the default behavior, but the default behavior could cause data loss in certain cases (including the ones I think you are concerned about).  Maybe it would be better if I document the above points?
 

Here is the reference of the last thread about the same topic:
https://www.postgresql.org/message-id/CAN-RpxD8Y7hMOjzd93hOqV6n8kPEo5cmW9gYm+8JirTPiFnmmQ@mail.gmail.com
--
Michael
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEG72nH6vTowiyblFKnvQgOdbyQH0FAlyPGgYACgkQnvQgOdby
QH0ekRAAiXcZRcDZwwwdbdlIpkniE/SuG5gaS7etUcAW88m8Vts5r4QoAEwUwGhg
EZzuOb77OKvti7lmOZkBgC0VB1PmFku+mIdqJtzvdcSDdlOkABcLaw4JRrm//2/7
jAi5Jw4um1EAz38dZXcWYwORavyo/4tR2S1PCyBA35F704w2NILAEDiq233P/ALf
M3cOjgwiFIPf0v9PJIfYsl56sIwqW4rofPH63V6teaz5W8Qf2zHSsG5CeNqnEix0
QZwwlzuhtAUYINab3oN3qMtF2q9vzJWCoSprzxx1qYrzPHEX8EMot0+L7YPdaAp0
xyiUKSzy1rXtpoW0rsJ7w5bdrh1gS7HzprCEtqRZGe6NlVDcNjXfJIG9sT6hMWYS
GTNbVH5VpKziw3byT8JpyqR38+iFqeXoLd1PEVadYjP62qOWbK8P2wokQwM+7EcK
Hpr8jrvgV5x8IEnhR4bPyTqjORCJMBGTXCNgT99cPYpuVSasr/0IsBC/RtmQfRB9
xhK0/qp5koQbX+mbLK11XsaFS9JAL2DNmSQg8TqICtV3bb0UTThs331XgjEjlOpm
1RjM6Tzwqq2is04mkkT+DtRAOclQuL8wWJWU5rr4fMKHCeFxtvUfwTyKlo2u+mI0
x7YZhd4AFCM14ga2Ko/qiGqeOWR5Y0RvYANmnmjG5bxQGi+Dtek=
=LNZB
-----END PGP SIGNATURE-----


--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Banck
Дата:
Сообщение: Re: Online verification of checksums
Следующее
От: Michael Banck
Дата:
Сообщение: Re: Online verification of checksums