Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

From: Robert Haas
Subject: Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?
Message-ID: CA+TgmoYQXT1L9dbAP1246xoYZ0VXd5M+HwMgzJK6MGosZ5_UZg@mail.gmail.com
In reply to: Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?  (Bruce Momjian <bruce@momjian.us>)
           Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?  ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List: pgsql-hackers
On Fri, Jun 30, 2023 at 1:41 PM Bruce Momjian <bruce@momjian.us> wrote:
> I think --size-only was chosen only because it is the minimal comparison
> option.

I think it's worse than that. I think that the procedure relies on
using the --size-only option to intentionally trick rsync into
thinking that files are identical when they're not.

Say we have a file like base/23246/78901 on the primary. Unless
wal_log_hints=on, the standby version is very likely different, but
only in ways that don't matter to WAL replay. So the procedure aims to
trick rsync into hard-linking the version of that file that exists on
the standby in the old cluster into the new cluster on the standby,
instead of copying the slightly different version from the primary,
thus making the upgrade very fast. If rsync actually checksummed the
files, it would realize that they're different and copy the file from
the original primary, which the person who wrote this procedure does
not want.
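
To make the comparison issue concrete, here is a minimal Python sketch. It does not run rsync or touch a real cluster; it only mimics the two comparison strategies (size only versus content checksum) on a pair of throwaway files. The file names, the single flipped bit standing in for a hint-bit difference, and the helper functions are all assumptions made up for illustration.

```python
import hashlib
import os
import tempfile

PAGE_SIZE = 8192  # PostgreSQL's default block size


def same_by_size(a, b):
    """Mimic a size-only comparison: equal length counts as 'identical'."""
    return os.path.getsize(a) == os.path.getsize(b)


def same_by_checksum(a, b):
    """Mimic a checksum comparison: contents must actually match."""
    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return digest(a) == digest(b)


def make_demo_files(directory):
    """Write two one-page files that differ in a single bit (hypothetical)."""
    primary = os.path.join(directory, "primary_78901")
    standby = os.path.join(directory, "standby_78901")
    page = bytearray(PAGE_SIZE)
    with open(primary, "wb") as f:
        f.write(page)
    page[100] ^= 0x01  # stand-in for a hint-bit-style difference
    with open(standby, "wb") as f:
        f.write(page)
    return primary, standby


with tempfile.TemporaryDirectory() as d:
    primary, standby = make_demo_files(d)
    size_match = same_by_size(primary, standby)
    checksum_match = same_by_checksum(primary, standby)
    print("size-only says identical:", size_match)
    print("checksum says identical:", checksum_match)
```

The size-only check reports the files as identical while the checksum check does not, which is the gap the procedure depends on. Nothing in the size-only check can tell a harmless difference from a harmful one.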

That's kind of a crazy thing for us to be documenting. I think we
really ought to consider removing it from the documentation. If somebody
wants to write a reliable tool for this to ship as part of PostgreSQL,
well and good. But this procedure has no real sanity checks and is
based on very fragile assumptions. That doesn't seem suitable for
end-user use.

I'm not quite clear on how Nikolay got into trouble here. I don't
think I understand under exactly what conditions the procedure is
reliable and under what conditions it isn't. But there is no way in
heck I would ever advise anyone to use this procedure on a database
they actually care about. This is a great party trick or something to
show off in a lightning talk at PGCon, not something you ought to be
doing with valuable data.

--
Robert Haas
EDB: http://www.enterprisedb.com


