Re: FSM corruption and standby servers

From David G. Johnston
Subject Re: FSM corruption and standby servers
Msg-id CAKFQuwZkbH9r9bEp6X+9JjE8Q9mcXKDW0sjEJhAX3xBTa-jgGQ@mail.gmail.com
In response to Re: FSM corruption and standby servers  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: FSM corruption and standby servers  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
On Mon, Oct 31, 2016 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Hunley, Douglas" <douglas.hunley@openscg.com> writes:
>> On Mon, Oct 31, 2016 at 10:38 AM, Tim Goodaire <tgoodaire@dyn.com> wrote:
>>> I have a question regarding the FSM corruption bug that is fixed in
>>> postgresql 9.5.5 (https://wiki.postgresql.org/wiki/Free_Space_Map_Problems).
>>> If I don't find any corruption on a master database, is it still possible
>>> that there is corruption on the standbys?
>
>> It shouldn't be, iirc. FSMs are only ever created/updated by vacuum, which
>> doesn't run on a slave until it is promoted to a master.
>
> The problem is that the WAL data can be wrong in these cases, and since
> the standbys only know what they were told in the WAL stream, their images
> will be wrong even if the master is valid.
>
> I would have thought that the referenced page is clear enough about
> needing to check the standbys; do you think it isn't?

I can see how the following is a bit loose for someone not super-familiar with WAL.

"A database crash-and-restart shortly after such an event can lead to corrupted FSMs. Also, standby servers will receive incorrect WAL data causing them to create corrupted FSMs locally."

I believe the "shortly" is there because the crash must occur before the next checkpoint for the problem to appear on the master.  Given that constraint, giving standby servers only secondary emphasis seems misplaced.  The most probable scenario, given that the bug has manifested and one is running a standby, is a broken standby and a functioning master.

"Standby servers are directly impacted by this bug and must be checked for corruption even if their master appears clean.  The master will only exhibit a problem if there is a crash-and-restart cycle shortly after (up until a checkpoint) the problem statement that causes the master to replay the just generated WAL."

It is not clear to what extent traditional backups (for example, those taken with pg_basebackup) are affected...
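
For what it's worth, the master-versus-standby reasoning above can be written down as a toy model. This is only a sketch of the argument in this thread, not PostgreSQL code: the `Scenario` fields and the `possibly_corrupted` function are names made up for illustration, and base backups are deliberately left out because their exposure is unclear.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Scenario:
    # Hypothetical fields, chosen only to mirror the argument in this thread.
    bug_triggered: bool                      # the incorrect WAL data was generated
    master_crashed_before_checkpoint: bool   # crash-and-restart before the next checkpoint
    has_standby: bool                        # at least one standby replays the WAL stream


def possibly_corrupted(s: Scenario) -> Set[str]:
    """Return which servers may hold a corrupted FSM under this simplified model."""
    affected: Set[str] = set()
    if not s.bug_triggered:
        return affected
    # A standby only knows what the WAL stream tells it, so it applies
    # the incorrect data regardless of what happens on the master.
    if s.has_standby:
        affected.add("standby")
    # The master replays its own just-generated WAL only during crash
    # recovery, i.e. if it crashes before the next checkpoint.
    if s.master_crashed_before_checkpoint:
        affected.add("master")
    return affected


if __name__ == "__main__":
    # The scenario argued to be the most probable: no master crash, one standby.
    print(possibly_corrupted(Scenario(True, False, True)))
```

Under these assumptions a clean master says nothing about the standbys, which is why they have to be checked separately.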

David J.

