pg_rewind is not crash safe

From: Heikki Linnakangas
Subject: pg_rewind is not crash safe
Msg-id: d8dcc760-8780-5084-f066-6d663801d2e2@iki.fi
Responses: Re: pg_rewind is not crash safe ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
List: pgsql-hackers

A colleague of mine brought to my attention that pg_rewind is not crash 
safe. If it is interrupted for any reason, it leaves behind a data 
directory with a mix of data from the source and target images. If 
you're "lucky", the server will start up, but it can be in an 
inconsistent state. That's obviously not good. It would be nice to:

1. Detect the situation, and refuse to start up.

Or even better:

2. Make pg_rewind crash safe, so that you could safely restart it if 
it's interrupted.

Has anyone else run into this? How did you work around it?

It doesn't seem hard to detect this. pg_rewind can somehow "poison" the 
data directory just before it starts making irreversible changes. I'm 
thinking of updating the 'state' in the control file to a new 
PG_IN_REWIND value.
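
To make the "poison" idea concrete, here is a minimal toy sketch in Python. It is not how pg_rewind or the real control file works: the JSON file name, the helper names and the "SHUTDOWNED" string are all assumptions made for illustration, and only the PG_IN_REWIND name comes from the proposal above. The marker is written and fsynced before any change is made, and cleared only after the changes complete. A real implementation would presumably also need to fsync the containing directory.

```python
import json
import os

# Toy stand-in for the control file; the real pg_control is a binary struct.
CONTROL_FILE = "toy_control.json"   # hypothetical file name
IN_REWIND = "PG_IN_REWIND"          # state name proposed in this message
SHUTDOWN = "SHUTDOWNED"             # illustrative "clean" state


def write_state(datadir, state):
    """Durably replace the toy control file with the given state."""
    path = os.path.join(datadir, CONTROL_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"state": state}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX


def read_state(datadir):
    """Return the state string stored in the toy control file."""
    with open(os.path.join(datadir, CONTROL_FILE)) as f:
        return json.load(f)["state"]


def can_start(datadir):
    """Startup check: refuse a directory left by an interrupted rewind."""
    return read_state(datadir) != IN_REWIND


def rewind(datadir, apply_changes):
    """Poison first, modify, then clear the marker only on success."""
    write_state(datadir, IN_REWIND)
    apply_changes(datadir)          # may raise or be killed midway
    write_state(datadir, SHUTDOWN)
```

If apply_changes fails partway, the marker stays in place, so can_start() keeps returning False until a later rewind runs to completion.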

It also doesn't seem too hard to make it restartable. As long as you 
point it to the same source server, it is already almost safe to run 
pg_rewind again. If we re-order the way it writes the control or backup 
files and makes other changes, pg_rewind can verify that you pointed it 
at the same or compatible primary as before.
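
One way to picture the "same source" check is the toy sketch below. Everything in it is hypothetical: the marker file name, the SourceId fields and the function name are assumptions, not existing pg_rewind behavior. The source's identity is recorded durably before the first change, and a rerun is allowed to resume only if it presents the same identity. For simplicity it checks strict equality rather than "compatible".

```python
import json
import os
from dataclasses import asdict, dataclass

MARKER = "rewind_in_progress.json"  # hypothetical marker file name


@dataclass(frozen=True)
class SourceId:
    """Illustrative identity of the source server (fields are assumptions)."""
    system_identifier: int
    timeline: int
    divergence_lsn: int


def begin_or_resume(datadir, source):
    """Return "fresh" on a first run, "resume" on a matching rerun.

    Raises ValueError if an earlier, interrupted run used another source.
    """
    path = os.path.join(datadir, MARKER)
    if os.path.exists(path):
        with open(path) as f:
            recorded = SourceId(**json.load(f))
        if recorded != source:
            raise ValueError("interrupted rewind used a different source")
        return "resume"
    # First run: record the source durably before changing anything else.
    with open(path, "w") as f:
        json.dump(asdict(source), f)
        f.flush()
        os.fsync(f.fileno())
    return "fresh"
```

The caller would remove the marker only after the rewind has finished and all changes are flushed to disk.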

I think there's one corner case with truncated files: pg_rewind has
extended a file by copying the missing "tail" from the source system, but
the system crashes before the new data is fsynced to disk. But I think we
can fix that too, by paying attention to SMGR_TRUNCATE records when
scanning the source WAL.

- Heikki


