Re: [bug fix] PITR corrupts the database cluster

From	Andres Freund
Subject	Re: [bug fix] PITR corrupts the database cluster
Date	24 July 2013 13:05:39
Msg-id	20130724130530.GE27288@alap2.anarazel.de
In reply to	Re: [bug fix] PITR corrupts the database cluster (Heikki Linnakangas <hlinnakangas@vmware.com>)
Replies	Re: [bug fix] PITR corrupts the database cluster
List	pgsql-hackers

On 2013-07-24 15:45:52 +0300, Heikki Linnakangas wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> >On 2013-07-24 12:59:43 +0200, Andres Freund wrote:
> >> > <Approach 2>
> >> What we imo could do would be to drop the tablespaces in a *separate*
> >> transaction *after* the transaction that removed the pg_tablespace
> >> entry. Then an "incomplete actions" logic similar to btree and gin could
> >> be used to remove the database directory if we crashed between the two
> >> transactions.
> >>
> >> So:
> >> TXN1 does:
> >> * remove catalog entries
> >> * drop buffers
> >> * XLogInsert(XLOG_DBASE_DROP_BEGIN)
> >>
> >> TXN2:
> >> * remove_dbtablespaces
> >> * XLogInsert(XLOG_DBASE_DROP_FINISH)
> >>
> >> The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
> >> (which would perform TXN2 if we failed in between) and a
> >> rm_safe_restartpoint which would prevent restartpoints from occurring on
> >> standby between both.
> >>
> >> The same should probably be done for CREATE DATABASE because that
> >> currently can result in partially copied databases lying around.
> >
> >And CREATE/DROP TABLESPACE.
> >
> >Not really related, but CREATE DATABASE's implementation makes me itch
> >every time I read parts of it...
> 
> I've been hoping that we could get rid of the rm_cleanup mechanism
> entirely. I eliminated it for gist a while back, and I've been thinking
> of doing the same for gin and btree. The way it works currently is
> buggy - while we have rm_safe_restartpoint to avoid creating a
> restartpoint at a bad moment, there is nothing to stop you from running
> a checkpoint while incomplete actions are pending. It's possible that
> there are page locks or something that prevent it in practice, but it
> feels shaky.
>
> So I'd prefer a solution that doesn't rely on rm_cleanup. Piggybacking
> on commit record seems ok to me, though if we're going to have a lot of
> different things to attach there, maybe we need to generalize it
> somehow. Like, allow resource managers to attach arbitrary payload to
> the commit record, and provide a new rm_redo_commit function to replay
> them.
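The TXN1/TXN2 split with rm_cleanup and rm_safe_restartpoint callbacks, as quoted above, can be modeled in a few lines. This is a minimal, self-contained C sketch and not PostgreSQL code: the record enum, `dbase_redo()`, `dbase_cleanup()`, `dbase_safe_restartpoint()` and `remove_db_directories()` are hypothetical stand-ins, and directory removal is only counted rather than performed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed XLOG_DBASE_DROP_BEGIN and
 * XLOG_DBASE_DROP_FINISH WAL records. */
typedef enum { DBASE_DROP_BEGIN, DBASE_DROP_FINISH } DropRecType;

#define MAX_PENDING 16

/* Databases whose DROP_BEGIN has been replayed but not their DROP_FINISH.
 * A real implementation would grow this list or error out when full. */
static unsigned int pending_drops[MAX_PENDING];
static size_t n_pending_drops = 0;

/* Counts directory removals; stands in for remove_dbtablespaces(). */
int dirs_removed = 0;

static void remove_db_directories(unsigned int dboid)
{
    (void) dboid;
    dirs_removed++;
}

/* Redo callback: BEGIN opens an incomplete action; FINISH performs the
 * removal (as TXN2 did on the primary) and closes the action. */
void dbase_redo(DropRecType type, unsigned int dboid)
{
    if (type == DBASE_DROP_BEGIN) {
        if (n_pending_drops < MAX_PENDING)
            pending_drops[n_pending_drops++] = dboid;
        return;
    }
    remove_db_directories(dboid);
    for (size_t i = 0; i < n_pending_drops; i++) {
        if (pending_drops[i] == dboid) {
            /* Swap-remove: the order of pending actions does not matter. */
            pending_drops[i] = pending_drops[--n_pending_drops];
            break;
        }
    }
}

/* rm_cleanup analogue: at the end of recovery, finish every drop whose
 * TXN2 never made it into WAL. */
void dbase_cleanup(void)
{
    for (size_t i = 0; i < n_pending_drops; i++)
        remove_db_directories(pending_drops[i]);
    n_pending_drops = 0;
}

/* rm_safe_restartpoint analogue: no restartpoint while a drop is half done. */
bool dbase_safe_restartpoint(void)
{
    return n_pending_drops == 0;
}
```

Replaying BEGIN makes `dbase_safe_restartpoint()` return false until the matching FINISH is replayed; if recovery ends first, `dbase_cleanup()` finishes the removal. Note that this only gates restartpoints during replay; as Heikki points out, nothing in this scheme stops a regular checkpoint while a drop is pending.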

The problem is that piggybacking on the commit record doesn't really fix
the fact that we end up in a bad state if we crash at the wrong moment.

For CREATE DATABASE you will have to copy the template database *before*
you commit the pg_database insert, which means that if we abort before
the commit we are left with stale data in the data directory.

For DROP DATABASE, without something like incomplete actions,
piggybacking on the commit record doesn't solve the issue of CHECKPOINTs
either: the commit record you piggybacked on could have been written
before a checkpoint while you were still busy deleting all the files. If
we then crash, replay starts after that commit record, so the remaining
files are never removed.
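To make the checkpoint problem concrete, here is a toy C model, not PostgreSQL code. `Cluster`, `recover()` and `simulate_crash_during_drop()` are hypothetical, LSNs are plain counters, and the sketch assumes the list of files to delete is piggybacked on the commit record and that recovery replays only records at or after the last checkpoint's redo pointer.

```c
#include <stdbool.h>

#define NFILES 3

/* Toy model: which of the dropped database's files still exist on disk. */
typedef struct {
    bool file_exists[NFILES];
} Cluster;

/* Replay starts at the redo pointer of the last completed checkpoint, so
 * the piggybacked drop payload is replayed only if the commit record lies
 * at or after it. */
static void recover(Cluster *c, unsigned long commit_lsn, unsigned long redo_lsn)
{
    if (commit_lsn >= redo_lsn)
        for (int i = 0; i < NFILES; i++)
            c->file_exists[i] = false;
}

/* Drop a database, crash after deleting only the first file, recover,
 * and return how many files were left behind. */
int simulate_crash_during_drop(bool checkpoint_after_commit)
{
    Cluster c = { { true, true, true } };
    unsigned long next_lsn = 100;
    unsigned long redo_lsn = 50;            /* checkpoint taken before the drop */
    unsigned long commit_lsn = next_lsn++;  /* commit record with drop payload */

    if (checkpoint_after_commit)
        redo_lsn = next_lsn++;              /* checkpoint completes mid-deletion */

    c.file_exists[0] = false;               /* first file unlinked, then crash */

    recover(&c, commit_lsn, redo_lsn);

    int left = 0;
    for (int i = 0; i < NFILES; i++)
        left += c.file_exists[i] ? 1 : 0;
    return left;
}
```

Without an intervening checkpoint, replaying the commit record finishes the deletion and no files are left; with one, the commit record lies before the redo pointer and the two undeleted files are orphaned.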

Similar things go for CREATE/DROP TABLESPACE.

I think getting rid of something like incomplete actions entirely isn't
really realistic. There are several places where we probably should use
them but don't. I think we will probably have to change the logic around
it so that there's something that determines whether a normal checkpoint
is safe at the moment.
A simple version of this basically exists via
GetVirtualXIDsDelayingChkpt(), which is now also used for checksums, but
we probably want to extend that at some point.
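The idea of letting a backend hold off checkpoint completion can be sketched as a simplified, single-threaded C model. It is loosely modeled on how the checkpointer, after choosing its redo pointer, collects the backends returned by GetVirtualXIDsDelayingChkpt() and waits for them; the names `delay_chkpt`, `begin_critical_window()`, `checkpoint_choose_redo()` and `checkpoint_may_complete()` are hypothetical. A backend dropping a database would set its flag before writing the commit record and clear it after the last file is gone.

```c
#include <stdbool.h>

#define MAX_BACKENDS 8

/* Per-backend "delay checkpoint" flag (hypothetical simplification). */
static bool delay_chkpt[MAX_BACKENDS];

/* Set before writing the WAL record whose follow-up work must not be
 * split by a checkpoint; cleared once that work is done. */
void begin_critical_window(int backend) { delay_chkpt[backend] = true; }
void end_critical_window(int backend)   { delay_chkpt[backend] = false; }

/* Backends the checkpointer must wait for: those already inside a window
 * when the redo pointer was chosen. */
typedef struct {
    bool waiting_on[MAX_BACKENDS];
} CheckpointWait;

CheckpointWait checkpoint_choose_redo(void)
{
    CheckpointWait w;
    for (int i = 0; i < MAX_BACKENDS; i++)
        w.waiting_on[i] = delay_chkpt[i];
    return w;
}

/* The checkpoint may complete once every backend in the snapshot has left
 * its window. Windows opened later are assumed to write their WAL after
 * the redo pointer, so replay covers them. This sketch tracks backends,
 * not VXIDs, so a backend that reopens a window would wrongly be waited
 * on again. */
bool checkpoint_may_complete(const CheckpointWait *w)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        if (w->waiting_on[i] && delay_chkpt[i])
            return false;
    return true;
}
```

Applied to DROP DATABASE, either the commit record lands after the redo pointer and is replayed, or the checkpoint waits until the files are gone.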

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
