Re: "using previous checkpoint record at" maybe not the greatest idea?

From	Amit Kapila
Subject	Re: "using previous checkpoint record at" maybe not the greatest idea?
Date	February 7, 2016, 08:25:02
Msg-id	CAA4eK1+w_TmR5jcT9NqpU5msU4+tZ_12YuTHrmvpyeue8A9dWw@mail.gmail.com
In reply to	"using previous checkpoint record at" maybe not the greatest idea? (Andres Freund <andres@anarazel.de>)
Responses	Re: "using previous checkpoint record at" maybe not the greatest idea? (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

On Tue, Feb 2, 2016 at 5:28 AM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> currently if, when not in standby mode, we can't read a checkpoint
> record, we automatically fall back to the previous checkpoint, and start
> replay from there.
>
> Doing so without user intervention doesn't actually seem like a good
> idea. While not super likely, it's entirely possible that doing so can
> wreck a cluster, that'd otherwise easily recoverable. Imagine e.g. a
> tablespace being dropped - going back to the previous checkpoint very
> well could lead to replay not finishing, as the directory to create
> files in doesn't even exist.
>

I think there are similar hazards when a relation is dropped and its
relfilenode is later reused. Basically, replay from the previous
checkpoint can delete the data of one of the newer relations that was
created after the last checkpoint.

> As there's, afaics, really no "legitimate" reasons for needing to go
> back to the previous checkpoint I don't think we should do so in an
> automated fashion.
>

I tried to find out why such a mechanism was introduced in the first
place. It seems to me that commit
4d14fe0048cf80052a3ba2053560f8aab1bb1b22 introduced it, but the reason
is not apparent. I then dug through the archives and found the mail
chain that I think led to this commit; see [1][2].

If we want to do something about the fallback-to-previous-checkpoint
mechanism, then I think it is worth considering whether we still want
to retain xlog files from two checkpoints, as that also seems to have
been introduced in the same commit.

> All the cases where I could find logs containing "using previous
> checkpoint record at" were when something else had already gone pretty
> badly wrong. Now that obviously doesn't have a very large significance,
> because in the situations where it "just worked" are unlikely to be
> reported...
>
> Am I missing a reason for doing this by default?
>

I am not sure, but maybe such hazards did not exist at the time the
fallback-to-previous-checkpoint mechanism was introduced.

I think even if we make it non-default, it will be very difficult for
users to decide whether to turn it on or not. Basically, I think that
if such a situation occurs, whatever solution we try to provide to the
user might not be foolproof, but OTOH we should provide some way to let
the user start the database and dump the existing contents. Some of the
options that come to mind are to provide a way to get the last
checkpoint record from WAL, or to provide a way to compute max-lsn from
data pages and use that with the pg_resetxlog utility to let the user
start the database.
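
To make the max-lsn idea concrete, here is a rough sketch (not an existing
tool) that scans one relation segment file and reports the largest LSN found
in its page headers. It assumes the default 8192-byte block size and a
little-endian machine, and relies only on pd_lsn being the first 8 bytes of
each page header (two 32-bit halves). The names max_page_lsn and format_lsn
are made up for illustration. It does no checksum or sanity validation, so a
corrupted page could report a bogus value, and turning the result into
pg_resetxlog arguments is left out.

```python
import struct

BLCKSZ = 8192  # assumption: default PostgreSQL block size


def max_page_lsn(path, block_size=BLCKSZ, byte_order="<"):
    """Return the largest pd_lsn among the page headers of one file.

    byte_order "<" assumes the cluster was written on a little-endian
    machine; pass ">" for big-endian. No page validation is performed.
    """
    max_lsn = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(block_size)
            if len(page) < 8:
                break  # end of file, or too short to hold pd_lsn
            # pd_lsn is stored as two uint32 values: high half, then low half
            xlogid, xrecoff = struct.unpack(byte_order + "II", page[:8])
            max_lsn = max(max_lsn, (xlogid << 32) | xrecoff)
    return max_lsn


def format_lsn(lsn):
    """Render a 64-bit LSN in the usual hi/lo hexadecimal notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)
```

A real utility would have to walk every relation file in every tablespace
and take the overall maximum; this only shows the per-file step.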

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
