Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From Robert Haas
Subject Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date
Msg-id CA+TgmoaMHFaOrVO-Ejrt2ce8K=yCUW0vw6hSjPEv6f2wCKU9Vg@mail.gmail.com
In response to Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Fri, Sep 3, 2021 at 6:23 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> + /* Built-in oids are mapped directly */
>> + if (classForm->oid < FirstGenbkiObjectId)
>> + relfilenode = classForm->oid;
>> + else if (OidIsValid(classForm->relfilenode))
>> + relfilenode = classForm->relfilenode;
>> + else
>> + continue;
>>
>> Am I missing something, or is this totally busted?
>
> Oops, I think the condition should be like below, but I will think
> carefully before posting the next version if there is something else
> I am missing.
>
> if (OidIsValid(classForm->relfilenode))
>    relfilenode = classForm->relfilenode;
> else if (classForm->oid < FirstGenbkiObjectId)
>    relfilenode = classForm->oid;
> else
>    continue;

What about mapped rels that have been rewritten at some point?
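
To spell out the concern with a toy model (a sketch only, not PostgreSQL code): as I understand it, a mapped relation stores 0 in pg_class.relfilenode and keeps its real filenode in the relation map, and a rewrite such as VACUUM FULL assigns it a new filenode that no longer equals its OID. Every type and function name below is hypothetical, and the filenode value 16500 used afterward is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for the two pg_class fields of interest. */
typedef struct ToyClassRow
{
    Oid oid;
    Oid relfilenode;    /* 0 means "mapped": look in the relation map */
} ToyClassRow;

/* Hypothetical stand-in for one relation map entry. */
typedef struct ToyMapEntry
{
    Oid mapoid;
    Oid mapfilenode;
} ToyMapEntry;

/* Linear search of the toy map; InvalidOid if the relation is not mapped. */
Oid
toy_map_lookup(const ToyMapEntry *map, size_t nentries, Oid relid)
{
    assert(map != NULL || nentries == 0);

    for (size_t i = 0; i < nentries; i++)
    {
        if (map[i].mapoid == relid)
            return map[i].mapfilenode;
    }
    return InvalidOid;
}

/* The shortcut under discussion: assume a mapped rel's filenode is its OID. */
Oid
toy_guess_filenode(const ToyClassRow *row)
{
    if (row->relfilenode != InvalidOid)
        return row->relfilenode;
    return row->oid;
}

/* Consult the map instead of guessing when pg_class says 0. */
Oid
toy_resolve_filenode(const ToyClassRow *row,
                     const ToyMapEntry *map, size_t nentries)
{
    if (row->relfilenode != InvalidOid)
        return row->relfilenode;
    return toy_map_lookup(map, nentries, row->oid);
}
```

For a row {1259, 0} (1259 is pg_class's own OID) and a map entry {1259, 16500} left behind by a rewrite, toy_guess_filenode() returns 1259 while toy_resolve_filenode() returns 16500, so the shortcut would point at a file that is no longer the relation's storage.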

> Agreed to all, but In general, I think WAL hitting the disk before data
> is more applicable for the shared buffers, where we want to flush the
> WAL first before writing the shared buffer so that in case of torn page
> we have an option to recover the page from previous FPI. But in such
> cases where we are creating a directory or file there is no such
> requirement.  Anyways, I agreed with the comments that it should be more
> uniform and the comment should be correct.

There have been previous debates about whether WAL records for
filesystem operations should be issued before or after those
operations are performed. I'm not sure how easy those discussions are
to find in the archives, but they're very relevant here. I think the
short version is - if we write a WAL record first and then the
operation fails afterward, we have to PANIC. But if we perform the
operation first and then write the WAL record if it succeeds, then we
could crash before writing WAL and end up out of sync with our
standbys. If we then later do any WAL-logged operation locally that
depends on that operation having been performed, replay will fail on
the standby. There used to be, or maybe still are, comments in the
code defending the latter approach, but more recently it's been
strongly criticized. The thinking, AIUI, is basically that filesystem
operations really ought not to fail, because nobody should be doing
weird things to the data directory, and if they do, panicking is OK.
But having replay fail in strange ways on the standby later is not OK.

I'm not sure if everyone agrees with that logic; it seems somewhat
debatable. I *think* I personally agree with it but ... I'm not even
100% sure about that.
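
As a rough illustration of the two orderings described above, here is a toy model, assuming a single primary, a single standby that replays every WAL record it receives, and a "create directory" operation. It is a sketch under those assumptions, not how PostgreSQL implements anything; all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical end state after one attempted filesystem operation. */
typedef struct ToyCluster
{
    bool primary_has_dir;   /* operation took effect on the primary */
    bool standby_has_dir;   /* a WAL record was emitted and replayed */
    bool panicked;          /* primary had to PANIC */
} ToyCluster;

/*
 * WAL first: the record goes out before the operation is attempted.
 * If the operation then fails, the WAL already claims it happened,
 * so the only consistent response in this model is PANIC.
 */
ToyCluster
toy_wal_first(bool op_fails)
{
    ToyCluster c = {false, false, false};

    c.standby_has_dir = true;       /* record written, standby replays it */
    if (op_fails)
        c.panicked = true;
    else
        c.primary_has_dir = true;
    return c;
}

/*
 * Operation first: a failure is an ordinary error and nothing is logged,
 * but a crash between the operation and the WAL write leaves the primary
 * ahead of the standby.
 */
ToyCluster
toy_op_first(bool op_fails, bool crash_before_wal)
{
    ToyCluster c = {false, false, false};

    if (op_fails)
        return c;                   /* plain error, no state changed */
    c.primary_has_dir = true;
    if (crash_before_wal)
        return c;                   /* WAL record never written */
    c.standby_has_dir = true;
    return c;
}

/*
 * Would a later WAL-logged operation that depends on the directory
 * fail during replay on the standby?
 */
bool
toy_later_replay_fails(const ToyCluster *c)
{
    return c->primary_has_dir && !c->standby_has_dir;
}
```

In this model toy_wal_first(true) ends in a PANIC but never leaves the standby behind, while toy_op_first(false, true) avoids the PANIC and makes toy_later_replay_fails() return true, which is the tradeoff the debate is about.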

--
Robert Haas
EDB: http://www.enterprisedb.com


