Re: [HACKERS] Unlogged tables cleanup

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] Unlogged tables cleanup
Дата
Msg-id 20190514041504.GJ1418@paquier.xyz
обсуждение исходный текст
Ответ на Re: [HACKERS] Unlogged tables cleanup  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Mon, May 13, 2019 at 11:28:32AM -0400, Robert Haas wrote:
> 1. The comment added in that commit says "Recover may as well remove
> it while replaying..." but what is really meant is "Recovery may well
> remove it well replaying..."  The phrase "may as well" means that
> there isn't really any reason not to do it even if it's not strictly
> necessary.  The phrase "may well" means that it's entirely possible.
> The latter meaning is the one we want here.

Perhaps I wrote this comment.  Per what you say, indeed we should use
"may well" here, to outline the fact that it is a possibility.  I have
to admit that I don't completely get the difference with "may as
well", which also sounds like a possibility by reading this comment
today, but I am no native speaker.

> 2. The comment as adjusted in that commit refers to needing an
> immediate sync "even if the page has been logged, because the write
> did not go through shared_buffers," but there is no page and no write,
> because an empty heap is just an empty file.   That logic makes sense
> for index AMs that bypass shared buffers to write a metapage, e.g.
> nbtree, as opposed to others which go through shared_buffers and don't
> have the immediate sync, e.g. brin.  However, the heap writes no pages
> either through shared buffers or bypassing shared buffers, so either
> I'm confused or the comment makes no sense.

We need to ensure that the init fork, even if empty, still exists so
as recovery can do its job of detection that this is an unlogged table
and then recreate the empty main fork.  Perhaps the comment could
insist on the on-disk existence of the relfilenode instead, making
necessary the logging of this page, however empty it is.

> 3. Before that commit, the comment said that "the immediate sync is
> required, because otherwise there's no guarantee that this will hit
> the disk before the next checkpoint moves the redo pointer."  However,
> that justification seems to apply to *anything* that does smgrcreate +
> log_smgrcreate would also need to do smgrimmedsync, and
> RelationCreateStorage doesn't.  Unless, for some reason that I'm not
> thinking of right now, the init fork has stronger durability
> requirements that the main fork, it's hard to understand why
> heapam_relation_set_new_filenode can call RelationCreateStorage to do
> smgrcreate + log_smgrcreate for the main fork and that's OK, but when
> it does the same thing itself for the init fork, it now needs
> smgrimmedsync as well.
>
> My guess, just shooting from the hip, is that the smgrimmedsync call
> can be removed here.  If that's wrong, then we need a better
> explanation for why it's needed, and we possibly need to add it to
> every single place that does smgrcreate that doesn't have it already.

That's mentioned upthread, and my sloppy memories on the matter match
that we need the fsync call to ensure that the init fork is present at
the init of recovery, and that logging makes sure that is does not get
wiped out if replaying a database or tablespace record.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: VACUUM can finish an interrupted nbtree page split -- is that okay?
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Unlogged tables cleanup