Re: [HACKERS] Unlogged tables cleanup

From Andres Freund
Subject Re: [HACKERS] Unlogged tables cleanup
Msg-id 20190513172442.bcjcdj2wxlvs4dix@alap3.anarazel.de
In reply to Re: [HACKERS] Unlogged tables cleanup  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: [HACKERS] Unlogged tables cleanup  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers

Hi,

On 2019-05-13 13:07:30 -0400, Alvaro Herrera wrote:
> On 2019-May-13, Andres Freund wrote:
> 
> > On 2019-05-13 12:24:05 -0400, Alvaro Herrera wrote:
> 
> > > AFAICS ResetUnloggedRelations copies the init fork after replaying WAL,
> > > so it would be sufficient to have the init fork be recovered from WAL
> > > for that to work.  However, we also do ResetUnloggedRelations *before*
> > > replaying WAL in order to remove leftover not-init-fork files, and that
> > > process requires that the init fork is present at that time.
> > 
> > What scenario are you precisely wondering about? That
> > ResetUnloggedRelations() could overwrite the main fork, while not yet
> > having valid contents (due to the lack of smgrimmedsync())? Shouldn't
> > that only be possible while still in an inconsistent state? A checkpoint
> > would have serialized the correct contents, and we'd not reach HS
> > consistency before having replayed the WAL records resetting the table
> > and the init fork consistency?
> 
> The first ResetUnloggedRelations call occurs before any WAL is replayed,
> so the data dir is certainly still in an inconsistent state.  At that point,
> we need the init fork files to be present, because the init files are the
> indicators of what relations we need to delete the other forks for.

Hm. I think this might be a self-made problem. For the main fork, we
don't need this - if the init fork was created before the last
checkpoint/restartpoint, it'll be on-disk. If it was created afterwards,
WAL replay will recreate both main and init fork. So the problem is just
that the VM fork might survive, because it'll not get nuked given the
current arrangement. Is that what you're thinking about?
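
To make the arrangement under discussion concrete, here is a minimal
sketch of the pre-replay cleanup pass: init forks mark which relfilenodes
are unlogged, and every other fork of those relations is removed. The
function names and the sample file list are hypothetical, segment
suffixes such as ".1" are ignored, and this is not the code in reinit.c.

```python
# Sketch only: names follow the "<relfilenode>[_<fork>]" file layout,
# but parse_fork() and forks_to_remove() are hypothetical helpers.

def parse_fork(filename):
    """Split '16384_vm' into ('16384', 'vm'); a bare '16384' is the main fork."""
    relnode, sep, fork = filename.partition("_")
    return relnode, (fork if sep else "main")


def forks_to_remove(filenames):
    """Return the files the pre-replay cleanup pass would delete."""
    # Pass 1: an init fork marks its relfilenode as an unlogged relation.
    unlogged = {parse_fork(f)[0] for f in filenames
                if parse_fork(f)[1] == "init"}
    # Pass 2: every non-init fork of those relations is removed.
    return sorted(f for f in filenames
                  if parse_fork(f)[0] in unlogged
                  and parse_fork(f)[1] != "init")


# Assumed directory contents: 16386_vm has no init fork on disk yet.
files = ["16384", "16384_init", "16384_vm", "16385", "16385_fsm", "16386_vm"]
print(forks_to_remove(files))  # ['16384', '16384_vm']
```

In this sketch 16386_vm is left alone because no init fork is on disk
to flag it, which is the kind of leftover fork the paragraph above is
worried about.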

I'm doubtful that that is a sane arrangement - there very well could be
tables created and dropped, and then recreated with a recycled oid,
between start and end of recovery. I'm not sure this is actively a
problem for the VM, but I think it's pretty broken for the FSM.

Why isn't the correct answer to nuke all forks during the WAL replay of
the main relation's creation?
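
A rough sketch of what that could look like, purely as an illustration
of the question: replay of a relation's creation first drops every
existing fork with the same relfilenode, so a recycled oid cannot
inherit a stale FSM or VM. The replay_create() helper and the file list
are hypothetical, not an existing redo routine.

```python
# Sketch only: replay_create() is a hypothetical stand-in for a redo step.

def replay_create(existing_files, relnode):
    """Drop every fork of relnode, then create a fresh main fork."""
    survivors = [f for f in existing_files
                 if f.partition("_")[0] != relnode]
    return sorted(survivors + [relnode])


# Assumed leftovers from a dropped table whose oid was recycled.
on_disk = ["16384_fsm", "16384_vm", "16390"]
print(replay_create(on_disk, "16384"))  # ['16384', '16390']
```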


> Maybe we can do something lighter than a full immedsync of all the data
> for the init file -- it would be sufficient to have the file *exist* --
> but I'm not sure this optimization is worth anything.

I don't think just that is sufficient in isolation for types of
relations with metapages (e.g. btree) - the init fork contains data
there.


> > > So I think the immedsync call is necessary (otherwise the cleanup
> > > may fail).  I don't quite understand why the log_smgrcreate is
> > > necessary, but I think it is for reasons that are not adequately
> > > explained by the existing comments.
> > 
> > Well, otherwise the relation won't exist on a standby? And if replay
> > starts from before a database/tablespace creation we'd remove the init
> > fork. So if it's not in the WAL, we'd lose it.
> 
> Ah, of course.  Well, that needs to be in the comments then.

I think it is?

     * ... Recovery may as well remove it
     * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
     * record. Therefore, logging is necessary even if wal_level=minimal.

Greetings,

Andres Freund


