On 2019-May-13, Andres Freund wrote:
> On 2019-05-13 13:07:30 -0400, Alvaro Herrera wrote:
> > On 2019-May-13, Andres Freund wrote:
> > The first ResetUnloggedRelations call occurs before any WAL is replayed,
> > so the data dir certainly still in inconsistent state. At that point,
> > we need the init fork files to be present, because the init files are the
> > indicators of what relations we need to delete the other forks for.
>
> Hm. I think this might be a self-made problem. For the main fork, we
> don't need this - if the init fork was created before the last
> checkpoint/restartpoint, it'll be on-disk. If it was created afterwards,
> WAL replay will recreate both main an init fork. So the problem is just
> that the VM fork might survive, because it'll not get nuked given the
> current arrangement. Is that what you're thinking about?
No, this wasn't was I was trying to explain. Robert described it
better.
> > Maybe we can do something lighter than a full immedsync of all the data
> > for the init file -- it would be sufficient to have the file *exist* --
> > but I'm not sure this optimization is worth anything.
>
> I don't think just that is sufficient in isolation for types of
> relations with metapages (e.g. btree) - the init fork constains data
> there.
No, I meant that when doing the initial cleanup (before WAL replay) we
only delete files; and for that we only need to know whether the table
is unlogged, and we know that by testing presence of the init file. We
do need the contents *after* WAL replay, and for indexes we of course
need the actual contents of the init fork.
> > > Well, otherwise the relation won't exist on a standby? And if replay
> > > starts from before a database/tablespace creation we'd remove the init
> > > fork. So if it's not in the WAL, we'd loose it.
> >
> > Ah, of course. Well, that needs to be in the comments then.
>
> I think it is?
>
> * ... Recovery may as well remove it
> * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
> * record. Therefore, logging is necessary even if wal_level=minimal.
I meant the standby part.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services