Re: standby recovery fails (tablespace related) (tentative patchand discussion)

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: standby recovery fails (tablespace related) (tentative patchand discussion)
Дата
Msg-id 20190514.120613.60660607.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Paul Guo <pguo@pivotal.io>)
Ответы Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Paul Guo <pguo@pivotal.io>)
Список pgsql-hackers
Hello.

At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <pguo@pivotal.io> wrote in
<CAEET0ZF9yN4DaXyuFLzOcAYyxuFF1Ms_OQWeA+Rwv3GhA5Q-SA@mail.gmail.com>
> Thanks for the reply.
> 
> On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> >
> > +      if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> > +      {
> >
> > This patch is allowing missing source and destination directory
> > even in consistent state. I don't think it is safe.
> >
> 
> I do not understand this. Can you elaborate?

Suppose we were recoverying based on a backup at LSN1 targeting
to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
is called as "consistency point", before where the database is
not consistent. It's because we are applying WAL recored older
than those that were already applied in the second trial. The
same can be said for crash recovery, where LSN1 is the latest
checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.

Creation of an existing directory or dropping of a non-existent
directory are apparently inconsistent or "broken" so we should
stop recovery when seeing such WAL records while database is in
consistent state.

> > +        ereport(WARNING,
> > +            (errmsg("directory \"%s\" for copydir() does not exists."
> > +                "It is possibly expected. Skip copydir().",
> > +                parent_path)));
> >
> > This message seems unfriendly to users, or it seems like an elog
> > message. How about something like this. The same can be said for
> > the source directory.
> >
> > | WARNING:  skipped creating database directory: "%s"
> > | DETAIL:  The tabelspace %u may have been removed just before crash.
> >
> 
> Yeah. Looks better.
> 
> 
> >
> > # I'm not confident in this at all:(
> >
> > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> > >
> > > rmgr: Database    len (rec/tot):     42/    42, tx:        486, lsn:
> > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > > pg_tblspc/16384/PG_12_201904281/16386
> > >
> > > rmgr: Database    len (rec/tot):     34/    34, tx:        487, lsn:
> > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > > pg_tblspc/16384/PG_12_201904281/16386
> >
> > WAL records don't convey such information. The previous
> > description seems right to me.
> >
> 
> 2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
> Database/CREATE: copy dir 1663/1 to 65546/65547
> The directories are definitely wrong and misleading.

The original description is right in the light of how the server
recognizes. The record exactly says that "copy dir 1663/1 to
65546/65547" and the latter path is converted in filesystem layer
via a symlink.


> > > > Also I'd suggest we should use pg_mkdir_p() in
> > TablespaceCreateDbspace()
> > > > to replace
> > > > the code block includes a lot of
> > > > get_parent_directory(), MakePGDirectory(), etc even it
> > > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > > graceful and simpler.
> >
> > But I don't agree to this. pg_mkdir_p goes above two-parents up,
> > which would be unwanted here.
> >
> > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
> This change just makes the code concise. Though in theory the change is not
> needed.

We don't want to create tablespace direcotory after concurrent
DROPing, as the comment just above is saying:

|  * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
|  * or TablespaceCreateDbspace is running concurrently.

If the concurrent DROP TABLESPACE destroyed the grand parent
directory, we mustn't create it again.

> > > > Whatever ignore mkdir failure or mkdir_p, I found that these steps
> > seem to
> > > > be error-prone
> > > > along with postgre evolving since they are hard to test and also we are
> > > > not easy to think out
> > > > various potential bad cases. Is it possible that we should do real
> > > > checkpoint (flush & update
> > > > redo lsn) when seeing checkpoint xlogs for these operations? This will
> > > > slow down standby
> > > > but master also does this and also these operations are not usual,
> > > > espeically it seems that it
> > > > does not slow down wal receiving usually?
> >
> > That dramatically slows recovery (not replication) if databases
> > are created and deleted frequently. That wouldn't be acceptable.
> >
> 
> This behavior is rare and seems to have the same impact on master & standby
> from checkpoint/restartpoint.
> We do not worry about master so we should not worry about standby also.

I didn't mention replication. I said that that slows recovery,
which is not governed by master's speed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: remove doc/bug.template?
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: VACUUM can finish an interrupted nbtree page split -- is that okay?