Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected
Дата
Msg-id 20170706.185447.256482539.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] hash index on unlogged tables doesn't behave as expected  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hello,

At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=N9Gr1-RP3RhwecB5E-A@mail.gmail.com>
> On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > It seems to me that this qualifies as an open item for 10. WAL-logging
> > is new for hash tables. Amit, could you add one?
> >
> 
> Added.
> 
> > +   use_wal = RelationNeedsWAL(rel) ||
> > +             (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> > +              forkNum == INIT_FORKNUM);
> > In access AMs, empty() routines are always only called for unlogged
> > relations, so you could relax that to check for INIT_FORKNUM only.
> >
> 
> Yeah, but _hash_init() is an exposed API, so I am not sure it is a
> good idea to make such an assumption at that level of code.  Now, as
> there is no current user which wants to use it any other way, we can
> make such an assumption, but does it serve any purpose?

Check for INIT_FORKNUM appears both accompanied and not
accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
is the convention here.

The difference of the two is an init fork of TEMP
relations. However I believe that init fork is by definition a
part of an unlogged relation, it seems to me that it ought not to
be wal-logged if we had it. From this viewpoint, the middle line
makes sense.



> > Looking at the patch, I think that you are right to do the logging
> > directly in _hash_init() and not separately in hashbuildempty().
> > Compared to other relations the init data is more dynamic.
> >
> > +       /*
> > +        * Force the on-disk state of init forks to always be in sync with the
> > +        * state in shared buffers.  See XLogReadBufferForRedoExtended.
> > +        */
> > +       XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL);
> > +       if (forknum == INIT_FORKNUM)
> > +               FlushOneBuffer(metabuf);
> > I find those forced syncs a bit surprising. The buffer is already
> > marked as dirty, so even if there is a concurrent checkpoint they
> > would be flushed.
> >
> 
> What if there is no concurrent checkpoint activity and the server is
> shutdown?  Remember this replay happens on standby and we will just
> try to perform Restartpoint and there is no guarantee that it will
> flush the buffers.  If the buffers are not flushed at shutdown, then
> the Init fork will be empty on restart and the hash index will be
> corrupt.

XLogReadBufferForRedoExtended() automatically force the flush for
a XLOG record on init fork that having FPW imaeges. And
_hash_init seems to have emitted such a XLOG record using
log_newpage.

> > If recovery comes again here, then they would just
> > be replayed. I am unclear why XLogReadBufferForRedoExtended is not
> > enough to replay a FPW of that.
> >
> 
> Where does FPW come into the picture for a replay of metapage or bitmappage?

Since required FPW seems to be emitted and the flush should be
done automatically, the issuer side (_hash_init) may attach wrong
FPW images if hash_xlog_init_meta_page requires additional
flushes. But I don't see such a flaw there. I think Michael wants
to say such a kind of thing.



By the way the comment of the function ReadBufferWithoutRelcache
has the following sentense.

| * NB: At present, this function may only be used on permanent relations, which
| * is OK, because we only use it during XLOG replay.  If in the future we
| * want to use it on temporary or unlogged relations, we could pass additional
| * parameters.

and does

| return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,                         mode, strategy,
&hit);

This surely works since BufferAlloc recognizes INIT_FORK. But
some adjustment may be needed around here.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




В списке pgsql-hackers по дате отправления:

Предыдущее
От: tushar
Дата:
Сообщение: Re: [HACKERS] increasing the default WAL segment size
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected