Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected |
Дата | |
Msg-id | 20170706.185447.256482539.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: [HACKERS] hash index on unlogged tables doesn't behave as expected (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected
(Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected (Amit Kapila <amit.kapila16@gmail.com>) Re: [HACKERS] hash index on unlogged tables doesn't behave as expected (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
Hello, At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=N9Gr1-RP3RhwecB5E-A@mail.gmail.com> > On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > It seems to me that this qualifies as an open item for 10. WAL-logging > > is new for hash tables. Amit, could you add one? > > > > Added. > > > + use_wal = RelationNeedsWAL(rel) || > > + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && > > + forkNum == INIT_FORKNUM); > > In access AMs, empty() routines are always only called for unlogged > > relations, so you could relax that to check for INIT_FORKNUM only. > > > > Yeah, but _hash_init() is an exposed API, so I am not sure it is a > good idea to make such an assumption at that level of code. Now, as > there is no current user which wants to use it any other way, we can > make such an assumption, but does it serve any purpose? Check for INIT_FORKNUM appears both accompanied and not accompanied by check for RELPER.._UNLOGGED, so I'm not sure which is the convention here. The difference of the two is an init fork of TEMP relations. However I believe that init fork is by definition a part of an unlogged relation, it seems to me that it ought not to be wal-logged if we had it. From this viewpoint, the middle line makes sense. > > Looking at the patch, I think that you are right to do the logging > > directly in _hash_init() and not separately in hashbuildempty(). > > Compared to other relations the init data is more dynamic. > > > > + /* > > + * Force the on-disk state of init forks to always be in sync with the > > + * state in shared buffers. See XLogReadBufferForRedoExtended. > > + */ > > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); > > + if (forknum == INIT_FORKNUM) > > + FlushOneBuffer(metabuf); > > I find those forced syncs a bit surprising. The buffer is already > > marked as dirty, so even if there is a concurrent checkpoint they > > would be flushed. > > > > What if there is no concurrent checkpoint activity and the server is > shutdown? Remember this replay happens on standby and we will just > try to perform Restartpoint and there is no guarantee that it will > flush the buffers. If the buffers are not flushed at shutdown, then > the Init fork will be empty on restart and the hash index will be > corrupt. XLogReadBufferForRedoExtended() automatically force the flush for a XLOG record on init fork that having FPW imaeges. And _hash_init seems to have emitted such a XLOG record using log_newpage. > > If recovery comes again here, then they would just > > be replayed. I am unclear why XLogReadBufferForRedoExtended is not > > enough to replay a FPW of that. > > > > Where does FPW come into the picture for a replay of metapage or bitmappage? Since required FPW seems to be emitted and the flush should be done automatically, the issuer side (_hash_init) may attach wrong FPW images if hash_xlog_init_meta_page requires additional flushes. But I don't see such a flaw there. I think Michael wants to say such a kind of thing. By the way the comment of the function ReadBufferWithoutRelcache has the following sentense. | * NB: At present, this function may only be used on permanent relations, which | * is OK, because we only use it during XLOG replay. If in the future we | * want to use it on temporary or unlogged relations, we could pass additional | * parameters. and does | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum, mode, strategy, &hit); This surely works since BufferAlloc recognizes INIT_FORK. But some adjustment may be needed around here. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
В списке pgsql-hackers по дате отправления:
Следующее
От: Kyotaro HORIGUCHIДата:
Сообщение: Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected