Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [HACKERS] hash index on unlogged tables doesn't behave as expected
Дата
Msg-id CAA4eK1K2iAvnei88thQ8jco3uAYg6avkP1WGN4SJ-QxmDkSeqA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Список pgsql-hackers
On Thu, Jul 6, 2017 at 3:24 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=N9Gr1-RP3RhwecB5E-A@mail.gmail.com>
>> On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > It seems to me that this qualifies as an open item for 10. WAL-logging
>> > is new for hash tables. Amit, could you add one?
>> >
>>
>> Added.
>>
>> > +   use_wal = RelationNeedsWAL(rel) ||
>> > +             (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
>> > +              forkNum == INIT_FORKNUM);
>> > In access AMs, empty() routines are always only called for unlogged
>> > relations, so you could relax that to check for INIT_FORKNUM only.
>> >
>>
>> Yeah, but _hash_init() is an exposed API, so I am not sure it is a
>> good idea to make such an assumption at that level of code.  Now, as
>> there is no current user which wants to use it any other way, we can
>> make such an assumption, but does it serve any purpose?
>
> Check for INIT_FORKNUM appears both accompanied and not
> accompanied by check for RELPER.._UNLOGGED, so I'm not sure which
> is the convention here.
>
> The difference of the two is an init fork of TEMP
> relations. However I believe that init fork is by definition a
> part of an unlogged relation, it seems to me that it ought not to
> be wal-logged if we had it. From this viewpoint, the middle line
> makes sense.
>

What is the middle line?  Are you okay with a current check or do you
see any problem with it or do you prefer to write it differently?

>
>
>> > Looking at the patch, I think that you are right to do the logging
>> > directly in _hash_init() and not separately in hashbuildempty().
>> > Compared to other relations the init data is more dynamic.
>> >
>> > +       /*
>> > +        * Force the on-disk state of init forks to always be in sync with the
>> > +        * state in shared buffers.  See XLogReadBufferForRedoExtended.
>> > +        */
>> > +       XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL);
>> > +       if (forknum == INIT_FORKNUM)
>> > +               FlushOneBuffer(metabuf);
>> > I find those forced syncs a bit surprising. The buffer is already
>> > marked as dirty, so even if there is a concurrent checkpoint they
>> > would be flushed.
>> >
>>
>> What if there is no concurrent checkpoint activity and the server is
>> shutdown?  Remember this replay happens on standby and we will just
>> try to perform Restartpoint and there is no guarantee that it will
>> flush the buffers.  If the buffers are not flushed at shutdown, then
>> the Init fork will be empty on restart and the hash index will be
>> corrupt.
>
> XLogReadBufferForRedoExtended() automatically force the flush for
> a XLOG record on init fork that having FPW imaeges. And
> _hash_init seems to have emitted such a XLOG record using
> log_newpage.
>

Sure, but log_newpage is not used for metapage and bitmappage.

>> > If recovery comes again here, then they would just
>> > be replayed. I am unclear why XLogReadBufferForRedoExtended is not
>> > enough to replay a FPW of that.
>> >
>>
>> Where does FPW come into the picture for a replay of metapage or bitmappage?
>
> Since required FPW seems to be emitted and the flush should be
> done automatically, the issuer side (_hash_init) may attach wrong
> FPW images if hash_xlog_init_meta_page requires additional> flushes. But I don't see such a flaw there. I think
Michaelwants
 
> to say such a kind of thing.
>

I am not able to understand what you want to say, but I think you
don't see any problem with the patch as such.

>
>
> By the way the comment of the function ReadBufferWithoutRelcache
> has the following sentense.
>

I don't think we need anything with respect to this patch because
ReadBufferWithoutRelcache is used in case of FPW replay of INIT_FORK
pages as well.  Do you have something else in mind which I am not able
to see?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: [HACKERS] Add support for tuple routing to foreign partitions
Следующее
От: AP
Дата:
Сообщение: Re: [HACKERS] pgsql 10: hash indexes testing