Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] hash index on unlogged tables doesn't behave asexpected
Дата
Msg-id 20170710.185740.203923467.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] hash index on unlogged tables doesn't behave as expected  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: [HACKERS] hash index on unlogged tables doesn't behave as expected  (Amit Kapila <amit.kapila16@gmail.com>)
Re: [HACKERS] hash index on unlogged tables doesn't behave as expected  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
Hi,

At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1JYyO5Hcxx4rP0jb=JmMC4qvY1YvG9UvkwNr5qrojsOPw@mail.gmail.com>
> On Mon, Jul 10, 2017 at 10:39 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Sat, 8 Jul 2017 16:41:27 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1+-DUto+MyeNdLE9P9u8G3Fv6n+SOjPSqmPSw6ashhPjw@mail.gmail.com>
> >> On Sat, Jul 8, 2017 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> > On Fri, Jul 7, 2017 at 11:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> >> I think we should do that as a separate patch (I can write the same as
> >> >> well) because that is not new behavior introduced by this patch, ...
> >> >
> >> > True, although maybe hash indexes are the only way that happens today?
> >> >
> >>
> >> No, I think it will happen for all other indexes as well.  Basically,
> >> we log new page for init forks of other indexes and then while
> >> restoring that full page image, it will fall through the same path.
> >
> > (Sorry, I didn't noticed that hash metapage is not using log_newpgae)
> >
> 
> The bitmappage is also not using log_newpage.
> 
> > For example, (bt|gin|gist|spgist|brin)buildempty are using
> > log_newpage to log INIT_FORK metapages. I believe that the xlog
> > flow from log_newpage to XLogReadBufferForRedoExtended is
> > developed so that pages in init forks are surely flushed during
> > recovery, so we should fix it instead adding other flushes if the
> > path doesn't work. Am I looking wrong place? (I think it is
> > working.)
> >
> 
> You are looking at right place.
> 
> > If I'm understanding correctly here, hashbild and hashbuildempty
> > should be refactored using the same structure with other *build
> > and *buildempty functions. Specifically metapages initialization
> > subroutines (_hash_init or _bt_initmetapage or SpGistInitMetapage
> > or...)  does only on-memory initialization and does not emit WAL,
> > then *build and *buildempty emits WAL in their required way.
> >
> 
> I have considered this way of doing as well, read my first e-mail [1]
> in this thread "Another approach to fix the issue ....".  It is not
> that we can't change the code of hashbuildempty to make it log new
> pages for all kind of pages (metapage, bitmappage and datapages), but
> I am not sure if there is a value in going in that direction.  If we
> have to go in that direction, we need to hard code some of the values
> like hashm_nmaps and hashm_mapp in metapage rather than initializing
> them after bitmappage creation.  Before going in that direction, I
> think we should also take opinion from other people especially some
> committer as we might need to maintain two different ways of doing
> almost the same thing.

Thanks for the explanation and the pointer (to the start of this
thread.. sorry).

> I am also not 100% comfortable with adding flush at two new places,
> but that makes the code for fix simpler and fundamentally there is no
> problem in doing so.

I agree that the patch would be simpler. Ok, I am satisfied with
an additional comment for _hash_init and hash_xlog_init_meta_page
that describes the reason that _hash_init doesn't/can't use
log_newpage and thus requires explicit flushes. Something like
the description in [1] would be enough.

> There is a small downside to always logging new page which is that it
> will always log full page image in WAL which has the potential to
> increase WAL volume if there are many unlogged tables and indexes.
> Now considering the init fork generally has very less pages, it is not
> a big concern, but still avoiding full page image has some value.

Perhaps more effective mechanism will be developed at the time.

> >> >>> By the way the comment of the function ReadBufferWithoutRelcache
> >> >>> has the following sentense.
> >> >>>
> >> >>> | * NB: At present, this function may only be used on permanent relations, which
> >> >>> | * is OK, because we only use it during XLOG replay.  If in the future we
> >> >>> | * want to use it on temporary or unlogged relations, we could pass additional
> >> >>> | * parameters.
> >> >>>
> >> >>> and does
> >> >>>
> >> >>> | return ReadBuffer_common(smgr, RELPERSISTENCE_PERMANENT, forkNum, blockNum,
> >> >>>                                                          mode, strategy, &hit);
> >> >>>
> >> >>> This surely works since BufferAlloc recognizes INIT_FORK. But
> >> >>> some adjustment may be needed around here.
> >> >>
> >> >> Yeah, it should probably mention that the init fork of an unlogged
> >> >> relation is also OK.
> >> >>
> >> >
> >> > I think we should do that as a separate patch (I can write the same as
> >> > well) because that is not new behavior introduced by this patch, but
> >> > let me know if you think that we should add such a comment in this
> >> > patch itself.
> >> >
> >>
> >> Attached a separate patch to adjust the comment atop ReadBufferWithoutRelcache.
> >
> > + * NB: At present, this function may only be used on permanent relations and
> > + * init fork of an unlogged relation, which is OK, because we only use it
> > + * during XLOG replay.  If in the future we want to use it on temporary or
> > + * unlogged relations, we could pass additional parameters.
> >
> > *I* think the target of the funcion is permanent relations and
> > init forks, not unlogged relations. And I'd like to have an
> > additional comment about RELPERSISTENCE_PERMANENT. Something like
> > the attached.
> >
> 
> Okay, let's leave this for committer to decide.

Agreed, thanks.

> [1] - https://www.postgresql.org/message-id/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w%40mail.gmail.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] POC: Sharing record typmods between backends
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: [HACKERS] COPY vs. transition tables