Re: Error with index on unlogged table

From Kyotaro HORIGUCHI
Subject Re: Error with index on unlogged table
Date
Msg-id 20151201.111101.189005118.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Error with index on unlogged table  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Error with index on unlogged table  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Hello, I studied your latest patch.

At Fri, 27 Nov 2015 16:59:20 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqRoaCMhr4hjEgq4rCZ4GaCB-6=cH8b2U7K7T5-kBGC5bA@mail.gmail.com>
> On Fri, Nov 27, 2015 at 3:42 PM, Michael Paquier wrote:
> > I am still investigating for a correct fix, looking at reinit.c the
> > code in charge of copying the init fork as the main fork for a
> > relation at the end of recovery looks to be doing its job correctly...
> 
> Attached is a patch that fixes the issue for me in master and 9.5.
> Actually in the last patch I forgot a call to smgrwrite to ensure that
> the INIT_FORKNUM is correctly synced to disk when those pages are
> replayed at recovery, letting the reset routines for unlogged
> relations do their job correctly. I have noticed as well that we need
> to do the same for gin and brin relations. In this case I think that
> we could limit the flush to unlogged relations, my patch does it
> unconditionally though to generalize the logic. Thoughts?

I feel quite uncomfortable that the patch solves a problem that
stems from the nature of unlogged objects by arbitrary flagging
that does not fully correspond to that nature. It would be
preferable if we could deduce the need for fsync from some
property of the object itself.

In the current patch, is_sync for log_newpage is generally true
for, and only for, INIT_FORKNUM pages. The exceptions, as far as I
can see, are:

copy_relation_data: called with an arbitrary forknum, but it doesn't
set is_fsync even when copying INIT_FORKNUM. (Is this not a problem?)
 

spgbuildempty, ginbuildempty: these emit two or three newpage
records at once, so is_fsync is set only for the last one, for
performance reasons.
 

And the other anomalies are:

ginbuildempty, gistbuildempty: these functions don't seem to fsync
immediately, but is_fsync is set for their INIT_FORKNUM pages. Of
course, that wouldn't be a problem.
 


In short, it seems to me that the only reason for using
XLOG_FPI_FOR_SYNC here is the performance of processing
successive FPIs for INIT_FORKNUM.

INIT_FORKNUM is generated only for unlogged tables and the objects
that belong to them. I suppose such successive fsyncs don't cause
an observable performance drop, assuming that the number of
unlogged tables and their dependent objects is not so high,
especially with smarter storage. All we need to do is fsync only
for INIT_FORKNUM's FPIs in this case. If the performance still
matters, we can instead fsync the last md file when any WAL record
other than an FPI for INIT_FORKNUM arrives. (But this would be a
bit complex.)
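
To make the two options above concrete, here is a minimal
standalone sketch. It is a toy model, not PostgreSQL code: the
enum only mirrors the order of the real ForkNumber values, and
fpi_needs_immediate_sync, ReplaySyncState and replay_record are
hypothetical names invented for this illustration. The first
function models "fsync only for INIT_FORKNUM's FPIs"; the second
models the deferred variant, where one flush is issued when the
first record other than an INIT_FORKNUM FPI arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy copy of the fork numbers; only the ordering mirrors PostgreSQL. */
typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

/*
 * Hypothetical helper for the simple option: derive the need for an
 * immediate fsync from the fork number alone, with no extra flag in
 * the WAL record.
 */
bool
fpi_needs_immediate_sync(ForkNumber forknum)
{
    return forknum == INIT_FORKNUM;
}

/* Hypothetical replay state for the deferred option. */
typedef struct ReplaySyncState
{
    bool pending_init_sync; /* an INIT_FORKNUM FPI is not yet flushed */
    int  sync_count;        /* number of flushes issued so far */
} ReplaySyncState;

/*
 * Deferred option: remember that an INIT_FORKNUM FPI was replayed and
 * issue a single flush when any other kind of record shows up.
 */
void
replay_record(ReplaySyncState *st, bool is_fpi, ForkNumber forknum)
{
    assert(st != NULL);

    if (is_fpi && forknum == INIT_FORKNUM)
    {
        st->pending_init_sync = true;
        return;
    }

    if (st->pending_init_sync)
    {
        st->sync_count++;   /* stands in for the real fsync call */
        st->pending_init_sync = false;
    }
}
```

In this model, three successive INIT_FORKNUM FPIs followed by one
unrelated record cost a single flush. A real implementation would
also have to flush any pending state at the end of recovery, which
is part of why the deferred variant is more complex.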

By the way, I suppose that fsyncing only the last page of
successive new pages can still theoretically cause this problem
when the last page is not in the same file as the other pages.
In reality, that cannot occur for INIT_FORKNUM files, though :)
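
The "same file" condition can be checked with simple arithmetic.
The sketch below is a standalone illustration, not PostgreSQL
code: it assumes the default build settings (8 kB blocks, 1 GB
segment files, hence 131072 blocks per segment), and segment_of
and same_segment_file are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Blocks per segment file in a default build: 1 GB / 8 kB. */
#define DEFAULT_RELSEG_SIZE 131072u

/* Segment file index that holds the given block. */
uint32_t
segment_of(BlockNumber blkno, uint32_t relseg_size)
{
    assert(relseg_size > 0);
    return blkno / relseg_size;
}

/*
 * True when both blocks live in the same segment file, which is the
 * case where fsyncing the file once covers both pages.
 */
bool
same_segment_file(BlockNumber a, BlockNumber b, uint32_t relseg_size)
{
    return segment_of(a, relseg_size) == segment_of(b, relseg_size);
}
```

An init fork holds only a handful of pages, so all of them fall
in segment 0 under these assumptions; blocks 131071 and 131072,
by contrast, would land in different files.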

Thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




