Re: Error with index on unlogged table

From Andres Freund
Subject Re: Error with index on unlogged table
Date
Msg-id 20150326175024.GJ451@alap3.anarazel.de
In response to Re: Error with index on unlogged table  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Error with index on unlogged table  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On 2015-03-26 15:13:41 +0100, Andres Freund wrote:
> On 2015-03-26 13:55:22 +0000, Thom Brown wrote:
> > I still, however, have a problem with the separate and original issue of:
> > 
> > # insert into utest (thing) values ('moomoo');
> > ERROR:  index "utest_pkey" contains unexpected zero page at block 0
> > HINT:  Please REINDEX it.
> > 
> > I don't see why the user should need to go re-indexing all unlogged tables
> > each time a standby is promoted.  The index should just be empty and ready
> > to use.
> 
> There's definitely something rather broken here. Investigating.

As far as I can see this has been broken at least since the introduction
of fast promotion. WAL replay will update the init fork in shared
memory, but it'll not be guaranteed to be flushed to disk when the reset
happens. d3586fc8a et al. then also made it possible to hit the issue
without fast promotion.

To hit the issue there must not have been a restartpoint (which requires
a checkpoint on the primary) since the creation of the unlogged table.

I think the problem here is that the *primary* makes no such
assumptions. Init forks are logged via stuff like

    smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
              (char *) metapage, true);
    if (XLogIsNeeded())
        log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
                    BTREE_METAPAGE, metapage, false);

    /*
     * An immediate sync is required even if we xlog'd the page, because the
     * write did not go through shared_buffers and therefore a concurrent
     * checkpoint may have moved the redo pointer past our xlog record.
     */
    smgrimmedsync(index->rd_smgr, INIT_FORKNUM);

i.e. the data is written out directly to disk, circumventing
shared_buffers. It's pretty bad that we don't do the same on the
standby. For master I think we should just add a bit to the XLOG_FPI
record saying the data should be forced out to disk. I'm less sure
what's to be done in the back branches. Flushing every HEAP_NEWPAGE
record isn't really an option.
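
To make the idea concrete, here is a toy model of it, not PostgreSQL
code. Every name in it (ToyStore, ToyFpiRecord, TOY_FPI_FORCE_FLUSH,
toy_replay_fpi and so on) is made up for illustration, and the "disk"
and "shared buffer" are plain arrays. It only shows the intended
behaviour: replay normally leaves the page dirty in memory, and a flag
bit on the record makes replay push it out immediately, so that a later
reset reading the init fork from disk does not see a zero page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE       8192
#define TOY_FPI_FORCE_FLUSH 0x01    /* hypothetical flag bit, not a real one */

/* Toy stand-in for one page on disk plus its copy in shared memory. */
typedef struct ToyStore
{
    uint8_t disk[TOY_PAGE_SIZE];    /* what a reset after promotion would read */
    uint8_t buffer[TOY_PAGE_SIZE];  /* what replay modifies */
    bool    buffer_dirty;
} ToyStore;

/* Toy stand-in for a full-page-image WAL record. */
typedef struct ToyFpiRecord
{
    uint8_t flags;
    uint8_t image[TOY_PAGE_SIZE];
} ToyFpiRecord;

static void
toy_store_init(ToyStore *s)
{
    memset(s, 0, sizeof(*s));
}

/* Copy the in-memory page to the toy "disk". */
static void
toy_flush(ToyStore *s)
{
    memcpy(s->disk, s->buffer, TOY_PAGE_SIZE);
    s->buffer_dirty = false;
}

/* Replay one record; flush right away only if the record asks for it. */
static void
toy_replay_fpi(ToyStore *s, const ToyFpiRecord *rec)
{
    assert(s != NULL && rec != NULL);

    memcpy(s->buffer, rec->image, TOY_PAGE_SIZE);
    s->buffer_dirty = true;

    if (rec->flags & TOY_FPI_FORCE_FLUSH)
        toy_flush(s);
}

/* True if the on-"disk" page is still all zeroes. */
static bool
toy_disk_page_is_zero(const ToyStore *s)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
    {
        if (s->disk[i] != 0)
            return false;
    }
    return true;
}
```

Replaying a record with flags = 0 leaves toy_disk_page_is_zero() true,
which is the analogue of the zero page reported above; replaying the
same record with TOY_FPI_FORCE_FLUSH set leaves it false.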

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


