Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)

From: Fujii Masao
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Msg-id: CAHGQGwGRuNJ=_ctXwteNkFkdvMDNFYxFdn0D1cd-CqL0OgNCLg@mail.gmail.com
In response to: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)  (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Sun, Feb 19, 2012 at 3:01 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've tested your v9 patch.  I no longer see any inconsistencies or
> lost transactions in the recovered database.  But occasionally I get
> databases that fail to recover at all.
> It has always been with the exact same failed assertion, at xlog.c line 2154.
>
> I've only seen this 4 times out of 2202 cycles of crash and recovery,
> so it must be some rather obscure situation.
>
> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  redo starts at 0/180001B0
> LOG:  unexpected pageaddr 0/15084000 in log file 0, segment 25, offset 540672
> LOG:  redo done at 0/19083FD0
> LOG:  last completed transaction was at log time 2012-02-17 11:13:50.369488-08
> LOG:  checkpoint starting: end-of-recovery immediate
> TRAP: FailedAssertion("!(((((((uint64) (NewPageEndPtr).xlogid *
> (uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) *
> ((uint32) (16 * 1024 * 1024))) + (NewPageEndPtr).xrecoff - 1)) / 8192)
> % (XLogCtl->XLogCacheBlck + 1)) == nextidx)", File: "xlog.c", Line:
> 2154)
> LOG:  startup process (PID 5390) was terminated by signal 6: Aborted
> LOG:  aborting startup due to startup process failure
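
For readers decoding the "unexpected pageaddr" line in the quoted log: the segment number and offset can be turned back into a WAL address. The following is only a sketch, assuming 16 MB segments (the size that appears in the assertion text); the function name seg_offset_to_xrecoff is hypothetical and is not taken from xlog.c.

```c
#include <stdint.h>

/* Assumed segment size: 16 MB, as in the assertion's expanded macro. */
#define SEG_SIZE ((uint32_t) (16 * 1024 * 1024))

/*
 * Hypothetical helper: convert "segment N, offset M" from the log message
 * into the xrecoff part of an address within the same log file (xlogid).
 */
uint32_t
seg_offset_to_xrecoff(uint32_t segment, uint32_t offset)
{
    return segment * SEG_SIZE + offset;
}
```

With segment 25 and offset 540672 this gives 0x19084000, so the page being read sits at 0/19084000, 48 bytes past the "redo done at 0/19083FD0" position, while the page header found there carried the address 0/15084000.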

I could reproduce this when I made the server crash just after executing
"select pg_switch_xlog()".

$ initdb -D data
$ pg_ctl -D data start
$ psql -c "select pg_switch_xlog()"
$ pg_ctl -D data stop -m i
$ pg_ctl -D data start
...
LOG:  redo done at 0/16E3B0C
TRAP: FailedAssertion("!(((((((uint64) (NewPageEndPtr).xlogid *
(uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) *
((uint32) (16 * 1024 * 1024))) + (NewPageEndPtr).xrecoff - 1)) / 8192)
% (XLogCtl->XLogCacheBlck + 1)) == nextidx)", File: "xlog.c", Line:
2154)
LOG:  startup process (PID 16361) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
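
The expanded macro in the TRAP message is hard to read, so here is a minimal sketch of the arithmetic it performs, reconstructed only from the assertion text above. The struct and function names (XLogRecPtrSketch, page_end_to_buffer_idx, assertion_holds) are hypothetical, and cache_blck stands in for XLogCtl->XLogCacheBlck; this is not the code in xlog.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants as they appear in the expanded assertion. */
#define SEG_SIZE    ((uint32_t) (16 * 1024 * 1024))
#define FILE_SIZE   ((uint64_t) (((uint32_t) 0xffffffff) / SEG_SIZE) * SEG_SIZE)
#define BLOCK_SIZE  8192

/* Hypothetical stand-in for the two-part WAL pointer (xlogid, xrecoff). */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} XLogRecPtrSketch;

/* Flatten the two-part pointer into a single 64-bit byte position. */
uint64_t
page_end_to_bytepos(XLogRecPtrSketch p)
{
    return (uint64_t) p.xlogid * FILE_SIZE + p.xrecoff;
}

/*
 * Buffer slot that the page ending at page_end maps to:
 * take the last byte of the page, find its page number,
 * then wrap around modulo (cache_blck + 1).
 */
uint32_t
page_end_to_buffer_idx(XLogRecPtrSketch page_end, uint32_t cache_blck)
{
    uint64_t last_byte = page_end_to_bytepos(page_end) - 1;

    return (uint32_t) ((last_byte / BLOCK_SIZE) % ((uint64_t) cache_blck + 1));
}

/* The condition the failed assertion checks. */
bool
assertion_holds(XLogRecPtrSketch new_page_end, uint32_t cache_blck,
                uint32_t nextidx)
{
    return page_end_to_buffer_idx(new_page_end, cache_blck) == nextidx;
}
```

In other words, the assertion fails when the slot computed from NewPageEndPtr and nextidx disagree. As an illustration with an assumed cache_blck of 7, a page ending at 0/19084000 maps to slot 1.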

Though I have not read the new patch yet, I suspect that the xlog switch code
still has a bug.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

