Re: Assertion failure on hot standby

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: Assertion failure on hot standby
Дата
Msg-id AANLkTikGGKPFPBGSVxF2C2oPsMcopchZjUjLU129vrg8@mail.gmail.com
обсуждение исходный текст
Ответ на Assertion failure on hot standby  (Fujii Masao <masao.fujii@gmail.com>)
Ответы Re: Assertion failure on hot standby  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
On Wed, Nov 24, 2010 at 1:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php
>
> When I did unusual operations (e.g., suspend bgwriter by gdb,
> pgbench -i and issue txid_current many times) on the master
> in order to try to reproduce the above HS error, I encountered
> the following assertion error.
>
> Since I compiled the standby postgres with WAL_DEBUG and
> ran it with wal_debug = on, all the replayed WAL records were
> logged.
>
> ------------------------
> sby LOG:  REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid
> 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09
> sby LOG:  REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid
> 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09
> sby LOG:  consistent recovery state reached at 0/134C0510
> sby LOG:  REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid
> 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09
> sby LOG:  REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0;
> len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409
> sby LOG:  REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0;
> len 20: Standby -  running xacts: nextXid 23256 latestCompletedXid
> 23255 oldestRunningXid 23256
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1209)
> sby LOG:  database system is ready to accept read only connections
> sby LOG:  startup process (PID 32666) was terminated by signal 6: Aborted
> sby LOG:  terminating any other active server processes
> ------------------------
>
> Does anyone know what the cause of the problem is?

I was able to reproduce this problem. This happens because CHECKPOINT
can write the WAL record indicating that the transaction with XID = 0 has taken
the AccessExclusive lock. This WAL record causes that assertion failure in the
standby.

Here is the procedure to reproduce the problem:

-------------------
1. Execute "DROP TABLE" and suspend the execution before calling
    RemoveRelations -> LockRelationOid -> LockAcquire ->
LockAcquireExtended -> LogAccessExclusiveLock
    by, for example, using gdb.

2. While "DROP TABLE" is being suspended, execute CHECKPOINT.    This CHECKPOINT will generate the above-mentioned WAL
record.
-------------------

To solve the problem, ISTM that XID should be assigned before the information
about AccessExclusive lock becomes visible to another process. Or CHECKPOINT
(i.e., GetRunningTransactionLocks) should ignore the locks with XID = 0.

Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Itagaki Takahiro
Дата:
Сообщение: Re: format() with embedded to_char() formatter
Следующее
От: Shigeru HANADA
Дата:
Сообщение: SQL/MED - file_fdw