Re: Assertion failure twophase.c (testing HS/SR)

From Heikki Linnakangas
Subject Re: Assertion failure twophase.c (testing HS/SR)
Date
Msg-id 4B881D7E.1080304@enterprisedb.com
In response to Assertion failure twophase.c (testing HS/SR)  ("Erik Rijkers" <er@xs4all.nl>)
List pgsql-hackers
Erik Rijkers wrote:
> 9.0devel (cvs yesterday) primary+server, with this patch:
>   extend_format_of_recovery_info_funcs_v2.patch
>   ( http://archives.postgresql.org/pgsql-hackers/2010-02/msg02116.php )
>
> A large (500 GB) restore left to run overnight, gave the below crash. The standby was restarted,
> and seems to be catching up OK.
>
> LOG:  entering standby mode
> LOG:  redo starts at 0/1000020
> LOG:  consistent recovery state reached at 0/2000000
> LOG:  database system is ready to accept read only connections
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201)
> LOG:  startup process (PID 21044) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes
> LOG:  database system was interrupted while in recovery at log time 2010-02-26 06:42:14 CET
> HINT:  If this has occurred more than once some data might be corrupted and you might need to
> choose an earlier recovery target.
> cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/00000001000000150000003F':
> No such file or directory
> LOG:  entering standby mode
> LOG:  redo starts at 15/3400E828
> LOG:  consistent recovery state reached at 15/6D6D9FD8
> LOG:  database system is ready to accept read only connections

Aha, there seems to be a typo in KnownAssignedXidsRemoveMany(), see
attached patch.
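
To make the typo concrete, here is a minimal, self-contained C sketch of the
removal loop. It is not the procarray.c code: Xid, is_prepared(),
xid_precedes() and remove_many() are hypothetical stand-ins, the
prepared-transaction lookup is hard-coded to a single XID, and the
check_candidate flag exists only to switch between the old and the patched
behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;                 /* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

/* Hypothetical stand-in for the prepared-transaction lookup:
 * in this sketch only XID 105 counts as prepared. */
static bool
is_prepared(Xid xid)
{
    return xid == 105;
}

/* Wraparound-aware "a is older than b", assumed for normal XIDs only. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Drop every entry older than cutoff, compacting the array in place,
 * and return the number of entries kept.
 * check_candidate == true  : patched behaviour, ask about the entry itself.
 * check_candidate == false : old behaviour, ask about the cutoff value.
 */
static size_t
remove_many(Xid *xids, size_t n, Xid cutoff,
            bool keep_prepared, bool check_candidate)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        Xid  remove_xid = xids[i];
        bool is_old = (cutoff == INVALID_XID) ||
                      xid_precedes(remove_xid, cutoff);

        if (is_old)
        {
            Xid probe = check_candidate ? remove_xid : cutoff;

            /* Old entries survive only if they are prepared. */
            if (!(keep_prepared && is_prepared(probe)))
                continue;       /* drop this entry */
        }
        xids[kept++] = remove_xid;
    }
    return kept;
}
```

With entries {100, 105, 110, 120}, cutoff 115 and keep_prepared set, the
patched variant keeps 105 and 120, while the old variant asks about 115 on
every iteration and keeps only 120.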

But I wonder how an invalid XID found its way to that function, with
keepPreparedXacts==true? I don't think that should happen; it's called
from ExpireOldKnownAssignedTransactionIds(), which is called from
ProcArrayApplyRecoveryInfo() and at replay of checkpoint records in
xlog_redo. So either the oldestRunningXid value in a running-xacts
record or the nextXid field in a checkpoint record is invalid, and
neither should ever be.

We need to track that down, but in any case we should add an assertion
to ExpireOldKnownAssignedTransactionIds() so that we still catch such
cases after fixing KnownAssignedXidsRemoveMany().
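
Why the extra assertion is still worth having after the fix can be sketched
in a few lines of C. In the patched loop an invalid cutoff satisfies the
first condition for every entry, so everything that is not prepared would be
dropped without complaint. The names below (Xid, is_prepared(),
should_remove(), expire_old_would_remove()) are hypothetical stand-ins for
illustration, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;                 /* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

/* Hypothetical prepared-transaction lookup: only XID 105 is prepared. */
static bool
is_prepared(Xid xid)
{
    return xid == 105;
}

/* Wraparound-aware "a is older than b", assumed for normal XIDs only. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Per-entry decision of the patched loop. An invalid cutoff means
 * "consider every entry", which is what hides a bad caller. */
static bool
should_remove(Xid candidate, Xid cutoff, bool keep_prepared)
{
    if (cutoff != INVALID_XID && !xid_precedes(candidate, cutoff))
        return false;           /* entry is not older than the cutoff */
    return !(keep_prepared && is_prepared(candidate));
}

/* Entry point with the guard: callers on this path must always
 * supply a valid cutoff, so an invalid one fails loudly here. */
static bool
expire_old_would_remove(Xid candidate, Xid cutoff)
{
    assert(cutoff != INVALID_XID);
    return should_remove(candidate, cutoff, true);
}
```

Without the guard, should_remove(120, INVALID_XID, true) reports that a
recent, unprepared XID should go; with the guard in place the same mistake
aborts at the entry point instead of quietly emptying the list.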

> (btw, I think I have seen this exact same one (File "twophase.c", Line: 1201) a few times before,
> without reporting it here, so it might have no connection to this particular patch. Sorry to be
> vague about that)

Thanks for the report; an assertion failure is always a bug.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 12de877..4691c51 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2317,6 +2317,7 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
+    Assert(TransactionIdIsValid(xid));
     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
     KnownAssignedXidsRemoveMany(xid, true);
     LWLockRelease(ProcArrayLock);
@@ -2512,7 +2513,7 @@ KnownAssignedXidsRemoveMany(TransactionId xid, bool keepPreparedXacts)

         if (!TransactionIdIsValid(xid) || TransactionIdPrecedes(removeXid, xid))
         {
-            if (keepPreparedXacts && StandbyTransactionIdIsPrepared(xid))
+            if (keepPreparedXacts && StandbyTransactionIdIsPrepared(removeXid))
                 continue;
             else
             {
