Обсуждение: txid_status returns NULL for recently commited transactions

Поиск
Список
Период
Сортировка

txid_status returns NULL for recently commited transactions

От
Michail Nikolaev
Дата:
Hello everyone.

I see strange beaviour of txid_status with one of my production servers.

SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template       0239961994 4207340131
postgres       289957756 4157344369
template1     289957661 4157344464
project          682347934 3764954191

GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false

Could anyone help me with understanding of txid_status behaviour?

Thanks,
Michail.

Re: txid_status returns NULL for recently commited transactions

От
Michail Nikolaev
Дата:
Sorry, some auto formatting in recent email.

Again wtih fixed formatiing:

I see strange beaviour of txid_status with one of my production servers.

SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191

GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false

Could anyone help me with understanding of txid_status behaviour?

Thanks,
Michail.



вт, 25 сент. 2018 г. в 19:47, Michail Nikolaev <michail.nikolaev@gmail.com>:
Hello everyone.

I see strange beaviour of txid_status with one of my production servers.

SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template       0239961994 4207340131
postgres       289957756 4157344369
template1     289957661 4157344464
project          682347934 3764954191

GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false

Could anyone help me with understanding of txid_status behaviour?

Thanks,
Michail.

Re: txid_status returns NULL for recently commited transactions

От
Andres Freund
Дата:
Hi,

On 2018-09-25 19:47:40 +0300, Michail Nikolaev wrote:
> I see strange beaviour of txid_status with one of my production servers.

What are you using it for?


> SELECT txid_status(txid_current()) -> NULL

I can't reproduce that...


> SELECT txid_current() -> 4447342811
> 
> It also returns null for recent commited and aborted transactions.
> 
> SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
> template       0239961994 4207340131
> postgres       289957756 4157344369 <(415)%20734-4369>
> template1     289957661 4157344464 <(415)%20734-4464>
> project          682347934 3764954191

"0239961994" - I can't see how a leading zero is possible.

Your mailer appears to do very annoying things by converting numbers to
phone numbers.


> GDB shows it happens because of TransactionIdPrecedes(xid,
> ShmemVariableCache->oldestClogXid)) at txid.c:155.
> 
> xid_epoch = 1
> xid = 150227913
> now_epoch = 1
> now_epoch_last_xid = 150468261
> ShmemVariableCache->oldestClogXid = 2207340131 <(220)%20734-0131>
> 
> The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131
> <(220)%20734-0131> which is greater than current xid without an epoch
> (150227913) .

It's from the last epoch.  Plain xids are 32bit wide, the epochs deal
with values that are bigger.  And 2207340131 is less than 2^31 in the
past.

> Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu
> 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
> 
> Also, another small issue:
> 
> On the same database:
> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
> SELECT txid_status(BIGINT '4294967296') -> NULL
> 
> Both tx ids are from my head and not valid.
> First 'commited' happends because int32 overflow:
> TransactionIdPrecedes(59856482, 2207340131 <(220)%20734-0131>) == false

Why do you think that is the wrong result?

Greetings,

Andres Freund


Re: txid_status returns NULL for recently commited transactions

От
Michail Nikolaev
Дата:
Hi, thanks for the reply!

> What are you using it for?

I want to use it to validate status of related entities in other database (queue) in short interval after PG transaction commit/rollback.

> I can't reproduce that...

Yes, it happens only with one cluster. All others work as expected.

> Your mailer appears to do very annoying things by converting numbers to
phone numbers.

Sorry.

> It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> with values that are bigger. And 2207340131 is less than 2^31 in the
> past.

Yes, and probably it is cause of the issue.

ShmemVariableCache->oldestClogXid = 2207340131
xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE , then TransactionIdInRecentPast return FALSE and txtd_status return NULL.

But xid (1) and xid_epoch (150227913) are correct values from my active (or recently commited) transaction.

>> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
>> SELECT txid_status(BIGINT '4294967296') -> NULL
> Why do you think that is the wrong result?

Let's leave it for now (maybe my misunderstanding). I think it is better to deal with "txid_status(txid_current()) -> NULL" issue first.

Thanks,
Michail.

Re: txid_status returns NULL for recently commited transactions

От
Michail Nikolaev
Дата:
Hello.

Got some new information.

There are 6 replicas and master in cluster. I rebooted two replicas... And they started to work as expected!

So, on master and 4 untouched replicas:
txid_current() -> 4459388265
txid_status(BIGINT '4459388265') -> NULL

On two rebooted replicas:
txid_status(BIGINT '4459388265') -> 'commited'

All replicas are in sync with master.

Root cause is ShmemVariableCache.oldestClogXid value, as described in previous message.

On master and 4 replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$13 = 2207340131
(gdb) p ShmemVariableCache.oldestXid
$11 = 3764954191

On two rebooted replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$14 = 3764954191
(gdb) p ShmemVariableCache.oldestXid
$12 = 3764954191

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template0    252790897     4207340131
postgres      302786659     4157344369
template1    302786564     4157344464
project         695176837     3764954191

As far as I remember master and replicas were not rebooted after upgrading from 9.6 to 10. So, maybe issue is upgrade-related.

вт, 25 сент. 2018 г. в 22:22, Michail Nikolaev <michail.nikolaev@gmail.com>:
Hi, thanks for the reply!


> What are you using it for?

I want to use it to validate status of related entities in other database (queue) in short interval after PG transaction commit/rollback.


> I can't reproduce that...

Yes, it happens only with one cluster. All others work as expected.


> Your mailer appears to do very annoying things by converting numbers to
phone numbers.

Sorry.


> It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> with values that are bigger. And 2207340131 is less than 2^31 in the
> past.

Yes, and probably it is cause of the issue.

ShmemVariableCache->oldestClogXid = 2207340131

xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE , then TransactionIdInRecentPast return FALSE and txtd_status return NULL.

But xid (1) and xid_epoch (150227913) are correct values from my active (or recently commited) transaction.

>> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
>> SELECT txid_status(BIGINT '4294967296') -> NULL
> Why do you think that is the wrong result?

Let's leave it for now (maybe my misunderstanding). I think it is better to deal with "txid_status(txid_current()) -> NULL" issue first.

Thanks,
Michail.