Обсуждение: txid_status returns NULL for recently commited transactions
Hello everyone.
I see strange beaviour of txid_status with one of my production servers.
SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811
It also returns null for recent commited and aborted transactions.
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191
GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.
xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131
The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .
Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
Also, another small issue:
On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL
Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false
Could anyone help me with understanding of txid_status behaviour?
Thanks,
Michail.
I see strange beaviour of txid_status with one of my production servers.
SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811
It also returns null for recent commited and aborted transactions.
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191
GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.
xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131
The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .
Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
Also, another small issue:
On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL
Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false
Could anyone help me with understanding of txid_status behaviour?
Thanks,
Michail.
Sorry, some auto formatting in recent email.
Again wtih fixed formatiing:
I see strange beaviour of txid_status with one of my production servers.
SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811
It also returns null for recent commited and aborted transactions.
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191
GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.
xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131
The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .
Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
Also, another small issue:
On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL
Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false
Could anyone help me with understanding of txid_status behaviour?
Thanks,
Michail.
Again wtih fixed formatiing:
I see strange beaviour of txid_status with one of my production servers.
SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811
It also returns null for recent commited and aborted transactions.
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191
GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.
xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131
The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .
Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
Also, another small issue:
On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL
Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false
Could anyone help me with understanding of txid_status behaviour?
Thanks,
Michail.
вт, 25 сент. 2018 г. в 19:47, Michail Nikolaev <michail.nikolaev@gmail.com>:
Hello everyone.
I see strange beaviour of txid_status with one of my production servers.
SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811
It also returns null for recent commited and aborted transactions.
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191
GDB shows it happens because of TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) at txid.c:155.
xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131
The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 which is greater than current xid without an epoch (150227913) .
Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.
Also, another small issue:
On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL
Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow: TransactionIdPrecedes(59856482, 2207340131) == false
Could anyone help me with understanding of txid_status behaviour?
Thanks,
Michail.
Hi, On 2018-09-25 19:47:40 +0300, Michail Nikolaev wrote: > I see strange beaviour of txid_status with one of my production servers. What are you using it for? > SELECT txid_status(txid_current()) -> NULL I can't reproduce that... > SELECT txid_current() -> 4447342811 > > It also returns null for recent commited and aborted transactions. > > SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database; > template 0239961994 4207340131 > postgres 289957756 4157344369 <(415)%20734-4369> > template1 289957661 4157344464 <(415)%20734-4464> > project 682347934 3764954191 "0239961994" - I can't see how a leading zero is possible. Your mailer appears to do very annoying things by converting numbers to phone numbers. > GDB shows it happens because of TransactionIdPrecedes(xid, > ShmemVariableCache->oldestClogXid)) at txid.c:155. > > xid_epoch = 1 > xid = 150227913 > now_epoch = 1 > now_epoch_last_xid = 150468261 > ShmemVariableCache->oldestClogXid = 2207340131 <(220)%20734-0131> > > The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131 > <(220)%20734-0131> which is greater than current xid without an epoch > (150227913) . It's from the last epoch. Plain xids are 32bit wide, the epochs deal with values that are bigger. And 2207340131 is less than 2^31 in the past. > Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu > 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit. > > Also, another small issue: > > On the same database: > SELECT txid_status(BIGINT '4294967295') -> 'commited'. > SELECT txid_status(BIGINT '4294967296') -> NULL > > Both tx ids are from my head and not valid. > First 'commited' happends because int32 overflow: > TransactionIdPrecedes(59856482, 2207340131 <(220)%20734-0131>) == false Why do you think that is the wrong result? Greetings, Andres Freund
Hi, thanks for the reply!
> What are you using it for?
I want to use it to validate status of related entities in other database (queue) in short interval after PG transaction commit/rollback.
> I can't reproduce that...
Yes, it happens only with one cluster. All others work as expected.
> Your mailer appears to do very annoying things by converting numbers to
phone numbers.
Sorry.
> It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> with values that are bigger. And 2207340131 is less than 2^31 in the
> past.
Yes, and probably it is cause of the issue.
ShmemVariableCache->oldestClogXid = 2207340131
xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE , then TransactionIdInRecentPast return FALSE and txtd_status return NULL.
But xid (1) and xid_epoch (150227913) are correct values from my active (or recently commited) transaction.
>> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
>> SELECT txid_status(BIGINT '4294967296') -> NULL
> Why do you think that is the wrong result?
Let's leave it for now (maybe my misunderstanding). I think it is better to deal with "txid_status(txid_current()) -> NULL" issue first.
Thanks,
Michail.
> What are you using it for?
I want to use it to validate status of related entities in other database (queue) in short interval after PG transaction commit/rollback.
> I can't reproduce that...
Yes, it happens only with one cluster. All others work as expected.
> Your mailer appears to do very annoying things by converting numbers to
phone numbers.
Sorry.
> It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> with values that are bigger. And 2207340131 is less than 2^31 in the
> past.
Yes, and probably it is cause of the issue.
ShmemVariableCache->oldestClogXid = 2207340131
xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE , then TransactionIdInRecentPast return FALSE and txtd_status return NULL.
But xid (1) and xid_epoch (150227913) are correct values from my active (or recently commited) transaction.
>> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
>> SELECT txid_status(BIGINT '4294967296') -> NULL
> Why do you think that is the wrong result?
Let's leave it for now (maybe my misunderstanding). I think it is better to deal with "txid_status(txid_current()) -> NULL" issue first.
Thanks,
Michail.
Hello.
Got some new information.
There are 6 replicas and master in cluster. I rebooted two replicas... And they started to work as expected!
So, on master and 4 untouched replicas:
txid_current() -> 4459388265
txid_status(BIGINT '4459388265') -> NULL
On two rebooted replicas:
txid_status(BIGINT '4459388265') -> 'commited'
All replicas are in sync with master.
Root cause is ShmemVariableCache.oldestClogXid value, as described in previous message.
On master and 4 replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$13 = 2207340131
(gdb) p ShmemVariableCache.oldestXid
$11 = 3764954191
On two rebooted replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$14 = 3764954191
(gdb) p ShmemVariableCache.oldestXid
$12 = 3764954191
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template0 252790897 4207340131
postgres 302786659 4157344369
template1 302786564 4157344464
project 695176837 3764954191
As far as I remember master and replicas were not rebooted after upgrading from 9.6 to 10. So, maybe issue is upgrade-related.
Got some new information.
There are 6 replicas and master in cluster. I rebooted two replicas... And they started to work as expected!
So, on master and 4 untouched replicas:
txid_current() -> 4459388265
txid_status(BIGINT '4459388265') -> NULL
On two rebooted replicas:
txid_status(BIGINT '4459388265') -> 'commited'
All replicas are in sync with master.
Root cause is ShmemVariableCache.oldestClogXid value, as described in previous message.
On master and 4 replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$13 = 2207340131
(gdb) p ShmemVariableCache.oldestXid
$11 = 3764954191
On two rebooted replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$14 = 3764954191
(gdb) p ShmemVariableCache.oldestXid
$12 = 3764954191
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template0 252790897 4207340131
postgres 302786659 4157344369
template1 302786564 4157344464
project 695176837 3764954191
As far as I remember master and replicas were not rebooted after upgrading from 9.6 to 10. So, maybe issue is upgrade-related.
вт, 25 сент. 2018 г. в 22:22, Michail Nikolaev <michail.nikolaev@gmail.com>:
Hi, thanks for the reply!
> What are you using it for?I want to use it to validate status of related entities in other database (queue) in short interval after PG transaction commit/rollback.
> I can't reproduce that...Yes, it happens only with one cluster. All others work as expected.
> Your mailer appears to do very annoying things by converting numbers to
phone numbers.Sorry.
> It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> with values that are bigger. And 2207340131 is less than 2^31 in the
> past.
xid_epoch = 1
xid = 150227913TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE , then TransactionIdInRecentPast return FALSE and txtd_status return NULL.
But xid (1) and xid_epoch (150227913) are correct values from my active (or recently commited) transaction.
>> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
>> SELECT txid_status(BIGINT '4294967296') -> NULL
> Why do you think that is the wrong result?
Let's leave it for now (maybe my misunderstanding). I think it is better to deal with "txid_status(txid_current()) -> NULL" issue first.
Thanks,
Michail.