Hello.
Got some new information.
There are 6 replicas and master in cluster. I rebooted two replicas... And they started to work as expected!
So, on master and 4 untouched replicas:
txid_current() -> 4459388265
txid_status(BIGINT '4459388265') -> NULL
On two rebooted replicas:
txid_status(BIGINT '4459388265') -> 'commited'
All replicas are in sync with master.
Root cause is ShmemVariableCache.oldestClogXid value, as described in previous message.
On master and 4 replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$13 = 2207340131
(gdb) p ShmemVariableCache.oldestXid
$11 = 3764954191
On two rebooted replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$14 = 3764954191
(gdb) p ShmemVariableCache.oldestXid
$12 = 3764954191
SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template0 252790897 4207340131
postgres 302786659 4157344369
template1 302786564 4157344464
project 695176837 3764954191
As far as I remember master and replicas were not rebooted after upgrading from 9.6 to 10. So, maybe issue is upgrade-related.
Hi, thanks for the reply!
> What are you using it for?
I want to use it to validate status of related entities in other database (queue) in short interval after PG transaction commit/rollback.
> I can't reproduce that...
Yes, it happens only with one cluster. All others work as expected.
> Your mailer appears to do very annoying things by converting numbers to
phone numbers.
Sorry.
> It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> with values that are bigger. And
2207340131 is less than 2^31 in the
> past.
Yes, and probably it is cause of the issue.
ShmemVariableCache->oldestClogXid =
2207340131
xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE , then TransactionIdInRecentPast return FALSE and txtd_status return NULL.
But xid (1) and xid_epoch (150227913) are correct values from my active (or recently commited) transaction.
>> SELECT txid_status(BIGINT '4294967295') -> 'commited'.
>> SELECT txid_status(BIGINT '4294967296') -> NULL
> Why do you think that is the wrong result?
Let's leave it for now (maybe my misunderstanding). I think it is better to deal with "txid_status(txid_current()) -> NULL" issue first.
Thanks,
Michail.