Обсуждение: clog segment truncation
Hello, hackers!
I`m currently investigating the case of one of our PostgreSQL instance having started issuing error messages about missing clog segment:
ERROR: could not access status of transaction 1550558894
DETAIL: Could not open file "pg_xact/05C6": No such file or directory.
After examining the WAL records just prior the first sighing of these errors, it became apparent, that clog segment was truncated by autovacuum
rmgr: CLOG len (rec/tot): 38/ 38, tx: 0, lsn: 81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact 1581542039
After looking closely at the affected heap pages (there are scores of them) with tuples, created by 1550558894 transaction, and comparing them with pg_visibility state and pg_class.relfrozenxig, it became apparent, that there is may be something awry with either setting all_frozen bit in _vm, or oldestXact calculation by vacuum.
According to pg_visibility, all_frozen bit is set:
wms-svc-inspecting=# select * from pg_visibility('affected_relation', 5456224)
wms-svc-inspecting-# ;
all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
t | t | f
(1 row)
So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that is not the case
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('affected_relation', 5456224)) ;
-[ RECORD 1 ]-------------+-----------------------------------------------------------------------------
lp | 1
lp_off | 7976
lp_flags | 1
lp_len | 216
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,1)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111011100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 2 ]-------------+-----------------------------------------------------------------------------
lp | 2
lp_off | 7768
lp_flags | 1
lp_len | 208
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,2)
t_infomask2 | 12
t_infomask | 2307
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{})
-[ RECORD 3 ]-------------+-----------------------------------------------------------------------------
lp | 3
lp_off | 7520
lp_flags | 1
lp_len | 248
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,3)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 4 ]-------------+-----------------------------------------------------------------------------
....
-[ RECORD 23 ]------------+-----------------------------------------------------------------------------
lp | 23
lp_off | 272
lp_flags | 1
lp_len | 312
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,23)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
Also pg_class.relfrozenxig contains xid which exceed 1550558894:
# select relname, relfrozenxid from pg_class where oid = 'affected_relation'::regclass::oid;
-[ RECORD 1 ]+-----------
relname | affected_relation
relfrozenxid | 1609822074
So is it possible that either all_frozen bit was set incorrectly or wansn't unset when tuples were created?
I will be happy to provide any additional information if required.
PostgreSQL version: 15.6, data_checksums: on
I`m currently investigating the case of one of our PostgreSQL instance having started issuing error messages about missing clog segment:
ERROR: could not access status of transaction 1550558894
DETAIL: Could not open file "pg_xact/05C6": No such file or directory.
After examining the WAL records just prior the first sighing of these errors, it became apparent, that clog segment was truncated by autovacuum
rmgr: CLOG len (rec/tot): 38/ 38, tx: 0, lsn: 81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact 1581542039
After looking closely at the affected heap pages (there are scores of them) with tuples, created by 1550558894 transaction, and comparing them with pg_visibility state and pg_class.relfrozenxig, it became apparent, that there is may be something awry with either setting all_frozen bit in _vm, or oldestXact calculation by vacuum.
According to pg_visibility, all_frozen bit is set:
wms-svc-inspecting=# select * from pg_visibility('affected_relation', 5456224)
wms-svc-inspecting-# ;
all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
t | t | f
(1 row)
So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that is not the case
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('affected_relation', 5456224)) ;
-[ RECORD 1 ]-------------+-----------------------------------------------------------------------------
lp | 1
lp_off | 7976
lp_flags | 1
lp_len | 216
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,1)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111011100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 2 ]-------------+-----------------------------------------------------------------------------
lp | 2
lp_off | 7768
lp_flags | 1
lp_len | 208
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,2)
t_infomask2 | 12
t_infomask | 2307
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{})
-[ RECORD 3 ]-------------+-----------------------------------------------------------------------------
lp | 3
lp_off | 7520
lp_flags | 1
lp_len | 248
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,3)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 4 ]-------------+-----------------------------------------------------------------------------
....
-[ RECORD 23 ]------------+-----------------------------------------------------------------------------
lp | 23
lp_off | 272
lp_flags | 1
lp_len | 312
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,23)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
Also pg_class.relfrozenxig contains xid which exceed 1550558894:
# select relname, relfrozenxid from pg_class where oid = 'affected_relation'::regclass::oid;
-[ RECORD 1 ]+-----------
relname | affected_relation
relfrozenxid | 1609822074
So is it possible that either all_frozen bit was set incorrectly or wansn't unset when tuples were created?
I will be happy to provide any additional information if required.
PostgreSQL version: 15.6, data_checksums: on
On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote: > Hello, hackers! > > I`m currently investigating the case of one of our PostgreSQL instance > having started issuing error messages about missing clog segment: > ERROR: could not access status of transaction 1550558894 > DETAIL: Could not open file "pg_xact/05C6": No such file or directory. > Is this error happening with the LISTEN/NOTIFY? There is a discussion going on [1] that aims to fix this issue for the LISTEN/NOTIFY. [1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com -- Matheus Alcantara EDB: http://www.enterprisedb.com
On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote: > So is it possible that either all_frozen bit was set incorrectly or wansn't > unset when tuples were created? > I will be happy to provide any additional information if required. > PostgreSQL version: 15.6, data_checksums: on > Actually the 15.6 is quite old. I would also recommend to upgrade to the latest 15 release which is 15.14. I didn't find any specific release note about clog truncation but you are missing some important fixes though. There is this fix [1] on 15.8 but I'm not sure if it's fully related with your issue faced here. [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c809a2b2d -- Matheus Alcantara EDB: http://www.enterprisedb.com
On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote:
> Hello, hackers!
>
> I`m currently investigating the case of one of our PostgreSQL instance
> having started issuing error messages about missing clog segment:
> ERROR: could not access status of transaction 1550558894
> DETAIL: Could not open file "pg_xact/05C6": No such file or directory.
>
Is this error happening with the LISTEN/NOTIFY? There is a discussion
going on [1] that aims to fix this issue for the LISTEN/NOTIFY.
No, LISTEN/NOTIFY are not used.
So do you have a step by step to reproduce the issue?
Unfortunately, no.
Or at least some more information about what the server was running that caused the
error?
By the looks of it, nothing special, really, just plain inserts and occasional index-scan reads (tuple (5456224,2) looks like it was read by index-scan, because it has HEAP_XMIN_COMMITTED stamped on).
Instance wasn't subjected to power outages, kernel crashes or hardware issues.
On Mon, Nov 10, 2025 at 4:43 PM Matheus Alcantara <matheusssilv97@gmail.com> wrote:
On Mon Nov 10, 2025 at 9:29 AM -03, Smolkin Grigory wrote:
> So is it possible that either all_frozen bit was set incorrectly or wansn't
> unset when tuples were created?
> I will be happy to provide any additional information if required.
> PostgreSQL version: 15.6, data_checksums: on
>
Actually the 15.6 is quite old.
It is, but I didn't see any fixes that can possibly be related to this case, but maybe I've missed something.
I didn't find any specific release note about clog truncation but you
are missing some important fixes though.
There is this fix [1] on 15.8 but I'm not sure if it's fully related
with your issue faced here.
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c809a2b2d
Looks like it is more about multi-xact truncation, so I'm also not sure.