Обсуждение: Report oldest xmin source when autovacuum cannot remove tuples

Поиск
Список
Период
Сортировка

Report oldest xmin source when autovacuum cannot remove tuples

От
Shinya Kato
Дата:
Hi hackers,

I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.

The current logs only indicate that dead tuples could not be reclaimed
due to the oldest xmin, but they do not reveal the underlying reason.
To identify the cause, it is necessary to query multiple views:
pg_stat_activity (for active transactions), pg_prepared_xacts (for
prepared statements), pg_replication_slots (for replication slots),
and pg_stat_replication (for hot standby feedback). However, because
the data in these views is volatile, it is difficult to retroactively
determine what was holding the oldest xmin at the specific time the
log message was generated.

This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.

The patch is attached. What do you think?

-- 
Best regards,
Shinya Kato
NTT OSS Center

Вложения

Re: Report oldest xmin source when autovacuum cannot remove tuples

От
wenhui qiu
Дата:
HI 
  Thank you for your path ,This path is extremely helpful.
> +/*
> + * Identifies what determined a relation's OldestXmin horizon.
> + * Used by autovacuum to report why dead tuples were not removable.
> + */
> +typedef enum OldestXminSource
> +{
> + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
> + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
> + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
> + OLDESTXMIN_SOURCE_REPLICATION_SLOT,
> + OLDESTXMIN_SOURCE_OTHER
> +} OldestXminSource;
> +
> +typedef struct OldestXminInfo
> +{
> + OldestXminSource source;
> + int backend_pid;
> +} OldestXminInfo;
I have a question for like this 
one session 
begin;
select * from table_a
not commit or not closed session 
It is in OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION type ?



Thank 

On Fri, Oct 31, 2025 at 2:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
Hi hackers,

I am proposing to add the reason for the oldest xmin to VACUUM logs.
This feature would be useful for identifying why dead tuples cannot be
removed, thereby helping to diagnose and prevent table bloat.

The current logs only indicate that dead tuples could not be reclaimed
due to the oldest xmin, but they do not reveal the underlying reason.
To identify the cause, it is necessary to query multiple views:
pg_stat_activity (for active transactions), pg_prepared_xacts (for
prepared statements), pg_replication_slots (for replication slots),
and pg_stat_replication (for hot standby feedback). However, because
the data in these views is volatile, it is difficult to retroactively
determine what was holding the oldest xmin at the specific time the
log message was generated.

This PoC patch addresses this problem. The implementation now outputs
the reason for the oldest xmin and, where applicable, the backend PID.
This information was originally discarded when calculating the oldest
xmin horizon, and the computation required to retrieve these reasons
is considered reasonable.

The patch is attached. What do you think?

--
Best regards,
Shinya Kato
NTT OSS Center

Re: Report oldest xmin source when autovacuum cannot remove tuples

От
Shinya Kato
Дата:
On Fri, Oct 31, 2025 at 5:01 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>
> HI
>   Thank you for your path ,This path is extremely helpful.

Thank you!

> > +/*
> > + * Identifies what determined a relation's OldestXmin horizon.
> > + * Used by autovacuum to report why dead tuples were not removable.
> > + */
> > +typedef enum OldestXminSource
> > +{
> > + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
> > + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
> > + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
> > + OLDESTXMIN_SOURCE_REPLICATION_SLOT,
> > + OLDESTXMIN_SOURCE_OTHER
> > +} OldestXminSource;
> > +
> > +typedef struct OldestXminInfo
> > +{
> > + OldestXminSource source;
> > + int backend_pid;
> > +} OldestXminInfo;
> I have a question for like this
> one session
> begin;
> select * from table_a
> not commit or not closed session
> It is in OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION type ?

Exactly. Looking at 010_autovacuum_oldest_xmin_reason.pl should make
it clear which logs are output in which cases. (I just noticed there
seems to be a case where the test fails. I need to fix that.)

--
Best regards,
Shinya Kato
NTT OSS Center



Re: Report oldest xmin source when autovacuum cannot remove tuples

От
Fujii Masao
Дата:
On Fri, Oct 31, 2025 at 3:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> Hi hackers,
>
> I am proposing to add the reason for the oldest xmin to VACUUM logs.
> This feature would be useful for identifying why dead tuples cannot be
> removed, thereby helping to diagnose and prevent table bloat.

+1

I like this idea. Thanks for working on this!


> This PoC patch addresses this problem. The implementation now outputs
> the reason for the oldest xmin and, where applicable, the backend PID.
> This information was originally discarded when calculating the oldest
> xmin horizon, and the computation required to retrieve these reasons
> is considered reasonable.
>
> The patch is attached. What do you think?

According to cfbot, the 010_autovacuum_oldest_xmin_reason regression test
passes on some platforms but fails on others (see [1]), so it doesn't
appear stable.


When I set up a primary and standby with hot_standby_feedback enabled,
then created an old prepared transaction expected to prevent dead tuples
from being vacuumed, VACUUM VERBOSE reported "hot standby feedback"
instead of "prepared transaction" as the oldest xmin source. This isn't a bug
since both xmins are the same in this case. But it may be confusing?
Would it be better to report "prepared transaction" in such cases?


+ case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION:
+ msgfmt = include_pid ?
+ _("oldest xmin source: active transaction (pid=%d)\n") :
+ _("oldest xmin source: active transaction\n");
+ break;
+ case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK:
+ msgfmt = include_pid ?
+ _("oldest xmin source: hot standby feedback (pid=%d)\n") :
+ _("oldest xmin source: hot standby feedback\n");

In these two cases, the PID should always be non-zero, so the message
formats without (pid=%d) might not be necessary.


+ /* Identify what forced each of the horizons above. */
+ OldestXminInfo shared_oldest_nonremovable_info;
+ OldestXminInfo catalog_oldest_nonremovable_info;
+ OldestXminInfo data_oldest_nonremovable_info;
+ OldestXminInfo temp_oldest_nonremovable_info;

It might be good to add a comment explaining why we track
sources only for these four oldest xmins, and not for others
like oldest_considered_running.


+ TransactionId old;
+ TransactionId new_horizon;
+
+ if (!TransactionIdIsValid(candidate))
+ return;

The TransactionIdIsValid(candidate) check may be redundant,
since TransactionIdOlder(old, candidate) already performs
the same validation.


- switch (GlobalVisHorizonKindForRel(rel))
+ kind = GlobalVisHorizonKindForRel(rel);
+ switch (kind)

This change doesn't seem necessary.

Regards,

[1] https://cirrus-ci.com/task/6063548834512896

--
Fujii Masao



Re: Report oldest xmin source when autovacuum cannot remove tuples

От
wenhui qiu
Дата:
Hi 
Thank you for your response , In this context, that session  state is idle in transaction .If we could further distinguish between active sessions that are still running and idle sessions, that would be clearer.




Thanks

On Tue, Nov 4, 2025 at 11:59 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Fri, Oct 31, 2025 at 3:32 PM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> Hi hackers,
>
> I am proposing to add the reason for the oldest xmin to VACUUM logs.
> This feature would be useful for identifying why dead tuples cannot be
> removed, thereby helping to diagnose and prevent table bloat.

+1

I like this idea. Thanks for working on this!


> This PoC patch addresses this problem. The implementation now outputs
> the reason for the oldest xmin and, where applicable, the backend PID.
> This information was originally discarded when calculating the oldest
> xmin horizon, and the computation required to retrieve these reasons
> is considered reasonable.
>
> The patch is attached. What do you think?

According to cfbot, the 010_autovacuum_oldest_xmin_reason regression test
passes on some platforms but fails on others (see [1]), so it doesn't
appear stable.


When I set up a primary and standby with hot_standby_feedback enabled,
then created an old prepared transaction expected to prevent dead tuples
from being vacuumed, VACUUM VERBOSE reported "hot standby feedback"
instead of "prepared transaction" as the oldest xmin source. This isn't a bug
since both xmins are the same in this case. But it may be confusing?
Would it be better to report "prepared transaction" in such cases?


+ case OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION:
+ msgfmt = include_pid ?
+ _("oldest xmin source: active transaction (pid=%d)\n") :
+ _("oldest xmin source: active transaction\n");
+ break;
+ case OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK:
+ msgfmt = include_pid ?
+ _("oldest xmin source: hot standby feedback (pid=%d)\n") :
+ _("oldest xmin source: hot standby feedback\n");

In these two cases, the PID should always be non-zero, so the message
formats without (pid=%d) might not be necessary.


+ /* Identify what forced each of the horizons above. */
+ OldestXminInfo shared_oldest_nonremovable_info;
+ OldestXminInfo catalog_oldest_nonremovable_info;
+ OldestXminInfo data_oldest_nonremovable_info;
+ OldestXminInfo temp_oldest_nonremovable_info;

It might be good to add a comment explaining why we track
sources only for these four oldest xmins, and not for others
like oldest_considered_running.


+ TransactionId old;
+ TransactionId new_horizon;
+
+ if (!TransactionIdIsValid(candidate))
+ return;

The TransactionIdIsValid(candidate) check may be redundant,
since TransactionIdOlder(old, candidate) already performs
the same validation.


- switch (GlobalVisHorizonKindForRel(rel))
+ kind = GlobalVisHorizonKindForRel(rel);
+ switch (kind)

This change doesn't seem necessary.

Regards,

[1] https://cirrus-ci.com/task/6063548834512896

--
Fujii Masao


Re: Report oldest xmin source when autovacuum cannot remove tuples

От
Sami Imseih
Дата:
Thanks for starting this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.

Adding this information to the vacuum logging is useful, but
I can see this information being exposed in a view as well in
the future.

I have a few comments:

A few minor ones:

1/ pid should be declared as "pid_t"

2/ last value of an enum should be have a traling comma
+typedef enum OldestXminSource
+{
+       OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
+       OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
+       OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
+       OLDESTXMIN_SOURCE_REPLICATION_SLOT,
+       OLDESTXMIN_SOURCE_OTHER
+} OldestXminSource;

More importantly:

3/ As mentioned earlier in the thread, the "idle-in-transaction"
transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.

4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing  BackendXidGetPid(), I think it can.

Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.

Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.

This takes what you are doing in v1 inside ComputeXidHorizons()
into a new routine. I think this is a cleaner approach.

5/ Also, I think we should also include tests for serializable
transactions


What do you think?

--

Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Report oldest xmin source when autovacuum cannot remove tuples

От
wenhui qiu
Дата:
Hi Sami 

> Thinking about point 3 above, I began to wonder if this
> whole thing can be simplified with inspiration. Looking at the
> existing  BackendXidGetPid(), I think it can.

> Based on BackendXidGetPid(), I tried a new routine called
> BackendXidFindCutOffReason() which can take in the cutoff xmin,
> passed in by vacuum and can walk though the proc array and
> determine the reason. We don't need to touch ComputeXidHorizons()
> to make this work, it seems to me. This comes with an additional
> walk though the procarray holding a shared lock, but I don't think
> this will be an issue.

> Attached is a rough sketch of BackendXidFindCutOffReason()
> For now, I just added NOTICE messages which will log with
> VACUUM (verbose) for testing.
I like your idea , I think we also could consider introducing a GUC parameter in the future, which would terminate sessions blocking vacuum operations when the table's age reaches vacuum_failsafe_age.


Thanks 

On Sat, Nov 15, 2025 at 8:25 AM Sami Imseih <samimseih@gmail.com> wrote:
Thanks for starting this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.

Adding this information to the vacuum logging is useful, but
I can see this information being exposed in a view as well in
the future.

I have a few comments:

A few minor ones:

1/ pid should be declared as "pid_t"

2/ last value of an enum should be have a traling comma
+typedef enum OldestXminSource
+{
+       OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
+       OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
+       OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
+       OLDESTXMIN_SOURCE_REPLICATION_SLOT,
+       OLDESTXMIN_SOURCE_OTHER
+} OldestXminSource;

More importantly:

3/ As mentioned earlier in the thread, the "idle-in-transaction"
transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.

4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing  BackendXidGetPid(), I think it can.

Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.

Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.

This takes what you are doing in v1 inside ComputeXidHorizons()
into a new routine. I think this is a cleaner approach.

5/ Also, I think we should also include tests for serializable
transactions


What do you think?

--

Sami Imseih
Amazon Web Services (AWS)

Re: Report oldest xmin source when autovacuum cannot remove tuples

От
Dilip Kumar
Дата:
On Sat, Nov 15, 2025 at 5:56 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> Thanks for starting this thread! This is a very useful
> feature that users will find beneficial to easily narrow
> down the reason the xmin horizon is being held back,
> and take action.

+1 for the idea.  In BackendXidFindCutOffReason() you have directly
reported using NOTICE I believe that is just to show the idea and you
are planning to append this to the main message?  Apart from that we
are looping the whole pgprocarray, however it is only done when we are
vacuuming with verbose mode so might not be that bad.

--
Regards,
Dilip Kumar
Google



Re: Report oldest xmin source when autovacuum cannot remove tuples

От
Sami Imseih
Дата:
> > Thanks for starting this thread! This is a very useful
> > feature that users will find beneficial to easily narrow
> > down the reason the xmin horizon is being held back,
> > and take action.
>
> +1 for the idea.  In BackendXidFindCutOffReason() you have directly
> reported using NOTICE I believe that is just to show the idea and you
> are planning to append this to the main message?

Yes, the NOTICE is just for demonstration. Some more work is needed
to integrate the output into the vacuum log.

> Apart from that we are looping the whole pgprocarray, however it is
> only done when we are vacuuming with verbose mode so might not
> be that bad.

The extra procarray loop will occur during VACUUM VERBOSE
or when a vacuum exceeds log_autovacuum_min_duration, 10
minutes by default.

I do think however, we should be more selective when to
do this work. We should only care about emitting this
information in autovacuum logging if the cutoffs->OldestXmid
did not advance between consecutive vacuums. This will
mean we we will need to track the last cutoff value in relation stats
(pg_stat_user_tables), but I think having this value in
stats will be useful on its own actually, as users can use
it to track tables with stalled cutoffs even if they don't
have sufficient autovacuum logging.

What do you think?

--
Sami Imseih
Amazon Web Services (AWS)



Re: Report oldest xmin source when autovacuum cannot remove tuples

От
Andres Freund
Дата:
Hi,

On 2025-10-31 15:31:40 +0900, Shinya Kato wrote:
> The patch is attached. What do you think?

The added tests never seem to pass in CI:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6188

Greetings,

Andres Freund