Thread: enhance wraparound warnings

enhance wraparound warnings

From
Nathan Bossart
Date:
varsup.c has the following comment:

    /*
     * We'll start complaining loudly when we get within 40M transactions of
     * data loss.  This is kind of arbitrary, but if you let your gas gauge
     * get down to 2% of full, would you be looking for the next gas station?
     * We need to be fairly liberal about this number because there are lots
     * of scenarios where most transactions are done by automatic clients that
     * won't pay attention to warnings.  (No, we're not gonna make this
     * configurable.  If you know enough to configure it, you know enough to
     * not get in this kind of trouble in the first place.)
     */

I don't know about you, but I start getting antsy around a quarter tank.
In any case, I'm told that even 40M transactions aren't enough time to
react these days.  Attached are a few patches to enhance the wraparound
warnings.

* 0001 adds a "percent remaining" detail message to the existing WARNING.
The idea is that "1.86% of transaction IDs" is both easier to understand
and better indicates urgency than "39985967 transactions".

* 0002 bumps the warning limit from 40M to 100M to give folks some more
time to react.

* 0003 adds an early warning system for when fewer than 500M transactions
remain.  This system sends a LOG only to the server log every 1M
transactions.  The hope is that this gets someone's attention sooner
without flooding the application and server log.
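
For reference, the three thresholds above can be expressed as shares of the usable XID window. This is a minimal sketch, assuming the window is 2^31 - 1 IDs (the value of MaxTransactionId >> 1); `XID_WINDOW` and `threshold_percent` are hypothetical names, not part of the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: usable XID window is MaxTransactionId >> 1 = 2^31 - 1. */
#define XID_WINDOW UINT32_C(0x7FFFFFFF)

/* Hypothetical helper: share of the window that `remaining` XIDs represent. */
static double
threshold_percent(uint32_t remaining)
{
    assert(remaining <= XID_WINDOW);
    return (double) remaining / XID_WINDOW * 100.0;
}
```

Under that assumption, the existing 40M limit is about 1.86% of the window, the proposed 100M limit about 4.66%, and the 500M early-warning limit about 23.28%, which is close to the "quarter tank" mentioned above.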

Thoughts?

-- 
nathan

Attachments

Re: enhance wraparound warnings

From
Nathan Bossart
Date:

Re: enhance wraparound warnings

From
Chao Li
Date:
Hi Nathan,

I just reviewed the patch. My comments are mainly on 0001, with a few nits on 0003. For 0002, the code change is quite
straightforward, but I am not sure whether the new value has been discussed.

> On Dec 12, 2025, at 04:28, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> rebased
>
> --
> nathan
>
<v2-0001-Add-percentage-of-transaction-IDs-that-are-availa.patch>
<v2-0002-Bump-transaction-ID-limit-to-warn-at-100M.patch>
<v2-0003-Perodically-emit-server-logs-when-fewer-than-500M.patch>

1 - 0001
```
+                                   (double) (multiWrapLimit - result) / PG_INT32_MAX * 100),
```

I'm not comfortable using PG_INT32_MAX as the denominator, though the value is correct.

Looking at the code of how xidWrapLimit is calculated:
```
    /*
     * The place where we actually get into deep trouble is halfway around
     * from the oldest potentially-existing XID.  (This calculation is
     * probably off by one or two counts, because the special XIDs reduce the
     * size of the loop a little bit.  But we throw in plenty of slop below,
     * so it doesn't matter.)
     */
    xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
    if (xidWrapLimit < FirstNormalTransactionId)
        xidWrapLimit += FirstNormalTransactionId;
```

Here "(MaxTransactionId >> 1)" has the same value as PG_INT32_MAX. But if XIDs are one day changed to 64 bits, that code
won't need to be updated, while the patched code will.

So, can we define a constant in transam.h like:
```
#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)
#define WrapAroundWindow (MaxTransactionId>>1)
```

And use WrapAroundWindow in all places.
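
The suggestion can be sketched as follows. This is only an illustration under stated assumptions: `TransactionId` is re-declared locally to mirror PostgreSQL's 32-bit typedef, `WrapAroundWindow` is the name proposed in this review (it does not exist in the tree), and `xid_percent_available` is a hypothetical helper standing in for the expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* local stand-in for PostgreSQL's typedef */

#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)
#define WrapAroundWindow (MaxTransactionId >> 1)    /* name proposed in the review */

/* With 32-bit XIDs the window and PG_INT32_MAX are the same number. */
static_assert(WrapAroundWindow == INT32_MAX,
              "window equals INT32_MAX for 32-bit XIDs");

/* Hypothetical helper: percentage of the window left before wrap_limit. */
static double
xid_percent_available(TransactionId wrap_limit, TransactionId next_xid)
{
    /*
     * Unsigned subtraction is modulo 2^32, so the distance is still right
     * when wrap_limit has already wrapped past zero.
     */
    TransactionId remaining = wrap_limit - next_xid;

    return (double) remaining / WrapAroundWindow * 100.0;
}
```

The static assertion documents why the current value is correct, while the macro keeps the denominator tied to the definition of the XID type rather than to a separate integer limit.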

2 - 0001
```
+                         errdetail("Approximately %.2f%% of MultiXactIds are available for use.",
```

"%.2f%%" shows only two digits after the decimal point. The wraparound window is roughly 2B, so once the remaining count
drops to 107374, it will show "0.00%". IMO, a percentage makes more sense when the remaining count is large, while an
exact number is clearer when it is relatively small. So, can we show both the percentage and the exact number? Or show
the exact number when the percentage is 0.00%?
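
The rounding boundary can be checked with a small sketch. It assumes the denominator is 2^31 - 1 (PG_INT32_MAX in the patch); `prints_as_zero` and `format_both` are hypothetical helpers, and the combined message wording is only an illustration of the "show both" option, not text from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the denominator is 2^31 - 1, i.e. PG_INT32_MAX in the patch. */
#define WRAP_WINDOW UINT32_C(0x7FFFFFFF)

/* True when "%.2f" would render the remaining share as 0.00. */
static bool
prints_as_zero(uint32_t remaining)
{
    char        buf[32];

    snprintf(buf, sizeof(buf), "%.2f",
             (double) remaining / WRAP_WINDOW * 100.0);
    return strcmp(buf, "0.00") == 0;
}

/* Hypothetical DETAIL text carrying both the percentage and the raw count. */
static int
format_both(char *buf, size_t len, uint32_t remaining)
{
    assert(remaining <= WRAP_WINDOW);
    return snprintf(buf, len,
                    "Approximately %.2f%% (%" PRIu32 ") of transaction IDs are available for use.",
                    (double) remaining / WRAP_WINDOW * 100.0, remaining);
}
```

With this denominator, 107375 remaining IDs still print as 0.01%, while 107374 and below print as 0.00%, which is where the raw count becomes the only informative part of the message.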

3 - 0001
```
 <programlisting>
 WARNING:  database "mydb" must be vacuumed within 39985967 transactions
+DETAIL:  Approximately 1.86% of transactions IDs are available for use.
```

Typo: "transactions IDs" => "transaction IDs"

4 - 0003
```
Subject: [PATCH v2 3/3] Perodically emit server logs when fewer than 500M
```

Typo: Perodically => Periodically

5 - 0003
```
+    xidLogLimit = xidWrapLimit - 500000000;
```

Instead of hardcoding 500M, do we want to take autovacuum_freeze_max_age into account? If a deployment sets
autovacuum_freeze_max_age > 500M, then vacuum would be triggered first, and this log could be somewhat non-intuitive. But
if a vacuum cannot freeze any tuples, then this log would still make sense. I am not sure. Maybe it's not a real problem.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: enhance wraparound warnings

From
Nathan Bossart
Date:
On Fri, Dec 12, 2025 at 10:59:53AM +0800, Chao Li wrote:
> I just reviewed the patch. My comments are mainly on 0001, with a few nits
> on 0003. For 0002, the code change is quite straightforward, but I am not
> sure whether the new value has been discussed.

Thanks!

> Where "(MaxTransactionId >> 1)" has the same value as PG_INT32_MAX. But
> if one day xid is changed to 64 bits, that code doesn't need to be updated,
> while the patched code will need to be updated.
> 
> So, can we define a const in transam.h like:
> ```
> #define MaxTransactionId ((TransactionId) 0xFFFFFFFF)
> #define WrapAroundWindow (MaxTransactionId>>1)
> ```
> 
> And use WrapAroundWindow in all places.

I think I'd rather just open-code the (MaxTransactionId / 2) here.  I'm not
too concerned about 64-bit transaction IDs (there's a lot more than this to
change for that), but it does seem like a good idea to be consistent with
nearby code.

> ```
> +                         errdetail("Approximately %.2f%% of MultiXactIds are available for use.",
> ```
> 
> "%.2f%%" shows only 2 digits after the dot. The wraparound window is
> roughly 2B, so when remaining goes down to 107374, it will show "0.00%".
> IMO, when remaining is a large number, percentage makes more sense, while
> an exact number is clearer when the number is relatively small. So, can we
> show both percentage and exact number? Or show the exact number when
> percentage is 0.00%?

The errmsg part should already show the exact number of IDs remaining.

> ```
> +    xidLogLimit = xidWrapLimit - 500000000;
> ```
> 
> Instead of hardcoding 500M, do we want to consider
> autovacuum_freeze_max_age? If a deployment sets autovacuum_freeze_max_age
> > 500M, then vacuum would be triggered first, then this log can get kinda
> non-intuitive. But if a vacuum cannot freeze any tuples, then this
> log will still make sense. I am not sure. Maybe not a real problem.

IMHO we should still emit warnings about imminent wraparound even if
autovacuum_freeze_max_age is set to totally-inadvisable values.  I think
the behavior you are describing only happens if users set it to north of
1.6B.
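
The 1.6B figure can be reconstructed from the two expressions quoted earlier in the thread (xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1) and xidLogLimit = xidWrapLimit - 500000000). The sketch below is a simplification that ignores any other margins in varsup.c; `log_start_age` and `log_precedes_autovacuum` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WRAP_WINDOW UINT32_C(2147483647)    /* MaxTransactionId >> 1 */
#define LOG_MARGIN  UINT32_C(500000000)     /* constant from 0003 */

static_assert(LOG_MARGIN < WRAP_WINDOW, "margin must fit inside the window");

/* XID age of the oldest datfrozenxid at which the LOG messages would begin. */
static uint32_t
log_start_age(void)
{
    return WRAP_WINDOW - LOG_MARGIN;
}

/*
 * Hypothetical check: true if the LOG messages would start before the
 * anti-wraparound autovacuum threshold is reached.
 */
static bool
log_precedes_autovacuum(uint32_t autovacuum_freeze_max_age)
{
    return autovacuum_freeze_max_age > log_start_age();
}
```

Under these assumptions the LOG messages start at an age of 1647483647, so only an autovacuum_freeze_max_age above roughly 1.65B would let them appear before autovacuum is forced; the default of 200 million is far below that.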

-- 
nathan

Attachments