Обсуждение: [PATCH] Don't call ereport(ERROR) from recovery target GUC assign hooks
The five recovery target GUC assign hooks -- assign_recovery_target,
assign_recovery_target_lsn, assign_recovery_target_name,
assign_recovery_target_time, and assign_recovery_target_xid -- all call
error_multiple_recovery_targets() when a second conflicting target is
detected, which invokes ereport(ERROR). The GUC README
(src/backend/utils/misc/README) is explicit: "There is no provision for
a failure result code. assign_hooks should never fail." Raising an
error from an assign hook leaves guc.c's internal state inconsistent
before the abort, because the abort handling path was not designed to
run mid-assign.
The code acknowledges this with an XXX comment that has been there for
years:
XXX this code is broken by design. Throwing an error from a GUC
assign hook breaks fundamental assumptions of guc.c. So long as
all the variables for which this can happen are PGC_POSTMASTER, the
consequences are limited, since we'd just abort postmaster startup
anyway. Nonetheless it's likely that we have odd behaviors such as
unexpected GUC ordering dependencies.
The "limited consequences" argument is true enough that this hasn't
caused visible failures in practice, but fixing a known contract
violation seems worthwhile.
The fix is to remove the conflict check from all five assign hooks and
detect conflicts in validateRecoveryParameters() instead. That
function already runs after all GUCs have been loaded (called from
InitWalRecovery() in the startup process), so it can safely read each
GUC's current value via GetConfigOption() and count how many are
non-empty. If more than one is set, it reports FATAL, consistent with
the other validation errors already in that function.
This changes when the error fires: it now happens in the startup process
rather than in the postmaster's ProcessConfigFile. The outcome is the
same (server does not start), but guc.c's state is no longer disturbed.
There is one secondary behavioral change: when recovery is not actually
requested (ArchiveRecoveryRequested is false), validateRecoveryParameters
returns early and never checks for conflicts, so conflicting recovery
target settings are silently ignored. The old code would reject them
even then, since assign hooks fire unconditionally during ProcessConfigFile.
I think the new behavior is arguably more correct -- those GUCs have no
effect when recovery is not requested, so there is no reason to treat
their values as an error. The existing TAP test in 003_recovery_targets.pl
already covers the conflict-in-recovery case; this patch adds a test for
the new behavior (conflicting targets accepted outside recovery).
Patch attached.
Вложения
Hi Shin,
Thanks for taking this one on. The XXX comment has been there for a
long time and fixing a known contract violation is worthwhile. The
mechanical part of the patch, moving the conflict check from the
assign hooks into validateRecoveryParameters(), looks right to me.
A few things worth discussing before this gets a committer's eye:
1. Behavioral change when recovery is not requested.
The patch's commit message notes that conflicting recovery target
settings are now silently accepted when ArchiveRecoveryRequested is
false, and argues this is "arguably more correct" because those GUCs
have no effect outside recovery. I am not sure I agree. Today a
misconfigured postgresql.conf (say, both recovery_target_time and
recovery_target_xid present) is caught immediately at postmaster
start. After the patch, that same misconfiguration boots
successfully, and the operator only finds out later when they add
recovery.signal and the startup process FATALs. That is a real
downgrade in error-detection timing.
Is it feasible to keep the early-detection behavior without violating
the assign-hook contract? One option: emit a WARNING (not ERROR)
from the assign hook, or defer the check to a post-config-file pass
(GUC_check_hook chain, or a one-shot check in PostmasterMain before
the startup process is forked). I do not have a strong opinion on
the mechanism, but I think the user-visible behavior (conflicting
recovery_target_* settings are caught at server start) is worth
preserving if possible.
If preserving that is not feasible, I think the commit message should
flag the timing change more prominently, since it is a user-visible
change to when postgres refuses to start.
2. Test coverage of the conflict-in-recovery path.
The existing 003_recovery_targets.pl has a test (not modified by
this patch, as far as I can tell) that exercises the multiple-targets
rejection. With v1, that rejection fires from
validateRecoveryParameters() rather than from an assign hook, so the
error message source changes even if the text is identical. CFBot
is green, so the existing regex must still match, but it would be
good to confirm which test covers this and to verify the test's
assertions actually validate the new FATAL path, not just "server
failed to start for some reason".
3. errdetail wording.
errdetail("At most one of \"recovery_target\", ..., "
"\"recovery_target_xid\" may be set.")
The error-message style guide (doc/src/sgml/error-style-guide.sgml)
says to avoid "may" because it reads as permission rather than
ability. Suggest "can be set" instead. Trivial, but since we are
already touching this message.
4. Test cleanup fragility.
The new test case does:
$node_primary->append_conf('postgresql.conf', "recovery_target_name = ...
recovery_target_time = ...");
$node_primary->start;
...
ALTER SYSTEM RESET recovery_target_name;
ALTER SYSTEM RESET recovery_target_time;
$node_primary->restart;
append_conf permanently extends postgresql.conf. The ALTER SYSTEM
RESET writes to postgresql.auto.conf which takes precedence, so the
stale postgresql.conf lines are effectively masked for subsequent
test steps, but they remain in the file, and any later test that
relies on a predictable postgresql.conf content (for example a test
that inspects ConfigFileVar or does its own append) could be
confused. Using ALTER SYSTEM SET for the setup would be cleaner,
or using a dedicated temporary cluster for this one case. Also,
the ok() predicate `defined $primary_pid && $primary_pid ne ''` is
redundant. safe_psql would have died earlier on start failure.
5. Nit: block comment.
The new block inside validateRecoveryParameters() could benefit from
noting that ArchiveRecoveryRequested is guaranteed true at this
point (because of the early return above it), so a reader does not
have to scroll up to confirm.
Nothing here is a blocker. I think the overall direction is right.
The behavioral change question in item 1 is the main design decision
I would want the author's thinking on.
Thanks,
Greg
On Mon, Apr 13, 2026 at 5:21 PM JoongHyuk Shin <sjh910805@gmail.com> wrote:
> The "limited consequences" argument is true enough that this hasn't
> caused visible failures in practice, but fixing a known contract
> violation seems worthwhile.
+1
> The fix is to remove the conflict check from all five assign hooks and
> detect conflicts in validateRecoveryParameters() instead. That
> function already runs after all GUCs have been loaded (called from
> InitWalRecovery() in the startup process), so it can safely read each
> GUC's current value via GetConfigOption() and count how many are
> non-empty. If more than one is set, it reports FATAL, consistent with
> the other validation errors already in that function.
In the master, when the following two recovery targets are specified,
the recovery target assign hook detects that multiple targets were given
and reports an error. With the patch, however, the same settings do not
raise an error, recoveryTarget is set to RECOVERY_TARGET_UNSET, and
recovery unexpectedly proceeds with no target. Could this be a bug
in the patch?
recovery_target_xid = '9999'
recovery_target_time = ''
Regards,
--
Fujii Masao
Re: [PATCH] Don't call ereport(ERROR) from recovery target GUC assign hooks
От
Michael Paquier
Дата:
On Fri, Apr 24, 2026 at 10:08:04PM +0900, Fujii Masao wrote: > In the master, when the following two recovery targets are specified, > the recovery target assign hook detects that multiple targets were given > and reports an error. With the patch, however, the same settings do not > raise an error, recoveryTarget is set to RECOVERY_TARGET_UNSET, and > recovery unexpectedly proceeds with no target. Could this be a bug > in the patch? > > recovery_target_xid = '9999' > recovery_target_time = '' Don't think so. You are specifying for recovery_target_time the same thing as the default, as in "I don't know and do nothing about the time". Why would it matter to make the difference between a default value set and what's stored by default if nothing is set in this case? FWIW, I am wondering if we should seriously consider this stuff as candidate for a backpatch because this is a design mistake: we should never *ever* rely on the GUC hooks to do cross-checks of multiple values, f2cbffc7a618 deciding that it was a right thing to do. It's not. The risk of breaking something may not justify that a backpatch. +1 for reworking that on HEAD, at least. -- Michael
Вложения
On Mon, Apr 27, 2026 at 10:52 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Apr 24, 2026 at 10:08:04PM +0900, Fujii Masao wrote: > > In the master, when the following two recovery targets are specified, > > the recovery target assign hook detects that multiple targets were given > > and reports an error. With the patch, however, the same settings do not > > raise an error, recoveryTarget is set to RECOVERY_TARGET_UNSET, and > > recovery unexpectedly proceeds with no target. Could this be a bug > > in the patch? > > > > recovery_target_xid = '9999' > > recovery_target_time = '' > > Don't think so. You are specifying for recovery_target_time the same > thing as the default, as in "I don't know and do nothing about the > time". Why would it matter to make the difference between a default > value set and what's stored by default if nothing is set in this case? With those settings, how should recovery behave? I would expect it to behave as in master, i.e., detect that multiple targets were specified and report an error. Alternatively, it might be OK for me to proceed with recovery_target_xid = '9999' and ignore recovery_target_time = '', since that matches the default. With the proposed patch, however, both settings are ignored and recovery starts with no target. That seems unexpected to me. > +1 for reworking that on HEAD, at least. I was thinking the same. +1 Regards, -- Fujii Masao
Re: [PATCH] Don't call ereport(ERROR) from recovery target GUC assign hooks
От
Michael Paquier
Дата:
On Mon, Apr 27, 2026 at 02:36:11PM +0900, Fujii Masao wrote: > With the proposed patch, however, both settings are ignored and > recovery starts with no target. That seems unexpected to me. If that's the case (not tested myself), agreed. -- Michael
Вложения
Thanks for the reviews.
v2 attached.
* Conflict check moved to a new static CheckRecoveryTargetConflicts(),
called from validateRecoveryParameters() before its early return.
It runs at every startup, so misconfiguration is caught as in master.
I kept it in startup process rather than PostmasterMain (Greg'ssuggestion),
matching the existing recovery validation there.
* Removed each assign hook's `else recoveryTarget = UNSET` branch
(B in Fujii's framing). Fixes the empty-string clobber Fujii reported,
`recovery_target_xid='9999' + recovery_target_time=''` was silently
running with no target.
003_recovery_targets.pl now covers it (fails on v1, passes on v2).
* errdetail "may" -> "can" (Greg).
* TAP test that asserted the v1 regression is replaced with one
asserting conflict rejection at every startup.
Agreed: HEAD only, no backpatch.
--
JH Shin
v2 attached.
* Conflict check moved to a new static CheckRecoveryTargetConflicts(),
called from validateRecoveryParameters() before its early return.
It runs at every startup, so misconfiguration is caught as in master.
I kept it in startup process rather than PostmasterMain (Greg'ssuggestion),
matching the existing recovery validation there.
* Removed each assign hook's `else recoveryTarget = UNSET` branch
(B in Fujii's framing). Fixes the empty-string clobber Fujii reported,
`recovery_target_xid='9999' + recovery_target_time=''` was silently
running with no target.
003_recovery_targets.pl now covers it (fails on v1, passes on v2).
* errdetail "may" -> "can" (Greg).
* TAP test that asserted the v1 regression is replaced with one
asserting conflict rejection at every startup.
Agreed: HEAD only, no backpatch.
--
JH Shin
On Mon, Apr 27, 2026 at 3:00 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 27, 2026 at 02:36:11PM +0900, Fujii Masao wrote:
> With the proposed patch, however, both settings are ignored and
> recovery starts with no target. That seems unexpected to me.
If that's the case (not tested myself), agreed.
--
Michael
Вложения
On Wed, Apr 29, 2026 at 6:30 PM JoongHyuk Shin <sjh910805@gmail.com> wrote:
>
> Thanks for the reviews.
>
> v2 attached.
Thanks for updating the patch!
When I started postgres with the following command, recovery_target_xid was
treated as unset in the master, but with the patch the recovery_target_xid=700
setting was used instead. This behavior seems unexpected to me. Thoughts?
postgres -D data -c "recovery_target_xid=700" -c "recovery_target_xid="
Regards,
--
Fujii Masao