Обсуждение: Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
On Fri, Jan 24, 2025 at 4:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> When a standby replays a XLOG_PARAMETER_CHANGE record that lowers
> wal_level from logical, we invalidate all logical slots only when the
> standby is in hot standby mode:
>
> if (InRecovery && InHotStandby &&
>     xlrec.wal_level < WAL_LEVEL_LOGICAL &&
>     wal_level >= WAL_LEVEL_LOGICAL)
>     InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
>                                        0, InvalidOid,
>                                        InvalidTransactionId);
>
> However, it's possible that this record is replayed when not in hot
> standby mode and the slot is used after the promotion. In this case,
> the following Assert in xlog_decode() fails:
>
> /*
>  * This can occur only on a standby, as a primary would
>  * not allow to restart after changing wal_level < logical
>  * if there is pre-existing logical slot.
>  */
Shouldn't we do similar to what this comment indicates on standby? We
can disallow to start the server as standby, if the hot_standby is off
and there is a pre-existing logical slot.
> Assert(RecoveryInProgress());
> ereport(ERROR,
>         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>          errmsg("logical decoding on standby requires \"wal_level\" >=
> \"logical\" on the primary")));
>
> Here is brief steps to reproduce this issue:
>
> 1. setup the primary and the standby servers (with hot_standby=on).
> 2. create a logical slot on the standby.
> 3. on standby, set hot_standby=off and restart.
> 4. on primary, lower wal_level to replica and restart.
> 5. promote the standby and execute the logical decoding.
>
> I've attached a small patch to fix this issue. With the patch, we
> invalidate all logical slots even when not in hot standby, that is,
> the patch just removes InHotStandby from the condition. Thoughts?
>
This can fix the scenario you described but I am not sure it is a good
idea. We can consider the fix on these lines if you don't like the
above suggestion.
--
With Regards,
Amit Kapila.
			
		On Sun, Feb 2, 2025 at 8:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 4:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > When a standby replays a XLOG_PARAMETER_CHANGE record that lowers
> > wal_level from logical, we invalidate all logical slots only when the
> > standby is in hot standby mode:
> >
> > if (InRecovery && InHotStandby &&
> >     xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> >     wal_level >= WAL_LEVEL_LOGICAL)
> >     InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
> >                                        0, InvalidOid,
> >                                        InvalidTransactionId);
> >
> > However, it's possible that this record is replayed when not in hot
> > standby mode and the slot is used after the promotion. In this case,
> > the following Assert in xlog_decode() fails:
> >
> > /*
> >  * This can occur only on a standby, as a primary would
> >  * not allow to restart after changing wal_level < logical
> >  * if there is pre-existing logical slot.
> >  */
>
> Shouldn't we do similar to what this comment indicates on standby? We
> can disallow to start the server as standby, if the hot_standby is off
> and there is a pre-existing logical slot.
It seems like a better idea. I thought we could pass StandbyMode to
StartupReplicationSlots() and check if there is a pre-existing logical
slot, but it would break the ABI compatibility. It might not be a
problem in practice as StartupReplicationSlots() is normally used only
by the startup process. But if we want to avoid that we can introduce
a new function for that.
>
> > Assert(RecoveryInProgress());
> > ereport(ERROR,
> >         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >          errmsg("logical decoding on standby requires \"wal_level\" >=
> > \"logical\" on the primary")));
> >
> > Here is brief steps to reproduce this issue:
> >
> > 1. setup the primary and the standby servers (with hot_standby=on).
> > 2. create a logical slot on the standby.
> > 3. on standby, set hot_standby=off and restart.
> > 4. on primary, lower wal_level to replica and restart.
> > 5. promote the standby and execute the logical decoding.
> >
> > I've attached a small patch to fix this issue. With the patch, we
> > invalidate all logical slots even when not in hot standby, that is,
> > the patch just removes InHotStandby from the condition. Thoughts?
> >
>
> This can fix the scenario you described but I am not sure it is a good
> idea. We can consider the fix on these lines if you don't like the
> above suggestion.
Agreed. If your suggestion doesn't work, let's go back to this idea.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Tue, Feb 4, 2025 at 12:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sun, Feb 2, 2025 at 8:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 24, 2025 at 4:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > When a standby replays a XLOG_PARAMETER_CHANGE record that lowers > > > wal_level from logical, we invalidate all logical slots only when the > > > standby is in hot standby mode: > > > > > > if (InRecovery && InHotStandby && > > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > > wal_level >= WAL_LEVEL_LOGICAL) > > > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > > > 0, InvalidOid, > > > InvalidTransactionId); > > > > > > However, it's possible that this record is replayed when not in hot > > > standby mode and the slot is used after the promotion. In this case, > > > the following Assert in xlog_decode() fails: > > > > > > /* > > > * This can occur only on a standby, as a primary would > > > * not allow to restart after changing wal_level < logical > > > * if there is pre-existing logical slot. > > > */ > > > > Shouldn't we do similar to what this comment indicates on standby? We > > can disallow to start the server as standby, if the hot_standby is off > > and there is a pre-existing logical slot. > > It seems like a better idea. I thought we could pass StandbyMode to > StartupReplicationSlots() and check if there is a pre-existing logical > slot, but it would break the ABI compatibility. It might not be a > problem in practice as StartupReplicationSlots() is normally used only > by the startup process. But if we want to avoid that we can introduce > a new function for that. Since StandbyMode is exposed, we don't need to change the function signature. I'll update and submit the patch soon. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Feb 5, 2025 at 2:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached the updated patch. The fix needs to be back-patched to
> v16 where logical decoding on standby was introduced.
>
Isn't it better to give the new ERROR near the below code?
RestoreSlotFromDisk()
{
...
if (cp.slotdata.database != InvalidOid && wal_level < WAL_LEVEL_LOGICAL)
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical replication slot \"%s\" exists, but \"wal_level\" <
\"logical\"",
NameStr(cp.slotdata.name)),
...
}
If feasible, this will avoid an additional loop over the slots and a
new ERROR will be added at the same place as an existing similar
ERROR.
--
With Regards,
Amit Kapila.
			
		Hi,
On Wed, Feb 05, 2025 at 12:08:26PM +0530, Amit Kapila wrote:
> On Wed, Feb 5, 2025 at 2:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached the updated patch. The fix needs to be back-patched to
> > v16 where logical decoding on standby was introduced.
Nice catch and thanks for the patch!
I also do prefer the current idea (disallow to start if the hot_standby is off
and there is an existing logical slot).
> Isn't it better to give the new ERROR near the below code?
> RestoreSlotFromDisk()
> {
> ...
> if (cp.slotdata.database != InvalidOid && wal_level < WAL_LEVEL_LOGICAL)
> ereport(FATAL,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("logical replication slot \"%s\" exists, but \"wal_level\" <
> \"logical\"",
> NameStr(cp.slotdata.name)),
> ...
> }
> 
> If feasible, this will avoid an additional loop over the slots and a
> new ERROR will be added at the same place as an existing similar
> ERROR.
Agree.
+  if (SlotIsLogical(s) && !EnableHotStandby)
+      ereport(FATAL,
+              (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+               errmsg("hot standby must be enabled for pre-existing logical replication slots")));
On a primary lowering wal_level < logical, we'd get something like:
"
FATAL:  logical replication slot "logical_slot" exists, but "wal_level" < "logical"
"
What about being close to it, so something like?
"
FATAL:  logical replication slot "logical_slot" exists, but "hot_standby" = "off"
"
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
			
		On Thu, Feb 6, 2025 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've updated the patch accordingly. > Today, again thinking about the proposed fix, I was wondering about the following case. Say, on hot_standby, the user created a logical slot, then shut down hot_standby, turn off the hot_standby flag, and restart standby. This is allowed today but not after the patch. It is possible that the user can promote a non-hot_standby server after its restart in the previous step and can reuse the logical slot. So, is it okay to change this behavior? If not, then we may need to go back to the first fix proposed by you in this thread. -- With Regards, Amit Kapila.
On Wed, Feb 5, 2025 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 6, 2025 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I've updated the patch accordingly. > > > > Today, again thinking about the proposed fix, I was wondering about > the following case. Say, on hot_standby, the user created a logical > slot, then shut down hot_standby, turn off the hot_standby flag, and > restart standby. This is allowed today but not after the patch. It is > possible that the user can promote a non-hot_standby server after its > restart in the previous step and can reuse the logical slot. So, is it > okay to change this behavior? If not, then we may need to go back to > the first fix proposed by you in this thread. While non hot standby, no one can advance logical slots, meaning the standby server ends up accumulating WAL records and also causing the bloat on the primary if hot_standby_feedback is enabled. Given this outcome, I though that the current patch approach would be reasonable. On the other hand, as you pointed out, it would not be good in terms of compatibility as we end up limiting potentially existing use cases. And it would be prefectly fine if users promote the standby server shortly after starting up with hot_standby = off. But, similarly, is there any concerns of my proposed fix? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Feb 7, 2025 at 12:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Feb 5, 2025 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Feb 6, 2025 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I've updated the patch accordingly. > > > > > > > Today, again thinking about the proposed fix, I was wondering about > > the following case. Say, on hot_standby, the user created a logical > > slot, then shut down hot_standby, turn off the hot_standby flag, and > > restart standby. This is allowed today but not after the patch. It is > > possible that the user can promote a non-hot_standby server after its > > restart in the previous step and can reuse the logical slot. So, is it > > okay to change this behavior? If not, then we may need to go back to > > the first fix proposed by you in this thread. > > While non hot standby, no one can advance logical slots, meaning the > standby server ends up accumulating WAL records and also causing the > bloat on the primary if hot_standby_feedback is enabled. Given this > outcome, I though that the current patch approach would be reasonable. > Agreed as I also can't think of any case where the user would require a logical slot on a non-hotstandby server. > On the other hand, as you pointed out, it would not be good in terms > of compatibility as we end up limiting potentially existing use cases. > And it would be prefectly fine if users promote the standby server > shortly after starting up with hot_standby = off. True but it sounds like there is more harm than benefit. It seems reasonable to do this on HEAD. Shall we think of doing it differently in HEAD and back-branches or let's restrict as your v2 patch is doing and if by any chance users complain about it we can try to fix it in another way? > > But, similarly, is > there any concerns of my proposed fix? > - if (InRecovery && InHotStandby && + if (InRecovery && xlrec.wal_level < WAL_LEVEL_LOGICAL && Won't the InRecovery be true even for crash-recovery as well? I think we need to check standby mode here. -- With Regards, Amit Kapila.
Hi, On Fri, Feb 07, 2025 at 11:00:39AM +0530, Amit Kapila wrote: > On Fri, Feb 7, 2025 at 12:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Feb 5, 2025 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Feb 6, 2025 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've updated the patch accordingly. > > > > > > > > > > Today, again thinking about the proposed fix, I was wondering about > > > the following case. Say, on hot_standby, the user created a logical > > > slot, then shut down hot_standby, turn off the hot_standby flag, and > > > restart standby. This is allowed today but not after the patch. It is > > > possible that the user can promote a non-hot_standby server after its > > > restart in the previous step and can reuse the logical slot. So, is it > > > okay to change this behavior? If not, then we may need to go back to > > > the first fix proposed by you in this thread. > > > > While non hot standby, no one can advance logical slots, meaning the > > standby server ends up accumulating WAL records and also causing the > > bloat on the primary if hot_standby_feedback is enabled. Yeah. I think the bloat on the primary can occur only if there is also a physical replication slot between the two (which is probably the majority of the cases though). > Given this > > outcome, I though that the current patch approach would be reasonable. > > > > Agreed as I also can't think of any case where the user would require > a logical slot on a non-hotstandby server. +1 > > On the other hand, as you pointed out, it would not be good in terms > > of compatibility as we end up limiting potentially existing use cases. > > And it would be prefectly fine if users promote the standby server > > shortly after starting up with hot_standby = off. yeah exactly. And if hot_standby = off is "really" needed then the option would be to remove the logical replication slot on the standby (which is probably a good thing to do with hot_standby = off due to the cons you mentioned above). > True but it sounds like there is more harm than benefit. It seems > reasonable to do this on HEAD. Shall we think of doing it differently > in HEAD and back-branches or let's restrict as your v2 patch is doing > and if by any chance users complain about it we can try to fix it in > another way? I'm tempted to vote for the later so that we can maintain the same code across branches. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi,
On Wed, Feb 05, 2025 at 10:59:22AM -0800, Masahiko Sawada wrote:
> On Tue, Feb 4, 2025 at 11:48 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > +  if (SlotIsLogical(s) && !EnableHotStandby)
> > +      ereport(FATAL,
> > +              (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > +               errmsg("hot standby must be enabled for pre-existing logical replication slots")));
> >
> > On a primary lowering wal_level < logical, we'd get something like:
> >
> > "
> > FATAL:  logical replication slot "logical_slot" exists, but "wal_level" < "logical"
> > "
> >
> > What about being close to it, so something like?
> >
> > "
> > FATAL:  logical replication slot "logical_slot" exists, but "hot_standby" = "off"
> > "
> 
> Looks good, but I'd like to mention that this is required only on
> standbys. So how about the following?
> 
> FATAL:  logical replication slot "s" exists on the standby, but
> "hot_standby" = "off"
> HINT:  Change "hot_standby" to be "on".
Thanks! Looks good and consistent with the existing wording that can be found
in slot.c.
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
			
		On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 7, 2025 at 12:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Feb 5, 2025 at 10:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Feb 6, 2025 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > I've updated the patch accordingly. > > > > > > > > > > Today, again thinking about the proposed fix, I was wondering about > > > the following case. Say, on hot_standby, the user created a logical > > > slot, then shut down hot_standby, turn off the hot_standby flag, and > > > restart standby. This is allowed today but not after the patch. It is > > > possible that the user can promote a non-hot_standby server after its > > > restart in the previous step and can reuse the logical slot. So, is it > > > okay to change this behavior? If not, then we may need to go back to > > > the first fix proposed by you in this thread. > > > > While non hot standby, no one can advance logical slots, meaning the > > standby server ends up accumulating WAL records and also causing the > > bloat on the primary if hot_standby_feedback is enabled. Given this > > outcome, I though that the current patch approach would be reasonable. > > > > Agreed as I also can't think of any case where the user would require > a logical slot on a non-hotstandby server. > > > On the other hand, as you pointed out, it would not be good in terms > > of compatibility as we end up limiting potentially existing use cases. > > And it would be prefectly fine if users promote the standby server > > shortly after starting up with hot_standby = off. > > True but it sounds like there is more harm than benefit. It seems > reasonable to do this on HEAD. Shall we think of doing it differently > in HEAD and back-branches or let's restrict as your v2 patch is doing > and if by any chance users complain about it we can try to fix it in > another way? While the latter would be good for maintainability it would not be advisable to change the behavior back and forth in back branches. I'd like to make it clear what point of v2 patch (restring server startup for pre-existing logical slots) is preferable over the first patch (removing InHotStandby condition)? If the approach of the first patch is reasonably good, we could take it both in HEAD and backbranches. > > > > > But, similarly, is > > there any concerns of my proposed fix? > > > > - if (InRecovery && InHotStandby && > + if (InRecovery && > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > Won't the InRecovery be true even for crash-recovery as well? I think > we need to check standby mode here. I think if we check standby mode here (i.e., adding StandbyMode), we would not be able to invalidate logical slots during archive recovery. That is, in the following scenario, we would hit the same assertion failure: 1. setup the primary and the standby servers (with hot_standby=on). 2. create a logical slot on the standby. 3. on standby, set archive recovery settings (setting restore_command, removing standby.signal, and creating recovery.signal etc.). 4. on primary, lower wal_level to replica and restart. 5. start standby (in archive recovery mode). 6. execute the logical decoding after the archive recovery completes. And, you're right that InRecovery is true even for crash-recovery. But is there any case where we invalidate logical slots due to XLOG_PARAMETER_CHANGE during a crash recovery? Also, I guess InRecovery is always true when we call xlog_redo(). Do we need to check it in the first place? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Feb 7, 2025 at 11:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > True but it sounds like there is more harm than benefit. It seems > > reasonable to do this on HEAD. Shall we think of doing it differently > > in HEAD and back-branches or let's restrict as your v2 patch is doing > > and if by any chance users complain about it we can try to fix it in > > another way? > > While the latter would be good for maintainability it would not be > advisable to change the behavior back and forth in back branches. I'd > like to make it clear what point of v2 patch (restring server startup > for pre-existing logical slots) is preferable over the first patch > (removing InHotStandby condition)? > Sorry, I didn't understand your question. > If the approach of the first patch > is reasonably good, we could take it both in HEAD and backbranches. > > > > > > > > > But, similarly, is > > > there any concerns of my proposed fix? > > > > > > > - if (InRecovery && InHotStandby && > > + if (InRecovery && > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > > > Won't the InRecovery be true even for crash-recovery as well? I think > > we need to check standby mode here. > > I think if we check standby mode here (i.e., adding StandbyMode), we > would not be able to invalidate logical slots during archive recovery. > That is, in the following scenario, we would hit the same assertion > failure: > > 1. setup the primary and the standby servers (with hot_standby=on). > 2. create a logical slot on the standby. > 3. on standby, set archive recovery settings (setting restore_command, > removing standby.signal, and creating recovery.signal etc.). > 4. on primary, lower wal_level to replica and restart. > 5. start standby (in archive recovery mode). > 6. execute the logical decoding after the archive recovery completes. > > And, you're right that InRecovery is true even for crash-recovery. But > is there any case where we invalidate logical slots due to > XLOG_PARAMETER_CHANGE during a crash recovery? > I am not sure but what about the cases where the server crashes immediately after logging this WAL record and the user reduced the wal_lvel before the next restart? I think in such a case we may ERROR out while restoring slots from the disk which will be before we will try to replay the WAL. However, I am not sure if it is a good idea to rely on that assumption. -- With Regards, Amit Kapila.
On Mon, Feb 10, 2025 at 4:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 11:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > True but it sounds like there is more harm than benefit. It seems
> > > reasonable to do this on HEAD. Shall we think of doing it differently
> > > in HEAD and back-branches or let's restrict as your v2 patch is doing
> > > and if by any chance users complain about it we can try to fix it in
> > > another way?
> >
> > While the latter would be good for maintainability it would not be
> > advisable to change the behavior back and forth in back branches. I'd
> > like to make it clear what point of v2 patch (restring server startup
> > for pre-existing logical slots) is preferable over the first patch
> > (removing InHotStandby condition)?
> >
>
> Sorry, I didn't understand your question.
I wanted to mean that there is another direction to fix this issue by
applying the v1 patch to both HEAD and back-branches. It might be
worth considering this option too if the v1 patch's approach is
reasonable.
>
> > If the approach of the first patch
> > is reasonably good, we could take it both in HEAD and backbranches.
> >
> > >
> > > >
> > > > But, similarly, is
> > > > there any concerns of my proposed fix?
> > > >
> > >
> > > - if (InRecovery && InHotStandby &&
> > > + if (InRecovery &&
> > >   xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> > >
> > > Won't the InRecovery be true even for crash-recovery as well? I think
> > > we need to check standby mode here.
> >
> > I think if we check standby mode here (i.e., adding StandbyMode), we
> > would not be able to invalidate logical slots during archive recovery.
> > That is, in the following scenario, we would hit the same assertion
> > failure:
> >
> > 1. setup the primary and the standby servers (with hot_standby=on).
> > 2. create a logical slot on the standby.
> > 3. on standby, set archive recovery settings (setting restore_command,
> > removing standby.signal, and creating recovery.signal etc.).
> > 4. on primary, lower wal_level to replica and restart.
> > 5. start standby (in archive recovery mode).
> > 6. execute the logical decoding after the archive recovery completes.
> >
> > And, you're right that InRecovery is true even for crash-recovery. But
> > is there any case where we invalidate logical slots due to
> > XLOG_PARAMETER_CHANGE during a crash recovery?
> >
>
> I am not sure but what about the cases where the server crashes
> immediately after logging this WAL record and the user reduced the
> wal_lvel before the next restart? I think in such a case we may ERROR
> out while restoring slots from the disk which will be before we will
> try to replay the WAL.
I think so too.
> However, I am not sure if it is a good idea to
> rely on that assumption.
Agreed, I'm fine with leaving InRecovery in this condition. I think
the point is whether we should add StandbyMode to the condition or
not. I think if we do that, we would end up with the same error in the
above scenario I described. So does the following condition make
sense?
        if (InRecovery &&
            xlrec.wal_level < WAL_LEVEL_LOGICAL &&
            wal_level >= WAL_LEVEL_LOGICAL)
            InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
                                               0, InvalidOid,
                                               InvalidTransactionId);
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Wed, Feb 12, 2025 at 1:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Feb 10, 2025 at 4:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Feb 7, 2025 at 11:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Thu, Feb 6, 2025 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > True but it sounds like there is more harm than benefit. It seems > > > > reasonable to do this on HEAD. Shall we think of doing it differently > > > > in HEAD and back-branches or let's restrict as your v2 patch is doing > > > > and if by any chance users complain about it we can try to fix it in > > > > another way? > > > > > > While the latter would be good for maintainability it would not be > > > advisable to change the behavior back and forth in back branches. I'd > > > like to make it clear what point of v2 patch (restring server startup > > > for pre-existing logical slots) is preferable over the first patch > > > (removing InHotStandby condition)? > > > > > > > Sorry, I didn't understand your question. > > I wanted to mean that there is another direction to fix this issue by > applying the v1 patch to both HEAD and back-branches. It might be > worth considering this option too if the v1 patch's approach is > reasonable. > > > > > > If the approach of the first patch > > > is reasonably good, we could take it both in HEAD and backbranches. > > > > > > > > > > > > > > > > > But, similarly, is > > > > > there any concerns of my proposed fix? > > > > > > > > > > > > > - if (InRecovery && InHotStandby && > > > > + if (InRecovery && > > > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > > > > > > > Won't the InRecovery be true even for crash-recovery as well? I think > > > > we need to check standby mode here. > > > > > > I think if we check standby mode here (i.e., adding StandbyMode), we > > > would not be able to invalidate logical slots during archive recovery. > > > That is, in the following scenario, we would hit the same assertion > > > failure: > > > > > > 1. setup the primary and the standby servers (with hot_standby=on). > > > 2. create a logical slot on the standby. > > > 3. on standby, set archive recovery settings (setting restore_command, > > > removing standby.signal, and creating recovery.signal etc.). > > > 4. on primary, lower wal_level to replica and restart. > > > 5. start standby (in archive recovery mode). > > > 6. execute the logical decoding after the archive recovery completes. > > > > > > And, you're right that InRecovery is true even for crash-recovery. But > > > is there any case where we invalidate logical slots due to > > > XLOG_PARAMETER_CHANGE during a crash recovery? > > > > > > > I am not sure but what about the cases where the server crashes > > immediately after logging this WAL record and the user reduced the > > wal_lvel before the next restart? I think in such a case we may ERROR > > out while restoring slots from the disk which will be before we will > > try to replay the WAL. > > I think so too. > > > However, I am not sure if it is a good idea to > > rely on that assumption. > > Agreed, I'm fine with leaving InRecovery in this condition. I think > the point is whether we should add StandbyMode to the condition or > not. I think if we do that, we would end up with the same error in the > above scenario I described. So does the following condition make > sense? > > if (InRecovery && > xlrec.wal_level < WAL_LEVEL_LOGICAL && > wal_level >= WAL_LEVEL_LOGICAL) > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > 0, InvalidOid, > InvalidTransactionId); > This will still be true for crash-recovery as the InRecovery flag will be true for that case as well. I think we should go with your v2 patch approach for HEAD and back-branches. -- With Regards, Amit Kapila.
On Wed, Feb 12, 2025 at 5:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 12, 2025 at 1:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Agreed, I'm fine with leaving InRecovery in this condition. I think > > the point is whether we should add StandbyMode to the condition or > > not. I think if we do that, we would end up with the same error in the > > above scenario I described. So does the following condition make > > sense? > > > > if (InRecovery && > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > wal_level >= WAL_LEVEL_LOGICAL) > > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > > 0, InvalidOid, > > InvalidTransactionId); > > > > This will still be true for crash-recovery as the InRecovery flag will > be true for that case as well. I think we should go with your v2 patch > approach for HEAD and back-branches. > Any opinion on how to proceed here? -- With Regards, Amit Kapila.
Hi, On Mon, Feb 24, 2025 at 05:01:30PM +0530, Amit Kapila wrote: > On Wed, Feb 12, 2025 at 5:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Feb 12, 2025 at 1:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > Agreed, I'm fine with leaving InRecovery in this condition. I think > > > the point is whether we should add StandbyMode to the condition or > > > not. I think if we do that, we would end up with the same error in the > > > above scenario I described. So does the following condition make > > > sense? > > > > > > if (InRecovery && > > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > > wal_level >= WAL_LEVEL_LOGICAL) > > > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > > > 0, InvalidOid, > > > InvalidTransactionId); > > > > > > > This will still be true for crash-recovery as the InRecovery flag will > > be true for that case as well. I think we should go with your v2 patch > > approach for HEAD and back-branches. > > > > Any opinion on how to proceed here? As far I'm concerned, I did not change my mind since [1] and think the same i.e: go with v2 for HEAD and back-branches. [1]: https://www.postgresql.org/message-id/Z6W7GtSzeDcZec%2Bf%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Feb 24, 2025 at 7:13 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Mon, Feb 24, 2025 at 05:01:30PM +0530, Amit Kapila wrote: > > On Wed, Feb 12, 2025 at 5:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Feb 12, 2025 at 1:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > Agreed, I'm fine with leaving InRecovery in this condition. I think > > > > the point is whether we should add StandbyMode to the condition or > > > > not. I think if we do that, we would end up with the same error in the > > > > above scenario I described. So does the following condition make > > > > sense? > > > > > > > > if (InRecovery && > > > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > > > wal_level >= WAL_LEVEL_LOGICAL) > > > > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > > > > 0, InvalidOid, > > > > InvalidTransactionId); > > > > > > > > > > This will still be true for crash-recovery as the InRecovery flag will > > > be true for that case as well. I think we should go with your v2 patch > > > approach for HEAD and back-branches. > > > > > > > Any opinion on how to proceed here? > > As far I'm concerned, I did not change my mind since [1] and think the same i.e: > go with v2 for HEAD and back-branches. Agreed too. So I'm going to proceed with backpatching the v2 patch to v16. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Mon, Feb 24, 2025 at 11:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Feb 24, 2025 at 7:13 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Mon, Feb 24, 2025 at 05:01:30PM +0530, Amit Kapila wrote: > > > On Wed, Feb 12, 2025 at 5:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Feb 12, 2025 at 1:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > Agreed, I'm fine with leaving InRecovery in this condition. I think > > > > > the point is whether we should add StandbyMode to the condition or > > > > > not. I think if we do that, we would end up with the same error in the > > > > > above scenario I described. So does the following condition make > > > > > sense? > > > > > > > > > > if (InRecovery && > > > > > xlrec.wal_level < WAL_LEVEL_LOGICAL && > > > > > wal_level >= WAL_LEVEL_LOGICAL) > > > > > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL, > > > > > 0, InvalidOid, > > > > > InvalidTransactionId); > > > > > > > > > > > > > This will still be true for crash-recovery as the InRecovery flag will > > > > be true for that case as well. I think we should go with your v2 patch > > > > approach for HEAD and back-branches. > > > > > > > > > > Any opinion on how to proceed here? > > > > As far I'm concerned, I did not change my mind since [1] and think the same i.e: > > go with v2 for HEAD and back-branches. > > Agreed too. So I'm going to proceed with backpatching the v2 patch to v16. > Pushed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com