Обсуждение: Bug: wrong relname in RemoveSubscriptionRel() error detail

Поиск
Список
Период
Сортировка

Bug: wrong relname in RemoveSubscriptionRel() error detail

От
Chao Li
Дата:
Hi,

I just noticed a wrong relation name in the error detail emitted by RemoveSubscriptionRel():
```
*
 * Drop subscription relation mapping. These can be for a particular
 * subscription, or for a particular relation, or both.
 */
void
RemoveSubscriptionRel(Oid subid, Oid relid)
{
    Relation    rel;
    TableScanDesc scan;
    ScanKeyData skey[2];
    HeapTuple    tup;
    int            nkeys = 0;

    rel = table_open(SubscriptionRelRelationId, RowExclusiveLock);

    if (OidIsValid(subid))
    {
        ScanKeyInit(&skey[nkeys++],
                    Anum_pg_subscription_rel_srsubid,
                    BTEqualStrategyNumber,
                    F_OIDEQ,
                    ObjectIdGetDatum(subid));
    }

    if (OidIsValid(relid))
    {
        ScanKeyInit(&skey[nkeys++],
                    Anum_pg_subscription_rel_srrelid,
                    BTEqualStrategyNumber,
                    F_OIDEQ,
                    ObjectIdGetDatum(relid));
    }

    /* Do the search and delete what we found. */
    scan = table_beginscan_catalog(rel, nkeys, skey);
    while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
    {
        Form_pg_subscription_rel subrel;

        subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);

        if (!OidIsValid(subid) &&
            subrel->srsubstate != SUBREL_STATE_READY &&
            get_rel_relkind(subrel->srrelid) != RELKIND_SEQUENCE)
        {
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("could not drop relation mapping for subscription \"%s\"",
                            get_subscription_name(subrel->srsubid, false)),
                     errdetail("Table synchronization for relation \"%s\" is in progress and is in state \"%c\".",
                               get_rel_name(relid), subrel->srsubstate), // <====== bug is here

```

In the error detail, it uses relid to get the relation name. But at that point we are iterating over subrel, and the
functionargument relid can be InvalidOid. So the error detail should use subrel->srrelid instead. 

This looks like a first-day bug introducing by ce0fdbf, so I think it’s worth back-patching.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/





Вложения

Re: Bug: wrong relname in RemoveSubscriptionRel() error detail

От
Nisha Moond
Дата:
On Fri, Mar 27, 2026 at 3:22 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi,
>
> I just noticed a wrong relation name in the error detail emitted by RemoveSubscriptionRel():
> ```
> *
>  * Drop subscription relation mapping. These can be for a particular
>  * subscription, or for a particular relation, or both.
>  */
> void
> RemoveSubscriptionRel(Oid subid, Oid relid)
> {
>         Relation        rel;
>         TableScanDesc scan;
>         ScanKeyData skey[2];
>         HeapTuple       tup;
>         int                     nkeys = 0;
>
>         rel = table_open(SubscriptionRelRelationId, RowExclusiveLock);
>
>         if (OidIsValid(subid))
>         {
>                 ScanKeyInit(&skey[nkeys++],
>                                         Anum_pg_subscription_rel_srsubid,
>                                         BTEqualStrategyNumber,
>                                         F_OIDEQ,
>                                         ObjectIdGetDatum(subid));
>         }
>
>         if (OidIsValid(relid))
>         {
>                 ScanKeyInit(&skey[nkeys++],
>                                         Anum_pg_subscription_rel_srrelid,
>                                         BTEqualStrategyNumber,
>                                         F_OIDEQ,
>                                         ObjectIdGetDatum(relid));
>         }
>
>         /* Do the search and delete what we found. */
>         scan = table_beginscan_catalog(rel, nkeys, skey);
>         while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
>         {
>                 Form_pg_subscription_rel subrel;
>
>                 subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
>
>                 if (!OidIsValid(subid) &&
>                         subrel->srsubstate != SUBREL_STATE_READY &&
>                         get_rel_relkind(subrel->srrelid) != RELKIND_SEQUENCE)
>                 {
>                         ereport(ERROR,
>                                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                                          errmsg("could not drop relation mapping for subscription \"%s\"",
>                                                         get_subscription_name(subrel->srsubid, false)),
>                                          errdetail("Table synchronization for relation \"%s\" is in progress and is
instate \"%c\".", 
>                                                            get_rel_name(relid), subrel->srsubstate), // <====== bug
ishere 
>
> ```
>
> In the error detail, it uses relid to get the relation name. But at that point we are iterating over subrel, and the
functionargument relid can be InvalidOid. So the error detail should use subrel->srrelid instead. 
>
> This looks like a first-day bug introducing by ce0fdbf, so I think it’s worth back-patching.
>

I tried reproducing the said bug on HEAD, but it doesn’t seem to exist
in the current code.

To hit the mentioned error, the subid has to be invalid -
   if (!OidIsValid(subid) &&  <==
And currently, the only path that uses an invalid subid is via
heap_drop_with_catalog() :
…
/*
   * Remove any associated relation synchronization states.
   */
  RemoveSubscriptionRel(InvalidOid, relid);
…

But here relid is always a valid OID (it's the table being dropped).
The corresponding pg_class row is deleted after
RemoveSubscriptionRel(), i.e. via a later call to
DeleteRelationTuple(relid);

It can only happen with a hypothetical future caller of
RemoveSubscriptionRel(InvalidOid, InvalidOid). And in that case, using
"subrel->srrelid" would be correct.

So this doesn’t appear to be a real issue in the current code, and
doesn’t look like a bug to fix now. IMO, if such a caller is added in
the future, we can add a guard at that point.

--
Thanks,
Nisha



Re: Bug: wrong relname in RemoveSubscriptionRel() error detail

От
Chao Li
Дата:

> On Mar 30, 2026, at 14:16, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Fri, Mar 27, 2026 at 3:22 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>> Hi,
>>
>> I just noticed a wrong relation name in the error detail emitted by RemoveSubscriptionRel():
>> ```
>> *
>> * Drop subscription relation mapping. These can be for a particular
>> * subscription, or for a particular relation, or both.
>> */
>> void
>> RemoveSubscriptionRel(Oid subid, Oid relid)
>> {
>>        Relation        rel;
>>        TableScanDesc scan;
>>        ScanKeyData skey[2];
>>        HeapTuple       tup;
>>        int                     nkeys = 0;
>>
>>        rel = table_open(SubscriptionRelRelationId, RowExclusiveLock);
>>
>>        if (OidIsValid(subid))
>>        {
>>                ScanKeyInit(&skey[nkeys++],
>>                                        Anum_pg_subscription_rel_srsubid,
>>                                        BTEqualStrategyNumber,
>>                                        F_OIDEQ,
>>                                        ObjectIdGetDatum(subid));
>>        }
>>
>>        if (OidIsValid(relid))
>>        {
>>                ScanKeyInit(&skey[nkeys++],
>>                                        Anum_pg_subscription_rel_srrelid,
>>                                        BTEqualStrategyNumber,
>>                                        F_OIDEQ,
>>                                        ObjectIdGetDatum(relid));
>>        }
>>
>>        /* Do the search and delete what we found. */
>>        scan = table_beginscan_catalog(rel, nkeys, skey);
>>        while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
>>        {
>>                Form_pg_subscription_rel subrel;
>>
>>                subrel = (Form_pg_subscription_rel) GETSTRUCT(tup);
>>
>>                if (!OidIsValid(subid) &&
>>                        subrel->srsubstate != SUBREL_STATE_READY &&
>>                        get_rel_relkind(subrel->srrelid) != RELKIND_SEQUENCE)
>>                {
>>                        ereport(ERROR,
>>                                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>                                         errmsg("could not drop relation mapping for subscription \"%s\"",
>>                                                        get_subscription_name(subrel->srsubid, false)),
>>                                         errdetail("Table synchronization for relation \"%s\" is in progress and is
instate \"%c\".", 
>>                                                           get_rel_name(relid), subrel->srsubstate), // <====== bug
ishere 
>>
>> ```
>>
>> In the error detail, it uses relid to get the relation name. But at that point we are iterating over subrel, and the
functionargument relid can be InvalidOid. So the error detail should use subrel->srrelid instead. 
>>
>> This looks like a first-day bug introducing by ce0fdbf, so I think it’s worth back-patching.
>>
>
> I tried reproducing the said bug on HEAD, but it doesn’t seem to exist
> in the current code.
>
> To hit the mentioned error, the subid has to be invalid -
>   if (!OidIsValid(subid) &&  <==
> And currently, the only path that uses an invalid subid is via
> heap_drop_with_catalog() :
> …
> /*
>   * Remove any associated relation synchronization states.
>   */
>  RemoveSubscriptionRel(InvalidOid, relid);
> …
>
> But here relid is always a valid OID (it's the table being dropped).
> The corresponding pg_class row is deleted after
> RemoveSubscriptionRel(), i.e. via a later call to
> DeleteRelationTuple(relid);
>
> It can only happen with a hypothetical future caller of
> RemoveSubscriptionRel(InvalidOid, InvalidOid). And in that case, using
> "subrel->srrelid" would be correct.
>
> So this doesn’t appear to be a real issue in the current code, and
> doesn’t look like a bug to fix now. IMO, if such a caller is added in
> the future, we can add a guard at that point.
>
> --
> Thanks,
> Nisha

Hi Nisha,

Thanks for your review.

I think one current call site may have been overlooked. In DropSubscription(), we have:
```
    /* Remove any associated relation synchronization states. */
    RemoveSubscriptionRel(subid, InvalidOid);
```

I agree this is an edge-case bug and may be difficult to reproduce in practice. But from the function’s semantics, it
seemsclear to me that the wrong relation OID is used in the error detail, regardless of how easy it is to trigger
today.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







RE: Bug: wrong relname in RemoveSubscriptionRel() error detail

От
"Zhijie Hou (Fujitsu)"
Дата:
On Monday, March 30, 2026 2:33 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > On Mar 30, 2026, at 14:16, Nisha Moond <nisha.moond412@gmail.com>
> wrote:
> >
> > On Fri, Mar 27, 2026 at 3:22 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > > This looks like a first-day bug introducing by ce0fdbf, so I think it’s worth
> > > back-patching.
> > >
> > I tried reproducing the said bug on HEAD, but it doesn’t seem to exist
> > in the current code.
> >
> > To hit the mentioned error, the subid has to be invalid -
> >   if (!OidIsValid(subid) &&  <==
> > And currently, the only path that uses an invalid subid is via
> > heap_drop_with_catalog() :
> > …
> > /*
> >   * Remove any associated relation synchronization states.
> >   */
> >  RemoveSubscriptionRel(InvalidOid, relid);
> > …
> >
> > But here relid is always a valid OID (it's the table being dropped).
> > The corresponding pg_class row is deleted after
> > RemoveSubscriptionRel(), i.e. via a later call to
> > DeleteRelationTuple(relid);
> >
> > It can only happen with a hypothetical future caller of
> > RemoveSubscriptionRel(InvalidOid, InvalidOid). And in that case, using
> > "subrel->srrelid" would be correct.
> >
> > So this doesn’t appear to be a real issue in the current code, and
> > doesn’t look like a bug to fix now. IMO, if such a caller is added in
> > the future, we can add a guard at that point.
> >
> > --
> > Thanks,
> > Nisha
> 
> Hi Nisha,
> 
> Thanks for your review.
> 
> I think one current call site may have been overlooked. In DropSubscription(),
> we have:
> ```
>     /* Remove any associated relation synchronization states. */
>     RemoveSubscriptionRel(subid, InvalidOid);
> ```

This won't trigger the bug either, since it passes a valid subscription OID to
the function, while the function only reports an error when an invalid OID is
passed.

> 
> I agree this is an edge-case bug and may be difficult to reproduce in practice.
> But from the function’s semantics, it seems clear to me that the wrong
> relation OID is used in the error detail, regardless of how easy it is to trigger
> today.

Since this is a public function, I think it should be OK to fix it as it's good
to make the function future-proof anyway. I'm slightly unsure, however, whether
it's worth backpatching, since this is purely a theoretical issue at the moment.

Best Regards,
Hou zj 

Re: Bug: wrong relname in RemoveSubscriptionRel() error detail

От
Amit Kapila
Дата:
On Mon, Mar 30, 2026 at 1:44 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Monday, March 30, 2026 2:33 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > > On Mar 30, 2026, at 14:16, Nisha Moond <nisha.moond412@gmail.com>
> > wrote:
> > >
> > > On Fri, Mar 27, 2026 at 3:22 PM Chao Li <li.evan.chao@gmail.com> wrote:
> > > > This looks like a first-day bug introducing by ce0fdbf, so I think it’s worth
> > > > back-patching.
> > > >
> > > I tried reproducing the said bug on HEAD, but it doesn’t seem to exist
> > > in the current code.
> > >
> > > To hit the mentioned error, the subid has to be invalid -
> > >   if (!OidIsValid(subid) &&  <==
> > > And currently, the only path that uses an invalid subid is via
> > > heap_drop_with_catalog() :
> > > …
> > > /*
> > >   * Remove any associated relation synchronization states.
> > >   */
> > >  RemoveSubscriptionRel(InvalidOid, relid);
> > > …
> > >
> > > But here relid is always a valid OID (it's the table being dropped).
> > > The corresponding pg_class row is deleted after
> > > RemoveSubscriptionRel(), i.e. via a later call to
> > > DeleteRelationTuple(relid);
> > >
> > > It can only happen with a hypothetical future caller of
> > > RemoveSubscriptionRel(InvalidOid, InvalidOid). And in that case, using
> > > "subrel->srrelid" would be correct.
> > >
> > > So this doesn’t appear to be a real issue in the current code, and
> > > doesn’t look like a bug to fix now. IMO, if such a caller is added in
> > > the future, we can add a guard at that point.
> > >
> > > --
> > > Thanks,
> > > Nisha
> >
> > Hi Nisha,
> >
> > Thanks for your review.
> >
> > I think one current call site may have been overlooked. In DropSubscription(),
> > we have:
> > ```
> >     /* Remove any associated relation synchronization states. */
> >     RemoveSubscriptionRel(subid, InvalidOid);
> > ```
>
> This won't trigger the bug either, since it passes a valid subscription OID to
> the function, while the function only reports an error when an invalid OID is
> passed.
>
> >
> > I agree this is an edge-case bug and may be difficult to reproduce in practice.
> > But from the function’s semantics, it seems clear to me that the wrong
> > relation OID is used in the error detail, regardless of how easy it is to trigger
> > today.
>
> Since this is a public function, I think it should be OK to fix it as it's good
> to make the function future-proof anyway.
>

Even if it is exposed, it is not clear to me in which case one would
like to use it the way it can lead to a problem. I feel unless we have
some concrete case it may not be beneficial to change it.

--
With Regards,
Amit Kapila.



Re: Bug: wrong relname in RemoveSubscriptionRel() error detail

От
Chao Li
Дата:

> On Mar 30, 2026, at 20:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2026 at 1:44 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
>>
>> On Monday, March 30, 2026 2:33 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>>> On Mar 30, 2026, at 14:16, Nisha Moond <nisha.moond412@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 27, 2026 at 3:22 PM Chao Li <li.evan.chao@gmail.com> wrote:
>>>>> This looks like a first-day bug introducing by ce0fdbf, so I think it’s worth
>>>>> back-patching.
>>>>>
>>>> I tried reproducing the said bug on HEAD, but it doesn’t seem to exist
>>>> in the current code.
>>>>
>>>> To hit the mentioned error, the subid has to be invalid -
>>>>  if (!OidIsValid(subid) &&  <==
>>>> And currently, the only path that uses an invalid subid is via
>>>> heap_drop_with_catalog() :
>>>> …
>>>> /*
>>>>  * Remove any associated relation synchronization states.
>>>>  */
>>>> RemoveSubscriptionRel(InvalidOid, relid);
>>>> …
>>>>
>>>> But here relid is always a valid OID (it's the table being dropped).
>>>> The corresponding pg_class row is deleted after
>>>> RemoveSubscriptionRel(), i.e. via a later call to
>>>> DeleteRelationTuple(relid);
>>>>
>>>> It can only happen with a hypothetical future caller of
>>>> RemoveSubscriptionRel(InvalidOid, InvalidOid). And in that case, using
>>>> "subrel->srrelid" would be correct.
>>>>
>>>> So this doesn’t appear to be a real issue in the current code, and
>>>> doesn’t look like a bug to fix now. IMO, if such a caller is added in
>>>> the future, we can add a guard at that point.
>>>>
>>>> --
>>>> Thanks,
>>>> Nisha
>>>
>>> Hi Nisha,
>>>
>>> Thanks for your review.
>>>
>>> I think one current call site may have been overlooked. In DropSubscription(),
>>> we have:
>>> ```
>>>    /* Remove any associated relation synchronization states. */
>>>    RemoveSubscriptionRel(subid, InvalidOid);
>>> ```
>>
>> This won't trigger the bug either, since it passes a valid subscription OID to
>> the function, while the function only reports an error when an invalid OID is
>> passed.
>>
>>>
>>> I agree this is an edge-case bug and may be difficult to reproduce in practice.
>>> But from the function’s semantics, it seems clear to me that the wrong
>>> relation OID is used in the error detail, regardless of how easy it is to trigger
>>> today.
>>
>> Since this is a public function, I think it should be OK to fix it as it's good
>> to make the function future-proof anyway.
>>
>
> Even if it is exposed, it is not clear to me in which case one would
> like to use it the way it can lead to a problem. I feel unless we have
> some concrete case it may not be beneficial to change it.
>
> --
> With Regards,
> Amit Kapila.

Hi Amit,

Yes, I agree there is no current use case that would trigger this bug.

Looking at the function header comment again:
```
/*
* Drop subscription relation mapping. These can be for a particular
* subscription, or for a particular relation, or both.
*/
void
RemoveSubscriptionRel(Oid subid, Oid relid)
```

Looks like the function comment does not clearly say that both subid and relid being InvalidOid is a supported case. To
me,“a particular subscription” or “a particular relation” reads as a specific valid OID, so the comment is really
silentabout the case where both are invalid. 

If we consider subid == InvalidOid && relid == InvalidOid to be invalid usage, then I think it would be better to make
thatexplicit, for example by documenting it and adding an Assert. Otherwise, the current behavior can still lead to a
misleadingerror detail in that path. 

From the code readability perspective, I also think subrel->srrelid is a better choice here regardless. This error is
reportingthe relation whose table synchronization is actually in progress, and that relation comes from the current
mappingrow, not from the input argument used as a filter. So using subrel->srrelid makes the code and the error
constructioneasier to understand. 

Given that there is no known practical case that triggers it today, I agree that back-patching is not needed.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/