Обсуждение: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Поиск

Список

Период

Сортировка

BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

PG Bug reporting form

Дата:

16 марта, 18:51:30

The following bug has been logged on the website:

Bug reference:      19435
Logged by:          Hang ammmkilo
Email address:      ammmkilo@163.com
PostgreSQL version: 18.3
Operating system:   Ubuntu 22.04
Description:

A user encountered an error when attempting to execute a query involving
multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
(pg_table_a). The error message returned was:
[XX000]ERROR: no relation entry for relid 2
This error seems to be an internal one and should not be triggered by users.
It might be a bug.
```sql
DROP TABLE IF EXISTS pg_table_a;

CREATE TABLE pg_table_a (
    id INTEGER PRIMARY KEY,
    col_bool BOOLEAN
);

INSERT INTO pg_table_a (id, col_bool)
VALUES (5, TRUE);

SELECT  1 AS c1
FROM (
    pg_table_a AS tom0
    RIGHT JOIN (
        (pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
        RIGHT JOIN pg_table_a AS tom3
        ON tom1.col_bool IS NOT NULL
    )
    ON tom1.col_bool
);
```

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Fujii Masao

Дата:

17 марта, 15:14:44

On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      19435
> Logged by:          Hang ammmkilo
> Email address:      ammmkilo@163.com
> PostgreSQL version: 18.3
> Operating system:   Ubuntu 22.04
> Description:
>
> A user encountered an error when attempting to execute a query involving
> multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
> (pg_table_a). The error message returned was:
> [XX000]ERROR: no relation entry for relid 2
> This error seems to be an internal one and should not be triggered by users.
> It might be a bug.
> ```sql
> DROP TABLE IF EXISTS pg_table_a;
>
> CREATE TABLE pg_table_a (
>     id INTEGER PRIMARY KEY,
>     col_bool BOOLEAN
> );
>
> INSERT INTO pg_table_a (id, col_bool)
> VALUES (5, TRUE);
>
> SELECT  1 AS c1
> FROM (
>     pg_table_a AS tom0
>     RIGHT JOIN (
>         (pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
>         RIGHT JOIN pg_table_a AS tom3
>         ON tom1.col_bool IS NOT NULL
>     )
>     ON tom1.col_bool
> );
> ```

Thanks for the report!

I was able to reproduce this issue on the master. git bisect that I ran pointed
to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
Alexander, on this thread.

Regards,

--
Fujii Masao

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

17 марта, 15:26:12

On Tue, Mar 17, 2026 at 2:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference:      19435
> > Logged by:          Hang ammmkilo
> > Email address:      ammmkilo@163.com
> > PostgreSQL version: 18.3
> > Operating system:   Ubuntu 22.04
> > Description:
> >
> > A user encountered an error when attempting to execute a query involving
> > multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
> > (pg_table_a). The error message returned was:
> > [XX000]ERROR: no relation entry for relid 2
> > This error seems to be an internal one and should not be triggered by users.
> > It might be a bug.
> > ```sql
> > DROP TABLE IF EXISTS pg_table_a;
> >
> > CREATE TABLE pg_table_a (
> >     id INTEGER PRIMARY KEY,
> >     col_bool BOOLEAN
> > );
> >
> > INSERT INTO pg_table_a (id, col_bool)
> > VALUES (5, TRUE);
> >
> > SELECT  1 AS c1
> > FROM (
> >     pg_table_a AS tom0
> >     RIGHT JOIN (
> >         (pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
> >         RIGHT JOIN pg_table_a AS tom3
> >         ON tom1.col_bool IS NOT NULL
> >     )
> >     ON tom1.col_bool
> > );
> > ```
>
> Thanks for the report!
>
> I was able to reproduce this issue on the master. git bisect that I ran pointed
> to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
> Alexander, on this thread.

Thank you for adding me to the thread.  I'm lookin at this.

------
Regards,
Alexander Korotkov
Supabase

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Kirill Reshke

Дата:

17 марта, 15:33:02

On Tue, 17 Mar 2026 at 17:15, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference:      19435
> > Logged by:          Hang ammmkilo
> > Email address:      ammmkilo@163.com
> > PostgreSQL version: 18.3
> > Operating system:   Ubuntu 22.04
> > Description:
> >
> > A user encountered an error when attempting to execute a query involving
> > multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
> > (pg_table_a). The error message returned was:
> > [XX000]ERROR: no relation entry for relid 2
> > This error seems to be an internal one and should not be triggered by users.
> > It might be a bug.
> > ```sql
> > DROP TABLE IF EXISTS pg_table_a;
> >
> > CREATE TABLE pg_table_a (
> >     id INTEGER PRIMARY KEY,
> >     col_bool BOOLEAN
> > );
> >
> > INSERT INTO pg_table_a (id, col_bool)
> > VALUES (5, TRUE);
> >
> > SELECT  1 AS c1
> > FROM (
> >     pg_table_a AS tom0
> >     RIGHT JOIN (
> >         (pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
> >         RIGHT JOIN pg_table_a AS tom3
> >         ON tom1.col_bool IS NOT NULL
> >     )
> >     ON tom1.col_bool
> > );
> > ```
>
> Thanks for the report!
>
> I was able to reproduce this issue on the master. git bisect that I ran pointed
> to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
> Alexander, on this thread.
>
> Regards,
>
> --
> Fujii Masao
>
>

My git bisect shows the same commit
also, after "set enable_self_join_elimination to false;" query executes ok

--
Best regards,
Kirill Reshke

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

17 марта, 15:59:04

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月17日周二 20:26写道：
>
> On Tue, Mar 17, 2026 at 2:14 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > On Tue, Mar 17, 2026 at 7:34 PM PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> > >
> > > The following bug has been logged on the website:
> > >
> > > Bug reference:      19435
> > > Logged by:          Hang ammmkilo
> > > Email address:      ammmkilo@163.com
> > > PostgreSQL version: 18.3
> > > Operating system:   Ubuntu 22.04
> > > Description:
> > >
> > > A user encountered an error when attempting to execute a query involving
> > > multiple RIGHT JOIN operations and a NATURAL JOIN on the same table
> > > (pg_table_a). The error message returned was:
> > > [XX000]ERROR: no relation entry for relid 2
> > > This error seems to be an internal one and should not be triggered by users.
> > > It might be a bug.
> > > ```sql
> > > DROP TABLE IF EXISTS pg_table_a;
> > >
> > > CREATE TABLE pg_table_a (
> > >     id INTEGER PRIMARY KEY,
> > >     col_bool BOOLEAN
> > > );
> > >
> > > INSERT INTO pg_table_a (id, col_bool)
> > > VALUES (5, TRUE);
> > >
> > > SELECT  1 AS c1
> > > FROM (
> > >     pg_table_a AS tom0
> > >     RIGHT JOIN (
> > >         (pg_table_a AS tom1 NATURAL JOIN pg_table_a AS tom2)
> > >         RIGHT JOIN pg_table_a AS tom3
> > >         ON tom1.col_bool IS NOT NULL
> > >     )
> > >     ON tom1.col_bool
> > > );
> > > ```
> >
> > Thanks for the report!
> >
> > I was able to reproduce this issue on the master. git bisect that I ran pointed
> > to commit fc069a3a631 as the likely cause. So I've CC'd its committer,
> > Alexander, on this thread.
>
> Thank you for adding me to the thread.  I'm lookin at this.

The error was reported in rebuild_joinclause_attr_needed() when
processing Relid = 1(rtindex =1),
When processing its joininfo" ON tom1.col_bool IS NOT NULL",
(gdb) pgprint rinfo->clause
Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]

The varno=2, rtindex=2(tom1) has been removed. In
add_vars_to_attr_needed(), to find the base_rel, but the
root->simple_rel_array[2] is NULL.
So the error is reporting.
It seems the joininfo should be replaced by rtindex = 3, because the
rtindex=2 would be removed.
--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

17 марта, 16:19:45

Tender Wang <tndrwang@gmail.com> 于2026年3月17日周二 20:59写道：
>
> The error was reported in rebuild_joinclause_attr_needed() when
> processing Relid = 1(rtindex =1),
> When processing its joininfo" ON tom1.col_bool IS NOT NULL",
> (gdb) pgprint rinfo->clause
> Var [varno=2 varattno=2 vartype=16
> varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
>
> The varno=2, rtindex=2(tom1) has been removed. In
> add_vars_to_attr_needed(), to find the base_rel, but the
> root->simple_rel_array[2] is NULL.
> So the error is reporting.
> It seems the joininfo should be replaced by rtindex = 3, because the
> rtindex=2 would be removed.
> --
(gdb) pgprint rinfo
RestrictInfo [is_pushed_down=false can_join=false pseudoconstant=false
has_clone=true is_clone=false leakproof=false
has_volatile=VOLATILITY_UNKNOWN security_level=0
              num_base_rels=1 rinfo_serial=4 eval_cost={startup = -1,
per_tuple = 0} norm_selec=-1 outer_selec=-1 outer_is_left=false
hashjoinoperator=0 left_bucketsize=-1
              right_bucketsize=-1 left_mcvfreq=-1 right_mcvfreq=-1
left_hasheqoperator=0 right_hasheqoperator=0]
[clause] Var [varno=2 varattno=2 vartype=16
varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
[clause_relids] Bitmapset [3]
[required_relids] Bitmapset [3 1]
[incompatible_relids] Bitmapset [7 6]
[outer_relids] Bitmapset [6 5 3]

The above is the joininfo of the rtindex=1(tom0),  we can see that the
required_relids is changed to [3 1], but the clause is still
rtindex=2(varno=2).
I guess the current logic in remove_self_join_rel() may forget to
process the rinfo->clause.



--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Kirill Reshke

Дата:

17 марта, 16:24:26

On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
>
> Tender Wang <tndrwang@gmail.com> 于2026年3月17日周二 20:59写道：
> >
> > The error was reported in rebuild_joinclause_attr_needed() when
> > processing Relid = 1(rtindex =1),
> > When processing its joininfo" ON tom1.col_bool IS NOT NULL",
> > (gdb) pgprint rinfo->clause
> > Var [varno=2 varattno=2 vartype=16
> > varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
> >
> > The varno=2, rtindex=2(tom1) has been removed. In
> > add_vars_to_attr_needed(), to find the base_rel, but the
> > root->simple_rel_array[2] is NULL.
> > So the error is reporting.
> > It seems the joininfo should be replaced by rtindex = 3, because the
> > rtindex=2 would be removed.
> > --
> (gdb) pgprint rinfo
> RestrictInfo [is_pushed_down=false can_join=false pseudoconstant=false
> has_clone=true is_clone=false leakproof=false
> has_volatile=VOLATILITY_UNKNOWN security_level=0
>               num_base_rels=1 rinfo_serial=4 eval_cost={startup = -1,
> per_tuple = 0} norm_selec=-1 outer_selec=-1 outer_is_left=false
> hashjoinoperator=0 left_bucketsize=-1
>               right_bucketsize=-1 left_mcvfreq=-1 right_mcvfreq=-1
> left_hasheqoperator=0 right_hasheqoperator=0]
> [clause] Var [varno=2 varattno=2 vartype=16
> varreturningtype=VAR_RETURNING_DEFAULT varnosyn=2 varattnosyn=2]
> [clause_relids] Bitmapset [3]
> [required_relids] Bitmapset [3 1]
> [incompatible_relids] Bitmapset [7 6]
> [outer_relids] Bitmapset [6 5 3]
>
> The above is the joininfo of the rtindex=1(tom0),  we can see that the
> required_relids is changed to [3 1], but the clause is still
> rtindex=2(varno=2).
> I guess the current logic in remove_self_join_rel() may forget to
> process the rinfo->clause.

Yes, it looks like your analysis is valid. Will you share a patch for
updating `clause` ?


--
Best regards,
Kirill Reshke

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

17 марта, 16:30:44

Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道：
>
> On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
> Yes, it looks like your analysis is valid. Will you share a patch for
> updating `clause` ?
>
Sorry, it's already the middle of the night here, so I'm afraid I
don't have much time to work on this right now.


--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

17 марта, 20:46:18

On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道：
> >
> > On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
> > Yes, it looks like your analysis is valid. Will you share a patch for
> > updating `clause` ?
> >
> Sorry, it's already the middle of the night here, so I'm afraid I
> don't have much time to work on this right now.

Thank you for your research.  I've written a simple draft patch.  It
fixes the reported case, but I doubt it is correct in general.  I'll
continue the investigation.

------
Regards,
Alexander Korotkov
Supabase

Вложения

sje_joininfo_fix.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

18 марта, 04:12:57

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道：
>
> On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> > Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道：
> > >
> > > On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
> > > Yes, it looks like your analysis is valid. Will you share a patch for
> > > updating `clause` ?
> > >
> > Sorry, it's already the middle of the night here, so I'm afraid I
> > don't have much time to work on this right now.
>
> Thank you for your research.  I've written a simple draft patch.  It
> fixes the reported case, but I doubt it is correct in general.  I'll
> continue the investigation.
The cause of this bug is doing the following statement:
...
ChangeVarNodesWalkExpression((Node *) rinfo->clause, context);
...
in replace_relid_callback().

The rinfo->clause is only Var(tom1.col_bool), sorry I typoed it to "
ON tom1.col_bool IS NOT NULL", in my first reply email.
See expression_tree_walker_impl(), it does nothing if the node is just
a Var node.
So I think whether we can add logic in ChangeVarNodesWalkExpression() as below:

...
if (node && IsA(node, Var))
    {
        Var        *var = (Var *) node;

        if (var->varlevelsup == context->sublevels_up)
        {
            if (var->varno == context->rt_index)
                var->varno = context->new_index;
            var->varnullingrels = adjust_relid_set(var->varnullingrels,
                                                   context->rt_index,
                                                   context->new_index);
            if (var->varnosyn == context->rt_index)
                var->varnosyn = context->new_index;
        }
        return false;
    }
    else
        return expression_tree_walker(node,
                                      ChangeVarNodes_walker,
                                      (void *) context);
...

I tried the above fix, no error again.  But I got a plan like this:
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..115164616.71 rows=7458350250 width=4)
   ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
   ->  Materialize  (cost=0.00..99509.82 rows=3019575 width=0)
         ->  Nested Loop Left Join  (cost=0.00..75564.95 rows=3019575 width=0)
               Join Filter: tom2.col_bool
               ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
rows=2445 width=5)
                     Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
               ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
                     ->  Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
This makes me unhappy.
Your patch gets the same plan.


--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

18 марта, 08:44:36

Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道：
>
> Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道：
> >
> > On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> > > Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道：
> > > >
> > > > On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
> > > > Yes, it looks like your analysis is valid. Will you share a patch for
> > > > updating `clause` ?
> > > >
> > > Sorry, it's already the middle of the night here, so I'm afraid I
> > > don't have much time to work on this right now.
> >
> > Thank you for your research.  I've written a simple draft patch.  It
> > fixes the reported case, but I doubt it is correct in general.  I'll
> > continue the investigation.
> I tried the above fix, no error again.  But I got a plan like this:
>                                         QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=0.00..115164616.71 rows=7458350250 width=4)
>    ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
>    ->  Materialize  (cost=0.00..99509.82 rows=3019575 width=0)
>          ->  Nested Loop Left Join  (cost=0.00..75564.95 rows=3019575 width=0)
>                Join Filter: tom2.col_bool
>                ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
> rows=2445 width=5)
>                      Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
>                ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
>                      ->  Seq Scan on pg_table_a tom0
> (cost=0.00..34.70 rows=2470 width=0)
> (9 rows)
>
> Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
> This makes me unhappy.
> Your patch gets the same plan.
>
In replace_relid_callback(), we add NullTest to rinfo, but it is not a
logical equal check by restrict_infos_logically_equal().
I think for baserestrictinfo, we can just use rinfo->clause, no need
to check the equality of RestrictInfo.

I tried this way, the plan looks as follows:
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..115776846.35 rows=7498006100 width=4)
   ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
   ->  Materialize  (cost=0.00..100038.47 rows=3035630 width=0)
         ->  Nested Loop Left Join  (cost=0.00..75966.32 rows=3035630 width=0)
               Join Filter: tom2.col_bool
               ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
rows=2458 width=5)
                     Filter: (col_bool IS NOT NULL)
               ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
                     ->  Seq Scan on pg_table_a tom0
(cost=0.00..34.70 rows=2470 width=0)
(9 rows)

No redundant filter anymore.

Please see the attached patch.

--
Thanks,
Tender Wang

Вложения

0001-Fix-sje-bug.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Kirill Reshke

Дата:

18 марта, 08:56:55

On Wed, 18 Mar 2026 at 10:44, Tender Wang <tndrwang@gmail.com> wrote:
>
> Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道：
> >
> > Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道：
> > >
> > > On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> > > > Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道：
> > > > >
> > > > > On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
> > > > > Yes, it looks like your analysis is valid. Will you share a patch for
> > > > > updating `clause` ?
> > > > >
> > > > Sorry, it's already the middle of the night here, so I'm afraid I
> > > > don't have much time to work on this right now.
> > >
> > > Thank you for your research.  I've written a simple draft patch.  It
> > > fixes the reported case, but I doubt it is correct in general.  I'll
> > > continue the investigation.
> > I tried the above fix, no error again.  But I got a plan like this:
> >                                         QUERY PLAN
> > -------------------------------------------------------------------------------------------
> >  Nested Loop Left Join  (cost=0.00..115164616.71 rows=7458350250 width=4)
> >    ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
> >    ->  Materialize  (cost=0.00..99509.82 rows=3019575 width=0)
> >          ->  Nested Loop Left Join  (cost=0.00..75564.95 rows=3019575 width=0)
> >                Join Filter: tom2.col_bool
> >                ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
> > rows=2445 width=5)
> >                      Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
> >                ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
> >                      ->  Seq Scan on pg_table_a tom0
> > (cost=0.00..34.70 rows=2470 width=0)
> > (9 rows)
> >
> > Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
> > This makes me unhappy.
> > Your patch gets the same plan.
> >
> In replace_relid_callback(), we add NullTest to rinfo, but it is not a
> logical equal check by restrict_infos_logically_equal().
> I think for baserestrictinfo, we can just use rinfo->clause, no need
> to check the equality of RestrictInfo.
>
> I tried this way, the plan looks as follows:
>                                         QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=0.00..115776846.35 rows=7498006100 width=4)
>    ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
>    ->  Materialize  (cost=0.00..100038.47 rows=3035630 width=0)
>          ->  Nested Loop Left Join  (cost=0.00..75966.32 rows=3035630 width=0)
>                Join Filter: tom2.col_bool
>                ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
> rows=2458 width=5)
>                      Filter: (col_bool IS NOT NULL)
>                ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
>                      ->  Seq Scan on pg_table_a tom0
> (cost=0.00..34.70 rows=2470 width=0)
> (9 rows)
>
> No redundant filter anymore.
>
> Please see the attached patch.
>
> --
> Thanks,
> Tender Wang

Hi!
Your patch looks solid. ChangeVarNodesWalkExpression looks like a
generic rewriter utility function, so I was wondering why we never got
complaints about bugs related it. But this functions is reachable only
when SJE optimisation is allowed, so looks like this explains the
issue.


--
Best regards,
Kirill Reshke

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

18 марта, 11:40:17

Hi, Tender!

On Wed, Mar 18, 2026 at 7:44 AM Tender Wang <tndrwang@gmail.com> wrote:
>
> Tender Wang <tndrwang@gmail.com> 于2026年3月18日周三 09:12写道：
> >
> > Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 01:46写道：
> > >
> > > On Tue, Mar 17, 2026 at 3:30 PM Tender Wang <tndrwang@gmail.com> wrote:
> > > > Kirill Reshke <reshkekirill@gmail.com> 于2026年3月17日周二 21:24写道：
> > > > >
> > > > > On Tue, 17 Mar 2026 at 18:20, Tender Wang <tndrwang@gmail.com> wrote:
> > > > > Yes, it looks like your analysis is valid. Will you share a patch for
> > > > > updating `clause` ?
> > > > >
> > > > Sorry, it's already the middle of the night here, so I'm afraid I
> > > > don't have much time to work on this right now.
> > >
> > > Thank you for your research.  I've written a simple draft patch.  It
> > > fixes the reported case, but I doubt it is correct in general.  I'll
> > > continue the investigation.
> > I tried the above fix, no error again.  But I got a plan like this:
> >                                         QUERY PLAN
> > -------------------------------------------------------------------------------------------
> >  Nested Loop Left Join  (cost=0.00..115164616.71 rows=7458350250 width=4)
> >    ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
> >    ->  Materialize  (cost=0.00..99509.82 rows=3019575 width=0)
> >          ->  Nested Loop Left Join  (cost=0.00..75564.95 rows=3019575 width=0)
> >                Join Filter: tom2.col_bool
> >                ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
> > rows=2445 width=5)
> >                      Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
> >                ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
> >                      ->  Seq Scan on pg_table_a tom0
> > (cost=0.00..34.70 rows=2470 width=0)
> > (9 rows)
> >
> > Filter: ((col_bool IS NOT NULL) AND (col_bool IS NOT NULL))
> > This makes me unhappy.
> > Your patch gets the same plan.
> >
> In replace_relid_callback(), we add NullTest to rinfo, but it is not a
> logical equal check by restrict_infos_logically_equal().
> I think for baserestrictinfo, we can just use rinfo->clause, no need
> to check the equality of RestrictInfo.
>
> I tried this way, the plan looks as follows:
>                                         QUERY PLAN
> -------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=0.00..115776846.35 rows=7498006100 width=4)
>    ->  Seq Scan on pg_table_a tom3  (cost=0.00..34.70 rows=2470 width=0)
>    ->  Materialize  (cost=0.00..100038.47 rows=3035630 width=0)
>          ->  Nested Loop Left Join  (cost=0.00..75966.32 rows=3035630 width=0)
>                Join Filter: tom2.col_bool
>                ->  Seq Scan on pg_table_a tom2  (cost=0.00..34.70
> rows=2458 width=5)
>                      Filter: (col_bool IS NOT NULL)
>                ->  Materialize  (cost=0.00..47.05 rows=2470 width=0)
>                      ->  Seq Scan on pg_table_a tom0
> (cost=0.00..34.70 rows=2470 width=0)
> (9 rows)
>
> No redundant filter anymore.
>
> Please see the attached patch.

What about being more generic and call ChangeVarNodes_walker() for the
node in ChangeVarNodesWalkExpression()?  It also works with out case
and avoids code duplication.

Changes in restrict_infos_logically_equal() makes me a bit uneasy.  I
see, restictinfo's are different by their outer_relids.  Why
outer_relids doesn't matter when required_relids is singleton?  More
general when do outer_relids matter for add_non_redundant_clauses() if
we're putting restictinfo's into a single list anyway?

------
Regards,
Alexander Korotkov
Supabase

Вложения

0001-Fix-sje-bug-v2.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

18 марта, 12:46:51

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 16:40写道：
>
> What about being more generic and call ChangeVarNodes_walker() for the
> node in ChangeVarNodesWalkExpression()?  It also works with out case
> and avoids code duplication.

Works for me.

>
> Changes in restrict_infos_logically_equal() makes me a bit uneasy.  I
> see, restictinfo's are different by their outer_relids.  Why
> outer_relids doesn't matter when required_relids is singleton?  More
> general when do outer_relids matter for add_non_redundant_clauses() if
> we're putting restictinfo's into a single list anyway?

(gdb) call nodeToString(binfo_candidates )
{RESTRICTINFO :clause {NULLTEST :arg {VAR :varno 3 :varattno 2
:vartype 16 :vartypmod -1 :varcollid 0 :varnullingrels (b)
:varlevelsup 0 :varreturningtype 0 :varnosyn 3 :varattnosyn 2
:location -1} :nulltesttype 1 :argisrow false :location -1}
:is_pushed_down true :can_join false :pseudoconstant false :has_clone
false :is_clone false :leakproof false :has_volatile 0 :security_level
0 :num_base_rels 1 :clause_relids (b 3) :required_relids (b 3)
:incompatible_relids (b) :outer_relids (b 5) :left_relids (b)
:right_relids (b) :orclause <> :rinfo_serial 3 :eval_cost.startup -1
:eval_cost.per_tuple 0 :norm_selec -1 :outer_selec -1 :mergeopfamilies
<> :left_em <> :right_em <> :outer_is_left false :hashjoinoperator 0
:left_bucketsize -1 :right_bucketsize -1 :left_mcvfreq -1
:right_mcvfreq -1 :left_hasheqoperator 0 :right_hasheqoperator 0}

{RESTRICTINFO :clause {NULLTEST :arg {VAR :varno 3 :varattno 2
:vartype 16 :vartypmod -1 :varcollid 0 :varnullingrels (b)
:varlevelsup 0 :varreturningtype 0 :varnosyn 3 :varattnosyn 2
:location -1} :nulltesttype 1 :argisrow false :location -1}
:is_pushed_down true :can_join true :pseudoconstant false :has_clone
false :is_clone false :leakproof false :has_volatile 2 :security_level
0 :num_base_rels 1 :clause_relids (b 3) :required_relids (b 3)
:incompatible_relids (b) :outer_relids (b) :left_relids (b 3)
:right_relids (b 3) :orclause <> :rinfo_serial 6 :eval_cost.startup -1
:eval_cost.per_tuple 0 :norm_selec -1 :outer_selec -1 :mergeopfamilies
<> :left_em <> :right_em <> :outer_is_left true :hashjoinoperator 91
:left_bucketsize -1 :right_bucketsize -1 :left_mcvfreq -1
:right_mcvfreq -1 :left_hasheqoperator 91 :right_hasheqoperator 91})"

The first is tom1.col_bool IS NOT NULL,  its outer_relids (b 5)  is
not empty because it's in the nullable side.
The second is added in replace_relid_callback().  Some fields do not match.

In this case, tom1.col_bool IS NOT NULL becomes a filter clause, not a
join clause, and is safe to only check rinfo->clause.
I think it's better to add is_pushed_down == true check, for example:
...
-       if (bms_membership(a->required_relids) == BMS_SINGLETON)
+       if (bms_membership(a->required_relids) == BMS_SINGLETON &&
+               a->is_pushed_down &&
+               b->is_pushed_down)
...

Any thought?

--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Andrei Lepikhov

Дата:

18 марта, 15:18:27

On 18/3/26 09:40, Alexander Korotkov wrote:
> What about being more generic and call ChangeVarNodes_walker() for the
> node in ChangeVarNodesWalkExpression()?  It also works with out case
> and avoids code duplication.
I’ve reached the same conclusion. We lost a possible case when the 
RestrictInfo→clause contains a bare Var that isn’t pushed into either 
the left or right subtree.
I think we can fix this by replacing the expression walker with 
ChangeVarNodes_walker().
What is the reason for the second change? Tender, can you show us how to 
reproduce the issue so we can support your update to 
restrict_infos_logically_equal? If we include it, we should add a test.

-- 
regards, Andrei Lepikhov,
pgEdge

Вложения

fix.diff

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

18 марта, 15:21:34

On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 18/3/26 09:40, Alexander Korotkov wrote:
> > What about being more generic and call ChangeVarNodes_walker() for the
> > node in ChangeVarNodesWalkExpression()?  It also works with out case
> > and avoids code duplication.
> I’ve reached the same conclusion. We lost a possible case when the
> RestrictInfo→clause contains a bare Var that isn’t pushed into either
> the left or right subtree.
> I think we can fix this by replacing the expression walker with
> ChangeVarNodes_walker().

I see.  This is even better.

> What is the reason for the second change? Tender, can you show us how to
> reproduce the issue so we can support your update to
> restrict_infos_logically_equal? If we include it, we should add a test.

I think Tender already shown this in [1].  The same qual is present
twice in the plan.

Links.
1. https://www.postgresql.org/message-id/CAHewXN%3D7kDJjUcgEm%2B6qhaKOXuqzvhRqAAKdafNCRgn0yH7BGg%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Andrei Lepikhov

Дата:

18 марта, 16:31:00

On 18/3/26 13:21, Alexander Korotkov wrote:
> On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> What is the reason for the second change? Tender, can you show us how to
>> reproduce the issue so we can support your update to
>> restrict_infos_logically_equal? If we include it, we should add a test.
> 
> I think Tender already shown this in [1].  The same qual is present
> twice in the plan.
Got it. I suggest making this a separate commit to keep the history 
clear. Let me share a draft with a test case for the bug fix first.

-- 
regards, Andrei Lepikhov,
pgEdge

Вложения

0001-Fix-self-join-removal-to-update-bare-Var-references-.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

18 марта, 17:38:15

On Wed, Mar 18, 2026 at 3:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 18/3/26 13:21, Alexander Korotkov wrote:
> > On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> What is the reason for the second change? Tender, can you show us how to
> >> reproduce the issue so we can support your update to
> >> restrict_infos_logically_equal? If we include it, we should add a test.
> >
> > I think Tender already shown this in [1].  The same qual is present
> > twice in the plan.
> Got it. I suggest making this a separate commit to keep the history
> clear. Let me share a draft with a test case for the bug fix first.

Yes, I was also thinking about splitting this into two distinct
commit.  The patch you've attached looks good for me.  I'm going to
push and backpatch it if no objections.  And let's continue the
investigation on restrict_infos_logically_equal().

------
Regards,
Alexander Korotkov
Supabase

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Kirill Reshke

Дата:

18 марта, 20:55:22

On Wed, 18 Mar 2026 at 18:31, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 18/3/26 13:21, Alexander Korotkov wrote:
> > On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> What is the reason for the second change? Tender, can you show us how to
> >> reproduce the issue so we can support your update to
> >> restrict_infos_logically_equal? If we include it, we should add a test.
> >
> > I think Tender already shown this in [1].  The same qual is present
> > twice in the plan.
> Got it. I suggest making this a separate commit to keep the history
> clear. Let me share a draft with a test case for the bug fix first.
>
> --
> regards, Andrei Lepikhov,
> pgEdge

Hi!
Is `cool_bool` a typo of `col_bool` in regression test ?

--
Best regards,
Kirill Reshke

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

19 марта, 04:11:10

Alexander Korotkov <aekorotkov@gmail.com> 于2026年3月18日周三 22:38写道：
>
> On Wed, Mar 18, 2026 at 3:31 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > On 18/3/26 13:21, Alexander Korotkov wrote:
> > > On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > >> What is the reason for the second change? Tender, can you show us how to
> > >> reproduce the issue so we can support your update to
> > >> restrict_infos_logically_equal? If we include it, we should add a test.
> > >
> > > I think Tender already shown this in [1].  The same qual is present
> > > twice in the plan.
> > Got it. I suggest making this a separate commit to keep the history
> > clear. Let me share a draft with a test case for the bug fix first.
>
> Yes, I was also thinking about splitting this into two distinct
> commit.  The patch you've attached looks good for me.  I'm going to
> push and backpatch it if no objections.  And let's continue the
> investigation on restrict_infos_logically_equal().

Agree.
And the patch seems to have forgotten to add "Reported by".


--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Andrei Lepikhov

Дата:

19 марта, 10:18:03

On 18/3/26 18:55, Kirill Reshke wrote:
> On Wed, 18 Mar 2026 at 18:31, Andrei Lepikhov <lepihov@gmail.com> wrote:
>>
>> On 18/3/26 13:21, Alexander Korotkov wrote:
>>> On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>>> What is the reason for the second change? Tender, can you show us how to
>>>> reproduce the issue so we can support your update to
>>>> restrict_infos_logically_equal? If we include it, we should add a test.
>>>
>>> I think Tender already shown this in [1].  The same qual is present
>>> twice in the plan.
>> Got it. I suggest making this a separate commit to keep the history
>> clear. Let me share a draft with a test case for the bug fix first.
> Hi!
> Is `cool_bool` a typo of `col_bool` in regression test ?

This is actually a happy coincidence, not a typo. We made this mistake 
during initial development because I didn't realise that a boolean 
operation on a bool variable is never wrapped in an expression 
structure. It's a 'cool' example that shows a rare edge case.

 > And the patch seems to have forgotten to add "Reported by".

Yeah, let the committer manage the award part.

-- 
regards, Andrei Lepikhov,
pgEdge

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

19 марта, 16:03:17

On Thu, Mar 19, 2026 at 9:18 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 18/3/26 18:55, Kirill Reshke wrote:
> > On Wed, 18 Mar 2026 at 18:31, Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>
> >> On 18/3/26 13:21, Alexander Korotkov wrote:
> >>> On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>>> What is the reason for the second change? Tender, can you show us how to
> >>>> reproduce the issue so we can support your update to
> >>>> restrict_infos_logically_equal? If we include it, we should add a test.
> >>>
> >>> I think Tender already shown this in [1].  The same qual is present
> >>> twice in the plan.
> >> Got it. I suggest making this a separate commit to keep the history
> >> clear. Let me share a draft with a test case for the bug fix first.
> > Hi!
> > Is `cool_bool` a typo of `col_bool` in regression test ?
>
> This is actually a happy coincidence, not a typo. We made this mistake
> during initial development because I didn't realise that a boolean
> operation on a bool variable is never wrapped in an expression
> structure. It's a 'cool' example that shows a rare edge case.
>
>  > And the patch seems to have forgotten to add "Reported by".
>
> Yeah, let the committer manage the award part.

I've revised the patch.  Renamed cool_bool to cool_col, added
"Reported by", and revised authors list according to my opinion.

------
Regards,
Alexander Korotkov
Supabase

Вложения

v2-0001-Fix-self-join-removal-to-update-bare-Var-referenc.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

20 марта, 17:02:10

On Thu, Mar 19, 2026 at 3:03 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Thu, Mar 19, 2026 at 9:18 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > On 18/3/26 18:55, Kirill Reshke wrote:
> > > On Wed, 18 Mar 2026 at 18:31, Andrei Lepikhov <lepihov@gmail.com> wrote:
> > >>
> > >> On 18/3/26 13:21, Alexander Korotkov wrote:
> > >>> On Wed, Mar 18, 2026 at 2:18 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > >>>> What is the reason for the second change? Tender, can you show us how to
> > >>>> reproduce the issue so we can support your update to
> > >>>> restrict_infos_logically_equal? If we include it, we should add a test.
> > >>>
> > >>> I think Tender already shown this in [1].  The same qual is present
> > >>> twice in the plan.
> > >> Got it. I suggest making this a separate commit to keep the history
> > >> clear. Let me share a draft with a test case for the bug fix first.
> > > Hi!
> > > Is `cool_bool` a typo of `col_bool` in regression test ?
> >
> > This is actually a happy coincidence, not a typo. We made this mistake
> > during initial development because I didn't realise that a boolean
> > operation on a bool variable is never wrapped in an expression
> > structure. It's a 'cool' example that shows a rare edge case.
> >
> >  > And the patch seems to have forgotten to add "Reported by".
> >
> > Yeah, let the committer manage the award part.
>
> I've revised the patch.  Renamed cool_bool to cool_col, added
> "Reported by", and revised authors list according to my opinion.

OK. I've pushed this.  Let's go back to
restrict_infos_logically_equal().  I'm still not convinced that we
need to check if required_relids is singleton.  Why we can ignore
outer_relids for singleton, but can't do if, for instance, two
relations involved?

------
Regards,
Alexander Korotkov
Supabase

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tom Lane

Дата:

20 марта, 17:30:06

Alexander Korotkov <aekorotkov@gmail.com> writes:
> OK. I've pushed this.

I don't love this patch.  It fixes the functional problem, but it
does nothing to fix the underlying cause of that problem, namely
the abysmal under-documentation of the ChangeVarNodesXXX functions.
There are very specific assumptions about whether the recursion
is starting at a Query or not, and the code will do the wrong
thing if invoked at the wrong node level.

In particular, imagine that ChangeVarNodes_walker is invoked
directly on a Query node, something that ChangeVarNodesExtended
is careful not to do.  It will increment sublevels_up immediately
and thus process the contents of the Query with sublevels_up==1,
meaning it will not recognize local Vars as needing adjustment.

At the very least we need to add comments, but I wonder if we
don't actually need an Assert that ChangeVarNodesWalkExpression
is not invoked directly on a Query.  It would have done the
right thing before this patch, but now it won't.  That's an
okay tradeoff for fixing the bare-Var case, but not documenting
what you did is not okay.

            regards, tom lane

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tom Lane

Дата:

20 марта, 23:26:13

I wrote:
> At the very least we need to add comments, but I wonder if we
> don't actually need an Assert that ChangeVarNodesWalkExpression
> is not invoked directly on a Query.  It would have done the
> right thing before this patch, but now it won't.  That's an
> okay tradeoff for fixing the bare-Var case, but not documenting
> what you did is not okay.

After further contemplation I've decided that an Assert would be
wrong, because it's not impossible that a callback would want
to invoke this on a sub-Query --- for instance, if it wanted to
short-circuit ChangeVarNodes's processing of a SubLink node,
it would need to do that.  The key point is that if we do see a
Query node here, we will treat it as a sub-query not a top-level
query, which also justifies skipping the work that
ChangeVarNodesExtended does on a top-level Query.  So we just
need a comment explaining that.  I'm thinking about the attached.

(BTW, by this reasoning the previous implementation of
ChangeVarNodesWalkExpression was doubly wrong, since it would
have done the wrong thing at a Query node as well as a Var node.)

            regards, tom lane

diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 7249ffbfb36..dc803a17037 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -672,7 +672,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
  * value indicating if the given node should be skipped from further processing
  * by ChangeVarNodes_walker.  The callback is called only for expressions and
  * other children nodes of a Query processed by a walker.  Initial processing
- * of the root Query doesn't involve the callback.
+ * of the root Query node doesn't invoke the callback.
  */
 void
 ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
@@ -737,9 +737,16 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
 }

 /*
- * ChangeVarNodesWalkExpression - process expression within the custom
- *                                  callback provided to the
- *                                  ChangeVarNodesExtended.
+ * ChangeVarNodesWalkExpression - process subexpression within a callback
+ *                                  function passed to ChangeVarNodesExtended.
+ *
+ * This is intended to be used by a callback that needs to recursively
+ * process subexpressions of some node being visited by an outer
+ * ChangeVarNodesExtended call (not letting ChangeVarNodes_walker do that).
+ * Hence, we invoke ChangeVarNodes_walker directly.  This means that if
+ * the passed Node is a Query node, it will be treated as a sub-Query,
+ * so sublevels_up will be incremented immediately.  Do not apply this
+ * to a top-level Query node, or you'll likely get wrong results.
  */
 bool
 ChangeVarNodesWalkExpression(Node *node, ChangeVarNodes_context *context)

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

22 марта, 08:15:56

Hi Tom,

Tom Lane <tgl@sss.pgh.pa.us> 于2026年3月21日周六 04:26写道：
>
> I wrote:
> > At the very least we need to add comments, but I wonder if we
> > don't actually need an Assert that ChangeVarNodesWalkExpression
> > is not invoked directly on a Query.  It would have done the
> > right thing before this patch, but now it won't.  That's an
> > okay tradeoff for fixing the bare-Var case, but not documenting
> > what you did is not okay.
>
> After further contemplation I've decided that an Assert would be
> wrong, because it's not impossible that a callback would want
> to invoke this on a sub-Query --- for instance, if it wanted to
> short-circuit ChangeVarNodes's processing of a SubLink node,
> it would need to do that.  The key point is that if we do see a
> Query node here, we will treat it as a sub-query not a top-level
> query, which also justifies skipping the work that
> ChangeVarNodesExtended does on a top-level Query.  So we just
> need a comment explaining that.  I'm thinking about the attached.
>
> (BTW, by this reasoning the previous implementation of
> ChangeVarNodesWalkExpression was doubly wrong, since it would
> have done the wrong thing at a Query node as well as a Var node.)
Thanks for pointing this out. The attached looks good to me.

Do you have some advice about that the same qual is present
twice in the plan, see [1].
Should we do something in restrict_infos_logically_equal().
Please take a look.
--
Thanks,
Tender Wang

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Alexander Korotkov

Дата:

24 марта, 00:28:45

Hi, Tom!

On Fri, Mar 20, 2026 at 10:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > At the very least we need to add comments, but I wonder if we
> > don't actually need an Assert that ChangeVarNodesWalkExpression
> > is not invoked directly on a Query.  It would have done the
> > right thing before this patch, but now it won't.  That's an
> > okay tradeoff for fixing the bare-Var case, but not documenting
> > what you did is not okay.
>
> After further contemplation I've decided that an Assert would be
> wrong, because it's not impossible that a callback would want
> to invoke this on a sub-Query --- for instance, if it wanted to
> short-circuit ChangeVarNodes's processing of a SubLink node,
> it would need to do that.  The key point is that if we do see a
> Query node here, we will treat it as a sub-query not a top-level
> query, which also justifies skipping the work that
> ChangeVarNodesExtended does on a top-level Query.  So we just
> need a comment explaining that.  I'm thinking about the attached.
>
> (BTW, by this reasoning the previous implementation of
> ChangeVarNodesWalkExpression was doubly wrong, since it would
> have done the wrong thing at a Query node as well as a Var node.)

Thank you so much for caring about this.  I agree that this kind of
changes should go with proper comments.

Please, consider my additions to the comment.  They explain why we use
ChangeVarNodes_walker() instead of expression_tree_walker(), and gives
a bit more details about difference in processing of top-level Query
and subquery.

------
Regards,
Alexander Korotkov
Supabase

Вложения

further-improve-ChangeVarNodesWalkExpression-comments.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tom Lane

Дата:

24 марта, 00:34:31

Alexander Korotkov <aekorotkov@gmail.com> writes:
> Please, consider my additions to the comment.  They explain why we use
> ChangeVarNodes_walker() instead of expression_tree_walker(), and gives
> a bit more details about difference in processing of top-level Query
> and subquery.

I'm fine with this, although in

+ * ... We invoke ChangeVarNodes_walker directly rather than
+ * expression_tree_walker

I would write "rather than via expression_tree_walker".

            regards, tom lane

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Andrei Lepikhov

Дата:

26 марта, 22:59:13

On 20/3/26 15:02, Alexander Korotkov wrote:
> OK. I've pushed this.  Let's go back to
> restrict_infos_logically_equal().  I'm still not convinced that we
> need to check if required_relids is singleton.  Why we can ignore
> outer_relids for singleton, but can't do if, for instance, two
> relations involved?

Let's continue. In the attachment, the Tender's proposal that I changed 
a little and added some tests.

As you can see in the tests, the SINGLETON limitation keeps duplicates 
of clauses like 'a.x + b.y = c'.
This example shows the main flaw of this approach. Introducing the 
restrict_infos_logically_equal(), we do a little more job than just the 
equal() routine could because of the context where we call this function 
and on which clauses.
But skipping all other RestrictInfo fields except required_relids seems 
excessive. - see the example with security_level difference - ignoring 
its value, we potentially might remove the clause with enforced security 
level in favour of an unsecured one.
That's more, further some new optimisations might introduce more fields 
into RestrictInfo that should be checked to correctly decide on the 
equality, and we may forget to arrange this specific place.

So, formally it works, and making the following replacement, we close 
the singleton issue:

-       if (bms_membership(a->required_relids) == BMS_SINGLETON &&
-               a->security_level == b->security_level)
+       if (bms_equal(a->required_relids, b->required_relids) &&
+               a->security_level == b->security_level &&
+               a->is_pushed_down == b->is_pushed_down)

but I'm unsure, in general, that this approach is conservative enough. 
Maybe we shouldn’t change this logic and invent one more optimisation 
‘deduplication’ stage later, before the selectivity estimation stage.

-- 
regards, Andrei Lepikhov,
pgEdge

Вложения

0001-Improve-RestrictInfo-deduplication-after-self-join-e.patch

Re: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

От

Tender Wang

Дата:

27 марта, 04:19:29

Andrei Lepikhov <lepihov@gmail.com> 于2026年3月27日周五 03:59写道：
>
> On 20/3/26 15:02, Alexander Korotkov wrote:
> > OK. I've pushed this.  Let's go back to
> > restrict_infos_logically_equal().  I'm still not convinced that we
> > need to check if required_relids is singleton.  Why we can ignore
> > outer_relids for singleton, but can't do if, for instance, two
> > relations involved?
>
> Let's continue. In the attachment, the Tender's proposal that I changed
> a little and added some tests.
>
> As you can see in the tests, the SINGLETON limitation keeps duplicates
> of clauses like 'a.x + b.y = c'.
> This example shows the main flaw of this approach. Introducing the
> restrict_infos_logically_equal(), we do a little more job than just the
> equal() routine could because of the context where we call this function
> and on which clauses.
> But skipping all other RestrictInfo fields except required_relids seems
> excessive. - see the example with security_level difference - ignoring
> its value, we potentially might remove the clause with enforced security
> level in favour of an unsecured one.

Yes, it seems too strict to require all fields to be equal, but
skipping some fields is unsafe.


> That's more, further some new optimisations might introduce more fields
> into RestrictInfo that should be checked to correctly decide on the
> equality, and we may forget to arrange this specific place.
>

Agree.

> So, formally it works, and making the following replacement, we close
> the singleton issue:
>
> -       if (bms_membership(a->required_relids) == BMS_SINGLETON &&
> -               a->security_level == b->security_level)
> +       if (bms_equal(a->required_relids, b->required_relids) &&
> +               a->security_level == b->security_level &&
> +               a->is_pushed_down == b->is_pushed_down)
>

The singleton issue does not seem to be the correct way; I don't dive
deeply to cover all cases.

> but I'm unsure, in general, that this approach is conservative enough.
> Maybe we shouldn’t change this logic and invent one more optimisation
> ‘deduplication’ stage later, before the selectivity estimation stage.




--
Thanks,
Tender Wang

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: BUG #19435: Error: "No relation entry for relid 2" Triggered by Complex Join with Self-Referencing Tables

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения