Обсуждение: Is it expected behavior index only scan shows "OPERATOR(pg_catalog." for EXPLAIN?

Поиск
Список
Период
Сортировка
Hi,

While I'm researching about [1], I found there are inconsistent EXPLAIN outputs.
Here is an example which shows " OPERATOR(pg_catalog.". Though it's not wrong,
I feel like there is no consistency in the output format.

-- A reproduce procedure
create temp table btree_bpchar (f1 text collate "C");
create index on btree_bpchar(f1 bpchar_ops);
insert into btree_bpchar values ('foo'), ('fool'), ('bar'), ('quux');
set enable_seqscan to false;
set enable_bitmapscan to false;
set enable_indexonlyscan to false; -- or true
explain (costs off)
select * from btree_bpchar where f1::bpchar like 'foo';

-- Index Scan result
                      QUERY PLAN
------------------------------------------------------
 Index Scan using btree_bpchar_f1_idx on btree_bpchar
   Index Cond: ((f1)::bpchar = 'foo'::bpchar)
   Filter: ((f1)::bpchar ~~ 'foo'::text)
(3 rows)

-- Index Only Scan result which has 'OPERATOR'
                        QUERY PLAN
-----------------------------------------------------------
 Index Only Scan using btree_bpchar_f1_idx on btree_bpchar
   Index Cond: (f1 OPERATOR(pg_catalog.=) 'foo'::bpchar)        -- Here is the point.
   Filter: ((f1)::bpchar ~~ 'foo'::text)
(3 rows)


IIUC, the index only scan use fixed_indexquals, which is removed "RelabelType" nodes,
for EXPLAIN so that get_rule_expr() could not understand the left argument of the operator
(f1 if the above case) can be displayed with arg::resulttype and it doesn't need to
show "OPERATOR(pg_catalog.)".

I've attached PoC patch to show a simple solution. It just adds a new member "indexqualorig"
to the index only scan node like the index scan and the bitmap index scan. But, since I'm
a beginner about the planner, I might have misunderstood something or there should be better
ways.



BTW, I'm trying to add a new index AM interface for EXPLAIN on the thread([1]). As the aspect,
my above solution might not be ideal because AMs can only know index column ids (varattno)
from fixed_indexquals. In that case, to support "fixed_indexquals" as argument of deparse_expression()
is better.

[1] Improve EXPLAIN output for multicolumn B-Tree Index

https://www.postgresql.org/message-id/flat/TYWPR01MB1098260B694D27758FE2BA46FB1C92%40TYWPR01MB10982.jpnprd01.prod.outlook.com

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION


Вложения
On 7/8/24 13:03, Masahiro.Ikeda@nttdata.com wrote:
> Hi,
> 
> While I'm researching about [1], I found there are inconsistent EXPLAIN outputs.
> Here is an example which shows " OPERATOR(pg_catalog.". Though it's not wrong,
> I feel like there is no consistency in the output format.
> 
> -- A reproduce procedure
> create temp table btree_bpchar (f1 text collate "C");
> create index on btree_bpchar(f1 bpchar_ops);
> insert into btree_bpchar values ('foo'), ('fool'), ('bar'), ('quux');
> set enable_seqscan to false;
> set enable_bitmapscan to false;
> set enable_indexonlyscan to false; -- or true
> explain (costs off)
> select * from btree_bpchar where f1::bpchar like 'foo';
> 
> -- Index Scan result
>                       QUERY PLAN                      
> ------------------------------------------------------
>  Index Scan using btree_bpchar_f1_idx on btree_bpchar
>    Index Cond: ((f1)::bpchar = 'foo'::bpchar)
>    Filter: ((f1)::bpchar ~~ 'foo'::text)
> (3 rows)
> 
> -- Index Only Scan result which has 'OPERATOR'
>                         QUERY PLAN                         
> -----------------------------------------------------------
>  Index Only Scan using btree_bpchar_f1_idx on btree_bpchar
>    Index Cond: (f1 OPERATOR(pg_catalog.=) 'foo'::bpchar)        -- Here is the point.
>    Filter: ((f1)::bpchar ~~ 'foo'::text)
> (3 rows)
> 

This apparently comes from generate_operator_name() in ruleutils.c,
where the OPERATOR() decorator is added if:

 /*
  * The idea here is to schema-qualify only if the parser would fail to
  * resolve the correct operator given the unqualified op name with the
  * specified argtypes.
  */

So clearly, the code believes just the operator name could be ambiguous,
so it adds the namespace too. Why exactly it is considered ambiguous I
don't know, but perhaps you have other applicable operators in the
search_path, or something like that?

> 
> IIUC, the index only scan use fixed_indexquals, which is removed "RelabelType" nodes,
> for EXPLAIN so that get_rule_expr() could not understand the left argument of the operator
> (f1 if the above case) can be displayed with arg::resulttype and it doesn't need to
> show "OPERATOR(pg_catalog.)".
> 
> I've attached PoC patch to show a simple solution. It just adds a new member "indexqualorig"
> to the index only scan node like the index scan and the bitmap index scan. But, since I'm
> a beginner about the planner, I might have misunderstood something or there should be better
> ways.
> 

I honestly don't know if this is the correct solution. It seems to me
handling this at the EXPLAIN level might just mask the issue - it's not
clear to me why adding "indexqualorig" would remove the ambiguity (if
there's one). Perhaps it might be better to find why the ruleutils.c
code thinks the OPERATOR() is necessary, and then improve/fix that?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> I honestly don't know if this is the correct solution. It seems to me
> handling this at the EXPLAIN level might just mask the issue - it's not
> clear to me why adding "indexqualorig" would remove the ambiguity (if
> there's one). Perhaps it might be better to find why the ruleutils.c
> code thinks the OPERATOR() is necessary, and then improve/fix that?

As Masahiro-san already said, the root cause is that the planner
removes the RelabelType that is on the indexed variable.  So ruleutils
sees that the operator is not the one that would be chosen by the
parser given those input expressions (cf generate_operator_name)
and it concludes it'd better schema-qualify the operator.  While
that doesn't really make any difference in this particular case,
it is the right thing in typical rule-deparsing cases.

I don't think this output is really wrong, and I'm not in favor
of adding nontrivial overhead to make it better, so I don't like
the proposed patch.  Maybe generate_operator_name could use some
other heuristic, but I'm unsure what.  The case it wants to cover
is that the operator is in the search path but is masked by some
operator in an earlier schema, so simply testing if the operator's
schema is in the path at all would be insufficient.

            regards, tom lane



Thanks for your comments.

Tom Lane <tgl@sss.pgh.pa.us> writes:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> > I honestly don't know if this is the correct solution. It seems to me
> > handling this at the EXPLAIN level might just mask the issue - it's
> > not clear to me why adding "indexqualorig" would remove the ambiguity
> > (if there's one). Perhaps it might be better to find why the
> > ruleutils.c code thinks the OPERATOR() is necessary, and then improve/fix that?
>
> As Masahiro-san already said, the root cause is that the planner removes the
> RelabelType that is on the indexed variable.  So ruleutils sees that the operator is not
> the one that would be chosen by the parser given those input expressions (cf
> generate_operator_name) and it concludes it'd better schema-qualify the operator.
> While that doesn't really make any difference in this particular case, it is the right thing
> in typical rule-deparsing cases.

Yes.

The plan of index scan has a "RELABELTYPE" node, and it has "resulttype" so that
generate_operator_name() gets "operid =1054("="), arg1=1042(bpchar from resulttype),
arg2=1042(bpchar)". The plan of index only scan is removed it so that
generate_operator_name() gets "operid =1054("="), arg1=25(*text*), arg2=1042(bpchar)".

Though there is no entry "operid=1054("="), arg1=25(*text*), arg2=1042(bpchar)" in
pg_operator, it decided to check with the operator "=" for (text, text) because it is
coercion-compatible and preferred than operator "=" for (bpchar, bpchar).

But since the caller expected to use operator "=" for (bpchar, bpchar), it plus
OPERATOR() decoration sadly.

# the partial output of Index Scan plan
              :indexqualorig (
                 {OPEXPR
                 :opno 1054                -- "=" operator
                 :opfuncid 1048
                 :opresulttype 16
                 :opretset false
                 :opcollid 0
                 :inputcollid 950
                 :args (
                    {RELABELTYPE
                    :arg
                       {VAR
                       :varno 1
                       :varattno 1
                       :vartype 25           -- text. But it will be evaluated as 1042(bpchar) from resulttype
                       :vartypmod -1
                       :varcollid 950
                       :varnullingrels (b)
                       :varlevelsup 0
                       :varnosyn 1
                       :varattnosyn 1
                       :location 33
                       }
                    :resulttype 1042        -- bpchar
                    :resulttypmod -1
                    :resultcollid 950
                    :relabelformat 1
                    :location 35
                    }
                    {CONST
                    :consttype 1042        -- bpchar
                    :consttypmod -1
                    :constcollid 100
                    :constlen -1
                    :constbyval false
                    :constisnull false
                    :location 46
                    :constvalue 7 [ 28 0 0 0 102 111 111 ]
                    }
                 )

# the partial output of Index Only Scan plan
              :indexqual (
                 {OPEXPR
                 :opno 1054               -- "=" operator
                 :opfuncid 1048
                 :opresulttype 16
                 :opretset false
                 :opcollid 0
                 :inputcollid 950
                 :args (
                    {VAR
                    :varno -3
                    :varattno 1
                    :vartype 25      -- text
                    :vartypmod -1
                    :varcollid 950
                    :varnullingrels (b)
                    :varlevelsup 0
                    :varnosyn 1
                    :varattnosyn 1
                    :location 33
                    }
                    {CONST
                    :consttype 1042   -- bpchar
                    :consttypmod -1
                    :constcollid 100
                    :constlen -1
                    :constbyval false
                    :constisnull false
                    :location 46
                    :constvalue 7 [ 28 0 0 0 102 111 111 ]
                    }
                 )
                 :location 44
                 }
              )

> I don't think this output is really wrong, and I'm not in favor of adding nontrivial overhead
> to make it better, so I don't like the proposed patch.  Maybe generate_operator_name
> could use some other heuristic, but I'm unsure what.  The case it wants to cover is that
> the operator is in the search path but is masked by some operator in an earlier schema,
> so simply testing if the operator's schema is in the path at all would be insufficient.

Yes, that's one of my concerns.

IIUC, the above case is not related to the search path but the arguments don't match. If so,
I think there are two ways to solve the issue.

1. make callers of generate_operator_name() check the arguments first.
The callers (e.g., get_oper_expr()) of generate_operator_name() decide whether they
should/can cast the arguments. The planner already decided what operator should be used
so that get_oper_expr() can cast always on the explain context if the arguments don't match
the operator's one, doesn't it?

2. make generate_operator_name() checks all operator candidate.
Currently generate_operator_name() checks only one operator which matches the operator's name
even if although there are other candidates (e.g., to call oper()->oper_select_candidate()).
func_select_candidate() selects operator "=" for (text, text) in the above case because "=" for
(btchar, btchar) is typispreferred=false. So, it seems to me that it can solve the issue
if generate_operator_name() checks all candidates.

I think the second solution is better because callers might expect to use specified operator
even if there are other candidates' operator which can handle the arguments.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION