Discussion: slow queries over information schema.tables


slow queries over information schema.tables

From:
Pavel Stehule
Date:
Hi

A customer reported slow queries over information_schema.tables. The index on relname is never used, probably because of the cast to information_schema.sql_identifier.

Slow query

select * from information_schema.tables where table_name = 'pg_class';

In a corporate environment it is usually hard to fix this at the application level.

Regards

Pavel

Re: slow queries over information schema.tables

From:
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Slow query
> select * from information_schema.tables where table_name = 'pg_class';

Yeah.  This has been complained of many times before.

The core of the problem, I think, is that we're unable to convert the
condition on table_name into an indexscan on pg_class.relname, because
the view has cast pg_class.relname to the sql_identifier domain.

There are two different issues in that.  One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index.  Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

The latter half of the problem can be exhibited in simplified form as

regression=# explain select * from pg_class where relname = 'pg_class'::text;
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..175.41 rows=7 width=791)
   Filter: ((relname)::text = 'pg_class'::text)
(2 rows)

which is much stupider than what you get with a name comparison:

regression=# explain select * from pg_class where relname = 'pg_class';
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=791)
   Index Cond: (relname = 'pg_class'::name)
(2 rows)

(We've seen complaints about this form of problem, too.)  Since there's
no name-vs-text equality operator, we end up applying a cast to text so
that the texteq operator can be used, and now we're out of luck for
matching that to the index.  To add insult to injury, we also fail to
match the cast expression to the available statistics, so that we don't
get quite the right selectivity estimate.

The most straightforward way to fix that would be to add some cross-type
comparison operators to the name_ops operator family.  While we only
directly need 'name = text' to make this case work, the opr_sanity
regression test will complain if we don't supply a fairly complete set of
operators, and I think not without reason.  So I experimented with doing
that, as in the very-much-WIP 0001 patch below.  A name index can only
cope with strcmp-style comparison semantics, not strcoll-style, so while
it's okay to call the equality operator = I felt we'd better call the
inequality operators ~<~ and so on.  While the patch as presented fixes
the case shown above, it's not all good: the problem with having a new
'text = name' operator is that it will also capture cases where you would
like to have a text index working with a comparison value of type "name".
If 'text = name' isn't part of the text_ops opclass then that doesn't
work.  I think that the reason for the join.sql regression test failure
shown in patch 0001 is likewise that since 'text = name' isn't part of the
text_ops opclass, the join elimination stuff is unable to prove that it
can eliminate a join to a unique text column.  This could probably be
fixed by creating yet another dozen cross-type operators that implement
text vs name comparisons using strcoll semantics (hence, using the normal
'<' operator names), and adding that set to the text_ops opfamily.
I didn't do that here (yet), because it seems pretty tedious.

Also worth noting is that the name_ops and text_pattern_ops opfamilies
are implementing identical semantics.  I wonder whether we could merge
them.

While most of the other regression test output changes shown in the 0001
patch are unsurprising, one that is surprising is that a merge join on
two text columns has started sorting USING ~<~ rather than the normal
text ordering.  The reason for this seems to be that the 'text = text'
operator is now a member of name_ops as well as text_ops, and
select_outer_pathkeys_for_merge arbitrarily uses the lowest-number
opfamily OID if it has a choice.  We could avoid that by renumbering
name_ops to have an OID higher than text_ops, though that's certainly
klugy.  Or we might just decide that we like that plan better anyway,
since strcmp-based comparison is probably faster than strcoll-based
comparison.  (It'd be kind of nice if the choice were based on something
more principled than OID order, though I don't especially want to do
anything about that right now.)

Now, 0001 by itself doesn't do much for Pavel's complaint, because
the view is still forcing a cast to sql_identifier to be inserted
atop pg_class.relname, even though we no longer need any cast for
simple name-vs-text cases.

0002 is a really preliminary POC for eliminating CoerceToDomain nodes
at plan time if the domain has no constraints to check.  While this is
enough to check the effects on plan selection, it's certainly not
committable as-is, because the resulting plan is broken if someone then
adds a constraint to the domain.  (There is a regression test failure,
not shown in 0002, for a case that tests exactly that scenario.)

What we would have to do to make 0002 committable is to add sinval
signaling to invalidate any cached plan that's dependent on an
assumption of no constraints on a domain.  This is something that
we probably want anyway, since it would be necessary if we ever want
to compile domain constraint expressions as part of the plan tree
rather than leaving them to run-time.

While the sinval additions per se would be fairly straightforward,
I wonder whether anyone is likely to complain about the race conditions
that will ensue from not taking any locks associated with the domain
type; i.e. it's possible that a query would execute with slightly
out-of-date assumptions about what constraints a domain has.  I suspect
that we have race conditions like that already, but they might be worse
if we inspect the domain during planning rather than at executor
startup.  Is anyone worried enough about that to want to add locking
overhead to domain usage?  (I'm not.)

0001+0002 still don't get the job done: now we strip off the
CoerceToDomain successfully, but we still have "relname::varchar"
being compared to the comparison value, so we still can't match
that to the index.  0003 is a somewhat klugy solution to that part,
allowing indxpath.c to construct a name equality test from a texteq
restriction condition.  (This is semantically OK because equality
is equality in both name and text domains.  We could not convert
inequalities, at least not as freely; maybe it could be done in
C locale, but I've not done anything about that here.)

With all three patches in place, we get

regression=# explain select * from pg_class where relname::information_schema.sql_identifier = 'foo'::text;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.30 rows=7 width=781)
   Index Cond: (relname = 'foo'::text)
   Filter: ((relname)::text = 'foo'::text)
(3 rows)

so we've fixed the lack of indexscan but we're still a bit off about the
statistics.  I don't have any immediate proposal about how to fix the
latter, but anyway this is enough to make Pavel's case a lot better.

0003 essentially converts "namecol::text texteq textvalue" into
"namecol nameeqtext textvalue", relying on the new equality
operator introduced by 0001.  Another way we could approach
this is to dispense with 0001 altogether, and use a variant of
0003 that converts such a clause to "namecol nameeq textvalue::name".
At first glance it seems like that wouldn't work quite right:
the cast to name will silently truncate overlength text values,
which could allow such a value to be found equal to a name that
it shouldn't be equal to.  However, it really would be OK because
the context is that we're assuming the converted clause is lossy
(i.e. might have false matches), and so we'll recheck the original
clause as a filter condition, which will get rid of such false
matches.  The recheck behavior is just slightly-annoying overhead
in 0003 as presented, but it'd be essential in the other version.

I'm not entirely sure whether to go with 0001+0003 or this alternative
approach.  The alternative is surely far less invasive; 0001 might have
more side effects I haven't thought of.  However, my gut feeling is
that 0001 would be a good thing on balance because it'd give the system
considerably more information about name-vs-text comparisons than it has
now.  In principle that should lead to better plans for other cases
besides the narrow one we're specifically thinking of.

Comments, complaints, other ideas?

            regards, tom lane

diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index 6f2ad23..103c97a 100644
*** a/src/backend/access/nbtree/nbtcompare.c
--- b/src/backend/access/nbtree/nbtcompare.c
*************** btnamesortsupport(PG_FUNCTION_ARGS)
*** 360,362 ****
--- 360,408 ----
      ssup->comparator = btnamefastcmp;
      PG_RETURN_VOID();
  }
+
+ Datum
+ btnametextcmp(PG_FUNCTION_ARGS)
+ {
+     Name        arg1 = PG_GETARG_NAME(0);
+     text       *arg2 = PG_GETARG_TEXT_PP(1);
+     size_t        len1 = strlen(NameStr(*arg1));
+     size_t        len2 = VARSIZE_ANY_EXHDR(arg2);
+     int32        result;
+
+     result = memcmp(NameStr(*arg1), VARDATA_ANY(arg2), Min(len1, len2));
+     if (result == 0)
+     {
+         if (len1 > len2)
+             result = A_GREATER_THAN_B;
+         else if (len1 < len2)
+             result = A_LESS_THAN_B;
+     }
+
+     PG_FREE_IF_COPY(arg2, 1);
+
+     PG_RETURN_INT32(result);
+ }
+
+ Datum
+ bttextnamecmp(PG_FUNCTION_ARGS)
+ {
+     text       *arg1 = PG_GETARG_TEXT_PP(0);
+     Name        arg2 = PG_GETARG_NAME(1);
+     size_t        len1 = VARSIZE_ANY_EXHDR(arg1);
+     size_t        len2 = strlen(NameStr(*arg2));
+     int32        result;
+
+     result = memcmp(VARDATA_ANY(arg1), NameStr(*arg2), Min(len1, len2));
+     if (result == 0)
+     {
+         if (len1 > len2)
+             result = A_GREATER_THAN_B;
+         else if (len1 < len2)
+             result = A_LESS_THAN_B;
+     }
+
+     PG_FREE_IF_COPY(arg1, 0);
+
+     PG_RETURN_INT32(result);
+ }
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index c266da2..1ae3214 100644
*** a/src/backend/utils/adt/name.c
--- b/src/backend/utils/adt/name.c
*************** namege(PG_FUNCTION_ARGS)
*** 184,191 ****
      PG_RETURN_BOOL(strncmp(NameStr(*arg1), NameStr(*arg2), NAMEDATALEN) >= 0);
  }


! /* (see char.c for comparison/operation routines) */

  int
  namecpy(Name n1, const NameData *n2)
--- 184,292 ----
      PG_RETURN_BOOL(strncmp(NameStr(*arg1), NameStr(*arg2), NAMEDATALEN) >= 0);
  }

+ /*
+  * Cross-type comparison functions using name (i.e., strcmp) semantics.
+  */

! Datum
! nameeqtext(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(btnametextcmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) == 0);
! }
!
! Datum
! namenetext(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(btnametextcmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) != 0);
! }
!
! Datum
! namelttext(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(btnametextcmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) < 0);
! }
!
! Datum
! nameletext(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(btnametextcmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) <= 0);
! }
!
! Datum
! namegttext(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(btnametextcmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) > 0);
! }
!
! Datum
! namegetext(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(btnametextcmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) >= 0);
! }
!
! Datum
! texteqname(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(bttextnamecmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) == 0);
! }
!
! Datum
! textnename(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(bttextnamecmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) != 0);
! }
!
! Datum
! textltname(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(bttextnamecmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) < 0);
! }
!
! Datum
! textlename(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(bttextnamecmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) <= 0);
! }
!
! Datum
! textgtname(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(bttextnamecmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) > 0);
! }
!
! Datum
! textgename(PG_FUNCTION_ARGS)
! {
!     PG_RETURN_BOOL(DatumGetInt32(DirectFunctionCall2(bttextnamecmp,
!                                                      PG_GETARG_DATUM(0),
!                                                      PG_GETARG_DATUM(1))) >= 0);
! }
!
! /*
!  * Miscellaneous support functions
!  */

  int
  namecpy(Name n1, const NameData *n2)
diff --git a/src/include/catalog/pg_amop.dat b/src/include/catalog/pg_amop.dat
index 075a54c..914cc2a 100644
*** a/src/include/catalog/pg_amop.dat
--- b/src/include/catalog/pg_amop.dat
***************
*** 317,322 ****
--- 317,373 ----
    amoprighttype => 'name', amopstrategy => '5', amopopr => '>(name,name)',
    amopmethod => 'btree' },

+ # crosstype operators name v text
+ { amopfamily => 'btree/name_ops', amoplefttype => 'name',
+   amoprighttype => 'text', amopstrategy => '1', amopopr => '~<~(name,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'name',
+   amoprighttype => 'text', amopstrategy => '2', amopopr => '~<=~(name,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'name',
+   amoprighttype => 'text', amopstrategy => '3', amopopr => '=(name,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'name',
+   amoprighttype => 'text', amopstrategy => '4', amopopr => '~>=~(name,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'name',
+   amoprighttype => 'text', amopstrategy => '5', amopopr => '~>~(name,text)',
+   amopmethod => 'btree' },
+
+ # crosstype operators text v name
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'name', amopstrategy => '1', amopopr => '~<~(text,name)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'name', amopstrategy => '2', amopopr => '~<=~(text,name)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'name', amopstrategy => '3', amopopr => '=(text,name)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'name', amopstrategy => '4', amopopr => '~>=~(text,name)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'name', amopstrategy => '5', amopopr => '~>~(text,name)',
+   amopmethod => 'btree' },
+
+ # crosstype operators text v text (same as text_pattern_ops)
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'text', amopstrategy => '1', amopopr => '~<~(text,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'text', amopstrategy => '2', amopopr => '~<=~(text,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'text', amopstrategy => '3', amopopr => '=(text,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'text', amopstrategy => '4', amopopr => '~>=~(text,text)',
+   amopmethod => 'btree' },
+ { amopfamily => 'btree/name_ops', amoplefttype => 'text',
+   amoprighttype => 'text', amopstrategy => '5', amopopr => '~>~(text,text)',
+   amopmethod => 'btree' },
+
  # btree text_ops

  { amopfamily => 'btree/text_ops', amoplefttype => 'text',
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 0ef2c08..e0bf039 100644
*** a/src/include/catalog/pg_amproc.dat
--- b/src/include/catalog/pg_amproc.dat
***************
*** 152,157 ****
--- 152,166 ----
    amprocrighttype => 'name', amprocnum => '1', amproc => 'btnamecmp' },
  { amprocfamily => 'btree/name_ops', amproclefttype => 'name',
    amprocrighttype => 'name', amprocnum => '2', amproc => 'btnamesortsupport' },
+ { amprocfamily => 'btree/name_ops', amproclefttype => 'name',
+   amprocrighttype => 'text', amprocnum => '1', amproc => 'btnametextcmp' },
+ { amprocfamily => 'btree/name_ops', amproclefttype => 'text',
+   amprocrighttype => 'name', amprocnum => '1', amproc => 'bttextnamecmp' },
+ { amprocfamily => 'btree/name_ops', amproclefttype => 'text',
+   amprocrighttype => 'text', amprocnum => '1', amproc => 'bttext_pattern_cmp' },
+ { amprocfamily => 'btree/name_ops', amproclefttype => 'text',
+   amprocrighttype => 'text', amprocnum => '2',
+   amproc => 'bttext_pattern_sortsupport' },
  { amprocfamily => 'btree/numeric_ops', amproclefttype => 'numeric',
    amprocrighttype => 'numeric', amprocnum => '1', amproc => 'numeric_cmp' },
  { amprocfamily => 'btree/numeric_ops', amproclefttype => 'numeric',
diff --git a/src/include/catalog/pg_opclass.dat b/src/include/catalog/pg_opclass.dat
index 13928ba..9ea8783 100644
*** a/src/include/catalog/pg_opclass.dat
--- b/src/include/catalog/pg_opclass.dat
***************
*** 96,101 ****
--- 96,103 ----
  { opcmethod => 'btree', opcname => 'name_ops', opcfamily => 'btree/name_ops',
    opcintype => 'name', opckeytype => 'cstring' },

+ { opcmethod => 'btree', opcname => 'name_text_ops',
+   opcfamily => 'btree/name_ops', opcintype => 'text', opcdefault => 'f' },
  { opcmethod => 'hash', opcname => 'name_ops', opcfamily => 'hash/name_ops',
    opcintype => 'name' },
  { oid => '3125', oid_symbol => 'NUMERIC_BTREE_OPS_OID',
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index ce23c2f..d488c85 100644
*** a/src/include/catalog/pg_operator.dat
--- b/src/include/catalog/pg_operator.dat
***************
*** 74,80 ****
    oprright => 'char', oprresult => 'bool', oprcom => '=(char,char)',
    oprnegate => '<>(char,char)', oprcode => 'chareq', oprrest => 'eqsel',
    oprjoin => 'eqjoinsel' },
! { oid => '93', descr => 'equal',
    oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'name',
    oprright => 'name', oprresult => 'bool', oprcom => '=(name,name)',
    oprnegate => '<>(name,name)', oprcode => 'nameeq', oprrest => 'eqsel',
--- 74,80 ----
    oprright => 'char', oprresult => 'bool', oprcom => '=(char,char)',
    oprnegate => '<>(char,char)', oprcode => 'chareq', oprrest => 'eqsel',
    oprjoin => 'eqjoinsel' },
! { oid => '93', oid_symbol => 'NameEqualOperator', descr => 'equal',
    oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'name',
    oprright => 'name', oprresult => 'bool', oprcom => '=(name,name)',
    oprnegate => '<>(name,name)', oprcode => 'nameeq', oprrest => 'eqsel',
***************
*** 107,112 ****
--- 107,169 ----
    oprcode => 'starts_with', oprrest => 'prefixsel',
    oprjoin => 'prefixjoinsel' },

+ { oid => '254', oid_symbol => 'NameEqualTextOperator', descr => 'equal',
+   oprname => '=', oprcanmerge => 't', oprleft => 'name', oprright => 'text',
+   oprresult => 'bool', oprcom => '=(text,name)', oprnegate => '<>(name,text)',
+   oprcode => 'nameeqtext', oprrest => 'eqsel', oprjoin => 'eqjoinsel' },
+ { oid => '255', descr => 'less than',
+   oprname => '~<~', oprleft => 'name', oprright => 'text', oprresult => 'bool',
+   oprcom => '~>~(text,name)', oprnegate => '~>=~(name,text)',
+   oprcode => 'namelttext', oprrest => 'scalarltsel',
+   oprjoin => 'scalarltjoinsel' },
+ { oid => '256', descr => 'less than or equal',
+   oprname => '~<=~', oprleft => 'name', oprright => 'text', oprresult => 'bool',
+   oprcom => '~>=~(text,name)', oprnegate => '~>~(name,text)',
+   oprcode => 'nameletext', oprrest => 'scalarlesel',
+   oprjoin => 'scalarlejoinsel' },
+ { oid => '257', descr => 'greater than or equal',
+   oprname => '~>=~', oprleft => 'name', oprright => 'text', oprresult => 'bool',
+   oprcom => '~<=~(text,name)', oprnegate => '~<~(name,text)',
+   oprcode => 'namegetext', oprrest => 'scalargesel',
+   oprjoin => 'scalargejoinsel' },
+ { oid => '258', descr => 'greater than',
+   oprname => '~>~', oprleft => 'name', oprright => 'text', oprresult => 'bool',
+   oprcom => '~<~(text,name)', oprnegate => '~<=~(name,text)',
+   oprcode => 'namegttext', oprrest => 'scalargtsel',
+   oprjoin => 'scalargtjoinsel' },
+ { oid => '259', descr => 'not equal',
+   oprname => '<>', oprleft => 'name', oprright => 'text', oprresult => 'bool',
+   oprcom => '<>(text,name)', oprnegate => '=(name,text)',
+   oprcode => 'namenetext', oprrest => 'neqsel', oprjoin => 'neqjoinsel' },
+ { oid => '260', oid_symbol => 'TextEqualNameOperator', descr => 'equal',
+   oprname => '=', oprcanmerge => 't', oprleft => 'text', oprright => 'name',
+   oprresult => 'bool', oprcom => '=(name,text)', oprnegate => '<>(text,name)',
+   oprcode => 'texteqname', oprrest => 'eqsel', oprjoin => 'eqjoinsel' },
+ { oid => '261', descr => 'less than',
+   oprname => '~<~', oprleft => 'text', oprright => 'name', oprresult => 'bool',
+   oprcom => '~>~(name,text)', oprnegate => '~>=~(text,name)',
+   oprcode => 'textltname', oprrest => 'scalarltsel',
+   oprjoin => 'scalarltjoinsel' },
+ { oid => '262', descr => 'less than or equal',
+   oprname => '~<=~', oprleft => 'text', oprright => 'name', oprresult => 'bool',
+   oprcom => '~>=~(name,text)', oprnegate => '~>~(text,name)',
+   oprcode => 'textlename', oprrest => 'scalarlesel',
+   oprjoin => 'scalarlejoinsel' },
+ { oid => '263', descr => 'greater than or equal',
+   oprname => '~>=~', oprleft => 'text', oprright => 'name', oprresult => 'bool',
+   oprcom => '~<=~(name,text)', oprnegate => '~<~(text,name)',
+   oprcode => 'textgename', oprrest => 'scalargesel',
+   oprjoin => 'scalargejoinsel' },
+ { oid => '264', descr => 'greater than',
+   oprname => '~>~', oprleft => 'text', oprright => 'name', oprresult => 'bool',
+   oprcom => '~<~(name,text)', oprnegate => '~<=~(text,name)',
+   oprcode => 'textgtname', oprrest => 'scalargtsel',
+   oprjoin => 'scalargtjoinsel' },
+ { oid => '265', descr => 'not equal',
+   oprname => '<>', oprleft => 'text', oprright => 'name', oprresult => 'bool',
+   oprcom => '<>(name,text)', oprnegate => '=(text,name)',
+   oprcode => 'textnename', oprrest => 'neqsel', oprjoin => 'neqjoinsel' },
+
  { oid => '349', descr => 'append element onto end of array',
    oprname => '||', oprleft => 'anyarray', oprright => 'anyelement',
    oprresult => 'anyarray', oprcode => 'array_append' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 034a41e..b74a4cb 100644
*** a/src/include/catalog/pg_proc.dat
--- b/src/include/catalog/pg_proc.dat
***************
*** 673,678 ****
--- 673,721 ----
    proname => 'line_distance', prorettype => 'float8',
    proargtypes => 'line line', prosrc => 'line_distance' },

+ { oid => '240',
+   proname => 'nameeqtext', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'name text', prosrc => 'nameeqtext' },
+ { oid => '241',
+   proname => 'namelttext', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'name text', prosrc => 'namelttext' },
+ { oid => '242',
+   proname => 'nameletext', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'name text', prosrc => 'nameletext' },
+ { oid => '243',
+   proname => 'namegetext', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'name text', prosrc => 'namegetext' },
+ { oid => '244',
+   proname => 'namegttext', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'name text', prosrc => 'namegttext' },
+ { oid => '245',
+   proname => 'namenetext', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'name text', prosrc => 'namenetext' },
+ { oid => '246', descr => 'less-equal-greater',
+   proname => 'btnametextcmp', proleakproof => 't', prorettype => 'int4',
+   proargtypes => 'name text', prosrc => 'btnametextcmp' },
+ { oid => '247',
+   proname => 'texteqname', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'text name', prosrc => 'texteqname' },
+ { oid => '248',
+   proname => 'textltname', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'text name', prosrc => 'textltname' },
+ { oid => '249',
+   proname => 'textlename', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'text name', prosrc => 'textlename' },
+ { oid => '250',
+   proname => 'textgename', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'text name', prosrc => 'textgename' },
+ { oid => '251',
+   proname => 'textgtname', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'text name', prosrc => 'textgtname' },
+ { oid => '252',
+   proname => 'textnename', proleakproof => 't', prorettype => 'bool',
+   proargtypes => 'text name', prosrc => 'textnename' },
+ { oid => '253', descr => 'less-equal-greater',
+   proname => 'bttextnamecmp', proleakproof => 't', prorettype => 'int4',
+   proargtypes => 'text name', prosrc => 'bttextnamecmp' },
+
  { oid => '274',
    descr => 'current date and time - increments during transactions',
    proname => 'timeofday', provolatile => 'v', prorettype => 'text',
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 1f53780..a72881a 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** from
*** 4393,4410 ****
       left join uniquetbl u1 ON u1.f1 = t1.string4) ss
    on t0.f1 = ss.case1
  where ss.stringu2 !~* ss.case1;
!                                          QUERY PLAN
! --------------------------------------------------------------------------------------------
!  Nested Loop
!    Join Filter: (CASE t1.ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END = t0.f1)
     ->  Nested Loop
!          ->  Seq Scan on int4_tbl i4
!          ->  Index Scan using tenk1_unique2 on tenk1 t1
!                Index Cond: (unique2 = i4.f1)
!                Filter: (stringu2 !~* CASE ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END)
!    ->  Materialize
!          ->  Seq Scan on text_tbl t0
! (9 rows)

  select t0.*
  from
--- 4393,4413 ----
       left join uniquetbl u1 ON u1.f1 = t1.string4) ss
    on t0.f1 = ss.case1
  where ss.stringu2 !~* ss.case1;
!                                             QUERY PLAN
! --------------------------------------------------------------------------------------------------
!  Nested Loop Left Join
!    Join Filter: (u1.f1 = t1.string4)
     ->  Nested Loop
!          Join Filter: (CASE t1.ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END = t0.f1)
!          ->  Nested Loop
!                ->  Seq Scan on int4_tbl i4
!                ->  Index Scan using tenk1_unique2 on tenk1 t1
!                      Index Cond: (unique2 = i4.f1)
!                      Filter: (stringu2 !~* CASE ten WHEN 0 THEN 'doh!'::text ELSE NULL::text END)
!          ->  Materialize
!                ->  Seq Scan on text_tbl t0
!    ->  Seq Scan on uniquetbl u1
! (12 rows)

  select t0.*
  from
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6072f6b..5edeaab 100644
*** a/src/test/regress/expected/opr_sanity.out
--- b/src/test/regress/expected/opr_sanity.out
*************** int24ge(smallint,integer)
*** 525,530 ****
--- 525,544 ----
  int42ge(integer,smallint)
  oideq(oid,oid)
  oidne(oid,oid)
+ nameeqtext(name,text)
+ namelttext(name,text)
+ nameletext(name,text)
+ namegetext(name,text)
+ namegttext(name,text)
+ namenetext(name,text)
+ btnametextcmp(name,text)
+ texteqname(text,name)
+ textltname(text,name)
+ textlename(text,name)
+ textgename(text,name)
+ textgtname(text,name)
+ textnename(text,name)
+ bttextnamecmp(text,name)
  float4eq(real,real)
  float4ne(real,real)
  float4lt(real,real)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index c55de5d..a000695 100644
*** a/src/test/regress/expected/partition_join.out
--- b/src/test/regress/expected/partition_join.out
*************** SELECT t1.a, t2.b FROM prt1 t1, prt2 t2
*** 1045,1058 ****
   Merge Join
     Merge Cond: ((t1.a = t2.b) AND (((((t1.*)::prt1))::text) = ((((t2.*)::prt2))::text)))
     ->  Sort
!          Sort Key: t1.a, ((((t1.*)::prt1))::text)
           ->  Result
                 ->  Append
                       ->  Seq Scan on prt1_p1 t1
                       ->  Seq Scan on prt1_p2 t1_1
                       ->  Seq Scan on prt1_p3 t1_2
     ->  Sort
!          Sort Key: t2.b, ((((t2.*)::prt2))::text)
           ->  Result
                 ->  Append
                       ->  Seq Scan on prt2_p1 t2
--- 1045,1058 ----
   Merge Join
     Merge Cond: ((t1.a = t2.b) AND (((((t1.*)::prt1))::text) = ((((t2.*)::prt2))::text)))
     ->  Sort
!          Sort Key: t1.a, ((((t1.*)::prt1))::text) USING ~<~
           ->  Result
                 ->  Append
                       ->  Seq Scan on prt1_p1 t1
                       ->  Seq Scan on prt1_p2 t1_1
                       ->  Seq Scan on prt1_p3 t1_2
     ->  Sort
!          Sort Key: t2.b, ((((t2.*)::prt2))::text) USING ~<~
           ->  Result
                 ->  Append
                       ->  Seq Scan on prt2_p1 t2
diff --git a/src/test/regress/expected/select_views.out b/src/test/regress/expected/select_views.out
index bf003ad..1aeed84 100644
*** a/src/test/regress/expected/select_views.out
--- b/src/test/regress/expected/select_views.out
*************** NOTICE:  f_leak => hamburger
*** 1326,1335 ****
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_normal WHERE f_leak(passwd);
!                           QUERY PLAN
! --------------------------------------------------------------
   Seq Scan on customer
!    Filter: (f_leak(passwd) AND (name = (CURRENT_USER)::text))
  (2 rows)

  SELECT * FROM my_property_secure WHERE f_leak(passwd);
--- 1326,1335 ----
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_normal WHERE f_leak(passwd);
!                       QUERY PLAN
! ------------------------------------------------------
   Seq Scan on customer
!    Filter: (f_leak(passwd) AND (name = CURRENT_USER))
  (2 rows)

  SELECT * FROM my_property_secure WHERE f_leak(passwd);
*************** NOTICE:  f_leak => passwd123
*** 1340,1351 ****
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_secure WHERE f_leak(passwd);
!                   QUERY PLAN
! -----------------------------------------------
   Subquery Scan on my_property_secure
     Filter: f_leak(my_property_secure.passwd)
     ->  Seq Scan on customer
!          Filter: (name = (CURRENT_USER)::text)
  (4 rows)

  --
--- 1340,1351 ----
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_secure WHERE f_leak(passwd);
!                  QUERY PLAN
! ---------------------------------------------
   Subquery Scan on my_property_secure
     Filter: f_leak(my_property_secure.passwd)
     ->  Seq Scan on customer
!          Filter: (name = CURRENT_USER)
  (4 rows)

  --
*************** NOTICE:  f_leak => hamburger
*** 1367,1376 ****

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_normal v
          WHERE f_leak('passwd') AND f_leak(passwd);
!                                        QUERY PLAN
! -----------------------------------------------------------------------------------------
   Seq Scan on customer
!    Filter: (f_leak('passwd'::text) AND f_leak(passwd) AND (name = (CURRENT_USER)::text))
  (2 rows)

  SELECT * FROM my_property_secure v
--- 1367,1376 ----

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_normal v
          WHERE f_leak('passwd') AND f_leak(passwd);
!                                    QUERY PLAN
! ---------------------------------------------------------------------------------
   Seq Scan on customer
!    Filter: (f_leak('passwd'::text) AND f_leak(passwd) AND (name = CURRENT_USER))
  (2 rows)

  SELECT * FROM my_property_secure v
*************** NOTICE:  f_leak => passwd
*** 1386,1397 ****

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_secure v
          WHERE f_leak('passwd') AND f_leak(passwd);
!                                  QUERY PLAN
! ----------------------------------------------------------------------------
   Subquery Scan on v
     Filter: f_leak(v.passwd)
     ->  Seq Scan on customer
!          Filter: (f_leak('passwd'::text) AND (name = (CURRENT_USER)::text))
  (4 rows)

  --
--- 1386,1397 ----

  EXPLAIN (COSTS OFF) SELECT * FROM my_property_secure v
          WHERE f_leak('passwd') AND f_leak(passwd);
!                              QUERY PLAN
! --------------------------------------------------------------------
   Subquery Scan on v
     Filter: f_leak(v.passwd)
     ->  Seq Scan on customer
!          Filter: (f_leak('passwd'::text) AND (name = CURRENT_USER))
  (4 rows)

  --
*************** NOTICE:  f_leak => 9801-2345-6789-0123
*** 1409,1423 ****
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_normal WHERE f_leak(cnum);
!                      QUERY PLAN
! -----------------------------------------------------
   Hash Join
     Hash Cond: (r.cid = l.cid)
     ->  Seq Scan on credit_card r
           Filter: f_leak(cnum)
     ->  Hash
           ->  Seq Scan on customer l
!                Filter: (name = (CURRENT_USER)::text)
  (7 rows)

  SELECT * FROM my_credit_card_secure WHERE f_leak(cnum);
--- 1409,1423 ----
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_normal WHERE f_leak(cnum);
!                  QUERY PLAN
! ---------------------------------------------
   Hash Join
     Hash Cond: (r.cid = l.cid)
     ->  Seq Scan on credit_card r
           Filter: f_leak(cnum)
     ->  Hash
           ->  Seq Scan on customer l
!                Filter: (name = CURRENT_USER)
  (7 rows)

  SELECT * FROM my_credit_card_secure WHERE f_leak(cnum);
*************** NOTICE:  f_leak => 1111-2222-3333-4444
*** 1428,1435 ****
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_secure WHERE f_leak(cnum);
!                         QUERY PLAN
! -----------------------------------------------------------
   Subquery Scan on my_credit_card_secure
     Filter: f_leak(my_credit_card_secure.cnum)
     ->  Hash Join
--- 1428,1435 ----
  (1 row)

  EXPLAIN (COSTS OFF) SELECT * FROM my_credit_card_secure WHERE f_leak(cnum);
!                     QUERY PLAN
! ---------------------------------------------------
   Subquery Scan on my_credit_card_secure
     Filter: f_leak(my_credit_card_secure.cnum)
     ->  Hash Join
*************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre
*** 1437,1443 ****
           ->  Seq Scan on credit_card r
           ->  Hash
                 ->  Seq Scan on customer l
!                      Filter: (name = (CURRENT_USER)::text)
  (8 rows)

  --
--- 1437,1443 ----
           ->  Seq Scan on credit_card r
           ->  Hash
                 ->  Seq Scan on customer l
!                      Filter: (name = CURRENT_USER)
  (8 rows)

  --
*************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre
*** 1471,1477 ****
                       ->  Seq Scan on credit_card r_1
                       ->  Hash
                             ->  Seq Scan on customer l_1
!                                  Filter: (name = (CURRENT_USER)::text)
  (13 rows)

  SELECT * FROM my_credit_card_usage_secure
--- 1471,1477 ----
                       ->  Seq Scan on credit_card r_1
                       ->  Hash
                             ->  Seq Scan on customer l_1
!                                  Filter: (name = CURRENT_USER)
  (13 rows)

  SELECT * FROM my_credit_card_usage_secure
*************** EXPLAIN (COSTS OFF) SELECT * FROM my_cre
*** 1502,1508 ****
                       ->  Seq Scan on credit_card r_1
                       ->  Hash
                             ->  Seq Scan on customer l
!                                  Filter: (name = (CURRENT_USER)::text)
  (13 rows)

  --
--- 1502,1508 ----
                       ->  Seq Scan on credit_card r_1
                       ->  Hash
                             ->  Seq Scan on customer l
!                                  Filter: (name = CURRENT_USER)
  (13 rows)

  --
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 8df3693..34c8f68 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*************** eval_const_expressions_mutator(Node *nod
*** 3699,3704 ****
--- 3699,3761 ----
                  newbtest->location = btest->location;
                  return (Node *) newbtest;
              }
+         case T_CoerceToDomain:
+             {
+                 /*
+                  * If the domain currently has no constraints, we replace the
+                  * CoerceToDomain node with a simple RelabelType, which is
+                  * both far faster to execute and more amenable to later
+                  * optimization.  (XXX this would need to be backed up with
+                  * some plan invalidation support, but leave that for later.)
+                  *
+                  * Also, in estimation mode, just drop the CoerceToDomain
+                  * node, effectively assuming that the coercion will succeed.
+                  */
+                 CoerceToDomain *cdomain = (CoerceToDomain *) node;
+                 CoerceToDomain *newcdomain;
+                 Node       *arg;
+
+                 arg = eval_const_expressions_mutator((Node *) cdomain->arg,
+                                                      context);
+                 if (context->estimate ||
+                     !DomainHasConstraints(cdomain->resulttype))
+                 {
+                     /* This bit should match the RelabelType logic above */
+                     while (arg && IsA(arg, RelabelType))
+                         arg = (Node *) ((RelabelType *) arg)->arg;
+
+                     if (arg && IsA(arg, Const))
+                     {
+                         Const       *con = (Const *) arg;
+
+                         con->consttype = cdomain->resulttype;
+                         con->consttypmod = cdomain->resulttypmod;
+                         con->constcollid = cdomain->resultcollid;
+                         return (Node *) con;
+                     }
+                     else
+                     {
+                         RelabelType *newrelabel = makeNode(RelabelType);
+
+                         newrelabel->arg = (Expr *) arg;
+                         newrelabel->resulttype = cdomain->resulttype;
+                         newrelabel->resulttypmod = cdomain->resulttypmod;
+                         newrelabel->resultcollid = cdomain->resultcollid;
+                         newrelabel->relabelformat = cdomain->coercionformat;
+                         newrelabel->location = cdomain->location;
+                         return (Node *) newrelabel;
+                     }
+                 }
+
+                 newcdomain = makeNode(CoerceToDomain);
+                 newcdomain->arg = (Expr *) arg;
+                 newcdomain->resulttype = cdomain->resulttype;
+                 newcdomain->resulttypmod = cdomain->resulttypmod;
+                 newcdomain->resultcollid = cdomain->resultcollid;
+                 newcdomain->coercionformat = cdomain->coercionformat;
+                 newcdomain->location = cdomain->location;
+                 return (Node *) newcdomain;
+             }
          case T_PlaceHolderVar:

              /*
*************** eval_const_expressions_mutator(Node *nod
*** 3770,3776 ****
       * For any node type not handled above, copy the node unchanged but
       * const-simplify its subexpressions.  This is the correct thing for node
       * types whose behavior might change between planning and execution, such
!      * as CoerceToDomain.  It's also a safe default for new node types not
       * known to this routine.
       */
      return ece_generic_processing(node);
--- 3827,3833 ----
       * For any node type not handled above, copy the node unchanged but
       * const-simplify its subexpressions.  This is the correct thing for node
       * types whose behavior might change between planning and execution, such
!      * as CurrentOfExpr.  It's also a safe default for new node types not
       * known to this routine.
       */
      return ece_generic_processing(node);
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f46415..c490364 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
***************
*** 35,40 ****
--- 35,41 ----
  #include "optimizer/var.h"
  #include "utils/builtins.h"
  #include "utils/bytea.h"
+ #include "utils/fmgroids.h"
  #include "utils/lsyscache.h"
  #include "utils/pg_locale.h"
  #include "utils/selfuncs.h"
***************
*** 44,49 ****
--- 45,53 ----
  #define IndexCollMatchesExprColl(idxcollation, exprcollation) \
      ((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))

+ /* XXX this ought to be available from fmgroids.h, but it isn't */
+ #define F_NAME_VARCHAR 1401
+
  /* Whether we are looking for plain indexscan, bitmap scan, or either */
  typedef enum
  {
*************** static bool ec_member_matches_indexcol(P
*** 172,177 ****
--- 176,189 ----
                             void *arg);
  static bool match_boolean_index_clause(Node *clause, int indexcol,
                             IndexOptInfo *index);
+ static bool match_name_index_clause(Node *leftop,
+                         Node *rightop,
+                         Relids left_relids,
+                         Relids right_relids,
+                         Oid expr_op,
+                         bool plain_op,
+                         int indexcol,
+                         IndexOptInfo *index);
  static bool match_special_index_operator(Expr *clause,
                               Oid opfamily, Oid idxcollation,
                               bool indexkey_on_left);
*************** match_clause_to_indexcol(IndexOptInfo *i
*** 2474,2479 ****
--- 2486,2504 ----
          return false;
      }

+     /*
+      * If we didn't match above, perhaps it is a textual operator that we
+      * could convert to a name operator.
+      */
+     if (opfamily == NAME_BTREE_FAM_OID)
+     {
+         if (match_name_index_clause(leftop, rightop,
+                                     left_relids, right_relids,
+                                     expr_op, plain_op,
+                                     indexcol, index))
+             return true;
+     }
+
      return false;
  }

*************** match_index_to_operand(Node *operand,
*** 3331,3337 ****
   * because constant simplification does the reverse transformation;
   * without this code there'd be no way to use such an index at all.)
   *
!  * Three routines are provided here:
   *
   * match_special_index_operator() is just an auxiliary function for
   * match_clause_to_indexcol(); after the latter fails to recognize a
--- 3356,3372 ----
   * because constant simplification does the reverse transformation;
   * without this code there'd be no way to use such an index at all.)
   *
!  * Another thing we do here is to support indexing of textual comparisons to
!  * "name" columns, for instance "WHERE pg_class.relname::text = something".
!  * We can handle that by replacing the text operator with a name operator.
!  *
!  * Because this infrastructure was designed for the lossy-operator case,
!  * we'll end up applying the original operator as a qpqual condition even
!  * though it is redundant in the "boolean" and "name" cases.  This is
!  * annoying, but such cases don't come up often enough to justify refactoring
!  * to avoid it (at least, not yet).
!  *
!  * Four routines are provided here:
   *
   * match_special_index_operator() is just an auxiliary function for
   * match_clause_to_indexcol(); after the latter fails to recognize a
*************** match_index_to_operand(Node *operand,
*** 3342,3353 ****
   * match_boolean_index_clause() similarly detects clauses that can be
   * converted into boolean equality operators.
   *
   * expand_indexqual_conditions() converts a list of RestrictInfo nodes
   * (with implicit AND semantics across list elements) into a list of clauses
   * that the executor can actually handle.  For operators that are members of
   * the index's opfamily this transformation is a no-op, but clauses recognized
!  * by match_special_index_operator() or match_boolean_index_clause() must be
!  * converted into one or more "regular" indexqual conditions.
   */

  /*
--- 3377,3391 ----
   * match_boolean_index_clause() similarly detects clauses that can be
   * converted into boolean equality operators.
   *
+  * match_name_index_clause() similarly detects clauses that can be
+  * converted into name comparison operators.
+  *
   * expand_indexqual_conditions() converts a list of RestrictInfo nodes
   * (with implicit AND semantics across list elements) into a list of clauses
   * that the executor can actually handle.  For operators that are members of
   * the index's opfamily this transformation is a no-op, but clauses recognized
!  * by the above routines must be converted into one or more "regular"
!  * indexqual conditions.
   */

  /*
*************** match_boolean_index_clause(Node *clause,
*** 3393,3398 ****
--- 3431,3523 ----
  }

  /*
+  * extract_name_from_text_coercion
+  *      Helper function for match_name_index_clause.
+  *
+  * If the given operand represents coercion of a "name" value to "text"
+  * or "varchar", return the "name" subexpression.  Else return NULL.
+  */
+ static Node *
+ extract_name_from_text_coercion(Node *operand)
+ {
+     /* Ignore any RelabelType node above the operand */
+     if (operand && IsA(operand, RelabelType))
+         operand = (Node *) ((RelabelType *) operand)->arg;
+
+     /* Match if it's a call to name_text(), in either of its guises */
+     if (operand && IsA(operand, FuncExpr) &&
+         (((FuncExpr *) operand)->funcid == F_NAME_TEXT ||
+          ((FuncExpr *) operand)->funcid == F_NAME_VARCHAR))
+     {
+         Assert(list_length(((FuncExpr *) operand)->args) == 1);
+         return (Node *) linitial(((FuncExpr *) operand)->args);
+     }
+
+     return NULL;
+ }
+
+ /*
+  * match_name_index_clause
+  *      Recognize restriction clauses that can be matched to a "name" index.
+  *
+  * This should be called only when the index's operator family is
+  * NAME_BTREE_FAM_OID.  We check to see if the clause is a textual comparison
+  * clause that can be converted into a name comparison matching the index.
+  * Since this is called after match_clause_to_indexcol has disassembled the
+  * clause, we rely on that work rather than re-disassembling the clause.
+  *
+  * Converting texteq is safe because text and name have the same notion of
+  * equality, viz bitwise.  If we ever support collations in which text
+  * equality is not bitwise, we'd need to reject texteq clauses using such
+  * collations.  We cannot convert textual inequality clauses because they
+  * might use a different sort order.  (XXX: maybe convert if C locale?)
+  */
+ static bool
+ match_name_index_clause(Node *leftop,
+                         Node *rightop,
+                         Relids left_relids,
+                         Relids right_relids,
+                         Oid expr_op,
+                         bool plain_op,
+                         int indexcol,
+                         IndexOptInfo *index)
+ {
+     Index        index_relid = index->rel->relid;
+     Node       *nameop;
+
+     /*
+      * For the moment, only consider text equality.
+      */
+     if (expr_op != TextEqualOperator)
+         return false;
+
+     /* We don't support ScalarArrayOp either */
+     if (!plain_op)
+         return false;
+
+     /*
+      * Check to see if we have (indexkey::text = constant), following the same
+      * rules as in match_clause_to_indexcol about what is a constant.
+      */
+     if ((nameop = extract_name_from_text_coercion(leftop)) != NULL &&
+         match_index_to_operand(nameop, indexcol, index) &&
+         !bms_is_member(index_relid, right_relids) &&
+         !contain_volatile_functions(rightop))
+         return true;
+
+     /*
+      * Also try (constant = indexkey::text).
+      */
+     if ((nameop = extract_name_from_text_coercion(rightop)) != NULL &&
+         match_index_to_operand(nameop, indexcol, index) &&
+         !bms_is_member(index_relid, left_relids) &&
+         !contain_volatile_functions(leftop))
+         return true;
+
+     return false;
+ }
+
+ /*
   * match_special_index_operator
   *      Recognize restriction clauses that can be used to generate
   *      additional indexscanable qualifications.
*************** expand_boolean_index_clause(Node *clause
*** 3732,3738 ****
   * The input is a single RestrictInfo, the output a list of RestrictInfos.
   *
   * In the base case this is just list_make1(), but we have to be prepared to
!  * expand special cases that were accepted by match_special_index_operator().
   */
  static List *
  expand_indexqual_opclause(RestrictInfo *rinfo, Oid opfamily, Oid idxcollation)
--- 3857,3864 ----
   * The input is a single RestrictInfo, the output a list of RestrictInfos.
   *
   * In the base case this is just list_make1(), but we have to be prepared to
!  * expand special cases that were accepted by match_special_index_operator()
!  * or match_name_index_clause().
   */
  static List *
  expand_indexqual_opclause(RestrictInfo *rinfo, Oid opfamily, Oid idxcollation)
*************** expand_indexqual_opclause(RestrictInfo *
*** 3814,3819 ****
--- 3940,3990 ----
                                              patt->constvalue);
              }
              break;
+
+             /*
+              * This is in lots of opclasses, but we should only change it if
+              * it was matched to a "name" index.
+              */
+         case TextEqualOperator:
+             if (opfamily == NAME_BTREE_FAM_OID)
+             {
+                 Node       *nameop;
+                 bool        leftisname = false;
+                 bool        rightisname = false;
+                 Oid            new_op;
+                 Expr       *opexpr;
+
+                 /* Drop any coercion to text on either side */
+                 if ((nameop = extract_name_from_text_coercion(leftop)) != NULL)
+                 {
+                     leftop = nameop;
+                     leftisname = true;
+                 }
+                 if ((nameop = extract_name_from_text_coercion(rightop)) != NULL)
+                 {
+                     rightop = nameop;
+                     rightisname = true;
+                 }
+                 /* Choose new operator, construct replacement clause */
+                 if (leftisname)
+                 {
+                     if (rightisname)
+                         new_op = NameEqualOperator;
+                     else
+                         new_op = NameEqualTextOperator;
+                 }
+                 else
+                 {
+                     /* Should not be here unless one side was coerced */
+                     Assert(rightisname);
+                     new_op = TextEqualNameOperator;
+                 }
+                 opexpr = make_opclause(new_op, BOOLOID, false,
+                                        (Expr *) leftop, (Expr *) rightop,
+                                        InvalidOid, InvalidOid);
+                 return list_make1(make_simple_restrictinfo(opexpr));
+             }
+             break;
      }

      /* Default case: just make a list of the unmodified indexqual */

Re: slow queries over information schema.tables

From
Andres Freund
Date
Hi,

On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
> The core of the problem, I think, is that we're unable to convert the
> condition on table_name into an indexscan on pg_class.relname, because
> the view has cast pg_class.relname to the sql_identifier domain.
>
> There are two different issues in that.  One is that the domain might
> have constraints (though in reality it does not), so the planner can't
> throw away the CoerceToDomain node, and thus can't match the expression
> to the index.  Even if we did throw away the CoerceToDomain, it still
> would not work because the domain is declared to be over varchar, and
> so there's a cast-to-varchar underneath the CoerceToDomain.

Couldn't we make expression simplification replace CoerceToDomain with a
RelabelType if the constraint is simple enough?  That's not entirely
trivial because we'd have to look into the typecache to get the
constraints, but that doesn't sound too bad.


> The most straightforward way to fix that would be to add some cross-type
> comparison operators to the name_ops operator family.

That seems reasonable.


> While we only
> directly need 'name = text' to make this case work, the opr_sanity
> regression test will complain if we don't supply a fairly complete set of
> operators, and I think not without reason.  So I experimented with doing
> that, as in the very-much-WIP 0001 patch below.  A name index can only
> cope with strcmp-style comparison semantics, not strcoll-style, so while
> it's okay to call the equality operator = I felt we'd better call the
> inequality operators ~<~ and so on.  While the patch as presented fixes
> the case shown above, it's not all good: the problem with having a new
> 'text = name' operator is that it will also capture cases where you would
> like to have a text index working with a comparison value of type "name".
> If 'text = name' isn't part of the text_ops opclass then that doesn't
> work.  I think that the reason for the join.sql regression test failure
> shown in patch 0001 is likewise that since 'text = name' isn't part of the
> text_ops opclass, the join elimination stuff is unable to prove that it
> can eliminate a join to a unique text column.  This could probably be
> fixed by creating yet another dozen cross-type operators that implement
> text vs name comparisons using strcoll semantics (hence, using the normal
> '<' operator names), and adding that set to the text_ops opfamily.
> I didn't do that here (yet), because it seems pretty tedious.

Is there a way we could make that less laborious by allowing a bit more
casting?


> 0002 is a really preliminary POC for eliminating CoerceToDomain nodes
> at plan time if the domain has no constraints to check.  While this is
> enough to check the effects on plan selection, it's certainly not
> committable as-is, because the resulting plan is broken if someone then
> adds a constraint to the domain.  (There is a regression test failure,
> not shown in 0002, for a case that tests exactly that scenario.)

Hah.


> What we would have to do to make 0002 committable is to add sinval
> signaling to invalidate any cached plan that's dependent on an
> assumption of no constraints on a domain.  This is something that
> we probably want anyway, since it would be necessary if we ever want
> to compile domain constraint expressions as part of the plan tree
> rather than leaving them to run-time.

Yea, that seems good.  I don't like the fact that expression
initialization does the lookups for constraints right now.


> While the sinval additions per se would be fairly straightforward,
> I wonder whether anyone is likely to complain about the race conditions
> that will ensue from not taking any locks associated with the domain
> type; i.e. it's possible that a query would execute with slightly
> out-of-date assumptions about what constraints a domain has.  I suspect
> that we have race conditions like that already, but they might be worse
> if we inspect the domain during planning rather than at executor
> startup.  Is anyone worried enough about that to want to add locking
> overhead to domain usage?  (I'm not.)

I'm not either.


> 0001+0002 still don't get the job done: now we strip off the
> CoerceToDomain successfully, but we still have "relname::varchar"
> being compared to the comparison value, so we still can't match
> that to the index.  0003 is a somewhat klugy solution to that part,
> allowing indxpath.c to construct a name equality test from a texteq
> restriction condition.  (This is semantically OK because equality
> is equality in both name and text domains.  We could not convert
> inequalities, at least not as freely; maybe it could be done in
> C locale, but I've not done anything about that here.)
>
> With all three patches in place, we get
>
> regression=# explain select * from pg_class where relname::information_schema.sql_identifier = 'foo'::text;
>                                          QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.30 rows=7 width=781)
>    Index Cond: (relname = 'foo'::text)
>    Filter: ((relname)::text = 'foo'::text)
> (3 rows)
>
> so we've fixed the lack of indexscan but we're still a bit off about the
> statistics.  I don't have any immediate proposal about how to fix the
> latter, but anyway this is enough to make Pavel's case a lot better.


> 0003 essentially converts "namecol::text texteq textvalue" into
> "namecol nameeqtext textvalue", relying on the new equality
> operator introduced by 0001.

Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
usable outside of one odd builtin type. I was wondering for a bit
whether we could have logic to move the cast to the other side of an
operator, but I don't see how we could make that generally safe.

Greetings,

Andres Freund


Re: slow queries over information schema.tables

From
Tom Lane
Date
Andres Freund <andres@anarazel.de> writes:
> On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
>> There are two different issues in that.  One is that the domain might
>> have constraints (though in reality it does not), so the planner can't
>> throw away the CoerceToDomain node, and thus can't match the expression
>> to the index.  Even if we did throw away the CoerceToDomain, it still
>> would not work because the domain is declared to be over varchar, and
>> so there's a cast-to-varchar underneath the CoerceToDomain.

> Couldn't we make expression simplification replace CoerceToDomain with a
> RelabelType if the constraint is simple enough?  That's not entirely
> trivial because we'd have to look into the typecache to get the
> constraints, but that doesn't sound too bad.

Not following what you have in mind here?  My 0002 throws away the
CoerceToDomain if there are *no* constraints, but I can't see any
situation in which we'd likely be able to ignore a constraint,
simple or not.

>> 0003 essentially converts "namecol::text texteq textvalue" into
>> "namecol nameeqtext textvalue", relying on the new equality
>> operator introduced by 0001.

> Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
> usable outside of one odd builtin type. I was wondering for a bit
> whether we could have logic to move the cast to the other side of an
> operator, but I don't see how we could make that generally safe.

Yeah.  It seems like it could be a special case of a more general
expression transform facility, but we have no such facility now.

On the other hand, all of match_special_index_operator is an ugly
single-purpose kluge already, so I'm not feeling that awful about
throwing another special case into it.  Someday it would be nice
to replace that code with something more general and extensible,
but today is not that day as far as I'm concerned.

            regards, tom lane


Re: slow queries over information schema.tables

From
Andres Freund
Date
Hi,

On 2018-12-05 13:22:23 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
> >> There are two different issues in that.  One is that the domain might
> >> have constraints (though in reality it does not), so the planner can't
> >> throw away the CoerceToDomain node, and thus can't match the expression
> >> to the index.  Even if we did throw away the CoerceToDomain, it still
> >> would not work because the domain is declared to be over varchar, and
> >> so there's a cast-to-varchar underneath the CoerceToDomain.
> 
> > Couldn't we make expression simplification replace CoerceToDomain with a
> > RelabelType if the constraint is simple enough?  That's not entirely
> > trivial because we'd have to look into the typecache to get the
> > constraints, but that doesn't sound too bad.
> 
> Not following what you have in mind here?  My 0002 throws away the
> CoerceToDomain if there are *no* constraints, but I can't see any
> situation in which we'd likely be able to ignore a constraint,
> simple or not.

Yea, simple probably means nonexistent for now. We could e.g. optimize
some NOT NULL checks away, but it's probably not worth it.


> >> 0003 essentially converts "namecol::text texteq textvalue" into
> >> "namecol nameeqtext textvalue", relying on the new equality
> >> operator introduced by 0001.
> 
> > Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
> > usable outside of one odd builtin type. I was wondering for a bit
> > whether we could have logic to move the cast to the other side of an
> > operator, but I don't see how we could make that generally safe.
> 
> Yeah.  It seems like it could be a special case of a more general
> expression transform facility, but we have no such facility now.
> 
> On the other hand, all of match_special_index_operator is an ugly
> single-purpose kluge already, so I'm not feeling that awful about
> throwing another special case into it.  Someday it would be nice
> to replace that code with something more general and extensible,
> but today is not that day as far as I'm concerned.

Fair enough.

Greetings,

Andres Freund


Re: slow queries over information schema.tables

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-12-05 13:22:23 -0500, Tom Lane wrote:
>> Not following what you have in mind here?  My 0002 throws away the
>> CoerceToDomain if there are *no* constraints, but I can't see any
>> situation in which we'd likely be able to ignore a constraint,
>> simple or not.

> Yea, simple probably means nonexistent for now. We could e.g. optimize
> some NOT NULL checks away, but it's probably not worth it.

Ah, yes, that is a case where we might have enough knowledge to prove
the test redundant --- but considering that we explicitly discourage
domain NOT NULL as bad style and not fully supported, I can't get
excited about it.  I suppose in some cases we might be able to use
predtest.c to prove domain CHECK constraints redundant, but I bet that
it's not worth the trouble.

The bigger picture here is that people seem to like to use domains
as simple type aliases, which will never have any constraints, but
we're very dumb about that today.  So the patch as presented seems
like it has lots of potential applicability, as long as we fix the
invalidation aspect.

            regards, tom lane


Re: slow queries over information schema.tables

From
Andres Freund
Date:
On 2018-12-05 13:41:32 -0500, Tom Lane wrote:
> The bigger picture here is that people seem to like to use domains
> as simple type aliases, which will never have any constraints, but
> we're very dumb about that today.  So the patch as presented seems
> like it has lots of potential applicability, as long as we fix the
> invalidation aspect.

Entirely agreed.


Re: slow queries over information schema.tables

From
Robert Haas
Date:
On Wed, Dec 5, 2018 at 1:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah, yes, that is a case where we might have enough knowledge to prove
> the test redundant --- but considering that we explicitly discourage
> domain NOT NULL as bad style and not fully supported, I can't get
> excited about it.  I suppose in some cases we might be able to use
> predtest.c to prove domain CHECK constraints redundant, but I bet that
> it's not worth the trouble.
>
> The bigger picture here is that people seem to like to use domains
> as simple type aliases, which will never have any constraints, but
> we're very dumb about that today.  So the patch as presented seems
> like it has lots of potential applicability, as long as we fix the
> invalidation aspect.

I'm not thrilled about depending on sinval without locking,
particularly given that my proposal to make sure we
AcceptInvalidationMessages() at least once per query was shouted down.
That means that in corner cases, you could execute many queries
without flushing the old cache entries.  However, I do agree that
locking every type, function, operator, etc. involved in the query is
impractical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: slow queries over information schema.tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm not thrilled about depending on sinval without locking,
> particularly given that my proposal to make sure we
> AcceptInvalidationMessages() at least once per query was shouted down.

It's fairly hard to imagine practical cases where we'd not call
AcceptInvalidationMessages at least once per query, so I'm not
very sure what you're on about.

            regards, tom lane


Re: slow queries over information schema.tables

From
Robert Haas
Date:
On Thu, Dec 6, 2018 at 11:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I'm not thrilled about depending on sinval without locking,
> > particularly given that my proposal to make sure we
> > AcceptInvalidationMessages() at least once per query was shouted down.
>
> It's fairly hard to imagine practical cases where we'd not call
> AcceptInvalidationMessages at least once per query, so I'm not
> very sure what you're on about.

Unless I'm confused, it happens any time you run a query that only
touches tables using lockmodes previously acquired by the current
transaction.  Like:

BEGIN;
some query;
the same query again;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: slow queries over information schema.tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 6, 2018 at 11:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's fairly hard to imagine practical cases where we'd not call
>> AcceptInvalidationMessages at least once per query, so I'm not
>> very sure what you're on about.

> Unless I'm confused, it happens any time you run a query that only
> touches tables using lockmodes previously acquired by the current
> transaction.  Like:

> BEGIN;
> some query;
> the same query again;

In my testing, that still hits AcceptInvalidationMessages() (AIM)
during parserOpenTable().

[ further experimentation... ]  It looks like if you prepare
a query and then just execute it repeatedly in one transaction,
you'd not reach AIM (as long as you were getting generic plans).
Possibly that's a gap worth closing.
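
[ A session sketch of that scenario, for illustration (hypothetical;
  whether the cached generic plan is actually reused depends on the
  plan cache's costing choice): ]

```sql
-- Once a generic plan is being reused, execution opens the relations
-- with locks already held in this transaction, so nothing on the
-- path may call AcceptInvalidationMessages().
PREPARE q AS SELECT count(*) FROM pg_class;

BEGIN;
EXECUTE q;   -- planning / first execution can reach AcceptInvalidationMessages()
EXECUTE q;   -- reuse of a cached generic plan may skip it entirely
COMMIT;
```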

            regards, tom lane


Re: slow queries over information schema.tables

From
Robert Haas
Date:
On Thu, Dec 6, 2018 at 12:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In my testing, that still hits AIM() during parserOpenTable().

Oh, I see.  relation_openrv_extended() calls it.

> [ further experimentation... ]  It looks like if you prepare
> a query and then just execute it repeatedly in one transaction,
> you'd not reach AIM (as long as you were getting generic plans).
> Possibly that's a gap worth closing.

If we called it at the start of every query, couldn't we dispense with
the call in relation_openrv_extended()?  On net, that would actually
mean fewer calls to AcceptInvalidationMessages(), assuming you
sometimes run queries that touch multiple relations.  And the behavior
would be more predictable, too, because you'd (hopefully) have no
cases where a command failed to see the results of DDL that committed
before the command was issued.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: slow queries over information schema.tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 6, 2018 at 12:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> [ further experimentation... ]  It looks like if you prepare
>> a query and then just execute it repeatedly in one transaction,
>> you'd not reach AIM (as long as you were getting generic plans).
>> Possibly that's a gap worth closing.

> If we called it at the start of every query, couldn't we dispense with
> the call in relation_openrv_extended()?

No.  You need to do AIM *after* obtaining the lock, else you still
have the race condition that you can execute a query on a table
without being aware of recent DDL on it.

What we could possibly do to close the gap cited above is to have
plancache.c's CheckCachedPlan force an AIM call if it notices that
the plan it wants to re-use has a non-empty PlanInvalItems list.
If it does not, then there is nothing that AIM would cause invalidation
for anyway.  This still leaves us with a query-duration-sized race
condition window for DDL on functions and domains, but not any larger
than that.

            regards, tom lane


Re: slow queries over information schema.tables

From
Robert Haas
Date:
On Thu, Dec 6, 2018 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > If we called it at the start of every query, couldn't we dispense with
> > the call in relation_openrv_extended()?
>
> No.  You need to do AIM *after* obtaining the lock, else you still
> have the race condition that you can execute a query on a table
> without being aware of recent DDL on it.

Huh? The call in relation_openrv_extended() happens before acquiring the lock.

> What we could possibly do to close the gap cited above is to have
> plancache.c's CheckCachedPlan force an AIM call if it notices that
> the plan it wants to re-use has a non-empty PlanInvalItems list.
> If it does not, then there is nothing that AIM would cause invalidation
> for anyway.  This still leaves us with a query-duration-sized race
> condition window for DDL on functions and domains, but not any larger
> than that.

That would be a nice place to get.  Not perfect, but better than now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: slow queries over information schema.tables

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 6, 2018 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No.  You need to do AIM *after* obtaining the lock, else you still
>> have the race condition that you can execute a query on a table
>> without being aware of recent DDL on it.

> Huh? The call in relation_openrv_extended() happens before acquiring the lock.

Oh, I was looking at the wrong AIM call :-(.  You're talking about this
one:

    /*
     * Check for shared-cache-inval messages before trying to open the
     * relation.  This is needed even if we already hold a lock on the
     * relation, because GRANT/REVOKE are executed without taking any lock on
     * the target relation, and we want to be sure we see current ACL
     * information.  We can skip this if asked for NoLock, on the assumption
     * that such a call is not the first one in the current command, and so we
     * should be reasonably up-to-date already.  (XXX this all could stand to
     * be redesigned, but for the moment we'll keep doing this like it's been
     * done historically.)
     */
    if (lockmode != NoLock)
        AcceptInvalidationMessages();

which is indeed about as bletcherous as it could possibly be.

The ideal thing would be for GRANT/REVOKE to act more like other DDL,
but I'm not sure how we get to that point: REVOKE SELECT, at the very
least, would have to take AccessExclusiveLock so that it'd block SELECT.
People doing things like GRANT/REVOKE ON ALL TABLES would doubtless
complain about the greatly increased risk of deadlock from taking a
pile of AELs.  And that still would only fix the problem for table
privileges, not privileges on other object types that we have no
consistent locking scheme for.

I can see the attraction of replacing these AIM calls (and, probably,
the one in AtStart_Cache) with a single call that occurs somewhere
during statement startup.  Not sure exactly where that should be;
but there is certainly not anything principled about doing it in
relation_open_xxx().

            regards, tom lane


Re: slow queries over information schema.tables

From
Tom Lane
Date:
I wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> Slow query
>> select * from information_schema.tables where table_name = 'pg_class';

> Yeah.  This has been complained of many times before.

> The core of the problem, I think, is that we're unable to convert the
> condition on table_name into an indexscan on pg_class.relname, because
> the view has cast pg_class.relname to the sql_identifier domain.

> There are two different issues in that.  One is that the domain might
> have constraints (though in reality it does not), so the planner can't
> throw away the CoerceToDomain node, and thus can't match the expression
> to the index.  Even if we did throw away the CoerceToDomain, it still
> would not work because the domain is declared to be over varchar, and
> so there's a cast-to-varchar underneath the CoerceToDomain.

After my last few commits, the only issue that's left here is the
cast-to-varchar implied by casting to sql_identifier.  Upthread
I showed a possible planner hack to get rid of that, and we could
still solve it that way so far as allowing indexscans on catalogs
is concerned.  However, I wonder what people would think of a
more aggressive approach, viz:

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 0fbcfa8..3891e3b 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -216,7 +216,7 @@ CREATE DOMAIN character_data AS character varying COLLATE "C";
  * SQL_IDENTIFIER domain
  */

-CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
+CREATE DOMAIN sql_identifier AS name;



I've not checked to verify that sql_identifier is used for all and only
those view columns that expose "name" catalog columns.  If the SQL
committee was sloppy about that, this idea might not work.  But assuming
that the length restriction is valid for the columns that have this
type, would this be an OK idea?  It does seem to fix the poor-plan-quality
problem at a stroke, with no weird planner hacks.

What I find in the SQL spec is

         5.5  SQL_IDENTIFIER domain

         Function

         Define a domain that contains all valid <identifier body>s and
         <delimited identifier body>s.

         Definition

         CREATE DOMAIN SQL_IDENTIFIER AS
             CHARACTER VARYING (L)
             CHARACTER SET SQL_IDENTIFIER;

         GRANT USAGE ON DOMAIN SQL_IDENTIFIER
             TO PUBLIC WITH GRANT OPTION;

         Description

         1) This domain specifies all variable-length character values that
            conform to the rules for formation and representation of an SQL
            <identifier body> or an SQL <delimited identifier body>.

            NOTE 4 - There is no way in SQL to specify a <domain
            constraint> that would be true for the body of any valid SQL
            <regular identifier> or <delimited identifier> and false for all
            other character string values.

         2) L is the implementation-defined maximum length of <identifier
            body> and <delimited identifier body>.

So we'd be violating the part of the spec that says that the domain's
base type is varchar, but considering all the other requirements here
that we're blithely ignoring, maybe that's not such a sin.  With the
recent collation changes, type name is hard to functionally distinguish
from a domain over varchar anyway.  Furthermore, since name's length limit
corresponds to the "implementation-defined maximum length" part of the
spec, you could argue that in some ways this definition is closer to the
spec than what we've got now.

Thoughts?

            regards, tom lane


Re: slow queries over information schema.tables

From
Pavel Stehule
Date:


On Thu, Dec 20, 2018 at 0:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> After my last few commits, the only issue that's left here is the
> cast-to-varchar implied by casting to sql_identifier.  Upthread
> I showed a possible planner hack to get rid of that, and we could
> still solve it that way so far as allowing indexscans on catalogs
> is concerned.  However, I wonder what people would think of a
> more aggressive approach, viz:
>
> -CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
> +CREATE DOMAIN sql_identifier AS name;
>
> [...]
>
> Thoughts?

A very common case will be comparison against the text type - something like

SELECT * FROM information_schema.tables WHERE table_name = lower('somename');




Re: slow queries over information schema.tables

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 20. 12. 2018 v 0:14 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> After my last few commits, the only issue that's left here is the
>> cast-to-varchar implied by casting to sql_identifier.  Upthread
>> I showed a possible planner hack to get rid of that, and we could
>> still solve it that way so far as allowing indexscans on catalogs
>> is concerned.  However, I wonder what people would think of a
>> more aggressive approach, viz:
>> -CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
>> +CREATE DOMAIN sql_identifier AS name;

> The very common will be compare with text type - some like
> SELECT * FROM information_schema.tables WHERE table_name =
> lower('somename');

Yeah, that's not really an issue.  After applying the above one-liner
to HEAD, I get plans like this:

regression=# explain SELECT * FROM information_schema.tables WHERE table_name = lower('somename');
                                                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.76..18.60 rows=1 width=608)
   ->  Hash Join  (cost=8.34..10.07 rows=1 width=141)
         Hash Cond: (nc.oid = c.relnamespace)
         ->  Seq Scan on pg_namespace nc  (cost=0.00..1.62 rows=33 width=68)
               Filter: (NOT pg_is_other_temp_schema(oid))
         ->  Hash  (cost=8.33..8.33 rows=1 width=77)
               ->  Index Scan using pg_class_relname_nsp_index on pg_class c  (cost=0.28..8.33 rows=1 width=77)
                     Index Cond: ((relname)::name = 'somename'::text)
                     Filter: ((relkind = ANY ('{r,v,f,p}'::"char"[])) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
   ->  Nested Loop  (cost=0.42..8.46 rows=1 width=132)
         ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.28..8.29 rows=1 width=72)
               Index Cond: (c.reloftype = oid)
         ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.14..0.16 rows=1 width=68)
               Index Cond: (oid = t.typnamespace)
(14 rows)

You could surely argue about whether this is too complicated, but it's not
the planner's fault that we've got so many conditions here ...

            regards, tom lane


Re: slow queries over information schema.tables

From
Pavel Stehule
Date:


On Thu, Dec 20, 2018 at 5:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, that's not really an issue.  After applying the above one-liner
> to HEAD, I get plans like this:
>
> [ EXPLAIN output with an index scan on pg_class_relname_nsp_index ]
>
> You could surely argue about whether this is too complicated, but it's not
> the planner's fault that we've got so many conditions here ...

This plan looks great.

Pavel



Re: slow queries over information schema.tables

From
Tom Lane
Date:
I wrote:
> ... However, I wonder what people would think of a
> more aggressive approach, viz:
> -CREATE DOMAIN sql_identifier AS character varying COLLATE "C";
> +CREATE DOMAIN sql_identifier AS name;
>
> I've not checked to verify that sql_identifier is used for all and only
> those view columns that expose "name" catalog columns.  If the SQL
> committee was sloppy about that, this idea might not work.

I poked into this by instrumenting the parser to see what type conversions
it inserts into the information_schema views.  It appears that the vast
majority of conversions to sql_identifier are indeed on "name" columns,
but we have some occurrences of cases like this:

           CAST(a.attnum AS sql_identifier) AS dtd_identifier,

and some like this:

           CAST(p.proname || '_' || CAST(p.oid AS text) AS sql_identifier) AS specific_name

It doesn't look to me like converting to name rather than varchar would
have any real performance consequence in either case: certainly we're not
going to be able to propagate WHERE conditions on these view columns back
to any existing catalog index, regardless of the cast.  However, the
second case offers something else to worry about: what if the
concatenation yields a string longer than NAMEDATALEN?  As the code
stands, the view will simply return a string that's too long to be a name,
which arguably is a violation of SQL spec.  If we change sql_identifier
to be "name", the cast will silently truncate, which also arguably is a
violation of SQL spec, because I think specific_name is supposed to be
unique.  (The point of concatenating the function OID is to make it so.)

Perhaps we could fix this by truncating the p.proname part to ensure
that the concatenation result fits in NAMEDATALEN.  I'm not sure about
a good way to get a correct value of NAMEDATALEN into the
information_schema script, though.  Worse, I don't think we expose any
convenient way to truncate a string based on byte length rather than
character length (substr() does the latter).  So I think that a reasonable
way to tackle this might be to provide a C function along the lines of

    nameconcatoid(name, oid) returns name

which contracts to produce "$1 || '_' || $2" while truncating $1 only as
much as needed to make the result fit in NAMEDATALEN.
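
[ A rough standalone sketch of that contract, for illustration only
  (not backend code; the function name and NAMEDATALEN = 64 are taken
  from the proposal above, and a real implementation would also need
  to avoid truncating in the middle of a multibyte character): ]

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define NAMEDATALEN 64   /* default in a standard PostgreSQL build */

/*
 * Hypothetical sketch of nameconcatoid(name, oid): build "name_oid",
 * truncating the name part (by bytes) only as much as needed so the
 * whole result fits in NAMEDATALEN-1 bytes plus a NUL terminator.
 */
static void name_concat_oid(char *dst, const char *name, unsigned int oid)
{
    char suffix[16];
    int  suffixlen = snprintf(suffix, sizeof(suffix), "_%u", oid);
    int  maxname = NAMEDATALEN - 1 - suffixlen;
    int  namelen = (int) strlen(name);

    if (namelen > maxname)
        namelen = maxname;              /* truncate the name part only */
    memcpy(dst, name, namelen);
    memcpy(dst + namelen, suffix, suffixlen + 1);  /* copy trailing NUL too */
}
```

That way the OID suffix, which provides uniqueness, is never truncated.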

            regards, tom lane


Re: slow queries over information schema.tables

From
Greg Stark
Date:
Just brainstorming here. Another crazy idea would be to get rid of the
"name" data type, at least from the user-visible planner point of
view. It would probably still have to be stored as a fixed-length data
type like today, but with a one-byte length header. That would make it
possible for the planner to treat it as if it were just another varchar.