Discussion: LIKE indexing
Here's the patch for review. It adds the non-locale operator classes for all three character types and the analogous selectivity estimation changes. Basically, I'm confident this works, but as some people seem to have doubts I show it here first.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
I'm fairly close to committing a wholesale rearrangement of pg_opclass
and friends (per previous discussions, mostly with Oleg). This is going
to create some conflicts with your patch :-(. Who gets to go first?
regards, tom lane
Tom Lane writes:
> I'm fairly close to committing a wholesale rearrangement of pg_opclass
> and friends (per previous discussions, mostly with Oleg). This is going
> to create some conflicts with your patch :-(. Who gets to go first?
If your changes only conflict in the system catalog headers I can redo
those when you're done. (Presuming there's documentation coming along.)
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes:
> If your changes only conflict in the system catalog headers I can redo
> those when you're done. (Presuming there's documentation coming along.)
Actually, all those uses of op_class() are going to conflict too...
op_class doesn't need an AM OID parameter anymore (and besides which,
I renamed it to op_in_opclass).
If you feel ready to commit, do so, and I'll do the merge.
regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Here's the patch for review.
A few gripes:
+ The optimizer can also use a B-Tree index for queries involving the
+ pattern matching operators <literal>LIKE</>,
+ <literal>ILIKE</literal>, <literal>~</literal>, and
+ <literal>~*</literal>, <emphasis>if</emphasis> the pattern is
+ anchored to the beginning of the string, e.g., <literal>col LIKE
+ 'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
+ <literal>col LIKE 'bar'</literal>. However, if your server does
The "but not" part is wrong: col LIKE 'bar' works perfectly fine as
an indexable LIKE query. Perhaps you meant "but not col LIKE '%foo'".
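To make the distinction concrete, here is a minimal sketch of how a left-anchored LIKE pattern can be turned into a range condition that a B-tree can serve. It is not the code from the patch: the function names like_fixed_prefix and prefix_range are hypothetical, backslash is assumed to be the escape character, and strings are assumed to compare code point by code point (roughly the non-locale case).

```python
def like_fixed_prefix(pattern, escape="\\"):
    """Return (prefix, is_exact) for a LIKE pattern.

    prefix is the literal text before the first wildcard; is_exact is True
    when the pattern contains no wildcard at all (hypothetical helper).
    """
    prefix = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "%_":
            # First wildcard reached: everything before it is the fixed prefix.
            return "".join(prefix), False
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is taken literally.
            i += 1
            ch = pattern[i]
        prefix.append(ch)
        i += 1
    return "".join(prefix), True


def prefix_range(prefix):
    """Return (lower, upper) with lower <= s < upper for every s that starts
    with prefix, assuming plain code point comparison. Returns None when
    there is no usable prefix; upper is None when no upper bound exists."""
    if not prefix:
        return None
    stem = prefix
    while stem:
        code = ord(stem[-1])
        if code < 0x10FFFF:
            # Increment the last character that can still be incremented.
            return prefix, stem[:-1] + chr(code + 1)
        stem = stem[:-1]
    return prefix, None


for pat in ("foo%", "bar", "%foo"):
    prefix, exact = like_fixed_prefix(pat)
    print(pat, "->", repr(prefix), "exact" if exact else "wildcard",
          prefix_range(prefix))
```

Under these assumptions 'foo%' gives the range 'foo' <= col < 'fop', 'bar' has no wildcard and reduces to an equality probe, and '%foo' has an empty prefix and so gives the index nothing to work with.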
While it's okay to treat text and varchar alike, I object to treating
bpchar as equivalent to the other two. Shouldn't the bpchar versions of
these functions strip trailing spaces before comparing?
Seems to me you should provide "$<>$" operators for completeness, even
though they're not essential for btree opclasses. I think that these
operators may be useful for more than just this one purpose, so we
shouldn't set up artificial roadblocks.
I don't like the fact that you added expected-output rows to opr_sanity;
seems like tweaking the queries to allow $<$ etc as expected names would
be more appropriate.
regards, tom lane
Tom Lane writes:
> The "but not" part is wrong: col LIKE 'bar' works perfectly fine as
> an indexable LIKE query. Perhaps you meant "but not col LIKE '%foo'".
Thanks. That was a mixup with the POSIX regexp style.
> While it's okay to treat text and varchar alike, I object to treating
> bpchar as equivalent to the other two. Shouldn't the bpchar versions of
> these functions strip trailing spaces before comparing?
I had thought a long time about this and I couldn't see a reason why.
The reason is that the LIKE operator for bpchar does take the blanks into
account, so it effectively doesn't care whether the blanks are the result
of padding or explicit input. E.g.,
peter=# set enable_indexscan to off;
SET VARIABLE
peter=# create table test1 (a char(5));
CREATE
peter=# insert into test1 values ('four');
INSERT 16560 1
peter=# select * from test1 where a like 'four'::bpchar;
a
---
(0 rows)
/*
* If we had stripped spaces here we would have gotten a false positive.
*/
peter=# select * from test1 where a like 'fou_'::bpchar;
a
---
(0 rows)
/*
* Since the padding here is after the wildcard character and is thus
* stripped in the analysis, the augmented expression still holds.
*/
peter=# select * from test1 where a like 'fou%'::bpchar;
a
-------
four
(1 row)
/* same here */
I would also argue that the notion of a direct binary comparison would
not benefit from space stripping.
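The session above can be reproduced with a small model, sketched here in Python rather than taken from the server code. It assumes that a char(5) value is stored blank-padded to five characters and that LIKE treats those blanks as ordinary characters; pad_bpchar and like_match are hypothetical names, and escape characters are not handled.

```python
import re


def pad_bpchar(value, width):
    """Model of char(n) storage: pad with blanks up to the declared width."""
    return value.ljust(width)


def like_match(subject, pattern):
    """Model of LIKE in which blanks are significant.

    % matches any run of characters, _ matches exactly one character,
    everything else is literal. Escape handling is omitted in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), subject, flags=re.DOTALL) is not None


stored = pad_bpchar("four", 5)           # 'four ' as held in the char(5) column

print(like_match(stored, "four"))        # False: the trailing blank is unmatched
print(like_match(stored, "fou_"))        # False: _ consumes 'r', the blank remains
print(like_match(stored, "fou%"))        # True: % absorbs 'r' and the blank
print(like_match(stored.rstrip(" "), "four"))  # True once the padding is stripped
```

The last line shows what stripping would change: the first query would start returning the row, which is the false positive mentioned in the session comments.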
> Seems to me you should provide "$<>$" operators for completeness, even
> though they're not essential for btree opclasses.
Will do.
> I don't like the fact that you added expected-output rows to opr_sanity;
> seems like tweaking the queries to allow $<$ etc as expected names would
> be more appropriate.
Ok.
--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes:
> peter=# create table test1 (a char(5));
> CREATE
> peter=# insert into test1 values ('four');
> INSERT 16560 1
> peter=# select * from test1 where a like 'four'::bpchar;
> a
> ---
> (0 rows)
I think this is an erroneous result, actually, seeing as how
regression=# select 'four '::bpchar = 'four'::bpchar;
?column?
----------
t
(1 row)
How can A = B not imply A LIKE B? (This may be related to Hiroshi's
concerns.)
I dug into the spec to see what it has to say, and came up with this
rather opaque prose:
4) If the i-th substring specifier of PCV is neither an
arbitrary character specifier nor an arbitrary string
specifier, then the i-th substring of MCV is equal to
that substring specifier according to the collating
sequence of the <like predicate>, without the appending
of <space> characters to MCV, and has the same length as
that substring specifier.
The bit about "without the appending of <space> characters" *might*
mean that LIKE is always supposed to treat trailing blanks as
significant, but I'm not sure. The text does seem to say that it's okay
to add trailing blanks to the pattern to produce a match, when the
collating sequence is PAD SPACE type (bpchar in our terms).
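The tension can be written down as a short sketch. It models only one possible reading of the quoted rule, namely that a literal substring specifier must be equal under the collating sequence and also have the same length, with no blanks appended to the match value. The names pad_space_equal and like_literal_strict are hypothetical, and plain string equality stands in for the collating sequence.

```python
def pad_space_equal(a, b):
    """Equality under a PAD SPACE collation: the shorter operand is
    blank-padded to the length of the longer one before comparing."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def like_literal_strict(mcv, specifier):
    """One reading of the quoted rule for a wildcard-free pattern: the match
    value must equal the specifier without any blanks being appended to it,
    and must have the same length (an assumption, not a settled reading)."""
    return len(mcv) == len(specifier) and pad_space_equal(mcv, specifier)


a, b = "four ", "four"
print(pad_space_equal(a, b))           # True:  A = B under PAD SPACE
print(like_literal_strict(a, b))       # False: A LIKE B fails under this reading
print(like_literal_strict(a, "four ")) # True:  matches once the pattern carries the blank
```

Under this reading the two operators disagree on exactly the case shown in the psql output, and padding the pattern with a trailing blank is what brings them back into agreement.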
In any case, Hiroshi is dead right that LIKE is supposed to perform
collating-sequence-dependent comparison, and this probably means that
this whole approach is a dead end :-(
regards, tom lane