Обсуждение: starts_with, ^@ and index usage

Поиск
Список
Период
Сортировка

starts_with, ^@ and index usage

От
Shay Rojansky
Дата:
Greetings hackers,

I'm seeing some odd behavior around string prefix searching -
hopefully I've missed something here (thanks to Nino Floris for
originally flagging this).

In PostgreSQL 11, a starts_with function and a ^@ operators were added
for string prefix checking, as an alternative to LIKE 'foo%' [1] [2].
I've ran a few scenarios and have seen the following behavior:

Queries tested:

1. EXPLAIN SELECT * FROM data WHERE name LIKE 'foo10%';
2. EXPLAIN SELECT * FROM data WHERE name ^@ 'foo10';
3. EXPLAIN SELECT * FROM data WHERE starts_with(name, 'foo10');

... running against a table with 500k rows and enable_seqscan turned
off. Results:

Index  | Operator class   | LIKE 'X%'         | ^@                | starts_with
------ | ---------------- | ----------------- | ----------------- | -----------
btree  | text_ops         | Parallel seq scan | Parallel seq scan | Seq scan
btree  | text_pattern_ops | Index scan        | Parallel seq scan | Seq scan
spgist |                  | Index scan        | Index Scan        | Seq scan

First, starts_with doesn't seem to use SP-GIST indexes, contrary to
the patch description (and also doesn't trigger a parallel seq scan) -
is this intentional? The function is listed front-and-center on the
string functions and operators page[3], and receives mention on the
pattern matching page[4], without any mention of it being so
problematic.

Note that ^@ isn't documented on the string functions and operators,
so it's not very discoverable; if added to the docs, I'd recommend
adding a note on SP-GIST being required, since uninformed new users
would probably expect a default btree index to work as well.

Shay

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=710d90da1fd8c1d028215ecaf7402062079e99e9
[2]
https://www.postgresql.org/message-id/flat/03300255-cff2-b508-50f4-f00cca0a57a1%40sigaev.ru#38d2020edf92f96d204cd2679d362c38
[3] https://www.postgresql.org/docs/current/functions-string.html
[4] https://www.postgresql.org/docs/current/functions-matching.html



Re: starts_with, ^@ and index usage

От
Tom Lane
Дата:
Shay Rojansky <roji@roji.org> writes:
> In PostgreSQL 11, a starts_with function and a ^@ operators were added
> for string prefix checking, as an alternative to LIKE 'foo%' [1] [2].

> First, starts_with doesn't seem to use SP-GIST indexes, contrary to
> the patch description (and also doesn't trigger a parallel seq scan) -
> is this intentional? The function is listed front-and-center on the
> string functions and operators page[3], and receives mention on the
> pattern matching page[4], without any mention of it being so
> problematic.

It seems like it didn't occur to anybody to tie starts_with() into
the machinery for derived index operators.  That wouldn't be hard,
but it wasn't done.

Before (I think) v12, function invocations never could be converted
to indexquals anyway, so it's not surprising that a v11-era patch
wouldn't have thought it needed to address that point.

I do see that starts_with() is marked parallel safe, so it's not clear
why it wouldn't be amenable to a parallel seqscan.  The function (as
opposed to the operator) isn't tied into selectivity estimation either,
so maybe that has something to do with using a default selectivity
estimate for it?  But said estimate would almost always be too high,
which doesn't seem like the direction that would discourage parallelism.

> Note that ^@ isn't documented on the string functions and operators,

That's another oversight.

It seems clear that the original patch author was pretty narrowly focused
on use of the operator with SP-GIST, and didn't think about how it should
fit into the larger ecosystem.

            regards, tom lane



Re: starts_with, ^@ and index usage

От
Tom Lane
Дата:
I wrote:
> Shay Rojansky <roji@roji.org> writes:
>> First, starts_with doesn't seem to use SP-GIST indexes, contrary to
>> the patch description (and also doesn't trigger a parallel seq scan) -
>> is this intentional?

> It seems like it didn't occur to anybody to tie starts_with() into
> the machinery for derived index operators.  That wouldn't be hard,
> but it wasn't done.

I've started another thread with a patch for that [1].

>> Note that ^@ isn't documented on the string functions and operators,

> That's another oversight.

Well, "oversight" might be too strong a word.  AFAICS from a quick look
in pg_operator, most operators on type text are comparisons, pattern
match, or text search, none of which do I want to fold into section 9.4.
The only exceptions are ||, which we do document there under SQL
operators, and ^@.  Commit 710d90da1 apparently decided to treat ^@ as a
pattern match operator, which I guess it could be if you hold your head
at the right angle, but I doubt most people would think to look for it
in section 9.7.  I guess the most practical answer is to rename table
9.10 from "Other String Functions" to "Other String Functions and
Operators", which is more parallel to table 9.9 anyway.  Just as in 9.9,
it would look weird to have a one-entry table of operators.  (Maybe
someday in the far future it'd make sense to split 9.10 into two
tables.)

            regards, tom lane

[1] https://www.postgresql.org/message-id/232599.1633800229%40sss.pgh.pa.us