Обсуждение: BUG #14032: trigram index is not used for '=' operator

Поиск
Список
Период
Сортировка

BUG #14032: trigram index is not used for '=' operator

От
ruslan.zakirov@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14032
Logged by:          Ruslan
Email address:      ruslan.zakirov@gmail.com
PostgreSQL version: 9.4.6
Operating system:   linux
Description:

Hi,

Have table with the following index:

"tags_local_name_trg" gin (lower(name::text) gin_trgm_ops)

Was surprised that I have to use LIKE op to activate index:

sports=> explain analyze select id from tags_local where lower(name) =
'xx';
                                                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------
 Seq Scan on tags_local  (cost=0.00..8647.17 rows=729 width=4) (actual
time=188.669..188.669 rows=0 loops=1)
   Filter: (lower((name)::text) = 'xx'::text)
   Rows Removed by Filter: 145887
 Planning time: 0.298 ms
 Execution time: 188.695 ms
(5 rows)

sports=> explain analyze select id from tags_local where lower(name) like
'xx';
                                                          QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tags_local  (cost=9.01..122.71 rows=729 width=4)
(actual time=1.014..1.014 rows=0 loops=1)
   Recheck Cond: (lower((name)::text) ~~ 'xx'::text)
   ->  Bitmap Index Scan on tags_local_name_trg  (cost=0.00..8.82 rows=729
width=0) (actual time=1.013..1.013 rows=0 loops=1)
         Index Cond: (lower((name)::text) ~~ 'xx'::text)
 Planning time: 0.546 ms
 Execution time: 1.076 ms
(6 rows)

Re: BUG #14032: trigram index is not used for '=' operator

От
Artur Zakirov
Дата:
Hello,

Unfortunately, pg_trgm can not support '=' operator. If I am not mistaken
it is GiST and GIN limitation.

You can read the documentation
http://www.postgresql.org/docs/current/static/pgtrgm.html

The pg_trgm module provides GiST and GIN index operator classes that allow
> you to create an index over a text column for the purpose of very fast
> similarity searches. These index types support the above-described
> similarity operators, and additionally support trigram-based index searches
> for LIKE, ILIKE, ~ and ~* queries. (These indexes do not support equality
> nor simple comparison operators, so you may need a regular B-tree index
> too.)
>

2016-03-18 13:04 GMT+03:00 <ruslan.zakirov@gmail.com>:

> The following bug has been logged on the website:
>
> Bug reference:      14032
> Logged by:          Ruslan
> Email address:      ruslan.zakirov@gmail.com
> PostgreSQL version: 9.4.6
> Operating system:   linux
> Description:
>
> Hi,
>
> Have table with the following index:
>
> "tags_local_name_trg" gin (lower(name::text) gin_trgm_ops)
>
> Was surprised that I have to use LIKE op to activate index:
>
> sports=> explain analyze select id from tags_local where lower(name) =
> 'xx';
>                                                  QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------
>  Seq Scan on tags_local  (cost=0.00..8647.17 rows=729 width=4) (actual
> time=188.669..188.669 rows=0 loops=1)
>    Filter: (lower((name)::text) = 'xx'::text)
>    Rows Removed by Filter: 145887
>  Planning time: 0.298 ms
>  Execution time: 188.695 ms
> (5 rows)
>
> sports=> explain analyze select id from tags_local where lower(name) like
> 'xx';
>                                                           QUERY PLAN
>
>
>
------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on tags_local  (cost=9.01..122.71 rows=729 width=4)
> (actual time=1.014..1.014 rows=0 loops=1)
>    Recheck Cond: (lower((name)::text) ~~ 'xx'::text)
>    ->  Bitmap Index Scan on tags_local_name_trg  (cost=0.00..8.82 rows=729
> width=0) (actual time=1.013..1.013 rows=0 loops=1)
>          Index Cond: (lower((name)::text) ~~ 'xx'::text)
>  Planning time: 0.546 ms
>  Execution time: 1.076 ms
> (6 rows)
>
>
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>



--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: BUG #14032: trigram index is not used for '=' operator

От
Jeff Janes
Дата:
On Fri, Mar 18, 2016 at 3:37 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> Hello,
>
> Unfortunately, pg_trgm can not support '=' operator. If I am not mistaken it
> is GiST and GIN limitation.

I think it is actually quite trivial to support '='.  In fact I think
that all you have to do is tap into the same code that LIKE already
uses, and let the recheck remove things which match on the LIKE
interpretation of the query string but not the equality
interpretation.

Attached is a crude patch I put together which seems to do the job,
but I haven't thoroughly tested it.

The main problem is likely to be that there is already a really good
index type for speeding up equality queries (btree), and adding
another (generally much worse) alternative is likely to confuse the
planner more than anything.  Is it really worth taking the performance
hit on executing the equality query in order to avoid just keeping a
second btree index?

If I could somehow turn this into an extension module that installed
with pg_trgm as a dependency, rather than reaching into pg_trgm's
internals, then it might be worthwhile putting something like this on
PGXN.  But I don't know how to do that.  And it doesn't seem
worthwhile to change pg_trgm itself in this way.

But in any case, it isn't a bug that pg_trgm doesn't do everything it
theoretically could do.

Cheers,

Jeff

Вложения

Re: BUG #14032: trigram index is not used for '=' operator

От
Artur Zakirov
Дата:
On 20.03.2016 04:06, Jeff Janes wrote:
>
> I think it is actually quite trivial to support '='.  In fact I think
> that all you have to do is tap into the same code that LIKE already
> uses, and let the recheck remove things which match on the LIKE
> interpretation of the query string but not the equality
> interpretation.
>
> Attached is a crude patch I put together which seems to do the job,
> but I haven't thoroughly tested it.
>
> The main problem is likely to be that there is already a really good
> index type for speeding up equality queries (btree), and adding
> another (generally much worse) alternative is likely to confuse the
> planner more than anything.  Is it really worth taking the performance
> hit on executing the equality query in order to avoid just keeping a
> second btree index?

Maybe not :)

>
> If I could somehow turn this into an extension module that installed
> with pg_trgm as a dependency, rather than reaching into pg_trgm's
> internals, then it might be worthwhile putting something like this on
> PGXN.  But I don't know how to do that.  And it doesn't seem
> worthwhile to change pg_trgm itself in this way.
>
> But in any case, it isn't a bug that pg_trgm doesn't do everything it
> theoretically could do.

Yes. And so this changes can be applied only for PostgreSQL 9.7.

>
> Cheers,
>
> Jeff
>


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: BUG #14032: trigram index is not used for '=' operator

От
Emre Hasegeli
Дата:
> The main problem is likely to be that there is already a really good
> index type for speeding up equality queries (btree), and adding
> another (generally much worse) alternative is likely to confuse the
> planner more than anything.  Is it really worth taking the performance
> hit on executing the equality query in order to avoid just keeping a
> second btree index?

I don't think it is a problem.  I think btree would usually have lower
cost than GIN and therefore would be preferred by the planner.  The
other opclasses are providing operators for such cases.  For example,
range_ops(GiST) supports =, inet_ops(GiST) supports all basic
comparisons.

It is likely that pg_trgm index created for pattern matching would be
sufficient for equality for some users.  We shouldn't force them to
create an additional btree index.

> If I could somehow turn this into an extension module that installed
> with pg_trgm as a dependency, rather than reaching into pg_trgm's
> internals, then it might be worthwhile putting something like this on
> PGXN.  But I don't know how to do that.  And it doesn't seem
> worthwhile to change pg_trgm itself in this way.

I don't think it is useful to have such a small extension, even if it
would be possible.

> But in any case, it isn't a bug that pg_trgm doesn't do everything it
> theoretically could do.

I agree.  Maybe we can add it to the next commitfest?

Re: BUG #14032: trigram index is not used for '=' operator

От
Ruslan Zakirov
Дата:
Hi,

I all brought up reasonable arguments.

On Thu, Mar 24, 2016 at 6:11 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

> It is likely that pg_trgm index created for pattern matching would be
> sufficient for equality for some users.  We shouldn't force them to
> create an additional btree index.
>

This lines up with my point of view. I just don't want to create 19 btree
indexes as
long as trigram one is fast enough for my load. Switching to using LIKE in
my case,
good it's not big effort in this situation.

--
Best regards, Ruslan.

Re: BUG #14032: trigram index is not used for '=' operator

От
Jeff Janes
Дата:
On Thu, Mar 24, 2016 at 8:11 AM, Emre Hasegeli <emre@hasegeli.com> wrote:
>> The main problem is likely to be that there is already a really good
>> index type for speeding up equality queries (btree), and adding
>> another (generally much worse) alternative is likely to confuse the
>> planner more than anything.  Is it really worth taking the performance
>> hit on executing the equality query in order to avoid just keeping a
>> second btree index?
>
> I don't think it is a problem.  I think btree would usually have lower
> cost than GIN and therefore would be preferred by the planner.  The
> other opclasses are providing operators for such cases.  For example,
> range_ops(GiST) supports =, inet_ops(GiST) supports all basic
> comparisons.

You are quite likely correct that the planner is smart enough to not
be fooled by this.  However, I don't know how to convince myself of
this to a high enough certainty.  While the risk may be low in
probability, any negative result that does occur is high in magnitude.
While the benefit, to those who would benefit, is high probability,
but low in magnitude.

>
> It is likely that pg_trgm index created for pattern matching would be
> sufficient for equality for some users.  We shouldn't force them to
> create an additional btree index.

True, but we also shouldn't force people with no interest in this
equality operator to take some unknown risk that the planner will be
seduced into unfavorable plans.  Which is why an extension for those
who want it seems like the optimal solution.  So perhaps we could add
the C code to the pg_trgm extension itself, but leave out the SQL
which enables it to be used.  Then have that enabling SQL be part of a
separate extension, so people get to choose whether to use it or not.

>
>> If I could somehow turn this into an extension module that installed
>> with pg_trgm as a dependency, rather than reaching into pg_trgm's
>> internals, then it might be worthwhile putting something like this on
>> PGXN.  But I don't know how to do that.  And it doesn't seem
>> worthwhile to change pg_trgm itself in this way.
>
> I don't think it is useful to have such a small extension, even if it
> would be possible.
>
>> But in any case, it isn't a bug that pg_trgm doesn't do everything it
>> theoretically could do.
>
> I agree.  Maybe we can add it to the next commitfest?

I won't do that myself, but anyone is welcome to polish my code and
submit it if they want to.  They should also consider whether there is
more to be gained by enforcing not only that all necessary trigrams
exist, but also that no unnecessary ones do--I had not spent much time
pondering that.

Still, I would want there to be a way to turn it off.  Maybe someone
has a better way of doing that than my idea of it being a separate
extension.

Cheers,

Jeff