Обсуждение: BUG #14032: trigram index is not used for '=' operator
The following bug has been logged on the website: Bug reference: 14032 Logged by: Ruslan Email address: ruslan.zakirov@gmail.com PostgreSQL version: 9.4.6 Operating system: linux Description: Hi, Have table with the following index: "tags_local_name_trg" gin (lower(name::text) gin_trgm_ops) Was surprised that I have to use LIKE op to activate index: sports=> explain analyze select id from tags_local where lower(name) = 'xx'; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Seq Scan on tags_local (cost=0.00..8647.17 rows=729 width=4) (actual time=188.669..188.669 rows=0 loops=1) Filter: (lower((name)::text) = 'xx'::text) Rows Removed by Filter: 145887 Planning time: 0.298 ms Execution time: 188.695 ms (5 rows) sports=> explain analyze select id from tags_local where lower(name) like 'xx'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on tags_local (cost=9.01..122.71 rows=729 width=4) (actual time=1.014..1.014 rows=0 loops=1) Recheck Cond: (lower((name)::text) ~~ 'xx'::text) -> Bitmap Index Scan on tags_local_name_trg (cost=0.00..8.82 rows=729 width=0) (actual time=1.013..1.013 rows=0 loops=1) Index Cond: (lower((name)::text) ~~ 'xx'::text) Planning time: 0.546 ms Execution time: 1.076 ms (6 rows)
Hello, Unfortunately, pg_trgm can not support '=' operator. If I am not mistaken it is GiST and GIN limitation. You can read the documentation http://www.postgresql.org/docs/current/static/pgtrgm.html The pg_trgm module provides GiST and GIN index operator classes that allow > you to create an index over a text column for the purpose of very fast > similarity searches. These index types support the above-described > similarity operators, and additionally support trigram-based index searches > for LIKE, ILIKE, ~ and ~* queries. (These indexes do not support equality > nor simple comparison operators, so you may need a regular B-tree index > too.) > 2016-03-18 13:04 GMT+03:00 <ruslan.zakirov@gmail.com>: > The following bug has been logged on the website: > > Bug reference: 14032 > Logged by: Ruslan > Email address: ruslan.zakirov@gmail.com > PostgreSQL version: 9.4.6 > Operating system: linux > Description: > > Hi, > > Have table with the following index: > > "tags_local_name_trg" gin (lower(name::text) gin_trgm_ops) > > Was surprised that I have to use LIKE op to activate index: > > sports=> explain analyze select id from tags_local where lower(name) = > 'xx'; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------- > Seq Scan on tags_local (cost=0.00..8647.17 rows=729 width=4) (actual > time=188.669..188.669 rows=0 loops=1) > Filter: (lower((name)::text) = 'xx'::text) > Rows Removed by Filter: 145887 > Planning time: 0.298 ms > Execution time: 188.695 ms > (5 rows) > > sports=> explain analyze select id from tags_local where lower(name) like > 'xx'; > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------------ > Bitmap Heap Scan on tags_local (cost=9.01..122.71 rows=729 width=4) > (actual time=1.014..1.014 rows=0 loops=1) > Recheck Cond: (lower((name)::text) ~~ 'xx'::text) > -> Bitmap Index Scan on tags_local_name_trg (cost=0.00..8.82 rows=729 > width=0) (actual time=1.013..1.013 rows=0 loops=1) > Index Cond: (lower((name)::text) ~~ 'xx'::text) > Planning time: 0.546 ms > Execution time: 1.076 ms > (6 rows) > > > > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs > -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On Fri, Mar 18, 2016 at 3:37 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote: > Hello, > > Unfortunately, pg_trgm can not support '=' operator. If I am not mistaken it > is GiST and GIN limitation. I think it is actually quite trivial to support '='. In fact I think that all you have to do is tap into the same code that LIKE already uses, and let the recheck remove things which match on the LIKE interpretation of the query string but not the equality interpretation. Attached is a crude patch I put together which seems to do the job, but I haven't thoroughly tested it. The main problem is likely to be that there is already a really good index type for speeding up equality queries (btree), and adding another (generally much worse) alternative is likely to confuse the planner more than anything. Is it really worth taking the performance hit on executing the equality query in order to avoid just keeping a second btree index? If I could somehow turn this into an extension module that installed with pg_trgm as a dependency, rather than reaching into pg_trgm's internals, then it might be worthwhile putting something like this on PGXN. But I don't know how to do that. And it doesn't seem worthwhile to change pg_trgm itself in this way. But in any case, it isn't a bug that pg_trgm doesn't do everything it theoretically could do. Cheers, Jeff
Вложения
On 20.03.2016 04:06, Jeff Janes wrote: > > I think it is actually quite trivial to support '='. In fact I think > that all you have to do is tap into the same code that LIKE already > uses, and let the recheck remove things which match on the LIKE > interpretation of the query string but not the equality > interpretation. > > Attached is a crude patch I put together which seems to do the job, > but I haven't thoroughly tested it. > > The main problem is likely to be that there is already a really good > index type for speeding up equality queries (btree), and adding > another (generally much worse) alternative is likely to confuse the > planner more than anything. Is it really worth taking the performance > hit on executing the equality query in order to avoid just keeping a > second btree index? Maybe not :) > > If I could somehow turn this into an extension module that installed > with pg_trgm as a dependency, rather than reaching into pg_trgm's > internals, then it might be worthwhile putting something like this on > PGXN. But I don't know how to do that. And it doesn't seem > worthwhile to change pg_trgm itself in this way. > > But in any case, it isn't a bug that pg_trgm doesn't do everything it > theoretically could do. Yes. And so this changes can be applied only for PostgreSQL 9.7. > > Cheers, > > Jeff > -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
> The main problem is likely to be that there is already a really good > index type for speeding up equality queries (btree), and adding > another (generally much worse) alternative is likely to confuse the > planner more than anything. Is it really worth taking the performance > hit on executing the equality query in order to avoid just keeping a > second btree index? I don't think it is a problem. I think btree would usually have lower cost than GIN and therefore would be preferred by the planner. The other opclasses are providing operators for such cases. For example, range_ops(GiST) supports =, inet_ops(GiST) supports all basic comparisons. It is likely that pg_trgm index created for pattern matching would be sufficient for equality for some users. We shouldn't force them to create an additional btree index. > If I could somehow turn this into an extension module that installed > with pg_trgm as a dependency, rather than reaching into pg_trgm's > internals, then it might be worthwhile putting something like this on > PGXN. But I don't know how to do that. And it doesn't seem > worthwhile to change pg_trgm itself in this way. I don't think it is useful to have such a small extension, even if it would be possible. > But in any case, it isn't a bug that pg_trgm doesn't do everything it > theoretically could do. I agree. Maybe we can add it to the next commitfest?
Hi, I all brought up reasonable arguments. On Thu, Mar 24, 2016 at 6:11 PM, Emre Hasegeli <emre@hasegeli.com> wrote: > It is likely that pg_trgm index created for pattern matching would be > sufficient for equality for some users. We shouldn't force them to > create an additional btree index. > This lines up with my point of view. I just don't want to create 19 btree indexes as long as trigram one is fast enough for my load. Switching to using LIKE in my case, good it's not big effort in this situation. -- Best regards, Ruslan.
On Thu, Mar 24, 2016 at 8:11 AM, Emre Hasegeli <emre@hasegeli.com> wrote: >> The main problem is likely to be that there is already a really good >> index type for speeding up equality queries (btree), and adding >> another (generally much worse) alternative is likely to confuse the >> planner more than anything. Is it really worth taking the performance >> hit on executing the equality query in order to avoid just keeping a >> second btree index? > > I don't think it is a problem. I think btree would usually have lower > cost than GIN and therefore would be preferred by the planner. The > other opclasses are providing operators for such cases. For example, > range_ops(GiST) supports =, inet_ops(GiST) supports all basic > comparisons. You are quite likely correct that the planner is smart enough to not be fooled by this. However, I don't know how to convince myself of this to a high enough certainty. While the risk may be low in probability, any negative result that does occur is high in magnitude. While the benefit, to those who would benefit, is high probability, but low in magnitude. > > It is likely that pg_trgm index created for pattern matching would be > sufficient for equality for some users. We shouldn't force them to > create an additional btree index. True, but we also shouldn't force people with no interest in this equality operator to take some unknown risk that the planner will be seduced into unfavorable plans. Which is why an extension for those who want it seems like the optimal solution. So perhaps we could add the C code to the pg_trgm extension itself, but leave out the SQL which enables it to be used. Then have that enabling SQL be part of a separate extension, so people get to choose whether to use it or not. > >> If I could somehow turn this into an extension module that installed >> with pg_trgm as a dependency, rather than reaching into pg_trgm's >> internals, then it might be worthwhile putting something like this on >> PGXN. But I don't know how to do that. And it doesn't seem >> worthwhile to change pg_trgm itself in this way. > > I don't think it is useful to have such a small extension, even if it > would be possible. > >> But in any case, it isn't a bug that pg_trgm doesn't do everything it >> theoretically could do. > > I agree. Maybe we can add it to the next commitfest? I won't do that myself, but anyone is welcome to polish my code and submit it if they want to. They should also consider whether there is more to be gained by enforcing not only that all necessary trigrams exist, but also that no unnecessary ones do--I had not spent much time pondering that. Still, I would want there to be a way to turn it off. Maybe someone has a better way of doing that than my idea of it being a separate extension. Cheers, Jeff