Обсуждение: LIKE without wildcard different from =
We have been using the C locale for everything at our site, but there is occasionally talk of supporting characters outside the ASCII7 set. In playing around with indexing, to see what the impact of that would be, I stumbled across something which was mildly surprising. In the C locale, if you want to search for an exact value which doesn't contain wildcard characters, it doesn't matter whether you use the 'LIKE' operator or the '=' operator. With LATIN1 encoding, it made three orders of magnitude difference, both in the estimated cost and the actual run time. I'm not entirely clear on whether it would be *incorrect* for PostgreSQL to automatically turn the second query below into the first, or just too expensive an optimization to check for compared to how often it might help. "SccaParty_SearchName" btree ("searchName" varchar_pattern_ops) explain analyze select "searchName" from "SccaParty" where "searchName" like 'SMITH,JOHNBRACEYJR'; Index Scan using "SccaParty_SearchName" on "SccaParty" (cost=0.00..2.94 rows=22 width=18) (actual time=0.046..0.051 rows=2 loops=1) Index Cond: (("searchName")::text ~=~ 'SMITH,JOHNBRACEYJR'::text) Filter: (("searchName")::text ~~ 'SMITH,JOHNBRACEYJR'::text) Total runtime: 0.083 ms explain analyze select "searchName" from "SccaParty" where "searchName" = 'SMITH,JOHNBRACEYJR'; Seq Scan on "SccaParty" (cost=0.00..3014.49 rows=22 width=18) (actual time=2.395..54.228 rows=2 loops=1) Filter: (("searchName")::text = 'SMITH,JOHNBRACEYJR'::text) Total runtime: 54.274 ms I don't have a problem, and am not suggesting any action; just trying to understand this. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > We have been using the C locale for everything at our site, but > there is occasionally talk of supporting characters outside the > ASCII7 set. In playing around with indexing, to see what the impact > of that would be, I stumbled across something which was mildly > surprising. > In the C locale, if you want to search for an exact value which > doesn't contain wildcard characters, it doesn't matter whether you > use the 'LIKE' operator or the '=' operator. With LATIN1 encoding, > it made three orders of magnitude difference, both in the estimated > cost and the actual run time. What PG version are you testing? 8.4 and up should know that an exact-match pattern can be optimized regardless of the lc_collate setting. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> We have been using the C locale for everything at our site, but >> there is occasionally talk of supporting characters outside the >> ASCII7 set. In playing around with indexing, to see what the >> impact of that would be, I stumbled across something which was >> mildly surprising. > >> In the C locale, if you want to search for an exact value which >> doesn't contain wildcard characters, it doesn't matter whether >> you use the 'LIKE' operator or the '=' operator. With LATIN1 >> encoding, it made three orders of magnitude difference, both in >> the estimated cost and the actual run time. > > What PG version are you testing? 8.4 and up should know that an > exact-match pattern can be optimized regardless of the lc_collate > setting. For reasons not worth getting into, I had an 8.3.8 database sitting around in this locale, so I was testing things there. I'll take the time to copy into an 8.4.4 database for further testing, and maybe 9.0 beta, too. That'll take hours, though, so I can't immediately test it. To be clear, though, the problem isn't that it didn't turn a LIKE with no wildcard characters into an equality test, it's that it would have been three orders of magnitude faster (because of an available index with an opclass specification) if it had treated an equality test as a LIKE. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > To be clear, though, the problem isn't that it didn't turn a LIKE > with no wildcard characters into an equality test, it's that it > would have been three orders of magnitude faster (because of an > available index with an opclass specification) if it had treated an > equality test as a LIKE. Ah. Well, the real fix for that is also in 8.4: we got rid of the separate ~=~ operator, so a text_pattern_ops index is now usable for plain =. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ah. Well, the real fix for that is also in 8.4: we got rid of the > separate ~=~ operator, so a text_pattern_ops index is now usable > for plain =. Nice! Thanks, -Kevin