The two ideas can be combined to create a workaround for accent-sensitive
nondeterministic collations that enables an ordinary btree to be exploited
if available, and also provides the full LIKE logic in either case-sensitive
or case-insensitive collations:
SELECT * FROM locations WHERE location LIKE 'midi-Pyrén%ées';
becomes:
SELECT * FROM locations
WHERE
location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF'; --
exploitable by ordinary btree
LIKE would also be valid for a case sensitive collation, but then the
collation would be CS_AS, so it would be deterministic and no transform
would be needed.
The expression above produces a good plan, but EXPLAIN complains if the
concatenated expression is not a valid character for the current
client_encoding, which I had set to WIN1252 to display the accented
characters properly on the client:
babel=# SELECT * FROM locations
babel-# WHERE
babel-# location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
babel-# location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
location
-----------------
Midi-Pyrénées
midi-Pyrénées
(2 rows)
babel=# EXPLAIN VERBOSE SELECT * FROM locations
babel-# WHERE
babel-# location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
babel-# location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF';
-- exploitable by ordinary btree
ERROR: character with byte sequence 0xef 0xbf 0xbf in encoding "UTF8" has
no equivalent in encoding "WIN1252"
reset client_encoding;
babel=# EXPLAIN VERBOSE SELECT * FROM locations
WHERE
location COLLATE "C" ILIKE 'midi-Pyrén%ées' AND
-- For CI collations only
location BETWEEN 'midi-Pyrén' AND 'midi-Pyrén' || E'\uFFFF'; --
exploitable by ordinary btree
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Only Scan using *location_index* on public.locations
(cost=0.13..8.15 rows=1 width=18)
Output: location
Index Cond: ((locations.location >= 'midi-Pyrén'::text) AND
(locations.location <= 'midi-Pyrén�'::text))
Filter: ((locations.location)::text ~~* 'midi-Pyrén%ées'::text)
(4 rows)
-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html