trgm regex index peculiarity
От | Erik Rijkers |
---|---|
Тема | trgm regex index peculiarity |
Дата | |
Msg-id | 786ef170d3cb6b991425a8b364e77c57.squirrel@webmail.xs4all.nl обсуждение исходный текст |
Ответы |
Re: trgm regex index peculiarity
|
Список | pgsql-hackers |
9.4devel (but same in 9.3) In a 112 MB test table (containing random generated text) with a trgm index (gin_trgm_ops), I consistently get these timings: select txt from azjunk6 where txt ~ '^abcd'; 130 ms select txt from azjunk6 where txt ~ 'abcd' and substr(txt,1,4) = 'abcd'; 3 ms (a similar performance difference occurs when using a regex, i.e. 'abc[de]' ) This difference is so large that I wonder if there is not something wrong in the first case. (The returned results are correct though) Here are the two explains: explain analyze select txt from azjunk6 where txt ~ '^abcd'; QUERYPLAN -------------------------------------------------------------------------------------------------------------------------------------Bitmap HeapScan on azjunk6 (cost=108.78..484.93 rows=100 width=81) (actual time=129.557..129.742 rows=1 loops=1) Recheck Cond:(txt ~ '^abcd'::text) Rows Removed by Index Recheck: 17 -> Bitmap Index Scan on azjunk6_trgm_re_idx (cost=0.00..108.75rows=100 width=0) (actual time=129.503..129.503 rows=18 loops=1) Index Cond: (txt ~ '^abcd'::text)Total runtime: 130.008 ms (6 rows) explain analyze select txt from azjunk6 where txt ~ 'abcd' and substr(txt,1,4) = 'abcd'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------Bitmap HeapScan on azjunk6 (cost=56.75..433.40 rows=1 width=81) (actual time=2.064..3.379 rows=1 loops=1) Recheck Cond: (txt ~'abcd'::text) Rows Removed by Index Recheck: 14 Filter: (substr(txt, 1, 4) = 'abcd'::text) Rows Removed by Filter: 112 -> Bitmap Index Scan on azjunk6_trgm_re_idx (cost=0.00..56.75 rows=100 width=0) (actual time=1.911..1.911 rows=127 loops=1) Index Cond: (txt ~ 'abcd'::text)Total runtime: 3.409 ms (8 rows) The results in both cases are correct, but does this difference not almost amount to a bug? ( Interestingly, the variant WHERE txt ~ 'abcd$' is as fast as the non-anchored variant ) Thanks, Erik Rijkers
В списке pgsql-hackers по дате отправления: