BUG #15014: pg_trgm regexp with wchar not good?

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #15014: pg_trgm regexp with wchar not good?
Дата
Msg-id 20180118130346.3630.56019@wrigleys.postgresql.org
обсуждение исходный текст
Ответы Re: BUG #15014: pg_trgm regexp with wchar not good?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15014
Logged by:          Zhou Digoal
Email address:      digoal@126.com
PostgreSQL version: 10.1
Operating system:   CentOS 7.x.x64
Description:

HI,

when i use pg_trgm's gin index, with wchar search, it's not good for regexp,
but good for like express. 


```
postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access
privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF8 | en_US.UTF8 | 
 template0 | postgres | UTF8     | en_US.UTF8 | en_US.UTF8 | =c/postgres
     +
           |          |          |            |            |
postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF8 | en_US.UTF8 | =c/postgres
     +

create or replace function gen_hanzi(int) returns text as $$  
declare  
  res text;  
begin  
  if $1 >=1 then  
    select string_agg(chr(19968+(random()*20901)::int), '') into res from
generate_series(1,$1);  
    return res;  
  end if;  
  return null;  
end;  
$$ language plpgsql strict;  


postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test select generate_series(1,100000),
gen_hanzi(100);
INSERT 0 100000
postgres=# create index idx_test_1 on test using gin (info gin_trgm_ops);
CREATE INDEX


postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test
where info ~ '婐绷乂畳';
                                                             QUERY PLAN
                                                       

------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test  (cost=45261409.28..45261421.30 rows=10
width=36) (actual time=583.810..816.503 rows=1 loops=1)
   Output: id, info
   Recheck Cond: (test.info ~ '婐绷乂畳'::text)
   Rows Removed by Index Recheck: 99999
   Heap Blocks: exact=4167
   Buffers: shared hit=59783
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..45261409.28 rows=10
width=0) (actual time=583.237..583.237 rows=100000 loops=1)
         Index Cond: (test.info ~ '婐绷乂畳'::text)
         Buffers: shared hit=55616
 Planning time: 0.150 ms
 Execution time: 816.545 ms
(11 rows)



postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test
where info ~ '123';
                                                      QUERY PLAN
                                          

-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test  (cost=39.40..2897.60 rows=4000 width=36)
(actual time=0.046..0.046 rows=0 loops=1)
   Output: id, info
   Recheck Cond: (test.info ~ '123'::text)
   Buffers: shared hit=4
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..38.40 rows=4000 width=0)
(actual time=0.043..0.043 rows=0 loops=1)
         Index Cond: (test.info ~ '123'::text)
         Buffers: shared hit=4
 Planning time: 0.146 ms
 Execution time: 0.072 ms
(9 rows)
```

best regards,
digoal


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Index-only scan returns incorrect results when using acomposite GIST index with a gist_trgm_ops column.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #15014: pg_trgm regexp with wchar not good?