Re: pg_trgm partial-match

From Alexander Korotkov
Subject Re: pg_trgm partial-match
Date
Msg-id CAPpHfdvuQPDgckJZWcgq=ggQGO1whvPQBKp+pEdzyW=4jVyK=Q@mail.gmail.com
In reply to pg_trgm partial-match  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: pg_trgm partial-match  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hi!

On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
> i.e., if query key contains multibyte characters. In this case, byte length of
> the trigram string might be larger than three, and its CRC is used as a
> trigram key instead of the trigram string itself. Because of using CRC, we
> cannot do a partial-match. Attached patch extends pg_trgm so that it
> compares a partial-match query key only when KEEPONLYALNUM is
> enabled.

I didn't get this point. How does KEEPONLYALNUM guarantee that each trigram character is single-byte? Here is an example where the index scan loses a row that the sequential scan finds:

CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val  
------
 aa
 aaa
 шaaш
(3 rows)
test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val 
-----
 aa
 aaa
(2 rows)

I think we can use partial match only for single-byte encodings or, at most, in cases where all alphanumeric characters are single-byte (I have no idea how to check this).
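
The missing row can be illustrated with a small Python sketch. It is not pg_trgm's code: the helper names are hypothetical, `zlib.crc32` stands in for whatever CRC pg_trgm uses internally, and taking the first three bytes of the CRC is an assumption. It only shows the mechanism described in the quoted text: a trigram whose encoding is longer than three bytes is stored as a hash, so a prefix comparison against the query key 'aa' no longer sees the original characters.

```python
import zlib

def trigrams(word):
    """Pad a word with two leading spaces and one trailing space,
    then slice it into 3-character windows."""
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def index_key(trigram):
    """Return a 3-byte key: the raw bytes if the trigram encodes to
    exactly 3 bytes, otherwise 3 bytes of a CRC (assumed stand-in)."""
    raw = trigram.encode("utf-8")
    if len(raw) == 3:
        return raw
    return zlib.crc32(raw).to_bytes(4, "big")[:3]

def prefix_hits(word, prefix):
    """Trigrams of `word` whose stored key starts with `prefix`."""
    return [t for t in trigrams(word) if index_key(t).startswith(prefix)]

for word in ("aa", "aaa", "шaaш"):
    sizes = [len(t.encode("utf-8")) for t in trigrams(word)]
    print(word, trigrams(word), sizes, prefix_hits(word, b"aa"))
```

Every trigram of 'шaaш' contains a two-byte character in UTF-8, so all of its keys are hashed. The trigram 'aaш' starts with 'aa' as text, but its stored key is three bytes of a CRC, and a prefix comparison against it can match only by coincidence.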

------
With best regards,
Alexander Korotkov.
