Обсуждение: Question on Trigram GIST indexes
@Moderators: I am reposting this because the original from 22 December apparently didn't arrive on the list.
I was trying to make Postgresql use a trigram gist index on a varchar field, but to no avail.
Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html
I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8.
My full table definition is
CREATE TABLE "TEST"
(
"RECID" bigint NOT NULL DEFAULT next_id(),
"TST_PAYLOAD" character varying(255),
CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID")
USING INDEX TABLESPACE local
)
WITH (
OIDS=FALSE
);
CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX"
ON "TEST"
USING btree
("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops)
TABLESPACE local;
CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX"
ON "TEST"
USING gist
("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE local;
CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX"
ON "TEST"
USING gin
("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE local;
The COLLATE pg_catalog."default" clause is inserted by the DB (default is "Unicode"). I also tried to define the Trigram index with COLLATE pg_catalog."C" but the behavior did not change. I did vacuum and analyze after creating each index.
The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase.
I have pg_tgrm installed - actually all extensions are present.
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST index but do a full table scan instead.
(I am looking for names like 'SEATTLE' in this example)
I also tried dropping the btree index but that has no influence on the behavior.
I have texts/strings in different languages/charsets, so UTF8 looked like the best decision to me, instead of, say, ISO-8859-15, which is limited to just some European charsets. Specifically I am storing strings in European languages (corresponding to the ISO-8859 series) including diacrites line äöüñáéíóú ..., Russian, Arabic, Chinese etc. in one column instead of making different columns/tables and using them via a view because that's my use case and UTF8 should accommodate that IMHO (or is that an abuse of the DB?)
Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is possible? (Oracle doesn't allow that iirc).
Thanks for any insights, pointers ...
I'd be grateful if anybody could explain to me what I am doing wrong.
Thanks in advance.
On 5 January 2013 20:20, ERR ORR <rd0002@gmail.com> wrote:
I have the same problem:
Index:
CREATE INDEX akb_art_abstract_trgm
ON akb_articles
USING gin
(abstract gin_trgm_ops);
and
SELECT title, SIMILARITY(abstract, 'skeef') FROM akb_articles WHERE SIMILARITY (abstract, 'water') > 0
results in a full sequential scan:
"Seq Scan on public.akb_articles (cost=0.00..45751.67 rows=107025 width=666) (actual time=0.236..63153.268 rows=169265 loops=1)"
" Output: title, similarity(abstract, 'skeef'::text)"
" Filter: (similarity(akb_articles.abstract, 'water'::text) > 0::double precision)"
" Buffers: shared hit=39000 read=46460"
"Total runtime: 63173.663 ms"
Regards
Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index as it should.Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST index but do a full table scan instead.(I am looking for names like 'SEATTLE' in this example)I also tried dropping the btree index but that has no influence on the behavior.
I have the same problem:
Index:
CREATE INDEX akb_art_abstract_trgm
ON akb_articles
USING gin
(abstract gin_trgm_ops);
and
SELECT title, SIMILARITY(abstract, 'skeef') FROM akb_articles WHERE SIMILARITY (abstract, 'water') > 0
results in a full sequential scan:
"Seq Scan on public.akb_articles (cost=0.00..45751.67 rows=107025 width=666) (actual time=0.236..63153.268 rows=169265 loops=1)"
" Output: title, similarity(abstract, 'skeef'::text)"
" Filter: (similarity(akb_articles.abstract, 'water'::text) > 0::double precision)"
" Buffers: shared hit=39000 read=46460"
"Total runtime: 63173.663 ms"
Regards
Johann
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
On Sat, Jan 5, 2013 at 12:20 PM, ERR ORR <rd0002@gmail.com> wrote: > > @Moderators: I am reposting this because the original from 22 December > apparently didn't arrive on the list. > > I was trying to make Postgresql use a trigram gist index on a varchar field, > but to no avail. > > Specifically, I was trying to replicate what is done in this blog post: > http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html > > > I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8. > > My full table definition is > > CREATE TABLE "TEST" > ( > "RECID" bigint NOT NULL DEFAULT next_id(), > "TST_PAYLOAD" character varying(255), > CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID") > USING INDEX TABLESPACE local > ) > WITH ( > OIDS=FALSE > ); > > CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX" > ON "TEST" > USING btree > ("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops) > TABLESPACE local; > > CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX" > ON "TEST" > USING gist > ("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops) > TABLESPACE local; > > CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX" > ON "TEST" > USING gin > ("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops) > TABLESPACE local; > > > The COLLATE pg_catalog."default" clause is inserted by the DB (default is > "Unicode"). I also tried to define the Trigram index with COLLATE > pg_catalog."C" but the behavior did not change. I did vacuum and analyze > after creating each index. > > The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase. > > I have pg_tgrm installed - actually all extensions are present. > > Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index > as it should. > Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST > index but do a full table scan instead. > (I am looking for names like 'SEATTLE' in this example) where did you determine that pg_trgm should optimize like expressions? pg_trgm provides new operators that are used to index on string similarity... merlin
On Tue, Jan 22, 2013 at 8:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Sat, Jan 5, 2013 at 12:20 PM, ERR ORR <rd0002@gmail.com> wrote: >> >> @Moderators: I am reposting this because the original from 22 December >> apparently didn't arrive on the list. >> >> I was trying to make Postgresql use a trigram gist index on a varchar field, >> but to no avail. >> >> Specifically, I was trying to replicate what is done in this blog post: >> http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html >> >> >> I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8. >> >> My full table definition is >> >> CREATE TABLE "TEST" >> ( >> "RECID" bigint NOT NULL DEFAULT next_id(), >> "TST_PAYLOAD" character varying(255), >> CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID") >> USING INDEX TABLESPACE local >> ) >> WITH ( >> OIDS=FALSE >> ); >> >> CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX" >> ON "TEST" >> USING btree >> ("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops) >> TABLESPACE local; >> >> CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX" >> ON "TEST" >> USING gist >> ("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops) >> TABLESPACE local; >> >> CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX" >> ON "TEST" >> USING gin >> ("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops) >> TABLESPACE local; >> >> >> The COLLATE pg_catalog."default" clause is inserted by the DB (default is >> "Unicode"). I also tried to define the Trigram index with COLLATE >> pg_catalog."C" but the behavior did not change. I did vacuum and analyze >> after creating each index. >> >> The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase. >> >> I have pg_tgrm installed - actually all extensions are present. >> >> Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index >> as it should. >> Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST >> index but do a full table scan instead. >> (I am looking for names like 'SEATTLE' in this example) > > where did you determine that pg_trgm should optimize like expressions? > pg_trgm provides new operators that are used to index on string > similarity... oops -- heh -- I guess you *can* do that (after further documentation review). hm...it works for me... merlin