Re: Similarity search with the tsearch2 extension

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: Similarity search with the tsearch2 extension
Дата
Msg-id 1386341675.6717.YahooMailNeo@web162903.mail.bf1.yahoo.com
обсуждение исходный текст
Ответ на Similarity search with the tsearch2 extension  ("Janek Sendrowski" <janek12@web.de>)
Список pgsql-performance
Janek Sendrowski <janek12@web.de> wrote:

> I want to realize a Full Text Search with the tsearch2 extension.
> It should find similar sentences.
>  
> I used my own trigger to store the tsvector of the sentences and
> I created a usual gist index on them.
> I have to to use many OR statements with a low set of arguments,
> what heavy damages the performance.
> My former query looked like this:

> SELECT * FROM tablename
>   WHERE vector @@ to_tsquery('speci & tiger & total & weigh')
>     AND vector @@ to_tsquery('largest & length & m & reach')
>     AND vector @@ to_tsquery('3.3 & 306 & bodi & cat & kg');

I don't see any OR operators there.

> And thats very slow.

Are you sure it is using the index?

Anyway, it is better to show an example, with EXPLAIN ANALYZE
output.  Here's mine, involving searches of War and Peace.

test=# -- Create the table.
test=# -- In reality, I would probably make tsv NOT NULL,
test=# -- but I'm keeping the example simple...
test=# CREATE TABLE war_and_peace
test-#   (
test(#     lineno serial PRIMARY KEY,
test(#     linetext text NOT NULL,
test(#     tsv tsvector
test(#   );
CREATE TABLE
test=#
test=# -- Load from downloaded data into database.
test=# COPY war_and_peace (linetext)
test-#   FROM '/home/kgrittn/Downloads/war-and-peace.txt';
COPY 65007
test=#
test=# -- "Digest" data to lexemes.
test=# UPDATE war_and_peace
test-#   SET tsv = to_tsvector('english', linetext);
UPDATE 65007
test=#
test=# -- Index the lexemes using GIN.
test=# CREATE INDEX war_and_peace_tsv
test-#   ON war_and_peace
test-#   USING gin (tsv);
CREATE INDEX
test=#
test=# -- Make sure the database has statistics.
test=# VACUUM ANALYZE war_and_peace;
VACUUM
test=#
test=# -- Find lines with "gentlemen".
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'gentlemen');
                                                         QUERY
PLAN                                                          

-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=12.52..240.18 rows=67 width=115) (actual time=0.058..0.130 rows=84 loops=1)
   Recheck Cond: (tsv @@ '''gentlemen'''::tsquery)
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..12.50 rows=67 width=0) (actual time=0.045..0.045 rows=84
loops=1)
         Index Cond: (tsv @@ '''gentlemen'''::tsquery)
 Total runtime: 0.160 ms
(5 rows)

test=#
test=# -- Find lines with "ladies".
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies');
                                                          QUERY
PLAN                                                           

-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=13.39..547.24 rows=180 width=115) (actual time=0.062..0.215 rows=184 loops=1)
   Recheck Cond: (tsv @@ '''ladi'''::tsquery)
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..13.35 rows=180 width=0) (actual time=0.043..0.043 rows=184
loops=1)
         Index Cond: (tsv @@ '''ladi'''::tsquery)
 Total runtime: 0.247 ms
(5 rows)

test=#
test=# -- Find lines with "ladies" and "gentlemen".
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies & gentlemen');
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=20.00..24.01 rows=1 width=115) (actual time=0.062..0.063 rows=1 loops=1)
   Recheck Cond: (tsv @@ '''ladi'' & ''gentlemen'''::tsquery)
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..20.00 rows=1 width=0) (actual time=0.057..0.057 rows=1
loops=1)
         Index Cond: (tsv @@ '''ladi'' & ''gentlemen'''::tsquery)
 Total runtime: 0.090 ms
(5 rows)

test=#
test=# -- Find lines with ("ladies" and "gentlemen") and ("provinces" and "distance").
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies & gentlemen')
test-#     AND tsv @@ to_tsquery('english', 'provinces & distance');
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=36.00..40.02 rows=1 width=115) (actual time=0.100..0.100 rows=1 loops=1)
   Recheck Cond: ((tsv @@ '''ladi'' & ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' & ''distanc'''::tsquery))
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..36.00 rows=1 width=0) (actual time=0.095..0.095 rows=1
loops=1)
         Index Cond: ((tsv @@ '''ladi'' & ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' & ''distanc'''::tsquery))
 Total runtime: 0.130 ms
(5 rows)

test=#
test=# -- Find lines with ("ladies" or "gentlemen") and ("provinces" or "distance").
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies | gentlemen')
test-#     AND tsv @@ to_tsquery('english', 'provinces | distance');
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=36.00..40.02 rows=1 width=115) (actual time=0.043..0.043 rows=1 loops=1)
   Recheck Cond: ((tsv @@ '''ladi'' | ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' | ''distanc'''::tsquery))
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..36.00 rows=1 width=0) (actual time=0.042..0.042 rows=1
loops=1)
         Index Cond: ((tsv @@ '''ladi'' | ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' | ''distanc'''::tsquery))
 Total runtime: 0.056 ms
(5 rows)

test=#
test=# -- Find lines with ("ladies" and "gentlemen") or ("provinces" and "distance").
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies & gentlemen')
test-#      OR tsv @@ to_tsquery('english', 'provinces & distance');
                                                           QUERY
PLAN                                                            

---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=40.00..44.02 rows=1 width=115) (actual time=0.080..0.080 rows=1 loops=1)
   Recheck Cond: ((tsv @@ '''ladi'' & ''gentlemen'''::tsquery) OR (tsv @@ '''provinc'' & ''distanc'''::tsquery))
   ->  BitmapOr  (cost=40.00..40.00 rows=1 width=0) (actual time=0.076..0.076 rows=0 loops=1)
         ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..20.00 rows=1 width=0) (actual time=0.052..0.052 rows=1
loops=1)
               Index Cond: (tsv @@ '''ladi'' & ''gentlemen'''::tsquery)
         ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..20.00 rows=1 width=0) (actual time=0.024..0.024 rows=1
loops=1)
               Index Cond: (tsv @@ '''provinc'' & ''distanc'''::tsquery)
 Total runtime: 0.116 ms
(8 rows)

Can you provide a similar example which slows the slowness you report?

                                                                                                                                                                                                                                                                             
 
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-performance по дате отправления:

Предыдущее
От: "Janek Sendrowski"
Дата:
Сообщение: Similarity search with the tsearch2 extension
Следующее
От: chidamparam muthusamy
Дата:
Сообщение: postgres performance