Re: Fastest Index/Algorithm to find similar sentences

From Beena Emerson
Subject Re: Fastest Index/Algorithm to find similar sentences
Date
Msg-id CAOG9ApEaGjHaFtm2XrVGYc6WbYFva3JzLxa6ANSFFyW_-mFkQA@mail.gmail.com
In reply to Re: Fastest Index/Algorithm to find similar sentences  (Beena Emerson <memissemerson@gmail.com>)
List pgsql-general

I am sorry, I just re-read your mail and realized you have already tried pg_trgm.



On Wed, Jul 31, 2013 at 7:23 PM, Beena Emerson <memissemerson@gmail.com> wrote:
On Sat, Jul 27, 2013 at 10:34 PM, Janek Sendrowski <janek12@web.de> wrote:
Hi Sergey Konoplev,
 
Suppose I'm searching for a sentence like "The tiger is the largest cat species".
 
I can only find the sentences that include all of the words "tiger, largest, cat, species", but I would also like to get the sentences that contain only three or even two of these words.
 
Janek



Hi,

You may use similarity functions of pg_trgm.

Example:
=# \d+ test
                        Table "public.test"
 Column | Type | Modifiers | Storage  | Stats target | Description 
--------+------+-----------+----------+--------------+-------------
 col    | text |           | extended |              | 
Indexes:
    "test_idx" gin (col gin_trgm_ops)
Has OIDs: no

=# SELECT * FROM test;
                   col                   
-----------------------------------------
 The tiger is the largest cat species
 The cheetah is the fastest  cat species
 The peacock is the largest bird species
(3 rows)

=# SELECT show_limit();
 show_limit 
------------
        0.3
(1 row)

=# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
  FROM test WHERE col % 'The tiger is the largest cat species'
  ORDER BY sml DESC, col;
                   col                   |   sml    
-----------------------------------------+----------
 The tiger is the largest cat species    |        1
 The peacock is the largest bird species | 0.511111
 The cheetah is the fastest  cat species | 0.466667
(3 rows)

=# SELECT set_limit(0.5);
 set_limit 
-----------
       0.5
(1 row)

=# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
  FROM test WHERE col % 'The tiger is the largest cat species'
  ORDER BY sml DESC, col;
                   col                   |   sml    
-----------------------------------------+----------
 The tiger is the largest cat species    |        1
 The peacock is the largest bird species | 0.511111
(2 rows)

=# SELECT set_limit(0.9);
 set_limit 
-----------
       0.9
(1 row)

=# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
  FROM test WHERE col % 'The tiger is the largest cat species'
  ORDER BY sml DESC, col;
                 col                  | sml 
--------------------------------------+-----
 The tiger is the largest cat species |   1
(1 row)


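The % operator only returns rows whose similarity to the search string reaches the current limit, so raising the limit narrows the result to closer matches: at 0.3 all three rows qualify, at 0.5 the cheetah row (0.466667) drops out, and at 0.9 only the identical sentence remains.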
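
To see where scores such as 0.511111 and 0.466667 come from, here is a rough Python sketch of the trigram arithmetic. It is not pg_trgm's actual code; it assumes the behaviour described in the pg_trgm documentation (lowercase the text, split it into words on non-alphanumeric characters, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams in both strings). The names trigrams, similarity and matches are made up for this illustration, the word splitting only handles ASCII, and treating the limit as inclusive is also an assumption.

```python
import re


def trigrams(text):
    """Approximate the set of trigrams pg_trgm would extract from text."""
    grams = set()
    # Lowercase, then treat every run of ASCII letters/digits as one word.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def matches(query, rows, limit):
    """Rows whose score reaches the limit, best match first (like col % query)."""
    scored = [(similarity(row, query), row) for row in rows]
    kept = [pair for pair in scored if pair[0] >= limit]  # inclusive limit: assumption
    return sorted(kept, key=lambda pair: (-pair[0], pair[1]))


query = "The tiger is the largest cat species"
rows = [
    "The tiger is the largest cat species",
    "The cheetah is the fastest  cat species",
    "The peacock is the largest bird species",
]

for limit in (0.3, 0.5, 0.9):
    print(limit, [(round(score, 6), row) for score, row in matches(query, rows, limit)])
```

Under these assumptions the peacock sentence shares 23 of 45 distinct trigrams with the search string (23/45 = 0.511111) and the cheetah sentence shares 21 of 45 (21/45 = 0.466667), which agrees with the sml column above. Because the score counts overlapping trigrams rather than requiring every word to be present, a sentence that shares only some of the words still gets a non-zero score and can be kept by choosing a lower limit.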


--
Beena Emerson




