Re: Index for Levenshtein distance (better format)

Поиск
Список
Период
Сортировка
От Giuseppe Broccolo
Тема Re: Index for Levenshtein distance (better format)
Дата
Msg-id 51ED6223.7080607@2ndquadrant.it
обсуждение исходный текст
Ответ на Index for Levenshtein distance (better format)  ("Janek Sendrowski" <janek12@web.de>)
Список pgsql-general
Hi Janek,

Il 21/07/2013 16:46, Janek Sendrowski ha scritto:
> Hi,
>
> Im searching for a suitable Index for my query witch compares Strings with the Levenshtein distance.
>
> I read that a prefix index would fit, but I dont know how to build it. I only know that its supported by Gist.
>
> I couldn't find an instructions so I hope someone can help me.

Consider this simple example: suppose you have a table "table1" with a
column "string" of type TEXT, and you are interested to find "string" in
your table equal to 'ciao' basing your query in Levenshtein distance.
Then, improve your query creating an index based on gist.

First of all, create extensions to use levenshtein() function, and gist
indexing for types different from PostGIS objects:

CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION btree_gist;

You can check that all works fine trying a query like:

SELECT string FROM table1 WHERE levenshtein(string,'ciao') = 0;

which finds also "string" equal to 'ciao'. Create now the index to
improve this query:

CREATE INDEX lev_idx ON table1 USING GIST(levenshtein(string,'ciao'));

And relaunch the same query. I made a simple check with a very simple
table1 example with ten raws using EXPLAIN ANALYSE to exploit the
performances: query time changes from 1.077 ms to 0.677 ms after gist
index creation.

Hope it helps,

Giuseppe.

--
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe.broccolo@2ndQuadrant.it  |www.2ndQuadrant.it



В списке pgsql-general по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Different transaction log for database/schema
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: odd locking behaviour