Using text search for locations, or a computed btree index

From: Jaume Sabater
Subject: Using text search for locations, or a computed btree index
Msg-id: 16902748.7851228411294539.JavaMail.root@zimbra.linuxsilo.net
List: pgsql-admin
Hello everyone!

The people building the website at the company I work for want an incremental search for locations. So I need to find
all locations that start with "Uni", then "Unit", then "Unite", and so on. The problem is that the tables, although
partitioned, hold several million tuples. Here is what I have in mind:

1. There are no stop words ("Palma of Majorca" is "Palma of Majorca", not "Palma Majorca", although this is not the
best example).
2. There are no synonyms that have to be matched to other words.
3. No phrases have to be mapped to single words.
4. No variations of words need to be mapped to a canonical form (although this may change once we add alternate names
of locations).
5. I need to strip all accented characters and convert all words to lowercase for faster searching and easier
matching.
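The normalization in point 5 can be sketched in Python as below. This is only an illustration, assuming Unicode NFKD decomposition is an acceptable definition of "accented character"; `normalize_location` is a hypothetical helper, not part of PostgreSQL. Letters that have no decomposition, such as "ø" or "ł", pass through unchanged and would need an explicit mapping table.

```python
import unicodedata


def normalize_location(name: str) -> str:
    """Strip accents from a location name and lowercase it (hypothetical helper)."""
    # NFKD splits a character such as "á" into "a" plus a combining accent mark
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base letters
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unaccented.lower()


if __name__ == "__main__":
    for name in ["Palma de Mallorca", "São Paulo", "Zürich", "MÁLAGA"]:
        print(name, "->", normalize_location(name))
```

Whatever the exact rules, the stored column and the search term typed by the user must go through the same transformation, otherwise "Sao" will never match a row stored as "São".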

So I've been reading all the text search documentation available online, and I think I should:

1. Create an extra column with the "normalized" version of the location name.
2. Create a GIN index on that extra column.
3. Create a synonyms (variations) dictionary with the alternate names (touristic names that have no political
equivalent but that people know and use when searching).
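To make the GIN idea in the list above more concrete, here is a toy inverted index in Python: every normalized word maps to the set of location ids containing it, alternate (touristic) names are indexed alongside the official name, and the last word of the query is matched as a prefix. It is a sketch under stated assumptions, not how PostgreSQL implements GIN or tsvector; `LocationIndex`, `normalize`, and the sample data are hypothetical. The prefix step scans every key linearly, which a real index would avoid by keeping its keys sorted.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase and strip accents (same idea as point 5 of the post)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


class LocationIndex:
    """Toy inverted index: normalized word -> set of location ids.

    Hypothetical illustration of the posting-list idea behind GIN,
    not PostgreSQL's actual data structure.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, loc_id, name, alternate_names=()):
        # Index the official name and every alternate (touristic) name,
        # so any of them leads back to the same location id
        for variant in (name, *alternate_names):
            for word in normalize(variant).split():
                self.postings[word].add(loc_id)

    def search(self, query):
        """Ids matching every complete query word, with the last word
        treated as a prefix (incremental search)."""
        words = normalize(query).split()
        if not words:
            return set()
        *complete, partial = words
        # Union of the posting lists of all keys starting with the partial word
        # (linear scan over the keys; a real index would use sorted keys)
        result = set()
        for word, ids in self.postings.items():
            if word.startswith(partial):
                result |= ids
        # Keep only ids that also contain each complete word exactly
        for word in complete:
            result &= self.postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = LocationIndex()
    idx.add(1, "Palma de Mallorca", alternate_names=("Palma of Majorca",))
    idx.add(2, "United Kingdom")
    idx.add(3, "La Unión")
    for typed in ["Uni", "Unite", "palma of maj", "la uni"]:
        print(typed, "->", sorted(idx.search(typed)))
```

Because the alternate name is indexed under the same id, typing either "Palma de Mall" or "Palma of Maj" finds location 1, which is roughly the behaviour the synonyms dictionary in point 3 is meant to provide.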

What I don't know is how to tell tsvector (and tsquery, when searching) to do only what I need and nothing else. I
can't find anything in the online documentation that explains how to pick just the pieces I want from the whole puzzle.

Alternatively, provided we don't use alternate names, I could use a computed btree index that performs the needed
operations (lowercasing, and replacing accented characters with their unaccented equivalents using a regular
expression), and store that result in an extra column, perhaps?
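The computed-index alternative boils down to a range scan over sorted, normalized keys: every name starting with "uni" sorts at or after "uni" and strictly before "unj". The sketch below simulates that with Python's `bisect` over a sorted list standing in for the btree; `SortedNameIndex`, `prefix_upper_bound`, and the sample rows are hypothetical. It compares strings by code point, which matches the byte-wise ordering of UTF-8 text; in PostgreSQL, a btree index used for such prefix queries in a non-C locale needs the `text_pattern_ops` operator class.

```python
import bisect
import sys
import unicodedata


def normalize(text):
    """Lowercase and strip accents, like the computed index expression."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def prefix_upper_bound(prefix):
    """Exclusive upper bound for strings starting with prefix,
    e.g. 'uni' -> 'unj'. Returns None when no bound exists."""
    # Trailing maximal code points cannot be incremented, so drop them first
    trimmed = prefix.rstrip(chr(sys.maxunicode))
    if not trimmed:
        return None
    return trimmed[:-1] + chr(ord(trimmed[-1]) + 1)


class SortedNameIndex:
    """Sorted (normalized_name, id) pairs standing in for a btree index
    on a normalized-name expression (hypothetical illustration)."""

    def __init__(self, rows):
        self.keys = sorted((normalize(name), loc_id) for loc_id, name in rows)

    def search_prefix(self, query):
        low = normalize(query)
        high = prefix_upper_bound(low)
        # A 1-tuple sorts before any 2-tuple with the same first element,
        # so bisect_left finds the first key whose name is >= the bound
        start = bisect.bisect_left(self.keys, (low,))
        end = len(self.keys) if high is None else bisect.bisect_left(self.keys, (high,))
        return [loc_id for _, loc_id in self.keys[start:end]]


if __name__ == "__main__":
    rows = [(1, "United Kingdom"), (2, "United States"), (3, "Unión"),
            (4, "Uruguay"), (5, "Palma de Mallorca")]
    index = SortedNameIndex(rows)
    for typed in ["Uni", "Unit", "Unite", "Unió"]:
        print(typed, "->", index.search_prefix(typed))
```

Each extra keystroke only narrows the lower/upper range, so the work per lookup is a logarithmic descent plus the matching rows, regardless of how many million tuples the table holds.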

Any hints would be welcome. Thanks in advance.

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"
