Re: Searching for "bare" letters

From Cody Caughlan
Subject Re: Searching for "bare" letters
Date
Msg-id CCD811C9-ADF8-4073-A0B8-E5A9704AE299@gmail.com
In response to Re: Searching for "bare" letters (planas <jslozier@gmail.com>)
List pgsql-general
One approach would be to "normalize" all the text and search against that.

That is, basically convert all non-ASCII characters to their ASCII equivalents.

I've had to do this in Solr for exactly the reasons you've outlined: treat "ñ" as "n". Ditto for "ü" -> "u", "é" -> "e", etc.

This is easily done in Solr via the included ASCIIFoldingFilterFactory:


You could look at the code to see how they do the conversion and implement it.
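
To make the idea concrete, here is a minimal sketch of that kind of folding in Python. It is not Solr's implementation (as far as I know, Solr's filter uses an explicit character mapping table); this version instead relies on Unicode decomposition from the standard library's unicodedata module and drops the combining marks. The function names fold_to_ascii and bare_match are made up for illustration.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Strip diacritics: decompose each character (NFKD), then drop combining marks.

    Assumption: this is a stand-in for a real folding table. Characters that
    have no Unicode decomposition (for example "ø" or "ß") pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bare_match(needle: str, haystack: str) -> bool:
    """Case-insensitive substring test on the folded ("bare") forms of both strings."""
    return fold_to_ascii(needle).casefold() in fold_to_ascii(haystack).casefold()
```

With this, bare_match("n", "año") is true because both sides are folded before comparing. The same folding could be applied once when a row is written, storing the result in a separate searchable column, and again to each search string at query time.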

/Cody

On Oct 1, 2011, at 7:09 PM, planas wrote:

On Sun, 2011-10-02 at 01:25 +0200, Reuven M. Lerner wrote:
Hi, everyone.  I'm working on a project on PostgreSQL 9.0 (soon to be upgraded to 9.1, given that we haven't yet launched).  The project will involve numerous text fields containing English, Spanish, and Portuguese.  Some of those text fields will be searchable by the user.  That's easy enough to do; for our purposes, I was planning to use some combination of LIKE searches; the database is small enough that this doesn't take very much time, and we don't expect the number of searchable records (or columns within those records) to be all that large.

The thing is, the people running the site want searches to work on what I'm calling (for lack of a better term) "bare" letters.  That is, if the user searches for "n", then the search should also match Spanish words containing "ñ".  I'm told by Spanish-speaking members of the team that this is how they would expect searches to work.  However, when I just did a quick test using a UTF-8 encoded 9.0 database, I found that PostgreSQL didn't see the two characters as identical.  (I must say, this is the behavior that I would have expected, had the Spanish-speaking team member not said anything on the subject.)

So my question is whether I can somehow wrangle PostgreSQL into thinking that "n" and "ñ" are the same character for search purposes, or if I need to do something else -- use regexps, keep a "naked," searchable version of each column alongside the native one, or something else entirely -- to get this to work.

Could you parse the search string for the non-English characters and convert them to the appropriate English character? My skills are not that good or I would offer more details.

Any ideas?

Thanks,

Reuven


-- 
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner


--
Jay Lozier
jslozier@gmail.com
