Discussion: fulltext search stemming/spelling problems
Hi!

I'm using Postgres 8.4.3 and am trying to get stemming and correction of misspelled words working.

I have already installed the myspell dictionaries using apt-get and created Postgres dictionaries like this:

Fulltext search configuration »public.english_ispell«
Parser: »pg_catalog.default«
      Token      |            Dictionaries
-----------------+------------------------------------
 asciihword      | english_ispell,english_stem,simple
 asciiword       | english_ispell,english_stem,simple
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_ispell,english_stem,simple
 hword_asciipart | english_ispell,english_stem,simple
 hword_numpart   | simple
 hword_part      | english_ispell,english_stem,simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_ispell,english_stem,simple

But when I run, for example,

    SELECT to_tsvector('english_ispell', 'gitar')

the result is only:

    'gitar':1

Shouldn't the word be corrected to 'guitar'?

    SELECT plainto_tsquery('english_ispell','gitar')

doesn't work either:

    'gitar'

Thanks,
Corin
On Thu, 8 Apr 2010, Corin wrote:

> Hi!
>
> I'm using postgres 8.4.3 and try to get stemming/ wrong word correction
> working.
>
> I already installed the myspell dictionaries using apt-get and created
> postgres dictionaries like this:
>
> Fulltext search configuration »public.english_ispell«
> Parser: »pg_catalog.default«
>       Token      |            Dictionaries
> -----------------+------------------------------------
>  asciihword      | english_ispell,english_stem,simple
>  asciiword       | english_ispell,english_stem,simple
>  email           | simple
>  file            | simple
>  float           | simple
>  host            | simple
>  hword           | english_ispell,english_stem,simple
>  hword_asciipart | english_ispell,english_stem,simple
>  hword_numpart   | simple
>  hword_part      | english_ispell,english_stem,simple
>  int             | simple
>  numhword        | simple
>  numword         | simple
>  sfloat          | simple
>  uint            | simple
>  url             | simple
>  url_path        | simple
>  version         | simple
>  word            | english_ispell,english_stem,simple
>
> But when I do, for example, SELECT to_tsvector('english_ispell', 'gitar') the
> result is only:
> 'gitar':1
>
> Shouldn't the word be corrected to 'guitar'?

The english_ispell dictionary is a morphological dictionary! Read the docs.
Also, the simple dictionary will never be invoked, since the english_stem dictionary recognizes everything!

>
> SELECT plainto_tsquery('english_ispell','gitar') doesn't work neither:
> 'gitar'

It is better to use the ts_debug() function or ts_dict() for testing.

>
> Thanks,
> Corin
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
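Editorial sketch, not part of the original thread: a morphological dictionary maps known word forms to a base form, so a misspelling such as 'gitar' is simply an unknown word to it, and nothing in the text search configuration will propose 'guitar'. One possible way to get "did you mean" suggestions is trigram similarity from the pg_trgm contrib module. The `words` table and its contents below are hypothetical and only serve the illustration; in practice such a table would be filled from the indexed documents.

```sql
-- Assumption: the pg_trgm contrib module is available.
-- CREATE EXTENSION requires PostgreSQL 9.1 or later; on 8.4 the module
-- is installed by running its contrib SQL script instead.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical list of correctly spelled words.
CREATE TEMP TABLE words (word text);
INSERT INTO words VALUES ('guitar'), ('guitars'), ('bank'), ('banking');

-- Rank candidates by trigram similarity to the misspelled input.
-- The % operator keeps only rows above the similarity threshold.
SELECT word, similarity(word, 'gitar') AS sml
FROM words
WHERE word % 'gitar'
ORDER BY sml DESC, word;
```

The best-ranked candidate could then be passed to plainto_tsquery() in place of the original input.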
On 08.04.2010 20:15, Oleg Bartunov wrote:
> On Thu, 8 Apr 2010, Corin wrote:
>
> english_ispell dictionary is a morphology kind of dictionary ! Read docs.
> Also, simple dictionary will never invoked, since english_stem dictionary
> recognizes everything !

I'm not sure what you mean by 'morphology'. I did read the docs but couldn't find anything about 'morphology dictionaries'.

I created it myself with the following commands, after installing the ispell dictionaries using "apt-get":

    CREATE TEXT SEARCH DICTIONARY english_ispell (
        TEMPLATE = ispell,
        DictFile = system_en_us,
        AffFile = system_en_us
    );

    CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english );

    ALTER TEXT SEARCH CONFIGURATION english_ispell
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
        WITH english_ispell, english_stem;

Thanks for the hint about the simple dictionary. I'll remove it, but if it's never triggered, I guess removing it won't solve my problem either?

> Better, use ts_debug() function or ts_dict() for testing.

ts_debug shows:

    SELECT ts_debug('english_ispell','gitar');
    (asciiword,"Word, all ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
    (1 line)

ts_dict does not seem to exist, and I couldn't find it in the docs either.

> Regards,
> Oleg

Thanks,
Corin
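Editorial sketch, not part of the original thread: the composite row printed by ts_debug() is easier to read when the function is called in the FROM clause, which exposes its named output columns. The query assumes the english_ispell configuration created above.

```sql
-- "dictionaries" is the chain configured for the token type;
-- "dictionary" is the one that actually recognized the token;
-- "lexemes" is what that dictionary produced.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english_ispell', 'gitar');
```

In the output quoted above, the `dictionary` column is english_stem, which indicates that english_ispell did not recognize 'gitar' and the token fell through to the stemmer.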
On Thu, 8 Apr 2010, Corin wrote:

> On 08.04.2010 20:15, Oleg Bartunov wrote:
>> On Thu, 8 Apr 2010, Corin wrote:
>>
>> english_ispell dictionary is a morphology kind of dictionary ! Read docs.
>> Also, simple dictionary will never invoked, since english_stem dictionary
>> recognizes everything !
> I'm not sure what you mean with 'morphology'. I sure read the docs but
> couldn't find anything about 'morphology disctionaries'.

It means the following (from http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY):

    12.6.5. Ispell Dictionary

    The Ispell dictionary template supports morphological dictionaries,
    which can normalize many different linguistic forms of a word into the
    same lexeme. For example, an English Ispell dictionary can match all
    declensions and conjugations of the search term bank, e.g., banking,
    banked, banks, banks', and bank's.

You were confused by the name!

>
> I created it myself with the following commands, after I installed the ispell
> dictionaries using "apt-get":
>
> CREATE TEXT SEARCH DICTIONARY english_ispell (
> TEMPLATE = ispell,
> DictFile = system_en_us,
> AffFile = system_en_us
> );
>
> CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english
> );
> ALTER TEXT SEARCH CONFIGURATION english_ispell
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword,
> hword_part WITH english_ispell, english_stem;
>
> Thank's for the hint with simple dictionary. I'll remove it - but when it's
> never triggered, I gues it won't solve my problem neither?
>>
>> Better, use ts_debug() function or ts_dict() for testing.
> ts_debug shows:
> SELECT ts_debug('english_ispell','gitar');
> (asciiword,"Word, all
> ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
> (1 line)
>
> ts_dict does not seem to exist, I neither couldn't find it in the docs.
Sorry, I meant ts_lexize.

>>
>> Regards,
>> Oleg
> Thanks,
> Corin
>

Regards,
Oleg
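Editorial sketch, not part of the original thread: ts_lexize() tests a single dictionary in isolation, which helps to see why a token ends up where it does in a chain. Per the PostgreSQL documentation it returns an array of lexemes for a known word, an empty array for a stop word, and NULL for a word the dictionary does not know. Within a configuration, the first dictionary in the chain that returns something other than NULL handles the token. The english_ispell dictionary is the one defined earlier in the thread; english_stem is built in.

```sql
-- A word the Ispell dictionary does not know yields NULL, so the
-- token is passed on to the next dictionary in the chain.
SELECT ts_lexize('english_ispell', 'gitar');

-- The Snowball stemmer accepts any non-stop word, so it is the last
-- dictionary ever consulted; anything listed after it is never reached.
SELECT ts_lexize('english_stem', 'gitar');

-- A stop word is recognized but discarded: the result is an empty array.
SELECT ts_lexize('english_stem', 'the');
```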
On 08.04.2010 21:27, Oleg Bartunov wrote:
> it means, that (from
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY)
>
> 12.6.5. Ispell Dictionary
>
> The Ispell dictionary template supports morphological dictionaries,
> which can normalize many different linguistic forms of a word into the
> same lexeme. For example, an English Ispell dictionary can match all
> declensions and conjugations of the search term bank, e.g., banking,
> banked, banks, banks', and bank's.

I have already read this, but I don't know how to solve my problem with this information.

    SELECT ts_lexize('english_ispell','guitar');
    {guitar}
    (1 line)

    SELECT ts_lexize('english_ispell','bank');
    {bank}
    (1 line)

    SELECT ts_debug('english_ispell','bank');
    (asciiword,"Word, all ASCII",bank,"{english_ispell,english_stem}",english_ispell,{bank})
    (1 line)

    SELECT plainto_tsquery('english_ispell','bank');
    'bank'
    (1 line)

> Regards,
> Oleg

It would be very nice if you (or anyone else) could provide me with concrete instructions or a howto. What can I do to find the error in my setup? What output should I expect from the above commands if everything worked correctly?

Thanks,
Corin
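Editorial sketch, not part of the original thread: the queries above only test base forms, which any dictionary returns unchanged, so they cannot show whether the affix file is being applied. A more telling check, assuming the english_ispell dictionary defined earlier, is to feed it inflected forms next to a misspelling and compare with the built-in stemmer. No expected output is shown because the exact lexemes depend on the installed dictionary and affix files.

```sql
-- If the Ispell dictionary works as a morphological dictionary, the
-- inflected forms should come back reduced to a base form, while the
-- misspelling should come back as NULL in the "ispell" column.
SELECT word,
       ts_lexize('english_ispell', word) AS ispell,
       ts_lexize('english_stem', word)   AS stem
FROM (VALUES ('banking'), ('banked'), ('banks'), ('gitar')) AS t(word);
```

A NULL for 'gitar' in the `ispell` column would be consistent with a working setup: the dictionary normalizes forms of words it knows and leaves unknown words to the next dictionary in the chain.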