Discussion: [tsearch2] Problem with case sensitivity (or with creating own dictionary)

[tsearch2] Problem with case sensitivity (or with creating own dictionary)

From: Krzysztof xaru Rajda
Date:
Hello,

I have run into the following problem. My goal is to extract links from
text using tsearch2. Everything worked well until I got some YouTube
links: they contain both lowercase and uppercase letters, and tsearch
lowercases everything (from http://youtube.com/Y6dsHDX I got
http://youtube.com/y6dshdx, which does not work). I went through the
PostgreSQL docs, and it seems that each of the default dictionaries
(simple, ispell, snowball) lowercases lexemes during normalization, and
there is no option to disable this.

I started looking for tutorials on how to create my own dictionary or
modify an existing one (I mean a dictionary like snowball, with my own
source code, not just a dictionary created by a 'CREATE DICTIONARY...'
query), but everything I found is badly out of date and uses mechanisms
that are deprecated in the latest version of Postgres (I'm working on
v9.2), such as 'contrib/gendict' here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/custom-dict.html


So now I have no idea what to do about my case sensitivity problem... Is
there any other way to overcome it, apart from creating my own dictionary?
If not, how do I create one on Postgres 9.2?

Regards,
xaru
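The lowercasing described in this message can be pictured with a small toy model in Python (it is not PostgreSQL code). It separates the two stages involved: the parser splits text into typed tokens and keeps their original case, and a dictionary then normalizes each token into lexemes. The `(alias, token, lexemes)` rows loosely mirror the columns of the same names returned by `ts_debug()`. The parser output and the `simple_like_dictionary` function are hypothetical simplifications that only imitate the lowercasing reported above.

```python
from typing import List, Tuple


def simple_like_dictionary(token: str) -> List[str]:
    """Toy stand-in for a built-in dictionary: it lowercases every token."""
    return [token.lower()]


def debug_rows(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str, List[str]]]:
    """Return (alias, token, lexemes) rows, loosely mimicking ts_debug() columns."""
    return [(alias, tok, simple_like_dictionary(tok)) for alias, tok in tokens]


# Hypothetical, simplified parser output: the token text keeps its original case.
parsed = [("url", "youtube.com/Y6dsHDX")]

for alias, token, lexemes in debug_rows(parsed):
    print(alias, token, lexemes)  # url youtube.com/Y6dsHDX ['youtube.com/y6dshdx']
```

In this model the original case survives in the token column and is lost only in the lexemes, so the dictionary stage is the one that would have to change.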


Re: [tsearch2] Problem with case sensitivity (or with creating own dictionary)

From: Oleg Bartunov
Date:
Please take a look at contrib/dict_int and create your own dict_noop.
It should be easy. I think you could document it and share it
with others (wiki.postgresql.org?), since other people have been
interested in a no-op dictionary. Also, don't forget to modify
your text search configuration; ts_debug() will help you.

Regards,
Oleg

On Sat, 3 Aug 2013, Krzysztof xaru Rajda wrote:

> [...]

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
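Oleg's suggestion has two parts: a dictionary that returns each token unchanged, and a configuration change so that URL tokens are routed to it. The Python sketch below is a toy model of that idea, not PostgreSQL code. `noop` stands in for the hypothetical dict_noop, and the parser output is a simplified assumption. The lookup loop roughly follows the documented rule that the dictionaries mapped to a token type are tried in order, and the first one that recognizes the token (returns something other than NULL, modeled here as `None`) determines its lexemes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A dictionary returns None if it does not recognize the token,
# [] if the token is a stop word, or a list of lexemes otherwise.
Dictionary = Callable[[str], Optional[List[str]]]


def simple_like(token: str) -> Optional[List[str]]:
    """Toy lowercasing dictionary (imitates the behavior described in the thread)."""
    return [token.lower()]


def noop(token: str) -> Optional[List[str]]:
    """Hypothetical dict_noop: returns the token unchanged as its only lexeme."""
    return [token]


def to_lexemes(tokens: List[Tuple[str, str]],
               config: Dict[str, List[Dictionary]]) -> List[str]:
    """Run each token through the dictionaries mapped to its type, in order;
    the first dictionary that recognizes the token decides its lexemes."""
    out: List[str] = []
    for alias, token in tokens:
        for dictionary in config.get(alias, []):  # unmapped token types are dropped
            result = dictionary(token)
            if result is not None:
                out.extend(result)
                break
    return out


# Hypothetical, simplified parser output.
parsed = [("asciiword", "Watch"), ("url", "youtube.com/Y6dsHDX")]

default_config = {"asciiword": [simple_like], "url": [simple_like]}
# Same idea as remapping the url token type to the new dictionary.
custom_config = {"asciiword": [simple_like], "url": [noop]}

print(to_lexemes(parsed, default_config))  # ['watch', 'youtube.com/y6dshdx']
print(to_lexemes(parsed, custom_config))   # ['watch', 'youtube.com/Y6dsHDX']
```

In PostgreSQL itself, the routing step would be done with `ALTER TEXT SEARCH CONFIGURATION my_config ALTER MAPPING FOR url, url_path WITH dict_noop`, where both `my_config` and `dict_noop` are hypothetical names. `ts_debug()` then shows which dictionary handled each token.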


Re: [tsearch2] Problem with case sensitivity (or with creating own dictionary)

From: Krzysztof xaru Rajda
Date:
OK, just to be sure I understand everything: first I should install the
postgresql-contrib package. Then a contrib/dict_int directory with the
dict_int source code in it will appear, and I can modify that code. Then
I'll be able to install the modified dictionary, and it will work just
like the ispell or snowball dictionaries. Finally, if everything goes
well, I'll share a short tutorial on the wiki :)

Am I right, or is it not that easy?

Regards,
xaru




On 2013-08-05 18:37, Oleg Bartunov wrote:
> [...]