Обсуждение: FTS wildcard and custom ispell dictionary problem

Поиск
Список
Период
Сортировка

FTS wildcard and custom ispell dictionary problem

От
darklow
Дата:
Hello,

I am using PostgreSQL 8.4 full text search in following way:

Custom FTS configuration called "dc2" with these dictionaries in
following order for asciihword token: latvian_ispell, english_stem,
russian_stem

Latvian ispell dictionary contains words with different endings but
same meaning (latvian langiage specifics, plural words, etc)
The problem starts when using wildcard :* to_tsquery syntax.

For example. If i look for the word "kriev" i am automatically adding
wildcard using syntax:  to_tsquery('dc2', 'kriev:*');

By searching kriev:* FTS founds word "krievs" in latvian_ispell
dictionary which is totally ok.

 SELECT * from ts_debug('dc2', 'kriev:*');
  alias   |   description   | token | dictionaries
  |   dictionary   | lexemes

-----------+-----------------+-------+----------------------------------------------------------+----------------+----------
 asciiword | Word, all ASCII | kriev |
{latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
 blank     | Space symbols   | :*    | {}

If understand correctly now database uses not kriev:* but krievs:* for
following queries.

And here is the problem, data contains also word: Krievija, and in
this case search doesn't find it, because now it looks for Krievs:*
and not Kriev:* anymore.

Is there any solution anone could suggest to get results by both
criterias - kriev:* (starting query) and krievs:* (founded in ispell
dict).

Only idea i had is to somehow combine two tsqueries one -
to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
the search looks for both - kriev:* and krievs:* but anyway didnt
figured out any syntax i could use :(

Thanks

Re: FTS wildcard and custom ispell dictionary problem

От
Oleg Bartunov
Дата:
On Thu, 19 Aug 2010, darklow wrote:

> Hello,
>
> I am using PostgreSQL 8.4 full text search in following way:
>
> Custom FTS configuration called "dc2" with these dictionaries in
> following order for asciihword token: latvian_ispell, english_stem,
> russian_stem

russian_stem will never called ! Sorry, I have no time right now, will
be back later.

>
> Latvian ispell dictionary contains words with different endings but
> same meaning (latvian langiage specifics, plural words, etc)
> The problem starts when using wildcard :* to_tsquery syntax.
>
> For example. If i look for the word "kriev" i am automatically adding
> wildcard using syntax:  to_tsquery('dc2', 'kriev:*');
>
> By searching kriev:* FTS founds word "krievs" in latvian_ispell
> dictionary which is totally ok.
>
> SELECT * from ts_debug('dc2', 'kriev:*');
>  alias   |   description   | token | dictionaries
>  |   dictionary   | lexemes
>
-----------+-----------------+-------+----------------------------------------------------------+----------------+----------
> asciiword | Word, all ASCII | kriev |
> {latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
> blank     | Space symbols   | :*    | {}
>
> If understand correctly now database uses not kriev:* but krievs:* for
> following queries.
>
> And here is the problem, data contains also word: Krievija, and in
> this case search doesn't find it, because now it looks for Krievs:*
> and not Kriev:* anymore.
>
> Is there any solution anone could suggest to get results by both
> criterias - kriev:* (starting query) and krievs:* (founded in ispell
> dict).
>
> Only idea i had is to somehow combine two tsqueries one -
> to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
> the search looks for both - kriev:* and krievs:* but anyway didnt
> figured out any syntax i could use :(
>
> Thanks
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: FTS wildcard and custom ispell dictionary problem

От
Oleg Bartunov
Дата:
On Thu, 19 Aug 2010, darklow wrote:

> Hello,
>
> I am using PostgreSQL 8.4 full text search in following way:
>
> Custom FTS configuration called "dc2" with these dictionaries in
> following order for asciihword token: latvian_ispell, english_stem,
> russian_stem
>
> Latvian ispell dictionary contains words with different endings but
> same meaning (latvian langiage specifics, plural words, etc)
> The problem starts when using wildcard :* to_tsquery syntax.
>
> For example. If i look for the word "kriev" i am automatically adding
> wildcard using syntax:  to_tsquery('dc2', 'kriev:*');
>
> By searching kriev:* FTS founds word "krievs" in latvian_ispell
> dictionary which is totally ok.
>
> SELECT * from ts_debug('dc2', 'kriev:*');
>  alias   |   description   | token | dictionaries
>  |   dictionary   | lexemes
>
-----------+-----------------+-------+----------------------------------------------------------+----------------+----------
> asciiword | Word, all ASCII | kriev |
> {latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
> blank     | Space symbols   | :*    | {}
>
> If understand correctly now database uses not kriev:* but krievs:* for
> following queries.
>
> And here is the problem, data contains also word: Krievija, and in
> this case search doesn't find it, because now it looks for Krievs:*
> and not Kriev:* anymore.
>
> Is there any solution anone could suggest to get results by both
> criterias - kriev:* (starting query) and krievs:* (founded in ispell
> dict).
>
> Only idea i had is to somehow combine two tsqueries one -
> to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
> the search looks for both - kriev:* and krievs:* but anyway didnt
> figured out any syntax i could use :(

select to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*');

>
> Thanks
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: FTS wildcard and custom ispell dictionary problem

От
darklow
Дата:
Great thanks. This function idea works perfectly!

create function xtsquery (tsquery,tsquery) returns tsquery as $$
select $1 || $2 $$ language sql;
SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
xtsquery(to_tsquery('dc2', 'kriev:*'), to_tsquery('english',
'kriev:*')) query where fts_tsv @@ query;



2010/8/19 Oleg Bartunov <oleg@sai.msu.su>:
> Write function like:
>
> test=# create function xtsquery (tsquery,tsquery) returns tsquery as $$
> select $1 || $2 $$ language sql;
>
> test=# select   query from   xtsquery(to_tsquery('aad:*'),
> to_tsquery('aads:*'))  as query where 'aad:1 aadsj:2'::tsvector @@  query;
>       query -------------------
>  'aad':* | 'aad':*
> (1 row)
>
> Oleg
>
> PS. Please, write back to mailing list, so other people will read it. I'm
> too busy to help individuals :)
>
> On Thu, 19 Aug 2010, darklow wrote:
>
>> If i use this syntax, i cannot give tsquery name "query" to use in for
>> example ts_headline and ts_rank_cd functions! :(
>> This means i need to define ts_query concatenation every time, and i
>> guess this can cause performance issues.
>>
>> This works, but concatenation and to_tsquery is called every time i
>> need it (up to 3 times if i use also ts_rank_cd and ts_headline
>> functions):
>> SELECT fts_title, ts_headline(fts_title, (to_tsquery('dc2', 'kriev:*')
>> || to_tsquery('english', 'kriev:*'))) from object_fts ot where fts_tsv
>> @@ (to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*'));
>>
>> These unfortunately doesn't:
>> SELECT fts_title, ts_headline(fts_title, query) from object_fts ot
>> where fts_tsv @@ (to_tsquery('dc2', 'kriev:*') ||
>> to_tsquery('english', 'kriev:*'));
>>
>> ERROR:  column "query" does not exist
>> SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
>> (to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*'))
>> query where fts_tsv @@ query;
>>
>> are there any solutions for this?
>> Thanks.
>>
>> 2010/8/19 Oleg Bartunov <oleg@sai.msu.su>:
>>>
>>> test=# select 'aad:1 aadsj:2'::tsvector @@ (to_tsquery('aad:*') ||
>>>  to_tsquery('aads:*'));
>>>  ?column? ----------
>>>  t
>>> (1 row)
>>>
>>> On Thu, 19 Aug 2010, darklow wrote:
>>>
>>>> Thanks this solution could work for me. However i can't get to work
>>>> syntax in more complex query, for example using @@ syntax:
>>>>
>>>> This one works:
>>>> SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
>>>> to_tsquery('dc2', 'kriev:*') query where query @@ fts_tsv;
>>>>
>>>> With concatenating doesn't work:
>>>> SELECT fts_title, ts_headline(fts_title, query) from object_fts ot,
>>>> (to_tsquery('dc2', 'kriev:*') || to_tsquery('english', 'kriev:*'))
>>>> query where query @@ fts_tsv;
>>>> ERROR:  syntax error at or near "||"
>>>>
>>>> Any ideas what is correct syntax of this?
>>>> Thanks.
>>>>
>>>>
>>>> On Thu, Aug 19, 2010 at 12:43 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>>>>
>>>>> On Thu, 19 Aug 2010, darklow wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am using PostgreSQL 8.4 full text search in following way:
>>>>>>
>>>>>> Custom FTS configuration called "dc2" with these dictionaries in
>>>>>> following order for asciihword token: latvian_ispell, english_stem,
>>>>>> russian_stem
>>>>>>
>>>>>> Latvian ispell dictionary contains words with different endings but
>>>>>> same meaning (latvian langiage specifics, plural words, etc)
>>>>>> The problem starts when using wildcard :* to_tsquery syntax.
>>>>>>
>>>>>> For example. If i look for the word "kriev" i am automatically adding
>>>>>> wildcard using syntax:  to_tsquery('dc2', 'kriev:*');
>>>>>>
>>>>>> By searching kriev:* FTS founds word "krievs" in latvian_ispell
>>>>>> dictionary which is totally ok.
>>>>>>
>>>>>> SELECT * from ts_debug('dc2', 'kriev:*');
>>>>>>  alias   |   description   | token | dictionaries
>>>>>>  |   dictionary   | lexemes
>>>>>>
>>>>>>
>>>>>>
>>>>>>
-----------+-----------------+-------+----------------------------------------------------------+----------------+----------
>>>>>> asciiword | Word, all ASCII | kriev |
>>>>>> {latvian_ispell,english_stem,russian_stem} | latvian_ispell | {krievs}
>>>>>> blank     | Space symbols   | :*    | {}
>>>>>>
>>>>>> If understand correctly now database uses not kriev:* but krievs:* for
>>>>>> following queries.
>>>>>>
>>>>>> And here is the problem, data contains also word: Krievija, and in
>>>>>> this case search doesn't find it, because now it looks for Krievs:*
>>>>>> and not Kriev:* anymore.
>>>>>>
>>>>>> Is there any solution anone could suggest to get results by both
>>>>>> criterias - kriev:* (starting query) and krievs:* (founded in ispell
>>>>>> dict).
>>>>>>
>>>>>> Only idea i had is to somehow combine two tsqueries one -
>>>>>> to_tsquery('dc2', 'kriev:*') and to_tsquery('english', 'kriev:*'); so
>>>>>> the search looks for both - kriev:* and krievs:* but anyway didnt
>>>>>> figured out any syntax i could use :(
>>>>>
>>>>> select to_tsquery('dc2', 'kriev:*') || to_tsquery('english',
>>>>> 'kriev:*');
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>
>>>>>        Regards,
>>>>>                Oleg
>>>>> _____________________________________________________________
>>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>>>
>>>>
>>>
>>>        Regards,
>>>                Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>
>        Regards,
>                Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83