Re: contrib/tsearch

From: Oleg Bartunov
Subject: Re: contrib/tsearch
Date:
Msg-id: Pine.GSO.4.44.0209051313210.3967-100000@ra.sai.msu.su
In reply to: Re: contrib/tsearch  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses: Re: contrib/tsearch  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
List: pgsql-hackers
On Thu, 5 Sep 2002, Christopher Kings-Lynne wrote:

> Hmmm...thinking about it, maybe 'herring' is being reduced to 'her' after
> the stemming process and hence is thought to be a stopword?  This is a bug,
> but how should it be fixed?
>

How to use stop words is a difficult question. We'll see what we can
do. Porter's stemming algorithm probably has a problem here:
'herring' -> 'her'~'ring'
(I have a demo of an English-Russian stemmer, so you can play with it)
http://intra.astronet.ru/db/lingua/snowball/
I'll ask Martin Porter whether this could be an error in the stemmer.
But I think the problem is in the concept of using stop words.
Should we check for stop words before stemming or after?
In the first case we have to collect all forms of the stop words, which is
doable but difficult to maintain; in the latter we get the current problem.
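
To make the two orderings concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: toy_stem is a made-up rule that
merely imitates the 'herring' -> 'her' reduction (it is not Porter's
algorithm), and STOP_WORDS and both function names are hypothetical, not
tsearch code.

```python
# Toy illustration only: NOT Porter's algorithm and NOT tsearch code.
from typing import Optional

# Hypothetical, tiny stop word list (base forms only).
STOP_WORDS = {"her", "his", "him", "the"}


def toy_stem(word: str) -> str:
    """Strip a trailing 'ing' and undouble a final consonant (toy rule)."""
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # 'herr' -> 'her': collapse a doubled final letter unless it is l, s or z.
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "lsz":
            stem = stem[:-1]
        return stem
    return word


def stop_after(word: str) -> Optional[str]:
    """Stem first, then drop the stem if it is a stop word."""
    stem = toy_stem(word.lower())
    return None if stem in STOP_WORDS else stem


def stop_before(word: str) -> Optional[str]:
    """Drop stop words first, then stem whatever is left."""
    lowered = word.lower()
    return None if lowered in STOP_WORDS else toy_stem(lowered)
```

With these toy rules, stop_after("herring") returns None because the stem
collides with the stop word 'her', while stop_before("herring") keeps the word
and stop_before("her") still drops the literal stop word. The price of the
second ordering is that the stop list must contain every surface form.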

It's time for beta1 and I'm not sure we can work on this issue
right now, but I feel a lot of pressure from tsearch users :-)
If people want to help us, why not work on a stop word list that includes
all forms? In any case, we are not native English speakers, so don't expect
us to create a more or less decent list. The programming changes are trivial;
for the moment we'll probably end up just using a compile-time option.
As always, your patches are welcome!

btw, you can test your queries much more easily:

list=# select 'herring'::mquery_txt;
ERROR:  Your query contained only stopword(s), ignored
list=# select 'herring'::query_txt;
 query_txt
-----------
 'herring'
(1 row)




> Although, tests don't support that:
>
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'himring';
>  food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'hisring';
>  food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
>
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'hising';
>  food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
>
> usa=# select food_id, brand,description,ftiidx from food_foods where ftiidx
> ## 'himing';
>  food_id | brand | description | ftiidx
> ---------+-------+-------------+--------
> (0 rows)
>
> All work...?
>
> Chris
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Christopher
> > Kings-Lynne
> > Sent: Thursday, 5 September 2002 2:36 PM
> > To: Hackers
> > Subject: [HACKERS] contrib/tsearch
> >
> >
> > Hi Oleg/Teodor,
> >
> > I'm sorry to keep posting bugs without patches, but I'm just
> > hoping you guys
> > know the answer faster than I...I know you're busy.
> >
> > What does tsearch have against the word 'herring' (as in the
> > fish).  Why is
> > it considered a stopword?
> >
> > Attached is example queries...
> >
> > Chris
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




