Discussion: Tsearch limitations


Tsearch limitations

From
Mat
Date:
Can Tsearch be used to return substring matches?

For example:

Text to search: Hi my email addres is psql-mail@freeuk.com

Would the query "psql" match the email address?

Would the query "mail" also match?

Would the query "reeu" also match?

Or is Tsearch not suitable for this type of query? Should I use FTI
instead?

Thanks.


Re: Tsearch limitations

From
Oleg Bartunov
Date:
Mat,

There are several functions you can use to see how the text is parsed (please read the documentation):

apod=# select to_tsvector('Hi my email addres is psql-mail@freeuk.com');
                    to_tsvector
----------------------------------------------------
 'hi':1 'addr':4 'email':3 'psql-mail@freeuk.com':6
(1 row)

Or, even better:

apod=# select * from ts_debug('Hi my email addres is psql-mail@freeuk.com');
     ts_name     | tok_type | description |        token         | dict_name |        tsvector
-----------------+----------+-------------+----------------------+-----------+------------------------
 default_russian | lword    | Latin word  | Hi                   | {en_stem} | 'hi'
 default_russian | lword    | Latin word  | my                   | {en_stem} |
 default_russian | lword    | Latin word  | email                | {en_stem} | 'email'
 default_russian | lword    | Latin word  | addres               | {en_stem} | 'addr'
 default_russian | lword    | Latin word  | is                   | {en_stem} |
 default_russian | email    | Email       | psql-mail@freeuk.com | {simple}  | 'psql-mail@freeuk.com'
(6 rows)

You may write your own parser or preprocess text before tsearch.
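
One way to read the preprocessing suggestion: before the text reaches to_tsvector, expand each email-like token into its component parts so that they get indexed as separate words. The sketch below is only an illustration in Python, not part of tsearch; the helper name `expand_emails`, the regular expression, and the splitting rule are all assumptions.

```python
import re

# Hypothetical helper, not part of tsearch: append the pieces of each
# email-like token so a word-based indexer sees them as separate words.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def expand_emails(text: str) -> str:
    extra = []
    for address in EMAIL_RE.findall(text):
        # Split on the punctuation that glues the address together.
        extra.extend(part for part in re.split(r"[^\w]+", address) if part)
    return text if not extra else text + " " + " ".join(extra)

print(expand_emails("Hi my email addres is psql-mail@freeuk.com"))
# Hi my email addres is psql-mail@freeuk.com psql mail freeuk com
```

With text expanded this way, queries for "psql" and "mail" would have whole words to match. A query for "reeu" still would not, because it is not a whole word in any of the pieces.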


    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Tsearch limitations

From
psql-mail@freeuk.com
Date:
Oleg,

I understand (I think) how the parser breaks up the input into words
and builds tsvectors.

And I understand how to do queries as described in the documentation.
(I have read it!)

SELECT * FROM vectors WHERE vector @@ to_tsquery('(leads|forks) & ! crawl')

But I haven't seen any mention of whether, if I add the word:

cathedral

there is any query that will match if I search for "thed".

The documentation seems to say that this cannot be done, but I'd just
like to check. Tsearch2 does everything I want except this.

"remember that the search operator @@ finds only exact matches between
query lexemes and vector lexemes - if they are not exactly the same
string, they will not be considered a match"



Re: Tsearch limitations

From
Oleg Bartunov
Date:
On Mon, 11 Aug 2003 psql-mail@freeuk.com wrote:

> Oleg,
>
> I understand (I think) how the parser breaks up the input into words
> and builds tsvectors.
>
> And I understand how to do queries as described in the documentation.
> (I have read it!)
>
> SELECT * FROM vectors WHERE vector @@ to_tsquery('(leads|forks) & ! crawl')
>
> But I haven't seen any mention of whether, if I add the word:
>
> cathedral
>
> there is any query that will match if I search for "thed".

No, tsearch2 is a word-oriented search. It doesn't support substring
search.
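
To make the distinction concrete, here is a toy model in Python (not tsearch2 code). Word-oriented search looks a query word up in the set of indexed words, while substring search scans the raw text. Stemming and stop words are left out, and `index_words` and `word_match` are hypothetical names.

```python
def index_words(text: str) -> set:
    # Toy stand-in for a parser plus dictionary: lowercase, split on spaces.
    return set(text.lower().split())

def word_match(index: set, query: str) -> bool:
    # Word-oriented search: the query must equal an indexed word exactly.
    return query.lower() in index

text = "The cathedral leads to the river"
index = index_words(text)
print(word_match(index, "cathedral"))  # True
print(word_match(index, "thed"))       # False
print("thed" in text)                  # True (plain substring scan)
```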


    Regards,
        Oleg

Re: Tsearch limitations

From
Mike Benoit
Date:
Oleg,

    Is it possible to have Tsearch support Soundex or Levenshtein distance
(http://ca3.php.net/manual/en/function.levenshtein.php) when searching?

I've never used Tsearch before, but I assume this might just be a matter
of writing a different parser to add soundex'd versions of words to the
index, and then modifying the query functions to search on both versions
of the word?


--
Best Regards,

Mike Benoit
NetNation Communications Inc.
Systems Engineer
Tel: 604-684-6892 or 888-983-6600
 ---------------------------------------

 Disclaimer: Opinions expressed here are my own and not
 necessarily those of my employer


Re: Tsearch limitations

From
Teodor Sigaev
Date:

Mike Benoit wrote:
> Oleg,
>
>     Is it possible to have Tsearch support soundex, or levenshtein
> (http://ca3.php.net/manual/en/function.levenshtein.php) when searching?
Sorry, no.


The function that calculates Levenshtein distance is defined as
int levenshtein ( string str1, string str2)

It compares two strings, while a dictionary receives only a single word, so it can't be used as a dictionary. :(

The index stores only a signature of each lexized word, so we can't compute the
distance between a query word and a signature.
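
For reference, a minimal Python sketch of Levenshtein distance (the standard dynamic-programming form, not the PHP function linked in the quoted message). It illustrates why both strings are needed at once: there is no per-word value that could be computed ahead of time and stored in an index.

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] is the edit distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

print(levenshtein("addres", "address"))  # 1
print(levenshtein("kitten", "sitting"))  # 3
```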

>
> I've never used Tsearch before, but I assume this might just be a matter
> of writing a different parser to add soundex'd versions of words to the
> index, then modify the query functions to search on both versions of the
> word?

To work with tsearch2, a dictionary must return the "canonical" form of the input lexemes
(usually the infinitive). If you can write a function that corrects some mistakes in
a word, then you can use it in tsearch.
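
A sketch of what such a function could look like, written in Python rather than as a real tsearch2 dictionary. It takes one word and returns one canonical form, which is the shape described above. The correction table and the name `normalize` are made up for illustration.

```python
# Hypothetical correction table: known misspellings mapped to canonical words.
CORRECTIONS = {
    "addres": "address",
    "adress": "address",
    "cathedrall": "cathedral",
}

def normalize(word: str) -> str:
    # One word in, one canonical form out; unknown words pass through lowercased.
    lowered = word.lower()
    return CORRECTIONS.get(lowered, lowered)

# Applying the same function to the indexed text and to the query
# lets two different misspellings meet on the same canonical word.
indexed = {normalize(w) for w in "My email addres".split()}
print(normalize("Adress") in indexed)  # True
```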




--
Teodor Sigaev                                  E-mail: teodor@sigaev.ru