Thread: Ellipses around result fragment of ts_headline

Ellipses around result fragment of ts_headline

From
Asher Snyder
Date:
It would be very useful if ts_headline had an option to add ellipses
before or after a result fragment based on the position of the fragment
in the source document. For instance, ts_headline(doc, query) correctly
returns a fragment with the matching words highlighted; however, there's
no easy way to determine whether the returned fragment is at the
beginning or end of the original doc, and to add the necessary ellipses.

Searches such as the one on postgresql.org ALWAYS add ellipses before or
after the fragment, regardless of whether ellipses are warranted. In my
opinion, always adding ellipses to the fragment is deceptive to the user.
In many of my search results the fragment is at the beginning of the doc,
and always seeing ellipses would confuse the user. So you can see how the
feature described above would improve the accuracy of the search result
fragment.






Re: Ellipses around result fragment of ts_headline

From
Sushant Sinha
Date:
I think we currently do that. We add ellipses only when we encounter a
new fragment. So there should not be ellipses if we are at the end of
the document or if that is the first fragment (includes the beginning of
the document). Here is the code in generateHeadline, ts_parse.c that
adds the ellipses:

            if (!infrag)
            {
                /* start of a new fragment */
                infrag = 1;
                numfragments ++;
                /* add a fragment delimiter if this is after the first one */
                if (numfragments > 1)
                {
                    memcpy(ptr, prs->fragdelim, prs->fragdelimlen);
                    ptr += prs->fragdelimlen;
                }
            }

It is possible that there is a bug that needs to be fixed. Can you show
me an example where you found that?

-Sushant.
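
As a rough illustration of the rule described above (a delimiter is written
only between fragments, never before the first or after the last), here is a
minimal, self-contained C sketch. It is not the PostgreSQL implementation:
join_fragments and its parameters are hypothetical names, and the fragments
are assumed to be plain NUL-terminated strings.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code.
 *
 * Join text fragments into one headline buffer, writing the delimiter
 * only before the second and later fragments. Returns the number of
 * bytes written (excluding the terminating NUL), or -1 if the output
 * buffer is too small.
 */
static int
join_fragments(char *out, size_t outlen,
               const char *const *frags, int nfrags,
               const char *fragdelim)
{
    size_t      delimlen = strlen(fragdelim);
    char       *ptr = out;
    int         numfragments = 0;

    if (outlen == 0)
        return -1;

    for (int i = 0; i < nfrags; i++)
    {
        size_t      fraglen = strlen(frags[i]);
        size_t      need = fraglen + (numfragments > 0 ? delimlen : 0);

        /* keep one byte free for the terminating NUL */
        if ((size_t) (ptr - out) + need + 1 > outlen)
            return -1;

        /* start of a new fragment */
        numfragments++;

        /* add a fragment delimiter if this is after the first one */
        if (numfragments > 1)
        {
            memcpy(ptr, fragdelim, delimlen);
            ptr += delimlen;
        }

        memcpy(ptr, frags[i], fraglen);
        ptr += fraglen;
    }

    *ptr = '\0';
    return (int) (ptr - out);
}
```

With the fragments "alpha", "beta" and "gamma" and the delimiter " ... ",
this produces "alpha ... beta ... gamma"; a single fragment comes back
unchanged, with no delimiter at either end.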




On Sat, 2009-02-14 at 15:13 -0500, Asher Snyder wrote:
> It would be very useful if ts_headline had an option to add ellipses
> before or after a result fragment based on the position of the fragment
> in the source document. For instance, ts_headline(doc, query) correctly
> returns a fragment with the matching words highlighted; however, there's
> no easy way to determine whether the returned fragment is at the
> beginning or end of the original doc, and to add the necessary ellipses.
>
> Searches such as the one on postgresql.org ALWAYS add ellipses before or
> after the fragment, regardless of whether ellipses are warranted. In my
> opinion, always adding ellipses to the fragment is deceptive to the user.
> In many of my search results the fragment is at the beginning of the doc,
> and always seeing ellipses would confuse the user. So you can see how the
> feature described above would improve the accuracy of the search result
> fragment.



Re: Ellipses around result fragment of ts_headline

From
Asher Snyder
Date:
Interesting. It could be that you already do it, but the documentation
makes no reference to a fragment delimiter, so there's no way that I can
see to add one. The documentation for ts_headline only lists StartSel,
StopSel, MaxWords, MinWords, ShortWord, and HighlightAll; there appears to
be no option for a fragment delimiter.

In my case I do:

SELECT v1.id, v1.type_id, v1.title,
       ts_headline(v1.copy, query, 'MinWords = 17') AS copy,
       ts_rank(v1.text_search, query) AS rank
FROM (SELECT b1.*,
             (setweight(to_tsvector(coalesce(b1.title,'')), 'A') ||
              setweight(to_tsvector(coalesce(b1.copy,'')), 'B')) AS text_search
      FROM search.v_searchable_content b1) v1,
     plainto_tsquery($1) query
WHERE ($2 IS NULL OR (type_id = ANY($2))) AND query @@ v1.text_search
ORDER BY rank DESC, title

Now, this use of ts_headline correctly returns highlighted fragments for
my search results, but there is no fragment delimiter in the headline.
Some have suggested changing ts_headline(v1.copy, query, 'MinWords = 17')
to '...' || ts_headline(v1.copy, query, 'MinWords = 17') || '...', but as
you can clearly see the ellipses would then always appear, regardless of
where the fragment sits in the document. I hope that you're correct and
that it is implemented, just not documented.
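
The position-aware behaviour being asked for can be sketched as follows.
This is only an illustration under stated assumptions: add_ellipses is a
hypothetical helper, not part of PostgreSQL, and it assumes the caller
already knows the byte offsets [start, end) of the fragment within the
document, which ts_headline does not report.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code.
 *
 * Copy doc[start, end) into out, adding "..." in front only when the
 * fragment does not begin at the start of the document, and "..." at
 * the end only when it stops short of the end of the document.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 on bad offsets or a too-small buffer.
 */
static int
add_ellipses(char *out, size_t outlen,
             const char *doc, size_t start, size_t end)
{
    static const char ellipsis[] = "...";
    const size_t elen = sizeof ellipsis - 1;
    size_t      doclen = strlen(doc);
    size_t      fraglen;
    size_t      need;
    int         lead;
    int         trail;
    char       *ptr = out;

    if (start > end || end > doclen)
        return -1;

    fraglen = end - start;
    lead = (start > 0);         /* text was cut off before the fragment */
    trail = (end < doclen);     /* text was cut off after the fragment */

    need = fraglen + (size_t) (lead + trail) * elen + 1;
    if (need > outlen)
        return -1;

    if (lead)
    {
        memcpy(ptr, ellipsis, elen);
        ptr += elen;
    }

    memcpy(ptr, doc + start, fraglen);
    ptr += fraglen;

    if (trail)
    {
        memcpy(ptr, ellipsis, elen);
        ptr += elen;
    }

    *ptr = '\0';
    return (int) (ptr - out);
}
```

Unlike '...' || ts_headline(...) || '...', a fragment that starts at offset
0 gets no leading ellipsis here, and one that runs to the end of the
document gets no trailing ellipsis.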

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354@gmail.com]
>Sent: Saturday, February 14, 2009 4:07 PM
>To: Asher Snyder
>Cc: pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] Ellipses around result fragment of ts_headline
>
>I think we currently do that. We add ellipses only when we encounter a
>new fragment. So there should not be ellipses if we are at the end of
>the document or if that is the first fragment (includes the beginning of
>the document). Here is the code in generateHeadline, ts_parse.c that
>adds the ellipses:
>
>            if (!infrag)
>            {
>
>                /* start of a new fragment */
>                infrag = 1;
>                numfragments ++;
>                /* add a fragment delimiter if this is after the first one */
>                if (numfragments > 1)
>                {
>                    memcpy(ptr, prs->fragdelim, prs->fragdelimlen);
>                    ptr += prs->fragdelimlen;
>                }
>
>            }
>
>It is possible that there is a bug that needs to be fixed. Can you show
>me an example where you found that?
>
>-Sushant.




Re: Ellipses around result fragment of ts_headline

From
Tom Lane
Date:
Sushant Sinha <sushant354@gmail.com> writes:
> I think we currently do that.

... since about four months ago.

2008-10-17 14:05  teodor

    * doc/src/sgml/textsearch.sgml, src/backend/tsearch/ts_parse.c,
    src/backend/tsearch/wparser_def.c, src/include/tsearch/ts_public.h,
    src/test/regress/expected/tsearch.out,
    src/test/regress/sql/tsearch.sql: Improve headline generation. Now
    headline can contain several fragments a-la Google.

    Sushant Sinha <sushant354@gmail.com>

        regards, tom lane


Re: Ellipses around result fragment of ts_headline

From
Sushant Sinha
Date:
The documentation in 8.4dev has information on FragmentDelimiter
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html

If you do not specify MaxFragments > 0, then the default headline
generator kicks in. The default headline generator does not have any
fragment delimiter. So it is correct that you will not see any
delimiter.

I think you are looking for the default headline generator to add
ellipses as well, depending on where the fragment is. I do not know what
other people's opinion on this is.

-Sushant.
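
The behaviour described above can be modelled in a few lines of C. This is
a hypothetical sketch, not code from wparser_def.c: HeadlineOptions,
choose_generator and effective_delimiter are illustrative names, and
max_fragments is assumed to be 0 when MaxFragments is not specified.

```c
#include <stddef.h>

/* Which headline generator handles the request (illustrative names). */
typedef enum
{
    GEN_DEFAULT,                /* default generator, no delimiter */
    GEN_FRAGMENTS               /* fragment-based generator */
} HeadlineGenerator;

/* Hypothetical subset of the ts_headline options discussed here. */
typedef struct
{
    int         max_fragments;  /* MaxFragments; assumed 0 when unset */
    const char *fragdelim;      /* FragmentDelimiter */
} HeadlineOptions;

/* The fragment-based generator is used only when MaxFragments > 0. */
static HeadlineGenerator
choose_generator(const HeadlineOptions *opts)
{
    return opts->max_fragments > 0 ? GEN_FRAGMENTS : GEN_DEFAULT;
}

/*
 * The delimiter that can actually appear in the output: NULL for the
 * default generator, which never writes one.
 */
static const char *
effective_delimiter(const HeadlineOptions *opts)
{
    return choose_generator(opts) == GEN_FRAGMENTS ? opts->fragdelim : NULL;
}
```

Under this model a call that sets only MinWords, as in the query quoted
below, falls through to the default generator, so no delimiter shows up
whatever FragmentDelimiter is set to.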

On Sat, 2009-02-14 at 16:21 -0500, Asher Snyder wrote:
> Interesting. It could be that you already do it, but the documentation
> makes no reference to a fragment delimiter, so there's no way that I can
> see to add one. The documentation for ts_headline only lists StartSel,
> StopSel, MaxWords, MinWords, ShortWord, and HighlightAll; there appears to
> be no option for a fragment delimiter.
>
> In my case I do:
>
> SELECT v1.id, v1.type_id, v1.title,
>        ts_headline(v1.copy, query, 'MinWords = 17') AS copy,
>        ts_rank(v1.text_search, query) AS rank
> FROM (SELECT b1.*,
>              (setweight(to_tsvector(coalesce(b1.title,'')), 'A') ||
>               setweight(to_tsvector(coalesce(b1.copy,'')), 'B')) AS text_search
>       FROM search.v_searchable_content b1) v1,
>      plainto_tsquery($1) query
> WHERE ($2 IS NULL OR (type_id = ANY($2))) AND query @@ v1.text_search
> ORDER BY rank DESC, title
>
> Now, this use of ts_headline correctly returns highlighted fragments for
> my search results, but there is no fragment delimiter in the headline.
> Some have suggested changing ts_headline(v1.copy, query, 'MinWords = 17')
> to '...' || ts_headline(v1.copy, query, 'MinWords = 17') || '...', but as
> you can clearly see the ellipses would then always appear, regardless of
> where the fragment sits in the document. I hope that you're correct and
> that it is implemented, just not documented.



Re: Ellipses around result fragment of ts_headline

From
Asher Snyder
Date:
Yes, you are correct in your assumption that I'm looking for a single
fragment to also have the option to add a fragment delimiter based on its
position in the document. 

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354@gmail.com]
>Sent: Saturday, February 14, 2009 4:41 PM
>To: Asher Snyder
>Cc: pgsql-hackers@postgresql.org
>Subject: RE: [HACKERS] Ellipses around result fragment of ts_headline
>
>The documentation in 8.4dev has information on FragmentDelimiter
>http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
>
>If you do not specify MaxFragments > 0, then the default headline
>generator kicks in. The default headline generator does not have any
>fragment delimiter. So it is correct that you will not see any
>delimiter.
>
>I think you are looking for the default headline generator to add
>ellipses as well, depending on where the fragment is. I do not know what
>other people's opinion on this is.
>
>-Sushant.




Re: Ellipses around result fragment of ts_headline

From
Asher Snyder
Date:
No worries. I'm going to start playing around with the dev branch now, but
in any case your previous response still applies, as does the question
regarding the fragment delimiter for the first fragment. Without that, it
seems I would still have the same problem with the first fragment.

>-----Original Message-----
>From: Sushant Sinha [mailto:sushant354@gmail.com]
>Sent: Saturday, February 14, 2009 4:47 PM
>To: Tom Lane
>Cc: Asher Snyder; pgsql-hackers@postgresql.org
>Subject: Re: [HACKERS] Ellipses around result fragment of ts_headline
>
>Sorry ... I thought you were running the development branch.
>
>-Sushant.




Re: Ellipses around result fragment of ts_headline

From
Sushant Sinha
Date:
Sorry ... I thought you were running the development branch.

-Sushant.

On Sat, 2009-02-14 at 16:34 -0500, Tom Lane wrote:
> Sushant Sinha <sushant354@gmail.com> writes:
> > I think we currently do that.
> 
> ... since about four months ago.
> 
> 2008-10-17 14:05  teodor
> 
>     * doc/src/sgml/textsearch.sgml, src/backend/tsearch/ts_parse.c,
>     src/backend/tsearch/wparser_def.c, src/include/tsearch/ts_public.h,
>     src/test/regress/expected/tsearch.out,
>     src/test/regress/sql/tsearch.sql: Improve headline generation. Now
>     headline can contain several fragments a-la Google.
>     
>     Sushant Sinha <sushant354@gmail.com>
> 
>             regards, tom lane