Discussion: BUG #17739: postgres ts_headline function is not returning matches it should during full text search

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      17739
Logged by:          Sam S
Email address:      sssnarr1@gmail.com
PostgreSQL version: 15.1
Operating system:   ubuntu
Description:

Pretty version of this question originally posted on StackExchange:

https://dba.stackexchange.com/questions/321718/postgres-ts-headline-function-is-not-returning-matches-it-should-during-full-tex

**Background**: 
I've been using Postgres full-text search and it has met my needs quite
well. However, there is some unexpected behavior that I cannot wrap my
head around. It concerns the highlighted matches that the `ts_headline`
function returns for full-text search results. It returns the correct
matches most of the time, but often it does not return a match where I
expect one. I think an example is the best way to demonstrate this.

Relevant Postgres full-text highlighting docs:
https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE

**Versions**: I have tried Postgres 15 and 12 and experienced this bug
(feature?) in both.

**Examples**:
In the following 3 full-text search highlighting queries, why do the first
one's results not match the second and third's? I'm trying to figure out what
I can do to get matches on the first query. I first noticed that some of my
highlight queries were coming back with no 'hits'. I created the following 3
queries to show the issue I'm having.

According to the docs, when there are no matches (matches are marked with
`<b></b>` tags), `ts_headline` simply returns the first `MinWords` words of
the document. That is what happens in the first query, whereas I think we
should get the same 2 matches that the following two queries return.

```
postgres=# SELECT ts_headline('english', 'beginning word word word word
CHILD word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word SERVICE ending', 
    to_tsquery('english', 'CHILD & SERVICE'), 
    'MaxFragments=2,MinWords=5,MaxWords=10');

          ts_headline          
-------------------------------
 beginning word word word word
(1 row)
```
Now let's increase `MaxWords` from 10 to 11; this time we get 2 matches:
```

postgres=# SELECT ts_headline('english', 'beginning word word word word
CHILD word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word SERVICE ending', 
    to_tsquery('english', 'CHILD & SERVICE'), 
    'MaxFragments=2,MinWords=5,MaxWords=11');

                                                      ts_headline
------------------------------------------------------------------------------------------------------------------------
 beginning word word word word <b>CHILD</b> word word word word word ... word word word word word <b>SERVICE</b> ending
(1 row)
```
Alternatively, keeping `MaxWords=10` but increasing `MaxFragments` from 2 to 3 also gives 2 matches:
```
postgres=# SELECT ts_headline('english', 'beginning word word word word
CHILD word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word
word word word word word word word word word word SERVICE ending', 
    to_tsquery('english', 'CHILD & SERVICE'), 
    'MaxFragments=3,MinWords=5,MaxWords=10');

                                               ts_headline
---------------------------------------------------------------------------------------------------------
 word word word word <b>CHILD</b> word word word word word ... word word word word <b>SERVICE</b> ending
(1 row)
```
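
The three cases above can also be compared side by side in one statement. This is only a minimal sketch: it assumes the test document has 99 filler words between CHILD and SERVICE (my count from the queries above) and rebuilds it with `repeat()`. The names `doc`, `body`, `o` and `opts` are made up for illustration.

```
-- Rebuild the test document: 99 filler words between CHILD and SERVICE
-- (assumed count, taken from the queries above).
WITH doc AS (
    SELECT 'beginning word word word word CHILD '
           || repeat('word ', 99)
           || 'SERVICE ending' AS body
)
-- Run ts_headline once per option string so the results line up in one table.
SELECT o.opts,
       ts_headline('english',
                   doc.body,
                   to_tsquery('english', 'CHILD & SERVICE'),
                   o.opts) AS headline
FROM doc
CROSS JOIN (VALUES
    ('MaxFragments=2,MinWords=5,MaxWords=10'),
    ('MaxFragments=2,MinWords=5,MaxWords=11'),
    ('MaxFragments=3,MinWords=5,MaxWords=10')
) AS o(opts);
```

Each row pairs an option string with the headline it produces, so it is easy to see which settings fall back to the first `MinWords` words.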

I feel like something subtle is going on between the `MaxFragments`,
`MinWords`, and `MaxWords` settings, or maybe this is undefined behavior or a
bug. I'm hoping to find a way to get the first query to match, as I do believe
it should. Please correct me if I'm wrong.


PG Bug reporting form <noreply@postgresql.org> writes:
> In the following 3 full-text search highlighting queries, why do the first
> one's results not match the second and third's?

I think this is addressed by my pending patch at [1].  Perhaps
you'd like to help review/test that?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/840.1669405935%40sss.pgh.pa.us



Hi, 

I applied that patch to the main branch, built and tested it and it works as I expect. 
I reviewed the code and it looks good to me as far as someone not familiar with the codebase can tell. 

Regards,
Sam Snarr



On Mon, Jan 9, 2023 at 10:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > In the following 3 full-text search highlighting queries, why do the first
> > one's results not match the second and third's?
>
> I think this is addressed by my pending patch at [1].  Perhaps
> you'd like to help review/test that?
>
>                         regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/840.1669405935%40sss.pgh.pa.us


Sam S <sssnarr1@gmail.com> writes:
> I applied that patch to the main branch, built and tested it and it works
> as I expect.
> I reviewed the code and it looks good to me as far as someone not familiar
> with the codebase can tell.

Thanks for testing!  "Does it work as you expect" is exactly the
thing I'm most concerned about here, so your input is very valuable.

            regards, tom lane