On 2022-Nov-25, Tom Lane wrote:
> After further contemplation of bug #17691 [1], I've concluded that
> what I did in commit c9b0c678d was largely misguided. For one
> thing, the new hlCover() algorithm no longer finds shortest-possible
> cover strings: if your query is "x & y" and the text is like
> "... x ... x ... y ...", then the selected cover string will run
> from the first occurrence of x to the y, whereas the old algorithm
> would have correctly selected "x ... y". For another thing, the
> maximum-cover-length hack that I added in 78e73e875 to band-aid
> over the performance issues of the original c9b0c678d patch means
> that various scenarios no longer work as well as they used to,
> which is the proximate cause of the complaints in bug #17691.
I came across bug #17556, which contains a different test case for this,
and I'm not sure that this patch changes things entirely for the better.
In that bug report, Alex Malek presents this example:
select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4998) || 'labor',
                   $$'ipsum' & 'labor'$$::tsquery,
                   'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3'),
       ts_headline('baz baz baz ipsum ' || repeat(' foo ',4999) || 'labor',
                   $$'ipsum' & 'labor'$$::tsquery,
                   'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3');
which, in current HEAD, returns the following:
     ts_headline     │ ts_headline
─────────────────────┼─────────────
 {ipsum} ... {labor} │ baz baz baz
(1 row)
That is, once the two query terms are more than 5000 words apart, it
fails to find a good cover, but below that distance it returns an
acceptable headline. However, with your proposed patch, we get this:
 ts_headline │ ts_headline
─────────────┼─────────────
 {ipsum}     │ {ipsum}
(1 row)
which is an improvement in the second case, though perhaps not as much
as we would like, and definitely not an improvement in the first case.
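
For reference, here is a minimal Python sketch of what "shortest-possible
cover" means for a pure AND query such as "x & y". This is only an
illustration under simplifying assumptions (plain word equality, AND-only
queries, a sliding window); it is not the hlCover() code, and the function
name shortest_cover is made up for this example.

```python
def shortest_cover(words, required):
    """Return (start, end) word indexes, inclusive, of the shortest window
    of `words` that contains every term in `required`, or None."""
    required = set(required)
    if not required:
        return None
    counts = {}          # required term -> occurrences inside the window
    best = None
    left = 0
    for right, word in enumerate(words):
        if word in required:
            counts[word] = counts.get(word, 0) + 1
        # While the window covers all terms, record it and shrink from the left.
        while len(counts) == len(required):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = words[left]
            if dropped in required:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    del counts[dropped]
            left += 1
    return best


# Text shaped like "... x ... x ... y ...": the cover starts at the second x.
print(shortest_cover("a x b x c y d".split(), ["x", "y"]))

# Word lists shaped like the two documents in the query above.
doc_4998 = ["baz"] * 3 + ["ipsum"] + ["foo"] * 4998 + ["labor"]
doc_4999 = ["baz"] * 3 + ["ipsum"] + ["foo"] * 4999 + ["labor"]
print(shortest_cover(doc_4998, ["ipsum", "labor"]))
print(shortest_cover(doc_4999, ["ipsum", "labor"]))
```

In this sketch the only possible cover spans 5000 words in the first
document and 5001 words in the second, which matches the point at which
the HEAD behavior changes.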
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it." (New York Times, about Microsoft PowerPoint)