Re: Rethinking the implementation of ts_headline()

From: Alvaro Herrera
Subject: Re: Rethinking the implementation of ts_headline()
Message-ID: 20230116122303.ddumvskoxnositjy@alvherre.pgsql
In response to: Rethinking the implementation of ts_headline()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Rethinking the implementation of ts_headline()  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On 2022-Nov-25, Tom Lane wrote:

> After further contemplation of bug #17691 [1], I've concluded that
> what I did in commit c9b0c678d was largely misguided.  For one
> thing, the new hlCover() algorithm no longer finds shortest-possible
> cover strings: if your query is "x & y" and the text is like
> "... x ... x ... y ...", then the selected cover string will run
> from the first occurrence of x to the y, whereas the old algorithm
> would have correctly selected "x ... y".  For another thing, the
> maximum-cover-length hack that I added in 78e73e875 to band-aid
> over the performance issues of the original c9b0c678d patch means
> that various scenarios no longer work as well as they used to,
> which is the proximate cause of the complaints in bug #17691.
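
To make the "x ... x ... y" case above concrete, here is a minimal sketch
in Python (not the C code in PostgreSQL) that contrasts a cover starting
at the first matching word with a shortest-possible cover.  The function
names and the word-list representation are assumptions made for
illustration only; hlCover() itself works on parsed headline words and
arbitrary tsquery expressions, not just AND lists.

```python
from collections import Counter


def first_hit_cover(words, terms):
    """Cover that starts at the first matching word and ends once every
    term has been seen (the behaviour complained about above)."""
    terms = set(terms)
    start = None
    seen = set()
    for i, w in enumerate(words):
        if w in terms:
            if start is None:
                start = i
            seen.add(w)
            if seen == terms:
                return (start, i)
    return None


def shortest_cover(words, terms):
    """Smallest window (lo, hi) containing every term, found with a
    sliding window; returns None if some term never occurs."""
    terms = set(terms)
    if not terms:
        return None
    counts = Counter()
    missing = len(terms)
    best = None
    lo = 0
    for hi, w in enumerate(words):
        if w in terms:
            if counts[w] == 0:
                missing -= 1
            counts[w] += 1
        # Shrink from the left while the window still holds every term.
        while missing == 0:
            if best is None or hi - lo < best[1] - best[0]:
                best = (lo, hi)
            left = words[lo]
            if left in terms:
                counts[left] -= 1
                if counts[left] == 0:
                    missing += 1
            lo += 1
    return best


words = "a x b x c y d".split()
print(first_hit_cover(words, ["x", "y"]))  # (1, 5): first x through y
print(shortest_cover(words, ["x", "y"]))   # (3, 5): second x through y
```

The sliding window is just one standard way to get a minimal cover for
a pure AND query; it is shown only to pin down what "shortest-possible
cover" means here.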

I came across bug #17556, which contains a different test for this, and
I'm not sure that this patch changes things completely for the better.
In that bug report, Alex Malek presents this example:

select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4998) || 'labor',
           $$'ipsum' & 'labor'$$::tsquery,
       'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3'),
    ts_headline('baz baz baz ipsum ' || repeat(' foo ',4999) || 'labor',
           $$'ipsum' & 'labor'$$::tsquery,
       'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3');

which returns, in the current HEAD, the following:
     ts_headline     │ ts_headline 
─────────────────────┼─────────────
 {ipsum} ... {labor} │ baz baz baz
(1 row)

That is, once the two terms are more than about 5000 words apart, it
fails to find a good cover, but below that distance it returns an
acceptable headline.  However, with your proposed patch applied, we get
this:

 ts_headline │ ts_headline 
─────────────┼─────────────
 {ipsum}     │ {ipsum}
(1 row)

which is an improvement in the second case, though perhaps not as much
as we would like, and definitely not an improvement in the first case.
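
For readers who want to see why a limit on cover length produces this
kind of cliff, here is a rough Python sketch.  It is not the PostgreSQL
implementation: the function name, the word-list input and the limit of
5000 words are all hypothetical, chosen only to mirror the threshold
seen in the example above, and it makes no claim about how the server
actually derives its limit.

```python
def cover_within_limit(words, terms, max_cover):
    """Return the shortest (lo, hi) window holding every term, or None
    if no such window spans at most max_cover words."""
    terms = set(terms)
    last_seen = {}  # term -> index of its most recent occurrence
    best = None
    for hi, w in enumerate(words):
        if w not in terms:
            continue
        last_seen[w] = hi
        if len(last_seen) == len(terms):
            # Shortest window ending at hi starts at the oldest of the
            # most recent occurrences.
            lo = min(last_seen.values())
            fits = hi - lo + 1 <= max_cover
            if fits and (best is None or hi - lo < best[1] - best[0]):
                best = (lo, hi)
    return best


def sample(n_filler):
    """Word list shaped like the text in the bug report example."""
    return ["baz"] * 3 + ["ipsum"] + ["foo"] * n_filler + ["labor"]


LIMIT = 5000  # hypothetical cap, for illustration only
print(cover_within_limit(sample(4998), ["ipsum", "labor"], LIMIT))  # (3, 5002)
print(cover_within_limit(sample(4999), ["ipsum", "labor"], LIMIT))  # None
```

With one more filler word the only possible cover exceeds the cap, so
nothing is found at all, which is the kind of abrupt change the two
columns in the HEAD output show.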

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)


