Обсуждение: pgsql: Fix ts_headline() edge cases for empty query and empty search te

Поиск
Список
Период
Сортировка

pgsql: Fix ts_headline() edge cases for empty query and empty search te

От
Tom Lane
Дата:
Fix ts_headline() edge cases for empty query and empty search text.

tsquery's GETQUERY() macro is only safe to apply to a tsquery
that is known non-empty; otherwise it gives a pointer to garbage.
Before commit 5a617d75d, ts_headline() avoided this pitfall, but
only in a very indirect, nonobvious way.  (hlCover could not reach
its TS_execute call, because if the query contains no lexemes
then hlFirstIndex would surely return -1.)  After that commit,
it fell into the trap, resulting in weird errors such as
"unrecognized operator" and/or valgrind complaints.  In HEAD,
fix this by not calling TS_execute_locations() at all for an
empty query.  In the back branches, add a defensive check to
hlCover() --- that's not fixing any live bug, but I judge the
code a bit too fragile as-is.

Also, both mark_hl_fragments() and mark_hl_words() were careless
about the possibility of empty search text: in the cases where
no match has been found, they'd end up telling mark_fragment() to
mark from word indexes 0 to 0 inclusive, even when there is no
word 0.  This is harmless since we over-allocated the prs->words
array, but it does annoy valgrind.  Fix so that the end index is -1
and thus mark_fragment() will do nothing in such cases.

Bottom line is that this fixes a live bug in HEAD, but in the
back branches it's only getting rid of a valgrind nitpick.
Back-patch anyway.

Per report from Alexander Lakhin.

Discussion: https://postgr.es/m/c27f642d-020b-01ff-ae61-086af287c4fd@gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/029dea882a7aa34f46732473eed7c917505e6481

Modified Files
--------------
src/backend/tsearch/wparser_def.c     | 21 ++++++++++++++-------
src/test/regress/expected/tsearch.out | 21 +++++++++++++++++++++
src/test/regress/sql/tsearch.sql      |  6 ++++++
3 files changed, 41 insertions(+), 7 deletions(-)


Re: pgsql: Fix ts_headline() edge cases for empty query and empty search te

От
Andres Freund
Дата:
Hi,

On 2023-04-06 19:52:55 +0000, Tom Lane wrote:
> Fix ts_headline() edge cases for empty query and empty search text.

Unfortunately there appears to be some portability issue with this. Causes a
test output difference on macos. Noticed it when testing something I was about
to push, and I was very confused for a second how my change could have a
portability aspect :)

https://cirrus-ci.com/build/6629557554380800

https://api.cirrus-ci.com/v1/artifact/task/4764833480966144/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /Users/admin/pgsql/src/test/regress/expected/tsearch.out
/Users/admin/pgsql/build/testrun/regress/regress/results/tsearch.out
--- /Users/admin/pgsql/src/test/regress/expected/tsearch.out    2023-04-06 20:03:11
+++ /Users/admin/pgsql/build/testrun/regress/regress/results/tsearch.out    2023-04-06 20:05:56
@@ -2133,6 +2133,9 @@
 NOTICE:  text-search query doesn't contain lexemes: ""
 LINE 2: '', ''::tsquery);
             ^
+NOTICE:  text-search query doesn't contain lexemes: ""
+LINE 2: '', ''::tsquery);
+            ^
  ts_headline 
 -------------

Sure looks like somehow there's duplicate NOTICEs?

Greetings,

Andres Freund



Re: pgsql: Fix ts_headline() edge cases for empty query and empty search te

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2023-04-06 19:52:55 +0000, Tom Lane wrote:
>> Fix ts_headline() edge cases for empty query and empty search text.

> Unfortunately there appears to be some portability issue with this. Causes a
> test output difference on macos.

Huh ... looking.

            regards, tom lane



Re: pgsql: Fix ts_headline() edge cases for empty query and empty search te

От
Tom Lane
Дата:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2023-04-06 19:52:55 +0000, Tom Lane wrote:
>>> Fix ts_headline() edge cases for empty query and empty search text.

>> Unfortunately there appears to be some portability issue with this. Causes a
>> test output difference on macos.

> Huh ... looking.

It's not about macOS, it's about that build script using
-DRANDOMIZE_ALLOCATED_MEMORY, which causes coerce_type() to
call tsqueryin() twice.  So you get the bleat twice.  Ugh.

I can work around it in these test cases by using to_tsquery()
instead of just "''::tsquery", but maybe we should rethink
something.

            regards, tom lane