The following bug has been logged on the website:
Bug reference: 17569
Logged by: Alex Malek
Email address: magicagent@gmail.com
PostgreSQL version: 14.4
Operating system: Red Hat
Description:
It is well documented that "Position values in tsvector must be greater than
0 and no more than 16,383".
However, these limits can produce false positive or false negative search
results when doing a FOLLOWED BY / phrase search in a document with more than
16,383 words.
The false negative seems particularly bad / unexpected.
The false positive results happen when a word is at or before position
16,382; every word at or past position 16,383 then appears to be at 16,383.
SELECT tq, text, text @@ tq AS ok,
       (repeat(' foo ', 16381) || text) @@ tq AS false_pos
FROM (VALUES (websearch_to_tsquery('"red cat"'),
              'red dogs chase with black cats')) t(tq, text);
       tq        |              text              | ok | false_pos
-----------------+--------------------------------+----+-----------
 'red' <-> 'cat' | red dogs chase with black cats | f  | t
(1 row)
The false negative happens for any phrase that exists at or after position
16,383, since all of its words appear to be at 16,383.
SELECT tq, text, text @@ tq AS small,
       (repeat(' foo ', 16381) || text) @@ tq AS false_neg
FROM (VALUES (websearch_to_tsquery('"black cat"'),
              'red dogs chase with black cats')) t(tq, text);
        tq         |              text              | small | false_neg
-------------------+--------------------------------+-------+-----------
 'black' <-> 'cat' | red dogs chase with black cats | t     | f
(1 row)
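The stored positions can also be inspected directly. The query below is a minimal sketch, assuming the same default text search configuration as the queries above (the stemmed 'cat' suggests english). It uses unnest(tsvector) to list the positions recorded for each lexeme and filters out the padding word. If positions are capped as described, 'red' should be reported at 16382 and every later lexeme at 16383.

```sql
-- List the stored position(s) of each lexeme in the padded document.
-- The 16381 leading 'foo' words push 'red' to position 16382.
SELECT lexeme, positions
FROM unnest(to_tsvector(repeat(' foo ', 16381) ||
                        'red dogs chase with black cats'))
WHERE lexeme <> 'foo'
ORDER BY lexeme;
```

With distinct positions lost past the cap, a <-> test between any two of the later lexemes cannot tell adjacent words from distant ones.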