Hello,
Affected versions: PG 11 through 14.3 (all versions in this range).
Affected OS: Windows 10 and x86_64-pc-linux-gnu (via dbfiddle)
Issue:
A thesaurus dictionary can replace a compound word (a phrase) with another term. The example given in the documentation is "supernovae stars : *sn". When the thesaurus is used with websearch_to_tsquery, this substitution does not occur and the original words are kept, **OR**, if the thesaurus also contains a single-word entry for one of the words, only that single-word substitution is applied.
Why it is a problem:
Since other text search functions (to_tsvector, plainto_tsquery) do apply the substitution, a document containing the compound word cannot be found when using websearch_to_tsquery.
Expected result:
websearch_to_tsquery should apply thesaurus substitutions to compound words, as plainto_tsquery does.
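For concreteness, the expected result can be written as queries against the `test` configuration created in the steps to reproduce below. The expected values are inferred from what plainto_tsquery and to_tsvector return for the same input; they are not observed output of websearch_to_tsquery.

```sql
-- Expected (not current) behavior, assuming websearch_to_tsquery should
-- match plainto_tsquery for unquoted compound words
select websearch_to_tsquery('test','abc def');          -- expected: 'xy'
select websearch_to_tsquery('test','supernovae stars'); -- expected: 'sn'
select to_tsvector('test','abc def')
       @@ websearch_to_tsquery('test','abc def');       -- expected: true
```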
Good to know:
1) Single-word thesaurus entries are substituted as expected.
2) The incorrect behavior occurs with both pre- and post-stemming.
3) If the compound word is double-quoted, websearch_to_tsquery returns the expected output in v14 but an incorrect one in earlier versions.
Steps to reproduce:
Create a test_thesaurus.ths file in $SHAREDIR/tsearch_data/ containing the lines:
supernovae stars : *sn
supernovae : *sn
abc def: xy
Then run:
CREATE TEXT SEARCH DICTIONARY test_thesaurus (
TEMPLATE = thesaurus,
DictFile = test_thesaurus,
Dictionary = pg_catalog.english_stem
);
CREATE TEXT SEARCH CONFIGURATION public.test ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION public.test
ALTER MAPPING FOR hword, hword_part, word, asciihword, hword_asciipart, asciiword
WITH public.test_thesaurus, english_stem;
select to_tsvector('test','abc def') @@ websearch_to_tsquery('test','abc def'); -- false: wrong result
select to_tsvector('test','supernovae stars') @@ websearch_to_tsquery('test','supernovae stars'); -- false: wrong result
select websearch_to_tsquery('test','abc def'); -- 'abc def': no substitution occurred
select websearch_to_tsquery('test','supernovae stars'); -- 'sn' & 'star': the first word has its own thesaurus entry and was substituted alone
select websearch_to_tsquery('test','"abc def"'); -- 'xy': in v14, double-quoted compound words are substituted as expected
select to_tsvector('test','abc def'), plainto_tsquery('test','abc def'); -- 'xy' for both: expected behavior in other functions
select to_tsvector('test','supernovae stars'), plainto_tsquery('test','supernovae stars'); -- 'sn' for both: expected behavior in other functions
Let me know if there is anything else I can provide!
Thank you for taking the time to look at this issue; it is much appreciated.
JG