Thanks, Tom.
On 23/02/2022 23.30, Tom Lane wrote:
> =?UTF-8?Q?Johannes_Gra=c3=abn?= <johannes@selfnet.de> writes:
>> This is a minimal example that goes wrong (but shouldn't IMHO:
>
>>> SELECT format('%L:1', '\:')::tsvector
>
> format(%L) is designed to produce a SQL literal, which does not
> have the same requirements as a tsvector element ... yeah, they're
> close, but not close enough. In this particular example,
> what you get is
>
> =# SELECT format('%L:1', '\:');
> format
> ----------
> E'\\:':1
> (1 row)
>
> because format() adds an E prefix for the avoidance of doubt about
> what to do with the backslashes. tsvector doesn't like that.
I see.
> I don't think we have any prefab function that does what you're
> looking for here, and TBH I'm not sure I see the point of it.
> Pretty much any tsvector you'd be dealing with in practice is
> going to have come from one of the to_tsvector family of
> functions, and those tend to drop punctuation.
Applied to normal texts in a standard language, I believe that's true.
I'd like to use FTS in a setting where I need to control the positional
attributes as I'm specifying more than one lexeme at the same position
(which works great btw). That's why I can't use to_tsvector() but need
to cast it from a string to tsvector.
>> My understanding is that everything inside
>> single quotes is taken as a lexeme. From the documentation:
>
>>> To represent lexemes containing whitespace or punctuation, surround them with quotes
>
> See also the next bit about having to double quotes and backslashes
> within those quotes. So what you'd actually need is
>
> =# select $$'\\:':1$$::tsvector;
> tsvector
> ----------
> '\\:':1
> (1 row)
>
> If you write just one backslash, it has the effect of quoting
> the next character, which in this case doesn't need quoting.
Before using format(), I tried just generating those strings by doubling
any single quote or backslash and enclosing the whole string in single
quotes, but that didn't seem a safe way, though it works in principle:
> SELECT format($$'%s':%s$$, replace(replace(s, $$'$$, $$''$$), '\', '\\'), i)::tsvector
> FROM (SELECT $$\:$$ s, 1 i) x;
Would that be the way to go if to_tsvector is not an option?
Regards
Johannes