Re: tsvector string representation and parsing

Поиск
Список
Период
Сортировка
От Johannes Graën
Тема Re: tsvector string representation and parsing
Дата
Msg-id 54593f15-6b3b-086e-ce1e-69f0af9d38a9@selfnet.de
обсуждение исходный текст
Ответ на Re: tsvector string representation and parsing  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: tsvector string representation and parsing  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Thanks, Tom.

On 23/02/2022 23.30, Tom Lane wrote:
> =?UTF-8?Q?Johannes_Gra=c3=abn?= <johannes@selfnet.de> writes:
>> This is a minimal example that goes wrong (but shouldn't IMHO:
> 
>>> SELECT format('%L:1', '\:')::tsvector
> 
> format(%L) is designed to produce a SQL literal, which does not
> have the same requirements as a tsvector element ... yeah, they're
> close, but not close enough.  In this particular example,
> what you get is
> 
> =# SELECT format('%L:1', '\:');
>    format
> ----------
>   E'\\:':1
> (1 row)
> 
> because format() adds an E prefix for the avoidance of doubt about
> what to do with the backslashes.  tsvector doesn't like that.

I see.


> I don't think we have any prefab function that does what you're
> looking for here, and TBH I'm not sure I see the point of it.
> Pretty much any tsvector you'd be dealing with in practice is
> going to have come from one of the to_tsvector family of
> functions, and those tend to drop punctuation.

Applied to normal texts in a standard language, I believe that's true. 
I'd like to use FTS in a setting where I need to control the positional 
attributes as I'm specifying more than one lexeme at the same position 
(which works great btw). That's why I can't use to_tsvector() but need 
to cast it from a string to tsvector.


>> My understanding is that everything inside
>> single quotes is taken as a lexeme. From the documentation:
> 
>>> To represent lexemes containing whitespace or punctuation, surround them with quotes
> 
> See also the next bit about having to double quotes and backslashes
> within those quotes.  So what you'd actually need is
> 
> =# select $$'\\:':1$$::tsvector;
>   tsvector
> ----------
>   '\\:':1
> (1 row)
> 
> If you write just one backslash, it has the effect of quoting
> the next character, which in this case doesn't need quoting.

Before using format(), I tried just generating those strings by doubling 
any single quote or backslash and enclosing the whole string in single 
quotes, but that didn't seem a safe way, though it works in principle:

> SELECT format($$'%s':%s$$, replace(replace(s, $$'$$, $$''$$), '\', '\\'), i)::tsvector
> FROM (SELECT $$\:$$ s, 1 i) x;

Would that be the way to go if to_tsvector is not an option?

Regards
   Johannes



В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: tsvector string representation and parsing
Следующее
От: "Peter J. Holzer"
Дата:
Сообщение: Re: pg_upgrade from Postgresql-12 to Postgresql-13 fails with "Creating dump of database schemas postgres *failure*"