Re: Tsvector editing functions

From: Stas Kelvich
Subject: Re: Tsvector editing functions
Msg-id: 84195C6D-34FF-414F-976A-583ED7BED27C@postgrespro.ru
In response to: Re: Tsvector editing functions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Tsvector editing functions  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Tsvector editing functions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
Hi, Tomáš! Thanks for the comprehensive review.

> On 15 Dec 2015, at 06:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> 1) It's a bit difficult to judge the usefulness of the API, as I've
>   always been a mere user of full-text search, and I never had a need
>   (or courage) to mess with the tsvectors. OTOH I don't see a good
>   reason not to have such API, when there's a need for it.
>
>   The API seems to be reasonably complete, with one exception - when
>   looking at editing function we have for 'hstore', we do have these
>   variants for delete()
>
>      delete(hstore,text)
>      delete(hstore,text[])
>      delete(hstore,hstore)
>
>   while this patch only adds delete(tsvector,text). Would it make
>   sense to add variants similar to hstore? It probably does not make
>   much sense to add delete(tsvector,tsvector), right? But being able
>   to delete a bunch of lexemes in one go seems like a good thing.
>
>   What do you think?

That’s a good idea, and actually deleting a tsvector from a tsvector makes perfect sense. In the delete function I used
an exact string match between the string and the lexemes in the tsvector, but if somebody wants to delete, for example,
“Cats” from a tsvector, then he should downcase and singularize this word. The easiest way to do that is to just use the
to_tsvector() function. We can also use this function to delete specific positions, like
delete('cat:3 fat:2,4'::tsvector, 'fat:2'::tsvector) -> 'cat:3 fat:4'::tsvector.

So in the attached patch I’ve implemented the following:

delete(tsin tsvector, lexarr text[]): remove any occurrence of the lexemes in lexarr from tsin

delete(tsin tsvector, tsv_filter tsvector): delete the lexemes and/or positions of tsv_filter from tsin. When a lexeme
in tsv_filter has no positions, the function deletes any occurrence of the same lexeme in tsin. When a tsv_filter
lexeme has positions, the function deletes them from the positions of the matching lexeme in tsin. If the resulting
position set is empty after such a removal, the function deletes that lexeme from the resulting tsvector.
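
The two delete variants described above can be modeled in a few lines of Python. This is only a sketch of the stated semantics, not the patch's C code: a tsvector is represented here as a dict from lexeme to a list of positions, and the names delete_lexemes and delete_filter are assumptions made for illustration. How a lexeme without positions in tsin should react to a filter lexeme with positions is not spelled out above; the model drops it.

```python
# Illustrative model only: a tsvector is a dict mapping lexeme -> list of
# positions (an empty list means the lexeme carries no positions).

def delete_lexemes(tsin, lexarr):
    """Model of delete(tsvector, text[]): drop every lexeme listed in lexarr."""
    drop = set(lexarr)
    return {lex: list(pos) for lex, pos in tsin.items() if lex not in drop}


def delete_filter(tsin, tsv_filter):
    """Model of delete(tsvector, tsvector) as described in the text."""
    result = {}
    for lex, positions in tsin.items():
        if lex not in tsv_filter:
            result[lex] = list(positions)  # lexeme not mentioned in the filter
            continue
        unwanted = tsv_filter[lex]
        if not unwanted:
            continue  # filter lexeme has no positions: drop the whole lexeme
        kept = [p for p in positions if p not in unwanted]
        if kept:
            result[lex] = kept  # some positions survive the removal
        # otherwise the position set is empty and the lexeme is dropped
    return result


tsin = {"cat": [3], "fat": [2, 4]}
print(delete_filter(tsin, {"fat": [2]}))     # {'cat': [3], 'fat': [4]}
print(delete_filter(tsin, {"fat": []}))      # {'cat': [3]}
print(delete_lexemes(tsin, ["cat", "fat"]))  # {}
```

The first call reproduces the 'cat:3 fat:2,4' example from the message.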

Also, if we want some level of completeness in the API, and taking into account that the concat() function shifts the
positions of its second argument, I thought it could be useful to add a function that shifts all positions by a given
value. This helps to undo a concatenation: delete one of the concatenated tsvectors and then shift the positions in the
resulting tsvector. So I also wrote another small function:

shift(tsin tsvector, offset int16): shift all positions in tsin by the given offset
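
The "undo a concatenation" idea can be sketched with the same kind of Python model (lexeme -> list of positions). The helper names concat and shift are illustrative stand-ins, not the patch's C functions; concat follows the documented behaviour of ||, where positions of the right-hand vector are offset by the largest position in the left-hand vector. The model ignores the position limits of a real tsvector.

```python
# Illustrative model only: a tsvector is a dict mapping lexeme -> list of positions.

def concat(left, right):
    """Model of tsvector || tsvector: right-hand positions are offset by
    the largest position found in the left-hand vector."""
    offset = max((p for pos in left.values() for p in pos), default=0)
    merged = {lex: list(pos) for lex, pos in left.items()}
    for lex, pos in right.items():
        merged[lex] = sorted(merged.get(lex, []) + [p + offset for p in pos])
    return merged


def shift(tsin, offset):
    """Model of shift(tsvector, offset): add offset to every position."""
    return {lex: [p + offset for p in pos] for lex, pos in tsin.items()}


left = {"cat": [1], "fat": [2]}
right = {"rat": [1], "sat": [2]}

both = concat(left, right)
print(both)      # {'cat': [1], 'fat': [2], 'rat': [3], 'sat': [4]}

# Undo: remove the lexemes of the left-hand vector, then shift back by its
# largest position (2 in this example).
remainder = {lex: pos for lex, pos in both.items() if lex not in left}
restored = shift(remainder, -2)
print(restored)  # {'rat': [1], 'sat': [2]}
```

This round trip only recovers the right-hand vector exactly when the two vectors share no lexemes; with shared lexemes, deleting by position (as in the tsvector-filter variant of delete) would be needed instead.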

>
>
> 2) tsvector_op.c needs a bit of love, to eliminate the two warnings it
>   currently triggers:
>
>    tsvector_op.c:211:2: warning: ISO C90 forbids mixed ...
>    tsvector_op.c:635:9: warning: variable ‘lexeme2copy’ set but …
>

fixed

> 3) the patch also touches tsvector_setweight(), only to do change:
>
>      elog(ERROR, "unrecognized weight: %d", cw);
>
>   to
>
>      elog(ERROR, "unrecognized weight: %c", cw);
>
>   That should probably go get committed separately, as a bugfix.
>

Okay, I’ll submit that as a separate patch.
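
For illustration, the effect of that one-character change can be reproduced with Python's %-style formatting, which follows the same convention as C for these two specifiers. The weight value 'E' is a hypothetical invalid weight chosen for the example, not something taken from the patch.

```python
# Hypothetical invalid weight, held as its character code like a C char.
cw = ord("E")

msg_d = "unrecognized weight: %d" % cw  # prints the numeric code
msg_c = "unrecognized weight: %c" % cw  # prints the character itself

print(msg_d)  # unrecognized weight: 69
print(msg_c)  # unrecognized weight: E
```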

>
> 4) I find it rather annoying that there are pretty much no comments in
>   the code. Granted, there are pretty much no comments in the
>   surrounding code, but I doubt that's a good reason for not having
>   any comments in new code. It makes reviews unnecessarily difficult.
>

Fixed, I think.

>
> 5) tsvector_concat() is not mentioned in docs at all
>

Concat is mentioned in the docs as the || operator.

>
> 6) Docs don't mention names of the new parameters in function
>   signatures, just data types. The functions with a single parameter
>   probably don't need to do that, but multi-parameter ones should.
>

Fixed.

> 7) Some of the functions use indexterm that does not match the function
>   name. I see two such cases - to_tsvector and setweight. Is there a
>   reason for that?
>

Because the SGML compiler wants a unique indexterm. Both functions that you mentioned are overloaded on their arguments
and therefore have non-unique names.




---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

