Обсуждение: Tsvector editing functions

Поиск
Список
Период
Сортировка

Tsvector editing functions

От
Stas Kelvich
Дата:
Hello.

There is patch that adds some editing routines for tsvector type.

tsvector delete(tsvector, text)
    removes entry from tsvector by lexeme name
set unnest(tsvector)
    expands a tsvector to a set of rows. Each row has following columns: lexeme, postings, weights.
text[] to_array(tsvector)
    converts tsvector to array of lexemes
tsvector to_tsvector(text[])
    converts array of lexemes to tsvector




Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Вложения

Re: Tsvector editing functions

От
Robert Haas
Дата:
On Mon, Oct 5, 2015 at 1:29 PM, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
> Hello.
>
> There is patch that adds some editing routines for tsvector type.
>
> tsvector delete(tsvector, text)
>         removes entry from tsvector by lexeme name
> set unnest(tsvector)
>         expands a tsvector to a set of rows. Each row has following columns: lexeme, postings, weights.
> text[] to_array(tsvector)
>         converts tsvector to array of lexemes
> tsvector to_tsvector(text[])
>         converts array of lexemes to tsvector

When submitting a patch, it's a good idea to explain why someone would
want the feature you are adding.  Maybe that's obvious to you, but it
isn't clear to me why we'd want this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
>> There is patch that adds some editing routines for tsvector type.
...
> When submitting a patch, it's a good idea to explain why someone would
> want the feature you are adding.  Maybe that's obvious to you, but it
> isn't clear to me why we'd want this.
>

Some examples:
tsvector delete(tsvector, text)remove wronlgy indexed word (may, be a stop word)
text[] to_array(tsvector)In my practice, I needed it to work with smlar module.
tsvector to_tsvector(text[])Converts list of tags to tsvector, because search in tsvector is more        flexible and
fastthan array's equivalents
 
set unnest(tsvector)     Count some complicated statistics.

That functions mostly needed in utility processing rather in workflow.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
1 Please, make patch compilable with current master.
cd ../../../src/include/catalog && '/usr/local/bin/perl' ./duplicate_oids
3315
3316

2 lexin = TextDatumGetCString(PG_GETARG_DATUM(1))  lexin_len = strlen(lexin)

Why do you use C-string instead of just text? Suppose, much better:   t = PG_GETARG_TEXT_P(1)   lexin = VARDATA(t)
lexin_len= VARSIZE_ANY_EXHDR(t)
 

3 Why do you use linear search in tsvector instead of binary search? It could 
produce a performance impact

4 Again, using BuildTupleFromCStrings() call is not very optimal

5 printing weights as numbers is not consistent with other usage of weigth's in 
FTS. Lexem's weight are mentioned as one of A,B,C,D and default weight is a D.

Teodor Sigaev wrote:
>>> There is patch that adds some editing routines for tsvector type.
> ...
>> When submitting a patch, it's a good idea to explain why someone would
>> want the feature you are adding.  Maybe that's obvious to you, but it
>> isn't clear to me why we'd want this.
>>
>
> Some examples:
> tsvector delete(tsvector, text)
>      remove wronlgy indexed word (may, be a stop word)
> text[] to_array(tsvector)
>      In my practice, I needed it to work with smlar module.
> tsvector to_tsvector(text[])
>      Converts list of tags to tsvector, because search in tsvector is more
>          flexible and fast than array's equivalents
> set unnest(tsvector)
>       Count some complicated statistics.
>
> That functions mostly needed in utility processing rather in workflow.
>
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
Hmm, seems, it will be useful to add two fuctions:

tsvector filter(tsvector, array_of_weigths) - returns tsvector contains lexemes 
with given weights

tsvector setweight(tsvector, weigth, array_of_lexemes) - sets given weight for 
given lexemes

Stas Kelvich wrote:
> Hello.
>
> There is patch that adds some editing routines for tsvector type.
>
> tsvector delete(tsvector, text)
>     removes entry from tsvector by lexeme name
> set unnest(tsvector)
>     expands a tsvector to a set of rows. Each row has following columns: lexeme, postings, weights.
> text[] to_array(tsvector)
>     converts tsvector to array of lexemes
> tsvector to_tsvector(text[])
>     converts array of lexemes to tsvector
>
>
>
>
>
>
> Stas Kelvich
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
>
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Tsvector editing functions

От
Stas Kelvich
Дата:
Hello.

Done with the list of suggestions. Also heavily edit delete function.


---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

> On 27 Nov 2015, at 15:13, Teodor Sigaev <teodor@sigaev.ru> wrote:
>
> Hmm, seems, it will be useful to add two fuctions:
>
> tsvector filter(tsvector, array_of_weigths) - returns tsvector contains lexemes with given weights
>
> tsvector setweight(tsvector, weigth, array_of_lexemes) - sets given weight for given lexemes
>
> Stas Kelvich wrote:
>> Hello.
>>
>> There is patch that adds some editing routines for tsvector type.
>>
>> tsvector delete(tsvector, text)
>>     removes entry from tsvector by lexeme name
>> set unnest(tsvector)
>>     expands a tsvector to a set of rows. Each row has following columns: lexeme, postings, weights.
>> text[] to_array(tsvector)
>>     converts tsvector to array of lexemes
>> tsvector to_tsvector(text[])
>>     converts array of lexemes to tsvector
>>
>>
>>
>>
>>
>>
>> Stas Kelvich
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>
>>
>>
>>
>
> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                   WWW: http://www.sigaev.ru/


Вложения

Re: Tsvector editing functions

От
Tomas Vondra
Дата:
Hi,

On 12/07/2015 03:05 PM, Stas Kelvich wrote:
> Hello.
>
> Done with the list of suggestions. Also heavily edit delete function.
>

I did a quick review of the updated patch. I'm not a tsvector-expert so 
hopefully my comments won't be entirely bogus.

1) It's a bit difficult to judge the usefulness of the API, as I've   always been a mere user of full-text search, and
Inever had a need   (or courage) to mess with the tsvectors. OTOH I don't see a good   reason no to have such API, when
there'sa need for it.
 
   The API seems to be reasonably complete, with one exception - when   looking at editing function we have for
'hstore',we do have these   variants for delete()
 
      delete(hstore,text)      delete(hstore,text[])      delete(hstore,hstore)
   while this patch only adds delete(tsvector,text). Would it make   sense to add variants similar to hstore? It
probablydoes not make   much sense to add delete(tsvector,tsvector), right? But being able   to delete a bunch of
lexemesin one go seems like a good thing.
 
   What do you think?


2) tsvector_op.c needs a bit of love, to eliminate the two warnings it   currently triggers:
    tsvector_op.c:211:2: warning: ISO C90 forbids mixed ...    tsvector_op.c:635:9: warning: variable ‘lexeme2copy’ set
but...
 

3) the patch also touches tsvector_setweight(), only to do change:
      elog(ERROR, "unrecognized weight: %d", cw);
   to
      elog(ERROR, "unrecognized weight: %c", cw);
   That should probably go get committed separately, as a bugfix.


4) I find it rather annoying that there are pretty much no comments in   the code. Granted, there are pretty much no
commentsin the   surrounding code, but I doubt that's a good reason for not having   any comments in new code. It makes
reviewsunnecessarily difficult.
 


5) tsvector_concat() is not mentioned in docs at all


6) Docs don't mention names of the new parameters in function   signatures, just data types. The functions with a
singleparameter   probably don't need to do that, but multi-parameter ones should.
 

7) Some of the functions use intexterm that does not match the function   name. I see two such cases - to_tsvector and
setweight.Is there a   reason for that?
 

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Tsvector editing functions

От
Michael Paquier
Дата:
On Tue, Dec 15, 2015 at 12:07 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hi,
>
> On 12/07/2015 03:05 PM, Stas Kelvich wrote:
>>
>> Hello.
>>
>> Done with the list of suggestions. Also heavily edit delete function.
>>
>
> I did a quick review of the updated patch. I'm not a tsvector-expert so
> hopefully my comments won't be entirely bogus.
>
> 1) It's a bit difficult to judge the usefulness of the API, as I've
>    always been a mere user of full-text search, and I never had a need
>    (or courage) to mess with the tsvectors. OTOH I don't see a good
>    reason no to have such API, when there's a need for it.
>
>    The API seems to be reasonably complete, with one exception - when
>    looking at editing function we have for 'hstore', we do have these
>    variants for delete()
>
>       delete(hstore,text)
>       delete(hstore,text[])
>       delete(hstore,hstore)
>
>    while this patch only adds delete(tsvector,text). Would it make
>    sense to add variants similar to hstore? It probably does not make
>    much sense to add delete(tsvector,tsvector), right? But being able
>    to delete a bunch of lexemes in one go seems like a good thing.
>
>    What do you think?
>
>
> 2) tsvector_op.c needs a bit of love, to eliminate the two warnings it
>    currently triggers:
>
>     tsvector_op.c:211:2: warning: ISO C90 forbids mixed ...
>     tsvector_op.c:635:9: warning: variable ‘lexeme2copy’ set but ...
>
> 3) the patch also touches tsvector_setweight(), only to do change:
>
>       elog(ERROR, "unrecognized weight: %d", cw);
>
>    to
>
>       elog(ERROR, "unrecognized weight: %c", cw);
>
>    That should probably go get committed separately, as a bugfix.
>
>
> 4) I find it rather annoying that there are pretty much no comments in
>    the code. Granted, there are pretty much no comments in the
>    surrounding code, but I doubt that's a good reason for not having
>    any comments in new code. It makes reviews unnecessarily difficult.
>
>
> 5) tsvector_concat() is not mentioned in docs at all
>
>
> 6) Docs don't mention names of the new parameters in function
>    signatures, just data types. The functions with a single parameter
>    probably don't need to do that, but multi-parameter ones should.
>
> 7) Some of the functions use intexterm that does not match the function
>    name. I see two such cases - to_tsvector and setweight. Is there a
>    reason for that?

I have marked this patch as returned with feedback based on the
presence of a review and a lack of replies from the author. Stas, if
you are still working on the patch, please feel free to move it to the
next commit fest.
--
Michael



Re: Tsvector editing functions

От
Stas Kelvich
Дата:
Hi, Tomáš! Thanks for comprehensive review.

> On 15 Dec 2015, at 06:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> 1) It's a bit difficult to judge the usefulness of the API, as I've
>   always been a mere user of full-text search, and I never had a need
>   (or courage) to mess with the tsvectors. OTOH I don't see a good
>   reason no to have such API, when there's a need for it.
>
>   The API seems to be reasonably complete, with one exception - when
>   looking at editing function we have for 'hstore', we do have these
>   variants for delete()
>
>      delete(hstore,text)
>      delete(hstore,text[])
>      delete(hstore,hstore)
>
>   while this patch only adds delete(tsvector,text). Would it make
>   sense to add variants similar to hstore? It probably does not make
>   much sense to add delete(tsvector,tsvector), right? But being able
>   to delete a bunch of lexemes in one go seems like a good thing.
>
>   What do you think?

That’s a good idea and actually deleting tsvector from tsvector makes perfect sense. In delete function I used exact
stringmatch between string and lexemes in tsvector, but if somebody wants to delete for example “Cats” from tsvector,
thenhe should downcase and singularize this word. Easiest way to do it is to just use to_tsvector() function. Also we
canuse this function to delete specific positions: like delete('cat:3 fat:2,4'::tsvector, 'fat:2'::tsvector) -> 'cat:3
fat:4'::tsvector.

So in attached patch I’ve implemented following:

delete(tsin tsvector, lexarrtext[]) — remove any occurence of lexemes inlexarr from tsin

delete(tsin tsvector, tsv_filter tsvector) — Delete lexemes and/or positions of tsv_filter from tsin. When lexeme in
tsv_filterhas no positions function will delete any occurrence of same lexeme in tsin. When tsv_filter lexeme have
positionsfunction will delete them from positions of matching lexeme in tsin. If after such removal resulting positions
setis empty then function will delete that lexeme from resulting tsvector. 

Also if we want some level of completeness of API and taking into account that concat() function shift positions on
secondargument I thought that it can be useful to also add function that can shift all positions of specific value.
Thishelps to undo concatenation: delete one of concatenating tsvectors and then shift positions in resulting tsvector.
SoI also wrote one another small function: 

shift(tsin tsvector,offset int16) — Shift all positions in tsin by given offset

>
>
> 2) tsvector_op.c needs a bit of love, to eliminate the two warnings it
>   currently triggers:
>
>    tsvector_op.c:211:2: warning: ISO C90 forbids mixed ...
>    tsvector_op.c:635:9: warning: variable ‘lexeme2copy’ set but …
>

fixed

> 3) the patch also touches tsvector_setweight(), only to do change:
>
>      elog(ERROR, "unrecognized weight: %d", cw);
>
>   to
>
>      elog(ERROR, "unrecognized weight: %c", cw);
>
>   That should probably go get committed separately, as a bugfix.
>

Okay, i’ll submit that as a separate patch.

>
> 4) I find it rather annoying that there are pretty much no comments in
>   the code. Granted, there are pretty much no comments in the
>   surrounding code, but I doubt that's a good reason for not having
>   any comments in new code. It makes reviews unnecessarily difficult.
>

Fixed, I think.

>
> 5) tsvector_concat() is not mentioned in docs at all
>

Concat mentioned in docs as an operator ||.

>
> 6) Docs don't mention names of the new parameters in function
>   signatures, just data types. The functions with a single parameter
>   probably don't need to do that, but multi-parameter ones should.
>

Fixed.

> 7) Some of the functions use intexterm that does not match the function
>   name. I see two such cases - to_tsvector and setweight. Is there a
>   reason for that?
>

Because sgml compiler wants unique indexterm. Both functions that you mentioned use overloading of arguments and have
non-uniquename. 




---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Вложения

Re: Tsvector editing functions

От
Alvaro Herrera
Дата:
So, Tomas, Teodor, did you like this new version of the patch?

Stas Kelvich wrote:

> > 7) Some of the functions use intexterm that does not match the function
> >   name. I see two such cases - to_tsvector and setweight. Is there a
> >   reason for that?
> 
> Because sgml compiler wants unique indexterm. Both functions that you
> mentioned use overloading of arguments and have non-unique name.

This sounds wrong.  I think what you should really do is use

<indexterm> <primary>foo</primary> <secondary>bar</secondary>
</indexterm>

to distinguish the two entries.

It's a bit funny that you reintroduce the "unrecognized weight: %d"
(instead of %c) in tsvector_setweight_by_filter.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Tsvector editing functions

От
Tomas Vondra
Дата:

On 12/30/2015 06:49 PM, Stas Kelvich wrote:
> Hi, Tomáš! Thanks for comprehensive review.
>
>> On 15 Dec 2015, at 06:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> 1) It's a bit difficult to judge the usefulness of the API, as I've
>>    always been a mere user of full-text search, and I never had a need
>>    (or courage) to mess with the tsvectors. OTOH I don't see a good
>>    reason no to have such API, when there's a need for it.
>>
>>    The API seems to be reasonably complete, with one exception - when
>>    looking at editing function we have for 'hstore', we do have these
>>    variants for delete()
>>
>>       delete(hstore,text)
>>       delete(hstore,text[])
>>       delete(hstore,hstore)
>>
>>    while this patch only adds delete(tsvector,text). Would it make
>>    sense to add variants similar to hstore? It probably does not make
>>    much sense to add delete(tsvector,tsvector), right? But being able
>>    to delete a bunch of lexemes in one go seems like a good thing.
>>
>>    What do you think?
>
> That’s a good idea and actually deleting tsvector from tsvector makes perfect sense. In delete function I used exact
stringmatch between string and lexemes in tsvector, but if somebody wants to delete for example “Cats” from tsvector,
thenhe should downcase and singularize this word. Easiest way to do it is to just use to_tsvector() function. Also we
canuse this function to delete specific positions: like delete('cat:3 fat:2,4'::tsvector, 'fat:2'::tsvector) -> 'cat:3
fat:4'::tsvector.
>
> So in attached patch I’ve implemented following:
>
> delete(tsin tsvector, lexarrtext[]) — remove any occurence of lexemes inlexarr from tsin

OK, although I do recommend using more sensible variable names, i.e. why 
how to use 'lexemes' instead of 'lexarr' for example? Similarly for the 
other functions.

>
> delete(tsin tsvector, tsv_filter tsvector) — Delete lexemes and/or positions of tsv_filter from tsin. When lexeme in
tsv_filterhas no positions function will delete any occurrence of same lexeme in tsin. When tsv_filter lexeme have
positionsfunction will delete them from positions of matching lexeme in tsin. If after such removal resulting positions
setis empty then function will delete that lexeme from resulting tsvector.
 
>

I can't really imagine situation in which I'd need this, but if you do 
have a use case for it ... although in the initial paragraph you say 
"... but if somebody wants to delete for example ..." which suggests you 
may not have such use case.

Based on bad experience with extending API based on vague ideas, I 
recommend only really adding functions with existing need. It's easy to 
add a function later, much more difficult to remove it or change the 
signature.

> Also if we want some level of completeness of API and taking into account that concat() function shift positions on
secondargument I thought that it can be useful to also add function that can shift all positions of specific value.
Thishelps to undo concatenation: delete one of concatenating tsvectors and then shift positions in resulting tsvector.
SoI also wrote one another small function:
 
>
> shift(tsin tsvector,offset int16) — Shift all positions in tsin by given offset

That seems rather too low-level. Shouldn't it be really built into 
delete() directly somehow?

>
>>
>> 4) I find it rather annoying that there are pretty much no comments in
>>    the code. Granted, there are pretty much no comments in the
>>    surrounding code, but I doubt that's a good reason for not having
>>    any comments in new code. It makes reviews unnecessarily difficult.
>>
>
> Fixed, I think.

Yep, much better now.

>
>>
>> 5) tsvector_concat() is not mentioned in docs at all
>>
>
> Concat mentioned in docs as an operator ||.

Ah, OK.

>
>>
>> 6) Docs don't mention names of the new parameters in function
>>    signatures, just data types. The functions with a single parameter
>>    probably don't need to do that, but multi-parameter ones should.
>>
>
> Fixed.

OK, but please let's use variable names clearly identifying the meaning. 
So not 'w' but 'weight' and so on.

>
>> 7) Some of the functions use intexterm that does not match the function
>>    name. I see two such cases - to_tsvector and setweight. Is there a
>>    reason for that?
>>
>
> Because sgml compiler wants unique indexterm. Both functions that
> youmentioned use overloading of arguments and have non-unique name.

As Michael pointed out, that should probably be handled by using 
<primary> and <secondary> tags.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Tsvector editing functions

От
Stas Kelvich
Дата:
Hi

> On 22 Jan 2016, at 19:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> OK, although I do recommend using more sensible variable names, i.e. why how to use 'lexemes' instead of 'lexarr' for
example?Similarly for the other functions. 


Changed. With old names I tried to follow conventions in surrounding code, but probably that is a good idea to switch
tomore meaningful names in new code. 

>>
>>
>> delete(tsin tsvector, tsv_filter tsvector) — Delete lexemes and/or positions of tsv_filter from tsin. When lexeme in
tsv_filterhas no positions function will delete any occurrence of same lexeme in tsin. When tsv_filter lexeme have
positionsfunction will delete them from positions of matching lexeme in tsin. If after such removal resulting positions
setis empty then function will delete that lexeme from resulting tsvector. 
>>
>
> I can't really imagine situation in which I'd need this, but if you do have a use case for it ... although in the
initialparagraph you say "... but if somebody wants to delete for example ..." which suggests you may not have such use
case.
>
> Based on bad experience with extending API based on vague ideas, I recommend only really adding functions with
existingneed. It's easy to add a function later, much more difficult to remove it or change the signature. 

I tried to create more or less self-contained api, e.g. have ability to negate effect of concatenation. But i’ve also
askedpeople around what they think about extending API and everybody convinced that it is better to stick to smaller
API.So let’s drop it. At least that functions exists in mail list in case if somebody will google for such kind of
behaviour.

>>
>> Also if we want some level of completeness of API and taking into account that concat() function shift positions on
secondargument I thought that it can be useful to also add function that can shift all positions of specific value.
Thishelps to undo concatenation: delete one of concatenating tsvectors and then shift positions in resulting tsvector.
SoI also wrote one another small function: 
>>
>> shift(tsin tsvector,offset int16) — Shift all positions in tsin by given offset
>
> That seems rather too low-level. Shouldn't it be really built into delete() directly somehow?


I think it is ambiguous task on delete. But if we are dropping support of delete(tsvector, tsvector) I don’t see points
inkeeping that functions. 

>>>
>>> 7) Some of the functions use intexterm that does not match the function
>>>   name. I see two such cases - to_tsvector and setweight. Is there a
>>>   reason for that?
>>>
>>
>> Because sgml compiler wants unique indexterm. Both functions that
>> youmentioned use overloading of arguments and have non-unique name.
>
> As Michael pointed out, that should probably be handled by using <primary> and <secondary> tags.


Done.


> On 19 Jan 2016, at 00:21, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
>
> It's a bit funny that you reintroduce the "unrecognized weight: %d"
> (instead of %c) in tsvector_setweight_by_filter.
>


Ah, I was thinking about moving it to separate diff and messed. Fixed and attaching diff with same fix for old
tsvector_setweight.





---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Вложения

Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
Some notices:

1 tsin in documentation doesn't look like a good name. Changed to vector similar 
to other places.

2 I did some editorization about freeing memory/forgotten names etc

3 It seems to me that tsvector_unnest() could be seriously optimized for
large tsvectors: with current coding it detoasts/decompresses tsvector value on 
each call. Much better to do it once in
multi_call_memory_ctx context at first call init

4 It seems debatable returning empty array for position/weight if they are absent:
=# select * from unnest('a:1 b'::tsvector); lexeme | positions | weights
--------+-----------+--------- a      | {1}       | {D} b      | {}        | {}
I think, it's better to return NULL in this case

5 
array_to_tsvector/tsvector_setweight_by_filter/tsvector_delete_arr/tsvector_filter 
doesn't check or pay attention to NULL elements in input arrays





-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
> Some notices:
>
> 1 tsin in documentation doesn't look like a good name. Changed to vector similar
> to other places.
>
> 2 I did some editorization about freeing memory/forgotten names etc

Ooops, forgot to attach
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: Tsvector editing functions

От
Stas Kelvich
Дата:
>
> On 02 Feb 2016, at 20:10, Teodor Sigaev <teodor@sigaev.ru> wrote:
>
> Some notices:
>
> 1 tsin in documentation doesn't look like a good name. Changed to vector similar to other places.
>
> 2 I did some editorization about freeing memory/forgotten names etc

Thanks.

>
> 3 It seems to me that tsvector_unnest() could be seriously optimized for
> large tsvectors: with current coding it detoasts/decompresses tsvector value on each call. Much better to do it once
in
> multi_call_memory_ctx context at first call init

Done, moved detoasting to first SRF call.

> 4 It seems debatable returning empty array for position/weight if they are absent:
> =# select * from unnest('a:1 b'::tsvector);
> lexeme | positions | weights
> --------+-----------+---------
> a      | {1}       | {D}
> b      | {}        | {}
> I think, it's better to return NULL in this case
>

Okay, done.

> 5 array_to_tsvector/tsvector_setweight_by_filter/tsvector_delete_arr/tsvector_filter doesn't check or pay attention
toNULL elements in input arrays 
>

Thanks! Fixed and added tests.

> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                   WWW: http://www.sigaev.ru/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Вложения

Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
> Thanks! Fixed and added tests.
Thank you!

I did some patch cleanup/fix, but I have some doubt with function's names:

1 to_tsvector:
# \df to_tsvector
                              List of functions
    Schema   |    Name     | Result data type | Argument data types |  Type
------------+-------------+------------------+---------------------+--------
  pg_catalog | to_tsvector | tsvector         | regconfig, text     | normal
  pg_catalog | to_tsvector | tsvector         | text                | normal
  pg_catalog | to_tsvector | tsvector         | text[]              | normal

First two variants of to_tsvector make a morphological processing, last one doesn't.

2 to_array
# \df *to_array
                                   List of functions
    Schema   |         Name          | Result data type | Argument data types |
  Type
------------+-----------------------+------------------+---------------------+--------
  pg_catalog | regexp_split_to_array | text[]           | text, text          |
normal
  pg_catalog | regexp_split_to_array | text[]           | text, text, text    |
normal
  pg_catalog | string_to_array       | text[]           | text, text          |
normal
  pg_catalog | string_to_array       | text[]           | text, text, text    |
normal
  pg_catalog | to_array              | text[]           | tsvector            |
normal

Seems, to_array is not a right name compared to other *to_array.

I would like to suggest rename both functions to array_to_tsvector and
tsvector_to_array to have consistent name. Later we could add
to_tsvector([regconfig, ], text[]) with morphological processing.

Thoughts?



--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: Tsvector editing functions

От
Stas Kelvich
Дата:
> On 10 Mar 2016, at 20:29, Teodor Sigaev <teodor@sigaev.ru> wrote:
>
> I would like to suggest rename both functions to array_to_tsvector and tsvector_to_array to have consistent name.
Laterwe could add to_tsvector([regconfig, ], text[]) with morphological processing. 
>
> Thoughts?
>

Seems reasonable, done.



---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Вложения

Re: Tsvector editing functions

От
Stas Kelvich
Дата:
>
> On 11 Mar 2016, at 16:13, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
>
>
>> On 10 Mar 2016, at 20:29, Teodor Sigaev <teodor@sigaev.ru> wrote:
>>
>> I would like to suggest rename both functions to array_to_tsvector and tsvector_to_array to have consistent name.
Laterwe could add to_tsvector([regconfig, ], text[]) with morphological processing. 
>>
>> Thoughts?
>>

Hi, thanks for commit.

I saw errors on windows, here is the fix:


---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Вложения

Re: Tsvector editing functions

От
Teodor Sigaev
Дата:
> I saw errors on windows, here is the fix:
Thank you, pushed

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/