Shrinking TSvectors

From
Howard News
Date:
Hi,

Does anyone have any pointers for shrinking tsvectors?

I have looked at the contents of some of these fields and they contain
many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
'-9972':945 '/partners/application.html':222
'/partners/program/program-agreement.pdf':271
'/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
'1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or URLs in the indexes.

Thanks,

Howard.


Re: Shrinking TSvectors

From
Oleg Bartunov
Date:


On Tue, Apr 5, 2016 at 2:37 PM, Howard News <howardnews@selestial.com> wrote:
> Hi,
>
> does anyone have any pointers for shrinking tsvectors
>
> I have looked at the contents of some of these fields and they contain many details that are not needed. For example...
>
> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944 '-9972':945 '/partners/application.html':222 '/partners/program/program-agreement.pdf':271 '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087 '1':753,771 '12':366 '14':66 (...)"
>
> I am not interested in keeping the numbers or urls in the indexes.


select strip ('asd:23');
 strip
-------
 'asd'
(1 row)
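
Note that strip() only removes positions and weights; the lexemes themselves, including numbers and URL paths, stay in the vector. A minimal sketch of the effect, assuming the built-in english configuration (the sample text is made up for illustration):

```sql
-- strip() drops positions and weights; every lexeme stays in the vector.
SELECT strip(to_tsvector('english', 'home -9972 /partners/application.html'));

-- Compare the stored size of the same vector with and without positions.
SELECT pg_column_size(v)        AS with_positions,
       pg_column_size(strip(v)) AS stripped
FROM (
    SELECT to_tsvector('english', 'home -9972 /partners/application.html') AS v
) AS s;
```

A stripped vector is smaller, but anything that relies on positions, such as ts_rank_cd, loses that information.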

 


Re: Shrinking TSvectors

From
Artur Zakirov
Date:
On 05.04.2016 14:37, Howard News wrote:
> Hi,
>
> does anyone have any pointers for shrinking tsvectors
>
> I have looked at the contents of some of these fields and they contain
> many details that are not needed. For example...
>
> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
> '-9972':945 '/partners/application.html':222
> '/partners/program/program-agreement.pdf':271
> '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
> '1':753,771 '12':366 '14':66 (...)"
>
> I am not interested in keeping the numbers or urls in the indexes.
>
> Thanks,
>
> Howard.
>
>

Hello,

You need to create a new text search configuration. Here is an example of
the commands:

CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
     PARSER = default
);
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
         word, hword, hword_part
     WITH pg_catalog.english_stem;

Instead of the "pg_catalog.english_stem" you can use your own dictionary.

Let's compare the new configuration with the built-in configuration
"pg_catalog.english":

postgres=# select to_tsvector('english_cfg', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
  to_tsvector
-------------
  'home':1
(1 row)

postgres=# select to_tsvector('english', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
                                          to_tsvector
-----------------------------------------------------------------------------------------------
 '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
(1 row)
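
Switching to a new configuration does not rewrite vectors that are already stored, so existing rows would need to be recomputed. A rough sketch, assuming public.english_cfg has been created as shown earlier in this message; the table docs and its columns body and body_tsv are hypothetical names, not taken from this thread:

```sql
-- Hypothetical table; the names are illustrative only.
CREATE TABLE docs (
    id       serial PRIMARY KEY,
    body     text,
    body_tsv tsvector
);

INSERT INTO docs (body)
VALUES ('home -9972 /partners/application.html');

-- Recompute the stored vectors with the reduced configuration.
UPDATE docs
SET body_tsv = to_tsvector('public.english_cfg', body);

-- A GIN index on the column then only contains the remaining lexemes.
CREATE INDEX docs_body_tsv_idx ON docs USING gin (body_tsv);
```

Queries should then build their tsquery with the same configuration, for example to_tsquery('public.english_cfg', 'home'), so that both sides are normalized the same way.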


You can get some additional information about configurations using \dF+:

postgres=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
       Token      | Dictionaries
-----------------+--------------
  asciihword      | english_stem
  asciiword       | english_stem
  email           | simple
  file            | simple
  float           | simple
  host            | simple
  hword           | english_stem
  hword_asciipart | english_stem
  hword_numpart   | simple
  hword_part      | english_stem
  int             | simple
  numhword        | simple
  numword         | simple
  sfloat          | simple
  uint            | simple
  url             | simple
  url_path        | simple
  version         | simple
  word            | english_stem
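
To check which of these token types the default parser assigns to a given piece of text, ts_debug can be run on a sample. A small sketch (the sample string is only an illustration):

```sql
-- One row per token: its type (alias), the raw token,
-- the dictionaries consulted, and the resulting lexemes.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'home -9972 /partners/application.html');
```

Token types that appear here but have no mapping in a configuration produce no lexemes under that configuration.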

postgres=# \dF+ english_cfg
Text search configuration "public.english_cfg"
Parser: "pg_catalog.default"
       Token      | Dictionaries
-----------------+--------------
  asciihword      | english_stem
  asciiword       | english_stem
  hword           | english_stem
  hword_asciipart | english_stem
  hword_part      | english_stem
  word            | english_stem
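
An alternative way to arrive at the same mapping is to copy the built-in configuration and drop the token types that are not wanted. This is only a sketch; the name public.english_words is made up for illustration, and the list can be shortened if, say, email addresses or host names should stay searchable:

```sql
-- Start from the built-in configuration ...
CREATE TEXT SEARCH CONFIGURATION public.english_words (
    COPY = pg_catalog.english
);

-- ... and remove the mappings for numeric, URL, and file-like tokens.
ALTER TEXT SEARCH CONFIGURATION public.english_words
    DROP MAPPING FOR email, file, float, host, hword_numpart,
        int, numhword, numword, sfloat, uint,
        url, url_path, version;

-- Try it on the same sample text as above.
SELECT to_tsvector('public.english_words',
    'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
```

After this, \dF+ english_words should list the same six word-like token types as english_cfg.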

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: Shrinking TSvectors

From
Howard News
Date:


On 05/04/2016 14:44, Oleg Bartunov wrote:
> On Tue, Apr 5, 2016 at 2:37 PM, Howard News <howardnews@selestial.com> wrote:
>> Hi,
>>
>> does anyone have any pointers for shrinking tsvectors
>>
>> I have looked at the contents of some of these fields and they contain many details that are not needed. For example...
>>
>> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944 '-9972':945 '/partners/application.html':222 '/partners/program/program-agreement.pdf':271 '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087 '1':753,771 '12':366 '14':66 (...)"
>>
>> I am not interested in keeping the numbers or urls in the indexes.
>
> select strip ('asd:23');
>  strip
> -------
>  'asd'
> (1 row)

 
Hi Oleg,

Is this function documented anywhere?

Howard.

Re: Shrinking TSvectors

From
Alexander Shereshevsky
Date:
On Tue, Apr 5, 2016 at 5:37 PM, Howard News <howardnews@selestial.com> wrote:
> On 05/04/2016 14:44, Oleg Bartunov wrote:
>> On Tue, Apr 5, 2016 at 2:37 PM, Howard News <howardnews@selestial.com> wrote:
>>> Hi,
>>>
>>> does anyone have any pointers for shrinking tsvectors
>>>
>>> I have looked at the contents of some of these fields and they contain many details that are not needed. For example...
>>>
>>> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944 '-9972':945 '/partners/application.html':222 '/partners/program/program-agreement.pdf':271 '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087 '1':753,771 '12':366 '14':66 (...)"
>>>
>>> I am not interested in keeping the numbers or urls in the indexes.
>>
>> select strip ('asd:23');
>>  strip
>> -------
>>  'asd'
>> (1 row)
>
> Hi Oleg,
>
> Is this function documented anywhere?
>
> Howard.

Re: Shrinking TSvectors

From
Adrian Klaver
Date:
On 04/05/2016 07:37 AM, Howard News wrote:
>
>
> On 05/04/2016 14:44, Oleg Bartunov wrote:
>>
>>
>> On Tue, Apr 5, 2016 at 2:37 PM, Howard News <howardnews@selestial.com> wrote:
>>
>>     Hi,
>>
>>     does anyone have any pointers for shrinking tsvectors
>>
>>     I have looked at the contents of some of these fields and they
>>     contain many details that are not needed. For example...
>>
>>     "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937
>>     '-873':944 '-9972':945 '/partners/application.html':222
>>     '/partners/program/program-agreement.pdf':271
>>     '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
>>     '1':753,771 '12':366 '14':66 (...)"
>>
>>     I am not interested in keeping the numbers or urls in the indexes.
>>
>>
>>
>> select strip ('asd:23');
>>  strip
>> -------
>>  'asd'
>> (1 row)
>>
>>
> Hi Oleg,
>
> Is this function documented anywhere?

http://www.postgresql.org/docs/9.5/static/functions-textsearch.html

>
> Howard.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Shrinking TSvectors

From
Howard News
Date:

On 05/04/2016 15:15, Artur Zakirov wrote:
> On 05.04.2016 14:37, Howard News wrote:
>> Hi,
>>
>> does anyone have any pointers for shrinking tsvectors
>>
>> I have looked at the contents of some of these fields and they contain
>> many details that are not needed. For example...
>>
>> "'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944
>> '-9972':945 '/partners/application.html':222
>> '/partners/program/program-agreement.pdf':271
>> '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087
>> '1':753,771 '12':366 '14':66 (...)"
>>
>> I am not interested in keeping the numbers or urls in the indexes.
>>
>> Thanks,
>>
>> Howard.
>>
>>
>
> Hello,
>
> You need create a new text search configuration. Here is an example of
> commands:
>
> CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
>     PARSER = default
> );
> ALTER TEXT SEARCH CONFIGURATION public.english_cfg
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>         word, hword, hword_part
>     WITH pg_catalog.english_stem;
>
> Instead of the "pg_catalog.english_stem" you can use your own dictionary.
>
> Lets compare new configuration with the embedded configuration
> "pg_catalog.english":
>
> postgres=# select to_tsvector('english_cfg', 'home -9972
> /partners/application.html /partners/program/program-agreement.pdf');
>  to_tsvector
> -------------
>  'home':1
> (1 row)
>
> postgres=# select to_tsvector('english', 'home -9972
> /partners/application.html /partners/program/program-agreement.pdf');
>                                           to_tsvector
> -----------------------------------------------------------------------------------------------
>  '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
> (1 row)
>
>
> You can get some additional information about configurations using \dF+:
>
> postgres=# \dF+ english
> Text search configuration "pg_catalog.english"
> Parser: "pg_catalog.default"
>       Token      | Dictionaries
> -----------------+--------------
>  asciihword      | english_stem
>  asciiword       | english_stem
>  email           | simple
>  file            | simple
>  float           | simple
>  host            | simple
>  hword           | english_stem
>  hword_asciipart | english_stem
>  hword_numpart   | simple
>  hword_part      | english_stem
>  int             | simple
>  numhword        | simple
>  numword         | simple
>  sfloat          | simple
>  uint            | simple
>  url             | simple
>  url_path        | simple
>  version         | simple
>  word            | english_stem
>
> postgres=# \dF+ english_cfg
> Text search configuration "public.english_cfg"
> Parser: "pg_catalog.default"
>       Token      | Dictionaries
> -----------------+--------------
>  asciihword      | english_stem
>  asciiword       | english_stem
>  hword           | english_stem
>  hword_asciipart | english_stem
>  hword_part      | english_stem
>  word            | english_stem
>
Thanks Artur,

That's amazing! Postgres never ceases to amaze me. And the same goes for
the contributors to this list.