Обсуждение: tsvector not giving expected results on one host

Поиск
Список
Период
Сортировка

tsvector not giving expected results on one host

От
"Dan Langille"
Дата:
Under PostgreSQL 12, I have a table using tsvector to search the column pkgmessage.

It looks like this (not all columns are shown).

                                                                                    Table "public.ports"
           Column           |           Type           | Collation | Nullable |
         Default                                                   
 
 pkgmessage                 | text                     |           |          | 
 pkgmessage_textsearchable  | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,pkgmessage)) stored
 

On several servers, it works fine, like this:

freshports.devgit=# SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
 port_id |                           element_pathname                           
---------+----------------------------------------------------------------------
  100421 | /ports/branches/2022Q1/dns/dnsmasq
  100428 | /ports/branches/2022Q1/emulators/xsystem35
   14686 | /ports/head/sysutils/lmon
... etc

On the problem server, production, we get nothing.  Nada.

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
 port_id | element_pathname 
---------+------------------
(0 rows)

freshports.org=> 

However, ilike on the same database does find the matches:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage ilike '%example%';
 port_id |                           element_pathname                           
---------+----------------------------------------------------------------------
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
... etc

Both database are SQL_ASCII with the same Collate and Ctype settings.

Before composing this email, I ran "reindex table concurrently ports;" on the problem database. It did not change the
results.

prod, has problem

                                     List of databases
      Name      |  Owner   | Encoding  |   Collate   |    Ctype    |   Access privileges   
----------------+----------+-----------+-------------+-------------+-----------------------
 freshports.org | postgres | SQL_ASCII | C           | C           | 


dev, no issues:

[pg02 dan ~] % psql -l
                                        List of databases
            Name             |    Owner     | Encoding  | Collate | Ctype |   Access privileges   
-----------------------------+--------------+-----------+---------+-------+-----------------------
 freshports.devgit           | postgres     | SQL_ASCII | C       | C     | 

Any ideas as to what to search please?

Oh, one difference. All the working-as-expected databases are self-hosted on FreeBSD. The problem database is on AWS
RDS.
-- 
  Dan Langille
  dan@langille.org



Re: tsvector not giving expected results on one host

От
"Dan Langille"
Дата:
On Sat, Dec 17, 2022, at 1:53 PM, Dan Langille wrote:
> Under PostgreSQL 12, I have a table using tsvector to search the column 
> pkgmessage.
>
> It looks like this (not all columns are shown).
>
>                                                                         
>             Table "public.ports"
>            Column           |           Type           | Collation | 
> Nullable |                                                   Default    
>                                                
>  pkgmessage                 | text                     |           |    
>       | 
>  pkgmessage_textsearchable  | tsvector                 |           |    
>       | generated always as (to_tsvector('english'::regconfig, 
> pkgmessage)) stored

 pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 

I see the above should have been included as well.

>
> On several servers, it works fine, like this:
>
> freshports.devgit=# SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
>  port_id |                           element_pathname                           
> ---------+----------------------------------------------------------------------
>   100421 | /ports/branches/2022Q1/dns/dnsmasq
>   100428 | /ports/branches/2022Q1/emulators/xsystem35
>    14686 | /ports/head/sysutils/lmon
> ... etc
>
> On the problem server, production, we get nothing.  Nada.
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('example');
>  port_id | element_pathname 
> ---------+------------------
> (0 rows)
>
> freshports.org=> 
>
> However, ilike on the same database does find the matches:
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage ilike '%example%';
>  port_id |                           element_pathname                           
> ---------+----------------------------------------------------------------------
>    34126 | /ports/head/security/pond
>    74559 | /ports/branches/2015Q3/emulators/linux_base-c6
>    60310 | /ports/branches/2020Q4/www/gitlab-ce
>    38345 | /ports/head/www/gitlab
> ... etc

Let's look at that first entry on the problem database:

freshports.org=> select pkgmessage_textsearchable from ports where id = 34126;

pkgmessage_textsearchable
             
 

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14
'load':19'loader.conf':16 'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2
'yes':20
(1 row)

freshports.org=> select pkgmessage_textsearchable2 from ports where id = 34126;

pkgmessage_textsearchable2
             
 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19
'loader.conf':16'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 'use':2
'yes':20
(1 row)

freshports.org=> 

From a database which runs this query with expected results:

freshports.devgit=# select pkgmessage_textsearchable2 from ports where id = 34126;

pkgmessage_textsearchable2
             
 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19
'loader.conf':16'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 'use':2
'yes':20
(1 row)

freshports.devgit=# select pkgmessage_textsearchable from ports where id = 34126;

pkgmessage_textsearchable
             
 

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14
'load':19'loader.conf':16 'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2
'yes':20
(1 row)

freshports.devgit=# 



If I run my query with 'exampl', it finds what I expected, including 'pond' from above

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable2  @@ websearch_to_tsquery('exampl');
 port_id |                           element_pathname                           
---------+----------------------------------------------------------------------
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
   46842 | /ports/branches/2018Q1/mail/postfix-sasl

This is the same on both hosts:

freshports.org=> show  default_text_search_config ;
 default_text_search_config 
----------------------------
 pg_catalog.simple
-- 
  Dan Langille
  dan@langille.org



Re: tsvector not giving expected results on one host

От
Tom Lane
Дата:
"Dan Langille" <dan@langille.org> writes:
>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored 

That is not likely to play well with this:

> freshports.org=> show  default_text_search_config ;
>  default_text_search_config
> ----------------------------
>  pg_catalog.simple

because "english" and "simple" will stem words differently.

regression=# select websearch_to_tsquery('english', 'example');
 websearch_to_tsquery
----------------------
 'exampl'
(1 row)

regression=# select websearch_to_tsquery('simple', 'example');
 websearch_to_tsquery
----------------------
 'example'
(1 row)

If what is in your tsvector is 'exampl', then only the first of
these will match.  So IMO the question is not "why is it failing
on prod?", it's "how the heck did it work on the other machine?".
You won't get nice results if websearch_to_tsquery is using a
different TS configuration than to_tsvector did.

            regards, tom lane



Re: tsvector not giving expected results on one host

От
"Dan Langille"
Дата:
On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> "Dan Langille" <dan@langille.org> writes:
>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 
>
> That is not likely to play well with this:
>
>> freshports.org=> show  default_text_search_config ;
>>  default_text_search_config 
>> ----------------------------
>>  pg_catalog.simple
>
> because "english" and "simple" will stem words differently.
>
> regression=# select websearch_to_tsquery('english', 'example');
>  websearch_to_tsquery 
> ----------------------
>  'exampl'
> (1 row)
>
> regression=# select websearch_to_tsquery('simple', 'example');
>  websearch_to_tsquery 
> ----------------------
>  'example'
> (1 row)
>
> If what is in your tsvector is 'exampl', then only the first of
> these will match.  So IMO the question is not "why is it failing
> on prod?", it's "how the heck did it work on the other machine?".
> You won't get nice results if websearch_to_tsquery is using a
> different TS configuration than to_tsvector did.

I think this shows why we are getting the results we see.  Credit to ch on IRC for asking this question.

The problem host:

freshports.org=> select websearch_to_tsquery('example');
 websearch_to_tsquery 
----------------------
 'example'
(1 row)


The hosts on which this search works

freshports.devgit=# select websearch_to_tsquery('example');
 websearch_to_tsquery 
----------------------
 'exampl'
(1 row)

On that same host:

[pg02 dan ~] % sudo grep -i default_text_search_config /var/db/postgres/data12/postgresql.conf
default_text_search_config = 'pg_catalog.english'


-- 
  Dan Langille
  dan@langille.org



Re: tsvector not giving expected results on one host

От
"Dan Langille"
Дата:
On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>> "Dan Langille" <dan@langille.org> writes:
>>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 
>>
>> That is not likely to play well with this:
>>
>>> freshports.org=> show  default_text_search_config ;
>>>  default_text_search_config 
>>> ----------------------------
>>>  pg_catalog.simple
>>
>> because "english" and "simple" will stem words differently.
>>
>> regression=# select websearch_to_tsquery('english', 'example');
>>  websearch_to_tsquery 
>> ----------------------
>>  'exampl'
>> (1 row)
>>
>> regression=# select websearch_to_tsquery('simple', 'example');
>>  websearch_to_tsquery 
>> ----------------------
>>  'example'
>> (1 row)
>>
>> If what is in your tsvector is 'exampl', then only the first of
>> these will match.  So IMO the question is not "why is it failing
>> on prod?", it's "how the heck did it work on the other machine?".
>> You won't get nice results if websearch_to_tsquery is using a
>> different TS configuration than to_tsvector did.
>
> I think this shows why we are getting the results we see.  Credit to ch 
> on IRC for asking this question.
>
> The problem host:
>
> freshports.org=> select websearch_to_tsquery('example');
>  websearch_to_tsquery 
> ----------------------
>  'example'
> (1 row)

Ahh, this explains the differences and as to why it works where it shouldn't?

freshports.org=> select setting, source from pg_settings where name = 'default_text_search_config';
      setting      | source  
-------------------+---------
 pg_catalog.simple | default
(1 row)


>
> The hosts on which this search works
>
> freshports.devgit=# select websearch_to_tsquery('example');
>  websearch_to_tsquery 
> ----------------------
>  'exampl'
> (1 row)


freshports.devgit=# select setting, source from pg_settings where name = 'default_text_search_config';
      setting       |       source       
--------------------+--------------------
 pg_catalog.english | configuration file
(1 row)


At least now I know what I can play with to get all hosts in sync.

-- 
  Dan Langille
  dan@langille.org



Re: tsvector not giving expected results on one host

От
"Dan Langille"
Дата:
On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
>> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
>>> "Dan Langille" <dan@langille.org> writes:
>>>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 
>>>
>>> That is not likely to play well with this:
>>>
>>>> freshports.org=> show  default_text_search_config ;
>>>>  default_text_search_config 
>>>> ----------------------------
>>>>  pg_catalog.simple
>>>
>>> because "english" and "simple" will stem words differently.
>>>
>>> regression=# select websearch_to_tsquery('english', 'example');
>>>  websearch_to_tsquery 
>>> ----------------------
>>>  'exampl'
>>> (1 row)
>>>
>>> regression=# select websearch_to_tsquery('simple', 'example');
>>>  websearch_to_tsquery 
>>> ----------------------
>>>  'example'
>>> (1 row)
>>>
>>> If what is in your tsvector is 'exampl', then only the first of
>>> these will match.  So IMO the question is not "why is it failing
>>> on prod?", it's "how the heck did it work on the other machine?".
>>> You won't get nice results if websearch_to_tsquery is using a
>>> different TS configuration than to_tsvector did.
>>
>> I think this shows why we are getting the results we see.  Credit to ch 
>> on IRC for asking this question.
>>
>> The problem host:
>>
>> freshports.org=> select websearch_to_tsquery('example');
>>  websearch_to_tsquery 
>> ----------------------
>>  'example'
>> (1 row)
>
> Ahh, this explains the differences and as to why it works where it shouldn't?
>
> freshports.org=> select setting, source from pg_settings where name = 
> 'default_text_search_config';
>       setting      | source  
> -------------------+---------
>  pg_catalog.simple | default
> (1 row)
>
>
>>
>> The hosts on which this search works
>>
>> freshports.devgit=# select websearch_to_tsquery('example');
>>  websearch_to_tsquery 
>> ----------------------
>>  'exampl'
>> (1 row)
>
>
> freshports.devgit=# select setting, source from pg_settings where name 
> = 'default_text_search_config';
>       setting       |       source       
> --------------------+--------------------
>  pg_catalog.english | configuration file
> (1 row)
>
>
> At least now I know what I can play with to get all hosts in sync.

Here we go, on the problem database, create a new field, based on simple, not english.

ALTER TABLE public.ports
    ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig,
translate(pkgmessage,'/'::text, ' '::text))) stored;
 

Index it:

CREATE INDEX ports_pkgmessage_textsearchable3_idx
    ON public.ports USING gin
    (pkgmessage_textsearchable3)
    TABLESPACE pg_default;
CREATE INDEX

query it:

freshports.org=> SELECT id as port_id, element_pathname(element_id)
  FROM ports
 WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
 port_id |                           element_pathname                           
---------+----------------------------------------------------------------------
   34126 | /ports/head/security/pond
   74559 | /ports/branches/2015Q3/emulators/linux_base-c6
   60310 | /ports/branches/2020Q4/www/gitlab-ce
   38345 | /ports/head/www/gitlab
   46842 | /ports/branches/2018Q1/mail/postfix-sasl
   51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod

Success. Thank you Mr Lane.

-- 
  Dan Langille
  dan@langille.org



Re: tsvector not giving expected results on one host

От
Oleg Bartunov
Дата:
Dan,

it's always good to specify configuration name in a query to avoid
recheck,  since
websearch_to_tsquery(regconfig, text) is immutable, while
websearch_to_tsquery(text) is stable.

See the difference:

[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('simple','galaxies') @@ fts;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
   Recheck Cond: ('''galaxies'''::tsquery @@ fts)
   ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
rows=0 loops=1)
         Index Cond: (fts @@ '''galaxies'''::tsquery)
 Planning Time: 0.134 ms
 Execution Time: 0.022 ms
(6 rows)

Time: 0.369 ms
[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('galaxies') @@ fts;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
   Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
   Heap Blocks: exact=276
   ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
rows=493 loops=1)
         Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
 Planning Time: 0.125 ms
 Execution Time: 1.518 ms
(7 rows)

On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@langille.org> wrote:
>
> On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> >>> "Dan Langille" <dan@langille.org> writes:
> >>>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 
> >>>
> >>> That is not likely to play well with this:
> >>>
> >>>> freshports.org=> show  default_text_search_config ;
> >>>>  default_text_search_config
> >>>> ----------------------------
> >>>>  pg_catalog.simple
> >>>
> >>> because "english" and "simple" will stem words differently.
> >>>
> >>> regression=# select websearch_to_tsquery('english', 'example');
> >>>  websearch_to_tsquery
> >>> ----------------------
> >>>  'exampl'
> >>> (1 row)
> >>>
> >>> regression=# select websearch_to_tsquery('simple', 'example');
> >>>  websearch_to_tsquery
> >>> ----------------------
> >>>  'example'
> >>> (1 row)
> >>>
> >>> If what is in your tsvector is 'exampl', then only the first of
> >>> these will match.  So IMO the question is not "why is it failing
> >>> on prod?", it's "how the heck did it work on the other machine?".
> >>> You won't get nice results if websearch_to_tsquery is using a
> >>> different TS configuration than to_tsvector did.
> >>
> >> I think this shows why we are getting the results we see.  Credit to ch
> >> on IRC for asking this question.
> >>
> >> The problem host:
> >>
> >> freshports.org=> select websearch_to_tsquery('example');
> >>  websearch_to_tsquery
> >> ----------------------
> >>  'example'
> >> (1 row)
> >
> > Ahh, this explains the differences and as to why it works where it shouldn't?
> >
> > freshports.org=> select setting, source from pg_settings where name =
> > 'default_text_search_config';
> >       setting      | source
> > -------------------+---------
> >  pg_catalog.simple | default
> > (1 row)
> >
> >
> >>
> >> The hosts on which this search works
> >>
> >> freshports.devgit=# select websearch_to_tsquery('example');
> >>  websearch_to_tsquery
> >> ----------------------
> >>  'exampl'
> >> (1 row)
> >
> >
> > freshports.devgit=# select setting, source from pg_settings where name
> > = 'default_text_search_config';
> >       setting       |       source
> > --------------------+--------------------
> >  pg_catalog.english | configuration file
> > (1 row)
> >
> >
> > At least now I know what I can play with to get all hosts in sync.
>
> Here we go, on the problem database, create a new field, based on simple, not english.
>
> ALTER TABLE public.ports
>     ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig,
translate(pkgmessage,'/'::text, ' '::text))) stored;
 
>
> Index it:
>
> CREATE INDEX ports_pkgmessage_textsearchable3_idx
>     ON public.ports USING gin
>     (pkgmessage_textsearchable3)
>     TABLESPACE pg_default;
> CREATE INDEX
>
> query it:
>
> freshports.org=> SELECT id as port_id, element_pathname(element_id)
>   FROM ports
>  WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
>  port_id |                           element_pathname
> ---------+----------------------------------------------------------------------
>    34126 | /ports/head/security/pond
>    74559 | /ports/branches/2015Q3/emulators/linux_base-c6
>    60310 | /ports/branches/2020Q4/www/gitlab-ce
>    38345 | /ports/head/www/gitlab
>    46842 | /ports/branches/2018Q1/mail/postfix-sasl
>    51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod
>
> Success. Thank you Mr Lane.
>
> --
>   Dan Langille
>   dan@langille.org
>
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: tsvector not giving expected results on one host

От
Oleg Bartunov
Дата:
I

On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote:
>
> Dan,
>
> it's always good to specify configuration name in a query to avoid
> recheck,  since
> websearch_to_tsquery(regconfig, text) is immutable, while
> websearch_to_tsquery(text) is stable.

immutable function calculates once in planning time, but stable
function calculates during running time,
so the difference may be very big depending on how many tuples found.

>
> See the difference:
>
> [local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
> from apod where  websearch_to_tsquery('simple','galaxies') @@ fts;
>                                       QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
>    Recheck Cond: ('''galaxies'''::tsquery @@ fts)
>    ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
> rows=0 loops=1)
>          Index Cond: (fts @@ '''galaxies'''::tsquery)
>  Planning Time: 0.134 ms
>  Execution Time: 0.022 ms
> (6 rows)
>
> Time: 0.369 ms
> [local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
> from apod where  websearch_to_tsquery('galaxies') @@ fts;
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
>    Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
>    Heap Blocks: exact=276
>    ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
> rows=493 loops=1)
>          Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
>  Planning Time: 0.125 ms
>  Execution Time: 1.518 ms
> (7 rows)
>
> On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@langille.org> wrote:
> >
> > On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> > > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> > >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> > >>> "Dan Langille" <dan@langille.org> writes:
> > >>>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 
> > >>>
> > >>> That is not likely to play well with this:
> > >>>
> > >>>> freshports.org=> show  default_text_search_config ;
> > >>>>  default_text_search_config
> > >>>> ----------------------------
> > >>>>  pg_catalog.simple
> > >>>
> > >>> because "english" and "simple" will stem words differently.
> > >>>
> > >>> regression=# select websearch_to_tsquery('english', 'example');
> > >>>  websearch_to_tsquery
> > >>> ----------------------
> > >>>  'exampl'
> > >>> (1 row)
> > >>>
> > >>> regression=# select websearch_to_tsquery('simple', 'example');
> > >>>  websearch_to_tsquery
> > >>> ----------------------
> > >>>  'example'
> > >>> (1 row)
> > >>>
> > >>> If what is in your tsvector is 'exampl', then only the first of
> > >>> these will match.  So IMO the question is not "why is it failing
> > >>> on prod?", it's "how the heck did it work on the other machine?".
> > >>> You won't get nice results if websearch_to_tsquery is using a
> > >>> different TS configuration than to_tsvector did.
> > >>
> > >> I think this shows why we are getting the results we see.  Credit to ch
> > >> on IRC for asking this question.
> > >>
> > >> The problem host:
> > >>
> > >> freshports.org=> select websearch_to_tsquery('example');
> > >>  websearch_to_tsquery
> > >> ----------------------
> > >>  'example'
> > >> (1 row)
> > >
> > > Ahh, this explains the differences and as to why it works where it shouldn't?
> > >
> > > freshports.org=> select setting, source from pg_settings where name =
> > > 'default_text_search_config';
> > >       setting      | source
> > > -------------------+---------
> > >  pg_catalog.simple | default
> > > (1 row)
> > >
> > >
> > >>
> > >> The hosts on which this search works
> > >>
> > >> freshports.devgit=# select websearch_to_tsquery('example');
> > >>  websearch_to_tsquery
> > >> ----------------------
> > >>  'exampl'
> > >> (1 row)
> > >
> > >
> > > freshports.devgit=# select setting, source from pg_settings where name
> > > = 'default_text_search_config';
> > >       setting       |       source
> > > --------------------+--------------------
> > >  pg_catalog.english | configuration file
> > > (1 row)
> > >
> > >
> > > At least now I know what I can play with to get all hosts in sync.
> >
> > Here we go, on the problem database, create a new field, based on simple, not english.
> >
> > ALTER TABLE public.ports
> >     ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig,
translate(pkgmessage,'/'::text, ' '::text))) stored;
 
> >
> > Index it:
> >
> > CREATE INDEX ports_pkgmessage_textsearchable3_idx
> >     ON public.ports USING gin
> >     (pkgmessage_textsearchable3)
> >     TABLESPACE pg_default;
> > CREATE INDEX
> >
> > query it:
> >
> > freshports.org=> SELECT id as port_id, element_pathname(element_id)
> >   FROM ports
> >  WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
> >  port_id |                           element_pathname
> > ---------+----------------------------------------------------------------------
> >    34126 | /ports/head/security/pond
> >    74559 | /ports/branches/2015Q3/emulators/linux_base-c6
> >    60310 | /ports/branches/2020Q4/www/gitlab-ce
> >    38345 | /ports/head/www/gitlab
> >    46842 | /ports/branches/2018Q1/mail/postfix-sasl
> >    51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod
> >
> > Success. Thank you Mr Lane.
> >
> > --
> >   Dan Langille
> >   dan@langille.org
> >
> >
>
>
> --
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: tsvector not giving expected results on one host

От
Dan Langille
Дата:
Oleg Bartunov wrote on 12/21/22 12:31 PM:
I

On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote:
Dan,

it's always good to specify configuration name in a query to avoid
recheck,  since
websearch_to_tsquery(regconfig, text) is immutable, while
websearch_to_tsquery(text) is stable.

I am glad to hear this because one of my decisions was:

* change the query to include configuration name
* change the default configuration name

I think I'm going to start using websearch_to_tsquery(regconfig, text)
immutable function calculates once in planning time, but stable
function calculates during running time,
so the difference may be very big depending on how many tuples found.

Looking at my data, the largest tsvector collection has 453 elements.  About 40 have more
than 170.  Total number of rows is about 2400.

Another issue discussed on IRC: why store my tsvector values? Why not just index them instead?

At present: I have this column:

 pkgmessage_textsearchable  | tsvector   |   |  | generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored

with this index: ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable)

Instead, I could replace that column and index with this index:

"testing" gin (to_tsvector('english'::regconfig, pkgmessage))

Simple testing showed it was comparable if not slightly faster.

The plan now: implement the index on to_tsvector, not a column, and start specifying the configuration. That's in the near future.

I've written up this journey at https://news.freshports.org/2022/12/18/when-tsvector-was-working-as-expected-on-most-hosts-but-not-one/

thank you
See the difference:

[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('simple','galaxies') @@ fts;                                      QUERY PLAN
--------------------------------------------------------------------------------------- Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)   Recheck Cond: ('''galaxies'''::tsquery @@ fts)   ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
rows=0 loops=1)         Index Cond: (fts @@ '''galaxies'''::tsquery) Planning Time: 0.134 ms Execution Time: 0.022 ms
(6 rows)

Time: 0.369 ms
[local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
from apod where  websearch_to_tsquery('galaxies') @@ fts;                                       QUERY PLAN
----------------------------------------------------------------------------------------- Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)   Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)   Heap Blocks: exact=276   ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
rows=493 loops=1)         Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text)) Planning Time: 0.125 ms Execution Time: 1.518 ms
(7 rows)