Re: Incorrect FTS result with GIN index

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: Incorrect FTS result with GIN index
Дата
Msg-id Pine.LNX.4.64.1007171450210.32129@sn.sai.msu.ru
обсуждение исходный текст
Ответ на Incorrect FTS result with GIN index  (Artur Dabrowski <ad@astec.com.pl>)
Ответы Re: Incorrect FTS result with GIN index  (Artur Dabrowski <ad@astec.com.pl>)
Список pgsql-general
Artur,

I downloaded your dump and tried your queries with index, I see no problem
so far.

                               Table "public.search_tab"
      Column     |  Type   |                        Modifiers
----------------+---------+----------------------------------------------------------
  id             | integer | not null default nextval('search_tab_id_seq1'::regclass)
  keywords       | text    |
  collection_urn | text    |
  bbox           | text    |
  object_urn     | text    | not null
  description    | text    |
  category       | text    |
  summary        | text    |
  priority       | integer |
Indexes:
     "search_tab_pkey1" PRIMARY KEY, btree (id)
     "idx_keywords_ger" gin (to_tsvector('german'::regconfig, keywords))

test=# explain analyze select count(*) from search_tab
where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*'))
and   (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
                                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=103.87..103.88 rows=1 width=0) (actual time=24.784..24.784 rows=1 loops=1)
    ->  Bitmap Heap Scan on search_tab  (cost=5.21..103.80 rows=25 width=0) (actual time=24.642..24.769 rows=123
loops=1)
          Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND
(to_tsvector('german'::regconfig,keywords) @@ '''dd'':*'::tsquery)) 
          ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.21 rows=25 width=0) (actual time=24.620..24.620
rows=123loops=1) 
                Index Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND
(to_tsvector('german'::regconfig,keywords) @@ '''dd'':*'::tsquery)) 
  Total runtime: 24.830 ms
(6 rows)

see rows=123


On Thu, 15 Jul 2010, Artur Dabrowski wrote:

>
> Hello,
>
> I was trying to use GIN index, but the results seem be incorrect.
>
>
> 1. QUERY WITHOUT INDEX
> select count(*) from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
>
> count
> -------
>   123
> (1 row)
>
>
> 2. CREATING INDEX
> create index idx_keywords_ger on search_tab
> using gin(to_tsvector('german', keywords));
>
>
> 3. QUERY WITH INDEX
> select count(*) from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
>
> count
> -------
>   116
> (1 row)
>
>
> The number of rows is different. To make things more funny and ensure
> problem is not caused by dictionary normalisation:
>
> 4. EQUIVALENT QUERY WITH INDEX
> select count(*) from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));
>
> count
> -------
>   123
> (1 row)
>
> I tried the same with simple-based dictionary. The problem is always
> reproducible.
>
> Total count of records in my database is 1 006 300 if it matters.
>
> One of missing results is the following: "lSWN eeIf hInEI IN
> SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce". If the query is more
> specifically targeted to find this row then it founds it:
>
> 5. MORE DETAILED QUERY WITH INDEX
> select keywords from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:* & dd:*'));
>
>                                     keywords
>
> --------------------------------------------------------------------------------
>
> lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
> tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
> hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
> hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
> eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
> 6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
> eeiog
> rfiwgso0g364l1 1wU eei1n 5lL dDA 0
> DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
> ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
> uS8snH
> DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
> n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
> DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
> EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
> hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
> (14 rows)
>
>
> Did I misunderstood something or is it a bug?
>
> Best regards
> Artur
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

В списке pgsql-general по дате отправления:

Предыдущее
От: Thomas Kellerer
Дата:
Сообщение: Re: pg_dump and --inserts / --column-inserts
Следующее
От: Jerry LeVan
Дата:
Сообщение: Fedora 13 killed dblink this week...