Обсуждение: Incorrect FTS result with GIN index

Поиск
Список
Период
Сортировка

Incorrect FTS result with GIN index

От
Artur Dabrowski
Дата:
Hello,

I was trying to use GIN index, but the results seem be incorrect.


1. QUERY WITHOUT INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

 count
-------
   123
(1 row)


2. CREATING INDEX
create index idx_keywords_ger on search_tab
using gin(to_tsvector('german', keywords));


3. QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

 count
-------
   116
(1 row)


The number of rows is different. To make things more funny and ensure
problem is not caused by dictionary normalisation:

4. EQUIVALENT QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));

 count
-------
   123
(1 row)

I tried the same with simple-based dictionary. The problem is always
reproducible.

Total count of records in my database is 1 006 300 if it matters.

One of missing results is the following: "lSWN eeIf hInEI IN
SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce". If the query is more
specifically targeted to find this row then it founds it:

5. MORE DETAILED QUERY WITH INDEX
select keywords from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:* & dd:*'));

                                     keywords

--------------------------------------------------------------------------------

lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
eeiog
rfiwgso0g364l1 1wU eei1n 5lL dDA 0
DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
uS8snH
DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
(14 rows)


Did I misunderstood something or is it a bug?

Best regards
Artur
--
View this message in context: http://old.nabble.com/Incorrect-FTS-result-with-GIN-index-tp29172750p29172750.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS results with GIN index

От
Artur Dabrowski
Дата:
I pasted incorrect query in point 5. It should be:

5. MORE DETAILED QUERY WITH INDEX
select keywords from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

                                     keywords

--------------------------------------------------------------------------------

lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
eeiog
rfiwgso0g364l1 1wU eei1n 5lL dDA 0
DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
uS8snH
DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
(14 rows)
--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29172950.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS results with GIN index

От
Artur Dabrowski
Дата:
My version of PostgreSQL is 8.4.3.
--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29173652.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS result with GIN index

От
Oleg Bartunov
Дата:
Artur,

I downloaded your dump and tried your queries with index, I see no problem
so far.

                               Table "public.search_tab"
      Column     |  Type   |                        Modifiers
----------------+---------+----------------------------------------------------------
  id             | integer | not null default nextval('search_tab_id_seq1'::regclass)
  keywords       | text    |
  collection_urn | text    |
  bbox           | text    |
  object_urn     | text    | not null
  description    | text    |
  category       | text    |
  summary        | text    |
  priority       | integer |
Indexes:
     "search_tab_pkey1" PRIMARY KEY, btree (id)
     "idx_keywords_ger" gin (to_tsvector('german'::regconfig, keywords))

test=# explain analyze select count(*) from search_tab
where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*'))
and   (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
                                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=103.87..103.88 rows=1 width=0) (actual time=24.784..24.784 rows=1 loops=1)
    ->  Bitmap Heap Scan on search_tab  (cost=5.21..103.80 rows=25 width=0) (actual time=24.642..24.769 rows=123
loops=1)
          Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND
(to_tsvector('german'::regconfig,keywords) @@ '''dd'':*'::tsquery)) 
          ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.21 rows=25 width=0) (actual time=24.620..24.620
rows=123loops=1) 
                Index Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND
(to_tsvector('german'::regconfig,keywords) @@ '''dd'':*'::tsquery)) 
  Total runtime: 24.830 ms
(6 rows)

see rows=123


On Thu, 15 Jul 2010, Artur Dabrowski wrote:

>
> Hello,
>
> I was trying to use GIN index, but the results seem be incorrect.
>
>
> 1. QUERY WITHOUT INDEX
> select count(*) from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
>
> count
> -------
>   123
> (1 row)
>
>
> 2. CREATING INDEX
> create index idx_keywords_ger on search_tab
> using gin(to_tsvector('german', keywords));
>
>
> 3. QUERY WITH INDEX
> select count(*) from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));
>
> count
> -------
>   116
> (1 row)
>
>
> The number of rows is different. To make things more funny and ensure
> problem is not caused by dictionary normalisation:
>
> 4. EQUIVALENT QUERY WITH INDEX
> select count(*) from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));
>
> count
> -------
>   123
> (1 row)
>
> I tried the same with simple-based dictionary. The problem is always
> reproducible.
>
> Total count of records in my database is 1 006 300 if it matters.
>
> One of missing results is the following: "lSWN eeIf hInEI IN
> SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce". If the query is more
> specifically targeted to find this row then it founds it:
>
> 5. MORE DETAILED QUERY WITH INDEX
> select keywords from search_tab where
> (to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:* & dd:*'));
>
>                                     keywords
>
> --------------------------------------------------------------------------------
>
> lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
> tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8  BiIfgGl1 EeIhulSLenS6LDe5O
> hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
> hGe WIiI EeItnLLuA1efOh3ALWc  uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
> eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA  lAhNLLccfWISE l
> 6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH
> eeiog
> rfiwgso0g364l1 1wU eei1n 5lL dDA 0
> DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
> ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0
> uS8snH
> DDD EWlE1GShhLe8L NENI  tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
> n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
> DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
> EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
> hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
> (14 rows)
>
>
> Did I misunderstood something or is it a bug?
>
> Best regards
> Artur
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Incorrect FTS result with GIN index

От
Artur Dabrowski
Дата:
Hello Oleg,

my results are different. The analysis looks like this (please note the
different numbers of rows):

Aggregate  (cost=104.05..104.06 rows=1 width=0) (actual
time=152.133..152.135 rows=1 loops=1)
  ->  Bitmap Heap Scan on search_tab  (cost=5.39..103.98 rows=25 width=0)
(actual time=76.546..151.834 rows=116 loops=1)
        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.38 rows=25
width=0) (actual time=76.292..76.292 rows=506 loops=1)
              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 152.389 ms


I have no idea, what could be the reason for different behaviour on your and
my machine (windows xp, postgreSQL 8.4.3)?
I reproduced the same wrong behaviour on a machine of my co-worker (windows
xp, postgreSQL 8.4.4).


--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29203020.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS result with GIN index

От
Oleg Bartunov
Дата:
Artur,

I don't know, but could you try linux machine ?

Oleg
On Mon, 19 Jul 2010, Artur Dabrowski wrote:

>
> Hello Oleg,
>
> my results are different. The analysis looks like this (please note the
> different numbers of rows):
>
> Aggregate  (cost=104.05..104.06 rows=1 width=0) (actual
> time=152.133..152.135 rows=1 loops=1)
>  ->  Bitmap Heap Scan on search_tab  (cost=5.39..103.98 rows=25 width=0)
> (actual time=76.546..151.834 rows=116 loops=1)
>        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
> '''dd'':*'::tsquery))
>        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.38 rows=25
> width=0) (actual time=76.292..76.292 rows=506 loops=1)
>              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
> '''dd'':*'::tsquery))
> Total runtime: 152.389 ms
>
>
> I have no idea, what could be the reason for different behaviour on your and
> my machine (windows xp, postgreSQL 8.4.3)?
> I reproduced the same wrong behaviour on a machine of my co-worker (windows
> xp, postgreSQL 8.4.4).
>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Incorrect FTS result with GIN index

От
Artur Dabrowski
Дата:
I tested the same backup on our CentOS 5.4 virtual machine (running on xen
server) and the results are really weird (118 rows, comparing to 116 on win
xp and 123 expected):

Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
time=120.373..120.374 rows=1 loops=1)
  ->  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
(actual time=59.418..120.137 rows=118 loops=1)
        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34 rows=25
width=0) (actual time=59.229..59.229 rows=495 loops=1)
              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
'''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
'''dd'':*'::tsquery))
Total runtime: 120.670 ms

And here are the configuration details:

PostgreSQL:
postgresql84-server-8.4.4-1.el5_5.1

# uname -r
2.6.18-164.15.1.el5xen

# cat /etc/redhat-release
CentOS release 5.4 (Final)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
stepping        : 6
cpu MHz         : 2333.416
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx
fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
bogomips        : 5835.83
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:







Oleg Bartunov wrote:
>
> Artur,
>
> I don't know, but could you try linux machine ?
>
> Oleg
>

--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29212116.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS result with GIN index

От
Artur Dabrowski
Дата:
The CentOS used for testing is a 64-bits version.


Artur Dabrowski wrote:
>
> I tested the same backup on our CentOS 5.4 virtual machine (running on xen
> server) and the results are really weird (118 rows, comparing to 116 on
> win xp and 123 expected):
>
>
>

--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29212162.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS result with GIN index

От
Oleg Bartunov
Дата:
Artur,

I recommend post your problem to -hackers mailing list. I have no idea,
what could be a problem.

My machine is:
uname -a
Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010 x86_64 GNU/Linux

PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1, 64-bit

As a last resort I recommend you to compile pg yourself and see if the
problem exists.

Oleg


On Tue, 20 Jul 2010, Artur Dabrowski wrote:

>
> I tested the same backup on our CentOS 5.4 virtual machine (running on xen
> server) and the results are really weird (118 rows, comparing to 116 on win
> xp and 123 expected):
>
> Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
> time=120.373..120.374 rows=1 loops=1)
>  ->  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
> (actual time=59.418..120.137 rows=118 loops=1)
>        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
> '''dd'':*'::tsquery))
>        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34 rows=25
> width=0) (actual time=59.229..59.229 rows=495 loops=1)
>              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
> '''dd'':*'::tsquery))
> Total runtime: 120.670 ms
>
> And here are the configuration details:
>
> PostgreSQL:
> postgresql84-server-8.4.4-1.el5_5.1
>
> # uname -r
> 2.6.18-164.15.1.el5xen
>
> # cat /etc/redhat-release
> CentOS release 5.4 (Final)
>
> # cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 15
> model name      : Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
> stepping        : 6
> cpu MHz         : 2333.416
> cache size      : 4096 KB
> physical id     : 0
> siblings        : 1
> core id         : 0
> cpu cores       : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx
> fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
> bogomips        : 5835.83
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
>
>
>
>
>
>
> Oleg Bartunov wrote:
>>
>> Artur,
>>
>> I don't know, but could you try linux machine ?
>>
>> Oleg
>>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Incorrect FTS result with GIN index

От
Artur Dabrowski
Дата:
Oleg,

thanks for your help.

I sent a post to pg-hackers list:
http://old.nabble.com/Query-results-differ-depending-on-operating-system-%28using-GIN%29-ts29213082.html

As to compiling pg... I will no do this since I do not really feel
comfortable doing it and cannot dedicate too much time to this problem.

Artur



Oleg Bartunov wrote:
>
> Artur,
>
> I recommend post your problem to -hackers mailing list. I have no idea,
> what could be a problem.
>
> My machine is:
> uname -a
> Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010
> x86_64 GNU/Linux
>
> PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu
> 4.4.1-4ubuntu9) 4.4.1, 64-bit
>
> As a last resort I recommend you to compile pg yourself and see if the
> problem exists.
>
> Oleg
>
>
> On Tue, 20 Jul 2010, Artur Dabrowski wrote:
>
>>
>> I tested the same backup on our CentOS 5.4 virtual machine (running on
>> xen
>> server) and the results are really weird (118 rows, comparing to 116 on
>> win
>> xp and 123 expected):
>>
>> Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
>> time=120.373..120.374 rows=1 loops=1)
>>  ->  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
>> (actual time=59.418..120.137 rows=118 loops=1)
>>        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
>> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
>> '''dd'':*'::tsquery))
>>        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34
>> rows=25
>> width=0) (actual time=59.229..59.229 rows=495 loops=1)
>>              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
>> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
>> '''dd'':*'::tsquery))
>> Total runtime: 120.670 ms
>>
>> And here are the configuration details:
>>
>> PostgreSQL:
>> postgresql84-server-8.4.4-1.el5_5.1
>>
>> # uname -r
>> 2.6.18-164.15.1.el5xen
>>
>> # cat /etc/redhat-release
>> CentOS release 5.4 (Final)
>>
>> # cat /proc/cpuinfo
>> processor       : 0
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 15
>> model name      : Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
>> stepping        : 6
>> cpu MHz         : 2333.416
>> cache size      : 4096 KB
>> physical id     : 0
>> siblings        : 1
>> core id         : 0
>> cpu cores       : 1
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 10
>> wp              : yes
>> flags           : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi
>> mmx
>> fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
>> bogomips        : 5835.83
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 36 bits physical, 48 bits virtual
>> power management:
>>
>>
>>
>>
>>
>>
>>
>> Oleg Bartunov wrote:
>>>
>>> Artur,
>>>
>>> I don't know, but could you try linux machine ?
>>>
>>> Oleg
>>>
>>
>>
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>

--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29215929.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Incorrect FTS result with GIN index

От
Oleg Bartunov
Дата:
Artur,

you could get much more problems in future. Full text search problem may be
signature of more general problem with your postgres setup. So, I'd recommend
to find a source of the problem


Oleg
On Tue, 20 Jul 2010, Artur Dabrowski wrote:

>
> Oleg,
>
> thanks for your help.
>
> I sent a post to pg-hackers list:
> http://old.nabble.com/Query-results-differ-depending-on-operating-system-%28using-GIN%29-ts29213082.html
>
> As to compiling pg... I will no do this since I do not really feel
> comfortable doing it and cannot dedicate too much time to this problem.
>
> Artur
>
>
>
> Oleg Bartunov wrote:
>>
>> Artur,
>>
>> I recommend post your problem to -hackers mailing list. I have no idea,
>> what could be a problem.
>>
>> My machine is:
>> uname -a
>> Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010
>> x86_64 GNU/Linux
>>
>> PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu
>> 4.4.1-4ubuntu9) 4.4.1, 64-bit
>>
>> As a last resort I recommend you to compile pg yourself and see if the
>> problem exists.
>>
>> Oleg
>>
>>
>> On Tue, 20 Jul 2010, Artur Dabrowski wrote:
>>
>>>
>>> I tested the same backup on our CentOS 5.4 virtual machine (running on
>>> xen
>>> server) and the results are really weird (118 rows, comparing to 116 on
>>> win
>>> xp and 123 expected):
>>>
>>> Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual
>>> time=120.373..120.374 rows=1 loops=1)
>>>  ->  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0)
>>> (actual time=59.418..120.137 rows=118 loops=1)
>>>        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@
>>> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
>>> '''dd'':*'::tsquery))
>>>        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34
>>> rows=25
>>> width=0) (actual time=59.229..59.229 rows=495 loops=1)
>>>              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@
>>> '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@
>>> '''dd'':*'::tsquery))
>>> Total runtime: 120.670 ms
>>>
>>> And here are the configuration details:
>>>
>>> PostgreSQL:
>>> postgresql84-server-8.4.4-1.el5_5.1
>>>
>>> # uname -r
>>> 2.6.18-164.15.1.el5xen
>>>
>>> # cat /etc/redhat-release
>>> CentOS release 5.4 (Final)
>>>
>>> # cat /proc/cpuinfo
>>> processor       : 0
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 15
>>> model name      : Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
>>> stepping        : 6
>>> cpu MHz         : 2333.416
>>> cache size      : 4096 KB
>>> physical id     : 0
>>> siblings        : 1
>>> core id         : 0
>>> cpu cores       : 1
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 10
>>> wp              : yes
>>> flags           : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi
>>> mmx
>>> fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
>>> bogomips        : 5835.83
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 36 bits physical, 48 bits virtual
>>> power management:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Oleg Bartunov wrote:
>>>>
>>>> Artur,
>>>>
>>>> I don't know, but could you try linux machine ?
>>>>
>>>> Oleg
>>>>
>>>
>>>
>>
>>      Regards,
>>          Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Incorrect FTS result with GIN index

От
Artur Dabrowski
Дата:
Hello Oleg,

I totally agree, that the problem should be fixed. Saying this, I need to
add that:
- I have no knowledge of postgres development,
- I cannot dedicate any significant time to this problem,
- I am no longer working for the project where the problem occurred,
- In the mentioned project the problem is not considered business-critical
at the moment (although it may be in the future).

Nevertheless I think it should be still interesting for postgres developers
community to fix it. The point is I have no needed knowledge nor time to fix
it.

As to my postgres setup - it's nothing special, it's just a regular version
from postgres' webpage.

Best regards
Artur







Oleg Bartunov wrote:
>
> Artur,
>
> you could get much more problems in future. Full text search problem may
> be
> signature of more general problem with your postgres setup. So, I'd
> recommend
> to find a source of the problem
>
>
> Oleg
>

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Incorrect-FTS-results-with-GIN-index-tp1928607p2227845.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: Incorrect FTS result with GIN index

От
Tom Lane
Дата:
Oleg Bartunov <oleg@sai.msu.su> writes:
> I recommend post your problem to -hackers mailing list. I have no idea,
> what could be a problem.

I wonder whether the problem is not windows versus non windows but
original database versus copies.  If it is a GIN bug it seems quite
possible that it would depend on the order of insertion of the index
entries, which a simple dump-and-reload probably wouldn't duplicate.

If you were working from a dump it'd be easy to try creating the index
before populating the table to see if the bug can be reproduced then,
but there's no certainty that would provoke the bug.

The rest of us have not seen the dump data, so we have no hope of
doing anything with this report anyway.

            regards, tom lane

Re: Incorrect FTS result with GIN index

От
Oleg Bartunov
Дата:
Tom,

you can download dump http://mira.sai.msu.su/~megera/tmp/search_tab.dump

Oleg
On Tue, 27 Jul 2010, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
>> I recommend post your problem to -hackers mailing list. I have no idea,
>> what could be a problem.
>
> I wonder whether the problem is not windows versus non windows but
> original database versus copies.  If it is a GIN bug it seems quite
> possible that it would depend on the order of insertion of the index
> entries, which a simple dump-and-reload probably wouldn't duplicate.
>
> If you were working from a dump it'd be easy to try creating the index
> before populating the table to see if the bug can be reproduced then,
> but there's no certainty that would provoke the bug.
>
> The rest of us have not seen the dump data, so we have no hope of
> doing anything with this report anyway.
>
>             regards, tom lane
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83