Discussion: tsvector/tsearch equality and/or portability issue?
We just had a complaint on IRC that:

devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
 ?column?
----------
 f
(1 row)

and that searches for certain values would not return all matches under some circumstances.

A little bit of testing shows the following:

postgres=# create table foo (bla tsvector);
CREATE TABLE
postgres=# insert into foo values ('bla bla');
INSERT 0 1
postgres=# insert into foo values ('bla bla');
INSERT 0 1
postgres=# select bla from foo group by bla;
  bla
-------
 'bla'
(1 row)

postgres=# create index foo_idx on foo(bla);
CREATE INDEX
postgres=# set enable_seqscan to off;
SET
postgres=# select bla from foo group by bla;
  bla
-------
 'bla'
 'bla'
(2 rows)

postgres=# set enable_seqscan to on;
SET
postgres=# select bla from foo group by bla;
  bla
-------
 'bla'
(1 row)

ouch :-(

I can reproduce that at least on OpenBSD/i386 and Debian Etch/x86_64.

It is also noteworthy that the existing regression tests for tsearch2 do not seem to do any equality testing ...

Stefan

On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
> We just had a complaint on IRC that:
>
> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>  ?column?
> ----------
>  f
> (1 row)
>

Could this be an endianness issue?

This was probably the same person who posted this on the OpenFTS list.

He's compiled from source:

<snip>
dew=# select version();
PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250)
</snip>

I don't have any access to an OSX box to verify things ATM. I am trying to get access to one though. :S Can someone else verify this right now?

Andy

> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>  ?column?
> ----------
>  f
> (1 row)

Fixed in 8.1 and HEAD. Thank you

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

On Aug 24, 2006, at 12:58 , Andrew J. Kopciuch wrote:
> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>> We just had a complaint on IRC that:
>>
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>  ?column?
>> ----------
>>  f
>> (1 row)
>>
>
> This could be an endianness issue?
>
> This was probably the same person who posted this on the OpenFTS list.
>
> He's compiled from source:
>
> <snip>
> dew=# select version();
> PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC
> powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250)
> </snip>
>
> I don't have any access to an OSX box to verify things ATM. I am trying to
> get access to one though. :S Can someone else verify this right now?

Stefan said he reproduced on OpenBSD/i386, so it is unlikely to be an endianness issue. Anyway, here's the comparison code. I guess it doesn't use strcmp to avoid encoding silliness. (?)

    static int
    silly_cmp_tsvector(const tsvector * a, const tsvector * b)
    {
        if (a->len < b->len)
            return -1;
        else if (a->len > b->len)
            return 1;
        else if (a->size < b->size)
            return -1;
        else if (a->size > b->size)
            return 1;
        else
        {
            unsigned char *aptr = (unsigned char *) (a->data) + DATAHDRSIZE;
            unsigned char *bptr = (unsigned char *) (b->data) + DATAHDRSIZE;

            while (aptr - ((unsigned char *) (a->data)) < a->len)
            {
                if (*aptr != *bptr)
                    return (*aptr < *bptr) ? -1 : 1;
                aptr++;
                bptr++;
            }
        }
        return 0;
    }

"Andrew J. Kopciuch" <akopciuch@bddf.ca> writes:
> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>  ?column?
>> ----------
>>  f
>> (1 row)

> This could be an endianness issue?

Apparently not, it works for me on HPPA (big endian) and on Darwin/PPC (ditto). I'm testing CVS HEAD though, not 8.1 branch.

However ... I also see that tsearch2's regression test is dumping core on my OS X machine. I haven't cvs update'd for a while on this machine though --- will bring it to HEAD and report back.

Can some other people try this? We need to get a handle on which machines show the problem.

regards, tom lane

Teodor Sigaev wrote:
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>  ?column?
>> ----------
>>  f
>> (1 row)
>
> Fixed in 8.1 and HEAD. Thank you

Thanks for the fast response. Would it maybe be worthwhile to add regression tests for this kind of thing, though?

Stefan

> Stefan said he reproduced on OpenBSD/i386 so it is unlikely to be an
> endianness issue. Anyway, here's the comparison code- I guess it doesn't
> use strcmp to avoid encoding silliness. (?)

I suppose the ordering for the tsvector type is somewhat strange and doesn't really matter. For me, it's a mystery why it's needed :)

The reason for the bug was this: some internal parts of a tsvector must be short-aligned, so there were unused bytes. The previous comparison function compared them too...

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

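The failure mode described above, a byte-by-byte comparison that also reads alignment padding, can be illustrated with a minimal C sketch. The six-byte layout and all names here (`build_entry`, `equal_raw`, `equal_skipping_padding`, `same_value_compares_equal`) are hypothetical and chosen only for illustration; this is not the real tsvector format or the actual tsearch2 fix.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only (not the real tsvector format):
 * bytes 0..2 hold a 3-byte lexeme, byte 3 is alignment padding, and
 * bytes 4..5 hold a 2-byte value that must start on an even offset. */
enum { LEXEME_LEN = 3, PAD_OFFSET = 3, VALUE_OFFSET = 4, ENTRY_LEN = 6 };

/* Build one entry; `junk` plays the role of whatever happened to be in the
 * never-initialized padding byte. */
void build_entry(unsigned char *buf, const char *lexeme,
                 uint16_t value, unsigned char junk)
{
    memcpy(buf, lexeme, LEXEME_LEN);
    buf[PAD_OFFSET] = junk;
    memcpy(buf + VALUE_OFFSET, &value, sizeof value);
}

/* Byte-by-byte comparison that also looks at the padding byte. */
int equal_raw(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, ENTRY_LEN) == 0;
}

/* Comparison restricted to the meaningful bytes. */
int equal_skipping_padding(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, LEXEME_LEN) == 0 &&
           memcmp(a + VALUE_OFFSET, b + VALUE_OFFSET, sizeof(uint16_t)) == 0;
}

/* Builds the same logical value twice with different padding junk and
 * reports whether the given comparison treats the two copies as equal. */
int same_value_compares_equal(int (*equal)(const unsigned char *,
                                           const unsigned char *))
{
    unsigned char a[ENTRY_LEN], b[ENTRY_LEN];

    build_entry(a, "foo", 1, 0x00);
    build_entry(b, "foo", 1, 0xAA);
    return equal(a, b);
}
```

With this sketch, `same_value_compares_equal(equal_raw)` reports a mismatch while `same_value_compares_equal(equal_skipping_padding)` reports equality. Because the padding content depends on whatever was in memory beforehand, a raw comparison of this kind can also give different answers on different platforms or runs.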
Tom Lane wrote:
> "Andrew J. Kopciuch" <akopciuch@bddf.ca> writes:
>> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>>  ?column?
>>> ----------
>>>  f
>>> (1 row)
>
>> This could be an endianness issue?
>
> Apparently not, it works for me on HPPA (big endian) and on Darwin/PPC
> (ditto). I'm testing CVS HEAD though, not 8.1 branch.
>
> However ... I also see that tsearch2's regression test is dumping
> core on my OS X machine. I haven't cvs update'd for awhile on this
> machine though --- will bring it to HEAD and report back.
>
> Can some other people try this? We need to get a handle on which
> machines show the problem.

I am trying on current copy of HEAD.. however:

jd@scratch:~/pgsqldev$ bin/psql -U postgres postgres < share/contrib/tsearch2.sql
SET
BEGIN
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "pg_ts_dict_pkey" for table "pg_ts_dict"
CREATE TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
INSERT 57434167 1
CREATE FUNCTION
CREATE FUNCTION
INSERT 57434170 1
ERROR: could not find function "snb_ru_init_koi8" in file "/usr/local/pgsql/lib/tsearch2.so"
ERROR: current transaction is aborted, commands ignored until end of transaction block
ERROR: current transaction is aborted, commands ignored until end of transaction block

I will try on 8.1 in a moment.

Joshua D. Drake

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

> Can some other people try this? We need to get a handle on which
> machines show the problem.

d@scratch:~/pgsqldev$ /usr/local/pgsql/bin/psql -U postgres postgres
Welcome to psql 8.1.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
 ?column?
----------
 t
(1 row)

postgres=#

AMD 64 X2, Ubuntu Dapper LTS.

Sincerely,

Joshua D. Drake

>> Can some other people try this? We need to get a handle on which
>> machines show the problem.
>
> I am trying on current copy of HEAD.. however:

Ignore the below... This is an error with my linker/ld.so.conf

Joshua D. Drake

>
> jd@scratch:~/pgsqldev$ bin/psql -U postgres postgres <
> share/contrib/tsearch2.sql
> SET
> BEGIN
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> "pg_ts_dict_pkey" for table "pg_ts_dict"
> CREATE TABLE
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> INSERT 57434167 1
> CREATE FUNCTION
> CREATE FUNCTION
> INSERT 57434170 1
> ERROR: could not find function "snb_ru_init_koi8" in file
> "/usr/local/pgsql/lib/tsearch2.so"
> ERROR: current transaction is aborted, commands ignored until end of
> transaction block
> ERROR: current transaction is aborted, commands ignored until end of
> transaction block
>
> I will try on 8.1 in a moment.
>
> Joshua D. Drake

Teodor Sigaev <teodor@sigaev.ru> writes:
> Fixed in 8.1 and HEAD. Thank you

This appears to have created a regression test failure:

*** ./expected/tsearch2.out Sun Jun 18 12:55:28 2006
--- ./results/tsearch2.out Thu Aug 24 14:30:02 2006
***************
*** 2496,2503 ****
   f |
   f | '345':1 'qwerti':2 'copyright':3
   f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
-  f | 'a':1A,2,3C 'b':5A,6B,7C,8B
   f | 'a':1A,2,3B 'b':5A,6A,7C,8
   f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
   f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
   f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
--- 2496,2503 ----
   f |
   f | '345':1 'qwerti':2 'copyright':3
   f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
   f | 'a':1A,2,3B 'b':5A,6A,7C,8
+  f | 'a':1A,2,3C 'b':5A,6B,7C,8B
   f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
   f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
   f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'

======================================================================

regards, tom lane

"Joshua D. Drake" <jd@commandprompt.com> writes:
>>> Can some other people try this? We need to get a handle on which
>>> machines show the problem.
>>
>> I am trying on current copy of HEAD.. however:

Looks like Teodor already solved the problem, so no need for a fire drill anymore.

regards, tom lane

Oops. Fixed.

Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>> Fixed in 8.1 and HEAD. Thank you
>
> This appears to have created a regression test failure:
>
> *** ./expected/tsearch2.out Sun Jun 18 12:55:28 2006
> --- ./results/tsearch2.out Thu Aug 24 14:30:02 2006
> ***************
> *** 2496,2503 ****
>    f |
>    f | '345':1 'qwerti':2 'copyright':3
>    f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
> -  f | 'a':1A,2,3C 'b':5A,6B,7C,8B
>    f | 'a':1A,2,3B 'b':5A,6A,7C,8
>    f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
>    f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
>    f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
> --- 2496,2503 ----
>    f |
>    f | '345':1 'qwerti':2 'copyright':3
>    f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
>    f | 'a':1A,2,3B 'b':5A,6A,7C,8
> +  f | 'a':1A,2,3C 'b':5A,6B,7C,8B
>    f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
>    f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
>    f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
>
> ======================================================================
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

On Thu, Aug 24, 2006 at 09:40:13PM +0400, Teodor Sigaev wrote:
> >devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
> > ?column?
> >----------
> > f
> >(1 row)
>
> Fixed in 8.1 and HEAD. Thank you

Things still seem to be broken for me. Among other things, the script at <http://unununium.org/~indigo/testvectors.sql.bz2> fails. It performs two tests: comparing 1000 random vectors with positions and random weights, and comparing the same vectors, but stripped. Oddly, the unstripped comparisons all pass, which is not consistent with what I am seeing in my database. However, I'm not yet able to reproduce those problems.

It's worth noting that in running this script I have seen the number of failures change, which seems to indicate that some uninitialized memory is still being compared.

test=# \i testvectors.sql
BEGIN
CREATE FUNCTION
CREATE TABLE
 total vectors in test set
---------------------------
                      1000
(1 row)

 failing unstripped equality
-----------------------------
                           0
(1 row)

 failing stripped equality
---------------------------
                       389
(1 row)

ROLLBACK
test=#

Phil Frost <indigo@bitglue.com> writes:
> Things still seem to be broken for me. Among other things, the script at
> <http://unununium.org/~indigo/testvectors.sql.bz2> fails. It performs two
> tests, comparing 1000 random vectors with positions and random weights, and
> comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
> pass, which is not consistent with what I am seeing in my database. However,
> I'm yet unable to reproduce those problems.

It looks to me like tsvector comparison may be too strong. The strip() function evidently thinks that it's OK to rearrange the string chunks into the same order as the WordEntry items, which suggests to me that the "pos" fields are not really semantically significant. But silly_cmp_tsvector() considers that a difference in pos values is important. I don't understand the data structure well enough to know which one to believe, but something's not consistent here.

regards, tom lane

>> comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
>> pass, which is not consistent with what I am seeing in my database. However,
>> I'm yet unable to reproduce those problems.

Fixed: strncmp was called with the wrong length parameter.

>
> It looks to me like tsvector comparison may be too strong. The strip()
> function evidently thinks that it's OK to rearrange the string chunks
> into the same order as the WordEntry items, which suggests to me that
> the "pos" fields are not really semantically significant. But
> silly_cmp_tsvector() considers that a difference in pos values is
> important. I don't understand the data structure well enough to know
> which one to believe, but something's not consistent here.

You are right: pos really just means the position of the lexeme's own string in the tail of the tsvector structure. So it has been removed from the comparison.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
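The idea in this last exchange, comparing lexemes by content and length while ignoring where each string happens to sit in the tail, can be sketched in C roughly as follows. The `Entry` and `Vec` types and the function names are hypothetical simplifications invented for this illustration; they are not the actual tsearch2 structures, and the sketch uses `memcmp` with an explicit length rather than reproducing the real patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one tsvector entry: `pos` is the
 * byte offset of the lexeme inside the string area, `len` is its length. */
typedef struct {
    size_t pos;
    size_t len;
} Entry;

typedef struct {
    size_t       nentries;
    const Entry *entries;
    const char  *strings;   /* lexeme bytes, not NUL-terminated */
} Vec;

/* Compare two vectors entry by entry. The offset `pos` is used only to
 * locate the lexeme bytes; it is never compared itself. */
int cmp_vec(const Vec *a, const Vec *b)
{
    if (a->nentries != b->nentries)
        return a->nentries < b->nentries ? -1 : 1;

    for (size_t i = 0; i < a->nentries; i++) {
        const Entry *ea = &a->entries[i];
        const Entry *eb = &b->entries[i];

        if (ea->len != eb->len)
            return ea->len < eb->len ? -1 : 1;

        /* Compare exactly len bytes of each lexeme, no more and no less. */
        int r = memcmp(a->strings + ea->pos, b->strings + eb->pos, ea->len);
        if (r != 0)
            return r < 0 ? -1 : 1;
    }
    return 0;
}

/* Same lexemes ("bar", "foo") stored in a different physical order. */
int demo_same_lexemes_different_layout(void)
{
    static const Entry e1[] = { {0, 3}, {3, 3} };
    static const Entry e2[] = { {3, 3}, {0, 3} };
    Vec a = { 2, e1, "barfoo" };
    Vec b = { 2, e2, "foobar" };

    return cmp_vec(&a, &b);
}

/* Same layout, but the second lexeme differs ("foo" versus "fox"). */
int demo_different_lexemes(void)
{
    static const Entry e[] = { {0, 3}, {3, 3} };
    Vec a = { 2, e, "barfoo" };
    Vec b = { 2, e, "barfox" };

    return cmp_vec(&a, &b);
}
```

In this sketch `demo_same_lexemes_different_layout()` returns 0 even though the two string areas are laid out differently, which matches the observation that a rearrangement such as the one strip() performs should not change equality, while `demo_different_lexemes()` returns -1.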