Discussion: a tsearch issue
Hello
I found an interesting issue while testing tsearch prefix searching.
We use an ispell-based dictionary:
CREATE TEXT SEARCH DICTIONARY cspell (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
ALTER TEXT SEARCH CONFIGURATION cs ALTER MAPPING FOR word, asciiword WITH cspell, simple;
Then I created a table
postgres=# create table n(a varchar);
CREATE TABLE
postgres=# insert into n values('Stěhule'),('Chromečka');
INSERT 0 2
postgres=# select * from n;
a
───────────
Stěhule
Chromečka
(2 rows)
Then I tested prefix searching and found the following issue:
postgres=# select * from n where to_tsvector('cs', a) @@
to_tsquery('cs','Stě:*') ;
a
───
(0 rows)
I expected one row. The problem is in the transformation of the word 'Stě':
postgres=# select * from ts_debug('cs','Stě:*') ;
─[ RECORD 1 ]┬──────────────────
alias │ word
description │ Word, all letters
token │ Stě
dictionaries │ {cspell,simple}
dictionary │ cspell
lexemes │ {sto}
─[ RECORD 2 ]┼──────────────────
alias │ blank
description │ Space symbols
token │ :*
dictionaries │ {}
dictionary │ [null]
lexemes │ [null]
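An ispell dictionary cannot work well with only the first n characters
of a word: here it normalizes the prefix 'Stě' to the lexeme 'sto',
which no longer matches 'Stěhule'.
I don't know what the correct solution to this problem is.
At a minimum, the prefix search documentation should note that prefix
matching does not work well with *spell dictionaries, or describe this issue.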
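One possible workaround, as a minimal sketch only: run prefix queries through a configuration that does no dictionary normalization. This assumes the built-in 'simple' configuration is acceptable for the prefix lookup and that the database encoding and locale lowercase 'ě' correctly. It gives up stemming and stopword removal for this query.

```sql
-- Assumption: the built-in 'simple' configuration is good enough for prefix
-- lookups; it only lowercases tokens and never rewrites them to a base form.
SELECT *
FROM n
WHERE to_tsvector('simple', a) @@ to_tsquery('simple', 'Stě:*');
```

Both sides of @@ must use the same configuration; otherwise the prefix and the stored lexemes can be normalized differently again.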
Regards
Pavel Stehule
On Fri, 2011-11-04 at 11:22 +0100, Pavel Stehule wrote:
> Hello
>
> I found an interesting issue while testing tsearch prefix searching.
>
> We use an ispell-based dictionary:
>
> CREATE TEXT SEARCH DICTIONARY cspell
> (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
> CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
> ALTER TEXT SEARCH CONFIGURATION cs
> ALTER MAPPING FOR word, asciiword WITH cspell, simple;
>
> Then I created a table
>
> postgres=# create table n(a varchar);
> CREATE TABLE
> postgres=# insert into n values('Stěhule'),('Chromečka');
> INSERT 0 2
> postgres=# select * from n;
> a
> ───────────
> Stěhule
> Chromečka
> (2 rows)
>
> Then I tested prefix searching and found the following issue:
>
> postgres=# select * from n where to_tsvector('cs', a) @@
> to_tsquery('cs','Stě:*') ;
> a
> ───
> (0 rows)
Most likely you are hitting this problem:
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01347.php
'Stě' may be a stopword in Czech.
> I expected one row. The problem is in the transformation of the word 'Stě':
>
> postgres=# select * from ts_debug('cs','Stě:*') ;
> ─[ RECORD 1 ]┬──────────────────
> alias │ word
> description │ Word, all letters
> token │ Stě
> dictionaries │ {cspell,simple}
> dictionary │ cspell
> lexemes │ {sto}
> ─[ RECORD 2 ]┼──────────────────
> alias │ blank
> description │ Space symbols
> token │ :*
> dictionaries │ {}
> dictionary │ [null]
> lexemes │ [null]
>
The ':*' suffix is specific to to_tsquery. ts_debug just invokes the parser
on the raw text, so this check is not correct.
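A more direct check, sketched here with the 'cs' configuration and the 'cspell' dictionary defined earlier in the thread, is to look at what each side of the match produces instead of passing the ':*' string to ts_debug. No output is shown because it depends on the installed Czech dictionary files.

```sql
-- Show the tsquery that the prefix search actually uses.
SELECT to_tsquery('cs', 'Stě:*');

-- Show the lexemes stored for the full word on the document side.
SELECT to_tsvector('cs', 'Stěhule');

-- Ask the ispell dictionary directly how it normalizes the bare prefix.
SELECT ts_lexize('cspell', 'Stě');
```

If the lexeme in the tsquery is not a prefix of any lexeme in the tsvector, the @@ match returns no rows.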
-Sushant.