Thread: Behaviour of to_tsquery(stopwords only)


Behaviour of to_tsquery(stopwords only)

From: Richard Huxton
Date:
I'm not sure what value a tsquery has if it's composed from stopwords 
only, but it doesn't seem to be null or equal to itself.

That strikes me as ... unintuitive, although I'm happy to be re-educated 
on this.

I think it's because CompareTSQ (tsquery_op.c, line 142) doesn't have a 
case to handle query sizes of zero. That's what seems to be returned 
from tsearch/to_tsany.c lines ~ 345-350.


SELECT qid, words, query,
       (query IS NULL) AS isnull,
       (query = to_tsquery(words)) AS issame
FROM util.queries
ORDER BY qid DESC LIMIT 5;

NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
 qid  |  words   |   query    | isnull | issame
------+----------+------------+--------+--------
 1000 | to       |            | f      | f
  999 | or       |            | f      | f
  998 | requests | 'request'  | f      | t
  997 | site     | 'site'     | f      | t
  996 | document | 'document' | f      | t
(5 rows)
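
For anyone without the util.queries table, a minimal way to repeat the check above might look like the following. It is only a sketch: it assumes the built-in 'english' configuration (where 'to' is a stop word), and the aliases q and s are made up for illustration. numnode() is used to report how many nodes the resulting tsquery holds.

```sql
-- Sketch only: assumes the built-in 'english' text search configuration.
-- 'to' is a stop word there, so to_tsquery() raises the NOTICE and
-- returns an empty tsquery.
SELECT q IS NULL   AS isnull,   -- is the empty query NULL?
       q = q       AS issame,   -- does it compare equal to itself?
       numnode(q)  AS nodes     -- number of nodes in the query
FROM (SELECT to_tsquery('english', 'to') AS q) AS s;
```

numnode() should report 0 for a query made up only of stop words. Whether issame comes back true or false presumably depends on whether the server handles the empty query in its comparison code.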

--   Richard Huxton  Archonet Ltd


Re: Behaviour of to_tsquery(stopwords only)

From: Richard Huxton
Date:
Further tsquery comparison fun:

=> SELECT q.qid, q.query, count(*) FROM doc.documents d, util.queries q
WHERE d.words @@ q.query AND (q.query::text=$$'tender'$$) GROUP BY
q.qid, q.query ;
 qid |  query   | count
-----+----------+-------
 195 | 'tender' |   374
 248 | 'tender' |   374
 257 | 'tender' |   374
 332 | 'tender' |   374
 401 | 'tender' |   374
 409 | 'tender' |   374
 519 | 'tender' |   374
 557 | 'tender' |   374
 736 | 'tender' |   374
 749 | 'tender' |   374
 869 | 'tender' |   374
 879 | 'tender' |   374
 926 | 'tender' |   374
(13 rows)

=> SELECT q.query, count(*) FROM doc.documents d, util.queries q WHERE
d.words @@ q.query AND (q.query::text=$$'tender'$$) GROUP BY q.query ;
  query   | count
----------+-------
 'tender' |  1870
 'tender' |  1496
 'tender' |  1496
(3 rows)


It seems that the tsquery is remembering the shape of the original
query, even though it's been trimmed.
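
One way to probe that suspicion without the util.queries table might be the sketch below. It assumes the built-in 'english' configuration, in which 'with', 'the' and 'this' are stop words; the aliases t, words, nodes and same_query are illustrative. I have not confirmed that numnode() exposes any leftover shape on an affected server, so treat the first query as a diagnostic to try rather than a guaranteed demonstration.

```sql
-- Sketch only: assumes the built-in 'english' text search configuration.
-- Each input should reduce to the single lexeme 'tender' once the
-- stop words are dropped.
SELECT words,
       to_tsquery('english', words)          AS query,
       numnode(to_tsquery('english', words)) AS nodes
FROM (VALUES ('tender'),
             ('tender & with'),
             ('tender & the & this')) AS t(words);

-- Direct comparison of two queries that print identically.
SELECT to_tsquery('english', 'tender')
     = to_tsquery('english', 'tender & with') AS same_query;
```

If the trimmed queries really are identical, I'd expect same_query to be true; a false result would match the GROUP BY behaviour shown above.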


=> SELECT q.query, min(qid), max(qid), count(*) FROM doc.documents d,
util.queries q WHERE d.words @@ q.query AND (q.query::text=$$'tender'$$)
GROUP BY q.query ;
  query   | min | max | count
----------+-----+-----+-------
 'tender' | 736 | 926 |  1870 (5 rows aggregated)
 'tender' | 401 | 557 |  1496 (4 rows aggregated)
 'tender' | 195 | 332 |  1496 (4 rows aggregated)
(3 rows)

=> SELECT * FROM util.queries WHERE qid IN (195,248, 257, 332,
401,409,519,557,736,749,869,879,926) ORDER BY qid;
 qid |        words        |  query
-----+---------------------+----------
 195 | can & of & tenders  | 'tender' (3 clauses)
 248 | tender & the & this | 'tender' (3 clauses)
 257 | have & tender & for | 'tender' (3 clauses)
 332 | for & tenders & of  | 'tender' (3 clauses)
 401 | tender & with       | 'tender' (2 clauses)
 409 | tenders & to        | 'tender' (2 clauses)
 519 | tender & to         | 'tender' (2 clauses)
 557 | tenders & be        | 'tender' (2 clauses)
 736 | tenderer            | 'tender' (1 clause)
 749 | tender              | 'tender' (1 clause)
 869 | tender              | 'tender' (1 clause)
 879 | tender              | 'tender' (1 clause)
 926 | tender              | 'tender' (1 clause)
(13 rows)

So - is this a bug, feature, "feature"?

--   Richard Huxton  Archonet Ltd


Re: Behaviour of to_tsquery(stopwords only)

From: Teodor Sigaev
Date:
> => SELECT * FROM util.queries WHERE qid IN (195,248, 257, 332, 
> 401,409,519,557,736,749,869,879,926) ORDER BY qid;
>  qid |        words        |  query
> -----+---------------------+----------
>  195 | can & of & tenders  | 'tender' (3 clauses)
>  248 | tender & the & this | 'tender' (3 clauses)
>  257 | have & tender & for | 'tender' (3 clauses)
>  332 | for & tenders & of  | 'tender' (3 clauses)
>  401 | tender & with       | 'tender' (2 clauses)
>  409 | tenders & to        | 'tender' (2 clauses)
>  519 | tender & to         | 'tender' (2 clauses)
>  557 | tenders & be        | 'tender' (2 clauses)
>  736 | tenderer            | 'tender' (1 clause)
>  749 | tender              | 'tender' (1 clause)
>  869 | tender              | 'tender' (1 clause)
>  879 | tender              | 'tender' (1 clause)
>  926 | tender              | 'tender' (1 clause)
> (13 rows)
> 
> So - is this a bug, feature, "feature"?

It's definitely a bug:
select count(*), query from queries group by query;
 count |  query
-------+----------
     3 | 'tender'
     4 | 'tender'
     4 | 'tender'
(3 rows)

Will fix it soon.
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Behaviour of to_tsquery(stopwords only)

From: Richard Huxton
Date:
Teodor Sigaev wrote:
>>
>> So - is this a bug, feature, "feature"?
> 
> It's definitely a bug:
> select count(*), query from queries group by query;
>  count |  query
> -------+----------
>      3 | 'tender'
>      4 | 'tender'
>      4 | 'tender'
> (3 rows)
> 
> Will fix it soon.

Ah, smashing.

--   Richard Huxton  Archonet Ltd


Re: Behaviour of to_tsquery(stopwords only)

From: Teodor Sigaev
Date:
Fixed for CVS HEAD and 8.3, will fix for previous versions too.
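
On a server that does not yet have the fix, one possible workaround (an assumption on my part, not something proposed earlier in this thread) is to group on the text form of the query. The ::text comparison in the earlier WHERE clause matched all 13 rows, which suggests the text representation is not affected. The table and column names are the ones from the earlier examples; query_text is an illustrative alias.

```sql
-- Sketch only: groups by the printed form of the tsquery instead of
-- the tsquery value itself, so queries that print identically should
-- land in the same group.
SELECT q.query::text AS query_text,
       count(*)
FROM doc.documents d
JOIN util.queries q ON d.words @@ q.query
WHERE q.query::text = $$'tender'$$
GROUP BY q.query::text;
```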

Richard Huxton wrote:
> Teodor Sigaev wrote:
>>>
>>> So - is this a bug, feature, "feature"?
>>
>> It's definitely a bug:
>> select count(*), query from queries group by query;
>>  count |  query
>> -------+----------
>>      3 | 'tender'
>>      4 | 'tender'
>>      4 | 'tender'
>> (3 rows)
>>
>> Will fix it soon.
> 
> Ah, smashing.
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/