Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
| From | Tom Lane |
|---|---|
| Subject | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? |
| Date | |
| Msg-id | 16167.1465337110@sss.pgh.pa.us |
| In response to | Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (Jean-Pierre Pelletier <jppelletier@e-djuster.com>) |
| Responses | Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ? (3 replies) |
| List | pgsql-hackers |
Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.
> For example, the following returns true: select
> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
> Is this expected ?
I concur that that seems like a rather useless behavior. If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x". So phrase search simply should not
consider distance-zero matches.
The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:
*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out Thu May 5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out Tue Jun 7 17:55:41 2016
***************
*** 897,903 ****
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0.0714286
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
ts_rank_cd
------------
! 0
(1 row)
SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.
regards, tom lane
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 591e59c..95ad69b 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
*************** TS_phrase_execute(QueryItem *curitem,
*** 1409,1415 ****
{
while (Lpos < Ldata.pos + Ldata.npos)
{
! if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
{
/*
* Lpos is behind the Rpos, so we have to check the
--- 1409,1415 ----
{
while (Lpos < Ldata.pos + Ldata.npos)
{
! if (WEP_GETPOS(*Lpos) < WEP_GETPOS(*Rpos))
{
/*
* Lpos is behind the Rpos, so we have to check the
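
To make the effect of the one-character change easier to see, here is a minimal standalone sketch. It is not the PostgreSQL source: `phrase_match`, its parameters, and the `strict` flag are hypothetical names invented for illustration, and the sketch assumes that a pair of positions satisfies the phrase operator whenever the right operand follows the left one by at most the operator's distance. The `strict` flag switches between the `<=` guard (old behavior) and the `<` guard (patched behavior).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of matching "L <-> R" against a document.
 * lpos/rpos hold the word positions of the left and right operands.
 * Assumption (not taken from the PostgreSQL source): a pair matches when
 * R follows L by at most max_distance.
 *
 * strict == false models the "<=" guard, which lets L and R share a position.
 * strict == true models the "<" guard, which requires L strictly before R.
 */
static bool
phrase_match(const int *lpos, int nl,
             const int *rpos, int nr,
             int max_distance, bool strict)
{
    assert(max_distance >= 0);

    for (int r = 0; r < nr; r++)
    {
        for (int l = 0; l < nl; l++)
        {
            int  gap = rpos[r] - lpos[l];
            bool l_before_r = strict ? (gap > 0) : (gap >= 0);

            /* L must precede R (per the guard) and be close enough. */
            if (l_before_r && gap <= max_distance)
                return true;
        }
    }
    return false;
}
```

In this model, the document 'blue' gives both operands of `blue <-> blue` the single position 1. Calling `phrase_match` with those position lists and `strict = false` reports a match at distance zero, while `strict = true` rejects it. For a document 'blue blue' (positions 1 and 2 for both operands), the strict guard still finds the match at distance one.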