Discussion: Documentation bug in 8.3?
Reading through the text search data type docs:
http://www.postgresql.org/docs/8.3/static/datatype-textsearch.html#DATATYPE-TSVECTOR
it says:
Optionally, integer position(s) can be attached to any or all of the
lexemes:
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                  tsvector
-------------------------------------------------------------------------------
 'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
A position normally indicates the source word's location in the
document. Positional information can be used for proximity ranking.
Position values can range from 1 to 16383; larger numbers are silently
clamped to 16383. Duplicate position entries are discarded.
----------------------------------------
However in my testing of 8.3 duplicate position entries are not
discarded:
test=> SELECT 'a:1 b:1'::tsvector;
tsvector
-------------
'a':1 'b':1
(1 row)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> clamped to 16383. Duplicate position entries are discarded.
> ----------------------------------------
> However in my testing of 8.3 duplicate position entries are not
> discarded:
> test=> SELECT 'a:1 b:1'::tsvector;
> tsvector
> -------------
> 'a':1 'b':1
> (1 row)
Those aren't duplicates, because they're not attached to the same
lexeme. The comment is talking about this behavior:
regression=# SELECT 'a:1 a:1'::tsvector;
tsvector
----------
'a':1
(1 row)
regression=# SELECT 'a:1,2,1'::tsvector;
tsvector
----------
'a':1,2
(1 row)
regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > clamped to 16383. Duplicate position entries are discarded.
> > ----------------------------------------
> >
> > However in my testing of 8.3 duplicate position entries are not
> > discarded:
> >
> > test=> SELECT 'a:1 b:1'::tsvector;
> > tsvector
> > -------------
> > 'a':1 'b':1
> > (1 row)
>
> Those aren't duplicates, because they're not attached to the same
> lexeme. The comment is talking about this behavior:
>
> regression=# SELECT 'a:1 a:1'::tsvector;
> tsvector
> ----------
> 'a':1
> (1 row)
>
> regression=# SELECT 'a:1,2,1'::tsvector;
> tsvector
> ----------
> 'a':1,2
> (1 row)

OK, thanks. I will clarify the documentation. Patch attached and applied.

Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.222
diff -c -c -r1.222 datatype.sgml
*** doc/src/sgml/datatype.sgml 2 Jan 2008 19:53:13 -0000 1.222
--- doc/src/sgml/datatype.sgml 12 Jan 2008 21:50:51 -0000
***************
*** 3330,3336 ****
      document. Positional information can be used for
      <firstterm>proximity ranking</firstterm>. Position values can
      range from 1 to 16383; larger numbers are silently clamped to 16383.
!     Duplicate position entries are discarded.
     </para>

     <para>
--- 3330,3336 ----
      document. Positional information can be used for
      <firstterm>proximity ranking</firstterm>. Position values can
      range from 1 to 16383; larger numbers are silently clamped to 16383.
!     Duplicate positions for the same lexeme are discarded.
     </para>

     <para>
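
For readers who want to check their understanding of the corrected sentence, here is a rough Python model of the rule as described in this thread: positions are de-duplicated per lexeme, not across lexemes, and values above 16383 are clamped. This is only an illustrative sketch, not PostgreSQL's parser. The function name normalize_tsvector is made up for this example, and the model ignores quoting, weight labels, and the lexeme ordering that the server applies to its output.

```python
from collections import defaultdict

MAX_POS = 16383  # upper bound on positions quoted from the 8.3 docs


def normalize_tsvector(text):
    """Toy model (hypothetical helper, not a PostgreSQL API).

    Maps each lexeme to its sorted list of distinct positions.
    Duplicates are dropped only within one lexeme; the same position
    on two different lexemes is kept for both.
    """
    positions = defaultdict(set)
    for token in text.split():
        lexeme, _, pos_list = token.partition(":")
        positions[lexeme]  # register the lexeme even if it has no positions
        for p in pos_list.split(","):
            if p:
                # Values larger than MAX_POS are clamped, as the docs state.
                positions[lexeme].add(min(int(p), MAX_POS))
    return {lexeme: sorted(ps) for lexeme, ps in positions.items()}


if __name__ == "__main__":
    print(normalize_tsvector("a:1 b:1"))   # different lexemes: both keep position 1
    print(normalize_tsvector("a:1 a:1"))   # same lexeme: duplicate collapses
    print(normalize_tsvector("a:1,2,1"))   # same lexeme: duplicate collapses
    print(normalize_tsvector("a:20000"))   # clamped to 16383
```

The first three calls mirror the three queries shown above; the last one corresponds to the clamping sentence in the documentation.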