Discussion: Normalized Ranking example incorrect in text search
http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html

Ranking Search Results

shows an example which says

"This is the same example using normalized ranking"

and then gives a query which calculates normalization in an incorrect
manner, yet without using the normalization parameter. A correct example
would be something like this:

SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

I can't rerun the query because I don't have the example data set used.
Is that available?

This section also describes the two ranking functions supplied and
suggests you can write your own also.

- Can we say what the differences are between the two ranking functions?
  Why do we have two?

- Can we supply or link to an example ranking function to allow people to
  write their own?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes:
> http://developer.postgresql.org/pgdocs/postgres/textsearch-controls.html
> Ranking Search Results
> shows an example which says
> "This is the same example using normalized ranking"
> and then gives a query which calculates normalization in an incorrect
> manner,

On what basis do you claim that's an incorrect manner? It's exactly
what is described in the paragraph just before the examples.

> A correct example
> would be something like this:
> SELECT title, ts_rank_cd(textsearch, query, 8 /*Normalization*/) AS rank

Why is that correct (or more correct than other ways)?

> - Can we say what the differences are between the two ranking functions?
> Why do we have two?

We already say that: the _cd function doesn't work without positional
info in the input tsvector.

regards, tom lane
I wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> and then gives a query which calculates normalization in an incorrect
>> manner,
> On what basis do you claim that's an incorrect manner? It's exactly
> what is described in the paragraph just before the examples.
... although on reflection, it seems pretty stupid to be recommending
a method that requires two evaluations at each row of an admittedly
expensive function.
Seems like we should add one more normalization flag bit:

    32 --- replace computed rank by rank / (rank + 1)

and then the second example would be

SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */) AS rank
FROM apod, to_tsquery('neutrino|(dark & matter)') query
WHERE query @@ textsearch
ORDER BY rank DESC LIMIT 10;

with no change in the example output.
regards, tom lane
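
Since the apod data set is not at hand, the comparison discussed in this thread can be tried against a small stand-in. The query below is only a sketch: the three rows, the docs and indexed names, and the column aliases are all made up for illustration, and the last column assumes a server on which the flag bit 32 proposed above is available. It puts the two-evaluation rank / (rank + 1) form next to the single call with the flag, so the two columns can be compared row by row.

```sql
-- Hypothetical stand-in for the apod table: three invented rows, not the
-- data set used in the documentation.
WITH docs(title, body) AS (
    VALUES
        ('Row A', 'dark matter and the neutrino background'),
        ('Row B', 'a neutrino detector under the ice'),
        ('Row C', 'dark matter halos around galaxies, dark matter maps')
),
indexed AS (
    -- to_tsvector keeps word positions, which ts_rank_cd needs
    SELECT title, to_tsvector('english', body) AS textsearch
    FROM docs
)
SELECT title,
       ts_rank_cd(textsearch, query) AS raw_rank,
       -- scaling done by hand: ts_rank_cd is written (and evaluated) twice
       ts_rank_cd(textsearch, query)
           / (ts_rank_cd(textsearch, query) + 1) AS scaled_by_hand,
       -- assumption: normalization flag bit 32 exists on this server
       ts_rank_cd(textsearch, query, 32) AS scaled_by_flag
FROM indexed, to_tsquery('english', 'neutrino | (dark & matter)') AS query
WHERE query @@ textsearch
ORDER BY scaled_by_hand DESC;
```

Because x / (x + 1) is increasing for non-negative x, the scaled columns should sort the rows in the same order as raw_rank; only the values are squeezed into the range from 0 up to, but not including, 1.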