Thread: Improve Full text rank in a query
Hi all,
I'm running the following query to match a supplied text string to an actual
place name which is recorded in a table with extra info like coordinates,
etc.
SELECT ts_rank_cd(textsearchable_index_col, query, 32 /* rank/(rank+1) */) AS rank, *
FROM gazetteer, to_tsquery('Gunbower|Island|Vic') query
WHERE query @@ textsearchable_index_col
ORDER BY rank DESC, concise_ga DESC, auda_alloc DESC
LIMIT 10
When I run this I get the following top two results:
Pos  Rank     Name                            State
1    0.23769  Gunbower Island Primary School  Vic
2    0.23769  Gunbower Island                 Vic
The textsearchable_index_col for each of these looks like this:
'vic':6 '9999':5 'gunbow':1 'island':2 'school':4 'primari':3 'victoria':7
'vic':4 '9999':3 'gunbow':1 'island':2 'victoria':5
I'm new to this, but I can't figure out why the "Gunbower Island Primary
School" is getting top place. How do I get the query to improve the ranking
so that an exact match (like "Gunbower|Island|Vic") gets a higher position?
Thanks,
bw
"b wragg" <bwragg@tpg.com.au> writes:
> I'm new to this, but I can't figure out why the "Gunbower Island Primary
> School" is getting top place. How do I get the query to improve the ranking
> so that an exact match (like "Gunbower|Island|Vic") gets a higher position?
I'm new at this too, but AFAICS these are both exact matches: they have
the same matching lexemes at the same positions, so the basic rank
calculation is going to come out exactly the same. Normalization option
32 doesn't help (as the manual notes, it's purely cosmetic). So it's
random chance which one comes out first.
What I think you might want is one of the other normalization options,
so that shorter documents are preferred. Either 1, 2, 8, or 16 would
do fine for this simple example --- which one you want depends on just
how heavily you want to favor shorter documents.
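
As a rough illustration of that suggestion (a sketch only, not PostgreSQL's actual code), the Python below applies the normalization flags the way the manual describes them to the two rows above. The raw rank is recovered by inverting the rank/(rank+1) scaling of the reported 0.23769, and the lengths 7 and 5 are read off the two tsvectors. The function name normalize and the decision to skip flag 4 (which needs extent information not shown in this thread) are assumptions of the sketch, and the server's exact arithmetic may differ.

```python
import math

def normalize(rank, flags, doc_len, uniq_words):
    """Apply length-normalization flags as the manual describes them.

    Flag 4 (mean harmonic distance between extents) is deliberately
    omitted: it needs data that is not available in this sketch.
    """
    if flags & 1:
        rank /= 1 + math.log(doc_len)      # 1 + log of document length
    if flags & 2:
        rank /= doc_len                    # document length
    if flags & 8:
        rank /= uniq_words                 # number of unique words
    if flags & 16:
        rank /= 1 + math.log(uniq_words)   # 1 + log of unique words
    if flags & 32:
        rank /= rank + 1                   # cosmetic scaling into 0..1
    return rank

# Raw rank recovered from the reported value: r / (r + 1) = 0.23769
RAW = 0.23769 / (1 - 0.23769)

# (document length, unique words), read off the two tsvectors in the thread
DOCS = {
    "Gunbower Island Primary School": (7, 7),
    "Gunbower Island": (5, 5),
}

for flags in (32, 1 | 32, 2 | 32, 8 | 32, 16 | 32):
    for name, (doc_len, uniq) in DOCS.items():
        print(flags, name, normalize(RAW, flags, doc_len, uniq))
```

With flag 32 alone the two rows tie; adding any of 1, 2, 8, or 16 makes the shorter "Gunbower Island" row score higher. In the original query this would correspond to passing, for example, 2|32 instead of 32 as the third argument to ts_rank_cd, since the flags can be combined with |.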
regards, tom lane
On Fri, 7 Mar 2008, b wragg wrote:
> I'm new to this, but I can't figure out why the "Gunbower Island Primary
> School" is getting top place. How do I get the query to improve the ranking
> so that an exact match (like "Gunbower|Island|Vic") gets a higher position?
You can use one of the document length normalization flags described in
the documentation, or write your own ranking function.
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83