Re: application of KNN code to US zipcode searches?

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: application of KNN code to US zipcode searches?
Дата
Msg-id Pine.LNX.4.64.1102172316360.278@sn.sai.msu.ru
обсуждение исходный текст
Ответ на Re: application of KNN code to US zipcode searches?  (Mark Stosberg <mark@summersault.com>)
Ответы Re: application of KNN code to US zipcode searches?
Список pgsql-performance
Mark,

we investigating pgsphere http://pgsphere.projects.postgresql.org/, if we could add KNN support.


Oleg
On Thu, 17 Feb 2011, Mark Stosberg wrote:

>
>> I thought the benefit of KNN was that you could retrieve the rows in
>> distance order, so that a query for the closest 20 locations (for
>> example) would be very fast.  I wouldn't have expected it to be
>> helpful when you're selecting all the rows regardless of distance.
>
> Kevin,
>
> Thanks for the feedback. You are right that my "reduced test case"
> wasn't a good approximation. I added a limit, to simulate finding the
> 100 zipcodes closest to 90210.
>
> Below I compare 4 approaches to the same query:
>
> 1. Cube search
> 2. Earth Distance Search
> 3. Simple point distance (no index)
> 4. Simple point distance (KNN)
>
> Now KNN benchmarks to be almost 100x faster! That's very promising.
> Then there's only the issue that simple point distance is not expected
> to be a good enough approximation of earth-distances. Perhaps that can
> be solved by pre-computing coordinates based on the lat/long pairs....
> much like the map projections used to present a curved surface on a flat
> map? Given that's OK to be be a few miles off, it seems we have some
> leeway here.
>
> Recommendations?
>
>    Mark
>
> EXPLAIN ANALYZE
> SELECT zipcode,
>    cube_distance( '(-2513120.64361786, -4645511.0460328,
> 3575538.9507084)', zipcodes.earth_coords)/1609.344 AS radius
>    FROM zipcodes ORDER BY radius LIMIT 100;
>
> ---------------------------------------------------------------
> Limit  (cost=2946.70..2946.95 rows=100 width=62) (actual
> time=167.650..168.064 rows=100 loops=1)
>   ->  Sort  (cost=2946.70..3050.40 rows=41483 width=62) (actual
> time=167.644..167.829 rows=100 loops=1)
>         Sort Key: ((cube_distance('(-2513120.64361786,
> -4645511.0460328, 3575538.9507084)'::cube, earth_coords) /
> 1609.344::double precision))
>         Sort Method: top-N heapsort  Memory: 20kB
>         ->  Seq Scan on zipcodes  (cost=0.00..1361.24 rows=41483
> width=62) (actual time=0.030..90.807 rows=41483 loops=1)
> Total runtime: 168.300 ms
>
> ############################################################3
>
> -- Using Earthdistance
> EXPLAIN ANALYZE SELECT zipcode,
>    lon_lat <@> '(-118.412426,34.096629)' As radius
>    FROM zipcodes
>    ORDER BY lon_lat <@> '(-118.412426,34.096629)'
>    LIMIT 100;
>
> ------------------------------------------------------------
> Limit  (cost=2842.99..2843.24 rows=100 width=22) (actual
> time=187.995..188.451 rows=100 loops=1)
>   ->  Sort  (cost=2842.99..2946.70 rows=41483 width=22) (actual
> time=187.989..188.149 rows=100 loops=1)
>         Sort Key: ((lon_lat <@> '(-118.412426,34.096629)'::point))
>         Sort Method: top-N heapsort  Memory: 20kB
>         ->  Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483
> width=22) (actual time=0.033..108.203 rows=41483 loops=1)
> Total runtime: 188.660 ms
>
> ##########################################
>
> Using simple point distance, but with no Gist Index:
>
> EXPLAIN ANALYZE SELECT zipcode,
>    lon_lat <-> '(-118.412426,34.096629)' As radius
>    FROM zipcodes
>    ORDER BY lon_lat <-> '(-118.412426,34.096629)'
>    LIMIT 100;
>
> --------------------------------------------------------
> Limit  (cost=2842.99..2843.24 rows=100 width=22) (actual
> time=160.574..161.057 rows=100 loops=1)
>   ->  Sort  (cost=2842.99..2946.70 rows=41483 width=22) (actual
> time=160.568..160.691 rows=100 loops=1)
>         Sort Key: ((lon_lat <-> '(-118.412426,34.096629)'::point))
>         Sort Method: top-N heapsort  Memory: 20kB
>         ->  Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483
> width=22) (actual time=0.027..84.610 rows=41483 loops=1)
> Total runtime: 161.226 ms
>
> #########################################
>
> -- Using KNN-GIST index
> EXPLAIN ANALYZE SELECT zipcode,
>    lon_lat <-> '(-118.412426,34.096629)' As radius
>    FROM zipcodes
>    ORDER BY lon_lat <-> '(-118.412426,34.096629)'
>    LIMIT 100;
> ------------------------------------------------------------------
> Limit  (cost=0.00..12.94 rows=100 width=22) (actual time=0.447..1.892
> rows=100 loops=1)
>   ->  Index Scan using zipcodes_knn on zipcodes  (cost=0.00..5365.93
> rows=41483 width=22) (actual time=0.440..1.407 rows=100 loops=1)
>         Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
> Total runtime: 2.198 ms
>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: application of KNN code to US zipcode searches?
Следующее
От: Mark Stosberg
Дата:
Сообщение: Re: application of KNN code to US zipcode searches?