KNN-GiST with recheck

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема KNN-GiST with recheck
Дата
Msg-id CAPpHfdu_qBLNRnv-r=_tofZYYa6-r=Z5_MGF4_phAOkWcYxfRg@mail.gmail.com
обсуждение исходный текст
Ответы Re: KNN-GiST with recheck  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: KNN-GiST with recheck  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
Hackers!

This patch was split from thread:

I've split it to separate thead, because it's related to partial sort only conceptually not technically. Also I renamed it to "knn-gist-recheck" from "partial-knn" as more appropriate name. In the attached version docs are updated. Possible weak point of this patch design is that it fetches heap tuple from GiST scan. However, I didn't receive any notes about its design, so, I'm going to put it to commitfest.

Here goes a desription of this patch same as in original thread.

KNN-GiST provides ability to get ordered results from index, but this order is based only on index information. For instance, GiST index contains bounding rectangles for polygons, and we can't get exact distance to polygon from index (similar situation is in PostGIS). In attached patch, GiST distance method can set recheck flag (similar to consistent method). This flag means that distance method returned lower bound of distance and we should recheck it from heap.

See an example.

create table test as (select id, polygon(3+(random()*10)::int, circle(point(random(), random()), 0.0003 + random()*0.001)) as p from generate_series(1,1000000) id);
create index test_idx on test using gist (p);

We can get results ordered by distance from polygon to point.

postgres=# select id, p <-> point(0.5,0.5) from test order by p <-> point(0.5,0.5) limit 10;
   id   |       ?column?       
--------+----------------------
 755611 | 0.000405855808916853
 807562 | 0.000464123777564343
 437778 | 0.000738524708741959
 947860 |  0.00076250998760724
 389843 | 0.000886362723569568
  17586 | 0.000981960100555216
 411329 |  0.00145338112316853
 894191 |  0.00149399559703506
 391907 |   0.0016647896049741
 235381 |  0.00167554614889509
(10 rows)

It's fast using just index scan.

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230 rows=10 loops=1)
   ->  Index Scan using test_idx on test  (cost=0.29..157672.29 rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
         Order By: (p <-> '(0.5,0.5)'::point)
 Total runtime: 0.305 ms
(4 rows)

------
With best regards,
Alexander Korotkov.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: GIN improvements part 1: additional information
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: nested hstore patch