Re: GSOC Student Project Idea

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: GSOC Student Project Idea
Дата
Msg-id D2B64FEC-A6A8-4894-BF73-7CC5AACDB23C@phlo.org
обсуждение исходный текст
Ответ на Re: GSOC Student Project Idea  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: GSOC Student Project Idea  (Michael Schuh <schuh.mike@gmail.com>)
Re: GSOC Student Project Idea  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Apr23, 2013, at 23:25 , Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I've taken a brief look on the paper and implementation. As I can see iDistance implements some global building
strategy.I mean, for example, it selects some point, calculates distances from selected point to all points in dataset
etc.So, it uses the whole dataset at the same time.  
>
> However you can try to implement global index building in GiST or SP-GiST. In this case I think you should carefully
estimateyour capabilities during single GSoC project. You would need to extend GiST or SP-GiST interface and write
completelynew implementation of tree building algorithm. Question of how to exactly extend GiST or SP-GiST interface
forthis case could appear to be very hard even theoretically. 

+1. That seemed to be a major roadblock to me too when I read the paper.

You could work around that by making partition identification a separate step. You'd have a function
 idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in table.field and store them somewhere. Such a set of
pre-identifiedpartitions would be akin to a tsearch configuration, i.e. all other parts of the iDistance machinery
woulduse it to map points to index keys and queries to ranges of those keys. You'll want to look at how tsearch handles
that,and check if the method can indeed be applied to iDistance. 

In a first cut, you'd probably only allow inserts into index which don't change the maximum distances from the
partitioncenters that idist_analyze() found. With that restriction in place, you might get away without GiST or
SP-GiSt,and simply use postgres' standard btree, if you find a way to map queries of the form "field
<idistance-knn-operator>idistance_knn_function(point, distance)" to "where (idstance_keys(field) between P1_lowerbound
ANDP2_upperbound) OR (idistance_keys(field) between P2_lowerbuild AND P2_upperbound) …" or something equivalent. Or you
couldstart with a GiST-based btree and extend it to support "distance" function. Looking at contrib/btree_gist and the
built-inGiST operator class for point types should help. 

best regards,
Florian Pflug




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Следующее
От: Fabrízio de Royes Mello
Дата:
Сообщение: Re: Proposal to add --single-row to psql