Re: GSOC Student Project Idea

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GSOC Student Project Idea
Дата
Msg-id CAPpHfdskjgBMsXd-dLMnhFG-8c2KRyZxseKkByV_6gev6ea6GQ@mail.gmail.com
обсуждение исходный текст
Ответ на GSOC Student Project Idea  (Michael Schuh <schuh.mike@gmail.com>)
Ответы Re: GSOC Student Project Idea  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
Michael,

On Mon, Apr 22, 2013 at 4:28 PM, Michael Schuh <schuh.mike@gmail.com> wrote:
Although I do not have a lot of experience with PostgreSQL development, I am eager to learn and commit my summer to enabling another fantastic feature for the community. Since iDistance is a non-recursive, data-driven, space-based partitioning strategy which builds directly onto a B+-tree, I believe the implementation should be possible using only GiST support. Please let me know if this is of any interest, or if you have any additional questions. Unfortunately, I will be unavailable most of the day, but I plan to fill out the GSOC application later this evening.

I've taken a brief look on the paper and implementation. As I can see iDistance implements some global building strategy. I mean, for example, it selects some point, calculates distances from selected point to all points in dataset etc. So, it uses the whole dataset at the same time. 

GiST and SP-GiST now behaves differently. They have interface functions which operates at maximum very small portion of data which fits to disk page. You can try to fit iDistance into GiST or SP-GiST interface. But will it be that efficient in this case? I've experimented with fitting VP-tree (which looks similar to iDistance at first glance) into SP-GiST. Then my results was depressing: it was useless with levenshtein distance, and it was worse than R-tree with multidimensional points. So I think we really need global index building algorithm in order to reveal power of such data structures. But GiST and SP-GiST currently doesn't support it.

However you can try to implement global index building in GiST or SP-GiST. In this case I think you should carefully estimate your capabilities during single GSoC project. You would need to extend GiST or SP-GiST interface and write completely new implementation of tree building algorithm. Question of how to exactly extend GiST or SP-GiST interface for this case could appear to be very hard even theoretically.

In short my opinion is so: don't expect miracle by just implementing GiST or SP-GiST interface functions.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: 9.3 Beta1 status report
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: REFRESH MATERIALIZED VIEW command in PL block hitting Assert