[GSoC] kmedoids status report

From Maxence Ahlouche
Subject [GSoC] kmedoids status report
Msg-id CAJeaomUmGypOcfhkkgwYRZmCLQQ-6uA0=1yeHTr567EaW3CBEw@mail.gmail.com
List pgsql-hackers
Hi!

Here is a report of what was discussed yesterday on IRC.

The kmedoids module now seems to work correctly on basic datasets. I've also implemented its variants with different seeding methods: random initial medoids, and initial medoids distributed among the points (similar to kmeans++ [0]).
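
To make the two seeding methods concrete, here is a minimal Python sketch. It is not the module's actual code: the function names (`seed_random`, `seed_spread`), the tuple representation of points and the Euclidean distance are all assumptions made for illustration, and the second function follows the usual kmeans++ weighting, which may differ from what the module does.

```python
import random

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def seed_random(points, k, rng):
    """Random seeding: pick k distinct points uniformly at random."""
    return rng.sample(points, k)

def seed_spread(points, k, rng):
    """kmeans++-style seeding (assumed variant): each new medoid is drawn
    with probability proportional to its squared distance to the nearest
    medoid chosen so far, so the seeds tend to be spread among the points.
    Assumes `points` contains at least k distinct points."""
    medoids = [rng.choice(points)]
    while len(medoids) < k:
        # Already chosen points get weight 0 and are not drawn again.
        weights = [min(dist(p, m) for m in medoids) ** 2 for p in points]
        medoids.append(rng.choices(points, weights=weights, k=1)[0])
    return medoids
```

Both functions return actual data points, which is what kmedoids needs, since a medoid is always a member of the dataset.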

The next steps are:
  • Making better tests (1-2d)
  • Writing the documentation (1d)
  • Adapting my code to GP and HAWQ -- btw, are default parameters now available in GP and HAWQ? (1-2d)
  • Refactoring kmedoids and kmeans, as there is code duplication between those two.
    For this step, I don't know if I'll have time to create a clustering module, and make kmeans and kmedoids submodules of it. If yes, then it's perfect; otherwise, I'll just rename the common functions in kmeans, and have kmedoids call them from there.
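
As a rough illustration of the duplication mentioned in the last item, the sketch below shows how kmeans and kmedoids could share the assignment step and the main loop, and differ only in the update step. All names here (`assign`, `mean_update`, `medoid_update`, `cluster_points`) are hypothetical and do not come from the existing kmeans code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Shared step: group points by the index of their closest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        clusters[nearest].append(p)
    return clusters

def mean_update(cluster):
    """kmeans update: the coordinate-wise mean of the cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def medoid_update(cluster):
    """kmedoids update: the member with the smallest total distance to the rest."""
    return min(cluster, key=lambda c: sum(sq_dist(c, q) ** 0.5 for q in cluster))

def cluster_points(points, centers, update, max_iter=100):
    """Shared driver: alternate assignment and update until centers stop changing."""
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        new_centers = [update(c) if c else old for c, old in zip(clusters, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

With this layout, `cluster_points(points, seeds, medoid_update)` and `cluster_points(points, seeds, mean_update)` run the two algorithms through the same code path.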

Hai also helped me set up (once more) the VM where Greenplum and HAWQ are installed, so that I can test my code on these DBMSs.

As a reminder, I'm supposed to stop coding next Monday, and then the last week is dedicated to documentation, tests, refactoring and polishing. 

Regards,

Maxence
