Re: Does people favor to have matrix data type?

From: ktm@rice.edu
Subject: Re: Does people favor to have matrix data type?
Msg-id 20160525132243.GD32767@aart.rice.edu
In reply to: Re: Does people favor to have matrix data type?  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List: pgsql-hackers
On Wed, May 25, 2016 at 09:10:02AM +0000, Kouhei Kaigai wrote:
> > -----Original Message-----
> > From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> > Sent: Wednesday, May 25, 2016 4:39 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Does people favor to have matrix data type?
> > 
> > On 25 May 2016 at 03:52, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > 
> > 
> >     For the past few days, I've been working on a data type that represents a
> >     matrix in the mathematical sense. Would people like to have this data type
> >     in core, not only in my extension?
> > 
> > 
> > If we understood the use case, it might help understand whether to include it or not.
> > 
> > Multi-dimensionality of arrays isn't always useful, so this could be good.
> >
> As you may expect, I've been working on the matrix data type as part of the
> groundwork for GPU acceleration, though its use is not limited to that.
> 
> What I want to do is run analytic algorithms inside the database, rather than
> exporting the entire dataset to the client side.
> My first target is k-means clustering, which is often used in data mining.
> When we categorize N items, each with M attributes, into k clusters, the master
> data can be represented as an NxM matrix, equivalent to N vectors in M-dimensional
> space. The cluster centroids also lie in that M-dimensional space, so they can be
> represented as a kxM matrix, equivalent to k vectors in M dimensions.
> The k-means algorithm calculates the distance from each item to every cluster
> centroid, producing an Nxk matrix that is usually called the distance matrix.
> Next, it updates the cluster centroids using the distance matrix, then repeats
> the entire process until convergence.
> 
> The heart of the workload is the calculation of the distance matrix. When I tried
> to write the k-means algorithm using SQL + R, its performance was poor.
>   https://github.com/kaigai/toybox/blob/master/Rstat/pgsql-kmeans.r
> 
> If we had native functions to use instead of complicated SQL expressions,
> it would help people who want to do in-database analytics.
> 
> Also, fortunately, PostgreSQL's 2-D array format is binary compatible with the
> BLAS library's requirements. That will allow a GPU to process large matrices
> with HPC-grade performance.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>
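The k-means loop described in the quoted message can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from PG-Strom or the linked R script: matrices are lists of rows (mirroring a row-major 2-D array), the squared Euclidean distance and the mean-based centroid update of Lloyd's algorithm are assumed because the message names neither, and all function names are hypothetical. The distance matrix is computed through the expansion ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c, so the costly step is the single N x k product X C^T, which is the kind of operation a BLAS GEMM routine (CPU or GPU) is built to accelerate.

```python
# Hypothetical k-means sketch (Lloyd's algorithm, squared Euclidean distance).
# Matrices are lists of rows, mirroring a row-major PostgreSQL 2-D array.

def matmul_transposed(a, b):
    """Return A x B^T: a len(a) x len(b) matrix of row-by-row dot products."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]

def distance_matrix(items, centroids):
    """N x k matrix of squared distances from each item to each centroid.

    Uses ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c, so the heavy lifting is the
    single product X C^T (what a BLAS GEMM call would compute).
    """
    cross = matmul_transposed(items, centroids)
    item_sq = [sum(v * v for v in row) for row in items]
    cent_sq = [sum(v * v for v in row) for row in centroids]
    return [
        # Clamp at zero: the expansion can produce tiny negative rounding noise.
        [max(0.0, item_sq[i] + cent_sq[j] - 2.0 * cross[i][j])
         for j in range(len(centroids))]
        for i in range(len(items))
    ]

def kmeans(items, centroids, max_iter=100):
    """Refine k initial centroids (k x M); return (centroids, labels)."""
    centroids = [list(c) for c in centroids]
    m = len(items[0])
    labels = None
    for _ in range(max_iter):
        dist = distance_matrix(items, centroids)
        # Assign each item to its nearest centroid (argmin of each row).
        new_labels = [min(range(len(row)), key=row.__getitem__) for row in dist]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Move each centroid to the mean of the items assigned to it.
        for j in range(len(centroids)):
            members = [items[i] for i in range(len(items)) if labels[i] == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(r[d] for r in members) / len(members)
                                for d in range(m)]
    return centroids, labels
```

For example, with four 2-D items and two starting centroids, `kmeans([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]], [[1.0, 1.0], [8.0, 8.0]])` assigns the first two items to cluster 0 and the last two to cluster 1, and converges on the second pass. Keeping the distance step as one matrix product, rather than N x k separate distance calls, is what lets a native matrix type hand the work to BLAS.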

Hi,

Have you looked at the Perl Data Language (PDL) under PL/Perl? It has pretty nice
support for matrix calculations:

http://pdl.perl.org

Regards,
Ken


