Re: Does people favor to have matrix data type?

From: Kouhei Kaigai
Subject: Re: Does people favor to have matrix data type?
Msg-id: 9A28C8860F777E439AA12E8AEA7694F8011F77FE@BPXM15GP.gisp.nec.co.jp
In reply to: Re: Does people favor to have matrix data type?  (Joe Conway <mail@joeconway.com>)
Responses: Re: Does people favor to have matrix data type?  (Joe Conway <mail@joeconway.com>)
List: pgsql-hackers
> On 05/28/2016 03:33 PM, Kouhei Kaigai wrote:
> >> -----Original Message-----
> >> From: Joe Conway [mailto:mail@joeconway.com]
> >> Sent: Sunday, May 29, 2016 1:40 AM
> >> To: Kaigai Kouhei(海外 浩平); Jim Nasby; Ants Aasma; Simon Riggs
> >> Cc: pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] Does people favor to have matrix data type?
> >>
> >> On 05/28/2016 07:12 AM, Kouhei Kaigai wrote:
> >>> Sparse matrices! They are a weak area for the current array format.
> >>>
> >>> I have two ideas. HPC folks often split a large matrix into multiple
> >>> grids. A grid is typically a matrix of up to 1024x1024 elements, for example.
> >>> If a grid consists of all zero elements, it is obvious we don't need
> >>> to store the individual elements of that grid.
> >>> Another idea is compression. If most of the matrix is zero, it is ideal
> >>> data for compression, and it is easy to reconstruct only at calculation time.
> >>>
> >>>> Related to this, Tom has mentioned in the past that perhaps we should
> >>>> support abstract use of the [] construct. Currently point finds a way to
> >>>> make use of [], but I think that's actually coded into the grammar.
> >>>>
> >>> Yep, if we consider a 2D array to be a matrix, no special enhancement is
> >>> needed to use []. However, I'm inclined to have a dedicated data structure
> >>> for matrices in order to represent sparse matrices.
> >>
> >> +1 I'm sure this would be useful for PL/R as well.
> >>
> >> Joe
> >>
> > It is a pretty good idea to combine PL/R and PL/CUDA (which I'm now working on)
> > for advanced analytics. We will be able to off-load the heavy computing portion
> > to the GPU, and also utilize various R functions inside the database.
> 
> Agreed. Perhaps at some point we should discuss closer integration of
> some sort, or at least a sample use case.
>
What I'm trying to implement first is k-means clustering on the GPU. Its core
workload is the iteration of massive distance calculations. When I ran the kmeans()
function of R for a million items with 10 clusters on 40 dimensions, it took about
a thousand seconds. If the GPU version provides the result matrix more rapidly, then
I expect R can plot the relationship between items and clusters in a human-friendly way.
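
To make the "massive distance calculations" concrete, here is a minimal CPU
sketch in plain Python of a Lloyd-style k-means loop. It is an illustration
only: the function names are hypothetical, it is not PL/CUDA code, and it is
not necessarily the algorithm that R's kmeans() runs. The point is that every
iteration compares each item with each centroid across every dimension, which
is the part a GPU could process in parallel.

```python
def assign_clusters(items, centroids):
    """Return, for each item, the index of its nearest centroid."""
    labels = []
    for x in items:
        best_idx, best_dist = 0, float("inf")
        for idx, c in enumerate(centroids):
            # Squared Euclidean distance; the square root is not needed
            # when we only compare distances.
            dist = sum((a - b) ** 2 for a, b in zip(x, c))
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        labels.append(best_idx)
    return labels


def update_centroids(items, labels, centroids):
    """Move each centroid to the mean of the items assigned to it."""
    k, dims = len(centroids), len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for x, lab in zip(items, labels):
        counts[lab] += 1
        for j, v in enumerate(x):
            sums[lab][j] += v
    # An empty cluster keeps its previous centroid (an arbitrary choice here).
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else list(centroids[i])
        for i in range(k)
    ]


def kmeans(items, centroids, max_iter=100):
    """Iterate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign_clusters(items, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(items, labels, centroids)
    return labels, centroids


def distance_ops_per_iteration(n_items, n_clusters, n_dims):
    """Number of per-dimension difference terms computed in one assignment pass."""
    return n_items * n_clusters * n_dims
```

For the case above (a million items, 10 clusters, 40 dimensions),
distance_ops_per_iteration(1_000_000, 10, 40) gives 400,000,000 difference
terms per iteration, and each item's distances are independent of the others.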

For closer integration, it may be valuable if PL/R and PL/CUDA could exchange
data structures without serialization/de-serialization when PL/R code calls
SQL functions. IIUC, pg.spi.exec("SELECT my_function(...)") is the only
way to call SQL functions inside PL/R scripts.
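
Returning to the sparse-matrix idea quoted earlier (split the matrix into
grids and do not store grids that are all zero), a rough sketch in plain
Python could look like the following. The names, the dictionary layout, and
the small tile size are assumptions for illustration; this is not a proposed
on-disk format.

```python
TILE = 4  # hypothetical tile edge for the demo; the thread mentions up to 1024


def to_tiles(dense, tile=TILE):
    """Split a dense matrix (list of rows) into tiles, keeping only non-zero ones."""
    n_rows, n_cols = len(dense), len(dense[0])
    tiles = {}
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            block = [row[c0:c0 + tile] for row in dense[r0:r0 + tile]]
            # An all-zero tile is simply not stored.
            if any(v != 0 for row in block for v in row):
                tiles[(r0 // tile, c0 // tile)] = block
    return {"shape": (n_rows, n_cols), "tile": tile, "tiles": tiles}


def get(m, r, c):
    """Read one element; a missing tile means the element is zero."""
    block = m["tiles"].get((r // m["tile"], c // m["tile"]))
    if block is None:
        return 0
    return block[r % m["tile"]][c % m["tile"]]
```

With an 8x8 matrix that has only two non-zero elements in different corners,
to_tiles() keeps two of the four 4x4 tiles, and get() returns 0 for any
position that falls in a dropped tile.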

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
