Re: GSoC - Idea Discussion

From: hitesh ramani
Subject: Re: GSoC - Idea Discussion
Msg-id: BAY176-W2178F5333B792CA7753536DC0E0@phx.gbl
In response to: Re: GSoC - Idea Discussion (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses: Re: GSoC - Idea Discussion (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List: pgsql-hackers

Hello devs,

Thank you so much for the feedback. To answer your questions:

Tomas:
>So you've created an array of 1M integers, and it's 7x faster on GPU 
>compared to pg_qsort(), correct?

No, I meant sorting in general, not pg_qsort().

>Well, it might surprise you, but PostgreSQL almost never sorts numbers 
>like this. PostgreSQL sorts tuples, which is way more complicated and, 
>considering the variable length of tuples (causing issues with memory 
>access), rather unsuitable for GPU devices. I might be missing 
>something, of course.
>
>Also, it often needs additional information, like collations when 
>sorting by a text field, for example.

I totally agree with you on this point. My current target area is very confined since this is just the beginning: I'm only considering integer values in one row.

>Why don't you show us the source code? Would be simpler than explaining 
>what it does.

You can have a look at the code here: https://github.com/hiteshramani/Postgres-CUDA
This is compiled code. You can see the call to the CUDA function in src/port/qsort.c and in the header files qsort_normal.h and qsort_cuda.h. The hello world program is in src/port/qsort_cuda.cu. The build happens in two phases, compile and link: I compiled the CUDA file with nvcc, and for linking I edited the makefile of src/timezone/ because the zic build needed the CUDA file linked in.
Suggestions are welcome.

>I'd recommend discussing the code here. It's certainly quite complex, 
>especially if this is your first encounter with it.

Yes, I found it a little complex and couldn't find many help resources online. I'm looking for help.

>PostgreSQL uses adaptive sort - in-memory when it fits into work_mem, 
>on-disk when it does not. This is decided at runtime.
>
>You'll have to do the same thing, because the amount of memory available 
>on GPUs is limited to a few GBs, and it needs to work for datasets 
>exceeding that limit (the amount of data is uncertain at planning time).

Yes, I thought of that too. A call could be made with the integer array as input to the GPU, which then returns the sorted array. I want to proceed step by step, as there are methods to sort amounts of data that exceed the GPU memory.
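
The runtime decision described in the quote above could look roughly like the following. This is only a minimal sketch in plain C under stated assumptions: DEVICE_CAPACITY_INTS, device_sort_ints and sort_ints_adaptive are hypothetical names (not PostgreSQL or CUDA APIs), the capacity is a made-up constant, and the "device" sort is a CPU stand-in (qsort) so the example runs without a GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device limit, in ints. A real value would be queried at
 * runtime; it is tiny here only to make the fallback easy to trigger. */
#define DEVICE_CAPACITY_INTS 4

typedef enum { SORT_PATH_DEVICE, SORT_PATH_HOST } SortPath;

/* Three-way integer comparison that cannot overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Stand-in for a GPU sort: assumed to work only when the whole array
 * fits into device memory. */
static void device_sort_ints(int *data, size_t n)
{
    assert(n <= DEVICE_CAPACITY_INTS);
    qsort(data, n, sizeof(int), cmp_int);
}

/* Choose the path at call time, because the input size is not known
 * in advance. Returns the path that was taken. */
static SortPath sort_ints_adaptive(int *data, size_t n)
{
    if (n <= DEVICE_CAPACITY_INTS) {
        device_sort_ints(data, n);
        return SORT_PATH_DEVICE;
    }
    /* Too large for the (hypothetical) device: fall back to the host. */
    qsort(data, n, sizeof(int), cmp_int);
    return SORT_PATH_HOST;
}
```

Calling sort_ints_adaptive on a 3-element array takes the device path, while a 6-element array falls back to the host sort. In both cases the caller gets a sorted array and does not need to know which path ran.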

Álvaro Herrera:
I downloaded the zip of the latest custom_join repo I saw 2 days ago. I'll check once again. Thank you. :)

KaiGai Kohei:

>Let me say CUDA is better than OpenCL :-)
>Because of software quality of OpenCL runtime drivers provided by each vendor,
>I've often faced mysterious problems. Only nvidia's runtime are enough reliable
>from my point of view. In addition, when we implement using OpenCL is a feature
>fully depends on hardware characteristics, so we cannot ignore physical hardware
>underlying the abstraction layer.
>So, I'm now reworking the code to move CUDA from OpenCL.

That's great, I'd love to help you with that and contribute to it.

>It seems to me you are a little bit optimistic.
>Unlike CPU code, GPU-Sorting logic has to reference device memory space,
>so all the data to be compared needs to be transferred to GPU devices.
>Any pointer on host address space is not valid on GPU calculation.
>Amount of device memory is usually smaller than host memory, so your code
>needs a capability to combined multiple chunks that is partially sorted...
>Probably, it is not all here.

Aren't there algorithms that help when the device memory is limited and the data is massive? My memory of this is rough, but I took an online course where, I believe, I saw algorithms for dealing with such problems.
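
One well-known family of such algorithms follows the external merge sort pattern: sort chunks that fit into the limited memory, then merge the sorted chunks on the host. The sketch below illustrates that pattern in plain C and is not the code from the repository above. The names chunked_sort_ints and merge_runs are hypothetical, the chunk size stands in for the device capacity, and qsort stands in for the per-chunk GPU sort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Merge sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int *src, int *dst,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* Sort data[0..n) using chunks of at most `chunk` ints.
 * Returns 0 on success, -1 on a zero chunk size or allocation failure. */
static int chunked_sort_ints(int *data, size_t n, size_t chunk)
{
    if (chunk == 0)
        return -1;
    if (n < 2)
        return 0;

    /* Phase 1: sort each chunk on its own. On a GPU, this would be one
     * transfer, one device sort and one transfer back per chunk. */
    for (size_t lo = 0; lo < n; lo += chunk) {
        size_t len = (n - lo < chunk) ? n - lo : chunk;
        qsort(data + lo, len, sizeof(int), cmp_int);
    }

    /* Phase 2: merge neighbouring runs on the host, doubling the run
     * width each pass and ping-ponging between two buffers. */
    int *tmp = malloc(n * sizeof(int));
    if (tmp == NULL)
        return -1;

    int *src = data;
    int *dst = tmp;
    for (size_t width = chunk; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (n - lo < width) ? n : lo + width;
            size_t hi = (n - lo < 2 * width) ? n : lo + 2 * width;
            merge_runs(src, dst, lo, mid, hi);
        }
        int *swap = src;
        src = dst;
        dst = swap;
    }

    /* After the last pass, src holds the result; copy it back if needed. */
    if (src != data)
        memcpy(data, src, n * sizeof(int));
    free(tmp);
    return 0;
}
```

For example, chunked_sort_ints(a, 10, 3) sorts a 10-element array while never handing more than 3 elements at a time to the per-chunk sort. The merge phase here runs on the CPU, which is one simple design choice among several; tuples with variable length or collations would need considerably more than this.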

Thanks and Regards,
Hitesh Ramani
