Re: Parallel threads in query

From Konstantin Knizhnik
Subject Re: Parallel threads in query
Date
Msg-id c9575f44-4211-78f8-e561-e1ed1baa724f@postgrespro.ru
In response to Parallel threads in query  (Darafei "Komяpa" Praliaskouski <me@komzpa.net>)
List pgsql-hackers

On 31.10.2018 22:07, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
>
> I've tried porting some of PostGIS algorithms to utilize multiple 
> cores via OpenMP to return faster.
>
> Question is, what's the best policy to allocate cores so we can play 
> nice with rest of postgres?
>
> What I'd like to see is some function that I can call and get a number 
> of threads I'm allowed to run, that will also advise rest of postgres 
> to not use them, and a function to return the cores back (or do it 
> automatically at the end of query). Is there an infrastructure for that?

I do not completely understand which PostGIS algorithms you are going 
to make parallel, so maybe you should clarify that first.
There are currently three options for parallel execution of a single 
query in Postgres:

1. Use the existing Postgres parallel capabilities. For example, if 
there is some expensive function f() that you want to execute 
concurrently, then you do not need to do anything: parallel seq scan 
will do it for you. You can configure an arbitrary number of parallel 
workers and so control the level of concurrency.
The restrictions of the current Postgres parallel query processing 
implementation are:
- parallel workers are started for each query
- a lot of state has to be serialized and passed from the coordinator 
to the parallel workers
- in the case of seqscan, workers compete for pages to scan, so the 
effective number of workers should be < 10, while the most powerful 
modern servers have hundreds of CPU cores.
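
To illustrate the last point, here is a minimal sketch of workers
claiming pages from a single shared counter. It is a simplified model,
not Postgres code: SharedScan, shared_scan_init and
shared_scan_next_block are hypothetical names, and C11 atomics stand in
for whatever the real shared scan state uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Returned when every block has already been handed out. */
#define NO_MORE_BLOCKS UINT64_MAX

/* Hypothetical stand-in for scan state shared by all workers. */
typedef struct SharedScan
{
    uint64_t             nblocks; /* total number of blocks to scan */
    atomic_uint_fast64_t next;    /* next block number to hand out */
} SharedScan;

static void
shared_scan_init(SharedScan *scan, uint64_t nblocks)
{
    assert(scan != NULL);
    scan->nblocks = nblocks;
    atomic_init(&scan->next, 0);
}

/*
 * Every worker calls this to claim its next block. All workers update
 * the same counter, so this one memory location is the point where
 * they compete with each other.
 */
static uint64_t
shared_scan_next_block(SharedScan *scan)
{
    uint64_t blk = atomic_fetch_add(&scan->next, 1);

    return blk < scan->nblocks ? blk : NO_MORE_BLOCKS;
}
```

Each block is handed out exactly once, but every claim goes through the
same counter, which is one way to picture why adding more workers stops
helping at some point.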

2. Implement your own parallel plan nodes using the existing Postgres 
parallel infrastructure. This approach has the best chance of being 
committed to Postgres core.
But the disadvantages are mostly the same as in 1). Exchanging data 
between different processes is much more complex and expensive than 
accessing common memory with threads. Most likely you will have to use 
the shared message queue and dynamic shared memory, which were 
implemented in Postgres specifically for interaction between parallel 
workers.

3. Use multithreading to provide concurrent execution of your particular 
algorithm (spawn threads within the backend). You should be very careful 
with this approach, because Postgres code is not thread-safe. So you 
should not try to execute any subplan in a thread or call any Postgres 
functions (unless you are 100% sure that they are thread-safe).
This approach may be easy to implement and may provide better 
performance than 1). But please note its limitations. I have used this 
approach in my IMCS extension (In-Memory Columnar Store).
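
A minimal sketch of option 3, assuming OpenMP as in the original
question: the threaded part works only on plain C memory and calls
nothing from Postgres. The function name sum_of_squares_parallel and its
parameters are hypothetical; in a real extension the backend thread
would first copy the input into an ordinary array and only then enter
the parallel region.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical compute kernel: sum of squared values.
 * The parallel region touches only the caller's array and local
 * variables: no palloc, no elog, no catalog or buffer access.
 */
static double
sum_of_squares_parallel(const double *values, size_t n, int nthreads)
{
    double total = 0.0;
    long   i;

    assert(values != NULL || n == 0);
    assert(nthreads > 0);

    /* Each thread accumulates a private sum; OpenMP combines them. */
    #pragma omp parallel for num_threads(nthreads) reduction(+:total)
    for (i = 0; i < (long) n; i++)
        total += values[i] * values[i];

    return total;
}
```

If the file is compiled without OpenMP support, the pragma is ignored
and the loop simply runs serially with the same result. The nthreads
argument is where a per-query thread limit would be applied, if such a
limit were available.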

You can look at the pg_strom extension as an example of providing 
parallel query execution (in this case using the parallel capabilities 
of video cards).

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


