Обсуждение: Parallel threads in query


Parallel threads in query

From: Darafei "Komяpa" Praliaskouski
Hi,

I've tried porting some of PostGIS algorithms to utilize multiple cores via OpenMP to return faster.

Question is, what's the best policy to allocate cores so we can play nice with rest of postgres?

What I'd like to see is some function that I can call and get a number of threads I'm allowed to run, that will also advise rest of postgres to not use them, and a function to return the cores back (or do it automatically at the end of query). Is there an infrastructure for that?
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Re: Parallel threads in query

From: Tom Lane
Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
> Question is, what's the best policy to allocate cores so we can play nice
> with rest of postgres?

> What I'd like to see is some function that I can call and get a number of
> threads I'm allowed to run, that will also advise rest of postgres to not
> use them, and a function to return the cores back (or do it automatically
> at the end of query). Is there an infrastructure for that?

There is not, because we do not use or support multiple threads inside
a Postgres backend, and have no intention of doing so any time soon.
There is a huge amount of non-thread-safe infrastructure there, and
if you call any of it from other than the main thread, bad things will
happen.

You might be able to make this work if the threaded stuff is kept rigidly
separate from any core Postgres code, but it'll be error-prone.

            regards, tom lane


Re: Parallel threads in query

From: David Fetter
On Wed, Oct 31, 2018 at 09:07:43AM -1000, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
> 
> I've tried porting some of PostGIS algorithms to utilize multiple cores via
> OpenMP to return faster.

Great!

> Question is, what's the best policy to allocate cores so we can play nice
> with rest of postgres?

As Tom mentioned, the PostgreSQL backend is not yet threaded and is
not likely to be any time in the next couple of years. There
has been at least one attempt recently to start this work.
https://www.postgresql.org/message-id/9defcb14-a918-13fe-4b80-a0b02ff85527%40postgrespro.ru 

> What I'd like to see is some function that I can call and get a
> number of threads I'm allowed to run, that will also advise rest of
> postgres to not use them, and a function to return the cores back
> (or do it automatically at the end of query). Is there an
> infrastructure for that?

Not really, as above.  In the case of GIS queries, you may be able to
send a large-enough payload of work to make it worthwhile to do some
kind of IPC or even a (short, high-bandwidth, low-latency) network hop
to communicate with a separate OpenMP GIS server.  So long as the
threads don't directly interact with the backend, you could do this
safely.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Parallel threads in query

From: Konstantin Knizhnik

On 31.10.2018 22:07, Darafei "Komяpa" Praliaskouski wrote:
> Hi,
>
> I've tried porting some of PostGIS algorithms to utilize multiple 
> cores via OpenMP to return faster.
>
> Question is, what's the best policy to allocate cores so we can play 
> nice with rest of postgres?
>
> What I'd like to see is some function that I can call and get a number 
> of threads I'm allowed to run, that will also advise rest of postgres 
> to not use them, and a function to return the cores back (or do it 
> automatically at the end of query). Is there an infrastructure for that?

I do not completely understand which PostGIS algorithms you are going
to make parallel, so maybe you should first clarify that.
There are three options for performing parallel execution of a single
query in Postgres now:

1. Use the existing Postgres parallel capabilities. For example, if there is
some expensive function f() which you are going to execute concurrently,
then you do not need to do anything: parallel seq scan will do it for
you. You can configure an arbitrary number of parallel workers and so
control the level of concurrency.
The restrictions of the current Postgres parallel query processing
implementation are that
- parallel workers are started for each query
- it is necessary to serialize and pass a lot of things from the
coordinator to the parallel workers
- in case of seqscan, workers will compete for pages to scan, so the
effective number of workers should be < 10, while the most powerful modern
servers have hundreds of CPU cores.

2. Implement your own parallel plan nodes using the existing Postgres parallel
infrastructure. This approach has the best chance of being committed to the
Postgres core.
But the disadvantages are mostly the same as in 1): exchange of data between
different processes is much more complex and expensive than access to
common memory in the case of threads. Most likely you will have to use the
shared message queue and dynamic shared memory, implemented in Postgres
specifically for the interaction of parallel workers.

3. Use multithreading to provide concurrent execution of your particular
algorithm (spawn threads within the backend). You should be very careful
with this approach, because Postgres code is not thread safe. So you
should not try to execute any subplan in a thread or call any Postgres
functions from a thread (unless you are 100% sure that they are thread safe).
This approach may be easy to implement and provide better performance
than 1), but please notice its limitations; see the sketch after this list.
I have used such an approach in my IMCS extension (In-Memory-Columnar-Store).
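
For illustration, a minimal sketch of what that separation might look like.
All names here are invented, not an existing PostGIS API; the main backend
thread is assumed to copy the data out of Postgres structures into plain
arrays before calling this, and to copy results back after it returns:

#include <math.h>
#include <omp.h>

typedef struct { double x, y; } Point2D;

/*
 * Sketch of option 3.  The data lives in plain C arrays that the main
 * backend thread filled in beforehand; inside the parallel region there
 * is no palloc, no elog, no catalog/syscache access - nothing from
 * Postgres runs on the worker threads.
 */
static void
compute_distances(const Point2D *pts, int n, Point2D origin, double *out)
{
    int     i;

    #pragma omp parallel for
    for (i = 0; i < n; i++)
    {
        double  dx = pts[i].x - origin.x;
        double  dy = pts[i].y - origin.y;

        out[i] = sqrt(dx * dx + dy * dy);
    }
}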

You can look at the pg_strom extension as an example of providing parallel
query execution (in this case using the parallel capabilities of video cards).

-- 

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Parallel threads in query

From: Paul Ramsey

On Wed, Oct 31, 2018 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
>> Question is, what's the best policy to allocate cores so we can play nice
>> with rest of postgres?
>
> There is not, because we do not use or support multiple threads inside
> a Postgres backend, and have no intention of doing so any time soon.

As a practical matter though, if we're multi-threading a heavy PostGIS function, presumably simply grabbing *every* core is not a recommended or friendly practice. My finger-in-the-wind guess would be that the value of max_parallel_workers_per_gather would be the most reasonable value to use to limit the number of cores a parallel PostGIS function should use. Does that make sense?

P

Re: Parallel threads in query

From: Andres Freund
On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
> > > Question is, what's the best policy to allocate cores so we can play nice
> > > with rest of postgres?
> >
> 
> 
> > There is not, because we do not use or support multiple threads inside
> > a Postgres backend, and have no intention of doing so any time soon.
> >
> 
> As a practical matter though, if we're multi-threading  a heavy PostGIS
> function, presumably simply grabbing *every* core is not a recommended or
> friendly practice. My finger-in-the-wind guess would be that the value
> of max_parallel_workers_per_gather would be the most reasonable value to
> use to limit the number of cores a parallel PostGIS function should use.
> Does that make sense?

I'm not sure that's a good approximation.  Postgres' infrastructure
prevents every query from using max_parallel_workers_per_gather
processes due to the global max_worker_processes limit.  I think you
probably would want something very very roughly approximating a global
limit - otherwise you'll either need to set the per-process limit way
too low, or overwhelm machines with context switches.

Greetings,

Andres Freund


Re: Parallel threads in query

From: Tomas Vondra

On 11/01/2018 06:15 PM, Andres Freund wrote:
> On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
>> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> Darafei "Komяpa" Praliaskouski <me@komzpa.net> writes:
>>>> Question is, what's the best policy to allocate cores so we can play nice
>>>> with rest of postgres?
>>>
>>
>>
>>> There is not, because we do not use or support multiple threads inside
>>> a Postgres backend, and have no intention of doing so any time soon.
>>>
>>
>> As a practical matter though, if we're multi-threading  a heavy PostGIS
>> function, presumably simply grabbing *every* core is not a recommended or
>> friendly practice. My finger-in-the-wind guess would be that the value
>> of max_parallel_workers_per_gather would be the most reasonable value to
>> use to limit the number of cores a parallel PostGIS function should use.
>> Does that make sense?
> 
> I'm not sure that's a good approximation.  Postgres' infrastructure
> prevents every query from using max_parallel_workers_per_gather
> processes due to the global max_worker_processes limit.  I think you
> probably would want something very very roughly approximating a global
> limit - otherwise you'll either need to set the per-process limit way
> too low, or overwhelm machines with context switches.
> 

Yeah. Without a global limit it would be fairly trivial to create way
too many threads - say, when a query gets parallelized and each worker
creates a bunch of private threads. Then a bunch of such queries run
concurrently, and it gets bad pretty fast.

In theory, simulating such a global limit should be possible using a bit
of shared memory for the current total, a per-process counter, and probably
some simple abort handling (say, just like contrib/openssl does using
ResourceOwner).
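
For concreteness, a rough sketch of such a counter using Postgres atomics.
All names here are invented, and real code would also need the usual shmem
request/startup hooks plus the abort handling mentioned above so a failing
backend returns what it took:

#include "postgres.h"
#include "port/atomics.h"

typedef struct ThreadBudget
{
    pg_atomic_uint32 used;              /* threads in use, cluster-wide */
} ThreadBudget;

static ThreadBudget *thread_budget;     /* lives in shared memory */
static int  thread_budget_limit = 8;    /* would be a GUC in practice */

/* Try to reserve up to nwanted threads; returns how many were granted. */
static int
thread_try_acquire(int nwanted)
{
    uint32  old = pg_atomic_fetch_add_u32(&thread_budget->used, nwanted);

    if (old + nwanted > (uint32) thread_budget_limit)
    {
        int     granted = Max(0, thread_budget_limit - (int) old);

        /* give back the part of the claim we couldn't get */
        pg_atomic_fetch_sub_u32(&thread_budget->used, nwanted - granted);
        return granted;
    }
    return nwanted;
}

static void
thread_release(int n)
{
    pg_atomic_fetch_sub_u32(&thread_budget->used, n);
}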

A better solution might be to start a bgworker managing a connection
pool and forward the requests to it using IPC (and enforce the thread
count limit there).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Parallel threads in query

From: Andres Freund
Hi,

On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> In theory, simulating such a global limit should be possible using a bit
> of shared memory for the current total, a per-process counter, and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).

Right.  I don't think you even need something resowner-like, given that
anything using threads had better make it absolutely impossible
that an error can escape.


> A better solution might be to start a bgworker managing a connection
> pool and forward the requests to it using IPC (and enforce the thread
> count limit there).

That doesn't really seem feasible for cases like this - after all, you'd
only use threads to work on individual rows if you wanted to parallelize
relatively granular per-row work or such. Adding cross-process IPC seems
like it'd make that perform badly.

Greetings,

Andres Freund


Re: Parallel threads in query

From: Darafei "Komяpa" Praliaskouski
> In theory, simulating such a global limit should be possible using a bit
> of shared memory for the current total, a per-process counter, and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).

I would expect that this limit is already available, and it's the parallel worker limit. Basically, when starting a new thread, I would like to somehow consume a part of the parallel worker limit - a thread is a kind of parallel worker, from the user's perspective. If I have 4 cores and Postgres has already started 4 parallel workers, I don't really want to start 4 threads for each of them, or 4 for one of them and 1 for each of the rest, which is what can happen if I manage that separately from the parallel worker limit.

IPC and co - that's another question and out of scope for this one. Since OpenMP allows one to write multithreaded code by just adding a #pragma around loops, I don't want to reinvent that part of the infrastructure.
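
As a hedged sketch of how such a loop could be capped by a shared budget,
assuming the hypothetical thread_try_acquire()/thread_release() helpers
sketched earlier in the thread are in scope (pure_c_work, Point2D, and the
arguments are placeholders too):

/* Cap the OpenMP team by whatever share of the global budget we got;
 * with a grant of 0 we still run the loop on the backend's own thread. */
static void
process_points(const Point2D *points, double *out, int npoints)
{
    int     granted = thread_try_acquire(4);    /* returns 0..4 */
    int     nthreads = Max(1, granted);
    int     i;

    #pragma omp parallel for num_threads(nthreads)
    for (i = 0; i < npoints; i++)
        out[i] = pure_c_work(&points[i]);       /* no Postgres calls here */

    thread_release(granted);
}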
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Re: Parallel threads in query

From: Tomas Vondra
On 11/01/2018 07:40 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>> In theory, simulating such a global limit should be possible using a bit
>> of shared memory for the current total, a per-process counter, and probably
>> some simple abort handling (say, just like contrib/openssl does using
>> ResourceOwner).
> 
> Right.  I don't think you even need something resowner-like, given that
> anything using threads had better make it absolutely impossible
> that an error can escape.
> 

True. Still, I wonder if the process can die in a way that would fail to
update the counter.

> 
>> A better solution might be to start a bgworker managing a connection
>> pool and forward the requests to it using IPC (and enforce the thread
>> count limit there).
> 
> That doesn't really seem feasible for cases like this - after all, you'd
> only use threads to work on individual rows if you wanted to parallelize
> relatively granular per-row work or such. Adding cross-process IPC seems
> like it'd make that perform badly.
> 

I think that very much depends on how expensive the tasks handled by the
threads are. It may still be cheaper than a reasonable IPC, and if you
don't create/destroy threads, that also saves quite a bit of time.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Parallel threads in query

From: Tomas Vondra
On 11/01/2018 07:43 PM, Darafei "Komяpa" Praliaskouski wrote:
>> In theory, simulating such a global limit should be possible using a bit
>> of shared memory for the current total, a per-process counter, and probably
>> some simple abort handling (say, just like contrib/openssl does using
>> ResourceOwner).
> 
> 
> I would expect that this limit is already available, and it's the parallel
> worker limit. Basically, when starting a new thread, I would like to somehow
> consume a part of the parallel worker limit - a thread is a kind of parallel
> worker, from the user's perspective. If I have 4 cores and Postgres has
> already started 4 parallel workers, I don't really want to start 4 threads
> for each of them, or 4 for one of them and 1 for each of the rest, which is
> what can happen if I manage that separately from the parallel worker limit.
> 

Well, PostgreSQL does that, but only for the process-based parallelism.
It has no idea about threads, so it can't work out of the box. Also, the
max_worker_processes limit determines the sizing of various shared memory
structures we need to manage those processes, so it's really not about threads.

If you need something like that for threads, feel free to do that, but
I'd strongly suggest using a separate counter (perhaps using m_w_p as
an initial value).
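
As a sketch, the separate counter's limit could be a custom GUC defined by
the extension at load time; "postgis.max_threads" is a made-up name, and
the fallback-to-m_w_p convention is just one possible choice:

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* 0 here means "fall back to max_parallel_workers_per_gather" */
static int  max_extension_threads = 0;

void _PG_init(void);

void
_PG_init(void)
{
    DefineCustomIntVariable("postgis.max_threads",
                            "Maximum extra threads the extension may use.",
                            NULL,
                            &max_extension_threads,
                            0,              /* boot value */
                            0,              /* min */
                            1024,           /* max */
                            PGC_SUSET,
                            0,              /* flags */
                            NULL, NULL, NULL);
}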

> IPC and co - that's another question and out of scope for this one.
> Since OpenMP allows one to write multithreaded code by just adding a
> #pragma around loops, I don't want to reinvent that part of the infrastructure.

Maybe. I don't know OpenMP that well, so I can't really say whether that's a
good idea or not.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Parallel threads in query

From: Andres Freund
Hi,

On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
> On 11/01/2018 07:40 PM, Andres Freund wrote:
> > On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> >> In theory, simulating such a global limit should be possible using a bit
> >> of shared memory for the current total, a per-process counter, and probably
> >> some simple abort handling (say, just like contrib/openssl does using
> >> ResourceOwner).
> > 
> > Right.  I don't think you even need something resowner-like, given that
> > anything using threads had better make it absolutely impossible
> > that an error can escape.
> > 
> 
> True. Still, I wonder if the process can die in a way that would fail to
> update the counter.

You'd better make that case a panic restart.


> >> A better solution might be to start a bgworker managing a connection
> >> pool and forward the requests to it using IPC (and enforce the thread
> >> count limit there).
> > 
> > That doesn't really seem feasible for cases like this - after all, you'd
> > only use threads to work on individual rows if you wanted to parallelize
> > relatively granular per-row work or such. Adding cross-process IPC seems
> > like it'd make that perform badly.
> > 
> 
> I think that very much depends on how expensive the tasks handled by the
> threads are. It may still be cheaper than a reasonable IPC, and if you
> don't create/destroy threads, that also saves quite a bit of time.

I'm not following. How can you have a pool *and* threads? Those seem to
be contradictory in PG's architecture? You need full blown IPC with your
proposal afaict?

Greetings,

Andres Freund


Re: Parallel threads in query

From: Tomas Vondra
On 11/01/2018 07:50 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
>> On 11/01/2018 07:40 PM, Andres Freund wrote:
>>> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>>>> In theory, simulating such a global limit should be possible using a bit
>>>> of shared memory for the current total, a per-process counter, and probably
>>>> some simple abort handling (say, just like contrib/openssl does using
>>>> ResourceOwner).
>>>
>>> Right.  I don't think you even need something resowner-like, given that
>>> anything using threads had better make it absolutely impossible
>>> that an error can escape.
>>>
>>
>> True. Still, I wonder if the process can die in a way that would fail to
>> update the counter.
> 
> You'd better make that case a panic restart.
> 
> 
>>>> A better solution might be to start a bgworker managing a connection
>>>> pool and forward the requests to it using IPC (and enforce the thread
>>>> count limit there).
>>>
>>> That doesn't really seem feasible for cases like this - after all, you'd
>>> only use threads to work on individual rows if you wanted to parallelize
>>> relatively granular per-row work or such. Adding cross-process IPC seems
>>> like it'd make that perform badly.
>>>
>>
>> I think that very much depends on how expensive the tasks handled by the
>> threads are. It may still be cheaper than a reasonable IPC, and if you
>> don't create/destroy threads, that also saves quite a bit of time.
> 
> I'm not following. How can you have a pool *and* threads? Those seem to
> be contradictory in PG's architecture? You need full blown IPC with your
> proposal afaict?
> 

My suggestion was to create a bgworker, which would then internally
allocate and manage a pool of threads. It could then open some sort of
IPC (say, as dumb as a unix socket). The backends could then send
requests to it, and it would respond to them. Not sure why/how this
would contradict PG's architecture?
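
For what it's worth, registering such a manager worker is cheap; a sketch,
with the library and function names made up, assuming the extension is
loaded via shared_preload_libraries:

#include "postgres.h"
#include "postmaster/bgworker.h"

/* called from _PG_init() of a shared_preload_libraries module */
static void
register_pool_manager(void)
{
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 10;       /* restart after 10 s on crash */
    snprintf(worker.bgw_name, BGW_MAXLEN, "thread pool manager");
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "geothreads");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "thread_pool_main");

    RegisterBackgroundWorker(&worker);
}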

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Parallel threads in query

From: Andres Freund
On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
> >> I think that very much depends on how expensive the tasks handled by the
> >> threads are. It may still be cheaper than a reasonable IPC, and if you
> >> don't create/destroy threads, that also saves quite a bit of time.
> > 
> > I'm not following. How can you have a pool *and* threads? Those seem to
> > be contradictory in PG's architecture? You need full blown IPC with your
> > proposal afaict?
> > 
> 
> My suggestion was to create a bgworker, which would then internally
> allocate and manage a pool of threads. It could then open some sort of
> IPC (say, as dumb as a unix socket). The backends could then send
> requests to it, and it would respond to them. Not sure why/how this
> would contradict PG's architecture?

Because you said "faster than reasonable IPC" - which to me implies that
you don't do full blown IPC, which using threads in a bgworker is very
strongly implying. What you're proposing strongly implies multiple
context switches just to process a few results. Even before, but
especially after, Spectre, that's an expensive proposition.

Greetings,

Andres Freund


Re: Parallel threads in query

From: Darafei "Komяpa" Praliaskouski

> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC, which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, Spectre, that's an expensive proposition.


To have some idea of what it could be:

a)
PostGIS has the ST_ClusterKMeans window function. It collects all geometries passed to it into memory, re-packs them into a more compact buffer, and starts a loop that goes over it several (let's say 10..100) times. Then it spits out the assigned cluster number for each of the input rows.

It's all great when you need to calculate KMeans of 200-50000 rows, but for a million input rows even a hundred passes on a single core are painful.

b) 
PostGIS has the ST_Subdivide function. It takes a single row of geometry (usually super-large, like a continent or the whole of Russia) and splits it into many rows that have a simpler shape, by performing horizontal or vertical splits recursively. Since it's a tree traversal, it can be parallelized efficiently, with one task following the right subpart of the geometry and the other following the left part.

Both seem to be standard things for OpenMP, which has compiler support in GCC, clang, and MSVC. For an overview of how it works, have a look here:
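
To make (a) and (b) concrete, here is a rough sketch of the two OpenMP shapes involved; every type and helper below (Point2D, Geom, GeomBuffer, nearest_centroid, split, ...) is a placeholder, not actual PostGIS code:

#include <stdbool.h>
#include <omp.h>

typedef struct { double x, y; } Point2D;
typedef struct Geom Geom;               /* placeholder geometry type */
typedef struct GeomBuffer GeomBuffer;   /* plain C buffer with its own lock */

extern int  nearest_centroid(const Point2D *p, const Point2D *cents, int k);
extern bool simple_enough(const Geom *g);
extern void split(Geom *g, Geom **left, Geom **right);
extern void geom_buffer_append(GeomBuffer *out, Geom *g);

/* (a) KMeans assignment step: each point independently finds its nearest
 * centroid - a textbook parallel for. */
static void
assign_clusters(const Point2D *pts, int npoints,
                const Point2D *centroids, int k, int *cluster)
{
    int     i;

    #pragma omp parallel for
    for (i = 0; i < npoints; i++)
        cluster[i] = nearest_centroid(&pts[i], centroids, k);
}

/* (b) Subdivide: each half of a split becomes an independent task.  The
 * result buffer must not be a palloc'd List - appends have to go through
 * plain, internally synchronized C code. */
static void
subdivide(Geom *g, GeomBuffer *out)
{
    Geom   *left;
    Geom   *right;

    if (simple_enough(g))
    {
        geom_buffer_append(out, g);
        return;
    }
    split(g, &left, &right);

    #pragma omp task
    subdivide(left, out);
    #pragma omp task
    subdivide(right, out);
    #pragma omp taskwait
}

/* Entry point: create the team once, let one thread seed the tasks. */
static void
subdivide_parallel(Geom *g, GeomBuffer *out)
{
    #pragma omp parallel
    #pragma omp single
    subdivide(g, out);
}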

So, do I understand correctly that I need to start a parallel worker that does nothing for each thread I launch, just to consume the parallel worker limit?
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Re: Parallel threads in query

From: Andres Freund
Hi,

On 2018-11-01 09:17:56 -1000, Darafei "Komяpa" Praliaskouski wrote:
> So, do I understand correctly that I need to start a parallel worker that
> does nothing for each thread I launch, just to consume the parallel worker limit?

No, I don't think that'd be reasonable. I think what we're saying is
that there's no way to reasonably use the parallel worker limit as the
limitation for what you're trying to do. You need custom infrastructure.

Greetings,

Andres Freund


Re: Parallel threads in query

From: Tomas Vondra
On 11/01/2018 08:03 PM, Andres Freund wrote:
> On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
>>>> I think that very much depends on how expensive the tasks handled by the
>>>> threads are. It may still be cheaper than a reasonable IPC, and if you
>>>> don't create/destroy threads, that also saves quite a bit of time.
>>>
>>> I'm not following. How can you have a pool *and* threads? Those seem to
>>> be contradictory in PG's architecture? You need full blown IPC with your
>>> proposal afaict?
>>>
>>
>> My suggestion was to create a bgworker, which would then internally
>> allocate and manage a pool of threads. It could then open some sort of
>> IPC (say, as dumb as a unix socket). The backends could then send
>> requests to it, and it would respond to them. Not sure why/how this
>> would contradict PG's architecture?
> 
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC, which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, Spectre, that's an expensive proposition.
> 

Gah! I meant to write "faster with reasonable IPC" - i.e. faster/cheaper
than a solution that would create threads ad-hoc.

My assumption is that the tasks are fairly large and may take quite a
bit of time to process (say, a couple of seconds?), in which case the
extra context switches are not a major issue. But maybe I'm wrong.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services