Discussion: What happens if I create new threads from within a postgresql function?

What happens if I create new threads from within a postgresql function?

From: Seref Arikan
Greetings,
What would happen if I create multiple threads from within a PostgreSQL
function written in C?
I have the opportunity to do parallel processing on binary data, and I need
to create multiple threads to do that.
If I can ensure that all my threads complete their work before I exit my
function, would this cause any trouble?
I am aware of PostgreSQL's single-threaded nature when executing queries,
but is this a limitation for custom multi-threaded code in C-based
functions?
I can't see any problems other than my spawned threads living beyond
my function's execution and memory/resource allocation issues, but if I can
handle them, shouldn't I be safe?

I believe I've seen someone applying a similar principle to use GPUs with
postgresql, and I'm quite interested in giving this a try, unless I'm
missing something.

Best regards
Seref

Re: What happens if I create new threads from within a postgresql function?

From: Bruce Momjian
On Mon, Feb 18, 2013 at 11:10:51AM +0000, Seref Arikan wrote:
> Greetings,
> What would happen if I create multiple threads from within a postgresql
> function written in C?
> I have the opportunity to do parallel processing on binary data, and I need to
> create multiple threads to do that.
> If I can ensure that all my threads complete their work before I exit my
> function, would this cause any trouble ?
> I am aware of postgresql's single threaded nature when executing queries, but
> is this a limitation for custom multi threaded code use in C based functions?
> I can't see any problems other than my custom spawn threads living beyond my
> function's execution and memory/resource allocation issues, but if I can handle
> them, should not I be safe?
>
> I believe I've seen someone applying a similar principle to use GPUs with
> postgresql, and I'm quite interested in giving this a try, unless I'm missing
> something.

I think it would be fine.  I expect to be researching this soon myself:

     http://wiki.postgresql.org/wiki/Parallel_Query_Execution

Let me know how it works out.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: What happens if I create new threads from within a postgresql function?

From: Seref Arikan
Thanks Bruce,
I too think that it should be fine, as long as I make sure that the spawned
threads do not call back to the originating thread and that they are properly
terminated once they're finished.
Various messages I've seen in the list archives mention that
spawning threads is a bad idea. I just could not find a technical
discussion of why this is a bad idea. Maybe I have failed to come up with the
right search terms.
It would be great to know why and when this would be a dangerous thing to
do.

Best regards
Seref




Re: What happens if I create new threads from within a postgresql function?

From: Bruce Momjian
On Mon, Feb 18, 2013 at 02:51:13PM +0000, Seref Arikan wrote:
> Thanks Bruce,
> I too think that it should be fine, as long as I make sure that the spawned
> threads do not call back to originating thread and they are properly terminated
> once they're finished etc.
> Various messages I've seen in the list archives seem to mention that spawning
> threads is a bad idea, etc etc. I just could not find a technical discussion of
> why this is a bad idea. Maybe I have failed to generate the correct search
> terms.
> It would be great to know why and when this would be a dangerous thing  to do.

The problem comes with calling Postgres subsystems from multiple
threads, which it doesn't sound like you are doing.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: What happens if I create new threads from within a postgresql function?

From: Seref Arikan
Thanks,
You are right, that is not what I'm doing. I'm simply calling code that works
on the binary blob using multiple threads.



Re: What happens if I create new threads from within a postgresql function?

From: Merlin Moncure
On Mon, Feb 18, 2013 at 5:10 AM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Greetings,
> What would happen if I create multiple threads from within a postgresql
> function written in C?
> I have the opportunity to do parallel processing on binary data, and I need
> to create multiple threads to do that.
> If I can ensure that all my threads complete their work before I exit my
> function, would this cause any trouble ?
> I am aware of postgresql's single threaded nature when executing queries,
> but is this a limitation for custom multi threaded code use in C based
> functions?
> I can't see any problems other than my custom spawn threads living beyond my
> function's execution and memory/resource allocation issues, but if I can
> handle them, should not I be safe?
>
> I believe I've seen someone applying a similar principle to use GPUs with
> postgresql, and I'm quite interested in giving this a try, unless I'm
> missing something.

Some things immediately jump to mind:
*) Backend library routines are not thread-safe.  Notably the SPI
interface and the memory allocator, but potentially anything.  So
your spawned threads should avoid calling the backend API.  I don't
even know if it's safe to call malloc.

*) Postgres exception handling can burn you, so I'd be stricter than
"before I exit my function".  Really, you need to make sure your threads
terminate before any backend routine that could throw an exception
fires, which is basically all of them, including palloc memory
allocation and interrupt checking.  So understand that:

While your threads are executing, your query can't be cancelled; only
a hard kill will stop it, and that will take the database down.  If
you're OK with that risk, then go for it.  If you're not, then I'd
think about sending the bytea through a protocol to a threaded
processing server running outside of the database.  More work and
slower (protocol overhead), but much more robust.

merlin
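
A minimal sketch of the pattern described above, assuming POSIX threads:
the workers touch only a plain C buffer, and every thread that was started
is joined before control returns to the caller. The names slice_job,
sum_slice and parallel_byte_sum are hypothetical, and summing bytes merely
stands in for the real work on the binary data. Nothing here calls the
backend; in an actual PostgreSQL C function the caller would also have to
avoid any backend routine between starting and joining the workers.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORKERS 16

/* One slice of the input buffer plus a slot for that worker's result. */
typedef struct {
    const uint8_t *data;
    size_t len;
    uint64_t sum;
} slice_job;

/* Worker body: plain C only. No palloc, elog, SPI or other backend calls. */
static void *sum_slice(void *arg)
{
    slice_job *job = arg;
    uint64_t acc = 0;

    for (size_t i = 0; i < job->len; i++)
        acc += job->data[i];
    job->sum = acc;
    return NULL;
}

/*
 * Sum all bytes of buf using nworkers threads.
 * Returns 0 on success, -1 on bad arguments or if a thread could not be
 * created. Either way, every started thread has been joined on return.
 */
static int parallel_byte_sum(const uint8_t *buf, size_t len,
                             int nworkers, uint64_t *out)
{
    pthread_t tids[MAX_WORKERS];
    slice_job jobs[MAX_WORKERS];
    int started = 0;
    int rc = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    size_t chunk = len / (size_t) nworkers;
    for (int i = 0; i < nworkers; i++) {
        size_t off = (size_t) i * chunk;

        jobs[i].data = buf + off;
        /* The last worker also takes the remainder. */
        jobs[i].len = (i == nworkers - 1) ? len - off : chunk;
        jobs[i].sum = 0;
        if (pthread_create(&tids[i], NULL, sum_slice, &jobs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join everything that was started, even on the failure path. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    if (rc != 0)
        return rc;

    uint64_t total = 0;
    for (int i = 0; i < nworkers; i++)
        total += jobs[i].sum;
    *out = total;
    return 0;
}
```

The join loop runs on the error path too, so no worker can outlive the
call. The cancellation caveat still applies: nothing in this sketch checks
for a query cancel while the workers are running.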

Re: What happens if I create new threads from within a postgresql function?

From: Bruce Momjian
On Mon, Feb 18, 2013 at 09:56:22AM -0600, Merlin Moncure wrote:
>
> Some things immediately jump to mind:
> *) backend library routines are not multi-thread safe.  Notably, the
> SPI interface and the memory allocator, but potentially anything.  So
> your spawned threads should avoid calling the backend API.  I don't
> even know if it's safe to call malloc.
>
> *) postgres exception handling can burn you, so I'd be stricter than
> "before I exit my function"...really, you need to make sure threads
> terminate before any potentially exception throwing backend routine
> fires, which is basically all of them including palloc memory
> allocation and interrupt checking.  So, we must understand that:
>
> While your threads are executing, your query can't be cancelled --
> only a hard kill will take the database down.  If you're ok with that
> risk, then go for it.  If you're not, then I'd think about
> sending the bytea through a protocol to a threaded processing
> server running outside of the database.  More work and slower
> (protocol overhead), but much more robust.

You can see the approach of not calling any PG-specific routines from
threads here:

    http://wiki.postgresql.org/wiki/Parallel_Query_Execution#Approaches

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: What happens if I create new threads from within a postgresql function?

From: Atri Sharma

Is there any way to locally synchronise the threads in my code, and send
the requests to the PostgreSQL backend one at a time? Like a waiting queue
in my code?

Regards,

Atri

Re: What happens if I create new threads from within a postgresql function?

From: Bruce Momjian
On Mon, Feb 18, 2013 at 10:33:26PM +0530, Atri Sharma wrote:
> Is there any way to locally synchronise the threads in my code, and
> send the requests to the PostgreSQL backend one at a time? Like a waiting
> queue in my code?

Is this from the client code?  That is easy from libpq using
asynchronous queries.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: What happens if I create new threads from within a postgresql function?

From: Atri Sharma

Actually, I haven't yet faced any such scenario. I was just thinking of all
the possibilities that can happen in this case. Hehehe

If we want to do this from a function in PostgreSQL itself, would a local
synchronisation mechanism work?

Regards,

Atri

Re: What happens if I create new threads from within a postgresql function?

From: Bruce Momjian
On Mon, Feb 18, 2013 at 10:46:39PM +0530, Atri Sharma wrote:
> Actually, I haven't yet faced any such scenario. I was just thinking of all
> the possibilities that can happen in this case. Hehehe
>
> If we want to do this from a function in PostgreSQL itself, would a local
> synchronisation mechanism work?

So your server-side function wants to start a new backend --- yeah, that
works.   /contrib/dblink does exactly that.  Calling it from threads
should have the same limitations you would normally have from libpq.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: What happens if I create new threads from within a postgresql function?

From: Atri Sharma

Got that, thanks a ton!

I will look at the dblink code.

BTW, is there no way to introduce a general synchronisation mechanism for
server-side code? A kind of construct which would be the standard way to
manage synchronisation? I was thinking of something along the lines of a
monitor.

Atri

Re: What happens if I create new threads from within a postgresql function?

From: Bruce Momjian
On Mon, Feb 18, 2013 at 11:25:44PM +0530, Atri Sharma wrote:
> Got that, thanks a ton!
>
> I will see the dblink code.
>
> BTW, is there no way to introduce a general synchronisation mechanism for
> server side code? A kind of construct which would be the standard way to
> manage synchronisation? I was thinking of something on the lines of a monitor.

You would use the standard methods, semaphores for processes, thread
locks for threads.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
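
As a rough illustration of the "thread locks for threads" answer and of the
waiting-queue idea raised earlier in the thread, here is a sketch of a small
bounded queue guarded by a pthread mutex and two condition variables. Several
producer threads push items, and a single consumer (the calling thread) takes
them off one at a time. All names (request_queue, queue_push, queue_pop,
drain_serially) are hypothetical, and the consumer only adds up integers;
what a real consumer would do with each item is left open.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64
#define MAX_PRODUCERS 8

/* Fixed-size FIFO guarded by one mutex and two condition variables. */
typedef struct {
    int items[QUEUE_CAP];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} request_queue;

typedef struct { request_queue *q; int count; } producer_arg;

/* Called by producers: blocks while the queue is full. */
static void queue_push(request_queue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by the single consumer: blocks while the queue is empty. */
static int queue_pop(request_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}

/* Producer body: pushes the values 1..count. */
static void *producer(void *arg)
{
    producer_arg *p = arg;

    for (int i = 1; i <= p->count; i++)
        queue_push(p->q, i);
    return NULL;
}

/*
 * Start nproducers threads, then consume everything they push, one item
 * at a time, on the calling thread. Returns the sum of the consumed
 * items, or -1 on bad arguments.
 */
static long drain_serially(int nproducers, int per_producer)
{
    request_queue q;
    pthread_t tids[MAX_PRODUCERS];
    producer_arg arg;
    int started = 0;
    long total = 0;

    if (nproducers < 1 || nproducers > MAX_PRODUCERS || per_producer < 0)
        return -1;

    q.head = q.count = 0;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);
    arg.q = &q;
    arg.count = per_producer;

    for (int i = 0; i < nproducers; i++) {
        if (pthread_create(&tids[started], NULL, producer, &arg) != 0)
            break;
        started++;
    }

    /* Exactly started * per_producer items will arrive; take them in turn. */
    for (long n = 0; n < (long) started * per_producer; n++)
        total += queue_pop(&q);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return total;
}
```

Because only the calling thread ever pops, whatever it does with each item
happens strictly one at a time, which is the serialisation being asked about.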

Re: What happens if I create new threads from within a postgresql function?

From: Atri Sharma

I will try it out. Thanks a ton!

Regards,

Atri

Re: What happens if I create new threads from within a postgresql function?

From: Seref Arikan
Hi Merlin,
My plan is exactly what you've suggested, sending bytea to an external
server. The networking library I'm using uses threads, and this is where I
am creating threads.



Re: What happens if I create new threads from within a postgresql function?

From: Merlin Moncure
On Mon, Feb 18, 2013 at 1:59 PM, Seref Arikan
<serefarikan@kurumsalteknoloji.com> wrote:
> Hi Merlin,
> My plan is exactly what you've suggested, sending bytea to an external
> server. The networking library I'm using uses threads, and this is where I
> am creating threads.

Well, TBH, I find that odd.  I know some network libraries use threads
instead of a more asynchronous model.  I can't speak to the
particulars because I don't have them, but my gut says the path you're
going down is fraught with danger.  ISTM you're better off doing the
socket connection through standard socket calls.

merlin
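
For what "standard socket calls" could look like, here is a minimal sketch
that ships a binary blob over an already-connected stream socket without any
threads. The framing (a 4-byte big-endian length followed by the payload) is
an assumption made for illustration and is not something specified in this
thread; the helper names write_all, read_all, send_blob and recv_blob are
hypothetical as well.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Read exactly len bytes; fails if the peer closes the connection early. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Send one frame: 4-byte big-endian length, then the payload. */
static int send_blob(int fd, const void *data, uint32_t len)
{
    uint32_t be = htonl(len);

    if (write_all(fd, &be, sizeof be) != 0)
        return -1;
    return write_all(fd, data, len);
}

/* Receive one frame into out (capacity cap); stores its length in *len. */
static int recv_blob(int fd, void *out, uint32_t cap, uint32_t *len)
{
    uint32_t be;

    if (read_all(fd, &be, sizeof be) != 0)
        return -1;
    *len = ntohl(be);
    if (*len > cap)
        return -1;
    return read_all(fd, out, *len);
}
```

The file descriptor is assumed to come from an ordinary connect() to the
external processing server; connection setup, timeouts and SIGPIPE handling
are left out of the sketch.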