Thread: pgAgent crashes on failed connection


pgAgent crashes on failed connection

From:
Heikki Linnakangas
Date:
I created 100 identical pgagent jobs, with one step that simply does
"SELECT pg_sleep(10)". I then forced them all to run immediately, with
"UPDATE pgagent.pga_job SET jobnextrun=now();". pgagent crashed.

What happened is that when all those jobs are launched at the same
time, the server ran into the max_connections limit, and pgagent didn't
handle that too well. The JobThread::JobThread constructor does not
check for a NULL result from DBConn::Get(), and passes a NULL connection
to Job::Job, which tries to dereference it, leading to a segfault.
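
The attached patch itself isn't reproduced in the archive, but the shape of the fix follows from the description above. A minimal self-contained sketch (the classes here are stand-ins for pgAgent's real JobThread, Job and DBConn, not the actual sources):

```cpp
#include <cassert>
#include <cstdio>

// Illustrative stubs for the pgAgent classes named in the report; the real
// definitions live in the pgAgent sources.
struct DBConn {
    // Returns nullptr when the server refuses the connection
    // (e.g. max_connections exhausted) -- the case that crashed pgAgent.
    static DBConn *Get(bool serverUp) { return serverUp ? new DBConn : nullptr; }
};

struct Job {
    explicit Job(DBConn *conn) : m_conn(conn) {}
    DBConn *m_conn;
};

struct JobThread {
    Job *m_job = nullptr;

    explicit JobThread(bool serverUp) {
        DBConn *conn = DBConn::Get(serverUp);
        // The missing guard: bail out instead of handing a NULL connection
        // to Job::Job, whose first dereference of it would segfault.
        if (conn == nullptr) {
            std::fprintf(stderr, "Couldn't get a connection for the job; skipping this run.\n");
            return;
        }
        m_job = new Job(conn);
    }

    bool runnable() const { return m_job != nullptr; }
};
```

Skipping the run rather than aborting presumably leaves the job for the scheduler to pick up again on its next poll.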

I propose the attached patch.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachments

Re: pgAgent crashes on failed connection

From:
Merlin Moncure
Date:
On Thu, Aug 4, 2011 at 2:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I created 100 identical pgagent jobs, with one step that simply does "SELECT
> pg_sleep(10)". I then forced them all to run immediately, with "UPDATE
> pgagent.pga_job SET jobnextrun=now();". pgagent crashed.
>
> What happened is that when all those jobs are launched at the same time,
> the server ran into the max_connections limit, and pgagent didn't handle
> that too well. The JobThread::JobThread constructor does not check for a
> NULL result from DBConn::Get(), and passes a NULL connection to Job::Job,
> which tries to dereference it, leading to a segfault.
>
> I propose the attached patch.

hm, in the event that happens, is that logged in the client somehow?
wouldn't you want to throw an exception or something like that?

merlin

Re: pgAgent crashes on failed connection

From:
Dave Page
Date:
On Thu, Aug 4, 2011 at 8:30 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 2:19 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> [...]
>
> hm, in the event that happens, is that logged in the client somehow?
> wouldn't you want to throw an exception or something like that?

I think the most straightforward way to handle this is to dump an
error into pgagent.pga_joblog when deleting the thread. Might be a
little ugly to pass the original error message back rather than a
generic one though. Can you take a look Heikki?
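
The kind of joblog entry described here could be sketched as follows. The Execute wrapper is a stub so the sketch is self-contained, and the pga_joblog column names (jlgid, jlgjobid, jlgstatus), the sequence name, and the 'i' internal-error status code are assumptions patterned on pgAgent's schema, not taken from this thread:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Stub for pgAgent's connection wrapper: it just records the SQL it is
// asked to run, so the behavior below can be observed.
struct DBConn {
    std::vector<std::string> executed;
    bool Execute(const std::string &sql) {
        executed.push_back(sql);
        return true;
    }
};

// Sketch: when a JobThread is torn down because its own connection could
// not be created, write a failure row to pgagent.pga_joblog through the
// agent's main service connection. Column and sequence names here are
// assumptions based on pgAgent's schema.
void LogConnectionFailure(DBConn *serviceConn, int jobId, const std::string &err)
{
    char sql[512];
    std::snprintf(sql, sizeof(sql),
                  "INSERT INTO pgagent.pga_joblog (jlgid, jlgjobid, jlgstatus) "
                  "VALUES (nextval('pgagent.pga_joblog_jlgid_seq'), %d, 'i')",
                  jobId);
    serviceConn->Execute(sql);
    // Passing the original error text (err) through instead of a generic
    // message is the "little ugly" part mentioned above.
    (void)err;
}
```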

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: pgAgent crashes on failed connection

From:
Heikki Linnakangas
Date:
(sorry for the late reply, this fell through the cracks..)

On 10.08.2011 11:48, Dave Page wrote:
>> [...]
>
> I think the most straightforward way to handle this is to dump an
> error into pgagent.pga_joblog when deleting the thread. Might be a
> little ugly to pass the original error message back rather than a
> generic one though. Can you take a look Heikki?

You mean something like the attached? Works for me, but inserting an
entry in joblog for each failed attempt might create a lot of entries
there, if the problem persists.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachments

Re: pgAgent crashes on failed connection

From:
Merlin Moncure
Date:
On Wed, Sep 28, 2011 at 2:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> (sorry for the late reply, this fell through the cracks..)
>
>> [...]
>
> You mean something like the attached? Works for me, but inserting an entry
> in joblog for each failed attempt might create a lot of entries there, if
> the problem persists.

Is it correct behavior to throw unhandled sql exception if the logging
query fails?

merlin

Re: pgAgent crashes on failed connection

From:
Heikki Linnakangas
Date:
On 28.09.2011 20:24, Merlin Moncure wrote:
>> [...]
>
> Is it correct behavior to throw unhandled sql exception if the logging
> query fails?

Sorry, I didn't understand that. I don't see any exceptions being thrown.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: pgAgent crashes on failed connection

From:
Merlin Moncure
Date:
On Wed, Sep 28, 2011 at 3:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> [...]
>
> Sorry, I didn't understand that. I don't see any exceptions been thrown.

Whoop! Never mind -- I assumed conn->Execute could throw an exception!

merlin

Re: pgAgent crashes on failed connection

From:
Dave Page
Date:
On Wed, Sep 28, 2011 at 8:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> (sorry for the late reply, this fell through the cracks..)
>
>> [...]
>
> You mean something like the attached? Works for me, but inserting an entry
> in joblog for each failed attempt might create a lot of entries there, if
> the problem persists.

Yes, like that :-). Thanks - patch applied.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company