On Tue, Jul 7, 2015 at 12:02 AM, Josh Berkus <josh@agliodbs.com> wrote:
> On 07/06/2015 10:26 AM, Josh Berkus wrote:
>> All,
>>
>> On the same server with the SQL errors (sequences are cleaned up now),
>> we're having pgagent start all jobs in the "r" state, log the first step
>> in each job in the "r" state, and then hang forever (or, at least for
>> three days). At this point, 8 different jobs are in "r" state.
>
> ... as an update, what it appears to be doing is executing the first
> step of each job, successfully. Then, for some reason, it attempts to
> destroy that jobthread after just the first step. I say "attempts"
> because that child backend is still running, and none of pga_job,
> pga_joblog, or pga_jobsteplog get updated.
>
> I've looked in jobs.cpp and pgAgent.cpp, and I can't figure out what
> would cause it to stop after the first step. All of the steps I've
> looked at are regular queries, and can be successfully executed by hand.

Are you able to get a stack trace from a running instance that's hung?
Also, does setting the logging level to debug reveal anything useful
in the filesystem logs? (pgagent -l2 ....)
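
For the stack trace, something along these lines should do it. This is
only a rough sketch: it assumes gdb is installed on that server and that
you can attach to the agent process (usually as root or as the user that
owns it). The helper name dump_stacks is made up for illustration, and
the process name "pgagent" in the example is an assumption about your
install.

```shell
#!/bin/sh
# Sketch only: dump a backtrace of every thread in a running process.
# dump_stacks is a hypothetical helper name, not part of pgAgent.
dump_stacks() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: dump_stacks <pid>" >&2
        return 1
    fi
    # -batch makes gdb run the given command and then exit, which
    # detaches from the process and lets it continue running.
    gdb -p "$pid" -batch -ex "thread apply all bt"
}

# Example (process name "pgagent" is an assumption; adjust if yours differs):
#   for pid in $(pgrep -x pgagent); do
#       dump_stacks "$pid" > "pgagent-bt-$pid.txt" 2>&1
#   done
```

The trace will be much easier to read if debug symbols for pgagent are
available; without them you will mostly see raw addresses.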
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company