Re: Issue with pgAgent jobs forever running state after server restart

Поиск
Список
Период
Сортировка
От Dave Page
Тема Re: Issue with pgAgent jobs forever running state after server restart
Дата
Msg-id CA+OCxoxaOzLnQBKZ=R4P=QcdLB08nNZQ3=dCSYTJoAd4JPPxKw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Issue with pgAgent jobs forever running state after server restart  (Bastiaan Olij <bastiaan@basenlily.me>)
Список pgadmin-support
On Sat, Mar 16, 2013 at 11:25 PM, Bastiaan Olij <bastiaan@basenlily.me> wrote:
> Hi Leo,
>
> On 17/03/13 9:37 AM, Paragon Corporation wrote:
>> This has been an issue for a while but has only become a more common issue
>> for us recently and one we can more easily predict.
>>
>> We are running pgAgent on windows, and whenever we restart the server, if a
>> job is in the middle of a run when we restart, it gets stuck in a forever
>> endless running state.
>>
>> To fix the issue, we have to go into the pgagent.pga_job table and get rid
>> of the jobagentid that is in there for the specific job that is stuck.
>>
>> We are running the pgAgent 3.3.0 that is available via Stack Builder.
>>
> When the job starts pgAgent simply updates the status to running. When
> it finishes it updates the status to either failed or completed.
> When you restart in the middle of a job running that job gets
> interrupted but as pgAgent also shuts down it is not able to change the
> status on the job. When it restarts it has all but forgotten about that
> job that was running.

Actually, that's not what should happen. When an agent first runs, it
records it's backend PID in pga_jobagent. Then, when it starts a job
it sets the pga_job.jobagentid column to it's backend PID as well.
Later, when the agent restarts it:

1) Checks the pga_jobagent table for agent PIDs that are thought to be
running, and compares them with the PIDs listed in pg_stat_activity.
Any that don't existing in pg_stat_activity are considered to be
zombies and are recorded in a temp table.

2) It then updates the job log and job step log for any jobs/steps
that were being executed by a zombie agent, setting the status to 'd'
(oddly, for aborted).

3) It then resets the execution info in pga_job, so the job can
execute again on it's next schedule.

4) Finally it clears the entry from the pga_jobagent table.

As far as I'm aware, this has always worked well. The only caveat
being that it will fail of course, if when the agent restarts there
happens to be a new connection with the same backend PID as was
previously used. That means that the zombie detection can conceivably
fail from time to time, but that should be relatively rare. I'm not
sure there's any easy way round that, though maybe there is something
that could be done with the appname GUC that we have now.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgadmin-support по дате отправления:

Предыдущее
От: Guillaume Lelarge
Дата:
Сообщение: Re: No smallserial data type in version 1.16.1
Следующее
От: "Jan-Peter Seifert"
Дата:
Сообщение: Aw: [pgadmin-support] pgadmin 1.14.0 Ubuntu12.04 "Failed to write to history file!" AND "KATASTROPHE, kann die Logdatei nicht öffnen!"