Обсуждение: Using pgAdmin and pgAgent with Greenplum

Поиск
Список
Период
Сортировка

Using pgAdmin and pgAgent with Greenplum

От
"Roberts, Jon"
Дата:
I am using pgAdmin with Greenplum and generally, it works very well.  It
doesn't show the distribution of tables but that isn't a big deal.

I now need a scheduling solution and pgAgent is the natural choice.  I first
reviewed this and saw that the query that is launched by pgAdmin contains a
correlated subquery which GP doesn't support.  Dave was nice enough to
include a revised (and better performing) SQL statement that works with GP.

Now I move on and actually start a daemon to execute a job.  The job runs
but the status never changes from "Running".

Looking more closely at pgagent.sql (which I only slightly modified to make
it work with GP), I notice my oversight.  There are triggers on three tables
and GP doesn't support triggers.

I then looked at the code and see that in job.cpp, it has embedded SQL
commands which then relies on database triggers.  So short of recompiling
the C++ code, there isn't a way for me to fix it to work with GP.

Ideally, the SQL commands found in job.cpp and pgAgent.cpp would not be
there.  Instead, the C++ code would execute functions.  In other words,
pgAdmin would focus on presentation and put the data logic in the database.

With this solution, it makes it possible to refactor the SQL code for job
scheduling without having to recompile the executable.

This is not only good for GP but also for those DBAs that want to add
columns to the pgagent tables, add SQL hints, add more robust logging, etc
but are unable to do it because the SQL is compiled in C++ rather than in
the database.


Jon

Re: Using pgAdmin and pgAgent with Greenplum

От
Dave Page
Дата:
Roberts, Jon wrote:
> I am using pgAdmin with Greenplum and generally, it works very well.  It
> doesn't show the distribution of tables but that isn't a big deal.
>
> I now need a scheduling solution and pgAgent is the natural choice.  I first
> reviewed this and saw that the query that is launched by pgAdmin contains a
> correlated subquery which GP doesn't support.  Dave was nice enough to
> include a revised (and better performing) SQL statement that works with GP.
>
> Now I move on and actually start a daemon to execute a job.  The job runs
> but the status never changes from "Running".
>
> Looking more closely at pgagent.sql (which I only slightly modified to make
> it work with GP), I notice my oversight.  There are triggers on three tables
> and GP doesn't support triggers.

Urgh. I can see why, but still, urgh :-)

> I then looked at the code and see that in job.cpp, it has embedded SQL
> commands which then relies on database triggers.  So short of recompiling
> the C++ code, there isn't a way for me to fix it to work with GP.
>
> Ideally, the SQL commands found in job.cpp and pgAgent.cpp would not be
> there.  Instead, the C++ code would execute functions.  In other words,
> pgAdmin would focus on presentation and put the data logic in the database.

I think you're mixing up pgAdmin and pgAgent. The code you refer to is
all in pgAgent which doesn't do any presentation. All the pgAgent does
is query the jobs, log the results and cause the trigger to fire to
update the next run date. Yes, that could all be done through a pl/pgsql
API, however that does mean that it's easy to make changes that don't
cause the next run time to be updated (eg. if the admin updates the
tables manually). We could prevent that with appropriate rules, but I
guess GP doesn't support them either?

This mechanism would also make support more difficult as we'd need to
ensure that any time we ask someone to tweak a value that they also
manually make sure the next run date is updated.

In a nutshell, I'm not crazy about such a change unless a good, simple
API can be developed and we can arrange it such that the triggers are
still used for the non-GP case. I'm happy to look at any proposed
patches though.

> With this solution, it makes it possible to refactor the SQL code for job
> scheduling without having to recompile the executable.
>
> This is not only good for GP but also for those DBAs that want to add
> columns to the pgagent tables, add SQL hints, add more robust logging, etc
> but are unable to do it because the SQL is compiled in C++ rather than in
> the database.

We don't want them hacking anything about in there because we won't know
what they've broken when they ask for support.

Regards, Dave.

Re: Using pgAdmin and pgAgent with Greenplum

От
Dave Page
Дата:
Roberts, Jon wrote:
> I've looked at this code all day long and tried many hacks to make it work
> with GP but there isn't a way.
>
> The easiest way I can think of to handle this with the least amount of
> change to the architecture is to add another column to pg_job.  Maybe a
> Boolean called jobcompleted.  Then create a view called vw_pg_job which
> executes the function pga_next_schedule that returns the jobnextrun value.

I'm not sure why you'd need the extra column, but there are some issues
with using a view to replace the nextrun column:

- pgAgent queries each job every minute to see if it needs to run. That
could lead to a lot of cpu being used on that rather nasty pl/pgsql
function.

- When we spec'ed pgAgent, it was decided that if a schedule were
missed, that instance should run immediately on startup. The proposed
change would prevent that behaviour, though personally I'm not sure
that's necessarily a bad thing.

- The 'Run Now' feature in pgAdmin would need to be reworked - perhaps
by adding a flag to pg_job to indicate the job should run once
regardless of scheduling.

The second are more minor issues, but the first doesn't seem at all good
to me.

Regards, Dave.


Re: Using pgAdmin and pgAgent with Greenplum

От
"Roberts, Jon"
Дата:
I found more problems when trying to use GP.  For instance, this update
statement will not work in GP.

UPDATE pgagent.pga_jobsteplog SET jslstatus='d'
WHERE jslid IN SELECT jslid
FROM pga_tmp_zombies z, pgagent.pga_job j, pgagent.pga_joblog l,
pgagent.pga_jobsteplog s
WHERE z.jagpid=j.jobagentid AND j.jobid = l.jlgjobid
AND l.jlgid = s.jsljlgid AND s.jslstatus='r'

This code is found in pgAgent.cpp.

Updates can only happen like this when the updated column is distributed by
the same column as the source.  I created a for loop to get around this for
testing and it took 30 seconds to complete.

So, I think the best solution is to use PostgreSQL as the database server to
hold my jobs.  I was able to use a database link to execute a function in GP
without any problems.  Using a batch step also works but we have to execute
psql with the host, database, and sql command specified in the script for
each step which is error prone.

However, to make pgAgent and pgAdmin work better with my solution, it would
be nice to have an enhancement.  The enhancement could also benefit others
wanting to separate the database server where jobs are maintained from the
target server(s) where sql should be executed.

On the screen where you define a SQL step, add another parameter for Server.
When executing the SQL defined, pgAgent would connect to the Server and then
execute the SQL defined in the database specified in the step.

This solution would require an additional screen to define the servers too.

Is there much demand for allowing a SQL job step to be executed on a remote
server and database?


Jon
> -----Original Message-----
> From: Dave Page [mailto:dpage@postgresql.org]
> Sent: Monday, October 29, 2007 8:53 AM
> To: Roberts, Jon
> Cc: pgadmin-hackers
> Subject: Re: [pgadmin-hackers] Using pgAdmin and pgAgent with Greenplum
>
> Roberts, Jon wrote:
> > I've looked at this code all day long and tried many hacks to make it
> work
> > with GP but there isn't a way.
> >
> > The easiest way I can think of to handle this with the least amount of
> > change to the architecture is to add another column to pg_job.  Maybe a
> > Boolean called jobcompleted.  Then create a view called vw_pg_job which
> > executes the function pga_next_schedule that returns the jobnextrun
> value.
>
> I'm not sure why you'd need the extra column, but there are some issues
> with using a view to replace the nextrun column:
>
> - pgAgent queries each job every minute to see if it needs to run. That
> could lead to a lot of cpu being used on that rather nasty pl/pgsql
> function.
>
> - When we spec'ed pgAgent, it was decided that if a schedule were
> missed, that instance should run immediately on startup. The proposed
> change would prevent that behaviour, though personally I'm not sure
> that's necessarily a bad thing.
>
> - The 'Run Now' feature in pgAdmin would need to be reworked - perhaps
> by adding a flag to pg_job to indicate the job should run once
> regardless of scheduling.
>
> The second are more minor issues, but the first doesn't seem at all good
> to me.
>
> Regards, Dave.

Re: Using pgAdmin and pgAgent with Greenplum

От
Dave Page
Дата:
Roberts, Jon wrote:
> I found more problems when trying to use GP.  For instance, this update
> statement will not work in GP.
>
> UPDATE pgagent.pga_jobsteplog SET jslstatus='d'
> WHERE jslid IN SELECT jslid
> FROM pga_tmp_zombies z, pgagent.pga_job j, pgagent.pga_joblog l,
> pgagent.pga_jobsteplog s
> WHERE z.jagpid=j.jobagentid AND j.jobid = l.jlgjobid
> AND l.jlgid = s.jsljlgid AND s.jslstatus='r'
>
> This code is found in pgAgent.cpp.
>
> Updates can only happen like this when the updated column is distributed by
> the same column as the source.  I created a for loop to get around this for
> testing and it took 30 seconds to complete.

Urgh.

> So, I think the best solution is to use PostgreSQL as the database server to
> hold my jobs.  I was able to use a database link to execute a function in GP
> without any problems.  Using a batch step also works but we have to execute
> psql with the host, database, and sql command specified in the script for
> each step which is error prone.

Yeah.

> However, to make pgAgent and pgAdmin work better with my solution, it would
> be nice to have an enhancement.  The enhancement could also benefit others
> wanting to separate the database server where jobs are maintained from the
> target server(s) where sql should be executed.
>
> On the screen where you define a SQL step, add another parameter for Server.
> When executing the SQL defined, pgAgent would connect to the Server and then
> execute the SQL defined in the database specified in the step.
>
> This solution would require an additional screen to define the servers too.
>
> Is there much demand for allowing a SQL job step to be executed on a remote
> server and database?

I've never heard anyone ask before, but I don't think it's an
unreasonable request. It would be more simple to just add an additional
textbox to the SQL Step to allow a custom connection string to be
specified. If blank it works as now, otherwise it just uses whats there.

Anyone wanna pickup this mod?

/D

Re: Using pgAdmin and pgAgent with Greenplum

От
"Roberts, Jon"
Дата:
>
> > However, to make pgAgent and pgAdmin work better with my solution, it
> would
> > be nice to have an enhancement.  The enhancement could also benefit
> others
> > wanting to separate the database server where jobs are maintained from
> the
> > target server(s) where sql should be executed.
> >
> > On the screen where you define a SQL step, add another parameter for
> Server.
> > When executing the SQL defined, pgAgent would connect to the Server and
> then
> > execute the SQL defined in the database specified in the step.
> >
> > This solution would require an additional screen to define the servers
> too.
> >
> > Is there much demand for allowing a SQL job step to be executed on a
> remote
> > server and database?
>
> I've never heard anyone ask before, but I don't think it's an
> unreasonable request. It would be more simple to just add an additional
> textbox to the SQL Step to allow a custom connection string to be
> specified. If blank it works as now, otherwise it just uses whats there.
>
> Anyone wanna pickup this mod?
>
> /D

There are three problems with having a custom connection string for each
step.  1) If I need to update the connection like change the password, I
have to do it for each step.  2) Developers will see the password in clear
text.  3) It is error prone because I might mistype the connection
information for a step.

I am glad that you are interested in adding this mod and I'm glad that
PostgreSQL works so well for this.  :)


Jon

Re: Using pgAdmin and pgAgent with Greenplum

От
Dave Page
Дата:
Roberts, Jon wrote:
> There are three problems with having a custom connection string for each
> step.  1) If I need to update the connection like change the password, I
> have to do it for each step.  2) Developers will see the password in clear
> text.

I consider those irrelevant because plastered all over the docs will be
something like "Don't use passwords here, put them in pgpass files
instead". Besides, #2 would be the case even if you defined the servers
in a separate table.

> 3) It is error prone because I might mistype the connection
> information for a step.

Copy 'n' paste is your friend :-). But yes, that could be an issue. Not
sure if it's worth the hassle of a whole extra UI for maintaining server
definitions though - I'd guess not to be honest.

Regards, Dave

Re: Using pgAdmin and pgAgent with Greenplum

От
"Roberts, Jon"
Дата:
> Roberts, Jon wrote:
> > There are three problems with having a custom connection string for each
> > step.  1) If I need to update the connection like change the password, I
> > have to do it for each step.  2) Developers will see the password in
> clear
> > text.
>
> I consider those irrelevant because plastered all over the docs will be
> something like "Don't use passwords here, put them in pgpass files
> instead". Besides, #2 would be the case even if you defined the servers
> in a separate table.
>

I was unaware of pgpass.  So you are saying I would create/edit the file to
have hostname:port:database:username:password and then in the step, I would
just reference the hostname?  If so, I completely agree with you and that
would be awesome!

I'm guessing I can use pgpass with dblink too?



Jon

Re: Using pgAdmin and pgAgent with Greenplum

От
Dave Page
Дата:
Roberts, Jon wrote:
>> Roberts, Jon wrote:
>>> There are three problems with having a custom connection string for each
>>> step.  1) If I need to update the connection like change the password, I
>>> have to do it for each step.  2) Developers will see the password in
>> clear
>>> text.
>> I consider those irrelevant because plastered all over the docs will be
>> something like "Don't use passwords here, put them in pgpass files
>> instead". Besides, #2 would be the case even if you defined the servers
>> in a separate table.
>>
>
> I was unaware of pgpass.  So you are saying I would create/edit the file to
> have hostname:port:database:username:password and then in the step, I would
> just reference the hostname?  If so, I completely agree with you and that
> would be awesome!

You need to specify any non-default values... but then you normally
should be doing that for the pgAgent daemon/service as well (unless you
use ident).

> I'm guessing I can use pgpass with dblink too?

Any libpq based application. It's also where pgAdmin stores passwords
you opt to save.

Regards, Dave