Re: proposal: contrib module - generic command scheduler

From Pavel Stehule
Subject Re: proposal: contrib module - generic command scheduler
Date
Msg-id CAFj8pRDumbF8RyR=Rb4Px4PeML-fvYprb43sXGi=hqkixOVqqQ@mail.gmail.com
In response to Re: proposal: contrib module - generic command scheduler  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: proposal: contrib module - generic command scheduler  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers


2015-05-13 4:08 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:


On 13 May 2015 at 00:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski <depesz@depesz.com>:
On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
> create type scheduled_time as (second int[], minute int[], hour int[], dow
> int[], month int[]);
>  (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>  (,"{5}",,,) .. run once per hour
> Comments, notices?

First, please note that I'm definitely not a hacker, just a user.

One comment that I'd like to make, is that since we're at planning
phase, I think it would be great to add capability to limit number of
executions of given command.
This would allow running things like "at" in unix - run once, at given
time, and that's it.

I would rather not store state at this level, so "at" should be implemented at a higher level. There is a very large number of possible strategies for handling failed tasks, and I would rather not open that topic. I believe that with the proposed scheduler anybody can simply implement what they need in PL/pgSQL with dynamic SQL. On the other hand, "run once" can be implemented with the proposed API too.

That seems reasonable in a v1, so long as there's room to easily extend it without pain to add "at"-like one-shot commands, at-startup commands, etc.

I'd prefer to see a scheduling interface that's a close match for cron's or that leaves room for it - so things like "*/5" for every five minutes, ranges like "Mon-Fri", etc. If there's a way to express similar capabilities more cleanly using PostgreSQL's types and conventions that makes sense, but I'm not sure a composite type of arrays fits that.

I thought about it too, but the parser for this cron time format would probably be longer than all the other code. I see a possibility to write constructors that simplify creating a value of this type, something like:

make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);
 
There are two possible ways: a composite of arrays or a custom composite. I'll decide later.
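As a rough illustration of how small such a constructor's parser could be, here is a minimal Python sketch that expands one cron-style field into the integer array the composite type would store. The name expand_field, the DOW_NAMES table, and the supported syntax (lists, ranges, "*", and "/step") are assumptions made for this example and are not part of the proposal; day-of-week numbering follows cron, with 0 = Sunday.

```python
# Hypothetical helper, not part of the proposed extension.
DOW_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}

def expand_field(spec, low, high, names=None):
    """Expand one cron-style field into a sorted list of integers in [low, high]."""
    names = names or {}

    def to_int(token):
        token = token.strip().lower()
        return names[token] if token in names else int(token)

    values = set()
    for part in spec.split(","):
        # Optional "/step" suffix, as in "*/5" or "10-50/10".
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"step must be positive in {part!r}")
        if rng.strip() == "*":
            start, end = low, high
        elif "-" in rng:
            first, _, last = rng.partition("-")
            start, end = to_int(first), to_int(last)
        else:
            start = end = to_int(rng)
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}..{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

With this, expand_field("*/5", 0, 59) would produce the seconds array and expand_field("Mon-Fri", 0, 6, DOW_NAMES) the day-of-week array; a make_scheduled_time constructor would only need to call it once per field.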

The basic points are:

1. It doesn't hold state or the results of commands.
2. It executes a task once, immediately, within the related time window (from one start to the next start), as soon as the necessary worker is available.
3. When a command fails, it only writes information to the log.
4. When a command runs too long (over a specified timeout), it is killed.
5. When a command has to wait for a free worker, it writes to the log.
6. When a command was not executed due to missing workers (and max_workers > 0), it writes to the log.
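The time window in point 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the proposed implementation: a schedule is a dict of lists keyed by "minute", "hour", "dow" and "month", a missing or None field matches any value, the seconds field is ignored (minute resolution only), and the names matches, next_start and run_window are hypothetical.

```python
from datetime import datetime, timedelta

def matches(sched, t):
    """True if datetime t (at minute resolution) satisfies every non-None field."""
    checks = (
        ("minute", t.minute),
        ("hour", t.hour),
        ("dow", t.isoweekday() % 7),  # 0 = Sunday, as in cron
        ("month", t.month),
    )
    return all(sched.get(name) is None or value in sched[name]
               for name, value in checks)

def next_start(sched, after, horizon_minutes=366 * 24 * 60):
    """First matching minute strictly after `after`, or None within the horizon."""
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(horizon_minutes):
        if matches(sched, t):
            return t
        t += timedelta(minutes=1)
    return None

def run_window(sched, start):
    """Window in which a task due at `start` may still be launched (point 2)."""
    return start, next_start(sched, start)
```

A task that becomes due at the first element of the window and finds no free worker would, under points 5 and 6, be logged as waiting, and logged as skipped once the second element of the window is reached.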


How do you plan to manage the bgworkers?

I am thinking about one static supervisor that holds a calendar in shared memory and starts dynamic bgworkers for commands per database. The scheduler is enabled in all databases where the proposed extension is installed.

For the prototype I am planning to use SPI, but maybe it is not necessary, so that commands like VACUUM, CREATE DATABASE, and DROP DATABASE can be supported too. I haven't tested that yet, and I don't know whether it is possible. It can define new hooks too, so other extensions can be based on it.
 


In BDR, where we have a similar need to have workers across multiple databases, and where each database contains a list of workers to launch, we have:

* A single static "supervisor" bgworker. In 9.5 this will connect with InvalidOid as the target database so it can only access shared catalogs. In 9.4 this isn't possible in the bgworker API so we have to connect to a dummy database.

* A dynamic background worker for each database in which BDR is enabled, which is launched from the supervisor. We check which DBs are BDR-enabled by (ab)using database security labels and checking pg_shseclabel from the supervisor worker so we only launch bgworkers on BDR-enabled DBs.

* A dynamic background worker for each peer node, launched by the per-database worker based on the contents of that database's bdr.bdr_connections table.


What I suspect you're going to want is:

* A static worker launched by your extension when it starts, which launches per-db workers for each DB in which the scheduler is enabled. You could use a GUC listing scheduler-enabled DBs in postgresql.conf and have an on-reload hook to update it; you don't need to do the security label hack.

* A DB scheduler worker, which looks up the scheduled tasks list, finds the next scheduled event, and sleeps on a long latch timeout until then, resetting it when interrupted. When it reaches the scheduled event it would launch a one-shot BGW_NO_RESTART worker to run the desired PL/pgSQL procedure over the SPI.

* A task runner worker, which gets launched by the db scheduler to actually run a task using the SPI.
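The control flow of the per-database scheduler worker described above can be simulated in a few lines. The Python sketch below is only a model of that loop, not bgworker code: run_scheduler, launch and sleep_until are hypothetical names, launch stands in for registering a one-shot task runner, sleep_until stands in for a latch wait with a timeout (it returns the current time and may return early), and tasks are assumed to recur at a fixed interval in seconds.

```python
import heapq

def run_scheduler(tasks, now, stop, launch, sleep_until):
    """Model of the scheduler loop.

    tasks: mapping name -> (first_due, interval), all times in seconds.
    Returns the simulated time at which the loop stopped.
    """
    queue = [(first_due, name, interval)
             for name, (first_due, interval) in tasks.items()]
    heapq.heapify(queue)
    while queue and queue[0][0] < stop:
        due, name, interval = queue[0]
        if now < due:
            # Stand-in for a latch wait with a timeout; it may return early
            # (for example after a reload), so recheck the queue afterwards.
            now = sleep_until(due)
            continue
        heapq.heappop(queue)
        launch(name, due)  # stand-in for starting a one-shot task runner
        heapq.heappush(queue, (due + interval, name, interval))
    return now
```

Keeping the pending events in a min-heap means the worker only ever needs to look at the earliest one to decide how long to sleep, which matches the "find the next scheduled event and sleep until then" description.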


Does that match your current thinking?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
