Discussion: background triggers?


background triggers?

From
Sim Zacks
Date:
Is there any way to write a statement trigger that runs in the
background? In my testing, when I have an After Insert For Each
Statement trigger, the function call does not end until the trigger is
finished processing.

What I would like to do, and please let me know if there is a better way
to do this, is to have an after-event statement trigger run in a
separate process, so that the function call returns as soon as the
statement finishes and the trigger gets executed on its own time.

The use that I have for this at the moment, and I can think of many
other uses, is that I want to populate a statistics table each time that
a table is updated. But the code to populate the table takes 10 seconds
to run. I don't want the user to have to wait 10 seconds to add a record.

Re: background triggers?

From
Terry Lee Tucker
Date:
On Tuesday 23 May 2006 11:19 am, Sim Zacks <sim@compulab.co.il> thus
communicated:
--> Is there any way to write a statement trigger that runs in the
--> background? In my testing, when I have an After Insert For Each
--> Statement trigger, the function call does not end until the trigger is
--> finished processing.
-->
--> What I would like to do, and please let me know if there is a better way
--> to do this, is to have an after-event statement trigger run in a
--> separate process, so that the function call returns as soon as the
--> statement finishes and the trigger gets executed on its own time.
-->
--> The use that I have for this at the moment, and I can think of many
--> other uses, is that I want to populate a statistics table each time that
--> a table is updated. But the code to populate the table takes 10 seconds
--> to run. I don't want the user to have to wait 10 seconds to add a record.

How about notification processing? See Asynchronous Notification in the
docs. You can have your trigger send the notification, and you can
execute the statistics function when the notification arrives - you
could even do it in the background.

Re: background triggers?

From
Chris Browne
Date:
Sim Zacks <sim@compulab.co.il> writes:
> Is there any way to write a statement trigger that runs in the
> background? In my testing, when I have an After Insert For Each
> Statement trigger, the function call does not end until the trigger is
> finished processing.
>
> What I would like to do, and please let me know if there is a better
> way to do this, is to have an after-event statement trigger run in a
> separate process, so that the function call returns as soon as the
> statement finishes and the trigger gets executed on its own time.
>
> The use that I have for this at the moment, and I can think of many
> other uses, is that I want to populate a statistics table each time
> that a table is updated. But the code to populate the table takes 10
> seconds to run. I don't want the user to have to wait 10 seconds to
> add a record.

This seems a case for using NOTIFY/LISTEN.

- You have a process connected to the database that runs LISTEN,
  causing it to listen for a particular message.

  LISTEN regen_statistics;

- Your trigger submits a notification:

  NOTIFY regen_statistics;

The notification doesn't get committed to the internal table
pg_catalog.pg_listener until the transaction doing the writing
completes its COMMIT, so the listener won't "hear" anything until
then...
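
For concreteness, a minimal sketch of the trigger side (the table name
foo and the function name are illustrative, not from the original post):

  CREATE OR REPLACE FUNCTION queue_stats_regen() RETURNS trigger AS $$
  BEGIN
      NOTIFY regen_statistics;  -- queued now, delivered at COMMIT
      RETURN NULL;              -- return value ignored for AFTER triggers
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER foo_stats_notify
      AFTER INSERT ON foo
      FOR EACH STATEMENT EXECUTE PROCEDURE queue_stats_regen();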

Per the docs:

  The method a frontend application must use to detect notify events
  depends on which Postgres application programming interface it
  uses. With the basic libpq library, the application issues LISTEN as
  an ordinary SQL command, and then must periodically call the routine
  PQnotifies to find out whether any notify events have been
  received. Other interfaces such as libpgtcl provide higher-level
  methods for handling notify events; indeed, with libpgtcl the
  application programmer should not even issue LISTEN or UNLISTEN
  directly. See the documentation for the library you are using for
  more details.
--
select 'cbbrowne' || '@' || 'ntlug.org';
http://cbbrowne.com/info/multiplexor.html
"I am a bomb technician. If you see me running, try to keep up..."

Re: background triggers?

From
Rafal Pietrak
Date:
On Tue, 2006-05-23 at 15:56 +0000, Chris Browne wrote:
> > The use that I have for this at the moment, and I can think of many
> > other uses, is that I want to populate a statistics table each time
> > that a table is updated. But the code to populate the table takes 10
> > seconds to run. I don't want the user to have to wait 10 seconds to
> > add a record.
>
> This seems a case for using NOTIFY/LISTEN.
>
> - You have a process connected to the database that runs LISTEN,
>   causing it to listen for a particular message.
>
>   LISTEN regen_statistics;
>
> - Your trigger submits a notification:
>
>   NOTIFY regen_statistics;

Some time ago I ran into quite a similar problem. But LISTEN/NOTIFY was
not a solution.

In that case I needed to *disconnect* and never bother about the outcome
of a long-running background trigger.

So if I may re-phrase the question: "is there a way to have a trigger
that, when launched, can check if it's already running in the background
for some other INSERT, return immediately if so, but move into the
background for a long-lasting job if not?"

The functionality required is a "lazy" update of aggregate tables, which
need not be 100% accurate, but should be kept in sync whenever possible.

-R


Re: background triggers?

From
Kenneth Downs
Date:
Rafal Pietrak wrote:

>So if I may re-phrase the question: "is there a way to have a trigger
>that, when launched, can check if it's already running in the background
>for some other INSERT, return immediately if so, but move into the
>background for a long-lasting job if not?"
>
>
>
Rafal, I'm wondering why you want to do this.  You may be "fighting the
framework".

If you are trying to do something that is totally unsupported, it is
probably unsupported for a pretty good reason, usually to do with
security or data loss.  You can probably get what you want by supported
methods, but it may require looking at the problem in a different way.

What is it you are trying to accomplish?  Is it just performance?


Re: background triggers?

From
"Dawid Kuroczko"
Date:
On 5/23/06, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> On Tue, 2006-05-23 at 15:56 +0000, Chris Browne wrote:
> > > The use that I have for this at the moment, and I can think of many
> > > other uses, is that I want to populate a statistics table each time
> > > that a table is updated. But the code to populate the table takes 10
> > > seconds to run. I don't want the user to have to wait 10 seconds to
> > > add a record.
> >
> > This seems a case for using NOTIFY/LISTEN.
> >
> > - You have a process connected to the database that runs LISTEN,
> >   causing it to listen for a particular message.
> >
> >   LISTEN regen_statistics;
> >
> > - Your trigger submits a notification:
> >
> >   NOTIFY regen_statistics;
>
> Some time ago I ran into quite a similar problem. But LISTEN/NOTIFY was
> not a solution.
>
> In that case I needed to *disconnect* and never bother about the outcome
> of a long-running background trigger.

The idea is that you *disconnect* and have a daemon running on the
server side, which will handle LISTEN efficiently.  The daemon can be
quickly written in perl, and it can use the select(2) call to wait for
incoming notifies.  Fast, efficient and powerful.
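
For the record, a rough sketch of such a daemon using DBD::Pg (the
connection parameters and the regen_statistics() function are made-up
placeholders):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use DBI;
  use IO::Select;

  my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'me', '',
                         { RaiseError => 1, AutoCommit => 1 });
  $dbh->do('LISTEN regen_statistics');

  # select(2) on the connection's socket instead of busy-polling:
  my $sel = IO::Select->new($dbh->{pg_socket});
  while (1) {
      $sel->can_read;                          # block until data arrives
      while (my $n = $dbh->func('pg_notifies')) {
          # $n is [ channel, backend pid ]; rebuild the statistics now.
          $dbh->do('SELECT regen_statistics()');
      }
  }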

> So if I may re-phrase the question: "is there a way to have a trigger
> that, when launched, can check if it's already running in the background
> for some other INSERT, return immediately if so, but move into the
> background for a long-lasting job if not?"

It is also possible.  You probably want to use lock checking in your
trigger (I'm writing from memory, so forgive me syntax errors, if any).
A trigger might be, say:

CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS $$
    DECLARE
        k integer;  -- assumes an integer key column
    BEGIN
        -- This either locks the stats row, or raises lock_not_available
        -- immediately if another transaction already holds the lock:
        SELECT key INTO k FROM foo_stats
            WHERE key = NEW.key FOR UPDATE NOWAIT;
        UPDATE foo_stats
            SET count = (SELECT count(*) FROM foo WHERE key = NEW.key)
            WHERE key = NEW.key;
        RETURN NEW;
    EXCEPTION
        WHEN lock_not_available THEN RETURN NEW;
    END;
$$ LANGUAGE plpgsql;

Most likely there are better ways to accomplish your goal.

   Regards,
      Dawid

Re: background triggers?

From
"Florian G. Pflug"
Date:
Sim Zacks wrote:
> Is there any way to write a statement trigger that runs in the
> background? In my testing, when I have an After Insert For Each
> Statement trigger, the function call does not end until the trigger is
> finished processing.
>
> What I would like to do, and please let me know if there is a better way
> to do this, is to have an after-event statement trigger run in a
> separate process, so that the function call returns as soon as the
> statement finishes and the trigger gets executed on its own time.
Create some queue table, and let the after-statement trigger insert a
record into the queue.

Some daemon, or a cron-job, can then periodically check the queue, and
perform the action you want to do asynchronously.

If you don't want to use polling to find new queue entries, you can use
notify/listen to inform the daemon of new queue entries.
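
A minimal sketch of that scheme (all names are illustrative):

  CREATE TABLE stats_queue (
      id        serial PRIMARY KEY,
      requested timestamp NOT NULL DEFAULT now()
  );

  CREATE OR REPLACE FUNCTION enqueue_stats() RETURNS trigger AS $$
  BEGIN
      INSERT INTO stats_queue DEFAULT VALUES;
      NOTIFY stats_queue;  -- optional: wakes a listening daemon at commit
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER foo_enqueue
      AFTER INSERT ON foo
      FOR EACH STATEMENT EXECUTE PROCEDURE enqueue_stats();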

greetings, Florian Pflug


Re: background triggers?

From
"Florian G. Pflug"
Date:
Rafal Pietrak wrote:
> On Tue, 2006-05-23 at 15:56 +0000, Chris Browne wrote:
>>> The use that I have for this at the moment, and I can think of many
>>> other uses, is that I want to populate a statistics table each time
>>> that a table is updated. But the code to populate the table takes 10
>>> seconds to run. I don't want the user to have to wait 10 seconds to
>>> add a record.
>> This seems a case for using NOTIFY/LISTEN.
>>
>> - You have a process connected to the database that runs LISTEN,
>>   causing it to listen for a particular message.
>>
>>   LISTEN regen_statistics;
>>
>> - Your trigger submits a notification:
>>
>>   NOTIFY regen_statistics;
>
> Some time ago I ran into quite a similar problem. But LISTEN/NOTIFY was
> not a solution.
>
> In that case I needed to *disconnect* and never bother about the outcome
> of a long-running background trigger.
You can "disconnect" after doing NOTIFY, as long as you commit the
transaction you called NOTIFY in. Otherwise your notification gets rolled
back, just like your other database updates. But since the updates never
happened in that case, your aggregate table won't need rebuilding then
either...

> So if I may re-phrase the question: "is there a way to have a trigger
> that, when launched, can check if it's already running in the background
> for some other INSERT, return immediately if so, but move into the
> background for a long-lasting job if not?"
Not easily, because according to the transaction isolation rules, one
transaction has no way of finding out what another concurrent
transaction might be doing.

But your background daemon can check for this. Before starting an
"aggregate run", it'd need to store the currently-visible transaction
ids. If a new queue entry is created while the first job is still
running, it either starts a new job directly after the first one
finishes (if the transaction that created the queue entry wasn't visible
at the time the first job was started), or just deletes the new queue
entry (if the previous run already saw the changes introduced by the
transaction that created the queue entry).

> The functionality required is a "lazy" update of aggregate tables, which
> need not be 100% accurate, but should be kept in sync whenever possible.
Why isn't the solution outlined above sufficient?

greetings, Florian Pflug



Re: background triggers?

From
Rafal Pietrak
Date:
On Tue, 2006-05-23 at 13:01 -0400, Kenneth Downs wrote:
> Rafal Pietrak wrote:
> >for some other INSERT, return immediately if so, but move into the
> >background for a long-lasting job if not?"
> >
> >
> >
> Rafal, I'm wondering why you want to do this.  You may be "fighting the
> framework".

Yes, most probably. I'm afraid of that :(

Still...

> If you are trying to do something that is totally unsupported, it is
> probably unsupported for a pretty good reason, usually to do with
> security or data loss.  You can probably get what you want by supported
> methods, but it may require looking at the problem in a different way.
>
> What is it you are trying to accomplish?  Is it just performance?


OK, here is a 'real life' example. It works much like a post office. Now
and then a 'delivery man' arrives with a bunch of post to deliver. The
post office takes the batch, checks and stamps each and every item, and
hands over a receipt. But the actual binning (into delivery channels)
and routing (dispatch) is handled without the delivery man standing and
waiting for the process to end.

In my *real*life* case I have a file with hundreds of thousands of
'transactions' uploaded by an operator (several times a day, and in
fact by 10-30 operators) - those are checked at INSERT time. But after
that check and INSERT, I'd like to say to the operator: "OK, your job is
done, don't worry about the rest".

But there is more work to do with the batch.

I have to 'route the transactions' to their relevant accounts, see how
those change the 'status' of those accounts, and consequently store the
updated status within the account itself. This is tedious and time
consuming. But it has to be done, because 'customers' query account
status for that 'aggregate status information', and it would be *very*
heavy for the database if those queries required browsing the entire
'transaction log'. The ratio of 'transactions' to accounts is like
millions to thousands.

A plain INSERT of a batch takes 5-10 minutes on desktop postgresql (800MHz
machine, ATA disks). When I attach a trigger (a *very* simple function) to
update the accounts, the INSERT takes hours (2-4). But when I make just
one single update of all accounts at the end of the batch insert, it
takes 20-30 min.

So my solution was 1) to have an additional table "TABLE dirty_log(time
timestamp, who text)", into which a row gets inserted *after* a batch of
INSERTs; and 2) a 'server side process', run every 5-10 minutes, which
updates the accounts, and which: A) does NOT launch when another such
process is currently running; B) purges the DIRTY_LOG table after it's
done. This is quite obviously a 'user space implementation' of
'background triggers'. I needed that.
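
For illustration, the periodic job could be a single psql script along
these lines (update_all_accounts() is a hypothetical stand-in for the
real account-updating procedure):

  BEGIN;
  -- Point A: fail fast if another housekeeping run holds the table.
  LOCK TABLE dirty_log IN EXCLUSIVE MODE NOWAIT;
  -- Run the expensive update only if a batch was logged since last time:
  SELECT update_all_accounts()
      WHERE EXISTS (SELECT 1 FROM dirty_log);
  -- Point B: purge the log once the update is done.
  DELETE FROM dirty_log;
  COMMIT;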

Naturally, having this 'background trigger' I lose accuracy of the
account information. But I gain on system efficiency - this is an
engineering decision. When we have to make decisions like that (like the
implementation above), it's good to have 'system tools' (like
'background triggers') that support us.

But of course I may be wrong altogether. I'd really like to know the
technique, 'along the lines' of RDBMS design, which serves that same
purpose.

Is there a better solution?

NB: the 'batch INSERT' I mentioned above is done by a www server. It's
quite vital to have the server process terminate (meaning: not keep it
waiting for a LISTEN/NOTIFY event - meaning: not use LISTEN/NOTIFY), as
apache will keep the connection open until the process ends.

In 'real life', this scenario is also applicable to a 'service network':
1. Say you have a network of 100,000 - 1,000,000 vending machines (VMs).
2. Each is loaded with ca. 100 item types (300 types in the entire
network).
3. Each VM dispenses an item every 2-3 minutes, which puts the overall
'transaction traffic' at a level of hundreds per second.
4. Assume that for 'customer management' you need to store quite a bit
of data with each item-dispense 'transaction'. Meaning: transactions are
not very light, and their details have to be kept for a long time.
5. Obviously, you need to manage your stock (each of the 300 item
types): you keep the VMs loaded and keep some stock at a central store.

(ATMs are a good example of such a network)

So:
1. 'Transaction traffic' is so significant that you really have to
'optimise for that'.
2. You don't really have to know *exactly* when you run out of stock,
because each VM has a significant local item store; so if you get
notified that a particular VM is getting close to the bottom with a
particular item, you may dispatch a reload in 10 min, but it's also OK
to dispatch it in 2 hours - meaning the 'account information' does not
have to be 'immediately accurate'. Far more important is 'dispense
transaction' performance.
3. Normally you 'keep an eye' on your VM network - meaning you issue a
'statistics' query quite frequently. If that were a 'heavy query' it
would degrade your database performance quite significantly - we really
need the 'aggregate information' stored within 'item-accounts'.

Is there a clean, 'along the framework' design that serves this reality?

-R


Re: background triggers?

From
Sim Zacks
Date:
My understanding of Listen/Notify is that it is a completely
disconnected process running on the database server.
It should be run as a daemon (or a service on Windows). The trigger on
the table calls the NOTIFY SQL command and then the trigger, and thereby
the statement, ends.
The notify then tells this daemon application to wake up and start
processing.
To make sure that a second process does not start while the first
process is running, you should have a running-processes table which gets
a row inserted when a process starts and updated when it ends. That way
your process can check whether one is currently running or not.

Another way of doing it is to have a cron job check every X minutes for
records in the table. When they are there, it should run the process. It
can also have a flag that says don't run another process until this one
is finished.

I may not have understood exactly what you are trying to do, but from
what I understood, this will solve your problem.

Sim


Re: background triggers?

From
Rafal Pietrak
Date:
On Wed, 2006-05-24 at 13:36 +0200, Sim Zacks wrote:
> My understanding of Listen/Notify is that it is a completely
> disconnected process running on the database server.

Yes. But in my particular case (and I presume the intention of
'background triggers' is that) a programmer (like myself) is not really
interested in the outcome of that trigger process. So there is no
'technical' need for him/her to create the server-side process,
*provided* he/she can set up a job *within* the database server itself,
and just go away.

That's the idea of 'background triggers'.

Surely, there are workarounds. Like the LISTEN/NOTIFY server daemon (not
a database server, but a system server) that takes database server
notifications. Or even a system daemon that simply uses synchronous
database communication (like I did in my case). The problem is that I
have this 'itching desire' to have such technical issues supported
'within the framework of RDBMS architecture'.

That's why I keep thinking that the solution I used is actually
'fighting the fences', while gurus do it some other, *better* way. But
if not, I think that 'background triggers' could help here.

>
> I may not have understood exactly what you are trying to do, but from
> what I understood, this will solve your problem.

I think you did. I just feel that 'background triggers' are a 'real
life' engineering issue, so they should get some backing from the RDBMS.

just my 2c.

-R



Re: background triggers?

From
Kenneth Downs
Date:
Rafal Pietrak wrote:

>A plain INSERT of a batch takes 5-10 minutes on desktop postgresql (800MHz
>machine, ATA disks). When I attach a trigger (a *very* simple function) to
>update the accounts, the INSERT takes hours (2-4). But when I make just
>one single update of all accounts at the end of the batch insert, it
>takes 20-30 min.
>
>
>
Why not have the INSERT go to an "inbox" table, a table whose only job
is to receive the data for future processing.

Your client code should mark all rows with a batch number as they go
in.  Then when the batch is loaded, simply invoke a stored procedure to
process them.  Pass the stored procedure the batch number.

IOW, have your "background trigger" be a stored procedure that is
invoked by the client, instead of trying to get the server to do it.
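
A quick sketch of that arrangement (table, column and procedure names
are made up for illustration):

  CREATE TABLE inbox (
      batch_no integer NOT NULL,
      payload  text    NOT NULL
  );

  CREATE OR REPLACE FUNCTION process_batch(p_batch integer)
  RETURNS void AS $$
  BEGIN
      -- application-specific routing/aggregation of the batch goes here
      DELETE FROM inbox WHERE batch_no = p_batch;  -- then clear the inbox
  END;
  $$ LANGUAGE plpgsql;

The client loads its rows tagged with a batch number and then simply
runs SELECT process_batch(42);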


Re: background triggers?

From
Sim Zacks
Date:
I agree with you that it would be better to have a background trigger
run by the database server. That was exactly what I asked. The answer
was that there is currently no way to do a background trigger, and that
to accomplish what I want I need to write a daemon that calls LISTEN and
then, on NOTIFY, runs my function.

If background triggers were a possibility, that would make the whole
thing so much easier.


Re: background triggers?

From
Rafal Pietrak
Date:
On Wed, 2006-05-24 at 07:41 -0400, Kenneth Downs wrote:
> Why not have the INSERT go to an "inbox" table, a table whose only job
> is to receive the data for future processing.

Actually, it 'sort of' works that way.

> Your client code should mark all rows with a batch number as they go
> in.  Then when the batch is loaded, simply invoke a stored procedure to
> process them.  Pass the stored procedure the batch number.

If I have that stored procedure and I issue a command that launches it
from the "psql>" prompt: how long will I have to wait for another
prompt? 1) until the procedure ends its job, or 2) right away, with the
procedure doing its job 'in the background'?

My impression was that I get the next prompt after the procedure
finishes, so it wouldn't be a solution. But if (2) applies, that is
really it.... Frankly, it would take me some time to get back to those
sources (and generate simulation data) - so does anybody know the answer?

-R

Re: background triggers?

From
Kenneth Downs
Date:
Rafal Pietrak wrote:
> On Wed, 2006-05-24 at 07:41 -0400, Kenneth Downs wrote:
>> Why not have the INSERT go to an "inbox" table, a table whose only job
>> is to receive the data for future processing.
>
> Actually, it 'sort of' works that way.
>
>> Your client code should mark all rows with a batch number as they go
>> in.  Then when the batch is loaded, simply invoke a stored procedure to
>> process them.  Pass the stored procedure the batch number.
>
> If I have that stored procedure and I issue a command that launches it
> from the "psql>" prompt: how long will I have to wait for another
> prompt? 1) until the procedure ends its job, or 2) right away, with the
> procedure doing its job 'in the background'?

What web server and OS are you using?  In linux/apache you can fork off
a process that runs the SP and then detach from it.

Re: background triggers?

From
Sim Zacks
Date:
The problem with client code processing a function is that unless you
are using threads (my client application is not in a multi-threaded
environment), the client has to wait for the server to return from the
function. I don't want the client to wait, and the result doesn't affect
the user at all, so there is no reason why he should wait.

Kenneth Downs wrote:
> Rafal Pietrak wrote:
>
>> A plain INSERT of a batch takes 5-10 minutes on desktop postgresql (800MHz
>> machine, ATA disks). When I attach a trigger (a *very* simple function) to
>> update the accounts, the INSERT takes hours (2-4). But when I make just
>> one single update of all accounts at the end of the batch insert, it
>> takes 20-30 min.
>>
>>
>>
> Why not have the INSERT go to an "inbox" table, a table whose only job
> is to receive the data for future processing.
>
> Your client code should mark all rows with a batch number as they go
> in.  Then when the batch is loaded, simply invoke a stored procedure to
> process them.  Pass the stored procedure the batch number.
>
> IOW, have your "background trigger" be a stored procedure that is
> invoked by the client, instead of trying to get the server to do it.

Re: background triggers?

From
Kenneth Downs
Date:
Sim Zacks wrote:

> The problem with client code processing a function is that unless you
> are using threads (my client application is not in a multi-threaded
> environment), the client has to wait for the server to return from the
> function. I don't want the client to wait, and the result doesn't affect
> the user at all, so there is no reason why he should wait.

No reason for him to wait.  Even Windows supports background processing
(geez I think so anyway).

If you are on Linux, spawning a background process is a no-brainer;
there are plenty of ways to do it.  There ought to be a way on Windows,
I would think.





Re: background triggers?

From
Sim Zacks
Date:
The reason the client application would have to wait is that the
front-end is written in MS Access. While Windows supports
multi-threading, Access does not.
(Multi-threading is what allows background processing, if you didn't know.)


Re: background triggers?

From
Christopher Browne
Date:
The world rejoiced as rafal@zorro.isa-geek.com (Rafal Pietrak) wrote:
> On Wed, 2006-05-24 at 07:41 -0400, Kenneth Downs wrote:
>> Why not have the INSERT go to an "inbox" table, a table whose only job
>> is to receive the data for future processing.
>
> Actually, it 'sort of' works that way.
>
>> Your client code should mark all rows with a batch number as they go
>> in.  Then when the batch is loaded, simply invoke a stored procedure to
>> process them.  Pass the stored procedure the batch number.
>
> If I have that stored procedure and I issue a command that launches it
> from the "psql>" prompt: how long will I have to wait for another
> prompt? 1) until the procedure ends its job, or 2) right away, with the
> procedure doing its job 'in the background'?
>
> My impression was that I get the next prompt after the procedure
> finishes, so it wouldn't be a solution. But if (2) applies, that is
> really it.... Frankly, it would take me some time to get back to those
> sources (and generate simulation data) - so does anybody know the answer?

It depends on something else...

Is the process that is starting up the "background batch process"
making other changes to the data?

If it is, then the "foreground" process MUST submit a COMMIT
in order to finalize the batch before kicking off anything in the
background.  Otherwise, the background process won't see the batch in
its "full glory."  You can never read uncommitted data, after all.

That being the case (that you MUST have a COMMIT), I don't see any
reason to consider LISTEN/NOTIFY to be an insufficient mechanism.

It would only be insufficient if you don't submit a COMMIT.

But the background process won't see the right data unless you COMMIT.
So you *MUST* submit a COMMIT, making LISTEN/NOTIFY quite adequate to
the task...
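
To illustrate the transactional behaviour (foo and the channel name are
the examples used earlier in the thread):

  BEGIN;
  INSERT INTO foo (key) VALUES (1);
  NOTIFY regen_statistics;  -- queued inside the transaction
  COMMIT;                   -- listeners hear it only at this point; a
                            -- ROLLBACK would have discarded it instead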
--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://linuxdatabases.info/info/x.html
You know  how most packages say  "Open here". What is  the protocol if
the package says, "Open somewhere else"?

Re: background triggers?

From
Sim Zacks
Date:
The problem with Listen/Notify is that you have to create another
client application, not managed in the central database, to run as a
daemon and call the function. If the server supported background
triggers, you wouldn't need an extra client application using the
listen/notify framework; you would just write a trigger with the
background option turned on and have it call the function.

That is a lot of extra overhead.


Re: background triggers?

From
Chris Browne
Date:
Sim Zacks <sim@compulab.co.il> writes:
> The reason the client application would have to wait is that the
> front-end is written in MS Access. While Windows supports
> multi-threading, Access does not.
> (Multi-threading is what allows background processing, if you didn't know.)

Well, you *clearly* need to have some sort of "daemon" running in
order to do this.

I expect it will in effect be a LISTEN process that waits for clients
to submit NOTIFY requests.

Even if you don't actually choose to use NOTIFY/LISTEN, per se, you'll
doubtless wind up creating an ad hoc, informally-specified
implementation of part of it...
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://www.ntlug.org/~cbbrowne/unix.html
"Any sufficiently complicated C or Fortran program contains an ad hoc
informally-specified bug-ridden slow implementation of half of Common
Lisp."  -- Philip Greenspun

Re: background triggers?

From
Rafal Pietrak
Date:
On Wed, 2006-05-24 at 08:38 -0400, Kenneth Downs wrote:
> What web server and OS are you using?  In linux/apache you can fork
> off a process that runs the SP and then detach from it.

Actually it is linux/apache. Yes, I could do it that way, but
eventually I've settled for a cron job that looks up a single
'dirty_log' table and does the housekeeping if necessary.

Still, from my point of view, this is a workaround. Not the elegant
'database' solution I was looking for.

-R

Re: background triggers?

From
Rafal Pietrak
Date:
On Wed, 2006-05-24 at 09:46 -0400, Christopher Browne wrote:
> The world rejoiced as rafal@zorro.isa-geek.com (Rafal Pietrak) wrote:
> > My impression was that I get the next prompt after the procedure
> > finishes, so it wouldn't be a solution. But if (2) applies, that is
> > really it.... Frankly, it would take me some time to get back to those
> > sources (and generate simulation data) - so does anybody know the answer?
>
> It depends on something else...
>
> Is the process that is starting up the "background batch process"
> making other changes to the data?

OK, just to focus attention:
1. I have a batch of INSERTs done by a program called "main" - a simple
cgi-bin script in perl.
2. I have a stored procedure installed within the RDBMS.
3. That procedure is executed as the last 'intentional' DB request made
by my main script.
4. Then the script would like to exit, leaving the 'stored procedure'
doing whatever it is designed to do, even though that will take more
than a couple of minutes to complete.
5. Would my main process have to wait (given the current implementation
of libpq) for that procedure to finish? Or does PQfinish return right
away, without waiting for the procedure to return?
6. The procedure, after minutes of busy housekeeping in the database,
does a final COMMIT. But the main process that called it is long gone by
that time.

In my script I do: "batch-INSERTs; COMMIT; call procedure; exit".

I'm not quite sure whether I can write a stored procedure such that,
when the INSERTs take 5 min and the procedure call takes 20 min, the
whole script returns after 5 min (meaning the time the stored procedure
needs to execute does not influence the script's execution time)?

If I can write such a procedure ... well, triggers are stored
procedures, so could I try to write such a trigger???

-R


Re: background triggers?

From
Sim Zacks
Date:
The question is not how it is implemented; the question is what the
end user has to do to accomplish his requirements. If I have to write
my own daemon, then I have to debug it and maintain it, and write a
script so it automatically starts when the machine is rebooted. If I
have multiple environments for development, testing and production, I
will need to modify it to handle this. If I move the database to a new
server, this script has to be in the action items to be moved along with
its accompanying start-up scripts. If I decide to recover a backup to
test something (something I do on an occasional basis), I will need to
consider my daemon as well. It is basically an outside process that has
a lot of overhead (not necessarily computing overhead).

If the database supported background triggers, they might be implemented
by a daemon or by the Listen/Notify framework, but I really couldn't
care less: it would be part of the database. I would only need to write
the function and the trigger code with an option to run it in the
background. No matter what I do now, or how many environments I have,
the database handles it all. If I back up my database and install it on
another server, my process is still intact.

The reason why this should be part of the database engine and not
another client application is that what you want is a trigger. All you
want is that when a table is updated another function runs; you don't
need any outside processing. The Listen/Notify framework is needed for
cases where you would like non-database actions to take place: your
client application can then go and do what it needs to, and then come
back and tell the database that it is done.

Chris Browne wrote:

> Well, you *clearly* need to have some sort of "daemon" running in
> order to do this.
>
> I expect it will in effect be a LISTEN process that waits for clients
> to submit NOTIFY requests.
>
> Even if you don't actually choose to use NOTIFY/LISTEN, per se, you'll
> doubtless wind up creating an ad hoc, informally-specified
> implementation of part of it...

Re: background triggers?

From
Thomas Hallgren
Date:
Sim Zacks wrote:
> If the database supported background triggers, they might be implemented
> by a daemon or by the Listen/Notify framework, but I really couldn't
> care less: it would be part of the database.

Assume the backend would handle this: what would the transaction
semantics look like? You can't wait for the background work to complete
before you commit your transaction, so what should happen when the
background "trigger" fails? Or what if a number of such triggers were
fired and then rolled back?

The only thing the database *can* do is what it does today. It provides
the hooks needed for specialized code that can react to the *outcome* of
transactions (and then perform its task asynchronously, using
transactions that are autonomous to the first one).

What you're trying to do doesn't belong in triggers and the concept of "background triggers"
doesn't make any sense. Triggers execute (and complete) within a transaction and the work
they perform should be rolled back if the transaction is rolled back.

Kind regards,
Thomas Hallgren


Re: background triggers?

От
Rafal Pietrak
Дата:
I'd like to propose a 'syntax/semantics' for such a trigger:

Triggers normally execute inside of a transaction.

A COMMIT within a trigger could mean: "do a fork: fork-1) return to the
main and schedule COMMIT there; fork-2) continue in the background".

From the perspective of my earlier applications, it would be desirable
to have an argument for such a COMMIT - a label, which is a *global*
database object (may be just a semaphore), and is handled so that:
1) Assuming a semaphore implementation - the semaphore should be tested
and fail if already asserted, NOT tested and block. The FORK should be
initiated only if the semaphore test succeeds.
2) The execution of the procedure within fork-2 (meaning, after the
semaphore assertion succeeds) should be postponed until the caller
actually COMMITs.
3) On EXIT, fork-2 deasserts the semaphore.
4) In the simplest case, the semaphore could be defined on the trigger
function name itself, and consequently the 'label' for the COMMIT
wouldn't be necessary?


-R


Re: background triggers?

From
"Dawid Kuroczko"
Date:
On 5/25/06, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> I'd like to propose a 'syntax/semantics' for such a trigger:
>
> Triggers normally execute inside of a transaction.
>
> A COMMIT within a trigger could mean: "do a fork: fork-1) return to the
> main and schedule COMMIT there; fork-2) continue in the background".

I don't think fork(2)ing a running backend is a good idea; it would
probably end up with major data corruption.  You want to call fork(2) in
your application. Something like: "if (fork()==0) { reestablish
connection, issue trigger-code on the database } else { gracefully
return };"

> From the perspective of my earlier applications, it would be desirable
> to have an argument for such a COMMIT - a label, which is a *global*
> database object (may be just a semaphore), and is handled so that:
> 1) Assuming a semaphore implementation - the semaphore should be tested
> and fail if already asserted, NOT tested and block. The FORK should be
> initiated only if the semaphore test succeeds.
> 2) The execution of the procedure within fork-2 (meaning, after the
> semaphore assertion succeeds) should be postponed until the caller
> actually COMMITs.
> 3) On EXIT, fork-2 deasserts the semaphore.
> 4) In the simplest case, the semaphore could be defined on the trigger
> function name itself, and consequently the 'label' for the COMMIT
> wouldn't be necessary?

Hmm, I've got a feeling it's something like "I don't feel like coding it
in the application, so it would be better if the community changed the
backend to do it". :) However, what you propose in points 1,2,3,4 is
somewhat similar to the already existing 2PC (two-phase commit), which
PostgreSQL implements. Probably not what you want, but valuable to know,
I guess.
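
For reference, the two-phase commit commands look like this (the
transaction identifier is arbitrary; foo is the illustrative table from
earlier in the thread):

  BEGIN;
  INSERT INTO foo (key) VALUES (1);
  PREPARE TRANSACTION 'batch-42';  -- the transaction survives disconnect
  -- later, from any session:
  COMMIT PREPARED 'batch-42';      -- or ROLLBACK PREPARED 'batch-42'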

And as for COMMIT within a function...  Not possible, not without
breaking an awful lot of things.  Think about a transaction as:
BEGIN;
  -- commands, like INSERTs, ended with the user-issued COMMIT;
  -- user-given triggers fired after the user issued COMMIT;
  -- referential integrity triggers and whatnot
COMMIT; -- actual commit performed by the DB.

If your trigger called COMMIT, the referential integrity triggers would
not have anything to act on -- the commit would already be done.  No
referential integrity; you might as well use MyISAM then. ;)

So... let's assume the "commit" would not actually commit, but rather
start another backend to do the work [1].  The problem is that the newly
started backend would not see the work until the old backend actually
COMMITs.

The idea of a commit within a procedure might be interesting from the
perspective of very-long-running queries which update a whole lot of
rows, but that's another issue.

   Regards,
       Dawid

[1]: If you really insist on doing it this way, of course you may!  Here
is a fishing rod:
write a trigger in PL/PerlU which will fork().  The newly started child
will use DBI to connect to the database, issue your query, and then call
exit(0) to be sure it doesn't return to the backend.  You might want to
call exec() with a pre-prepared script doing the above work.

From the perspective of the "main" backend, the trigger will call fork()
and finish.  And your application will commit.

That's everything you need to do it the way you want it.  Have fun!
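
A hedged sketch of that fishing rod (the script path is hypothetical,
and note the warning above - fork() inside a backend process is strictly
at-your-own-risk territory):

  CREATE OR REPLACE FUNCTION spawn_stats_job() RETURNS trigger AS $$
      my $pid = fork();
      if (defined $pid and $pid == 0) {
          # Child: never return into this backend.  Replace ourselves
          # with a pre-prepared script that opens its own connection
          # (e.g. via DBI) and runs the long job.
          exec('/usr/local/bin/regen_statistics.pl');
          exit(0);  # reached only if exec() itself fails
      }
      return;  # parent: the trigger (and the user's statement) finish now
  $$ LANGUAGE plperlu;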

Re: background triggers?

From
Thomas Hallgren
Date:
Rafal Pietrak wrote:
> I'd like to propose a 'syntax/semantics' for such a trigger:
>
> Triggers normally execute inside of a transaction.
>
> A COMMIT within a trigger could mean: "do a fork: fork-1) return to the
> main and schedule COMMIT there; fork-2) continue in the background".
>
And what if fork-1) returns to the main, attempts the COMMIT, but
instead rolls back due to a violated constraint? Where does that leave
fork-2?

Regards,
Thomas Hallgren

Re: background triggers?

From
Rafal Pietrak
Date:
On Thu, 2006-05-25 at 10:33 +0200, Thomas Hallgren wrote:
> Rafal Pietrak wrote:
> > I'd like to propose a 'syntax/semantics' for such a trigger:
> >
> > Triggers normally execute inside of a transaction.
> >
> > A COMMIT within a trigger could mean: "do a fork: fork-1) return to the
> > main and schedule COMMIT there; fork-2) continue in the background".
> >
> And what if fork-1) returns to the main, attempts the COMMIT, but
> instead rolls back due to a violated constraint? Where does that leave
> fork-2?
>
> Regards,
> Thomas Hallgren

No problem at all (at least in the particular case of the application I
have in mind :). The procedure that remains within fork-2 just does
time-consuming housekeeping. Like a cleanup - it always succeeds, even
if sometimes it is not really necessary (as in the case of the main
rolling back).

And that's exactly why I think it should be 'released to run' by the
RDBMS *after* the main COMMITs (or ROLLS BACK). It should be run on
COMMITTED (visible to the world) changes, not on session transients.

-R


Re: background triggers?

From
Rafal Pietrak
Date:
On Thu, 2006-05-25 at 10:21 +0200, Dawid Kuroczko wrote:
> On 5/25/06, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> > I'd like to propose a 'syntax/semantics' for such a trigger:
> >
> > Triggers normally execute inside of a transaction.
> >
> > A COMMIT within a trigger could mean: "do a fork: fork-1) return to the
> > main and schedule COMMIT there; fork-2) continue in the background".
>
> I don't think fork(2)ing a running backend is a good idea; it would
> probably end up with major data corruption.  You want to call fork(2) in
> your application. Something like: "if (fork()==0) { reestablish
> connection, issue trigger-code on the database } else { gracefully
> return };"

I'm not that fluent in postgresql backend programming. But further on,
you suggest a trigger skeleton which should be feasible - so if such a
trigger (a user-level function) is feasible, maybe it could be
implemented by the backend engine too... as if it were a syntax shortcut
(using the within-trigger COMMIT keyword) for the trigger implementation
you suggest at the end of your response below?

> Hmm, I've got a feeling it's something like "I don't feel like coding it
> in the application, so it would be better if the community changed the
> backend to do it". :) However, what you propose in points 1,2,3,4 is somewhat

:) YES!!!

> similar to the already existing 2PC (two-phase commit), which PostgreSQL
> implements. Probably not what you want, but valuable to know,
> I guess.

Maybe. Currently I wouldn't know - I have never used 2PC. When I learned
about 2PC, I thought it was for occasions when the application can figure
out how to re-run a transaction in some other way when the 'primary' way
fails (and rolls back). The alternative way might re-use partial work
achieved by the original path, up to the checkpoint.

I never thought this could be used for a *disconnected*, forked additional
instance of a backend process.

I might have been wrong - I must get back to books.

But the goal here is not to re-run a transaction some other way, but to
cut the transaction short, and do tasks which don't need transaction
braces outside of a transaction.

To run part of the trigger outside of a transaction, and thus make the
transaction commit sooner.

I don't think 2PC gives any help here.
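
For reference, the two-phase flow being discussed looks roughly like this
(a sketch only - the transaction label is arbitrary and the table is a
placeholder for real work):

CREATE TABLE some_log (id int);   -- hypothetical table, stands in for real work
BEGIN;
INSERT INTO some_log VALUES (1);
PREPARE TRANSACTION 'batch-42';   -- phase one: persist the prepared state
                                  -- (requires max_prepared_transactions > 0)
-- later, possibly from a different session:
COMMIT PREPARED 'batch-42';       -- phase two (or: ROLLBACK PREPARED 'batch-42')

Both phases are still driven synchronously by some connected client, so
nothing here runs detached in the background.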

> If your trigger were to call COMMIT, the referential integrity triggers would

I didn't really mean 'semantics of COMMIT'. I just meant using the COMMIT
keyword, as it normally *may never* be used within a trigger, to
initiate/implement semantics which 'detach' the remaining processing
encoded within that trigger function from the original transaction
execution.

> So... let's assume the "commit" would not actually commit, but rather
> start another backend and do the work [1].  The problem is that the newly
> started backend would not see the work until the old backend actually
> COMMITs;

Yes. And in fact, it should be kept blocked until that moment.

> [1]: If you really insist on doing it this way, of course you may!  Here is
> a fishing rod:
> write a trigger in PL/PerlU, which will fork();  The newly started child will
> use DBI to connect to the database, issue your query, and then call
> exit(0) to be sure you don't return to the backend.  You might want to call
> exec() with a pre-prepared script doing the above work.
>
> From the perspective of the "main" backend, the trigger will call fork(),
> and finish.  And your application will commit.
>
> That's everything you need to do it the way you want it.  Have fun!

If that works, maybe it could be implemented within the database
backend? And made accessible to client programming by means of the COMMIT
keyword (or, to be more generic, a COMMIT macro, provided also for other
language bindings) within the trigger function body?

-R

Re: background triggers?

От
Thomas Hallgren
Дата:
Rafal Pietrak wrote:
> On Thu, 2006-05-25 at 10:33 +0200, Thomas Hallgren wrote:
>> Rafal Pietrak wrote:
>>> I'd like to propose a 'syntax/semantics' of such trigger:
>>>
>>> Triggers normally execute inside of a transaction.
>>>
>>> A COMMIT within a trigger could mean: "do a fork: fork-1) return to the
>>> main and schedule COMMIT there, fork-2) continue in background".
>>>
>> And what if fork-1) returns to the main, attempts the COMMIT, but it fails and rolls back due
>> to a violated constraint? Where does that leave fork-2?
>>
>> Regards,
>> Thomas Hallgren
>
> No problem at all (at least in the particular case of the application I
> have in mind :). The procedure that remains within fork-2 just does
> time-consuming housekeeping. Like a cleanup - it always succeeds, even
> if it is sometimes not really necessary (as when the main rolls back).
>
A somewhat limited use-case to form generic database functionality on, wouldn't you say?

> And that's exactly why I think it should be 'released to run' by the
> RDBMS *after* the main COMMITs (or ROLLs BACK). It should be run on
> COMMITted (visible to the world) changes, not on session transients.
>
Right, so it's not a trigger. It's another session (another transaction) that reacts to a
notification that is sent only if the first transaction succeeds. This is exactly what
notify/listen is for.
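
To make that concrete, a minimal sketch of the trigger side (the table and
channel names are illustrative; the listener is a separate client session
that has issued LISTEN regen_stats and rebuilds the statistics when it
wakes up):

CREATE OR REPLACE FUNCTION notify_stats() RETURNS trigger AS $$
BEGIN
    NOTIFY regen_stats;   -- delivered to listeners only after COMMIT
    RETURN NULL;          -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_notify_stats
    AFTER INSERT ON mytable
    FOR EACH STATEMENT EXECUTE PROCEDURE notify_stats();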

Regards,
Thomas Hallgren

Re: background triggers?

От
Rafal Pietrak
Дата:
On Thu, 2006-05-25 at 11:29 +0200, Thomas Hallgren wrote:
> Rafal Pietrak wrote:
> > time-consuming housekeeping. Like a cleanup - it always succeeds, even
> > if it is sometimes not really necessary (as when the main rolls back).
> >
> A somewhat limited use-case to form generic database functionality on, wouldn't you say?

OK. I admit.

...but maybe ... :) OK, just a final comment and I'm done.

> > And that's exactly why I think it should be 'released to run' by the
> > RDBMS *after* the main COMMITs (or ROLLs BACK). It should be run on
> > COMMITted (visible to the world) changes, not on session transients.
> >
> Right, so it's not a trigger. It's another session (another transaction) that reacts to a
> notification that is sent only if the first transaction succeeds. This is exactly what
> notify/listen is for.

Yes.

And no.

It is a trigger. But 'the other kind of' trigger.

<political-section-possible-source-of-fame-and-war-sorry>

The thing is, I've seen *very* inefficient database applications,
mainly because that was 'the easiest way to go'.

One of a programmer's main virtues is laziness. Of which I myself am
proud :)

Thinking along the lines of: "OK, we have these aggregate tables, but we
don't need them 100% accurate, so let's not trigger syncing them with
the transaction log on every INSERT" takes effort, and more often than not,
the implementation goes along as: "We have those aggregate tables - we
must keep those in sync with the main log, so we trigger an UPDATE on every
INSERT.... forking a housekeeper process to receive NOTIFY .... naaa....
I don't think so, maybe next release". But once the application is in
production, we don't redesign when the database load comes to the point
where performance suffers. The redesign is too risky.

My point is that having a tool like COMMIT within a trigger function
may result in a better application, created more easily right from the
beginning.

We don't see wide use of LISTEN/NOTIFY among programmers, as I believe it
is 'just that little bit over the acceptable complexity border'. That's
why the request may sound like 'for rare cases/limited use'.

The programmers' environment should 'gravitate' us toward creating good
software. The gravity is those little things we find handy - triggers are
handy.... but regrettably lead to inefficiency.

</political-section>

I mean, workarounds exist, but they are just workarounds nonetheless.

Then again, I just wanted to back up the request, which I consider valid,
and help explain it. Obviously I wouldn't like to endlessly discuss
its pros and cons. I think the idea is well stated now, and if someone
is in the mood to implement it (now, or in some unforeseen future) -
he/she has all the (end-user) explanations in the archive.

-R

Re: background triggers?

От
"Florian G. Pflug"
Дата:
Rafal Pietrak wrote:
> On Wed, 2006-05-24 at 13:36 +0200, Sim Zacks wrote:
>> My understanding of Listen/Notify is that it is a completely
>> disconnected process running on the database server.
>
> Yes. But in my particular case (and I presume that is the intention of
> 'background triggers') a programmer (like myself) is not really
> interested in the outcome of that trigger process. So there is no
> 'technical' need for him/her to create the server-side process, *provided*
> he/she can set up a job *within* the database server itself, and just go
> away.
Well, exactly "not being interested in the outcome" is IMHO the reason
why your demands clash with the architecture of an RDBMS. Most RDBMSs go a
long way to ensure the consistency and safety of your data once they have
promised (usually by not raising an error, at least on "commit") that they
stored it. This doesn't match very well with asynchronous processes for
which "nobody cares" (in the sense that there is nobody to report errors to).

> That's the idea of 'background triggers'.
>
> Surely, there are workarounds. Like the LISTEN/NOTIFY server (not a
> database server, but a system server) daemon that takes database server
> notifications. Or even a system server daemon that simply uses
> synchronous database communication (like I did in my case). The problem
> is that I have this 'itching desire' to have such technical issues
> supported 'within the framework of the RDBMS architecture'.
I wouldn't call this a workaround. It's a sane solution to your problem.
The only real downside I can see is that it requires you to implement
that daemon - but that can be solved once and for all - see below.

> That's why I keep thinking that the solution I used is actually
> 'biting the fences', while gurus do it some other, *better* way. But if
> not, I think 'background triggers' could help here.
>
>> I may not have understood exactly what you are trying to do, but from
>> what I understood, this will solve your problem.
>
> I think you did. I just feel that 'background triggers' is a 'real life'
> engineering issue, so it should get some backing from the RDBMS.
I don't think the RDBMS can do much more than provide a notification
framework - which postgres does (LISTEN / NOTIFY).

But what would probably ease your 'itching desire' is if there were a
general-purpose daemon that could be told to execute a given statement
either at predefined times, or when a notification arrives, or both.

Newer pgadmin3 versions include pgAgent, which is kind of a
postgres-aware cron daemon, as far as I know. I don't think that pgAgent
currently supports LISTEN/NOTIFY, but you could talk to the pgadmin
maintainers about adding support for that.

Then you wouldn't need to write the daemon yourself, and it would even
be "part of the framework" as long as you consider pgadmin to be "part
of the framework".

greetings, Florian Pflug





Re: background triggers?

От
Rafal Pietrak
Дата:
On Thu, 2006-05-25 at 18:49 +0200, Florian G. Pflug wrote:
> Rafal Pietrak wrote:
> > 'technical' need for him/her to create the server-side process, *provided*
> > he/she can set up a job *within* the database server itself, and just go
> > away.
> Well, exactly "not being interested in the outcome" is IMHO the reason
> why your demands clash with the architecture of an RDBMS. Most RDBMSs go a
> long way to ensure the consistency and safety of your data once they have
> promised (usually by not raising an error, at least on "commit") that they
> stored it. This doesn't match very well with asynchronous processes for
> which "nobody cares" (in the sense that there is nobody to report errors to).

No, no, no. This is a completely different story.

It would really be very unfortunate if we couldn't rely on the RDBMS's
'confirmed storage'.

Here I'm just not interested in that procedure's outcome: whether it
eventually COMMITted or ROLLED BACK. But obviously I am interested in the
consistency of the database when the detached procedure COMMITs.

I mean, it looks like this is *really* a novelty for RDBMS design - I
feel that real programmers here (you guys :) are so hard to persuade
of its necessity, because it's so far from the 'synchronous nature' of
classical RDBMS design and triggers in particular.

But I'd like to express my belief that having such a tool within the
server can help build better database applications.

BTW: It's tempting to bring in another example from another world: OS
kernel design (Linux in particular) does have a 'sort of triggers' - the
interrupt service subsystem. To keep the overall system efficient,
interrupt service routines are split into 'front-ends' and 'back-ends'.
Front-ends execute 'inside' interrupt context - interrupts disabled.
Back-ends execute 'outside' interrupt context - interrupts enabled.

The goal here would be to allow a similar optimisation to be handed over to
database programmers. Some part of a trigger must be executed within the
context of an open transaction; but another part may be executed
without the constraint of blocking the caller.

Lots of similarities.

Having an easy-to-use database backend framework that supports splitting
trigger execution will allow 'user space' programmers to optimise trigger
implementations.

> But what would probably ease your 'itching desire' is if there were a
> general-purpose daemon that could be told to execute a given statement
> either at predefined times, or when a notification arrives, or both.

Yesss. A pg_crontab system table - one that keeps stored procedures for
scheduled execution within the backend, by the backend (like VACUUM) -
would be nice. Yes, yes, yes. But that's not exactly the functionality
of 'background triggers'.

> Then you wouldn't need to write the daemon yourself, and it would even
> be "part of the framework" as long as you consider pgadmin to be "part
> of the framework".

I wouldn't.

I would only call it a framework if I could say COMMIT within the trigger
body. Or alternatively, if I could define a trigger, so to speak, "FOR EACH
COMMIT" (please note that it's a different point in time than "FOR EACH
STATEMENT"), which I could also define as "DETACHED" - launched by a
forked backend.

sorry :)

regards,

-R

Re: background triggers?

От
"Dawid Kuroczko"
Дата:
On 5/25/06, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> > Well, exactly "not being interested in the outcome" is IMHO the reason
> > why your demands clash with the architecture of an RDBMS. Most RDBMSs go a
> > long way to ensure the consistency and safety of your data once they have
> > promised (usually by not raising an error, at least on "commit") that they
> > stored it. This doesn't match very well with asynchronous processes for
> > which "nobody cares" (in the sense that there is nobody to report errors to).
>
> No, no, no. This is a completely different story.
>
> It would really be very unfortunate if we couldn't rely on the RDBMS's
> 'confirmed storage'.

Oh, I think Florian meant that it is strange that your application is not
interested in the trigger's outcome.  Of course one might want to add
a notify-about-a-trigger-failure-by-email feature to circumvent that,
but I won't go that far off.

The point here is that with your approach, you fire a trigger and forget
about it.  It either commits some time later, or it does not, and you
don't know which.  You don't know, because your application went on,
did other things, and has no way of knowing what happened with the commit.

Well, you can speculate that you will notice that no work is being
done.  But why?  Maybe the trigger is inefficient and isss soo sloooow,
iittt taaaakessss aaaaaggeeees tooo cooompleeete.  Or maybe
it ROLLBACKed, effectively removing all evidence of the work done.
With this approach, you don't know -- and this is what probably
struck Florian's "strange for an RDBMS" feeling.

> > Here I'm just not interested in that procedure's outcome: whether it
> > eventually COMMITted or ROLLED BACK. But obviously I am interested in the
> > consistency of the database when the detached procedure COMMITs.

Hmm.  How are you going to get it?  No, an "on ROLLBACK" trigger is not
a good idea! :-)

> > I mean, it looks like this is *really* a novelty for RDBMS design - I
> > feel that real programmers here (you guys :) are so hard to persuade
> > of its necessity, because it's so far from the 'synchronous nature' of
> > classical RDBMS design and triggers in particular.

Don't get me wrong, but the word "bizarre" is more suitable than
"novelty".  Background processing has been there for a very long
time -- why do you think LISTEN/NOTIFY was implemented? :)

> But I'd like to express my belief that having such a tool within the
> server can help build better database applications.

Write faster <> write better.  As I wrote some time earlier, you can
code a trigger in PL/PerlU doing exactly what you want.  The more
usual approach of using LISTEN/NOTIFY or a cron job is easier to
manage (you have much better control over how many times the
given function is called).  Imagine a query with thousands of INSERTs
grouped inside a transaction.  Your background trigger would mean
that postgresql would spawn an awful lot of new connections,
for nothing, as they wouldn't see the new rows from a different transaction.

You said that your scheme would implement exclusive locking.
Well, if I were writing such an application, I would rather want such
code to be fired no more frequently than once a minute.
ON COMMIT DETACHED WITH MINIMUM DELAY '1 minute'::interval; ? :-)

For locking you can simply use existing infrastructure, like:

CREATE TABLE my_locking_table (lasttime timestamptz);
INSERT INTO my_locking_table VALUES (now());

-- then, inside a PL/pgSQL function:
DECLARE
  last_run timestamptz;
BEGIN
  SELECT lasttime INTO last_run FROM my_locking_table
   WHERE lasttime < now() - '1 minute'::interval
   FOR UPDATE NOWAIT;
  IF FOUND THEN
    -- do some work
    UPDATE my_locking_table SET lasttime = now();
  END IF;
  RETURN;
EXCEPTION
  WHEN lock_not_available THEN
    RETURN;  -- another session holds the row; skip this run
END;
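
(The NOWAIT is what makes this pattern safe to fire concurrently: instead of
queueing behind another session that already holds the row, the function gets
a lock_not_available error and simply skips that run.)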

And if you want to check for 'ps auxw|grep backup.sh', you may do that
too, without any need to extend these things.

> I would only call it a framework if I could say COMMIT within the trigger
> body. Or alternatively, if I could define a trigger, so to speak, "FOR EACH
> COMMIT" (please note that it's a different point in time than "FOR EACH
> STATEMENT"), which I could also define as "DETACHED" - launched by a
> forked backend.

Actually, I like the idea of an "ON COMMIT" trigger (though without the
"DETACHED" part), but this is another story...

   Regards,
       Dawid

Re: background triggers?

От
"Florian G. Pflug"
Дата:
Dawid Kuroczko wrote:
> On 5/25/06, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
>>> Well, exactly "not being interested in the outcome" is IMHO the reason
>>> why your demands clash with the architecture of an RDBMS. Most RDBMSs go a
>>> long way to ensure the consistency and safety of your data once they have
>>> promised (usually by not raising an error, at least on "commit") that they
>>> stored it. This doesn't match very well with asynchronous processes for
>>> which "nobody cares" (in the sense that there is nobody to report errors to).
>>
>> No, no, no. This is a completely different story.
>>
>> It would really be very unfortunate if we couldn't rely on the RDBMS's
>> 'confirmed storage'.
>
> Oh, I think Florian meant that it is strange that your application is not
> interested in the trigger's outcome.  Of course one might want to add
> a notify-about-a-trigger-failure-by-email feature to circumvent that,
> but I won't go that far off.
Ouch. ;-)

> The point here is that with your approach, you fire a trigger and forget
> about it.  It either commits some time later, or it does not, and you
> don't know which.  You don't know, because your application went on,
> did other things, and has no way of knowing what happened with the commit.
>
> Well, you can speculate that you will notice that no work is being
> done.  But why?  Maybe the trigger is inefficient and isss soo sloooow,
> iittt taaaakessss aaaaaggeeees tooo cooompleeete.  Or maybe
> it ROLLBACKed, effectively removing all evidence of the work done.
> With this approach, you don't know -- and this is what probably
> struck Florian's "strange for an RDBMS" feeling.
This is exactly what I meant. Since there is always a possibility of
failure (out of space, machine crash, ...), all the guarantees an RDBMS
gives you are of the form "If I responded with OK to command ... _then_
you can assume .....".

Doing things asynchronously defeats this for two reasons. First, nobody
will be able to check "if it responded OK". Second, your proposed
background trigger would _need_ to fire only after the transaction
committed (which snapshot should it see otherwise? And when would its own
changes be visible?). But what if the trigger fails then? I can't cause
the transaction that triggered its run to roll back now, because the
database has already committed that transaction.

>> I mean, it looks like this is *really* a novelty for RDBMS design - I
>> feel that real programmers here (you guys :) are so hard to persuade
>> of its necessity, because it's so far from the 'synchronous nature' of
>> classical RDBMS design and triggers in particular.
>
> Don't get me wrong, but the word "bizarre" is more suitable than
> "novelty".  Background processing has been there for a very long
> time -- why do you think LISTEN/NOTIFY was implemented? :)
>
>> But I'd like to express my belief that having such a tool within the
>> server can help build better database applications.
IMHO it won't. The _very_ reason why people use database systems is
that they provide strong guarantees about the isolation and durability
of transactions. _Additionally_ they provide convenient searching,
indexing, and stuff like that. But, at least from my point of view, the
part that is really hard to get right is the correct behavior of transactions.

I know that people who "grew up" with mysql, especially mysql before at
least version 4, think differently. For them, a database is merely a
convenient replacement for flat files, providing a nice language (sql)
to specify complicated algorithms which would be tedious to hand-code.
If this is your interpretation of what a database does, then I can see
why this "background trigger" feature sounds so intriguing.

But if you look
at things from the "transactional" point of view, then the feature
doesn't sound intriguing any more, because it just doesn't work
well together with transactions, and would have very strange semantics
compared to other features of the database.

>> I would only call it a framework if I could say COMMIT within the trigger
>> body. Or alternatively, if I could define a trigger, so to speak, "FOR EACH
>> COMMIT" (please note that it's a different point in time than "FOR EACH
>> STATEMENT"), which I could also define as "DETACHED" - launched by a
>> forked backend.
I'd suggest that you try to find a way to do what you want _outside_ of
the backend. I'd actually love to see a more generic pgAgent (see my
other mail) that is distributed in contrib, or even installed by default.

You could use such a "postgres-aware" cron to schedule vacuum runs,
periodic cleanup jobs, or start some procedure on some notification.
_But_ it wouldn't be part of the backend. It would be a separate
process, connecting like any other process.

> Actually, I like the idea of an "ON COMMIT" trigger (though without the
> "DETACHED" part), but this is another story...
Me too, although 2PC transactions would actually need to call it on
prepare, not on commit, so "on commit" would be a bad name.

greetings, Florian Pflug

Re: background triggers?

От
Rafal Pietrak
Дата:
On Thu, 2006-05-25 at 20:27 +0200, Dawid Kuroczko wrote:
> On 5/25/06, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
> > Here I'm just not interested in that procedure's outcome: whether it
> > eventually COMMITted or ROLLED BACK. But obviously I am interested in the
> > consistency of the database when the detached procedure COMMITs.
>
> Hmm.  How are you going to get it?  No, an "on ROLLBACK" trigger is not
> a good idea! :-)

Good point. But no - I was just too quick with my typing. I meant 'basically
interested'; that is, I'd like to rely on COMMIT's guarantee of a consistent
database. But I don't have to see that guarantee on every COMMIT. I'd
prefer to trust the database on that.

A trigger on ROLLBACK ..... now, that's bizarre.

> > I mean, it looks like this is *really* a novelty for RDBMS design - I
> > feel that real programmers here (you guys :) are so hard to persuade
> > of its necessity, because it's so far from the 'synchronous nature' of
> > classical RDBMS design and triggers in particular.
>
> Don't get me wrong, but the word "bizarre" is more suitable than

OK, agreed :)

> You said that your scheme would implement exclusive locking.

No, I never mentioned locking. Not me.

> Well, if I were writing such an application, I would rather want such
> code to be fired no more frequently than once a minute.
> ON COMMIT DETACHED WITH MINIMUM DELAY '1 minute'::interval; ? :-)

Sweet. But there are issues.

In cases where the main traffic (keep in mind numerous concurrent
batches of INSERTs) does not rest for a single second, the above would
probably delay the aggregate UPDATEs indefinitely. Which may not
necessarily be appropriate. So there should be an indication of whether to
wait for 1 minute of idle time between INSERTs (foreground/main), or 1
minute of idle time between UPDATEs (background/trigger).

Still, very, very desirable.

> Actually, I like the idea of an "ON COMMIT" trigger (though without the
> "DETACHED" part), but this is another story...

By now, I like it even more than the initial 1) 2) 3) 4) scenario :) ...
but the DETACHED part is vital.

regards,

-R