Обсуждение: Writing most code in Stored Procedures
I have few queries regarding the use of Stored Procedures, Functions and Triggers in an RDBMS. (1) When to use Stored Procedure? Writing an INSERT query in a Stored Procedure is better or firing it from the application level? (2) Can a Trigger call a Stored Procedure? (3) What type of code must reside in application and what type in RDBMS? (4) Is it faster to work at application level or at the database level?
Rohit wrote: > I have few queries regarding the use of Stored Procedures, Functions > and Triggers in an RDBMS. These are all easy questions to answer: "it depends". OK, so you might want some reasons... > (1) When to use Stored Procedure? Writing an INSERT query in a Stored > Procedure is better or firing it from the application level? Some people like to have all access to the database through stored procedures. This can make it simpler to control access to the data and seems popular with MSSQL developers. Personally, I use SQL as my interface - that's its purpose. Actually, that's not quite true, I use various wrappers that generate SQL for most queries. > (2) Can a Trigger call a Stored Procedure? It has to, although they're just called functions in PostgreSQL, can be written in a range of procedural languages and take place within a transaction the same as any other function-call. > (3) What type of code must reside in application and what type in > RDBMS? Anything that's to do with data integrity I put in the database. Anything that's to do with process I put in the application (or middle-ware layer). The tricky bit is deciding what to do when you have something that's a bit of both. > (4) Is it faster to work at application level or at the database level? Probably faster in the database, assuming you have only one machine. If you have more than one machine then you can have each machine designed for its purpose. Of course, faster to run might be at the cost of more difficult to write. It might also mean your process code can crash the database server. -- Richard Huxton Archonet Ltd
On 8/15/07, Rohit <rpk.general@gmail.com> wrote: > I have few queries regarding the use of Stored Procedures, Functions > and Triggers in an RDBMS. > > (1) When to use Stored Procedure? Writing an INSERT query in a Stored > Procedure is better or firing it from the application level? > > (2) Can a Trigger call a Stored Procedure? > > (3) What type of code must reside in application and what type in > RDBMS? > > (4) Is it faster to work at application level or at the database level? One of the major advantages of stored procedures over application code is that stored procedures can get to the data much more quickly than an application can, in general. An application needs to talk to PostgreSQL through some sort of driver (JDBC, libpq, etc.) and typically data need to traverse a network to get from the database machine to the application server. Stored procedures, on the other hand, don't have the overhead of either the network or the driver. However, stored procedures can be more difficult to debug, profile, etc., so they might not be the best for really complex logic. They tend to be really useful if you're doing something that requires lots and lots of queries to the database, and don't need anything else (data from other sources, user interaction, etc.), because that takes greatest advantage of their quick connection to the data. Richard Huxton's point that stored procedures are typically best for data integrity types of functions, whereas business logic should often be in application code is an excellent rule of thumb. -Josh
On 8/15/07, Rohit <rpk.general@gmail.com> wrote: > I have few queries regarding the use of Stored Procedures, Functions > and Triggers in an RDBMS. > > (1) When to use Stored Procedure? Writing an INSERT query in a Stored > Procedure is better or firing it from the application level? > > (2) Can a Trigger call a Stored Procedure? > > (3) What type of code must reside in application and what type in > RDBMS? > > (4) Is it faster to work at application level or at the database level? I'm fairly hardcore about keeping as much business logic as I can in the database. In fact, I only do SELECTs from the application, and usually via Views. All inserts, updates and deletes are via procs. I'm a proponent of separating application code from presentation and application code from database code. And HTML from layout style, for that matter. In addition to the other reasons you've gotten: It lets me blackbox commonly used functions, such as a function to insert a normalized demographic record for a customer, an organization, a user, a guarantor, a physician, etc. It isolates database logic so it can be debugged separately from the application. It reduces the application's vulnerability to SQL injection, especially if another developer (never me <g>) forgets to massage potentially tainted user input. Another is because I typically do my web application programming in PHP5 but the offline scripts in Perl. Both can call the same stored procedures so I don't have multiple copies of database code to maintain and debug. Another is because I want transactions to start and end in the database, not in external application code which might crash before a COMMIT. Another is because I'm a freelancer and generally hand off my applications to the client's tech department, which is often a junior level grasshopper who knows just enough SQL to be dangerous. Using stored procedures raises the bar on what they need to know about RDMBSes before they start hacking working code. And, yes, it's faster. Particularly if business logic decisions have to be made in context with a transaction.
On 8/15/07, Steve Manes <smanes@magpie.com> wrote: > I'm fairly hardcore about keeping as much business logic as I can in the > database. In fact, I only do SELECTs from the application, and usually > via Views. All inserts, updates and deletes are via procs. I'm a > proponent of separating application code from presentation and > application code from database code. And HTML from layout style, for > that matter. > > In addition to the other reasons you've gotten: > Another is because I want transactions to start and end in the database, > not in external application code which might crash before a COMMIT. Hmm, how do you handle this logically? Do your applications never need to submit chunks of work at once? Or do you do something like fill in a temporary table, and have a proc work from that? > Another is because I'm a freelancer and generally hand off my > applications to the client's tech department, which is often a junior > level grasshopper who knows just enough SQL to be dangerous. Using > stored procedures raises the bar on what they need to know about RDMBSes > before they start hacking working code. ...although that helps explain "why" a bit more: if you're stuck with people who can't use transactions properly, it makes sense to keep them out of reach...
Trevor Talbot wrote: >> Another is because I want transactions to start and end in the database, >> not in external application code which might crash before a COMMIT. > > Hmm, how do you handle this logically? Do your applications never > need to submit chunks of work at once? Or do you do something like > fill in a temporary table, and have a proc work from that? Of course not "never" but the goal is one logical database transaction per page invocation. It's a discipline I got into when I was working on a site that clocked 3/4 billion page views month on a severely overtaxed Sun Oracle box. A lot of it involves caching on the web server. For instance, static data like pulldown data gets cached and refreshed either by expiry or by the application blowing the cache after an update. If I have a thousand users pulling the same list of physicians a hundred times a day, I cache it. If users need to view a complex report where the dependent data only changes once every 24 hours (like an adserver pull), I cache it and have an offline job rebuild it when fresh data becomes available. Result sets get cached either in the user's session or in hidden DIVs to reduce redundant database calls. I build database-intensive applications starting at the database, beginning with an ERD, then the proc primitives. Then I wireframe the application from the perspective of the database, building macros of procedures. After testing, these go into the application's API. The brass ring is being able to take a validated, prepared and filtered POST array and send it to a stored procedure. And rather than building a 200+ line SELECT statement in the application, building a view to encapsulate it. I'm not saying that's the best way to build an application. It's just what I'm most comfortable doing and it works for me.
Rohit wrote: >> (4) Is it faster to work at application level or at the database level? Richard Huxton wrote: > Probably faster in the database, assuming you have only one machine. If > you have more than one machine then you can have each machine designed > for its purpose. Of course, faster to run might be at the cost of more > difficult to write. It might also mean your process code can crash the > database server. Which would you rather have, a wrong answer really quickly, a correct answer slowly, or no answer at all? Be clear on the metric for "faster" before devising strategies. Personally I prefer "more correct" first. As Richard hinted, reliability and stability are usually important also, I'd say more important. "Faster to run" also depends on the degree of client concurrency. Sometimes middleware can scale logic better than the database engine can. Systems can achieve higher aggregate throughput with perhaps a reduction in individual response time. If you put logic where it "belongs" that can help scalability. The database tends to be a central resource for all concurrent clients, so logic in that layer is potentially a bottleneck. Danger in that layer is danger to everyone. Business logic that relies on data but performs session-specific processing should live in the middleware in many cases. One can throw extra hardware at that layer to parallelize client services, and errors tend to not escape a specific client session. Business logic that is actually data-integrity logic could break the system if it were in the middleware layer. Deadlocks, race conditions and other nastiness argue that that type of logic belongs in a more central location, with the data. One can then exploit the database engine capabilities for this stuff. As Richard metioned, there are plenty of boundary cases that require a judgment call. Try to analyze which approach will have more risk in such cases; sometimes that helps discriminate. It ain't always easy; that's why they pay us the big bucks. -- Lew
On 8/16/07, Steve Manes <smanes@magpie.com> wrote: > On 8/15/07, Rohit <rpk.general@gmail.com> wrote: > > Another is because I typically do my web application programming in PHP5 > but the offline scripts in Perl. Both can call the same stored > procedures so I don't have multiple copies of database code to maintain > and debug. I'm also thinking the same thing once before. Doing as much logic in the database and at the application layer most of it would be just a "SELECT .." call. It allows me to experiment with different kind of langguage (PHP, Python, Ruby) without having to rewrite the business logic of the application. But using this kind of approach make it hard to debug and maintain. Changing simple thing would mean updating the database schema and it would always frigthened me some silly mistake can affect the integrity of the data. In the end, I slowly move the logic out of the db and decided on one platform for the application (PHP). -- kamal, www.k4ml.com
Steve Manes wrote: > I'm fairly hardcore about keeping as much business logic as I can in the > database. In fact, I only do SELECTs from the application, and usually > via Views. All inserts, updates and deletes are via procs. ... > And, yes, it's faster. Particularly if business logic decisions have to > be made in context with a transaction. I have a thread I started ages ago over on the PERFORM list that I'm sadly just now being able to provide some insight on. I'll be replying on that thread in more detail, but the short of it turns out to be that at least in this one application, using stored procs for inserts is slowing down the app considerably. The app does high volume inserts and updates, about 16 million rows a day. By switching from stored procs to inline inserts, elapsed time dropped from 2595 seconds to 991 seconds for a test run. So the moral of the story is that, as anyone who has worked professionally for a number of years knows, no magic bullets exist. General guidelines can be helpful, but each scenario must be individually investigated. -- Guy Rouillier
Guy Rouillier wrote: > Steve Manes wrote: >> I'm fairly hardcore about keeping as much business logic as I can in >> the database. In fact, I only do SELECTs from the application, and >> usually via Views. All inserts, updates and deletes are via procs. > ... >> And, yes, it's faster. Particularly if business logic decisions have >> to be made in context with a transaction. > > I have a thread I started ages ago over on the PERFORM list that I'm > sadly just now being able to provide some insight on. I'll be > replying on that thread in more detail, but the short of it turns out > to be that at least in this one application, using stored procs for > inserts is slowing down the app considerably. The app does high > volume inserts and updates, about 16 million rows a day. By switching > from stored procs to inline inserts, elapsed time dropped from 2595 > seconds to 991 seconds for a test run. > > So the moral of the story is that, as anyone who has worked > professionally for a number of years knows, no magic bullets exist. > General guidelines can be helpful, but each scenario must be > individually investigated. > I would suggest separating the strategy of server-side biz rules from the implementation method of stored procedures. You can also implement biz rules as triggers and stick with inline inserts, updates and deletes, which is what we do in my shop. That being said, the enforcement of the biz rules has to be done somewhere, and no matter where it is done it will involve disk reads and validations. It always *seemed* to me that putting the code on a table trigger involves the least possible overhead for doing these things. It provides complete real-time enforcement within the transaction with no network overhead, and has the simplest possible interface: plain old SQL. -- Kenneth Downs Secure Data Software, Inc. www.secdat.com www.andromeda-project.org 631-689-7200 Fax: 631-689-0527 cell: 631-379-0010
Guy Rouillier wrote: > I have a thread I started ages ago over on the PERFORM list that I'm > sadly just now being able to provide some insight on. I'll be replying > on that thread in more detail, but the short of it turns out to be that > at least in this one application, using stored procs for inserts is > slowing down the app considerably. The app does high volume inserts and > updates, about 16 million rows a day. By switching from stored procs to > inline inserts, elapsed time dropped from 2595 seconds to 991 seconds > for a test run. > > So the moral of the story is that, as anyone who has worked > professionally for a number of years knows, no magic bullets exist. > General guidelines can be helpful, but each scenario must be > individually investigated. Absolutely. You can't assume that every application is going to fit neatly into the same development jig. Hope the MVC frameworks zealots don't read that. <g> I worked on a fairly high volume site using PG, an ad trafficking workflow application, which imported ~2 million placements daily from DoubleClick, OAS and Accipiter. Everything had to be imported and the reports run and cached by 8am so the clients stare blankly at 10,000 row Excel charts over their morning coffee. Moving all the application-bound inserts into stored procedures didn't achieve nearly the performance enhancement I'd assumed I'd get, which I figured was due to the overhead of the procs themselves.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/17/07 18:00, Steve Manes wrote: > Guy Rouillier wrote: >> I have a thread I started ages ago over on the PERFORM list that I'm >> sadly just now being able to provide some insight on. I'll be >> replying on that thread in more detail, but the short of it turns out >> to be that at least in this one application, using stored procs for >> inserts is slowing down the app considerably. The app does high >> volume inserts and updates, about 16 million rows a day. By switching >> from stored procs to inline inserts, elapsed time dropped from 2595 >> seconds to 991 seconds for a test run. >> >> So the moral of the story is that, as anyone who has worked >> professionally for a number of years knows, no magic bullets exist. >> General guidelines can be helpful, but each scenario must be >> individually investigated. > > Absolutely. You can't assume that every application is going to fit > neatly into the same development jig. Hope the MVC frameworks zealots > don't read that. <g> > > I worked on a fairly high volume site using PG, an ad trafficking > workflow application, which imported ~2 million placements daily from > DoubleClick, OAS and Accipiter. Everything had to be imported and the > reports run and cached by 8am so the clients stare blankly at 10,000 row > Excel charts over their morning coffee. > > Moving all the application-bound inserts into stored procedures didn't > achieve nearly the performance enhancement I'd assumed I'd get, which I > figured was due to the overhead of the procs themselves. Would that be because the original app was written in a compiled language, but the SPs in an interpreted language? - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGxjK1S9HxQb37XmcRAnYWAKCRV+INrpvl83lhnt4iadIMrBNIRgCgr8J2 UK3F87ji/24mrISLl+WmLnY= =5csM -----END PGP SIGNATURE-----
Ron Johnson wrote: >> Moving all the application-bound inserts into stored procedures didn't >> achieve nearly the performance enhancement I'd assumed I'd get, which I >> figured was due to the overhead of the procs themselves. > > Would that be because the original app was written in a compiled > language, but the SPs in an interpreted language? No, because the application language was Perl5 for both. I think it was just the overhead of 2 million inserts via procs versus 2 million inline inserts (without the proc overhead).
Steve Manes wrote: > Moving all the application-bound inserts into stored procedures didn't > achieve nearly the performance enhancement I'd assumed I'd get, which I > figured was due to the overhead of the procs themselves. That's the conclusion I'm coming to as well for my app with very high insert/update rates. Is there a PG knowledgebase where this kind of information might be useful? -- Guy Rouillier
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/17/07 21:45, Steve Manes wrote: > Ron Johnson wrote: >>> Moving all the application-bound inserts into stored procedures didn't >>> achieve nearly the performance enhancement I'd assumed I'd get, which I >>> figured was due to the overhead of the procs themselves. >> >> Would that be because the original app was written in a compiled >> language, but the SPs in an interpreted language? > > No, because the application language was Perl5 for both. I think it was > just the overhead of 2 million inserts via procs versus 2 million inline > inserts (without the proc overhead). Interesting. Does PG have to initiate the Perl interpreter every time you call a Perl-written SP? - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGxwiAS9HxQb37XmcRAgi2AJ9Yq2drImecZVTbZR0Wo4VKlpaiXwCgpjHo 8KcDWqDpW6BWNWCj+ZUQFU4= =Qlpg -----END PGP SIGNATURE-----
On 8/18/07, Ron Johnson <ron.l.johnson@cox.net> wrote: > Interesting. Does PG have to initiate the Perl interpreter every > time you call a Perl-written SP? IIRC PostgreSQL should only load the perl interpreter once per session. - Josh
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Josh Tolley wrote: > On 8/18/07, Ron Johnson <ron.l.johnson@cox.net> wrote: >> Interesting. Does PG have to initiate the Perl interpreter every >> time you call a Perl-written SP? > > IIRC PostgreSQL should only load the perl interpreter once per session. Right. Joshua D. Drake > > - Josh > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGxxmFATb/zqfZUUQRAoL+AJ0cyZ3FaB82PGbGvC/fPYWPo9UcjwCgjiDo U5XCNKFZhhdiJ3Lf5x68lKQ= =4HMG -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/18/07 11:08, Joshua D. Drake wrote: > Josh Tolley wrote: >> On 8/18/07, Ron Johnson <ron.l.johnson@cox.net> wrote: >>> Interesting. Does PG have to initiate the Perl interpreter every >>> time you call a Perl-written SP? >> IIRC PostgreSQL should only load the perl interpreter once per session. > > Right. I'd have been stunned if you did it any other way. So why is Perl-SP-INSERT so much slower than Perl-SQL-INSERT? (I can imagine that the SP code path would be longer, but since IO is the slowest part of the system, I'm surprised that it's *that* much slower.) - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGxx5jS9HxQb37XmcRAn1vAKCEWTovgcj/w/uFVK0nankGdbOFuACg6HSq dUnMN0dPsdQ8NKSDW0EahcU= =P8gN -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ron Johnson wrote: > On 08/18/07 11:08, Joshua D. Drake wrote: >> Josh Tolley wrote: >>> On 8/18/07, Ron Johnson <ron.l.johnson@cox.net> wrote: >>>> Interesting. Does PG have to initiate the Perl interpreter every >>>> time you call a Perl-written SP? >>> IIRC PostgreSQL should only load the perl interpreter once per session. >> Right. > > I'd have been stunned if you did it any other way. > > So why is Perl-SP-INSERT so much slower than Perl-SQL-INSERT? > > (I can imagine that the SP code path would be longer, but since IO > is the slowest part of the system, I'm surprised that it's *that* > much slower.) We would have to see his script as there are several variables at hand here. 1. Is he using do or execute with his perl-sql? If he is using execute then he is getting the advantage of prepared query inserts. 2. Did he use a single transaction? If so then he is not paying for 2 million commits, which he would be with the procedure. Sincerely, Joshua D. Drake - ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGxzFYATb/zqfZUUQRAvUWAKCHm4EOtmrblRcnoVLVSEA0/Hd1JQCfTbgu 4tRGmCvqZSuBsGvQyUOQBzU= =edfC -----END PGP SIGNATURE-----
Ron Johnson wrote: > Interesting. Does PG have to initiate the Perl interpreter every > time you call a Perl-written SP? I mean the *application* language was Perl for both the inline insert and the proc call. The proc was written in plpgsql.
Ron Johnson wrote: > So why is Perl-SP-INSERT so much slower than Perl-SQL-INSERT? > > (I can imagine that the SP code path would be longer, but since IO > is the slowest part of the system, I'm surprised that it's *that* > much slower.) I'm guessing that since PG allows overloaded SP names, the slowness is coming from resolving which SP to run. But that is just a guess. In my environment, I don't *have* overloaded SPs, only a single version of a given name. But when I was doing the conversion from Oracle, it took me a couple tries to get the SP signatures correct, so I'm pretty sure PG is still going through the resolution logic, even if you only have a single instance of a given name. -- Guy Rouillier