Обсуждение: PostgreSQL Function Language Performance: C vs PL/PGSQL
I have been Googling for answers on this for a while, and have not been able to find anything satisfactory.
Imagine that you have a stored procedure which is currently written using PL/PGSQL. This stored procedure performs lots of long, complex SQL queries (95% SELECT statements, 5% INSERT or UPDATE) and these queries are interspersed with some minor math and some control logic, along with some logging through the use of RAISE. Each logging statement is inside an IF/THEN which just checks a boolean flag to determine if logging is turned on. The function returns a set of cursors to several different result sets. The function is 50%-60% SQL queries and the rest is logging, control logic, and little bit of math.
Would a query such as this obtain any performance improvement by being re-written using C?
Are there specific cases where writing a function in C would be highly desirable verses using PL/PGSQL (aside from simply gaining access to functionality not present in PL/PGSQL)?
Are there specific cases where writing a function in C would be slower than writing the equivalent in PL/PGSQL?
Basically, I am looking for some guidelines based primarily on performance of when I should use C to write a function verses using PL/PGSQL.
Can anybody quantify any of the performance differences between doing a particular task in C verses doing the same thing in PL/PGSQL? For example, performing a SELECT query or executing a certain number of lines of control logic (primarily IF/THEN, but an occasional loop included)? How about assignments or basic math like addition/subtraction/multiplication/division?
When executing SQL queries inside a C-based function, is there any way to have all of the SQL queries pre-planned through the compilation process, definition of the function, and loading of the .so file similar to PL/PGSQL? Would I get better performance writing each SQL query as a stored procedure and then call these stored procedures from within a C-based function which does the logging, math, control logic, and builds the result sets and cursors?
Thanks in advance for any answers anyone can provide to these questions.
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote: > Would a query such as this obtain any performance improvement by being > re-written using C? I wouldn't expect the queries called by the pl/pgsql function to be much faster if called through SPI from C instead. I think the question you need to answer is- how long does the pl/pgsql code take vs. the overall time the function takes as a whole? You could then consider that your 'max benefit' (or pretty close to it) which could be gained by rewriting it in C. > Are there specific cases where writing a function in C would be highly > desirable verses using PL/PGSQL (aside from simply gaining access to > functionality not present in PL/PGSQL)? Cases where a function is called over and over again, or there are loops which go through tons of data, or there's alot of data processing to be done. > Are there specific cases where writing a function in C would be slower than > writing the equivalent in PL/PGSQL? Probably not- provided the C code is written correctly. You can certainly screw that up (eg: not preparing a query in C and having PG replan it every time would probably chew up any advantage C has over pl/pgsql, in a simple function). > Basically, I am looking for some guidelines based primarily on performance > of when I should use C to write a function verses using PL/PGSQL. Realize that C functions have alot of other issues associated with them- typically they're much larger foot-guns, for one, for another, C is an untrusted language because it can do all kinds of bad things. So you have to be a superuser to create them. > Can anybody quantify any of the performance differences between doing a > particular task in C verses doing the same thing in PL/PGSQL? For example, > performing a SELECT query or executing a certain number of lines of control > logic (primarily IF/THEN, but an occasional loop included)? How about > assignments or basic math like > addition/subtraction/multiplication/division? Actually performing a SELECT through SPI vs. calling it from pl/pgsql probably won't result in that much difference, presuming most of the time there is in the actual query itself. Assignments, basic math, control logic, etc, will all be faster in C. You need to figure out if that work is taking enough time to justify the switch though. > When executing SQL queries inside a C-based function, is there any way to > have all of the SQL queries pre-planned through the compilation process, > definition of the function, and loading of the .so file similar to PL/PGSQL? You might be able to do that when the module is loaded, but I'm not 100% sure.. Depends on if you can start using SPI in _PG_init.. I think there was some discussion about that recently but I'm not sure what the answer was. > Would I get better performance writing each SQL query as a stored procedure > and then call these stored procedures from within a C-based function which > does the logging, math, control logic, and builds the result sets and > cursors? Uhh, I'd guess 'no' to that one. Thanks, Stephen
Вложения
Thanks for the quick follow-up. So, you are saying that if I can do SPI in _PG_init, then I could prepare all my queries there and they would be prepared once for the entire function when it is loaded? That would certainly achieve what I want. Does anybody know whether I can do SPI in _PG_init?
The function gets called a lot, but not in the same transaction. It is only called once per transaction.
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
On Wed, May 26, 2010 at 12:18 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:I wouldn't expect the queries called by the pl/pgsql function to be much
> Would a query such as this obtain any performance improvement by being
> re-written using C?
faster if called through SPI from C instead. I think the question you
need to answer is- how long does the pl/pgsql code take vs. the overall
time the function takes as a whole? You could then consider that your
'max benefit' (or pretty close to it) which could be gained by rewriting
it in C.Cases where a function is called over and over again, or there are loops
> Are there specific cases where writing a function in C would be highly
> desirable verses using PL/PGSQL (aside from simply gaining access to
> functionality not present in PL/PGSQL)?
which go through tons of data, or there's alot of data processing to be
done.Probably not- provided the C code is written correctly. You can
> Are there specific cases where writing a function in C would be slower than
> writing the equivalent in PL/PGSQL?
certainly screw that up (eg: not preparing a query in C and having PG
replan it every time would probably chew up any advantage C has over
pl/pgsql, in a simple function).Realize that C functions have alot of other issues associated with them-
> Basically, I am looking for some guidelines based primarily on performance
> of when I should use C to write a function verses using PL/PGSQL.
typically they're much larger foot-guns, for one, for another, C is an
untrusted language because it can do all kinds of bad things. So you
have to be a superuser to create them.Actually performing a SELECT through SPI vs. calling it from pl/pgsql
> Can anybody quantify any of the performance differences between doing a
> particular task in C verses doing the same thing in PL/PGSQL? For example,
> performing a SELECT query or executing a certain number of lines of control
> logic (primarily IF/THEN, but an occasional loop included)? How about
> assignments or basic math like
> addition/subtraction/multiplication/division?
probably won't result in that much difference, presuming most of the
time there is in the actual query itself. Assignments, basic math,
control logic, etc, will all be faster in C. You need to figure out if
that work is taking enough time to justify the switch though.You might be able to do that when the module is loaded, but I'm not 100%
> When executing SQL queries inside a C-based function, is there any way to
> have all of the SQL queries pre-planned through the compilation process,
> definition of the function, and loading of the .so file similar to PL/PGSQL?
sure.. Depends on if you can start using SPI in _PG_init.. I think
there was some discussion about that recently but I'm not sure what the
answer was.Uhh, I'd guess 'no' to that one.
> Would I get better performance writing each SQL query as a stored procedure
> and then call these stored procedures from within a C-based function which
> does the logging, math, control logic, and builds the result sets and
> cursors?
Thanks,
Stephen
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iEYEARECAAYFAkv9Sd8ACgkQrzgMPqB3kihj/gCdEIA8DhnvZX4Hz3tof6yzLscS
Lf8An2Xp8R/KXnkmp8uWg+84Cz7Pp7R3
=AX4g
-----END PGP SIGNATURE-----
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote: > Thanks for the quick follow-up. So, you are saying that if I can do SPI in > _PG_init, then I could prepare all my queries there and they would be > prepared once for the entire function when it is loaded? That would > certainly achieve what I want. Does anybody know whether I can do SPI in > _PG_init? Unless you're using EXECUTE in your pl/pgsql, the queries in your pl/pgsql function are already getting prepared on the first call of the function for a given backend connection.. If you're using EXECUTE in pl/gpsql then your problem might be planning time. Moving that to C isn't going to change things as much as you might hope if you still have to plan the query every time you call it.. > The function gets called a lot, but not in the same transaction. It is only > called once per transaction. That's not really relevant.. Is it called alot from the same backend/database connection? If so, and if you're using regular SELECT statements and the like (not EXECUTE), then they're getting prepared the first time they're used and that is kept across transactions. Thanks, Stephen
Вложения
Ah, that clears things up. Yes, the connections are more or less persistent. I have a connection manager which doles connections out to the worker threads and reclaims them when the workers are done with them. It dynamically adds new connections based on load. Each worker obtains a connection from the connection manager, performs a transaction which involves executing the function and pulling back the results from the cursors, then releases the connection back to the connection manager for other workers to use. So, this means that even when written in C, the SQL queries will be planned and cached on each connection after the first execution. So, I guess the question just becomes whether using SPI in C has any extra overhead verses using PL/PGSQL which might make it slower for performing queries. Since PostgreSQL is written in C, I assume there is no such additional overhead. I assume that the PL/PGSQL implementation at its heart also uses SPI to perform those executions. Is that a fair statement?
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
On Wed, May 26, 2010 at 12:32 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:> Thanks for the quick follow-up. So, you are saying that if I can do SPI inUnless you're using EXECUTE in your pl/pgsql, the queries in your
> _PG_init, then I could prepare all my queries there and they would be
> prepared once for the entire function when it is loaded? That would
> certainly achieve what I want. Does anybody know whether I can do SPI in
> _PG_init?
pl/pgsql function are already getting prepared on the first call of the
function for a given backend connection.. If you're using EXECUTE in
pl/gpsql then your problem might be planning time. Moving that to C
isn't going to change things as much as you might hope if you still have
to plan the query every time you call it..That's not really relevant.. Is it called alot from the same
> The function gets called a lot, but not in the same transaction. It is only
> called once per transaction.
backend/database connection? If so, and if you're using regular SELECT
statements and the like (not EXECUTE), then they're getting prepared the
first time they're used and that is kept across transactions.
Thanks,
Stephen
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iEYEARECAAYFAkv9TTMACgkQrzgMPqB3kijiNQCfY/wTud+VZ4Z53Lw8cNY/N9ZD
0R4AnA4diz1aptFGYXh3j8N9/k96C7/S
=6oz+
-----END PGP SIGNATURE-----
--
Eliot Gable
"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower
"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower
"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote: > Since PostgreSQL is written in C, I assume there is no > such additional overhead. I assume that the PL/PGSQL implementation at its > heart also uses SPI to perform those executions. Is that a fair statement? Right, but I also wouldn't expect a huge improvment either, unless you're calling these queries a ton, or the queries that you're calling from the pl/pgsql are pretty short-lived. Don't get me wrong, C is going to be faster, but it depends on exactly what's going on as to if it's going to be an overall improvment of, say, 10%, or a 10-fold improvment. :) Thanks, Stephen
Вложения
On 5/26/10 9:47 AM, Stephen Frost wrote: > * Eliot Gable (egable+pgsql-performance@gmail.com) wrote: >> Since PostgreSQL is written in C, I assume there is no >> such additional overhead. I assume that the PL/PGSQL implementation at its >> heart also uses SPI to perform those executions. Is that a fair statement? > > Right, but I also wouldn't expect a huge improvment either, unless > you're calling these queries a ton, or the queries that you're calling > from the pl/pgsql are pretty short-lived. > > Don't get me wrong, C is going to be faster, but it depends on exactly > what's going on as to if it's going to be an overall improvment of, say, > 10%, or a 10-fold improvment. :) Or a 0.1% improvement, which is more likely. Or that the PL/PGSQL version is even faster than the C version, because ifyou do any string regexp in your function, Perl has extremely efficient algorithms, probably better than you have timeto write in C. We use Perl extensively and have never had any complaints. The database activity completely dominates all queries, and theperformance of Perl has never even been noticable. We use a C functions for a few things, and it is a big nuisance. Every time you upgrade Postgres or your OS, there's a chancethe recompile will fail because of changed header files. Any bugs in your code crash Postgres itself. We avoid Cas much as possible (and I love C, been doing it since 1984). Craig
On Wed, May 26, 2010 at 12:41 PM, Eliot Gable <egable+pgsql-performance@gmail.com> wrote: > Ah, that clears things up. Yes, the connections are more or less persistent. > I have a connection manager which doles connections out to the worker > threads and reclaims them when the workers are done with them. It > dynamically adds new connections based on load. Each worker obtains a > connection from the connection manager, performs a transaction which > involves executing the function and pulling back the results from the > cursors, then releases the connection back to the connection manager for > other workers to use. So, this means that even when written in C, the SQL > queries will be planned and cached on each connection after the first > execution. So, I guess the question just becomes whether using SPI in C has > any extra overhead verses using PL/PGSQL which might make it slower for > performing queries. Since PostgreSQL is written in C, I assume there is no > such additional overhead. I assume that the PL/PGSQL implementation at its > heart also uses SPI to perform those executions. Is that a fair statement? At best, if you are a ninja with the marginally documented backend api, you will create code that goes about as fast as your pl/pgsql function for 10 times the amount of input work, unless there are heavy amounts of 'other than sql' code in your function. The reason to write C in the backend is: *) Interface w/3rd party libraries w/C linkage *) Do things that are illegal in regular SQL (write files, etc) *) Make custom types Things like that. If your pl/pgsql function is running slow, it's probably better to look at what's going on there. merlin
On Fri, 28 May 2010, Merlin Moncure wrote: > At best, if you are a ninja with the marginally documented backend > api, you will create code that goes about as fast as your pl/pgsql > function for 10 times the amount of input work, unless there are heavy > amounts of 'other than sql' code in your function. The reason to > write C in the backend is: > > *) Interface w/3rd party libraries w/C linkage > *) Do things that are illegal in regular SQL (write files, etc) > *) Make custom types The major case I found when writing pl/pgsql was when trying to build arrays row by row. AFAIK when I tried it, adding a row to an array caused the whole array to be copied, which put a bit of a damper on performance. Matthew -- "The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary." - James Nicoll
* Matthew Wakeling (matthew@flymine.org) wrote: > The major case I found when writing pl/pgsql was when trying to build > arrays row by row. AFAIK when I tried it, adding a row to an array caused > the whole array to be copied, which put a bit of a damper on performance. Using the built-ins now available in 8.4 (array_agg), that copying doesn't happen any more. Thanks, Stephen
Вложения
On Tue, 1 Jun 2010, Stephen Frost wrote: > * Matthew Wakeling (matthew@flymine.org) wrote: >> The major case I found when writing pl/pgsql was when trying to build >> arrays row by row. AFAIK when I tried it, adding a row to an array caused >> the whole array to be copied, which put a bit of a damper on performance. > > Using the built-ins now available in 8.4 (array_agg), that copying > doesn't happen any more. Thanks. I had wondered if that had been improved. Matthew -- Our riverbanks and seashores have a beauty all can share, provided there's at least one boot, three treadless tyres, a half-eaten pork pie, some oil drums, an old felt hat, a lorry-load of tar blocks, and a broken bedstead there. -- Flanders and Swann
On Tue, Jun 1, 2010 at 8:59 AM, Matthew Wakeling <matthew@flymine.org> wrote: > On Tue, 1 Jun 2010, Stephen Frost wrote: >> >> * Matthew Wakeling (matthew@flymine.org) wrote: >>> >>> The major case I found when writing pl/pgsql was when trying to build >>> arrays row by row. AFAIK when I tried it, adding a row to an array caused >>> the whole array to be copied, which put a bit of a damper on performance. >> >> Using the built-ins now available in 8.4 (array_agg), that copying >> doesn't happen any more. > > Thanks. I had wondered if that had been improved. even better is array(query) -- which has been around for a while. not too many people know about it because it's syntactically weird but it's the preferred way to build arrays when you don't need true aggregation (group by and such). generally speaking, concatenation of any kind in loops should be avoided in pl/pgsql. in fact, whenever writing pl/pgsql, it's all to easy to over-use the loop construct...every time you're looping it's always good to ask yourself: 'can this be done in a query?'. merlin