Обсуждение: PostgreSQL Function Language Performance: C vs PL/PGSQL

Поиск
Список
Период
Сортировка

PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Eliot Gable
Дата:
I have been Googling for answers on this for a while, and have not been able to find anything satisfactory.

Imagine that you have a stored procedure which is currently written using PL/PGSQL. This stored procedure performs lots of long, complex SQL queries (95% SELECT statements, 5% INSERT or UPDATE) and these queries are interspersed with some minor math and some control logic, along with some logging through the use of RAISE. Each logging statement is inside an IF/THEN which just checks a boolean flag to determine if logging is turned on. The function returns a set of cursors to several different result sets. The function is 50%-60% SQL queries and the rest is logging, control logic, and little bit of math. 

Would a query such as this obtain any performance improvement by being re-written using C?

Are there specific cases where writing a function in C would be highly desirable verses using PL/PGSQL (aside from simply gaining access to functionality not present in PL/PGSQL)?

Are there specific cases where writing a function in C would be slower than writing the equivalent in PL/PGSQL?

Basically, I am looking for some guidelines based primarily on performance of when I should use C to write a function verses using PL/PGSQL. 

Can anybody quantify any of the performance differences between doing a particular task in C verses doing the same thing in PL/PGSQL? For example, performing a SELECT query or executing a certain number of lines of control logic (primarily IF/THEN, but an occasional loop included)? How about assignments or basic math like addition/subtraction/multiplication/division? 

When executing SQL queries inside a C-based function, is there any way to have all of the SQL queries pre-planned through the compilation process, definition of the function, and loading of the .so file similar to PL/PGSQL? Would I get better performance writing each SQL query as a stored procedure and then call these stored procedures from within a C-based function which does the logging, math, control logic, and builds the result sets and cursors?

Thanks in advance for any answers anyone can provide to these questions.

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Stephen Frost
Дата:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:
> Would a query such as this obtain any performance improvement by being
> re-written using C?

I wouldn't expect the queries called by the pl/pgsql function to be much
faster if called through SPI from C instead.  I think the question you
need to answer is- how long does the pl/pgsql code take vs. the overall
time the function takes as a whole?  You could then consider that your
'max benefit' (or pretty close to it) which could be gained by rewriting
it in C.

> Are there specific cases where writing a function in C would be highly
> desirable verses using PL/PGSQL (aside from simply gaining access to
> functionality not present in PL/PGSQL)?

Cases where a function is called over and over again, or there are loops
which go through tons of data, or there's alot of data processing to be
done.

> Are there specific cases where writing a function in C would be slower than
> writing the equivalent in PL/PGSQL?

Probably not- provided the C code is written correctly.  You can
certainly screw that up (eg: not preparing a query in C and having PG
replan it every time would probably chew up any advantage C has over
pl/pgsql, in a simple function).

> Basically, I am looking for some guidelines based primarily on performance
> of when I should use C to write a function verses using PL/PGSQL.

Realize that C functions have alot of other issues associated with them-
typically they're much larger foot-guns, for one, for another, C is an
untrusted language because it can do all kinds of bad things.  So you
have to be a superuser to create them.

> Can anybody quantify any of the performance differences between doing a
> particular task in C verses doing the same thing in PL/PGSQL? For example,
> performing a SELECT query or executing a certain number of lines of control
> logic (primarily IF/THEN, but an occasional loop included)? How about
> assignments or basic math like
> addition/subtraction/multiplication/division?

Actually performing a SELECT through SPI vs. calling it from pl/pgsql
probably won't result in that much difference, presuming most of the
time there is in the actual query itself.  Assignments, basic math,
control logic, etc, will all be faster in C.  You need to figure out if
that work is taking enough time to justify the switch though.

> When executing SQL queries inside a C-based function, is there any way to
> have all of the SQL queries pre-planned through the compilation process,
> definition of the function, and loading of the .so file similar to PL/PGSQL?

You might be able to do that when the module is loaded, but I'm not 100%
sure..  Depends on if you can start using SPI in _PG_init..  I think
there was some discussion about that recently but I'm not sure what the
answer was.

> Would I get better performance writing each SQL query as a stored procedure
> and then call these stored procedures from within a C-based function which
> does the logging, math, control logic, and builds the result sets and
> cursors?

Uhh, I'd guess 'no' to that one.

    Thanks,

        Stephen

Вложения

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Eliot Gable
Дата:
Thanks for the quick follow-up. So, you are saying that if I can do SPI in _PG_init, then I could prepare all my queries there and they would be prepared once for the entire function when it is loaded? That would certainly achieve what I want. Does anybody know whether I can do SPI in _PG_init?

The function gets called a lot, but not in the same transaction. It is only called once per transaction. 

On Wed, May 26, 2010 at 12:18 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:
> Would a query such as this obtain any performance improvement by being
> re-written using C?

I wouldn't expect the queries called by the pl/pgsql function to be much
faster if called through SPI from C instead.  I think the question you
need to answer is- how long does the pl/pgsql code take vs. the overall
time the function takes as a whole?  You could then consider that your
'max benefit' (or pretty close to it) which could be gained by rewriting
it in C.

> Are there specific cases where writing a function in C would be highly
> desirable verses using PL/PGSQL (aside from simply gaining access to
> functionality not present in PL/PGSQL)?

Cases where a function is called over and over again, or there are loops
which go through tons of data, or there's alot of data processing to be
done.

> Are there specific cases where writing a function in C would be slower than
> writing the equivalent in PL/PGSQL?

Probably not- provided the C code is written correctly.  You can
certainly screw that up (eg: not preparing a query in C and having PG
replan it every time would probably chew up any advantage C has over
pl/pgsql, in a simple function).

> Basically, I am looking for some guidelines based primarily on performance
> of when I should use C to write a function verses using PL/PGSQL.

Realize that C functions have alot of other issues associated with them-
typically they're much larger foot-guns, for one, for another, C is an
untrusted language because it can do all kinds of bad things.  So you
have to be a superuser to create them.

> Can anybody quantify any of the performance differences between doing a
> particular task in C verses doing the same thing in PL/PGSQL? For example,
> performing a SELECT query or executing a certain number of lines of control
> logic (primarily IF/THEN, but an occasional loop included)? How about
> assignments or basic math like
> addition/subtraction/multiplication/division?

Actually performing a SELECT through SPI vs. calling it from pl/pgsql
probably won't result in that much difference, presuming most of the
time there is in the actual query itself.  Assignments, basic math,
control logic, etc, will all be faster in C.  You need to figure out if
that work is taking enough time to justify the switch though.

> When executing SQL queries inside a C-based function, is there any way to
> have all of the SQL queries pre-planned through the compilation process,
> definition of the function, and loading of the .so file similar to PL/PGSQL?

You might be able to do that when the module is loaded, but I'm not 100%
sure..  Depends on if you can start using SPI in _PG_init..  I think
there was some discussion about that recently but I'm not sure what the
answer was.

> Would I get better performance writing each SQL query as a stored procedure
> and then call these stored procedures from within a C-based function which
> does the logging, math, control logic, and builds the result sets and
> cursors?

Uhh, I'd guess 'no' to that one.

       Thanks,

               Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkv9Sd8ACgkQrzgMPqB3kihj/gCdEIA8DhnvZX4Hz3tof6yzLscS
Lf8An2Xp8R/KXnkmp8uWg+84Cz7Pp7R3
=AX4g
-----END PGP SIGNATURE-----




--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Stephen Frost
Дата:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:
> Thanks for the quick follow-up. So, you are saying that if I can do SPI in
> _PG_init, then I could prepare all my queries there and they would be
> prepared once for the entire function when it is loaded? That would
> certainly achieve what I want. Does anybody know whether I can do SPI in
> _PG_init?

Unless you're using EXECUTE in your pl/pgsql, the queries in your
pl/pgsql function are already getting prepared on the first call of the
function for a given backend connection..  If you're using EXECUTE in
pl/gpsql then your problem might be planning time.  Moving that to C
isn't going to change things as much as you might hope if you still have
to plan the query every time you call it..

> The function gets called a lot, but not in the same transaction. It is only
> called once per transaction.

That's not really relevant..  Is it called alot from the same
backend/database connection?  If so, and if you're using regular SELECT
statements and the like (not EXECUTE), then they're getting prepared the
first time they're used and that is kept across transactions.

    Thanks,

        Stephen

Вложения

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Eliot Gable
Дата:
Ah, that clears things up. Yes, the connections are more or less persistent. I have a connection manager which doles connections out to the worker threads and reclaims them when the workers are done with them. It dynamically adds new connections based on load. Each worker obtains a connection from the connection manager, performs a transaction which involves executing the function and pulling back the results from the cursors, then releases the connection back to the connection manager for other workers to use. So, this means that even when written in C, the SQL queries will be planned and cached on each connection after the first execution. So, I guess the question just becomes whether using SPI in C has any extra overhead verses using PL/PGSQL which might make it slower for performing queries. Since PostgreSQL is written in C, I assume there is no such additional overhead. I assume that the PL/PGSQL implementation at its heart also uses SPI to perform those executions. Is that a fair statement?

On Wed, May 26, 2010 at 12:32 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:
> Thanks for the quick follow-up. So, you are saying that if I can do SPI in
> _PG_init, then I could prepare all my queries there and they would be
> prepared once for the entire function when it is loaded? That would
> certainly achieve what I want. Does anybody know whether I can do SPI in
> _PG_init?

Unless you're using EXECUTE in your pl/pgsql, the queries in your
pl/pgsql function are already getting prepared on the first call of the
function for a given backend connection..  If you're using EXECUTE in
pl/gpsql then your problem might be planning time.  Moving that to C
isn't going to change things as much as you might hope if you still have
to plan the query every time you call it..

> The function gets called a lot, but not in the same transaction. It is only
> called once per transaction.

That's not really relevant..  Is it called alot from the same
backend/database connection?  If so, and if you're using regular SELECT
statements and the like (not EXECUTE), then they're getting prepared the
first time they're used and that is kept across transactions.

       Thanks,

               Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkv9TTMACgkQrzgMPqB3kijiNQCfY/wTud+VZ4Z53Lw8cNY/N9ZD
0R4AnA4diz1aptFGYXh3j8N9/k96C7/S
=6oz+
-----END PGP SIGNATURE-----




--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Stephen Frost
Дата:
* Eliot Gable (egable+pgsql-performance@gmail.com) wrote:
> Since PostgreSQL is written in C, I assume there is no
> such additional overhead. I assume that the PL/PGSQL implementation at its
> heart also uses SPI to perform those executions. Is that a fair statement?

Right, but I also wouldn't expect a huge improvment either, unless
you're calling these queries a ton, or the queries that you're calling
from the pl/pgsql are pretty short-lived.

Don't get me wrong, C is going to be faster, but it depends on exactly
what's going on as to if it's going to be an overall improvment of, say,
10%, or a 10-fold improvment. :)

    Thanks,

        Stephen

Вложения

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Craig James
Дата:
On 5/26/10 9:47 AM, Stephen Frost wrote:
> * Eliot Gable (egable+pgsql-performance@gmail.com) wrote:
>> Since PostgreSQL is written in C, I assume there is no
>> such additional overhead. I assume that the PL/PGSQL implementation at its
>> heart also uses SPI to perform those executions. Is that a fair statement?
>
> Right, but I also wouldn't expect a huge improvment either, unless
> you're calling these queries a ton, or the queries that you're calling
> from the pl/pgsql are pretty short-lived.
>
> Don't get me wrong, C is going to be faster, but it depends on exactly
> what's going on as to if it's going to be an overall improvment of, say,
> 10%, or a 10-fold improvment. :)

Or a 0.1% improvement, which is more likely.  Or that the PL/PGSQL version is even faster than the C version, because
ifyou do any string regexp in your function, Perl has extremely efficient algorithms, probably better than you have
timeto write in C. 

We use Perl extensively and have never had any complaints.  The database activity completely dominates all queries, and
theperformance of Perl has never even been noticable. 

We use a C functions for a few things, and it is a big nuisance.  Every time you upgrade Postgres or your OS, there's a
chancethe recompile will fail because of changed header files.  Any bugs in your code crash Postgres itself.  We avoid
Cas much as possible (and I love C, been doing it since 1984). 

Craig

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Merlin Moncure
Дата:
On Wed, May 26, 2010 at 12:41 PM, Eliot Gable
<egable+pgsql-performance@gmail.com> wrote:
> Ah, that clears things up. Yes, the connections are more or less persistent.
> I have a connection manager which doles connections out to the worker
> threads and reclaims them when the workers are done with them. It
> dynamically adds new connections based on load. Each worker obtains a
> connection from the connection manager, performs a transaction which
> involves executing the function and pulling back the results from the
> cursors, then releases the connection back to the connection manager for
> other workers to use. So, this means that even when written in C, the SQL
> queries will be planned and cached on each connection after the first
> execution. So, I guess the question just becomes whether using SPI in C has
> any extra overhead verses using PL/PGSQL which might make it slower for
> performing queries. Since PostgreSQL is written in C, I assume there is no
> such additional overhead. I assume that the PL/PGSQL implementation at its
> heart also uses SPI to perform those executions. Is that a fair statement?

At best, if you are a ninja with the marginally documented backend
api, you will create code that goes about as fast as your pl/pgsql
function for 10 times the amount of input work, unless there are heavy
amounts of 'other than sql' code in your function.  The reason to
write C in the backend is:

*) Interface w/3rd party libraries w/C linkage
*) Do things that are illegal in regular SQL (write files, etc)
*) Make custom types

Things like that.  If your pl/pgsql function is running slow, it's
probably better to look at what's going on there.

merlin

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Matthew Wakeling
Дата:
On Fri, 28 May 2010, Merlin Moncure wrote:
> At best, if you are a ninja with the marginally documented backend
> api, you will create code that goes about as fast as your pl/pgsql
> function for 10 times the amount of input work, unless there are heavy
> amounts of 'other than sql' code in your function.  The reason to
> write C in the backend is:
>
> *) Interface w/3rd party libraries w/C linkage
> *) Do things that are illegal in regular SQL (write files, etc)
> *) Make custom types

The major case I found when writing pl/pgsql was when trying to build
arrays row by row. AFAIK when I tried it, adding a row to an array caused
the whole array to be copied, which put a bit of a damper on performance.

Matthew

--
 "The problem with defending the purity of the English language is that
 English is about as pure as a cribhouse whore. We don't just borrow words;
 on occasion, English has pursued other languages down alleyways to beat
 them unconscious and rifle their pockets for new vocabulary." - James Nicoll

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Stephen Frost
Дата:
* Matthew Wakeling (matthew@flymine.org) wrote:
> The major case I found when writing pl/pgsql was when trying to build
> arrays row by row. AFAIK when I tried it, adding a row to an array caused
> the whole array to be copied, which put a bit of a damper on performance.

Using the built-ins now available in 8.4 (array_agg), that copying
doesn't happen any more.

    Thanks,

        Stephen

Вложения

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Matthew Wakeling
Дата:
On Tue, 1 Jun 2010, Stephen Frost wrote:
> * Matthew Wakeling (matthew@flymine.org) wrote:
>> The major case I found when writing pl/pgsql was when trying to build
>> arrays row by row. AFAIK when I tried it, adding a row to an array caused
>> the whole array to be copied, which put a bit of a damper on performance.
>
> Using the built-ins now available in 8.4 (array_agg), that copying
> doesn't happen any more.

Thanks. I had wondered if that had been improved.

Matthew

--
 Our riverbanks and seashores have a beauty all can share, provided
 there's at least one boot, three treadless tyres, a half-eaten pork
 pie, some oil drums, an old felt hat, a lorry-load of tar blocks,
 and a broken bedstead there.                   -- Flanders and Swann

Re: PostgreSQL Function Language Performance: C vs PL/PGSQL

От
Merlin Moncure
Дата:
On Tue, Jun 1, 2010 at 8:59 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> On Tue, 1 Jun 2010, Stephen Frost wrote:
>>
>> * Matthew Wakeling (matthew@flymine.org) wrote:
>>>
>>> The major case I found when writing pl/pgsql was when trying to build
>>> arrays row by row. AFAIK when I tried it, adding a row to an array caused
>>> the whole array to be copied, which put a bit of a damper on performance.
>>
>> Using the built-ins now available in 8.4 (array_agg), that copying
>> doesn't happen any more.
>
> Thanks. I had wondered if that had been improved.

even better is array(query) -- which has been around for a while. not
too many people know about it because it's syntactically weird but
it's the preferred way to build arrays when you don't need true
aggregation (group by and such).

generally speaking, concatenation of any kind in loops should be
avoided in pl/pgsql.  in fact, whenever writing pl/pgsql, it's all to
easy to over-use the loop construct...every time you're looping it's
always good to ask yourself: 'can this be done in a query?'.

merlin