Discussion: Performance Tuning Large PL/PGSQL Stored Procedure


Performance Tuning Large PL/PGSQL Stored Procedure

From:
Eliot Gable
Date:
I would greatly appreciate any advice anyone could give me in terms of performance tuning a large PL/PGSQL stored procedure. First, I should point out that I have read a considerable amount of information in the online PostgreSQL documentation and on Google about optimizing SQL queries and PostgreSQL. I am looking for any additional insights that my research may have overlooked. So, let me explain a little about how this stored procedure is constructed.

The stored procedure is written in PL/PGSQL and is 3,000+ lines long. It works with around 60 tables and a dozen or so complex types that are defined in an additional 2,000 lines of SQL.

The procedure takes individual arguments of various types as input parameters and returns a single row result of a complex type.

The complex type contains roughly 25 fields, mostly text, plus another 10 REFCURSORs.

The application that calls the stored procedure was also written by me in C++ and uses asynchronous libpq API commands to execute a single SQL transaction which calls the stored procedure and also performs a FETCH ALL on all open cursors. It then returns all results into various structures. All rows of all cursors that are open are always used for every call to the stored procedure.

The stored procedure implements various logic which determines which tables in the database to query and how to filter the results from those queries to return only the relevant information needed by the C++ application.

Currently, in terms of optimization, I have taken the following approaches based on the following reasoning:

1. For all queries whose results need to return to the C++ application, I utilize cursors so that all results can be readied and generated by the stored procedure with just one call to the PostgreSQL backend. I accomplish this using asynchronous libpq API calls to issue a single transaction to the server. The first command after the BEGIN is a SELECT * FROM MyStoredProc(blah), which is then followed by FETCH ALL commands for each cursor that the stored procedure leaves open. I then follow up with multiple API calls to return the results and retrieve the rows from those results. This minimizes the amount of back-and-forth between my C++ application and the database backend.

1a. Incidentally, I am also using cursors for most queries inside the stored procedure that do not return results to the C++ application. I am unsure whether this incurs a performance penalty compared to doing, for example, a SELECT ... INTO (var1, var2, ...) within the body of the stored procedure. Instead of using SELECT ... INTO, I am using OPEN cursor_name; FETCH cursor_name INTO (var1, var2).

2. I have built indexes on all columns that are used in where clauses and joins.

3. I use lots of joins to pull in data from various tables (there are around 60 tables that are queried with each call to the stored procedure).

4. When performing joins, the first table listed is the one that returns the most context-specific results, which always also means that it has the most-specific and fewest number of relevant rows. I then join them in order of least number of result rows with all inner joins preceding left outer joins.

5. Instead of using UNIONs and EXCEPT clauses, I use multiple WITH clauses to define several different query-specific views. I order them such that I can join additional tables in later WITH clauses to the views created previously in a way that minimizes the number of rows involved in the JOIN operations while still providing provably accurate result sets. The EXCEPT clauses are then replaced by also defining one view which contains a set of IDs that I want filtered from the final result set and using a WHERE id NOT IN (SELECT id FROM filtered_view). Typically, this approach leaves me with just one UNION of two previously defined views (the union is required because it is returning results from different tables with different columns), which is then aliased and joined to additional tables. This allows all of the JOINs and the sole remaining UNION to be applied just once each in the calculation of the final result set. As an example, two of the queries I replaced with this approach utilized four UNIONs followed by two EXCEPT clauses, and each of those utilized as many as 8 JOINs in building their result sets. In one case the query dropped from 173 "explain analyze" lines to 71 "explain analyze" lines, and its execution time dropped from 1.2ms to 0.49ms. The other query started at 136 "explain analyze" lines and dropped to 66 "explain analyze" lines; its execution time dropped from 1.6ms to 0.66ms. This is because each WITH clause (and the JOINs/UNIONs contained in it) is executed just once per query and can be reused multiple times later. In addition, filters can be applied to the individual result sets of each WITH clause, which reduces the number of rows being worked on during later JOIN and filtering operations.

6. I specify individual columns that are returned for nearly every query utilized in the stored procedure.

7. When I have a query I need to execute whose results will be used in several other queries, I currently open the cursor for that query using the FOR ... LOOP construct to retrieve all records in the result set and build a result array using the array_append() method. I then do an unnest(my_array) AS blah inside the other queries where I need to use the results so that they do not need to be re-computed for each query. I am unsure about how efficient this method is, and I was wondering if there is some way to create a view inside a stored procedure that could be used instead. In each of the cases where I do this, the results from the set must be returned via an open cursor to my C++ application as it also requires the results from these particular queries.
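
To make point 7 concrete, the pattern looks roughly like this (the type, table, and column names here are simplified placeholders rather than my actual schema):

CREATE TYPE order_summary AS (id integer, customer_id integer, total numeric);

CREATE OR REPLACE FUNCTION sketch_of_point_7()
RETURNS refcursor
LANGUAGE plpgsql AS $$
DECLARE
    rec          RECORD;
    shared_rows  order_summary[] := '{}';
    detail_cur   refcursor := 'detail_cur';
BEGIN
    -- Materialize the shared result set once, row by row.
    FOR rec IN
        SELECT o.id, o.customer_id, o.total
        FROM orders o
        WHERE o.status = 'open'
    LOOP
        shared_rows := array_append(shared_rows,
            ROW(rec.id, rec.customer_id, rec.total)::order_summary);
    END LOOP;

    -- Later queries reuse the materialized rows via unnest() instead of
    -- re-computing the original query; the cursor stays open so the C++
    -- application can FETCH ALL from it.
    OPEN detail_cur FOR
        SELECT d.*
        FROM unnest(shared_rows) AS s
        JOIN order_details d ON d.order_id = s.id;

    RETURN detail_cur;
END;
$$;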


Some things to note:

For most of the joins, they simply join on foreign key IDs and no additional filtering criteria are used on their information. Only a handful of the joined tables bring in additional criteria by which to filter the result set.

The approach used in 7 with cursors and building a result array which is then unnested has me worried in terms of performance. It seems to me there should be some better way to accomplish the same thing.

The stored procedure does not perform updates or inserts, only selects.


Anyway, if anyone has some insights into performance tweaks or new approaches I might try that may lead to enhanced performance, I would greatly appreciate hearing about them. I am not completely dissatisfied with the performance of the stored procedure, but this is going to be used in a very high volume environment (hundreds or possibly even thousands of calls to this stored procedure every second). The more performant it is, the less hardware I need to deploy. It currently takes about 45ms to execute the query and retrieve all of the results into the C++ application. Query execution time takes up about 16ms of that 45ms. This is on a 3-year old Core 2 Duo, so it's not exactly top-of-the-line hardware.


--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

Re: Performance Tuning Large PL/PGSQL Stored Procedure

From:
Merlin Moncure
Date:
On Tue, Mar 23, 2010 at 5:00 PM, Eliot Gable
<egable+pgsql-performance@gmail.com> wrote:
> The complex type contains roughly 25 fields, mostly text, plus another 10
> REFCURSORs.

How many rows min/max/avg are coming back in your refcursors?  Are you
using cursors in order to return multiple complex data structures
(sets, etc) in a single function call?

> The application that calls the stored procedure was also written by me in
> C++ and uses asynchronous libpq API commands to execute a single SQL
> transaction which calls the stored procedure and also performs a FETCH ALL
> on all open cursors. It then returns all results into various structures.
> All rows of all cursors that are open are always used for every call to the
> stored procedure.
>
> The stored procedure implements various logic which determines which tables
> in the database to query and how to filter the results from those queries to
> return only the relevant information needed by the C++ application.
>
> Currently, in terms of optimization, I have taken the following approaches
> based on the following reasoning:
>
> 1. For all queries whose results need to return to the C++ application, I
> utilize cursors so that all results can be readied and generated by the
> stored procedure with just one call to the PostgreSQL backend. I accomplish
> this using asynchronous libpq API calls to issue a single transaction to the
> server. The first command after the BEGIN is a SELECT * FROM
> MyStoredProc(blah), which is then followed by FETCH ALL commands for each
> cursor that the stored procedure leaves open. I then follow up with multiple
> API calls to return the results and retrieve the rows from those results.
> This minimizes the amount of back-and-forth between my C++ application and
> the database backend.
>
> 1a. Incidentally, I am also using cursors for most queries inside the stored
> procedure that do not return results to the C++ application. I am unsure
> whether this incurs a performance penalty compared to doing, for example, a
> SELECT ... INTO (var1, var2, ...) within the body of the stored procedure.
> Instead of using SELECT ... INTO, I am using OPEN cursor_name; FETCH
> cursor_name INTO (var1, var2).
>
> 2. I have built indexes on all columns that are used in where clauses and
> joins.
>
> 3. I use lots of joins to pull in data from various tables (there are around
> 60 tables that are queried with each call to the stored procedure).
>
> 4. When performing joins, the first table listed is the one that returns the
> most context-specific results, which always also means that it has the
> most-specific and fewest number of relevant rows. I then join them in order
> of least number of result rows with all inner joins preceding left outer
> joins.
>
> 5. Instead of using UNIONs and EXCEPT clauses, I use multiple WITH clauses
> to define several different query-specific views. I order them such that I
> can join additional tables in later WITH clauses to the views created

WITH clauses can make your queries much easier to read and yield great
speedups if you need to access the table expression multiple times
from other parts of the query.  however, in some cases you can get
into trouble because a standard set of joins is going to give the
planner the most flexibility in terms of query optimization.

> previously in a way that minimizes the number of rows involved in the JOIN
> operations while still providing provably accurate result sets. The EXCEPT
> clauses are then replaced by also defining one view which contains a set of
> IDs that I want filtered from the final result set and using a WHERE id NOT
> IN (SELECT id FROM filtered_view). Typically, this approach leaves me with
> just one UNION of two previously defined views (the union is required


UNION is always an optimization target (did you mean UNION ALL?)

> 7. When I have a query I need to execute whose results will be used in
> several other queries, I currently open the cursor for that query using the
> FOR ... LOOP construct to retrieve all records in the result set and build a
> result array using the array_append() method. I then do an unnest(my_array)

do not use array_append.  always do array(select ...) whenever it is
possible. when it isn't, rethink your problem until it is possible.
only exception is to use array_agg aggregate if your problem really is
an aggregation type of thing.  as a matter of fact, any for...loop is
an optimization target because a re-think will probably yield a query
that does the same thing without paying for the loop.
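
for example, the whole for ... loop / array_append dance typically collapses into one assignment (names are made up for illustration):

shared_rows := array
(
  select row(o.id, o.customer_id, o.total)::order_summary
  from orders o
  where o.status = 'open'
);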

>
> For most of the joins, they simply join on foreign key IDs and no additional
> filtering criteria are used on their information. Only a handful of the
> joined tables bring in additional criteria by which to filter the result
> set.
>
> The approach used in 7 with cursors and building a result array which is
> then unnested has me worried in terms of performance. It seems to me there
> should be some better way to accomplish the same thing.
>
> The stored procedure does not perform updates or inserts, only selects.
>
>
> Anyway, if anyone has some insights into performance tweaks or new
> approaches I might try that may lead to enhanced performance, I would
> greatly appreciate hearing about them. I am not completely dissatisfied with
> the performance of the stored procedure, but this is going to be used in a
> very high volume environment (hundreds or possibly even thousands of calls
> to this stored procedure every second). The more performant it is, the less
> hardware I need to deploy. It currently takes about 45ms to execute the
> query and retrieve all of the results into the C++ application. Query
> execution time takes up about 16ms of that 45ms. This is on a 3-year old
> Core 2 Duo, so it's not exactly top-of-the-line hardware.

If you are chasing milliseconds, using C/C++, and dealing with complex
data structures coming in/out of the database, I would absolutely
advise you to check out the libpqtypes library (disclaimer, I co-wrote
it!) in order to speed up data transfer. The library is highly
optimized and facilitates all transfers in binary which yields good
gains when sending types which are expensive to hammer to text (bytea,
timestamp, etc).

In addition, using libpqtypes you can use arrays of composites (in
8.3+) to send/receive complex structures (even trees, etc) and pull
the entire set of data in a single query.  This is an alternative to
the refcursor/fetch method which involves extra round trips and has
other problems (but is the way to go if you need to progressive fetch
large amounts of data).

As a general tip, I suggest 'divide and conquer'.  Sprinkle your
procedure with 'raise notice %', gettimeofday(); And record the time
spent on the various steps of the execution.  This will give better
measurements than pulling pieces of the function out and running them
outside with constants for the arguments.  Identify the problem spots
and direct your energies there.
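
something like this inside the procedure (note: the built-in that returns actual wall-clock time inside a transaction is clock_timestamp(); now() and current_timestamp are frozen at transaction start; the step labels below are just illustrative):

raise notice 'step 1 (main lookup) finished at %', clock_timestamp();
-- ... next chunk of work ...
raise notice 'step 2 (filter views) finished at %', clock_timestamp();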

merlin

http://libpqtypes.esilo.com/
http://pgfoundry.org/projects/libpqtypes/

Re: Performance Tuning Large PL/PGSQL Stored Procedure

From:
Eliot Gable
Date:

On Thu, Mar 25, 2010 at 10:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Tue, Mar 23, 2010 at 5:00 PM, Eliot Gable
<egable+pgsql-performance@gmail.com> wrote:
> The complex type contains roughly 25 fields, mostly text, plus another 10
> REFCURSORs.

How many rows min/max/avg are coming back in your refcursors?  Are you
using cursors in order to return multiple complex data structures
(sets, etc) in a single function call?


I think the largest number of rows is around 40. Most are substantially smaller. However, most of them have about two dozen or more columns, and I have already shortened the list of columns to the minimum possible. The average number of rows is around 10, but the largest sets of rows also have the most columns. I'm using the cursors in order to obtain multiple complex data structures in a single function call.

 
> The application that calls the stored procedure was also written by me in
> C++ and uses asynchronous libpq API commands to execute a single SQL
> transaction which calls the stored procedure and also performs a FETCH ALL
> on all open cursors. It then returns all results into various structures.
> All rows of all cursors that are open are always used for every call to the
> stored procedure.
>
> The stored procedure implements various logic which determines which tables
> in the database to query and how to filter the results from those queries to
> return only the relevant information needed by the C++ application.
>
> Currently, in terms of optimization, I have taken the following approaches
> based on the following reasoning:
>
> 1. For all queries whose results need to return to the C++ application, I
> utilize cursors so that all results can be readied and generated by the
> stored procedure with just one call to the PostgreSQL backend. I accomplish
> this using asynchronous libpq API calls to issue a single transaction to the
> server. The first command after the BEGIN is a SELECT * FROM
> MyStoredProc(blah), which is then followed by FETCH ALL commands for each
> cursor that the stored procedure leaves open. I then follow up with multiple
> API calls to return the results and retrieve the rows from those results.
> This minimizes the amount of back-and-forth between my C++ application and
> the database backend.
>
> 1a. Incidentally, I am also using cursors for most queries inside the stored
> procedure that do not return results to the C++ application. I am unsure
> whether this incurs a performance penalty compared to doing, for example, a
> SELECT ... INTO (var1, var2, ...) within the body of the stored procedure.
> Instead of using SELECT ... INTO, I am using OPEN cursor_name; FETCH
> cursor_name INTO (var1, var2).
>
> 2. I have built indexes on all columns that are used in where clauses and
> joins.
>
> 3. I use lots of joins to pull in data from various tables (there are around
> 60 tables that are queried with each call to the stored procedure).
>
> 4. When performing joins, the first table listed is the one that returns the
> most context-specific results, which always also means that it has the
> most-specific and fewest number of relevant rows. I then join them in order
> of least number of result rows with all inner joins preceding left outer
> joins.
>
> 5. Instead of using UNIONs and EXCEPT clauses, I use multiple WITH clauses
> to define several different query-specific views. I order them such that I
> can join additional tables in later WITH clauses to the views created

WITH clauses can make your queries much easier to read and yield great
speedups if you need to access the table expression multiple times
from other parts of the query.  however, in some cases you can get
into trouble because a standard set of joins is going to give the
planner the most flexibility in terms of query optimization.


So far, every case I have converted to WITH clauses has resulted in more than double the speed (half the time required to perform the query). The main reason appears to be that it avoids recalculating the same JOIN conditions multiple times in different parts of the query for the UNION and EXCEPT clauses.
 
> previously in a way that minimizes the number of rows involved in the JOIN
> operations while still providing provably accurate result sets. The EXCEPT
> clauses are then replaced by also defining one view which contains a set of
> IDs that I want filtered from the final result set and using a WHERE id NOT
> IN (SELECT id FROM filtered_view). Typically, this approach leaves me with
> just one UNION of two previously defined views (the union is required


UNION is always an optimization target (did you mean UNION ALL?)


Thanks for the suggestion on UNION ALL; I indeed do not need elimination of duplicates, so UNION ALL is a better option.
 
> 7. When I have a query I need to execute whose results will be used in
> several other queries, I currently open the cursor for that query using the
> FOR ... LOOP construct to retrieve all records in the result set and build a
> result array using the array_append() method. I then do an unnest(my_array)

do not use array_append.  always do array(select ...) whenever it is
possible. when it isn't, rethink your problem until it is possible.
only exception is to use array_agg aggregate if your problem really is
an aggregation type of thing.  as a matter of fact, any for...loop is
an optimization target because a re-think will probably yield a query
that does the same thing without paying for the loop.


I suspected it was a performance issue. I will see if I can find an alternative way of doing it. Based on your feedback, I think I may know how to do it now.
 
>
> For most of the joins, they simply join on foreign key IDs and no additional
> filtering criteria are used on their information. Only a handful of the
> joined tables bring in additional criteria by which to filter the result
> set.
>
> The approach used in 7 with cursors and building a result array which is
> then unnested has me worried in terms of performance. It seems to me there
> should be some better way to accomplish the same thing.
>
> The stored procedure does not perform updates or inserts, only selects.
>
>
> Anyway, if anyone has some insights into performance tweaks or new
> approaches I might try that may lead to enhanced performance, I would
> greatly appreciate hearing about them. I am not completely dissatisfied with
> the performance of the stored procedure, but this is going to be used in a
> very high volume environment (hundreds or possibly even thousands of calls
> to this stored procedure every second). The more performant it is, the less
> hardware I need to deploy. It currently takes about 45ms to execute the
> query and retrieve all of the results into the C++ application. Query
> execution time takes up about 16ms of that 45ms. This is on a 3-year old
> Core 2 Duo, so it's not exactly top-of-the-line hardware.

If you are chasing milliseconds, using C/C++, and dealing with complex
data structures coming in/out of the database, I would absolutely
advise you to check out the libpqtypes library (disclaimer, I co-wrote
it!) in order to speed up data transfer. The library is highly
optimized and facilitates all transfers in binary which yields good
gains when sending types which are expensive to hammer to text (bytea,
timestamp, etc).

The data returned to the application is just rows of strings, numbers (ints and doubles), and booleans. Would I see a good speedup with libpqtypes when dealing with those data types?
 
In addition, using libpqtypes you can use arrays of composites (in
8.3+) to send/receive complex structures (even trees, etc) and pull
the entire set of data in a single query.  This is an alternative to
the refcursor/fetch method which involves extra round trips and has
other problems (but is the way to go if you need to progressive fetch
large amounts of data).

So, you are saying that I can return a complex type as a result which contains arrays of other complex types and just use my single SELECT command to retrieve the whole data set? That would be much simpler and I imagine much faster.
 
As a general tip, I suggest 'divide and conquer'.  Sprinkle your
procedure with 'raise notice %', gettimeofday(); And record the time
spent on the various steps of the execution.  This will give better
measurements than pulling pieces of the function out and running them
outside with constants for the arguments.  Identify the problem spots
and direct your energies there.

As a matter of fact, I have a callback messaging function set up so that my RAISE NOTICE commands call back to my C++ program and generate log messages. The log messages show date + time + microseconds, so I can see how long it takes to go through each part.

I really am chasing milliseconds here, and I appreciate all your feedback. You've given me a relatively large number of possible optimizations I can try out. I will definitely try out the libpqtypes. That sounds like a promising way to further cut down on execution time. I think most of my performance penalty is in transferring the results back to the C++ application.


--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

Re: Performance Tuning Large PL/PGSQL Stored Procedure

From:
Merlin Moncure
Date:
On Thu, Mar 25, 2010 at 11:56 PM, Eliot Gable
<egable+pgsql-performance@gmail.com> wrote:
>>
>> How many rows min/max/avg are coming back in your refcursors?  Are you
>> using cursors in order to return multiple complex data structures
>> (sets, etc) in a single function call?
>>
>
> I think the largest number of rows is around 40. Most are substantially
> smaller. However, most of them have about two dozen or more columns, and I
> have already shortened the list of columns to the minimum possible. The
> average number of rows is around 10, but the largest sets of rows also have
> the most columns. I'm using the cursors in order to obtain multiple complex
> data structures in a single function call.

ok, small sets.  yes, passing them back to the client as arrays is
probably going to be faster.  It's a trivial change to your proc.  you
have to define a type for your array element the way we are going to
use it.  you can use a composite type or a table (I prefer a table).

create table mystuff_t
(
  a text,
  b int,
  c timestamptz
);

create function myproc([...], mystuffs out mystuff_t[])
[inside proc]

replace your cursor declaration with this:

select array
(
   select (a,b,c)::mystuff_t from [...]
) into mystuffs;

code an alternate version of the function and then, inside libpq, execute the
query in binary and discard the results, timing it and comparing to how you
run your query now, also discarding the results.  we want to time it this way
because timing it from psql includes the time to print out the array in text
format, which we can avoid with libpqtypes (which we are not going to mess
with until we know there is a reason to go in this direction).  We do need to
include the time to turn around and fetch the data from the refcursors. If
you see at least a 10-20% improvement, it warrants further effort IMO (and
say goodbye to refcursors forever).
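
to be explicit, the alternate version could look something like this (the argument, source table, and filter column are placeholders; adjust to your schema):

create or replace function myproc_arrays(p_key int, mystuffs out mystuff_t[])
language plpgsql as $$
begin
  -- one statement builds the whole array; no OPEN ... / FETCH ALL round trips
  select array
  (
    select (s.a, s.b, s.c)::mystuff_t
    from some_source_table s
    where s.key_col = p_key
  ) into mystuffs;
end;
$$;

-- client side, a single statement replaces BEGIN / SELECT / FETCH ALL / COMMIT:
-- select mystuffs from myproc_arrays(42);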

>> WITH clauses can make your queries much easier to read and yield great
>> speedups if you need to access the table expression multiple times
>> from other parts of the query.  however, in some cases you can get
>> into trouble because a standard set of joins is going to give the
>> planner the most flexibility in terms of query optimization.
>>
>
> So far, every case I have converted to WITH clauses has resulted in more
> than double the speed (half the time required to perform the query). The
> main reason appears to be from avoiding calculating JOIN conditions multiple
> times in different parts of the query due to the UNION and EXCEPT clauses.

I have a hard time believing that unless there are other factors
compromising the planner, like bad statistics or a non-optimal query, or
you are dealing with a relatively special case.

'EXCEPT' btw is also an optimization target. maybe think about
converting to 'left join ... where rightcol is null' or something like
that.  not 100% sure, I think some work was done recently on except so
this advice may not be as true as it used to be, and possibly moot if
the number of rows being considered by except is very small.
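
roughly, the rewrite looks like this (table and column names are illustrative; not exactly identical to EXCEPT when duplicates or NULLs are involved, but usually close enough for an id filter):

-- except version:
-- select id from candidate_rows
-- except
-- select id from excluded_rows;

-- anti-join version the planner generally handles better:
select c.id
from candidate_rows c
left join excluded_rows e on e.id = c.id
where e.id is null;

-- or, equivalently:
select c.id
from candidate_rows c
where not exists (select 1 from excluded_rows e where e.id = c.id);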

> So, you are saying that I can return a complex type as a result which
> contains arrays of other complex types and just use my single SELECT command
> to retrieve the whole data set? That would be much simpler and I imagine
> much faster.

yes, however you will want to receive as few complex types as
possible, meaning your result set should still have multiple columns.
reducing the number of columns is not an optimization target.  in
other words, do the minimal amount of stacking necessary to allow
single query extraction of data.

> I really am chasing milliseconds here, and I appreciate all your feedback.
> You've given me a relatively large number of possible optimizations I can
> try out. I will definitely try out the libpqtypes. That sounds like a
> promising way to further cut down on execution time. I think most of my
> performance penalty is in transferring the results back to the C++
> application.

yes.  I've suggested libpqtypes to a number of people on the lists,
and you are what i'd consider the ideal candidate. libpqtypes will
completely transform the way you think about postgresql and libpq.
good luck. if you need help setting it up you can email me privately
or on the libpqtypes list.

merlin

Re: Performance Tuning Large PL/PGSQL Stored Procedure

From:
Richard Huxton
Date:
On 26/03/10 03:56, Eliot Gable wrote:
>
> I really am chasing milliseconds here, and I appreciate all your feedback.
> You've given me a relatively large number of possible optimizations I can
> try out. I will definitely try out the libpqtypes. That sounds like a
> promising way to further cut down on execution time. I think most of my
performance penalty is in transferring the results back to the C++
> application.

In addition to all of Merlin's good advice, if the client is on a
different machine to the server then try sticking wireshark or similar
onto the connection. That should make it pretty clear where the main
costs are in getting your data back.

--
   Richard Huxton
   Archonet Ltd