Обсуждение: Question about SQL performance

Поиск
Список
Период
Сортировка

Question about SQL performance

От
Jason Lustig
Дата:
I have some questions about the performance of certain types of SQL
statements.

What sort of speed increase is there usually with binding parameters
(and thus preparing statements) v. straight sql with interpolated
variables? Will Postgresql realize that the following queries are
effectively the same (and thus re-use the query plan) or will it
think they are different?

    SELECT * FROM mytable WHERE item = 5;
    SELECT * FROM mytable WHERE item = 10;

Obviously to me or you they could use the same plan. From what I
understand (correct me if I'm wrong), if you use parameter binding -
like "SELECT * FROM mytable WHERE item = ?" - Postgresql will know
that the queries can re-use the query plan, but I don't know if the
system will recognize this with above situation.

Also, what's the difference between prepared statements (using
PREPARE and EXECUTE) and regular functions (CREATE FUNCTION)? How do
they impact performance? From what I understand there is no exact
parallel to stored procedures (as in MS SQL or oracle, that are
completely precompiled) in Postgresql. At the same time, the
documentation (and other sites as well, probably because they don't
know what they're talking about when it comes to databases) is vague
because  PL/pgSQL is often said to be able to write stored procedures
but nowhere does it say that PL/pgSQL programs are precompiled.

Thanks
Jason

Re: Question about SQL performance

От
PFC
Дата:
> What sort of speed increase is there usually with binding parameters
> (and thus preparing statements) v. straight sql with interpolated
> variables? Will Postgresql realize that the following queries are
> effectively the same (and thus re-use the query plan) or will it think
> they are different?
>
>     SELECT * FROM mytable WHERE item = 5;
>     SELECT * FROM mytable WHERE item = 10;

    No, if you send the above as text (not prepared) they are two different
queries.
    Postgres' query executor is so fast that parsing and planning can take
longer than query execution sometimes. This is true of very simple selects
like above, or some very complex queries which take a long time to plan
but don't actually process a lot of rows.
    I had this huge query (1 full page of SQL) with 5 joins, aggregates and
subqueries, returning about 30 rows ; it executed in about 5 ms, planning
and parsing time was significant...

> Obviously to me or you they could use the same plan. From what I
> understand (correct me if I'm wrong), if you use parameter binding -
> like "SELECT * FROM mytable WHERE item = ?" - Postgresql will know that
> the queries can re-use the query plan, but I don't know if the system
> will recognize this with above situation.

    It depends if your client library is smart enough to prepare the
statements...

> Also, what's the difference between prepared statements (using PREPARE
> and EXECUTE) and regular functions (CREATE FUNCTION)? How do they impact
> performance? From what I understand there is no exact parallel to stored
> procedures (as in MS SQL or oracle, that are completely precompiled) in
> Postgresql. At the same time, the documentation (and other sites as
> well, probably because they don't know what they're talking about when
> it comes to databases) is vague because  PL/pgSQL is often said to be
> able to write stored procedures but nowhere does it say that PL/pgSQL
> programs are precompiled.

    PG stores the stored procedures as text. On first invocation, in each
connection, they are "compiled", ie. all statements in the SP are
prepared, so the first invocation in a connection is slower than next
invocations. This is a problem if you do not use persistent connections.

    A simple select, when prepared, will take about 25 microseconds inside a
SP and 50-100 microseconds as a query over the network. If not prepared,
about 150 µs or 2-3x slower.

    FYI Postgres beats MyISAM on "small simple selects" if you use prepared
queries.


    I use the following Python code to auto-prepare my queries :

db = PGConn( a function that returns a DB connection )
db.prep_exec( "SELECT * FROM stuff WHERE id = %s", 1 )    # prepares and
executes
db.prep_exec( "SELECT * FROM stuff WHERE id = %s", 2 )    # executes only


class PGConn( object ):

    def __init__( self, db_connector ):
        self.db_connector = db_connector
        self.reconnect()

    def reconnect( self ):
        self.prep_cache = {}
        self.db = self.db_connector()
        self.db.set_isolation_level( 0 ) # autocommit

    def cursor( self ):
#        return self.db.cursor( cursor_factory=psycopg2.extras.DictCursor )
        return self.db.cursor(  )

    def execute( self, sql, *args ):
        cursor = self.cursor()
        try:
            cursor.execute( sql, args )
        except:
            cursor.execute( "ROLLBACK" )
            raise
        return cursor

    def executemany( self, sql, *args ):
        cursor = self.cursor()
        try:
            cursor.executemany( sql, args )
        except:
            cursor.execute( "ROLLBACK" )
            raise
        return cursor

    def prep_exec( self, sql, *args ):
        cursor = self.cursor()
        stmt = self.prep_cache.get( sql )
        if stmt is None:
            name = "stmt_%s" % (len( self.prep_cache ) + 1)
            if args:
                prep = sql % tuple( "$%d"%(x+1) for x in xrange( len( args )) )
            else:
                prep = sql
            prep = "PREPARE %s AS %s" % (name, prep)
            cursor.execute( prep )
            if args:
                stmt = "EXECUTE %s( %s )" % (name, ", ".join( ["%s"] * len( args ) ))
            else:
                stmt = "EXECUTE %s" % (name,)
            self.prep_cache[ sql ] = stmt

        try:
            cursor.execute( stmt, args )
        except Exception, e:
            traceback.print_exc()
            print "Error while executing prepared SQL statement :", stmt
            print "Arguments :", args
            print "Original SQL is :", sql
            cursor.execute( "ROLLBACK" )
            raise

        return cursor


Re: Question about SQL performance

От
mark@mark.mielke.cc
Дата:
On Mon, Jun 04, 2007 at 11:18:30PM -0400, Jason Lustig wrote:
> I have some questions about the performance of certain types of SQL
> statements.
>
> What sort of speed increase is there usually with binding parameters
> (and thus preparing statements) v. straight sql with interpolated
> variables? Will Postgresql realize that the following queries are
> effectively the same (and thus re-use the query plan) or will it
> think they are different?
>
>     SELECT * FROM mytable WHERE item = 5;
>     SELECT * FROM mytable WHERE item = 10;
>
> Obviously to me or you they could use the same plan. From what I
> understand (correct me if I'm wrong), if you use parameter binding -
> like "SELECT * FROM mytable WHERE item = ?" - Postgresql will know
> that the queries can re-use the query plan, but I don't know if the
> system will recognize this with above situation.

Although they could use the same plan, it is possible that using the
same plan is non-optimal. For example, if I know that 99% of the table
contains item = 5, but only 1% of the table contains item = 10, then
the 'best plan' may be a sequential scan for item = 5, but an index scan
for item = 10.

In the case of a prepared query, PostgreSQL will pick a plan that will
be good for all values, which may not be best for specific queries. You
save parsing time and planning time, but may risk increasing execution
time.

> Also, what's the difference between prepared statements (using
> PREPARE and EXECUTE) and regular functions (CREATE FUNCTION)? How do
> they impact performance? From what I understand there is no exact
> parallel to stored procedures (as in MS SQL or oracle, that are
> completely precompiled) in Postgresql. At the same time, the
> documentation (and other sites as well, probably because they don't
> know what they're talking about when it comes to databases) is vague
> because  PL/pgSQL is often said to be able to write stored procedures
> but nowhere does it say that PL/pgSQL programs are precompiled.

I think you can find all of these answers in the documentation, including
my comments about prepared queries. Does it matter if the program is
precompiled? I believe it is, but why would it matter?

Are you addressing a real performance problem? Or are you trying to avoid
issues that you are not sure if they exist or not? :-)

Prepared queries are going to improve performance due to being able to
execute multiple queries without communicating back to the
client. Especially for short queries, network latency can be a
significant factor for execution speed.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: Question about SQL performance

От
Richard Huxton
Дата:
Jason Lustig wrote:
> I have some questions about the performance of certain types of SQL
> statements.
>
> What sort of speed increase is there usually with binding parameters
> (and thus preparing statements) v. straight sql with interpolated
> variables? Will Postgresql realize that the following queries are
> effectively the same (and thus re-use the query plan) or will it think
> they are different?

PG will plan "raw" sql every time you issue a query.

>     SELECT * FROM mytable WHERE item = 5;
>     SELECT * FROM mytable WHERE item = 10;
>
> Obviously to me or you they could use the same plan.

Except that in-between query 1 and 2 I inserted 10 million rows where
item=10. Still obvious?

 > From what I
> understand (correct me if I'm wrong), if you use parameter binding -
> like "SELECT * FROM mytable WHERE item = ?" - Postgresql will know that
> the queries can re-use the query plan, but I don't know if the system
> will recognize this with above situation.

If you are using PREPARE/EXECUTE (or your client-side library is doing
it for you).

> Also, what's the difference between prepared statements (using PREPARE
> and EXECUTE) and regular functions (CREATE FUNCTION)? How do they impact
> performance?

Functions can be in any language, but if they both are in SQL and do the
same thing, no real difference.

 > From what I understand there is no exact parallel to stored
> procedures (as in MS SQL or oracle, that are completely precompiled) in
> Postgresql.

You can write functions in C - that's compiled. Not sure if java
procedural code has its byte-code cached between sessions.

 > At the same time, the documentation (and other sites as
> well, probably because they don't know what they're talking about when
> it comes to databases) is vague because  PL/pgSQL is often said to be
> able to write stored procedures but nowhere does it say that PL/pgSQL
> programs are precompiled.

I don't see the connection.
1. You can write procedural code in pl/pgsql
2. It's not precompiled (it's "compiled" on first use)

Are you looking to solve a particular problem?

--
   Richard Huxton
   Archonet Ltd