Обсуждение: Prepared statements performance

Поиск
Список
Период
Сортировка

Prepared statements performance

От
Daniel McGreal
Дата:
Hi!

My reading to date suggests that prepared statements should be faster to execute than issuing the same statement multiple times. However, issuing 100'000 INSERTs turned out to be more than ten times faster than executing the same prepared statement 100'000 times when executed via pgAdmin. The table was:

CREATE TABLE test
(
  one date,
  two boolean,
  three character varying,
  four integer,
  five numeric(18,5),
  id serial NOT NULL --note the index here
)

The prepared statement test lasting ~160 seconds was:

TRUNCATE test;

BEGIN;
PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
    INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3, $4, $5);

EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
-- 99'999 more executes...
END;

The insertion test lasting ~12 seconds was:

TRUNCATE test;

BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true, 'three', 4, 5.5);
-- 99'999 more inserts...
END;

I'm assuming then that I've done something mistakenly.

Many thanks,
Dan.

Re: Prepared statements performance

От
Daniel McGreal
Дата:
Hi again,

I did a follow up test using 'multi-value' inserts which is three times faster than multiple inserts thusly:

TRUNCATE test;
BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true, 'three', 4, 5.5)
,('2011-01-01', true, 'three', 4, 5.5)
-- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
END;

This is the kind of speed increase I was hoping for when using prepared statements (which makes sense because in this multi-value insert the query is only being planned once?).

Thanks,
Dan.
P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.

On Thu, May 10, 2012 at 9:25 AM, Daniel McGreal <daniel.mcgreal@redbite.com> wrote:
Hi!

My reading to date suggests that prepared statements should be faster to execute than issuing the same statement multiple times. However, issuing 100'000 INSERTs turned out to be more than ten times faster than executing the same prepared statement 100'000 times when executed via pgAdmin. The table was:

CREATE TABLE test
(
  one date,
  two boolean,
  three character varying,
  four integer,
  five numeric(18,5),
  id serial NOT NULL --note the index here
)

The prepared statement test lasting ~160 seconds was:

TRUNCATE test;

BEGIN;
PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
    INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3, $4, $5);

EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
-- 99'999 more executes...
END;

The insertion test lasting ~12 seconds was:

TRUNCATE test;

BEGIN;
INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true, 'three', 4, 5.5);
-- 99'999 more inserts...
END;

I'm assuming then that I've done something mistakenly.

Many thanks,
Dan.


Re: Prepared statements performance

От
Pavel Stehule
Дата:
Hello

2012/5/10 Daniel McGreal <daniel.mcgreal@redbite.com>:
> Hi again,
>
> I did a follow up test using 'multi-value' inserts which is three times
> faster than multiple inserts thusly:
>

if you need speed, use a COPY statement - it should be 10x faster than INSERTS

Pavel

>
> TRUNCATE test;
> BEGIN;
> INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
> 'three', 4, 5.5)
>
> ,('2011-01-01', true, 'three', 4, 5.5)
> -- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
> END;
>
> This is the kind of speed increase I was hoping for when using prepared
> statements (which makes sense because in this multi-value insert the query
> is only being planned once?).
>
> Thanks,
> Dan.
> P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.
>
>
>> On Thu, May 10, 2012 at 9:25 AM, Daniel McGreal
>> <daniel.mcgreal@redbite.com> wrote:
>>>
>>> Hi!
>>>
>>> My reading to date suggests that prepared statements should be faster to
>>> execute than issuing the same statement multiple times. However, issuing
>>> 100'000 INSERTs turned out to be more than ten times faster than executing
>>> the same prepared statement 100'000 times when executed via pgAdmin. The
>>> table was:
>>>
>>> CREATE TABLE test
>>> (
>>>   one date,
>>>   two boolean,
>>>   three character varying,
>>>   four integer,
>>>   five numeric(18,5),
>>>   id serial NOT NULL --note the index here
>>> )
>>>
>>> The prepared statement test lasting ~160 seconds was:
>>>
>>> TRUNCATE test;
>>>
>>> BEGIN;
>>> PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
>>>     INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3,
>>> $4, $5);
>>>
>>> EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
>>> -- 99'999 more executes...
>>> END;
>>>
>>> The insertion test lasting ~12 seconds was:
>>>
>>> TRUNCATE test;
>>>
>>> BEGIN;
>>> INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01',
>>> true, 'three', 4, 5.5);
>>> -- 99'999 more inserts...
>>> END;
>>>
>>> I'm assuming then that I've done something mistakenly.
>>>
>>> Many thanks,
>>> Dan.
>>
>>
>

Re: Prepared statements performance

От
Daniel McGreal
Дата:
Hi,

Unfortunately these are experimental conditions. The conditions surrounding the intended application are such that my two options are prepared statements or many inserts. I put the multi-value inserts in as I was curious as to why prepared statements would be slower given they only plan the query once (as also does the multi-value insert, I assume).

It turns out though that the results are skewed by using pgAdmin. Executing my scripts from the command line gives much more appropriate results.

Thanks,
Dan.

On Thu, May 10, 2012 at 10:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello

2012/5/10 Daniel McGreal <daniel.mcgreal@redbite.com>:
> Hi again,
>
> I did a follow up test using 'multi-value' inserts which is three times
> faster than multiple inserts thusly:
>

if you need speed, use a COPY statement - it should be 10x faster than INSERTS

Pavel

>
> TRUNCATE test;
> BEGIN;
> INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01', true,
> 'three', 4, 5.5)
>
> ,('2011-01-01', true, 'three', 4, 5.5)
> -- 99'998 more , ('2011-01-01', true, 'three', 4, 5.5) ...;
> END;
>
> This is the kind of speed increase I was hoping for when using prepared
> statements (which makes sense because in this multi-value insert the query
> is only being planned once?).
>
> Thanks,
> Dan.
> P.S. Mac OS X 10.7.3 using PostgreSQL 9.1.2.
>
>
>> On Thu, May 10, 2012 at 9:25 AM, Daniel McGreal
>> <daniel.mcgreal@redbite.com> wrote:
>>>
>>> Hi!
>>>
>>> My reading to date suggests that prepared statements should be faster to
>>> execute than issuing the same statement multiple times. However, issuing
>>> 100'000 INSERTs turned out to be more than ten times faster than executing
>>> the same prepared statement 100'000 times when executed via pgAdmin. The
>>> table was:
>>>
>>> CREATE TABLE test
>>> (
>>>   one date,
>>>   two boolean,
>>>   three character varying,
>>>   four integer,
>>>   five numeric(18,5),
>>>   id serial NOT NULL --note the index here
>>> )
>>>
>>> The prepared statement test lasting ~160 seconds was:
>>>
>>> TRUNCATE test;
>>>
>>> BEGIN;
>>> PREPARE foo(date, boolean, varchar, integer, numeric(18,5)) AS
>>>     INSERT INTO test (one, two, three, four, five) VALUES ($1, $2, $3,
>>> $4, $5);
>>>
>>> EXECUTE foo('2011-01-01', true, 'three', 4, 5.5);
>>> -- 99'999 more executes...
>>> END;
>>>
>>> The insertion test lasting ~12 seconds was:
>>>
>>> TRUNCATE test;
>>>
>>> BEGIN;
>>> INSERT INTO test (one, two, three, four, five) VALUES ('2011-01-01',
>>> true, 'three', 4, 5.5);
>>> -- 99'999 more inserts...
>>> END;
>>>
>>> I'm assuming then that I've done something mistakenly.
>>>
>>> Many thanks,
>>> Dan.
>>
>>
>

Re: Prepared statements performance

От
Alban Hertroys
Дата:
On 10 May 2012 11:30, Daniel McGreal <daniel.mcgreal@redbite.com> wrote:
> I put the multi-value inserts in as I was
> curious as to why prepared statements would be slower given they only plan
> the query once (as also does the multi-value insert, I assume).

That's a common misconception.

The reason that prepared statements are often slower, is exactly
_because_ they only plan the query once. Because the query-plan is
stored when the query gets prepared, the same plan gets used for every
combination of query parameters, so it has to be a fairly generic
query plan.

OTOH, the multi-value insert knows exactly what combinations of
"parameters" will be used in the query and the query planner can
optimise the query for those parameters. It wouldn't surprise me if it
would re-evaluate plan branch choices based on which row of values is
currently being inserted.

I think it's safe to say that prepared statements are only efficient
when you're dealing with repeated complicated queries, where preparing
the query plan takes a significant amount of time. It'll also shave
some time off queries that are inefficient regardless of how you
execute them (for example, because the query always needs to perform a
sequential scan).
They'll also be faster on database servers with a slower query planner
than the one in Postgres.

In most (all?) other cases, executing the query directly is probably faster.

Of course there are other benefits to prepared statements, such as a
natural immunity to SQL injection.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: Prepared statements performance

От
Radosław Smogura
Дата:
On Thu, 10 May 2012 13:52:29 +0200, Alban Hertroys wrote:
> On 10 May 2012 11:30, Daniel McGreal <daniel.mcgreal@redbite.com>
> wrote:
>> I put the multi-value inserts in as I was
>> curious as to why prepared statements would be slower given they
>> only plan
>> the query once (as also does the multi-value insert, I assume).
>
> That's a common misconception.
>
> The reason that prepared statements are often slower, is exactly
> _because_ they only plan the query once. Because the query-plan is
> stored when the query gets prepared, the same plan gets used for
> every
> combination of query parameters, so it has to be a fairly generic
> query plan.
>
> OTOH, the multi-value insert knows exactly what combinations of
> "parameters" will be used in the query and the query planner can
> optimise the query for those parameters. It wouldn't surprise me if
> it
> would re-evaluate plan branch choices based on which row of values is
> currently being inserted.
>
> I think it's safe to say that prepared statements are only efficient
> when you're dealing with repeated complicated queries, where
> preparing
> the query plan takes a significant amount of time. It'll also shave
> some time off queries that are inefficient regardless of how you
> execute them (for example, because the query always needs to perform
> a
> sequential scan).
> They'll also be faster on database servers with a slower query
> planner
> than the one in Postgres.
>
> In most (all?) other cases, executing the query directly is probably
> faster.
>
> Of course there are other benefits to prepared statements, such as a
> natural immunity to SQL injection.
> --
> If you can't see the forest for the trees,
> Cut the trees and you'll see there is no forest.
May I ask what kind of planning may occur during insert?
Regards,
Radek

Re: Prepared statements performance

От
Merlin Moncure
Дата:
On Thu, May 10, 2012 at 6:52 AM, Alban Hertroys <haramrae@gmail.com> wrote:
> On 10 May 2012 11:30, Daniel McGreal <daniel.mcgreal@redbite.com> wrote:
>> I put the multi-value inserts in as I was
>> curious as to why prepared statements would be slower given they only plan
>> the query once (as also does the multi-value insert, I assume).
>
> That's a common misconception.
>
> The reason that prepared statements are often slower, is exactly
> _because_ they only plan the query once. Because the query-plan is
> stored when the query gets prepared, the same plan gets used for every
> combination of query parameters, so it has to be a fairly generic
> query plan.
>
> OTOH, the multi-value insert knows exactly what combinations of
> "parameters" will be used in the query and the query planner can
> optimise the query for those parameters. It wouldn't surprise me if it
> would re-evaluate plan branch choices based on which row of values is
> currently being inserted.
>
> I think it's safe to say that prepared statements are only efficient
> when you're dealing with repeated complicated queries, where preparing
> the query plan takes a significant amount of time. It'll also shave
> some time off queries that are inefficient regardless of how you
> execute them (for example, because the query always needs to perform a
> sequential scan).
> They'll also be faster on database servers with a slower query planner
> than the one in Postgres.
>
> In most (all?) other cases, executing the query directly is probably faster.
>
> Of course there are other benefits to prepared statements, such as a
> natural immunity to SQL injection.

That can be often true, but for simple inserts there is no plan to get
wrong.  Prepared statements can knock about 30-50% of statement
latency off in such cases if you're not i/o bound.

Definitely though prepared statements are headache though and I rarely use them.

merlin

Re: Prepared statements performance

От
Alban Hertroys
Дата:
On 10 May 2012 15:05, Radosław Smogura <rsmogura@softperience.eu> wrote:
> May I ask what kind of planning may occur during insert?

Well, for example, if there's a unique constraint on the table then
the database will have to check that the newly inserted values don't
conflict with values that are already in the table. It needs to plan
an efficient strategy for that, which depends on the values being
inserted.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: Prepared statements performance

От
Tom Lane
Дата:
Alban Hertroys <haramrae@gmail.com> writes:
> On 10 May 2012 15:05, Radosław Smogura <rsmogura@softperience.eu> wrote:
>> May I ask what kind of planning may occur during insert?

> Well, for example, if there's a unique constraint on the table then
> the database will have to check that the newly inserted values don't
> conflict with values that are already in the table. It needs to plan
> an efficient strategy for that, which depends on the values being
> inserted.

There is no planning associated with checking unique constraints; that's
just a matter for the index mechanisms.

I think the real point here is that a simple INSERT/VALUES has such a
trivial plan that there is hardly any gain to be had by avoiding the
planning stage.  Then the other overhead of a prepared statement
(looking up the saved plan, checking it's not stale, etc) outweighs
that.  Or at least it could.  3x slower seems a bit fishy; I wonder
whether there's some client-side inefficiency involved in that.
Doing performance measurements with pgAdmin seems pretty questionable
in the first place ...

            regards, tom lane

Re: Prepared statements performance

От
Daniel McGreal
Дата:
Doing the same tests from psql gives:
  1. ~2.5 seconds for INSERT/VALUES
  2. ~10 seconds for prepared statement executes
  3. ~15 seconds for multiple INSERTs

Dan.

On Thu, May 10, 2012 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alban Hertroys <haramrae@gmail.com> writes:
> On 10 May 2012 15:05, Radosław Smogura <rsmogura@softperience.eu> wrote:
>> May I ask what kind of planning may occur during insert?

> Well, for example, if there's a unique constraint on the table then
> the database will have to check that the newly inserted values don't
> conflict with values that are already in the table. It needs to plan
> an efficient strategy for that, which depends on the values being
> inserted.

There is no planning associated with checking unique constraints; that's
just a matter for the index mechanisms.

I think the real point here is that a simple INSERT/VALUES has such a
trivial plan that there is hardly any gain to be had by avoiding the
planning stage.  Then the other overhead of a prepared statement
(looking up the saved plan, checking it's not stale, etc) outweighs
that.  Or at least it could.  3x slower seems a bit fishy; I wonder
whether there's some client-side inefficiency involved in that.
Doing performance measurements with pgAdmin seems pretty questionable
in the first place ...

                       regards, tom lane

Re: Prepared statements performance

От
"Daniel Verite"
Дата:
    Aleksander Rozman wrote:

> Now I am not sure two of this directories are old databases, but I think
> they are... Is there a way to register one of this databases into new
> installation (I am sure that directory "1" is old postgres database, and
> "11563" is my database I want to save, and "11564" is new postgresql
> database (this is only directory with new date).

Actually 1, 11563 and 11564 are what you get with a fresh new installation.
On a 8.4 install on Ubuntu, the corresponding databases are:

SELECT oid, datname from pg_database where oid in (1,11563,11564);

  oid  |  datname
-------+-----------
     1 | template1
 11563 | template0
 11564 | postgres

Also normally initdb wouldn't work on a non-empty data directory, anyway.
I'd say that either the old data directory has been moved aside at some point
of the upgrade procedure, or it has been wiped out :(

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org