Обсуждение: Postgres insert performance and storage requirement compared to Oracle

Поиск
Список
Период
Сортировка

Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
Hello Experts,
My application uses Oracle DB, and makes use of OCI interface.
I have been able to develop similar interface using postgreSQL library.
However, I have done some tests but results for PostgreSQL have not been encouraging for a few of them.

My questions/scenarios are:

1. How does PostgreSQL perform when inserting data into an indexed (type: btree) table? Is it true that as you add the indexes on a table, the performance deteriorates significantly whereas Oracle does not show that much performance decrease. I have tried almost all postgreSQL performance tips available. I want to have very good "insert" performance (with indexes), "select" performance is not that important at this point of time.

2. What are the average storage requirements of postgres compared to Oracle? I inserted upto 1 million records. The storage requirement of postgreSQL is almost double than that of Oracle.

Thanks in anticipation.

Re: Postgres insert performance and storage requirement compared to Oracle

От
"Joshua D. Drake"
Дата:
On Mon, 2010-10-25 at 11:12 -0700, Divakar Singh wrote:

> My questions/scenarios are:
>
> 1. How does PostgreSQL perform when inserting data into an indexed
> (type: btree)
> table? Is it true that as you add the indexes on a table, the
> performance
> deteriorates significantly whereas Oracle does not show that much
> performance
> decrease. I have tried almost all postgreSQL performance tips
> available. I want
> to have very good "insert" performance (with indexes), "select"
> performance is
> not that important at this point of time.

Did you test?

>
> 2. What are the average storage requirements of postgres compared to
> Oracle? I
> inserted upto 1 million records. The storage requirement of postgreSQL
> is almost
> double than that of Oracle.

What was your table structure?

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: Postgres insert performance and storage requirement compared to Oracle

От
Scott Marlowe
Дата:
On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> Hello Experts,
> My application uses Oracle DB, and makes use of OCI interface.
> I have been able to develop similar interface using postgreSQL library.
> However, I have done some tests but results for PostgreSQL have not been
> encouraging for a few of them.

Tell us more about your tests and results please.

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:

> My questions/scenarios are:
>
> 1. How does PostgreSQL perform when inserting data into an indexed
> (type: btree)
> table? Is it true that as you add the indexes on a table, the
> performance
> deteriorates significantly whereas Oracle does not show that much
> performance
> decrease. I have tried almost all postgreSQL performance tips
> available. I want
> to have very good "insert" performance (with indexes), "select"
> performance is
> not that important at this point of time.

-- Did you test?

Yes. the performance was comparable when using SQL procedure. However, When I used libpq, PostgreSQL performed very bad. There was some difference in environment also between these 2 tests, but I am assuming libpq vs SQL was the real cause. Or it was something else?
 
>
> 2. What are the average storage requirements of postgres compared to
> Oracle? I
> inserted upto 1 million records. The storage requirement of postgreSQL
> is almost
> double than that of Oracle.

 -- What was your table structure?

Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5 indexes on varchar and int fields including 1 implicit index coz of PK.


Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: Postgres insert performance and storage requirement compared to Oracle

От
Josh Kupershmidt
Дата:
On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> 1. How does PostgreSQL perform when inserting data into an indexed (type:
> btree) table? Is it true that as you add the indexes on a table, the
> performance deteriorates significantly whereas Oracle does not show that
> much performance decrease. I have tried almost all postgreSQL performance
> tips available. I want to have very good "insert" performance (with
> indexes), "select" performance is not that important at this point of time.

I don't claim to have any experience with Oracle, but this boast
smells fishy. See for example Figure 3-2 (pp. 57-58) in "The Art of
SQL", where the author presents simple charts showing the performance
impact upon INSERTs of adding indexes to a table in Oracle and MySQL:
they're both in the same ballpark, and the performance impact is
indeed significant. As Joshua Drake suggests, table schemas and test
results would help your case.

Josh

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
Storage test was simple, but the data (seconds taken) for INSERT test for PG vs Oracle for 1, 2, 3,4 and 5 indexes was:
PG:
25
30
37
42
45


Oracle:

33
43
50
65
68
Rows inserted: 100,000
Above results show good INSERT performance of PG when using SQL procedures. But performance when I use C++ lib is very bad. I did that test some time back so I do not have data for that right now.


From: Scott Marlowe <scott.marlowe@gmail.com>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: pgsql-performance@postgresql.org
Sent: Mon, October 25, 2010 11:56:27 PM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> Hello Experts,
> My application uses Oracle DB, and makes use of OCI interface.
> I have been able to develop similar interface using postgreSQL library.
> However, I have done some tests but results for PostgreSQL have not been
> encouraging for a few of them.

Tell us more about your tests and results please.

Re: Postgres insert performance and storage requirement compared to Oracle

От
"Joshua D. Drake"
Дата:
On Mon, 2010-10-25 at 11:36 -0700, Divakar Singh wrote:
>
> 68 Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL
> procedures. But
> performance when I use C++ lib is very bad. I did that test some time
> back so I
> do not have data for that right now.

This is interesting, are you using libpq or libpqXX?

Joshua D. Drake


>
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


Re: Postgres insert performance and storage requirement compared to Oracle

От
Steve Singer
Дата:
On 10-10-25 02:31 PM, Divakar Singh wrote:
>
>  > My questions/scenarios are:
>  >
>  > 1. How does PostgreSQL perform when inserting data into an indexed
>  > (type: btree)
>  > table? Is it true that as you add the indexes on a table, the
>  > performance
>  > deteriorates significantly whereas Oracle does not show that much
>  > performance
>  > decrease. I have tried almost all postgreSQL performance tips
>  > available. I want
>  > to have very good "insert" performance (with indexes), "select"
>  > performance is
>  > not that important at this point of time.
>
> -- Did you test?
>
> Yes. the performance was comparable when using SQL procedure. However,
> When I used libpq, PostgreSQL performed very bad. There was some
> difference in environment also between these 2 tests, but I am assuming
> libpq vs SQL was the real cause. Or it was something else?

So your saying that when you load the data with psql it loads fine, but
when you load it using libpq it takes much longer?

How are you using libpq?
-Are you opening and closing the database connection between each insert?
-Are you doing all of your inserts as one big transaction or are you
doing a transaction per insert
-Are you using prepared statements for your inserts?
-Are you using the COPY command to load your data or the INSERT command?
-Are you running your libpq program on the same server as postgresql?
-How is your libpq program connecting to postgresql, is it using ssl?

>
> Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5
> indexes on varchar and int fields including 1 implicit index coz of PK.

If your run "VACUUM VERBOSE tablename" on the table, what does it say?

You also don't mention which version of postgresql your using.

>
>
> Joshua D. Drake
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>


Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
Hi Joshua,
I have been only using libpq.
Is libpqXX better than the other?
Is there any notable facility in libpqxx which could aid in fast inserts or better performance in general?
 
Best Regards,
Divakar



From: Joshua D. Drake <jd@commandprompt.com>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: Scott Marlowe <scott.marlowe@gmail.com>; pgsql-performance@postgresql.org
Sent: Tue, October 26, 2010 12:08:52 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Mon, 2010-10-25 at 11:36 -0700, Divakar Singh wrote:

> 68 Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL
> procedures. But
> performance when I use C++ lib is very bad. I did that test some time
> back so I
> do not have data for that right now.

This is interesting, are you using libpq or libpqXX?

Joshua D. Drake


>
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:

Answers:

How are you using libpq?
-Are you opening and closing the database connection between each insert?

[Need to check, will come back on this]

-Are you doing all of your inserts as one big transaction or are you  doing a transaction per insert

[Answer: for C++ program, one insert per transaction in PG as well as Oracle. But in stored proc, I think both use only 1 transaction for all inserts]

-Are you using prepared statements for your inserts?

[Need to check, will come back on this]

-Are you using the COPY command to load your data or the INSERT command?

[No]

-Are you running your libpq program on the same server as postgresql?

[Yes]

-How is your libpq program connecting to postgresql, is it using ssl?

[No]

If your run "VACUUM VERBOSE tablename" on the table, what does it say?

[Need to check, will come back on this]

You also don't mention which version of postgresql your using.

[Latest, 9.x]
 
Best Regards,
Divakar



From: Steve Singer <ssinger@ca.afilias.info>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: jd@commandprompt.com; pgsql-performance@postgresql.org
Sent: Tue, October 26, 2010 12:16:46 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On 10-10-25 02:31 PM, Divakar Singh wrote:
>
>  > My questions/scenarios are:
>  >
>  > 1. How does PostgreSQL perform when inserting data into an indexed
>  > (type: btree)
>  > table? Is it true that as you add the indexes on a table, the
>  > performance
>  > deteriorates significantly whereas Oracle does not show that much
>  > performance
>  > decrease. I have tried almost all postgreSQL performance tips
>  > available. I want
>  > to have very good "insert" performance (with indexes), "select"
>  > performance is
>  > not that important at this point of time.
>
> -- Did you test?
>
> Yes. the performance was comparable when using SQL procedure. However,
> When I used libpq, PostgreSQL performed very bad. There was some
> difference in environment also between these 2 tests, but I am assuming
> libpq vs SQL was the real cause. Or it was something else?

So your saying that when you load the data with psql it loads fine, but
when you load it using libpq it takes much longer?

How are you using libpq?
-Are you opening and closing the database connection between each insert?
-Are you doing all of your inserts as one big transaction or are you
doing a transaction per insert
-Are you using prepared statements for your inserts?
-Are you using the COPY command to load your data or the INSERT command?
-Are you running your libpq program on the same server as postgresql?
-How is your libpq program connecting to postgresql, is it using ssl?

>
> Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5
> indexes on varchar and int fields including 1 implicit index coz of PK.

If your run "VACUUM VERBOSE tablename" on the table, what does it say?

You also don't mention which version of postgresql your using.

>
>
> Joshua D. Drake
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Postgres insert performance and storage requirement compared to Oracle

От
Mladen Gogala
Дата:
Profiling could tell you where is the time lost and where is your
program spending time. Having experience with both Oracle and Postgres,
I don't feel that there is much of a difference in the insert speed. I
am not using C++, I am using scripting languages like Perl and PHP and,
as far as inserts go, I don't see much of a difference. I have an
application which inserts approximately 600,000 records into a
PostgreSQL 9.0.1 per day, in chunks of up to 60,000 records every hour.
The table is partitioned and there are indexes on the underlying
partitions. I haven't noticed any problems with inserts. Also, if I use
"copy" instead of the "insert" command, I can be even faster.  In
addition to that, PostgreSQL doesn't support index organized tables.

Divakar Singh wrote:
> Storage test was simple, but the data (seconds taken) for INSERT test
> for PG vs Oracle for 1, 2, 3,4 and 5 indexes was:
> PG:
> 25
> 30
> 37
> 42
> 45
>
>
>
> Oracle:
>
> 33
> 43
> 50
> 65
> 68
>
> Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL
> procedures. But performance when I use C++ lib is very bad. I did that
> test some time back so I do not have data for that right now.
>
> ------------------------------------------------------------------------
> *From:* Scott Marlowe <scott.marlowe@gmail.com>
> *To:* Divakar Singh <dpsmails@yahoo.com>
> *Cc:* pgsql-performance@postgresql.org
> *Sent:* Mon, October 25, 2010 11:56:27 PM
> *Subject:* Re: [PERFORM] Postgres insert performance and storage
> requirement compared to Oracle
>
> On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh <dpsmails@yahoo.com
> <mailto:dpsmails@yahoo.com>> wrote:
> > Hello Experts,
> > My application uses Oracle DB, and makes use of OCI interface.
> > I have been able to develop similar interface using postgreSQL library.
> > However, I have done some tests but results for PostgreSQL have not been
> > encouraging for a few of them.
>
> Tell us more about your tests and results please.
>


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Postgres insert performance and storage requirement compared to Oracle

От
Ray Stell
Дата:
On Mon, Oct 25, 2010 at 11:39:30AM -0700, Divakar Singh wrote:
> Thanks Ray,
> Already seen that, but it does not tell about storage requirement compared to
> Oracle. I find it takes 2 times space than oracle.
>
>
>  Best Regards,
> Divakar
> ________________________________
> From: Ray Stell <stellr@cns.vt.edu>
> To: Divakar Singh <dpsmails@yahoo.com>
> Sent: Tue, October 26, 2010 12:05:23 AM
> Subject: Re: [PERFORM] Postgres insert performance and storage requirement
> compared to Oracle
>
> On Mon, Oct 25, 2010 at 11:12:40AM -0700, Divakar Singh wrote:
> >
> > 2. What are the average storage requirements of postgres compared to Oracle? I
>
> > inserted upto 1 million records. The storage requirement of postgreSQL is
> >almost
> >
> > double than that of Oracle.
>
> there's a fine manual:
> http://www.postgresql.org/docs/9.0/interactive/storage.html


Maybe compare to oracle's storage documentation:

 http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements001.htm#SQLRF30020
 http://download.oracle.com/docs/cd/E11882_01/server.112/e17120/schema007.htm#ADMIN11622

I don't believe for a second the byte count is double in pg, but that's just
a religious expression, I've never counted.

Re: Postgres insert performance and storage requirement compared to Oracle

От
Scott Marlowe
Дата:
On Mon, Oct 25, 2010 at 12:36 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
>
> Storage test was simple, but the data (seconds taken) for INSERT test for PG vs Oracle for 1, 2, 3,4 and 5 indexes
was:
> PG:
> 25
> 30
> 37
> 42
> 45
>
> Oracle:
>
> 33
> 43
> 50
> 65
> 68
> Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL procedures. But performance when I use C++ lib is
verybad. I did that test some time back so I do not have data for that right now. 

So, assuming I wanted to reproduce your results, can you provide a
self contained test case that shows these differences?  I have always
gotten really good performance using libpq myself, so I'm looking for
what it is you might be doing differently from me that would make it
so slow.

Re: Postgres insert performance and storage requirement compared to Oracle

От
Alan Hodgson
Дата:
On October 25, 2010 11:36:24 am Divakar Singh wrote:
> Above results show good INSERT performance of PG when using SQL procedures.
> But performance when I use C++ lib is very bad. I did that test some time
> back so I do not have data for that right now.

Wrap it in a transaction.

Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> Hello Experts,
> My application uses Oracle DB, and makes use of OCI interface.
> I have been able to develop similar interface using postgreSQL library.
> However, I have done some tests but results for PostgreSQL have not been
> encouraging for a few of them.
>
> My questions/scenarios are:
>
> 1. How does PostgreSQL perform when inserting data into an indexed (type:
> btree) table? Is it true that as you add the indexes on a table, the
> performance deteriorates significantly whereas Oracle does not show that
> much performance decrease. I have tried almost all postgreSQL performance
> tips available. I want to have very good "insert" performance (with
> indexes), "select" performance is not that important at this point of time.
>
> 2. What are the average storage requirements of postgres compared to Oracle?
> I inserted upto 1 million records. The storage requirement of postgreSQL is
> almost double than that of Oracle.
>u
> Thanks in anticipation.

I ran the following tests w/libpqtypes.  While you probably wont end
up using libpqtypes, it's illustrative to mention it because it's
generally the easiest way to get data into postgres and by far the
fastest (excluding 'COPY').  source code follows after the sig (I
banged it out quite quickly, it's messy!) :-).  I am not seeing your
results.

via libpqtypes: Inserting, begin..insert..(repeat 1000000x) commit;
local workstation: 2m24s
remote server: 8m8s

via libpqtypes, but stacking array and unstacking on server (this
could be optimized further by using local prepare):
local workstation: 43s (io bound)
remote server: 29s  (first million)
remote server: 29s  (second million)
create index (1.8s) remote
remote server: 33s  (third million, w/index)

obviously insert at a time tests are network bound.  throw a couple of
indexes in there and you should see some degradation, but nothing too
terrible.

merlin
libpqtypes.esilo.com

ins1.c (insert at a time)
#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
 int i;

 PGconn *conn = PQconnectdb("host=devdb dbname=postgres port=8071");
 if(PQstatus(conn) != CONNECTION_OK)
 {
   printf("bad connection");
   return -1;
 }

 PQtypesRegister(conn);

 PQexec(conn, "begin");

 for(i=0; i<INS_COUNT; i++)
 {
   PGint4 a=i;
   PGtext b = "some_text";
   PGtimestamp c;
   PGbytea d;

   d.len = 8;
   d.data = b;

   c.date.isbc   = 0;
   c.date.year   = 2000;
   c.date.mon    = 0;
   c.date.mday   = 19;
   c.time.hour   = 10;
   c.time.min    = 41;
   c.time.sec    = 6;
   c.time.usec   = 0;
   c.time.gmtoff = -18000;

   PGresult *res = PQexecf(conn, "insert into ins_test(a,b,c,d)
values(%int4, %text, %timestamptz, %bytea)", a, b, &c, &d);

   if(!res)
   {
     printf("got %s\n", PQgeterror());
     return -1;
   }
   PQclear(res);
 }

 PQexec(conn, "commit");

 PQfinish(conn);
}


ins2.c (array stack/unstack)
#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
 int i;

 PGconn *conn = PQconnectdb("host=devdb dbname=postgres port=8071");
 PGresult *res;
 if(PQstatus(conn) != CONNECTION_OK)
 {
   printf("bad connection");
   return -1;
 }

 PQtypesRegister(conn);

 PGregisterType type = {"ins_test", NULL, NULL};
 PQregisterComposites(conn, &type, 1);

 PGparam *p =  PQparamCreate(conn);
 PGarray arr;
 arr.param = PQparamCreate(conn);
 arr.ndims = 0;

 for(i=0; i<INS_COUNT; i++)
 {
   PGint4 a=i;
   PGtext b = "some_text";
   PGtimestamp c;
   PGbytea d;
   PGparam *i = PQparamCreate(conn);

   d.len = 8;
   d.data = b;

   c.date.isbc   = 0;
   c.date.year   = 2000;
   c.date.mon    = 0;
   c.date.mday   = 19;
   c.time.hour   = 10;
   c.time.min    = 41;
   c.time.sec    = 6;
   c.time.usec   = 0;
   c.time.gmtoff = -18000;

   PQputf(i, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
   PQputf(arr.param, "%ins_test", i);
 }

 if(!PQputf(p, "%ins_test[]", &arr))
 {
   printf("putf failed: %s\n", PQgeterror());
   return -1;
 }
 res = PQparamExec(conn, p, "insert into ins_test select (unnest($1)).*", 1);

 if(!res)
 {
   printf("got %s\n", PQgeterror());
   return -1;
 }
 PQclear(res);
 PQfinish(conn);
}

Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Mon, Oct 25, 2010 at 4:28 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I ran the following tests w/libpqtypes.  While you probably wont end
> up using libpqtypes, it's illustrative to mention it because it's
> generally the easiest way to get data into postgres and by far the
> fastest (excluding 'COPY').  source code follows after the sig (I
> banged it out quite quickly, it's messy!) :-).  I am not seeing your
> results.

I had a really horrible bug in there -- leaking a param inside the
array push loop.  cleaning it up dropped another 5 seconds or so from
the 4th million inserted to the remote server!.  Using local prepare
(PQspecPrepare) prob another second or two could be shaved off.

 PGparam *t = PQparamCreate(conn);

 for(i=0; i<INS_COUNT; i++)
 {
   PGint4 a=i;
   PGtext b = "some_text";
   PGtimestamp c;
   PGbytea d;

   d.len = 8;
   d.data = b;

   c.date.isbc   = 0;
   c.date.year   = 2000;
   c.date.mon    = 0;
   c.date.mday   = 19;
   c.time.hour   = 10;
   c.time.min    = 41;
   c.time.sec    = 6;
   c.time.usec   = 0;
   c.time.gmtoff = -18000;

   PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
   PQputf(arr.param, "%ins_test", t);
   PQparamReset(t);
 }

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
Hi Merlin,
Thanks for your quick input.
Well 1 difference worth mentioning:
I am inserting each row in a separate transaction, due to design of my program.
 
-Divakar



From: Merlin Moncure <mmoncure@gmail.com>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: pgsql-performance@postgresql.org
Sent: Tue, October 26, 2010 2:21:02 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Mon, Oct 25, 2010 at 4:28 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I ran the following tests w/libpqtypes.  While you probably wont end
> up using libpqtypes, it's illustrative to mention it because it's
> generally the easiest way to get data into postgres and by far the
> fastest (excluding 'COPY').  source code follows after the sig (I
> banged it out quite quickly, it's messy!) :-).  I am not seeing your
> results.

I had a really horrible bug in there -- leaking a param inside the
array push loop.  cleaning it up dropped another 5 seconds or so from
the 4th million inserted to the remote server!.  Using local prepare
(PQspecPrepare) prob another second or two could be shaved off.

PGparam *t = PQparamCreate(conn);

for(i=0; i<INS_COUNT; i++)
{
  PGint4 a=i;
  PGtext b = "some_text";
  PGtimestamp c;
  PGbytea d;

  d.len = 8;
  d.data = b;

  c.date.isbc  = 0;
  c.date.year  = 2000;
  c.date.mon    = 0;
  c.date.mday  = 19;
  c.time.hour  = 10;
  c.time.min    = 41;
  c.time.sec    = 6;
  c.time.usec  = 0;
  c.time.gmtoff = -18000;

  PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
  PQputf(arr.param, "%ins_test", t);
  PQparamReset(t);
}

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Tue, Oct 26, 2010 at 7:44 AM, Divakar Singh <dpsmails@yahoo.com> wrote:
> Hi Merlin,
> Thanks for your quick input.
> Well 1 difference worth mentioning:
> I am inserting each row in a separate transaction, due to design of my
> program.

Well, that right there is going to define your application
performance. You have basically three major issues -- postgresql
executes each query synchronously through the protocol, transaction
overhead, and i/o issues coming from per transaction sync.  libpq
supports asynchronous queries, but only from the clients point of view
-- so that this only helps if you have non trivial work to do setting
up each query.  The database is inherently capable of doing what you
want it to do...you may just have to rethink certain things if you
want to unlock the true power of postgres...

You have several broad areas of attack:
*) client side: use prepared queries (PQexecPrepared) possibly
asynchronously (PQsendPrepared).  Reasonably you can expect 5-50%
speedup if not i/o bound
*) Stage data to a temp table:  temp tables are not wal logged or
synced.  Periodically they can be flushed to a permanent table.
Possible data loss
*) Relax sync policy (synchronous_commit/fsync) -- be advised these
settings are dangerous
*) Multiple client writers -- as long as you are not i/o bound, you
will see big improvements in tps from multiple clients
*) Stage/queue application data before inserting it -- requires
retooling application, but you can see orders of magnitude jump insert
performance

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Leonardo Francalanci
Дата:
> temp  tables are not wal logged or
> synced.  Periodically they can be flushed  to a permanent table.


What do you mean with "Periodically they can be flushed  to
a permanent table"? Just doing

insert into tabb select * from temptable

or using a proper, per-temporary table command???




Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>> temp  tables are not wal logged or
>> synced.  Periodically they can be flushed  to a permanent table.
>
>
> What do you mean with "Periodically they can be flushed  to
> a permanent table"? Just doing
>
> insert into tabb select * from temptable
>

yup, that's exactly what I mean -- this will give you more uniform
insert performance (your temp table doesn't even need indexes).  Every
N records (say 10000) you send to permanent and truncate the temp
table.  Obviously, this is more fragile approach so weigh the
pros/cons carefully.

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Mladen Gogala
Дата:
On 10/26/2010 11:41 AM, Merlin Moncure wrote:
> yup, that's exactly what I mean -- this will give you more uniform
> insert performance (your temp table doesn't even need indexes).  Every
> N records (say 10000) you send to permanent and truncate the temp
> table.  Obviously, this is more fragile approach so weigh the
> pros/cons carefully.
>
> merlin

Truncate temporary table? What a horrible advice! All that you need is
the temporary table to delete rows on commit.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Postgres insert performance and storage requirement compared to Oracle

От
Jon Nelson
Дата:
On Tue, Oct 26, 2010 at 4:02 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> On 10/26/2010 11:41 AM, Merlin Moncure wrote:
>>
>> yup, that's exactly what I mean -- this will give you more uniform
>> insert performance (your temp table doesn't even need indexes).  Every
>> N records (say 10000) you send to permanent and truncate the temp
>> table.  Obviously, this is more fragile approach so weigh the
>> pros/cons carefully.
>>
>> merlin
>
> Truncate temporary table? What a horrible advice! All that you need is the
> temporary table to delete rows on commit.

I believe Merlin was suggesting that, after doing 10000 inserts into
the temporary table, that something like this might work better:

start loop:
  populate rows in temporary table
  insert from temporary table into permanent table
  truncate temporary table
  loop

I do something similar, where I COPY data to a temporary table, do
lots of manipulations, and then perform a series of INSERTS from the
temporary table into a permanent table.

--
Jon

Re: Postgres insert performance and storage requirement compared to Oracle

От
Mladen Gogala
Дата:
On 10/26/2010 5:27 PM, Jon Nelson wrote:
> start loop:
>    populate rows in temporary table
>    insert from temporary table into permanent table
>    truncate temporary table
>    loop
>
> I do something similar, where I COPY data to a temporary table, do
> lots of manipulations, and then perform a series of INSERTS from the
> temporary table into a permanent table.
>

1) It's definitely not faster because you have to insert into the
temporary table, in addition to inserting into the permanent table.
2) This is what I had in mind:

mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;
  count
-------
   1000
(1 row)

mgogala=# select count(*) from t1;
  count
-------
      0
(1 row)

The table is created with "on commit obliterate rows" option which means
that there is no need to do "truncate". The "truncate" command is a
heavy artillery. Truncating a temporary table is like shooting ducks in
a duck pond, with a howitzer.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> On 10/26/2010 5:27 PM, Jon Nelson wrote:
>>
>> start loop:
>>   populate rows in temporary table
>>   insert from temporary table into permanent table
>>   truncate temporary table
>>   loop
>>
>> I do something similar, where I COPY data to a temporary table, do
>> lots of manipulations, and then perform a series of INSERTS from the
>> temporary table into a permanent table.
>>
>
> 1) It's definitely not faster because you have to insert into the temporary
> table, in addition to inserting into the permanent table.
> 2) This is what I had in mind:
>
> mgogala=# create table a(c1 int);
> CREATE TABLE
> mgogala=# create temporary table t1(c1 int) on commit delete rows;
> CREATE TABLE
> mgogala=# begin;
> BEGIN
> mgogala=# insert into t1 select generate_series(1,1000);
> INSERT 0 1000
> mgogala=# insert into a select * from t1;
> INSERT 0 1000
> mgogala=# commit;
> COMMIT
> mgogala=# select count(*) from a;
>  count
> -------
>  1000
> (1 row)
>
> mgogala=# select count(*) from t1;
>  count
> -------
>     0
> (1 row)
>
> The table is created with "on commit obliterate rows" option which means
> that there is no need to do "truncate". The "truncate" command is a heavy
> artillery. Truncating a temporary table is like shooting ducks in a duck
> pond, with a howitzer.

You are not paying attention ;-).  Look upthread: "I am inserting each
row in a separate transaction, due to design of my program." (also on
commit/drop is no picnic either, but I digress...)

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Robert Haas
Дата:
On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> The table is created with "on commit obliterate rows" option which means
> that there is no need to do "truncate". The "truncate" command is a heavy
> artillery. Truncating a temporary table is like shooting ducks in a duck
> pond, with a howitzer.

This is just not true.  ON COMMIT DELETE ROWS simply arranges for a
TRUNCATE to happen immediately before each commit.  See
PreCommit_on_commit_actions() in tablecmds.c.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
> <mladen.gogala@vmsinfo.com> wrote:
>> The table is created with "on commit obliterate rows" option which means
>> that there is no need to do "truncate". The "truncate" command is a heavy
>> artillery. Truncating a temporary table is like shooting ducks in a duck
>> pond, with a howitzer.
>
> This is just not true.  ON COMMIT DELETE ROWS simply arranges for a
> TRUNCATE to happen immediately before each commit.  See
> PreCommit_on_commit_actions() in tablecmds.c.

quite so.  If you are doing anything performance sensitive with 'on
commit drop', you are better off organizing a cache around
txid_current() (now(), pid for older pg versions).  Skips the writes
to the system catalogs and truncate.

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Ivan Voras
Дата:
On 10/26/10 17:41, Merlin Moncure wrote:
> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>>> temp  tables are not wal logged or
>>> synced.  Periodically they can be flushed  to a permanent table.
>>
>>
>> What do you mean with "Periodically they can be flushed  to
>> a permanent table"? Just doing
>>
>> insert into tabb select * from temptable
>>
>
> yup, that's exactly what I mean -- this will give you more uniform

In effect, when so much data is in temporary storage, a better option
would be to simply configure "synchronous_commit = off" (better in the
sense that the application would not need to be changed). The effects
are almost the same - in both cases transactions might be lost but the
database will survive.


Re: Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 10/26/10 17:41, Merlin Moncure wrote:
>> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>>>> temp  tables are not wal logged or
>>>> synced.  Periodically they can be flushed  to a permanent table.
>>>
>>>
>>> What do you mean with "Periodically they can be flushed  to
>>> a permanent table"? Just doing
>>>
>>> insert into tabb select * from temptable
>>>
>>
>> yup, that's exactly what I mean -- this will give you more uniform
>
> In effect, when so much data is in temporary storage, a better option
> would be to simply configure "synchronous_commit = off" (better in the
> sense that the application would not need to be changed). The effects
> are almost the same - in both cases transactions might be lost but the
> database will survive.

right -- although that's a system wide setting and perhaps other
tables still require full synchronous fsync.  Still -- fair point
(although I bet you are still going to get better performance going by
the temp route if only by a hair).

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
Hi Steve and other friends,
Some information you would be interested in:
I did some further tests using libpq  in my code.
I used a stored proc to insert 100 thousand rows in a table, it took 25 sec (almost same as time taken by Oracle PL/SQL and OCI interface).
Same inserts through libpq take 70 seconds.
I am inserting all records in a single transaction.
So, the problem seems to be optimization of usage of libpq  in my code.
I am attaching my code below.
Is any optimization possible in this?
Do prepared statements help in cutting down the insert time to half for this kind of inserts? One of the major problems with libpq usage is lack of good documentation and examples.
I could not get any good example of prepared stmt usage anywhere.

//----------------------------------------------------------------------------------------------------------------------------
/*
 * testlibpq.c
 *
 *      Test the C version of libpq, the PostgreSQL frontend library.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "iostream.h"
#include "stdio.h"
#include <time.h>

static void
exit_nicely(PGconn *conn)
{
    PQfinish(conn);
    exit(1);
}

int
main(int argc, char **argv)
{
    const char *conninfo;
    PGconn     *conn;
    PGresult   *res;
    int         nFields;
    int         i=0;
    int howmany=0;
    if (argc<2)
    {
        cout<<"please pass no of records as parameter"<<endl;
        return -1;
    }
    sscanf(argv[1], "%d", &howmany);
    cout<<"inserting "<<howmany<<" records"<<endl;

    time_t mytime1 = time(0);
    cout<<"starting at "<<asctime(localtime(&mytime1))<<endl;



    /*
     * If the user supplies a parameter on the command line, use it as the
     * conninfo string; otherwise default to setting dbname=postgres and using
     * environment variables or defaults for all other connection parameters.
     */
        conninfo = "host=x.y.z.a dbname=xyz port=5432 user=sd password=fg" ;


    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection to database failed: %s",
                PQerrorMessage(conn));
        exit_nicely(conn);
    }

    /*
     * Our test case here involves using a cursor, for which we must be inside
     * a transaction block.  We could do the whole thing with a single
     * PQexec() of "select * from pg_database", but that's too trivial to make
     * a good example.
     */

    /* Start a transaction block */
    res = PQexec(conn, "BEGIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "BEGIN command failed: %s", PQerrorMessage(conn));
        PQclear(res);
        exit_nicely(conn);
    }

    /*
     * Should PQclear PGresult whenever it is no longer needed to avoid memory
     * leaks
     */
    PQclear(res);

    char query[1024]={0};

    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n, p) VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010', 'ABRAKADABRA', 'ABRAKADABRA',  'ABRAKADABRA', '1-Dec-2010', 99999, 99999, %d, 9999, 'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')", i);

    res = PQexec(conn, query);

      if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
          cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
                  PQclear(res);
          break;
        }
        //PQclear(res);


    }

    /* close the portal ... we don't bother to check for errors ... */
    /*res = PQexec(conn, "CLOSE myportal");
    PQclear(res);*/

    /* end the transaction */
    res = PQexec(conn, "END");
    PQclear(res);

    cout<<i<<" records inserted!"<<endl;

    mytime1 = time(0);
    cout<<"Finished at "<<asctime(localtime(&mytime1))<<endl;


    /* close the connection to the database and cleanup */
    PQfinish(conn);

    return 0;
}

//----------------------------------------------------------------------------------------------------------------------------

 
Best Regards,
Divakar



From: Divakar Singh <dpsmails@yahoo.com>
To: Steve Singer <ssinger@ca.afilias.info>
Cc: jd@commandprompt.com; pgsql-performance@postgresql.org
Sent: Tue, October 26, 2010 12:22:31 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle


Answers:

How are you using libpq?
-Are you opening and closing the database connection between each insert?

[Need to check, will come back on this]

-Are you doing all of your inserts as one big transaction or are you  doing a transaction per insert

[Answer: for C++ program, one insert per transaction in PG as well as Oracle. But in stored proc, I think both use only 1 transaction for all inserts]

-Are you using prepared statements for your inserts?

[Need to check, will come back on this]

-Are you using the COPY command to load your data or the INSERT command?

[No]

-Are you running your libpq program on the same server as postgresql?

[Yes]

-How is your libpq program connecting to postgresql, is it using ssl?

[No]

If your run "VACUUM VERBOSE tablename" on the table, what does it say?

[Need to check, will come back on this]

You also don't mention which version of postgresql your using.

[Latest, 9.x]
 
Best Regards,
Divakar



From: Steve Singer <ssinger@ca.afilias.info>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: jd@commandprompt.com; pgsql-performance@postgresql.org
Sent: Tue, October 26, 2010 12:16:46 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On 10-10-25 02:31 PM, Divakar Singh wrote:
>
>  > My questions/scenarios are:
>  >
>  > 1. How does PostgreSQL perform when inserting data into an indexed
>  > (type: btree)
>  > table? Is it true that as you add the indexes on a table, the
>  > performance
>  > deteriorates significantly whereas Oracle does not show that much
>  > performance
>  > decrease. I have tried almost all postgreSQL performance tips
>  > available. I want
>  > to have very good "insert" performance (with indexes), "select"
>  > performance is
>  > not that important at this point of time.
>
> -- Did you test?
>
> Yes. the performance was comparable when using SQL procedure. However,
> When I used libpq, PostgreSQL performed very bad. There was some
> difference in environment also between these 2 tests, but I am assuming
> libpq vs SQL was the real cause. Or it was something else?

So your saying that when you load the data with psql it loads fine, but
when you load it using libpq it takes much longer?

How are you using libpq?
-Are you opening and closing the database connection between each insert?
-Are you doing all of your inserts as one big transaction or are you
doing a transaction per insert
-Are you using prepared statements for your inserts?
-Are you using the COPY command to load your data or the INSERT command?
-Are you running your libpq program on the same server as postgresql?
-How is your libpq program connecting to postgresql, is it using ssl?

>
> Some 10-12 columns ( like 2 timestamp, 4 int and 4 varchar), with 5
> indexes on varchar and int fields including 1 implicit index coz of PK.

If your run "VACUUM VERBOSE tablename" on the table, what does it say?

You also don't mention which version of postgresql your using.

>
>
> Joshua D. Drake
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: Postgres insert performance and storage requirement compared to Oracle

От
Steve Singer
Дата:
On 10-10-27 10:00 AM, Divakar Singh wrote:
> Hi Steve and other friends,
> Some information you would be interested in:
> I did some further tests using libpq in my code.
> I used a stored proc to insert 100 thousand rows in a table, it took 25
> sec (almost same as time taken by Oracle PL/SQL and OCI interface).
> Same inserts through libpq take 70 seconds.
> I am inserting all records in a single transaction.
> So, the problem seems to be optimization of usage of libpq in my code.
> I am attaching my code below.
> Is any optimization possible in this?
> Do prepared statements help in cutting down the insert time to half for
> this kind of inserts? One of the major problems with libpq usage is lack
> of good documentation and examples.
> I could not get any good example of prepared stmt usage anywhere.


Yes using prepared statements should make this go faster, but your best
bet might be to use the COPY command.  I don't have a PQprepare example
handy though we probably should add one to the docs.

The copy command would be used similar to

PQexec(conn,"COPY TO aaaa  (a,b,c,d,e,f,g,h,j,k,l,m,n,p) FROM STDIN WITH
(DELIMITER ',') ");
for(; i < howmany;i++)
{
sprintf(query,"67,'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec
2010', 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
99999, 99999, %d, 9999,

'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA'\n",


i);
res = PQputCopyData(conn,query,strlen(query);

}
PQputCopyEnd(conn,NULL);

I have not actually tried the above code snippet, it is just to give you
the general idea.

You call PQexec with the COPY command outside the loop then at each loop
iteration you call PQputCopyData with some of the data that gets passed
to the server.


You can combine multiple lines on a single PQputCopyData call if you want.

http://www.postgresql.org/docs/9.0/interactive/sql-copy.html
http://www.postgresql.org/docs/9.0/interactive/libpq-copy.html






>
>
> char query[1024]={0};
>
> for (; i<howmany;i++ )
> {
>
> sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n,
> p) VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA',
> '1-Dec-2010', 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010',
> 99999, 99999, %d, 9999,
>
'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')",
> i);
>
> res = PQexec(conn, query);
>
> if (PQresultStatus(res) != PGRES_COMMAND_OK)
> {
> cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
> PQclear(res);
> break;
> }
> //PQclear(res);
>
>
> }
>
> /* close the portal ... we don't bother to check for errors ... */
> /*res = PQexec(conn, "CLOSE myportal");
> PQclear(res);*/
<snip>



Re: Postgres insert performance and storage requirement compared to Oracle

От
"Reid Thompson"
Дата:

On Wed, 2010-10-27 at 11:42 -0400, Steve Singer wrote:
> I don't have a PQprepare example
> handy though we probably should add one to the docs.
>
I would like to see this.  A several minutes web search didn't turn up
an example for me either.

thanks,
reid

Re: Postgres insert performance and storage requirement compared to Oracle

От
David Wilson
Дата:


On Wed, Oct 27, 2010 at 10:00 AM, Divakar Singh <dpsmails@yahoo.com> wrote:

    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n, p) VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010', 'ABRAKADABRA', 'ABRAKADABRA',  'ABRAKADABRA', '1-Dec-2010', 99999, 99999, %d, 9999, 'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')", i);

    res = PQexec(conn, query);

      if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
          cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
                  PQclear(res);
          break;
        }
        //PQclear(res);


    }

Why is that PQclear(res) commented out? You're leaking result status for every insert.


--
- David T. Wilson
david.t.wilson@gmail.com

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:




    for (; i<howmany;i++ )
    {

    sprintf (query, "INSERT INTO aaaa(a, b, c, d, e, f, g, h, j, k, l, m, n, p) VALUES (67, 'ABRAKADABRA', 'ABRAKADABRA', 'ABRAKADABRA', '1-Dec-2010', 'ABRAKADABRA', 'ABRAKADABRA',  'ABRAKADABRA', '1-Dec-2010', 99999, 99999, %d, 9999, 'ABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRAABRAKADABRA')", i);

    res = PQexec(conn, query);

      if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
          cout<<"error at iteration "<<i<<":"<<PQresultErrorMessage(res)<<endl;
                  PQclear(res);
          break;
        }
        //PQclear(res);


    }

>Why is that PQclear(res) commented out? You're leaking result status for every insert.


I did that purposely to see if cleanup part is contributing to any performance loss.
Right now in my test, memory leak is not a concern for me but performance is.
Though I understand that memory leak can also result in performance loss if leak is too much.
However, in this case, commenting or uncommenting this statement did not cause any change in performance.
 




Re: Postgres insert performance and storage requirement compared to Oracle

От
Scott Carey
Дата:
On Oct 26, 2010, at 2:54 PM, Mladen Gogala wrote:

> On 10/26/2010 5:27 PM, Jon Nelson wrote:
>> start loop:
>>   populate rows in temporary table
>>   insert from temporary table into permanent table
>>   truncate temporary table
>>   loop
>>
>> I do something similar, where I COPY data to a temporary table, do
>> lots of manipulations, and then perform a series of INSERTS from the
>> temporary table into a permanent table.
>>
>
> 1) It's definitely not faster because you have to insert into the
> temporary table, in addition to inserting into the permanent table.

It is almost always significantly faster than a direct bulk load into a table.
* The temp table has no indexes, the final table usually does, bulk operations on indexes are faster than per row
operations.
* The final table might require both updates and inserts, doing these in bulk from a temp stage table is far faster
thanper row. 
* You don't even have to commit after the merge from the temp table, and can loop until its all done, then commit --
thoughthis can have table/index bloat implications if doing updates. 

> 2) This is what I had in mind:
>
> mgogala=# create table a(c1 int);
> CREATE TABLE
> mgogala=# create temporary table t1(c1 int) on commit delete rows;
> CREATE TABLE
> mgogala=# begin;
> BEGIN
> mgogala=# insert into t1 select generate_series(1,1000);
> INSERT 0 1000
> mgogala=# insert into a select * from t1;
> INSERT 0 1000
> mgogala=# commit;
> COMMIT
> mgogala=# select count(*) from a;
>  count
> -------
>   1000
> (1 row)
>
> mgogala=# select count(*) from t1;
>  count
> -------
>      0
> (1 row)
>
> The table is created with "on commit obliterate rows" option which means
> that there is no need to do "truncate". The "truncate" command is a
> heavy artillery. Truncating a temporary table is like shooting ducks in
> a duck pond, with a howitzer.

???  Test it.  DELETE is slow, truncate is nearly instantaneous for normal tables.  For temp tables its the same thing.
Maybe in Oracle TRUNCATE is a howitzer, in Postgres its lightweight.  Your loop above requires a commit after every
1000rows.  What if you require that all rows are seen at once or not at all?  What if you fail part way through?  One
bigtransaction is often a better idea and/or required.  Especially in postgres, with no undo-log, bulk inserts in one
largetransaction work out very well -- usually better than multiple smaller transactions. 
>
> --
>
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com
> The Leader in Integrated Media Intelligence Solutions
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> Dear All,
> Thanks for your inputs on the insert performance part.
> Any suggestion on storage requirement?
> VACUUM is certainly not an option, because this is something related to
> maintenance AFTER insertion.
> I am talking about the plain storage requirement w.r. to Oracle, which I
> observed is twice of Oracle in case millions of rows are inserted.
> Anybody who tried to analyze the average storage requirement of PG w.r. to
> Oracle?

There isn't much you can to about storage use other than avoid stupid
things (like using char() vs varchar()), smart table layout, toast
compression, etc.  Are you sure this is a problem?

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Mladen Gogala
Дата:
On 10/27/2010 1:48 PM, Scott Carey wrote:

It is almost always significantly faster than a direct bulk load into a table.  
* The temp table has no indexes, the final table usually does, bulk operations on indexes are faster than per row operations.
* The final table might require both updates and inserts, doing these in bulk from a temp stage table is far faster than per row.
* You don't even have to commit after the merge from the temp table, and can loop until its all done, then commit -- though this can have table/index bloat implications if doing updates.

Scott, I find this very hard to believe. If you are inserting into a temporary table and then into the target table, you will do 2 inserts instead of just one. What you are telling me is that it is faster for me to drive from NYC to Washington DC by driving first to Miami and then from Miami to DC.

2) This is what I had in mind:

mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;count
------- 1000
(1 row)

mgogala=# select count(*) from t1;count
-------    0
(1 row)

The table is created with "on commit obliterate rows" option which means 
that there is no need to do "truncate". The "truncate" command is a 
heavy artillery. Truncating a temporary table is like shooting ducks in 
a duck pond, with a howitzer.
???  Test it.  DELETE is slow, truncate is nearly instantaneous for normal tables.  For temp tables its the same thing.  Maybe in Oracle TRUNCATE is a howitzer, in Postgres its lightweight.  

Truncate has specific list of tasks to do:
1)  lock the table in the exclusive mode to prevent concurrent transactions on the table.
2)  Release the file space and update the table headers.
3)  Flush any buffers possibly residing in shared memory.
4)  Repeat the procedures on the indexes.

Of course, in case of the normal table, all of these changes are logged, possibly producing WAL archives. That is still much faster than delete which depends on the number of rows that need to be deleted, but not exactly lightweight, either. In Postgres, truncate recognizes that the table is a temporary table so it makes a few shortcuts, which makes the truncate faster.

1) No need to flush buffers.
2) Locking requirements are much less stringent.
3) No WAL archives are produced.

Temporary tables are completely different beasts in Oracle and Postgres. Yes, you are right, truncate of a temporary table is a big no-no in the Oracle world, especially in the RAC environment. However, I do find "ON COMMIT DELETE ROWS" trick to be more elegant than the truncate. Here is the classic Tom Kyte, on the topic of truncating the temporary tables: http://tinyurl.com/29kph3p

"
NO. truncate is DDL. DDL is expensive. Truncation is something that should be done very infrequently.
 Now, I don't mean "turn your truncates into DELETE's" -- that would  be even worse. I mean -- avoid having
 to truncate or delete every row in the first  place. Use a transaction based temporary table and upon commit, it'll empty itself."

Your loop above requires a commit after every 1000 rows.  What if you require that all rows are seen at once or not at all?  What if you fail part way through?  One big transaction is often a better idea and/or required.  Especially in postgres, with no undo-log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.

I don't contest that. I also prefer to do things in one big transaction, if possible.

-- 
Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions


Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Wed, Oct 27, 2010 at 2:06 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> Scott, I find this very hard to believe. If you are inserting into a
> temporary table and then into the target table, you will do 2 inserts
> instead of just one. What you are telling me is that it is faster for me to
> drive from NYC to Washington DC by driving first to Miami and then from
> Miami to DC.

The reason why in one transaction per insert environment staging to
temp table first is very simple...non temp table inserts have to be
wal logged and fsync'd.  When you batch them into the main table, you
get more efficient use of WAL and ONE sync operation.  This is
especially advantageous if the inserts are coming fast and furious and
there are other things going on in the database at the time, or there
are multiple inserters.

If you have luxury of batching data in a transaction, you don't have
to worry about it.

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
yes this is a very clearly visible problem.
The difference b/w oracle and PG increases with more rows.
when oracle takes 3 GB, PG takes around 6 GB.
I only use varchar.
I will try to use your tips on "smart table layout, toast compression".
Assuming these suggested options do not have any performance penalty?
 
Best Regards,
Divakar



From: Merlin Moncure <mmoncure@gmail.com>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: Robert Haas <robertmhaas@gmail.com>; Mladen Gogala <mladen.gogala@vmsinfo.com>; pgsql-performance@postgresql.org
Sent: Wed, October 27, 2010 11:36:00 PM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> Dear All,
> Thanks for your inputs on the insert performance part.
> Any suggestion on storage requirement?
> VACUUM is certainly not an option, because this is something related to
> maintenance AFTER insertion.
> I am talking about the plain storage requirement w.r. to Oracle, which I
> observed is twice of Oracle in case millions of rows are inserted.
> Anybody who tried to analyze the average storage requirement of PG w.r. to
> Oracle?

There isn't much you can to about storage use other than avoid stupid
things (like using char() vs varchar()), smart table layout, toast
compression, etc.  Are you sure this is a problem?

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Wed, Oct 27, 2010 at 2:14 PM, Divakar Singh <dpsmails@yahoo.com> wrote:
> yes this is a very clearly visible problem.
> The difference b/w oracle and PG increases with more rows.
> when oracle takes 3 GB, PG takes around 6 GB.
> I only use varchar.
> I will try to use your tips on "smart table layout, toast compression".
> Assuming these suggested options do not have any performance penalty?

These will only be helpful in particular cases, for example if your
layout is bad :-).  toast compression is for dealing with large datums
(on by default iirc).  Also it's very hard to get apples to apples
comparison test via synthetic insertion benchmark.  It's simply not
the whole story.

The deal with postgres is that things are pretty optimized and fairly
unlikely to get a whole lot better than they are today.  The table
layout is pretty optimal already, nulls are bitmaps, data lengths are
using fancy bitwise length mechanism, etc.  Each record in postgres
has a 20 byte header that has to be factored in to any storage
estimation, plus the index usage.

Postgres indexes are pretty compact, and oracle (internals I am not
familiar with) also has to do MVCC type management, so I am suspecting
your measurement is off (aka, operator error) or oracle is cheating
somehow by optimizing away storage requirements somehow via some sort
of tradeoff.  However you still fail to explain why storage size is a
problem.  Are planning to port oracle to postgres on a volume that is
>50% full? :-)

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Jesper Krogh
Дата:
On 2010-10-27 20:28, Merlin Moncure wrote:
> Postgres indexes are pretty compact, and oracle (internals I am not
> familiar with) also has to do MVCC type management, so I am suspecting
> your measurement is off (aka, operator error) or oracle is cheating
> somehow by optimizing away storage requirements somehow via some sort
> of tradeoff.  However you still fail to explain why storage size is a
> problem.  Are planning to port oracle to postgres on a volume that is
> 50% full? :-)
>
Pretty ignorant comment.. sorry ..

But when your database approaches something that is not mainly
fitting in memory, space directly translates into speed and a more
compact table utillizes the OS-page cache better. This is both
true for index and table page caching.

And the more compact your table the later you hit the stage where
you cant fit into memory anymore.

.. but if above isn't issues, then your statements are true.

--
Jesper

Re: Postgres insert performance and storage requirement compared to Oracle

От
Steve Singer
Дата:
On 10-10-27 02:14 PM, Divakar Singh wrote:
> yes this is a very clearly visible problem.
> The difference b/w oracle and PG increases with more rows.
> when oracle takes 3 GB, PG takes around 6 GB.
> I only use varchar.
> I will try to use your tips on "smart table layout, toast compression".
> Assuming these suggested options do not have any performance penalty?
> Best Regards,
> Divakar

In between test runs are you cleaning out the tables with a "DELETE FROM
aaaaa" or are you using the TRUNCATE command? Or dropping the table and
recreating it.

If your just using DELETE it might be that disk space is still being
used by the old versions of the rows.

Also is postgresql using more space than oracle for storing the index
data or the main table data?  and is any particular index larger on
postgresql compared to Oracle.




>
>
> ------------------------------------------------------------------------
> *From:* Merlin Moncure <mmoncure@gmail.com>
> *To:* Divakar Singh <dpsmails@yahoo.com>
> *Cc:* Robert Haas <robertmhaas@gmail.com>; Mladen Gogala
> <mladen.gogala@vmsinfo.com>; pgsql-performance@postgresql.org
> *Sent:* Wed, October 27, 2010 11:36:00 PM
> *Subject:* Re: [PERFORM] Postgres insert performance and storage
> requirement compared to Oracle
>
> On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh <dpsmails@yahoo.com
> <mailto:dpsmails@yahoo.com>> wrote:
>  > Dear All,
>  > Thanks for your inputs on the insert performance part.
>  > Any suggestion on storage requirement?
>  > VACUUM is certainly not an option, because this is something related to
>  > maintenance AFTER insertion.
>  > I am talking about the plain storage requirement w.r. to Oracle, which I
>  > observed is twice of Oracle in case millions of rows are inserted.
>  > Anybody who tried to analyze the average storage requirement of PG
> w.r. to
>  > Oracle?
>
> There isn't much you can to about storage use other than avoid stupid
> things (like using char() vs varchar()), smart table layout, toast
> compression, etc. Are you sure this is a problem?
>
> merlin
>


Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Wed, Oct 27, 2010 at 2:42 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> On 2010-10-27 20:28, Merlin Moncure wrote:
>>
>> Postgres indexes are pretty compact, and oracle (internals I am not
>> familiar with) also has to do MVCC type management, so I am suspecting
>> your measurement is off (aka, operator error) or oracle is cheating
>> somehow by optimizing away storage requirements somehow via some sort
>> of tradeoff.  However you still fail to explain why storage size is a
>> problem.  Are planning to port oracle to postgres on a volume that is
>> 50% full? :-)
>>
>
> Pretty ignorant comment.. sorry ..
>
> But when your database approaches something that is not mainly
> fitting in memory, space directly translates into speed and a more
> compact table utillizes the OS-page cache better. This is both
> true for index and table page caching.
>
> And the more compact your table the later you hit the stage where
> you cant fit into memory anymore.
>
> .. but if above isn't issues, then your statements are true.

Yes, I am quite aware of how the o/s page cache works.  All else being
equal, I more compact database obviously would be preferred.  However
'all else' is not necessarily equal.  I can mount my database on bzip
volume, that must make it faster, right?  wrong.  I understand the
postgres storage architecture pretty well, and the low hanging fruit
having been grabbed further layout compression is only going to come
as a result of tradeoffs.

Now, comparing oracle vs postgres, mvcc works differently because
oracle uses rollback logs while postgres maintains extra/old versions
in the heap.  This will add up to big storage usage based on various
things, but should not so much be reflected via insert only test.

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Alex Hunsaker
Дата:
On Wed, Oct 27, 2010 at 08:00, Divakar Singh <dpsmails@yahoo.com> wrote:
> I am attaching my code below.
> Is any optimization possible in this?
> Do prepared statements help in cutting down the insert time to half for this
> kind of inserts?

In half? not for me.  Optimization possible? Sure, using the code you
pasted (time ./a.out 100000 <method>):
PQexec: 41s
PQexecPrepared: 36s
1 insert statement: 7s
COPY: 1s
psql: 256ms

Basically the above echoes the suggestions of others, use COPY if you can.

Find the source for the above attached.  Its just a very quick
modified version of what you posted. [ disclaimer the additions I
added are almost certainly missing some required error checking... ]

[ psql is fast because the insert is really dumb: insert into aaaa (a,
b, c, d, e, f, g, h, j, k, l, m, n, p) select 1, 'asdf', 'asdf',
'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf',
'asdf', 'asdf', 'asdf' from generate_series(1, 100000); ]

Вложения

Re: Postgres insert performance and storage requirement compared to Oracle

От
Jesper Krogh
Дата:
On 2010-10-27 20:51, Merlin Moncure wrote:

>> Yes, I am quite aware of how the o/s page cache works.  All else being
>> equal, I more compact database obviously would be preferred.  However
>> 'all else' is not necessarily equal.  I can mount my database on bzip
>> volume, that must make it faster, right?  wrong.  I understand the
>> postgres storage architecture pretty well, and the low hanging fruit
>> having been grabbed further layout compression is only going to come
>> as a result of tradeoffs.
>>
Or configureabillity.. Not directly related to overall space consumption
but I have been working on a patch that would make TOAST* kick in
earlier in the process, giving a "slimmer" main table with visibillity
information
and simple columns and moving larger colums more aggressively to TOAST.

The overall disadvantage of TOAST is the need for an extra disk seek if
you actually need the data. If the application rarely needs the large
columns but often do count/filtering on simple values this will eventually
lead to a better utillization of the OS-page-cache with a very small
overhead
to PG (in terms of code) and 0 overhead in the applications that benefit.

Keeping in mind that as SSD-drives get more common the "the extra disk seek"
drops dramatically, but the drive is by itself probably still 100-1000x
slower than
main memory, so keeping "the right data" in the OS-cache is also a
parameter.

If you deal with data where the individual tuple-size goes up, currently
TOAST
first kicks in at 2KB (compressed size) which leads to a very sparse
main table
in terms of visibillity information and count and selects on simple values
will drag a huge amount of data into the cache-layers thats not needed
there.

Another suggestion could be to make the compression of text columns kick in
earlier .. if thats possible. (I dont claim that its achiveable)

Unless the tuple-header is hugely bloated I have problems creating a
situation in my
head where hammering that one can change anything significantly.

* http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg159726.html

--
Jesper

Re: Re: Postgres insert performance and storage requirement compared to Oracle

От
Robert Haas
Дата:
On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 10/26/10 17:41, Merlin Moncure wrote:
>> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists@yahoo.it> wrote:
>>>> temp  tables are not wal logged or
>>>> synced.  Periodically they can be flushed  to a permanent table.
>>>
>>>
>>> What do you mean with "Periodically they can be flushed  to
>>> a permanent table"? Just doing
>>>
>>> insert into tabb select * from temptable
>>>
>>
>> yup, that's exactly what I mean -- this will give you more uniform
>
> In effect, when so much data is in temporary storage, a better option
> would be to simply configure "synchronous_commit = off" (better in the
> sense that the application would not need to be changed). The effects
> are almost the same - in both cases transactions might be lost but the
> database will survive.

Gee, I wonder if it would possible for PG to automatically do an
asynchronous commit of any transaction which touches only temp tables.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
So another question pops up: What method in PostgreSQL does the stored proc use when I issue multiple insert (for loop for 100 thousand records) in the stored proc?
It takes half the time compared to the consecutive "insert" using libpq.
In the backend, does it use COPY or prepared statement? or something else?
 
Best Regards,
Divakar



From: Alex Hunsaker <badalex@gmail.com>
To: Divakar Singh <dpsmails@yahoo.com>
Cc: Steve Singer <ssinger@ca.afilias.info>; jd@commandprompt.com; pgsql-performance@postgresql.org
Sent: Thu, October 28, 2010 1:15:06 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Wed, Oct 27, 2010 at 08:00, Divakar Singh <dpsmails@yahoo.com> wrote:
> I am attaching my code below.
> Is any optimization possible in this?
> Do prepared statements help in cutting down the insert time to half for this
> kind of inserts?

In half? not for me.  Optimization possible? Sure, using the code you
pasted (time ./a.out 100000 <method>):
PQexec: 41s
PQexecPrepared: 36s
1 insert statement: 7s
COPY: 1s
psql: 256ms

Basically the above echoes the suggestions of others, use COPY if you can.

Find the source for the above attached.  Its just a very quick
modified version of what you posted. [ disclaimer the additions I
added are almost certainly missing some required error checking... ]

[ psql is fast because the insert is really dumb: insert into aaaa (a,
b, c, d, e, f, g, h, j, k, l, m, n, p) select 1, 'asdf', 'asdf',
'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf',
'asdf', 'asdf', 'asdf' from generate_series(1, 100000); ]

Re: Re: Postgres insert performance and storage requirement compared to Oracle

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Gee, I wonder if it would possible for PG to automatically do an
> asynchronous commit of any transaction which touches only temp tables.

Hmm ... do we need a commit at all in such a case?  If our XID has only
gone into temp tables, I think we need to write to clog, but we don't
really need a WAL entry, synced or otherwise.

            regards, tom lane

Re: Postgres insert performance and storage requirement compared to Oracle

От
Alex Hunsaker
Дата:
On Wed, Oct 27, 2010 at 21:08, Divakar Singh <dpsmails@yahoo.com> wrote:
> So another question pops up: What method in PostgreSQL does the stored proc
> use when I issue multiple insert (for loop for 100 thousand records) in the
> stored proc?

It uses prepared statements (unless you are using execute).  There is
also the benefit of not being on the network.  Assuming 0.3ms avg
latency, 1 packet per query and 100,000 queries-- thats 30s just from
latency!  Granted this is just a silly estimate that happens to (more
or less) fit my numbers...

Re: Postgres insert performance and storage requirement compared to Oracle

От
Pavel Stehule
Дата:
Hello

2010/10/28 Divakar Singh <dpsmails@yahoo.com>:
> So another question pops up: What method in PostgreSQL does the stored proc
> use when I issue multiple insert (for loop for 100 thousand records) in the
> stored proc?

nothing special - but it run as inprocess inside server backend. The
are no data casting, there are no overhead from communication, there
are no overhead from content switch.

Regards

Pavel Stehule

> It takes half the time compared to the consecutive "insert" using libpq.
> In the backend, does it use COPY or prepared statement? or something else?
>
> Best Regards,
> Divakar
>
> ________________________________
> From: Alex Hunsaker <badalex@gmail.com>
> To: Divakar Singh <dpsmails@yahoo.com>
> Cc: Steve Singer <ssinger@ca.afilias.info>; jd@commandprompt.com;
> pgsql-performance@postgresql.org
> Sent: Thu, October 28, 2010 1:15:06 AM
> Subject: Re: [PERFORM] Postgres insert performance and storage requirement
> compared to Oracle
>
> On Wed, Oct 27, 2010 at 08:00, Divakar Singh <dpsmails@yahoo.com> wrote:
>> I am attaching my code below.
>> Is any optimization possible in this?
>> Do prepared statements help in cutting down the insert time to half for
>> this
>> kind of inserts?
>
> In half? not for me.  Optimization possible? Sure, using the code you
> pasted (time ./a.out 100000 <method>):
> PQexec: 41s
> PQexecPrepared: 36s
> 1 insert statement: 7s
> COPY: 1s
> psql: 256ms
>
> Basically the above echoes the suggestions of others, use COPY if you can.
>
> Find the source for the above attached.  Its just a very quick
> modified version of what you posted. [ disclaimer the additions I
> added are almost certainly missing some required error checking... ]
>
> [ psql is fast because the insert is really dumb: insert into aaaa (a,
> b, c, d, e, f, g, h, j, k, l, m, n, p) select 1, 'asdf', 'asdf',
> 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf', 'asdf',
> 'asdf', 'asdf', 'asdf' from generate_series(1, 100000); ]
>
>

Re: Postgres insert performance and storage requirement compared to Oracle

От
Merlin Moncure
Дата:
On Wed, Oct 27, 2010 at 3:47 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> On 2010-10-27 20:51, Merlin Moncure wrote:
>
>>> Yes, I am quite aware of how the o/s page cache works.  All else being
>>> equal, I more compact database obviously would be preferred.  However
>>> 'all else' is not necessarily equal.  I can mount my database on bzip
>>> volume, that must make it faster, right?  wrong.  I understand the
>>> postgres storage architecture pretty well, and the low hanging fruit
>>> having been grabbed further layout compression is only going to come
>>> as a result of tradeoffs.
>>>
>
> Or configureabillity.. Not directly related to overall space consumption
> but I have been working on a patch that would make TOAST* kick in
> earlier in the process, giving a "slimmer" main table with visibillity
> information
> and simple columns and moving larger colums more aggressively to TOAST.

Do you have any benchmarks supporting if/when such a change would be beneficial?

merlin

Re: Postgres insert performance and storage requirement compared to Oracle

От
Jesper Krogh
Дата:
On 2010-10-28 15:13, Merlin Moncure wrote:
> On Wed, Oct 27, 2010 at 3:47 PM, Jesper Krogh<jesper@krogh.cc>  wrote:
>
>> On 2010-10-27 20:51, Merlin Moncure wrote:
>>
>>
>>>> Yes, I am quite aware of how the o/s page cache works.  All else being
>>>> equal, I more compact database obviously would be preferred.  However
>>>> 'all else' is not necessarily equal.  I can mount my database on bzip
>>>> volume, that must make it faster, right?  wrong.  I understand the
>>>> postgres storage architecture pretty well, and the low hanging fruit
>>>> having been grabbed further layout compression is only going to come
>>>> as a result of tradeoffs.
>>>>
>>>>
>> Or configureabillity.. Not directly related to overall space consumption
>> but I have been working on a patch that would make TOAST* kick in
>> earlier in the process, giving a "slimmer" main table with visibillity
>> information
>> and simple columns and moving larger colums more aggressively to TOAST.
>>
> Do you have any benchmarks supporting if/when such a change would be beneficial?
>
>
On, IO-bound queries it pretty much translates to the ration between
the toast-table-size vs. the main-table-size.

Trying to aggressively speed up "select count(*) from table" gives this:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg146153.html
with shutdown of pg and drop caches inbetween... the "default" select
count (*) on 50K tuples
gives 4.613ms (2 tuples pr page) vs. 318ms... (8 tuples pr page).

PG default is inbetween...


--
Jesper


Re: Postgres insert performance and storage requirement compared to Oracle

От
Divakar Singh
Дата:
Dear All,
Thanks for your inputs on the insert performance part.
Any suggestion on storage requirement?
VACUUM is certainly not an option, because this is something related to maintenance AFTER insertion.
I am talking about the plain storage requirement w.r. to Oracle, which I observed is twice of Oracle in case millions of rows are inserted.
Anybody who tried to analyze the average storage requirement of PG w.r. to Oracle?
 
Best Regards,
Divakar



From: Merlin Moncure <mmoncure@gmail.com>
To: Robert Haas <robertmhaas@gmail.com>
Cc: Mladen Gogala <mladen.gogala@vmsinfo.com>; pgsql-performance@postgresql.org
Sent: Wed, October 27, 2010 4:46:53 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement compared to Oracle

On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
> <mladen.gogala@vmsinfo.com> wrote:
>> The table is created with "on commit obliterate rows" option which means
>> that there is no need to do "truncate". The "truncate" command is a heavy
>> artillery. Truncating a temporary table is like shooting ducks in a duck
>> pond, with a howitzer.
>
> This is just not true.  ON COMMIT DELETE ROWS simply arranges for a
> TRUNCATE to happen immediately before each commit.  See
> PreCommit_on_commit_actions() in tablecmds.c.

quite so.  If you are doing anything performance sensitive with 'on
commit drop', you are better off organizing a cache around
txid_current() (now(), pid for older pg versions).  Skips the writes
to the system catalogs and truncate.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance