Discussion: getGeneratedKeys method

getGeneratedKeys method

From: Ceki Gulcu
Date:
Hello,

We, the log4j developers, are working on an appender that will write log4j
events to various RDBMSs. Various fields of an event object go to three
different tables, namely the logging_event table, the logging_event_property
table and the logging_event_exception table. Each entry in the
logging_event_property and logging_event_exception tables references
event_id, where event_id is a database-generated primary key in the
logging_event table.

For entries in the logging_event_property and logging_event_exception tables
to be meaningful, we absolutely need the generated event_id each time we
insert a new logging_event row. We are able to do this by following each
logging_event insertion with a query to the database asking for the event_id
of the last inserted event. This works for multiple database systems.

However, we have discovered that batching multiple insertions gives a very
significant boost in performance. Thus, we would like to insert, say, 50
logging_event rows, then multiple logging_event_property rows, and then
multiple logging_event_exception rows. We are using the JDBC
getGeneratedKeys method to obtain the event_ids. Unfortunately, this method
is not implemented by the PostgreSQL driver.


Looking at the archives, it seems that this functionality requires support
from the database back end which does not currently exist. Is that correct?

Is there another way to insert into multiple tables in batches as described
above without using the JDBC getGeneratedKeys method?
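
For concreteness, here is roughly the pattern we are trying to use. This is
a simplified sketch with a single placeholder message column, not the
actual appender code:

   import java.sql.*;
   import java.util.List;

   class GeneratedKeysSketch {
       static void insertEvents(Connection conn, List<String> messages)
               throws SQLException {
           try (PreparedStatement ps = conn.prepareStatement(
                   "INSERT INTO logging_event (message) VALUES (?)",
                   Statement.RETURN_GENERATED_KEYS)) {
               for (String msg : messages) {
                   ps.setString(1, msg);
                   ps.addBatch();
               }
               ps.executeBatch();
               // This is the call the PostgreSQL driver does not implement:
               try (ResultSet keys = ps.getGeneratedKeys()) {
                   while (keys.next()) {
                       long eventId = keys.getLong(1);
                       // eventId would key the logging_event_property and
                       // logging_event_exception inserts
                   }
               }
           }
       }
   }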

Your views on the matter would be highly appreciated.
Thanking you in advance,


--
Ceki Gülcü

     For log4j documentation consider "The complete log4j manual"
     ISBN: 2970036908  http://www.qos.ch/shop/products/clm_t.jsp

Re: getGeneratedKeys method

From: Dave Cramer
Date:
Ceki,

Yes, this is correct; there is no functionality in postgres to get the
generated keys.

This is because the keys are generated using a sequence, which can back any
of the columns in the row; there is no a priori knowledge of which column is
being generated.

There are three ways around this.

1) Get the key before the insert: select nextval('sequence_name');

2) Get the key after the insert, leaving the key value out of the insert
(or inserting a null there): select currval('sequence_name');

3) Do an insert and a select at the same time, i.e.:
insert into foo ...; select currval('sequence_name');

FYI, nextval and currval are evaluated in the context of the current
session, so you don't have to worry about other connections getting in the
way. These functions do the "right" thing.

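A minimal JDBC sketch of options 1 and 2; the sequence name
logging_event_id_seq is an assumption, adjust it to your schema:

   import java.sql.*;

   class KeyFetch {
       // Option 1: reserve the key before the insert, then write it
       // explicitly in the INSERT.
       static long nextId(Connection conn) throws SQLException {
           try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery(
                        "SELECT nextval('logging_event_id_seq')")) {
               rs.next();
               return rs.getLong(1);
           }
       }

       // Option 2: insert without the key column, then read the value
       // just generated for this session.
       static long lastId(Connection conn) throws SQLException {
           try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery(
                        "SELECT currval('logging_event_id_seq')")) {
               rs.next();
               return rs.getLong(1);
           }
       }
   }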


--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: getGeneratedKeys method

From: Oliver Jowett
Date:
Ceki Gulcu wrote:
> For entries in the logging_event_property and logging_event_exception
> tables to be meaningful, we absolutely need the generated event_id each
> time we insert a new logging_event row. We are able to do this by
> following each logging_event insertion with a query to the database
> asking for the event_id of the last inserted event. This works for
> multiple database systems.
>
> However, we have discovered that batching multiple insertions gives a
> very significant boost in performance.

Note that, as it stands, this isn't true for the PostgreSQL driver:
executeBatch() actually just runs each query synchronously. Fixing this is
at the top of my todo list.

> Thus, we would like to insert, say, 50 logging_event rows, then multiple
> logging_event_property rows, and then multiple logging_event_exception
> rows. We are using the JDBC getGeneratedKeys method to obtain the
> event_ids. Unfortunately, this method is not implemented by the
> PostgreSQL driver.
>
> Looking at the archives, it seems that this functionality requires
> support from the database back end which does not currently exist. Is
> that correct?

It'd probably be possible to implement support for returning the OID of
inserted rows in a batch update. You can already get at that OID for
individual inserts via a PostgreSQL-specific interface,
PGStatement.getLastOID(). However, the OID doesn't make a great key anyway,
and I think the general move is towards omitting OIDs from tables
altogether.

Anything else is going to be harder, as the server does not provide much in
the way of metadata back from a query, and the JDBC driver does little
parsing of the SQL queries.

If you don't mind doing DB-specific implementations, and you have all of
the data for a particular event available at once, you might want to
look at using CREATE SEQUENCE for the event ID (I guess you're already
doing this if you want autogenerated keys) and use something like:

   INSERT INTO logging_event VALUES (nextval('id'), ...)
   INSERT INTO logging_event_property VALUES (currval('id'), 'property 1', ...)
   INSERT INTO logging_event_property VALUES (currval('id'), 'property 2', ...)
   INSERT INTO logging_event_exception VALUES (currval('id'), 'line 1', ...)
   INSERT INTO logging_event_exception VALUES (currval('id'), 'line 2', ...)

You'd be able to batch those INSERTs so long as you keep them grouped
correctly.

This is safe to do even if multiple connections are using the sequence
concurrently -- see
http://www.postgresql.org/docs/7.4/static/functions-sequence.html.
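
A rough JDBC sketch of that grouping, using a plain Statement batch. The
sequence name 'id', the column lists and the quote() helper are
assumptions; real code must escape values properly:

   import java.sql.*;
   import java.util.List;

   class GroupedBatch {
       // e[0] = message, e[1] = property, e[2] = exception line
       static void writeEvents(Connection conn, List<String[]> events)
               throws SQLException {
           try (Statement st = conn.createStatement()) {
               for (String[] e : events) {
                   st.addBatch("INSERT INTO logging_event VALUES "
                           + "(nextval('id'), " + quote(e[0]) + ")");
                   st.addBatch("INSERT INTO logging_event_property VALUES "
                           + "(currval('id'), " + quote(e[1]) + ")");
                   st.addBatch("INSERT INTO logging_event_exception VALUES "
                           + "(currval('id'), " + quote(e[2]) + ")");
               }
               st.executeBatch();
           }
       }

       static String quote(String s) {
           return "'" + s.replace("'", "''") + "'";
       }
   }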

-O

Re: getGeneratedKeys method

From: Ceki Gulcu
Date:
Hello,

Thanks for all the replies received so far. I really appreciate it.

One important point I failed to mention in my original mail was that we are
using a prepared statement to perform the initial batch of inserts via
executeBatch, followed by another prepared statement and invoking
executeBatch, and a third prepared statement+executeBatch.

Your suggestions imply the use of regular statements as opposed to prepared
statements. I don't know whether the performance boost is due to the use of
prepared statements or whether it's more the "batching", that is, regrouping
the SQL statements in one transaction.
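
One way to tell the two effects apart might be a small timing sketch along
these lines, comparing a plain Statement batch against a PreparedStatement
batch; the table layout and sequence name are assumptions:

   import java.sql.*;

   class BatchTiming {
       static long timeStatementBatch(Connection conn, int n)
               throws SQLException {
           long t0 = System.currentTimeMillis();
           try (Statement st = conn.createStatement()) {
               for (int i = 0; i < n; i++) {
                   st.addBatch("INSERT INTO logging_event VALUES "
                           + "(nextval('id'), 'msg')");
               }
               st.executeBatch();
           }
           return System.currentTimeMillis() - t0;
       }

       static long timePreparedBatch(Connection conn, int n)
               throws SQLException {
           long t0 = System.currentTimeMillis();
           try (PreparedStatement ps = conn.prepareStatement(
                   "INSERT INTO logging_event VALUES (nextval('id'), ?)")) {
               for (int i = 0; i < n; i++) {
                   ps.setString(1, "msg");
                   ps.addBatch();
               }
               ps.executeBatch();
           }
           return System.currentTimeMillis() - t0;
       }
   }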

If you are interested you can find the actual code at

http://cvs.apache.org/viewcvs.cgi/logging-log4j/src/java/org/apache/log4j/db/

DBAppender is the current implementation, which works with PostgreSQL,
MySQL, Oracle, DB2 and MSSQL. With pooled connections, it takes under 10
milliseconds to persist a single event. Without pooling, it's 50
milliseconds.

With DBAppender2, which uses prepared statements, we can get the figure as
low as 1 millisecond. However, although the performance is much better, the
results are incorrect.

If the performance improvement is linked to the use of prepared statements,
then there isn't much I can do. However, if regular statements would also
yield a performance improvement, I could try the techniques suggested in
previous mails in this thread.

Anyway, thank you for your assistance. It has been very helpful.

--
Ceki Gülcü

     For log4j documentation consider "The complete log4j manual"
     ISBN: 2970036908  http://www.qos.ch/shop/products/clm_t.jsp


Re: getGeneratedKeys method

From: Oliver Jowett
Date:
Ceki Gulcu wrote:
> One important point I failed to mention in my original mail was that we
> are using a prepared statement to perform the initial batch of inserts
> via executeBatch, followed by another prepared statement and invoking
> executeBatch, and a third prepared statement+executeBatch.
>
> Your suggestions imply the use of regular statements as opposed to
> prepared statements. I don't know whether the performance boost is due
> to the use of prepared statements or whether it's more the "batching",
> that is, regrouping the SQL statements in one transaction.

I initially thought you could batch the inserts in the approach I
suggested, but of course addBatch and executeBatch live on a particular
PreparedStatement so you can't do that.

Ok, how about something like this instead:

   CREATE SEQUENCE id INCREMENT BY 100;

Then limit batch size to 100, and for each batch do:

   SELECT nextval('id')
   -- let's assume this returns 1000

Next do the actual batch inserts, grouped by statement and computing ID
values yourself based on the returned nextval + offset:

   INSERT INTO logging_event VALUES (1000, ...)
   INSERT INTO logging_event VALUES (1001, ...)
   INSERT INTO logging_event VALUES (1002, ...)
   -- ...
   INSERT INTO logging_event VALUES (1099, ...)

   INSERT INTO logging_event_property VALUES (1000, 'foo', ...)
   INSERT INTO logging_event_property VALUES (1000, 'bar', ...)
   INSERT INTO logging_event_property VALUES (1001, 'foo', ...)
   INSERT INTO logging_event_property VALUES (1001, 'bar', ...)
   -- etc

   INSERT INTO logging_event_exception VALUES (1000, 'line 1', ...)
   INSERT INTO logging_event_exception VALUES (1000, 'line 2', ...)
   INSERT INTO logging_event_exception VALUES (1001, 'line 1', ...)
   INSERT INTO logging_event_exception VALUES (1001, 'line 2', ...)
   -- etc

Because of the INCREMENT BY clause, the generated IDs won't collide with
another concurrent inserter. This should let you use both PreparedStatement
and batch execution, I think.

There is a tradeoff to be made between the rate of ID consumption and the
maximum batch size, but if you're using int8 for IDs then ID consumption is
unlikely to be an issue.
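
A JDBC sketch of this scheme; the Event and Property record types, the
table layouts and the sequence name are assumptions, and the exception
table would follow the same pattern as the property table:

   import java.sql.*;
   import java.util.List;

   record Property(String key, String value) {}
   record Event(String message, List<Property> properties) {}

   class IncrementBatchWriter {
       static final int BATCH_SIZE = 100; // must not exceed INCREMENT BY

       static void writeBatch(Connection conn, List<Event> events)
               throws SQLException {
           if (events.size() > BATCH_SIZE)
               throw new IllegalArgumentException("batch too large");
           long base;
           try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery("SELECT nextval('id')")) {
               rs.next();
               base = rs.getLong(1); // IDs base .. base+99 now reserved
           }
           try (PreparedStatement ev = conn.prepareStatement(
                        "INSERT INTO logging_event VALUES (?, ?)");
                PreparedStatement prop = conn.prepareStatement(
                        "INSERT INTO logging_event_property VALUES (?, ?, ?)")) {
               for (int i = 0; i < events.size(); i++) {
                   long id = base + i; // computed key: nextval + offset
                   ev.setLong(1, id);
                   ev.setString(2, events.get(i).message());
                   ev.addBatch();
                   for (Property p : events.get(i).properties()) {
                       prop.setLong(1, id);
                       prop.setString(2, p.key());
                       prop.setString(3, p.value());
                       prop.addBatch();
                   }
               }
               ev.executeBatch();   // parent rows first
               prop.executeBatch(); // then the child rows
           }
       }
   }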

-O

Re: getGeneratedKeys method

From: Ceki Gulcu
Date:
Hi Oliver,

The increment-by-N approach, where the batch size is at most N, is IMHO
very original. It should work, but the DBAppender code has to be compatible
with different RDBMSs. The approach you suggest assumes that we can manually
set event_id, which is not necessarily always the case; or maybe it is, but
I am not sure. More importantly, on some RDBMSs we are using auto-generated
keys where it is not always possible to set the increment value. (Maybe it
is always possible; I'll have to check that too.)

Anyway, I now know when an increment value other than 1 could be useful.

Thanks again for your help.


Re: getGeneratedKeys method

From: Dave Cramer
Date:
Ceki,

A couple of things:

1) A lot of persistence layers use a GenerateKey interface to generate the
key ahead of time; that way you can use multiple implementations for
whichever db you use. (A minimal sketch follows this list.)

2) MSSQL for sure will not let you insert any value into the serial
column.

3) There's yet another option for serial values in postgres, which is
caching: if you were going to increment by 10, you may wish to cache 100,
which means that each connection only has to write to the backend once for
every 10 fetches.
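
A minimal sketch of the idea in (1); the interface and class names are
illustrative, not any particular persistence layer's API:

   import java.sql.*;

   interface KeyGenerator {
       long nextKey(Connection conn) throws SQLException;
   }

   // PostgreSQL implementation: draw the key from a sequence up front.
   class PostgresKeyGenerator implements KeyGenerator {
       public long nextKey(Connection conn) throws SQLException {
           try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery(
                        "SELECT nextval('logging_event_id_seq')")) {
               rs.next();
               return rs.getLong(1);
           }
       }
   }

   // Other databases get their own implementations (identity columns,
   // key tables, and so on) behind the same interface.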

Dave
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: getGeneratedKeys method

From: "Michael Nonemacher"
Date:
A slight modification of (1) that I've used is "select
nextval('sequence_name'), nextval('sequence_name'),
nextval('sequence_name'), ...", to get the next n keys, then follow that
with a bunch of inserts that use those keys.  This lets (1) play a
little more nicely with batch inserts (although (3) may be more useful).
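
In JDBC that might look roughly like this; the sequence name is an
assumption:

   import java.sql.*;

   class MultiKeyFetch {
       // Reserve n keys in one round trip by selecting nextval n times.
       static long[] nextKeys(Connection conn, int n) throws SQLException {
           StringBuilder sql = new StringBuilder("SELECT ");
           for (int i = 0; i < n; i++) {
               if (i > 0) sql.append(", ");
               sql.append("nextval('logging_event_id_seq')");
           }
           long[] keys = new long[n];
           try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery(sql.toString())) {
               rs.next();
               for (int i = 0; i < n; i++) {
                   keys[i] = rs.getLong(i + 1); // one column per key
               }
           }
           return keys;
       }
   }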

mike


Re: getGeneratedKeys method

From: Sven Köhler
Date:
> Yes, this is correct; there is no functionality in postgres to get the
> generated keys.

That's sad.

> This is because the keys are generated using a sequence, which can back
> any of the columns in the row; there is no a priori knowledge of which
> column is being generated.

First, we have to define what a "generated key" is! If you define it as a
value generated by a sequence, then this is hard to find out.

Simply take the following definition: a column of the primary key is a
"generated key" iff a value for the column hasn't been given in the INSERT
and the column has a default value.

With this definition, it should be easy for the server to find the
generated keys.

> 2) Get the key after the insert, leaving the key value out of the insert
> (or inserting a null there): select currval('sequence_name');

But you have to know the sequence name for that. That's really bad!