Discussion: improvement suggestions for performance design
Hi,

I have the following scenario for a database that I need to design, and would like some hints on what to improve or do differently to achieve the desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different attribute types as attribute values. Every X seconds, Y new attribute values are stored in the database. X is constant and currently between 6 and 20 seconds, depending on the setup. In the future X could become as low as 3 seconds. Y can, within the next 5-10 years, become as high as 200 000.

That means that, for example, every 6 seconds 100 000 attributes need to be written to the database. At the same time, somewhere between 5-20 users need to read parts of those newly written attributes, maybe 30 000 attributes in total. This continues for the duration of the field operation, which could be 18 hrs a day for 6 weeks, so the total db size is up towards 200 GB.

Now here is how I suggest doing this:

1- the tables

   table attribute_values:
       id        int
       attr_type int ( references attribute_types(id) )
       posX      int
       posY      int
       data_type int
       value     varchar(50)

   table attribute_types:
       id        int
       name      varchar(200)

2- function

   A function that receives an array of data and inserts each attribute. Perhaps one array per attribute field (type, posX, posY, data_type, value), so five arrays as input parameters to the function.

3- java client

   The client receives the data from a CORBA request, splits it into, say, 4 equally sized blocks, and executes 4 threads that insert each block (this seems to be more efficient than just using one thread).

Now I am wondering if this is the most efficient way of doing it?

- I know that I could group the attributes so that each type of attribute gets its own table with all attributes in one row. But I am not sure if that is any more efficient than one attribute per row, since I pass everything to the function as an array.
With the above design, a change in attribute types only requires changing the data in a table, instead of having to modify the client, the function and the tables.

- I am also wondering if writing the client and function in C would create a more efficient solution.

Any comments?

ps: I am currently running postgres 8.1, but could probably use 8.2 if it is needed for functionality or performance reasons. It will run on a Sparc machine with Solaris 10 and perhaps 4-6 processors, as many GB of RAM as necessary, and SCSI disks (perhaps in RAID 0).

regards

thomas
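As a concrete sketch, the tables and the array-based insert function described in points 1 and 2 above could look like this in 8.1-era SQL/plpgsql. The function body is one possible interpretation of the description, not a tested implementation; names follow the post.

```sql
-- Point 1: the two tables as described.
create table attribute_types (
    id   int primary key,
    name varchar(200)
);

create table attribute_values (
    id        int,
    attr_type int references attribute_types(id),
    posX      int,
    posY      int,
    data_type int,
    value     varchar(50)
);

-- Point 2: a plpgsql function taking five parallel arrays,
-- one element per attribute, inserting them row by row.
create or replace function insert_attributes(
    p_types int[], p_posx int[], p_posy int[],
    p_data_types int[], p_values varchar[]
) returns void as $$
begin
    for i in array_lower(p_types, 1) .. array_upper(p_types, 1) loop
        insert into attribute_values (attr_type, posX, posY, data_type, value)
        values (p_types[i], p_posx[i], p_posy[i], p_data_types[i], p_values[i]);
    end loop;
end;
$$ language plpgsql;
```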
I would strongly suggest that you use a proper relational schema, instead of storing everything in two tables. I don't know your application, but a schema like that is called Entity-Attribute-Value (though your entity seems to be just posX and posY) and it should raise a big red flag in the mind of any database designer. In particular, constructing queries against an EAV schema is a major pain in the ass. This has been discussed before on the postgresql lists as well; you might want to search and read the previous discussions.

Ignoring the EAV issue for a moment, it's hard to give advice without knowing what kind of queries are going to be executed. Are the lookups always going to be by id? By posX/posY perhaps? By attribute?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On 7/5/07, tfinneid@ifi.uio.no <tfinneid@ifi.uio.no> wrote:
> I would strongly suggest that you use a proper relational schema,
> instead of storing everything in two tables. I don't know your
> application, but a schema like that is called an Entity-Attribute-Value
> (though your entity seems to be just posx and posy) and it should raise
> a big red flag in the mind of any database designer. In particular,
> constructing queries against an EAV schema is a major pain in the ass.
> This has been discussed before on postgresql lists as well, you might
> want to search and read the previous discussions.
I get your point, but the thing is the attributes have no particular
relation to each other, other than belonging to the same attribute groups.
There are no specific rules that state that certain attributes are always
used together, such as with an address record. It depends on what
attributes the operator wants to study. This is why I don't find any
reason to group the attributes into separate tables and columns.
I am still looking into the design of the tables, but I need to get a
proper test harness running before I can start ruling things out. And a
part of that is, for example, finding efficient ways of transferring the
insert data from the client to the db, instead of just single-command
inserts. This is where bulk transfer by arrays would probably be preferable.
> Ignoring the EAV issue for a moment, it's hard to give advice without
> knowing what kind of queries are going to be executed. Are the lookups
> always going to be by id? By posx/posy perhaps? By attribute?
the query will be by attribute type and posx/y. So for position x,y, give
me the following attributes...
thomas
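Since the stated lookup pattern is always attribute type plus position, a single composite index would cover it. A hedged sketch (the index name and the literal values are invented for illustration):

```sql
-- Composite index matching the "attribute type and posX/posY" lookup
create index attribute_values_lookup_idx
    on attribute_values (posX, posY, attr_type);

-- "For position x,y, give me the following attributes..."
select attr_type, data_type, value
from attribute_values
where posX = 17 and posY = 42
  and attr_type in (1, 5, 9);
```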
I don't know much about this EAV stuff, except to say that my company is in a situation with a lot of adds and bulk deletes, and I wish the tables had been designed with partitioning in mind. That is, if you know how much data, order of magnitude, each table will hold or will pass through (add and delete), you may want to design the table with partitioning in mind. I have not done any partitioning myself, so I cannot give you details, but I can tell you that mass deletes become a breeze, because you just "drop" that part of the table (I think it is a sub-table). And that alleviates table bloat and excessive vacuuming.
Good luck.
--
Yudhvir Singh Sidhu
408 375 3134 cell
On 7/5/07, tfinneid@ifi.uio.no <tfinneid@ifi.uio.no> wrote:
> On 7/5/07, tfinneid@ifi.uio.no <tfinneid@ifi.uio.no> wrote:
>
> I don't know much about this EAV stuff. Except to say that my company is
> in a situation with a lot of adds and bulk deletes and I wish the tables
> were designed with partitioning in mind. That is if you know how much,
> order of magnitude, data each table will hold or will pass through (add
> and delete), you may want to design the table with partitioning in mind.
> I have not done any partitioning so I cannot give you details but can
> tell you that mass deletes are a breeze because you just "drop" that
> part of the table. I think it is a sub table. And that alleviates table
> bloat and excessive vacuuming.
By partitioning, do you mean some sort of internal db table partitioning
scheme, or just me dividing the data into different tables?

There won't be many deletes, but there might of course be some.
Additionally, because of the performance requirements, there won't be time
to run vacuum in between the inserts, except for in non-operational
periods, which will only be a couple of hours during the day. So vacuum
will have to be scheduled at those times, instead of at the normal
intervals.
Internal db table partitioning. Check out: http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html
--
Yudhvir Singh Sidhu
408 375 3134 cell
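For reference, partitioning in the 8.x docs linked above is built on table inheritance plus CHECK constraints. A minimal sketch, assuming a timestamp column `ts` were added to the original attribute_values table (the column and partition names are invented here):

```sql
-- Parent table stays empty; children each hold one week of data.
create table attribute_values_w27 (
    check (ts >= date '2007-07-02' and ts < date '2007-07-09')
) inherits (attribute_values);

create table attribute_values_w28 (
    check (ts >= date '2007-07-09' and ts < date '2007-07-16')
) inherits (attribute_values);

-- With this enabled, queries constrained on ts skip partitions
-- whose CHECK constraint rules them out.
set constraint_exclusion = on;

-- Mass deletion of old data becomes a metadata operation:
drop table attribute_values_w27;
```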
tfinneid@ifi.uio.no wrote:
> I get your point, but the thing is the attributes have no particular
> relation to each other, other than belonging to the same attribute
> groups. There are no specific rules that state that certain attributes
> are always used together, such as with an address record. It depends on
> what attributes the operator wants to study. This is why I don't find
> any reason to group the attributes into separate tables and columns.

ISTM that a properly normalized schema would look something like this:

    create table position (
        posX int not null,
        posY int not null,
        primary key (posX, posY)
    );

    create table colour (
        posX int not null,
        posY int not null,
        colour varchar(50) not null,
        primary key (posX, posY),
        foreign key (posX, posY) references position (posX, posY)
    );

    create table population (
        posX int not null,
        posY int not null,
        population int not null,
        primary key (posX, posY),
        foreign key (posX, posY) references position (posX, posY)
    );

where colour and population are examples of attributes you want to store. If you have 100 different attributes, you'll have 100 tables like that. That may sound like a lot, but it's not.

This allows you to use proper data types for the attributes, as well as constraints and all the other goodies a good relational data model gives you. It also allows you to build proper indexes on the attributes. For example, if you store populations as text, you're going to have a hard time building an index that allows you to query for positions with a population between 100-2000 efficiently.

These are all imaginary examples, but what I'm trying to point out here is that a proper relational schema allows you to manage and query your data much more easily and with more flexibility, and allows for future extensions.

A normalized schema will also take less space, which means less I/O and more performance, because there's no need to store metadata like data_type and attr_type on every row. For performance reasons, you might actually want to not store the position table at all in the above schema.

An alternative design would be to have a single table, with one column per attribute:

    create table position (
        posX int not null,
        posY int not null,
        colour varchar(50),
        population int,
        ...
        primary key (posX, posY)
    );

This is more space-efficient, especially if you have a lot of attributes on the same coordinates. You can easily add and drop columns as needed, using ALTER TABLE.

> I am still looking into the design of the tables, but I need to get a
> proper test harness running before I can start ruling things out. And a
> part of that is, for example, efficient ways of transferring the insert
> data from the client to the db, instead of just single-command inserts.
> This is where bulk transfer by arrays would probably be preferable.

Before you start fiddling with functions, I'd suggest that you try batching the inserts with the JDBC PreparedStatement batch facility.

Splitting the inserts into multiple threads in your application sounds messy. The inserts would have to be in separate transactions, for example. Presumably your CORBA ORB will spawn multiple threads for you when there are a lot of requests coming in, so the overall throughput should be the same with a single thread per request.

BTW, I concur with Y Sidhu that with data volumes as high as yours, partitioning is a good idea. It's a lot easier to manage 20 10 GB table partitions than one 200 GB table. For example, VACUUM, CLUSTER and CREATE INDEX can be done partition per partition, instead of as a single huge operation that runs for hours. Though if you choose to have just one table per attribute type, each table might be conveniently small by nature, so that no partitioning is required.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
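If 8.2 is an option, its multi-row VALUES lists give a similar effect to JDBC batching at the SQL level: many rows per statement instead of one round-trip per row. A small illustration against the original two-table schema (the values are invented):

```sql
-- One INSERT carrying several attribute rows (multi-row VALUES, 8.2+)
insert into attribute_values (id, attr_type, posX, posY, data_type, value)
values
    (1, 3, 10, 20, 1, '42'),
    (2, 3, 10, 21, 1, '43'),
    (3, 7, 10, 20, 2, 'red');
```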
Heikki Linnakangas wrote:
> ISTM that a properly normalized schema would look something like this:
[example of one table per attribute, each referencing a position table]

I agree that this is a way it could be done.

> where colour and population are examples of attributes you want to
> store. If you have 100 different attributes, you'll have 100 tables
> like that. That may sound like a lot, but it's not.

In any case, there is no point in having one table per attribute, as some attributes are logically grouped and can therefore be grouped together in the table. Since there are 5-10 groups of attributes, 5-10 tables would be enough.

> This allows you to use proper data types for the attributes, as well as
> constraints and all the other goodies a good relational data model
> gives you. It also allows you to build proper indexes on the
> attributes. For example, if you store populations as text, you're going
> to have a hard time building an index that allows you to query for
> positions with a population between 100-2000 efficiently.

Performing queries on the attribute value is of no interest, so that does not matter.

> These are all imaginary examples, but what I'm trying to point out here
> is that a proper relational schema allows you to manage and query your
> data much more easily and with more flexibility, and allows for future
> extensions.

They have been treating their data this way for the last 20 years, and there is nothing on the horizon that tells either them or me that it will be any different for the next 10 years. So I am not sure I need to plan for that.

> A normalized schema will also take less space, which means less I/O and
> more performance,

That is what I am trying to find out: whether it is true for this scenario as well.

> because there's no need to store metadata like the data_type, attr_type
> on every row.

data_type and attr_type are not decorative metadata, they are actively used as query parameters for each attribute; if they were not there, I would not be able to perform the queries I need to do.

> An alternative design would be to have a single table, with one column
> per attribute:
[single wide position table with one column per attribute]

> Before you start fiddling with functions, I'd suggest that you try
> batching the inserts with the JDBC PreparedStatement batch facility.

I have done that; now I need to have something to compare it against, preferably a function written in plpgsql and one in C. So, any other suggestions on how to efficiently bulk transfer the data to the db for insertion?

> Splitting the inserts into multiple threads in your application sounds
> messy.

Well, it has been tested and shown to make postgres perform much better, i.e. 100 000 inserts separated between 4 threads performed much faster than with a single thread alone.

> BTW, I concur with Y Sidhu that with data volumes as high as you have,
> partitioning is a good idea.

Yes, I will be looking into it.

regards

thomas
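The 5-10 grouped tables mentioned in this exchange might look like the following sketch. The group name and columns are invented for illustration; the point is that one row carries a whole group of logically related attributes for a position, instead of one row per attribute.

```sql
-- One table per attribute group
create table attr_group_geometry (
    posX  int not null,
    posY  int not null,
    depth int,
    angle int,
    label varchar(50),
    primary key (posX, posY)
);
```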
Hi Thomas & all,

2007/7/6, Thomas Finneid <tfinneid@ifi.uio.no>:
> Heikki Linnakangas wrote:
> > ISTM that a properly normalized schema would look something like this:
[example of tables per attr referencing main table containing only primary key]
> I agree that this is a way it could be done.

Indeed. Another way is postgresql-specific:

    create table position (
        posX int not null,
        posY int not null
    );

    create table population (
        population int not null
    ) inherits (position);

    -- optionally:
    create unique index population_idx on population (posX, posY, population);

This leads to each attr table inheriting posX and posY from position; you never insert anything into position itself, but you can use it to list all positions that have any attributes in any of the inherited tables (in that sense it's a view over all its children).

> In any case, there is no point in having one table per attribute, as
> some attributes are logically grouped and can therefore be grouped
> together in the table. Since there are 5-10 groups of attributes, 5-10
> tables would be enough.

This sounds very sensible. This way you would send only one table (or procedure, or prepared statement) name, instead of as many attr_types as you have attributes in a group. So instead of calling 'your_procedure(type, posX, posY, data_type, value)' for each of 5 values separately, you would call 'attrgroup_and_datatype_specific_procedure(posX, posY, value1, value2, value3, value4, value5)'. Inside the procedure, the inserts change from 'insert into attribute_values values (type, posX, posY, data_type, value)' to 'insert into attrgroup_and_datatype_specific_table values (posX, posY, value1, value2, value3, value4, value5)' -- so you save four inserts, and for each value inserted you use 2/5 extra fields instead of 4.
You are allowed to use shorter names for the tables and procedures ;)

It should be trivial to hide this separation in the client; you could even create new tables for new kinds of attribute-datatype combinations automatically on the fly.

> They have been treating their data this way for the last 20 years, and
> there is nothing on the horizon that tells either them or me that it
> will be any different for the next 10 years. So I am not sure I need to
> plan for that.

Is it possible that normalization was skipped originally because of a lack of resources, or of knowledge of the nature of the data to be imported, or of dynamism on the part of the original tools (such as creation of type-specific tables on the fly), that would now be available, or at least worth a second look?

> > A normalized schema will also take less space, which means less I/O
> > and more performance,
>
> That is what I am trying to find out, if it is true for this scenario
> as well.

Well, you're saving four extra ints per value, when you only need two per 5-10 values. If you happen to save numerical data as the value in the text field for some data_types, you are losing a lot more.

> > because there's no need to store metadata like the data_type,
> > attr_type on every row.
>
> data_type and attr_type are not decorative metadata, they are actively
> used as query parameters for each attribute; if they were not there I
> would not be able to perform the queries I need to do.

You can still express them as table or column names rather than extra data per row.

> > Before you start fiddling with functions, I'd suggest that you try
> > batching the inserts with the JDBC PreparedStatement batch facility.
>
> I have done that; now I need to have something to compare it against,
> preferably a function written in plpgsql and one in C.
> So, any other suggestions on how to efficiently bulk transfer the data
> to the db for insertion?
COPY is plentitudes faster than INSERT:
http://www.postgresql.org/docs/8.1/interactive/sql-copy.html

If you can't just push the data straight into the final table with COPY, push it into a temporary table that you then go through with a database procedure.

Shameless plug: If you use Java and miss COPY functionality in the driver, it's available at

http://kato.iki.fi/sw/db/postgresql/jdbc/copy/

I was able to practically nullify time spent inserting with that.

> Well, it has been tested and shown to make postgres perform much
> better, i.e. 100 000 inserts separated between 4 threads performed much
> faster than with a single thread alone.

Sounds interesting. The results should still end up written into the same table, so are you sure this didn't end up using the same time at the server end -- and would that even matter to you? We ended up having the best results with sequential batches of around 10 000 rows each.

> > BTW, I concur with Y Sidhu that with data volumes as high as you
> > have, partitioning is a good idea.
>
> Yes, I will be looking into it.

Depending on the distribution of your data, saving each attribute group (or datatype, or both) to its own table will take you some way in the same direction. If you have no indexes and do no deletes (as seems to be the case for you), the size of a table might not matter much. It might make sense in your case, though, to name tables with times, like attribute_values_YYYYMMDD, and create a new table for each chosen period, be it month, day or even per batch. (It takes a few seconds to create a table, though.) To keep viewing the data as your customer is used to, you can hide the separation of data into partitions by inheriting each partial table from an identical ancestor table, which then serves as a view over all its children -- but that's already explained in the docs. Separation into tables by attribute groups you have to hide with a view, or with procedures, preferably server-side.
Cheers,

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
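The server-side COPY discussed above looks like this (the file path is illustrative; loading a server-local file this way requires superuser privileges, which is exactly the restriction the client-side patch removes):

```sql
-- Bulk load a tab-separated file straight into the final table
copy attribute_values (attr_type, posX, posY, data_type, value)
from '/tmp/batch_0001.tsv';

-- Or stage it first and post-process with a database procedure:
create temp table attribute_staging (like attribute_values);
copy attribute_staging (attr_type, posX, posY, data_type, value)
from '/tmp/batch_0001.tsv';
```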
Kalle Hallivuori wrote:
> COPY is plentitudes faster than INSERT:
> http://www.postgresql.org/docs/8.1/interactive/sql-copy.html
>
> If you can't just push the data straight into the final table with
> COPY, push it into a temporary table that you go through with the
> database procedure.
>
> Shameless plug: If you use Java and miss COPY functionality in the
> driver, it's available at
> http://kato.iki.fi/sw/db/postgresql/jdbc/copy/
>
> I was able to practically nullify time spent inserting with that.

Interesting, I will definitely have a look at it. What is the maturity level of the code at this point, and what is potentially missing to bring it up to production quality? (Stability is of the utmost importance in my usage scenario.)

> > Well, it has been tested and shown to make postgres perform much
> > better, i.e. 100 000 inserts separated between 4 threads performed
> > much faster than with a single thread alone.
>
> Sounds interesting. The results should still end up written into the
> same table, so are you sure this didn't end up using the same time at
> the server end -- would that even matter to you?

Yes, it would matter, because a number of clients are waiting to read the data before the next batch of data is inserted. (In essence, every 6 seconds 40 000 attributes must be written, and after that write, 8-16 clients read most of that data based on query criteria. And this is just today; in the future, 5-10 years, it might be as high as 2-300 000 attributes per 3 seconds.)

thomas
Hi.

2007/7/8, Thomas Finneid <tfinneid@student.matnat.uio.no>:
> Interesting, I will definitely have a look at it.
> What is the maturity level of the code at this point, and what is
> potentially missing to bring it up to production quality? (Stability is
> of the utmost importance in my usage scenario.)

It's my third implementation, based on earlier work by Kris Jurka, a maintainer of the JDBC driver. (It is really quite short, so it's easy to keep it clear.) I consider it mature enough to have accommodated it as part of an upcoming large application, but I'd really like to hear others' opinions. Documentation I should add one of these days, maybe even rewrite the javadoc.

You can use COPY as is on the server side without the patch, but then you need to get the data as CSV or TSV files onto the database server machine, and use db superuser privileges to import it. My patch just adds the ability to feed data from the client with normal user privileges.

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/
Kalle Hallivuori wrote:
> It's my third implementation, based on earlier work by Kris Jurka, a
> maintainer of the JDBC driver. (It is really quite short, so it's easy
> to keep it clear.) I consider it mature enough to have accommodated it
> as part of an upcoming large application, but I'd really like to hear
> others' opinions. Documentation I should add one of these days, maybe
> even rewrite the javadoc.

Hi

I have tested your COPY patch (actually I tested postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar) and it is really fast, actually just as fast as server-side COPY (both tests were performed on a local machine). This means I am interested in using it in my project, but I have some concerns that need to be addressed (and I am prepared to help in any way I can). The following are my concerns:

- While testing I got some errors, which need to be fixed (detailed below)
- The patch must be of production-grade quality
- I would like the patch to be part of the official pg JDBC driver.

The error I got the most is below. This command runs a single run, single thread, and generates 10000 rows of data:

tofi@duplo:~/svn/pores$ java -server -Xms20m -Xmx256m -cp /usr/java/jdk1.5.0_06/jre/lib/rt.jar:.:src/:test/:conf/:lib/postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar wg.daemon.Main -m SINGLE_WRITE -t 1 -r 1 -c 10000 -p CPBulk
Initialising connection...
Performing insert...
Build bulk data time: 0s 211ms
toString() bulk data time: 0s 4ms
time: 0s 205ms
org.postgresql.util.PSQLException: Unexpected command completion while copying: COPY
        at org.postgresql.core.v3.QueryExecutorImpl.executeCopy(QueryExecutorImpl.java:706)
        at org.postgresql.copy.CopyManager.copyIntoDB(CopyManager.java:50)
        at org.postgresql.copy.CopyManager.copyIntoDB(CopyManager.java:37)
        at wg.storage.pores1.CPBulk.addAttributes(CPBulk.java:72)
        at wg.daemon.Daemon.run(Daemon.java:57)

regards
thomas
Hi all!

2007/7/18, Thomas Finneid <tfinneid@student.matnat.uio.no>:
> Hi, I have tested your COPY patch (actually I tested
> postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar) and it is really fast,
> actually just as fast as server-side COPY (both tests were performed on
> a local machine).

Happy to hear there's interest toward this solution.

> This means I am interested in using it in my project, but I have some
> concerns that need to be addressed (and I am prepared to help in any
> way I can). The following are my concerns:
>
> - While testing I got some errors, which need to be fixed (detailed below)
> - The patch must be of production-grade quality
> - I would like the patch to be part of the official pg JDBC driver.

Definitely agreed, those are my requirements as well. We can discuss bug fixing among ourselves; new versions I'll announce on the pgsql-jdbc list.

--
Kalle Hallivuori +358-41-5053073 http://korpiq.iki.fi/