Обсуждение: Inserting 'large' amounts of data
I have a web application which allows users to upload a lot of phone numbers. I need to store those numbers to a database. Usualy, one would upload around 70k-100k of records, totaling around 2 MB in size. I'm using tomcat as an application server, and JDBC to connect to pg8.3 database. I will have around 20-50 concurent users in peek hours, and even that is quite overestimated. I could create the temporary file on the filesystem where database cluster is located and then execute COPY mytable FROM '/tmp/upload-data/uuidofsomesort.csv' WITH CSV', but the 'problem' is that database server and tomcat reside on different physical machines. What would one recommend as the best way to insert those data? Mario
Mario Splivalo wrote: > I have a web application which allows users to upload a lot of phone > numbers. I need to store those numbers to a database. Usualy, one would > upload around 70k-100k of records, totaling around 2 MB in size. > > ... > t would one recommend as the best way to insert those data? > I believe you can use org.postgresql.copy.CopyIn() ... there are variants that use a writeToCopy() call to send the data, or a java.io.InputStream, or a java.io.Reader ...
Right. As of the 8.4 release, you have a jdbc API to the postgresql COPY functionality. Cast your Connection to a PGConnection and call .getCopyAPI(). You can then use the CopyManager[1] to copy data in from your tomcat servlet. [1] http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html -- Maciek Sakrejda | Software Engineer | Truviso (650) 242-3500 Main (650) 242-3501 F msakrejda@truviso.com www.truviso.com
> > >I have a web application which allows users to upload a lot of phone >numbers. I need to store those numbers to a database. Usualy, one would >upload around 70k-100k of records, totaling around 2 MB in size. > >I'm using tomcat as an application server, and JDBC to connect to pg8.3 >database. > >I will have around 20-50 concurent users in peek hours, and even that is >quite overestimated. > >I could create the temporary file on the filesystem where database >cluster is located and then execute COPY mytable FROM >'/tmp/upload-data/uuidofsomesort.csv' WITH CSV', but the 'problem' is >that database server and tomcat reside on different physical machines. > >What would one recommend as the best way to insert those data? > > Mario > Hello Mario, If the users already have the data in CSV format why not let them do it via the app. server? The connection can be made across machines if setup properly. http://dandymadeproductions.com/projects/MyJSQLView/docs/javadocs/index.html CSVDataImportThread.java This class could be used as a basis, with some work. I have a bug with data that has semicolons, could be more robust also, but could be used for a start. John wrote: > I believe you can use org.postgresql.copy.CopyIn() ... there are > variants that use a writeToCopy() call to send the data, or a > java.io.InputStream, or a java.io.Reader ... This sounds a lot cleaner. danap.
John R Pierce wrote: > Mario Splivalo wrote: >> I have a web application which allows users to upload a lot of phone >> numbers. I need to store those numbers to a database. Usualy, one would >> upload around 70k-100k of records, totaling around 2 MB in size. >> >> ... >> t would one recommend as the best way to insert those data? > > I believe you can use org.postgresql.copy.CopyIn() ... there are > variants that use a writeToCopy() call to send the data, or a > java.io.InputStream, or a java.io.Reader ... Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but somehow I'd like to wait for a month or so before making the switch. Thank you all. Mike
On Thu, 27 Aug 2009, Mario Splivalo wrote: > Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but somehow > I'd like to wait for a month or so before making the switch. > You only need the 8.4 JDBC driver, not a 8.4 server. Copy support should work for all 7.4 and later servers. Kris Jurka
Kris Jurka wrote: > > > On Thu, 27 Aug 2009, Mario Splivalo wrote: > >> Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but >> somehow I'd like to wait for a month or so before making the switch. >> > > You only need the 8.4 JDBC driver, not a 8.4 server. Copy support > should work for all 7.4 and later servers. speaking of using JDBC COPY FROM STDIN via this CopyManager interface... how does that handle constraint violations, like duplicate primary key?
In the same manner as regular DML--you'll get a constraint violation: cqdb=# create table foo(a int unique); NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo" CREATE TABLE cqdb=# copy foo from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 1 >> 1 >> 1 >> \. ERROR: duplicate key value violates unique constraint "foo_a_key" CONTEXT: COPY foo, line 2: "1" (This is straight through psql, but essentially the same thing would happen through jdbc). -- Maciek Sakrejda | Software Engineer | Truviso (650) 242-3500 Main (650) 242-3501 F msakrejda@truviso.com www.truviso.com
John R Pierce wrote: > > speaking of using JDBC COPY FROM STDIN via this CopyManager > interface... how does that handle constraint violations, like duplicate > primary key? > Just like a regular copy failure via psql or constraint violation. An Exception is thrown and the transaction is aborted (which you must rollback). Kris Jurka
Kris Jurka wrote: > > > On Thu, 27 Aug 2009, Mario Splivalo wrote: > >> Yes, I gues this is the way to go. I'll have to upgrade to 8.4, but >> somehow I'd like to wait for a month or so before making the switch. >> > > You only need the 8.4 JDBC driver, not a 8.4 server. Copy support > should work for all 7.4 and later servers. Works like a charm! :) Mike