Discussion: Newbie: PG8 and text file parsing
I need some advice.. I am new to PG8 - I am working on a project that needs to load data daily into PostgreSQL from a text file.. the text file is a 40MB comma-delimited file with rows consisting of 20-30 fields, a couple thousand rows in all..

so the data needs to be parsed, and if a row already exists it needs to be updated; otherwise a new record is created. My problem is with parsing:

1- what would be a good approach to do it? is plpgSQL suitable for this? or should I use PHP or another language to do that and run it as a cron job/service?

2- how about committing transactions? would there be a problem if it is done at the end of all the updates?

Thank you
Paul
On Fri, 2005-02-18 at 20:34 -0500, Paul R wrote:
> I need some advice.. I am new to PG8 - I am working on the project
> that needs to load data daily to PostgreSQL from the text file.. the
> text file is 40MB comma delimited file with row consisting of 20-30
> fields and couple thousands of rows..

Hello,

We had to do something similar once. We used plPythonU to copy the file from the filesystem into a temporary table, then inserted from the temporary table into the permanent table.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., Your PostgreSQL solutions company. 503-667-4564
Custom programming, 24x7 support, managed services, and hosting
Open Source Authors: plPHP, pgManage, Co-Authors: plPerlNG
Reliable replication, Mammoth Replicator - http://www.commandprompt.com/

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
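The staging-table flow Joshua describes can be sketched in plain SQL. The `items` table and its columns here are hypothetical stand-ins for the poster's real schema, and the file path is made up; note also that server-side `COPY FROM` a file requires superuser privileges:

```sql
-- Hypothetical target table: items(id integer PRIMARY KEY, name text, price numeric)
BEGIN;

-- Stage the raw file in a temporary table shaped like the target.
CREATE TEMP TABLE items_load (LIKE items);
COPY items_load FROM '/path/to/daily.csv' WITH CSV;  -- CSV mode is new in 8.0

-- Update the rows that already exist...
UPDATE items
   SET name = l.name, price = l.price
  FROM items_load l
 WHERE items.id = l.id;

-- ...and insert the ones that do not.
INSERT INTO items
SELECT l.*
  FROM items_load l
 WHERE NOT EXISTS (SELECT 1 FROM items WHERE items.id = l.id);

COMMIT;
```

Doing the whole thing in one transaction also answers question 2: either the entire daily load commits or none of it does.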
> I need some advice.. I am new to PG8 - I am working on the project
> that needs to load data daily to PostgreSQL from the text file.. the
> text file is 40MB comma delimited file with row consisting of 20-30
> fields and couple thousands of rows..
>
> so the data needs to be parsed and if the row exists it needs to be
> updated, otherwise new record is going to be created. My problem is
> with parsing:
>
> 1-what would be a good approach to do it? is plpgSQL suitable for
> this? or should I use PHP or other language to do that and run as a
> cron job/service?

There are some tools which can read a text file and treat it as if it were a real table. Using them would make your job much easier.

ftp://ftp.sra.co.jp/pub/cmd/postgres/exttable/exttable-0.5.tar.gz
http://pgfoundry.org/projects/dbi-link/

> 2-how about committing transactions? would there be a problem if it
> is done at the end of all the updates?

Not at all, I think.
blueaxon@gmail.com (Paul R) wrote:
> I need some advice.. I am new to PG8 - I am working on the project
> that needs to load data daily to PostgreSQL from the text file.. the
> text file is 40MB comma delimited file with row consisting of 20-30
> fields and couple thousands of rows..
>
> so the data needs to be parsed and if the row exists it needs to be
> updated, otherwise new record is going to be created. My problem is
> with parsing:
>
> 1-what would be a good approach to do it? is plpgSQL suitable for
> this? or should I use PHP or other language to do that and run as a
> cron job/service?

When I have done this sort of thing, I have used Perl to turn the data into a file in a format that can be readily read in using the PostgreSQL-specific "COPY" command.

If you like PHP or Python or Icon/sed/awk/SNOBOL, that's your call; using a language you are familiar with to "parse" the text is likely to work out better than using a language that you aren't familiar with...

> 2-how about committing transactions? would there be a problem if it
> is done at the end of all the updates?

No, no problem at all.

When we did the data conversion for the .ORG registry, it involved splitting the data into 50,000-record "chunks", each loaded in using COPY. The prime reason to split at 50,000 was that if problems were encountered, it's a lot easier to edit a file with 50,000 records using vi than it is to edit one with 5 million records. I don't recall needing to use that capability, but it gave everyone confidence to know it was there...

--
select 'cbbrowne' || '@' || 'gmail.com';
http://cbbrowne.com/info/
/"\
\ /  ASCII RIBBON CAMPAIGN
 X   AGAINST HTML MAIL
/ \
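For illustration, the load step after such preprocessing might look like the following; the table name and file path are hypothetical. `COPY FROM` a file runs on the server and needs superuser rights, whereas `\copy` in psql reads the file client-side under ordinary permissions:

```sql
-- Load a preprocessed tab-delimited file into a hypothetical items table.
-- In PostgreSQL 8.0, backslash escapes in string literals make '\t' a tab.
COPY items FROM '/path/to/converted.tsv' WITH DELIMITER '\t';

-- Client-side equivalent, run from within psql:
-- \copy items from '/path/to/converted.tsv' with delimiter '\t'
```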
You can always do an INSERT and add a trigger that checks the key for existence; if it exists, do the UPDATE instead. That way you have one line of code doing both inserts and updates. The only problem with it is that it looks confusing from an outsider's perspective, so it would have to be documented very well.

Sim

"Paul R" <blueaxon@gmail.com> wrote in message news:3eb42da105021817342feb0b2d@mail.gmail.com...
> I need some advice.. I am new to PG8 - I am working on the project
> that needs to load data daily to PostgreSQL from the text file..
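A minimal sketch of the trigger Sim describes, again assuming a hypothetical items(id, name, price) table: a BEFORE INSERT trigger tries the UPDATE first and, when a matching key is found, returns NULL to suppress the INSERT, so the loader can simply INSERT every incoming row:

```sql
CREATE OR REPLACE FUNCTION items_upsert() RETURNS trigger AS $$
BEGIN
    UPDATE items
       SET name = NEW.name, price = NEW.price
     WHERE id = NEW.id;
    IF FOUND THEN
        RETURN NULL;   -- an existing row was updated; cancel this INSERT
    END IF;
    RETURN NEW;        -- no matching key; let the INSERT proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_upsert_trg
    BEFORE INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE items_upsert();
```

As Sim says, this deserves a prominent comment in the schema, since a plain INSERT silently turning into an UPDATE will surprise anyone reading the loader code.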