Обсуждение: loading data from flat text file
I am porting my database from MySQL to PostgreSQL 7.0 One of the first problems I ran into is the lack of data inload command. I would like to insert data from a 15 million record flat file into a table. How would I do that in PGSQL?
COPY command. [Charset iso-8859-1 unsupported, filtering to ASCII...] > I am porting my database from MySQL to PostgreSQL 7.0 One of the first > problems I ran into is the lack of data inload command. I would like to > insert data from a 15 million record flat file into a table. How would I do > that in PGSQL? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
So I am almost there, except my data is formated as follows:
"chunk1","chunk2","chunk3"
how would I tell COPY that my data is encapsulated in " and separated by , ? Furthermore, I did not find a manual on the COPY command. Anyone?
-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Monday, May 22, 2000 6:23 PM
To: Voitenko, Denis
Cc: 'pgsql-general@postgresql.org'
Subject: Re: [GENERAL] loading data from flat text file
COPY command.
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> I am porting my database from MySQL to PostgreSQL 7.0 One of the first
> problems I ran into is the lack of data inload command. I would like to
> insert data from a 15 million record flat file into a table. How would I do
> that in PGSQL?
-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
			
		
> "Voitenko, Denis" wrote:
>
> So I am almost there, except my data is formated as follows:
>
> "chunk1","chunk2","chunk3"
>
> how would I tell COPY that my data is encapsulated in " and separated
> by , ? Furthermore, I did not find a manual on the COPY command.
> Anyone?
In /usr/local/pgsql/doc (assuming that's where you installed PostgreSQL)
you will find a good deal of html format documentation.  From the docs:
COPY [ BINARY ] table [ WITH OIDS ]
    FROM { 'filename' | stdin }
    [ [USING] DELIMITERS 'delimiter' ]
    [ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
    TO { 'filename' | stdout }
    [ [USING] DELIMITERS 'delimiter' ]
    [ WITH NULL AS 'null string' ]
Basically, you just need to specify the delimiters.
Bruce Momjian is also in the process of writing a book on PostgreSQL.
You can download a snapshot of its current state from
http://www.postgresql.org/docs/awbook.html.
-Ron-
			
		On Tue, May 23, 2000 at 10:10:41AM -0400, Ron Peterson wrote:
> > "Voitenko, Denis" wrote:
> >
> > So I am almost there, except my data is formated as follows:
> >
> > "chunk1","chunk2","chunk3"
> >
> > how would I tell COPY that my data is encapsulated in " and separated
> > by , ? Furthermore, I did not find a manual on the COPY command.
> > Anyone?
>
> In /usr/local/pgsql/doc (assuming that's where you installed PostgreSQL)
> you will find a good deal of html format documentation.  From the docs:
>
> COPY [ BINARY ] table [ WITH OIDS ]
>     FROM { 'filename' | stdin }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
> COPY [ BINARY ] table [ WITH OIDS ]
>     TO { 'filename' | stdout }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
>
> Basically, you just need to specify the delimiters.
>
However, the quotes might give you problems. Postgresql does not treat
them specially, and expects delimited files, not seperated files. This
leads to two problems. If you have text fields with internal commas,
they'll split at the internal comma. The second problem is that the quotes
will be stored with your data, and depending on the column type, may not
transform at all: i.e. trying to store "12" in an int column won't work.
They way a postgresql's copy delimited file handles embedded delimiters
is to quote them with a back slash, as so:
chunk one,another\, different\, chunk,third chunk
So, you'll need to preprocess your flat file some. A simple sed should
do it.  My usual trick for this is to find some character sequence that
_isn't_ in the dataset, like '|||', and do a global replace on "," with
the pipes, then delete quotes, quote the commas, and resubstitute the
pipes with commas. This converts a seperated file into a delimited one.
cat my_file | sed 's/","/|||/g' | sed 's/^"//g'| sed 's/"$//g'| sed \
   's/,/\,/g'| sed 's/|||/,/g' >newfile
Sort of ugly, but it should work. If you can get your other DB to dump
in a delimited format, instead of a quoted CSV format, everything should
work much easier.
COPY wasn't really designed for importing data and data transformation,
but as a reliable means of doing bulk dump and restore of tables.
Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005