Re: loading data from flat text file

Поиск
Список
Период
Сортировка
От Ross J. Reedstrom
Тема Re: loading data from flat text file
Дата
Msg-id 20000523101230.B5814@rice.edu
обсуждение исходный текст
Ответ на Re: loading data from flat text file  (Ron Peterson <rpeterson@yellowbank.com>)
Список pgsql-general
On Tue, May 23, 2000 at 10:10:41AM -0400, Ron Peterson wrote:
> > "Voitenko, Denis" wrote:
> >
> > So I am almost there, except my data is formated as follows:
> >
> > "chunk1","chunk2","chunk3"
> >
> > how would I tell COPY that my data is encapsulated in " and separated
> > by , ? Furthermore, I did not find a manual on the COPY command.
> > Anyone?
>
> In /usr/local/pgsql/doc (assuming that's where you installed PostgreSQL)
> you will find a good deal of html format documentation.  From the docs:
>
> COPY [ BINARY ] table [ WITH OIDS ]
>     FROM { 'filename' | stdin }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
> COPY [ BINARY ] table [ WITH OIDS ]
>     TO { 'filename' | stdout }
>     [ [USING] DELIMITERS 'delimiter' ]
>     [ WITH NULL AS 'null string' ]
>
> Basically, you just need to specify the delimiters.
>

However, the quotes might give you problems. Postgresql does not treat
them specially, and expects delimited files, not seperated files. This
leads to two problems. If you have text fields with internal commas,
they'll split at the internal comma. The second problem is that the quotes
will be stored with your data, and depending on the column type, may not
transform at all: i.e. trying to store "12" in an int column won't work.

They way a postgresql's copy delimited file handles embedded delimiters
is to quote them with a back slash, as so:

chunk one,another\, different\, chunk,third chunk

So, you'll need to preprocess your flat file some. A simple sed should
do it.  My usual trick for this is to find some character sequence that
_isn't_ in the dataset, like '|||', and do a global replace on "," with
the pipes, then delete quotes, quote the commas, and resubstitute the
pipes with commas. This converts a seperated file into a delimited one.

cat my_file | sed 's/","/|||/g' | sed 's/^"//g'| sed 's/"$//g'| sed \
   's/,/\,/g'| sed 's/|||/,/g' >newfile

Sort of ugly, but it should work. If you can get your other DB to dump
in a delimited format, instead of a quoted CSV format, everything should
work much easier.

COPY wasn't really designed for importing data and data transformation,
but as a reliable means of doing bulk dump and restore of tables.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005

В списке pgsql-general по дате отправления:

Предыдущее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Re: Join with other database's table
Следующее
От: Stephane Bortzmeyer
Дата:
Сообщение: Re: Timezones on Tru64 (Digital Unix)