Обсуждение: Re: Copy From csv file with double quotes as null

Поиск
Список
Период
Сортировка

Re: Copy From csv file with double quotes as null

От
"Donald Catanzaro, PhD"
Дата:
Hi All,

I am apparently totally misreading how to import data using the COPY FROM command, can someone give assistance ?

I have two issues, both dealing with double quotes as NULL.  The data is CSV with NULL being represented by a double quote (e.g. "") in all columns of the table.

ISSUE A)  The following command bombs:

COPY testdata FROM 'c:/temp/test.csv' CSV HEADER;

with the following error:

ERROR:  invalid input syntax for type double precision: ""
CONTEXT:  COPY testdata, line 7, column latitude: ""

********** Error **********

ERROR: invalid input syntax for type double precision: ""
SQL state: 22P02
Context: COPY testdata, line 7, column latitude: ""

So, latitude is a double precision column and  I think that PostgreSQL is interpreting the double quote as a NULL string and then it can not be placed into that column because it is a double precision column.

Issue B)  I have an associated issue with a text value where the NULL in the data being represented by a double quote (e.g. "") is being inputed as a quote.  I can not use the switch NULL AS '"' because PostgreSQL says "the quote character must not appear in the NULL specification"


Given the file sizes are huge, I would rather not have to try to preprocess the data.  Is there anyway the COPY FROM command can handle this data smoothly ?
 
-- 
-Don 

Don Catanzaro, PhD                  
Landscape Ecologist
dgcatanzaro@gmail.com
16144 Sigmond Lane
Lowell, AR 72745
479-751-3616

Re: Copy From csv file with double quotes as null

От
Craig Ringer
Дата:
On 9/09/2010 2:48 AM, Donald Catanzaro, PhD wrote:
> So, latitude is a double precision column and I think that PostgreSQL is
> interpreting the double quote as a NULL string

No, it's interpreting it as an empty string, not NULL. I suspect that's
what you meant, but "NULL string" is still NULL, you're getting an
empty(zero-length) *non-null* string.

See: http://www.postgresql.org/docs/8.4/static/sql-copy.html

I'm not sure COPY ... CSV knows how to treat "" as null. It has options
to force empty unquoted text input columns to be *non* null, but I don't
see the reverse, a way to force the empty string to be treated as null.

While the documentation notes that "many programs produce strange and
occasionally perverse CSV files", this particular format quirk doesn't
seem that strange and I'm surprised to see no obvious way to handle it.

Perhaps this is a case where you might need to pre-process the csv input
to clean it up a bit? I usually use Python's "csv" module for that. I
imagine ETL tools like Pentaho or Talend can do the job, but have never
used them.

> Issue B) I have an associated issue with a text value where the NULL in
> the data being represented by a double quote (e.g. "") is being inputed
> as a quote. I can not use the switch NULL AS '"' because PostgreSQL says
> "the quote character must not appear in the NULL specification"

Again, PostgreSQL knows how to handle the reverse, where the empty
unquoted string is null, and "" is the zero-length non-null string. It
doesn't seem to be able to deal with "" as a representation of NULL, though.

At least this one, unlike the double precision issue, can be handled by
dropping the NOT NULL constraint on the text column concerned, then
running an
   UPDATE tablename SET col=NULL WHERE col=""
and restoring the constraint.

> Given the file sizes are huge, I would rather not have to try to
> preprocess the data. Is there anyway the COPY FROM command can handle
> this data smoothly ?

It doesn't look like it to me.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/