Обсуждение: Re: Copy From csv file with double quotes as null
Hi All,
I am apparently totally misreading how to import data using the COPY FROM command, can someone give assistance ?
I have two issues, both dealing with double quotes as NULL. The data is CSV with NULL being represented by a double quote (e.g. "") in all columns of the table.
ISSUE A) The following command bombs:
with the following error:
So, latitude is a double precision column and I think that PostgreSQL is interpreting the double quote as a NULL string and then it can not be placed into that column because it is a double precision column.
Issue B) I have an associated issue with a text value where the NULL in the data being represented by a double quote (e.g. "") is being inputed as a quote. I can not use the switch NULL AS '"' because PostgreSQL says "the quote character must not appear in the NULL specification"
Given the file sizes are huge, I would rather not have to try to preprocess the data. Is there anyway the COPY FROM command can handle this data smoothly ?
I am apparently totally misreading how to import data using the COPY FROM command, can someone give assistance ?
I have two issues, both dealing with double quotes as NULL. The data is CSV with NULL being represented by a double quote (e.g. "") in all columns of the table.
ISSUE A) The following command bombs:
COPY testdata FROM 'c:/temp/test.csv' CSV HEADER;
with the following error:
ERROR: invalid input syntax for type double precision: ""
CONTEXT: COPY testdata, line 7, column latitude: ""
********** Error **********
ERROR: invalid input syntax for type double precision: ""
SQL state: 22P02
Context: COPY testdata, line 7, column latitude: ""
So, latitude is a double precision column and I think that PostgreSQL is interpreting the double quote as a NULL string and then it can not be placed into that column because it is a double precision column.
Issue B) I have an associated issue with a text value where the NULL in the data being represented by a double quote (e.g. "") is being inputed as a quote. I can not use the switch NULL AS '"' because PostgreSQL says "the quote character must not appear in the NULL specification"
Given the file sizes are huge, I would rather not have to try to preprocess the data. Is there anyway the COPY FROM command can handle this data smoothly ?
-- -Don Don Catanzaro, PhD Landscape Ecologist dgcatanzaro@gmail.com 16144 Sigmond Lane Lowell, AR 72745 479-751-3616
On 9/09/2010 2:48 AM, Donald Catanzaro, PhD wrote: > So, latitude is a double precision column and I think that PostgreSQL is > interpreting the double quote as a NULL string No, it's interpreting it as an empty string, not NULL. I suspect that's what you meant, but "NULL string" is still NULL, you're getting an empty(zero-length) *non-null* string. See: http://www.postgresql.org/docs/8.4/static/sql-copy.html I'm not sure COPY ... CSV knows how to treat "" as null. It has options to force empty unquoted text input columns to be *non* null, but I don't see the reverse, a way to force the empty string to be treated as null. While the documentation notes that "many programs produce strange and occasionally perverse CSV files", this particular format quirk doesn't seem that strange and I'm surprised to see no obvious way to handle it. Perhaps this is a case where you might need to pre-process the csv input to clean it up a bit? I usually use Python's "csv" module for that. I imagine ETL tools like Pentaho or Talend can do the job, but have never used them. > Issue B) I have an associated issue with a text value where the NULL in > the data being represented by a double quote (e.g. "") is being inputed > as a quote. I can not use the switch NULL AS '"' because PostgreSQL says > "the quote character must not appear in the NULL specification" Again, PostgreSQL knows how to handle the reverse, where the empty unquoted string is null, and "" is the zero-length non-null string. It doesn't seem to be able to deal with "" as a representation of NULL, though. At least this one, unlike the double precision issue, can be handled by dropping the NOT NULL constraint on the text column concerned, then running an UPDATE tablename SET col=NULL WHERE col="" and restoring the constraint. > Given the file sizes are huge, I would rather not have to try to > preprocess the data. Is there anyway the COPY FROM command can handle > this data smoothly ? It doesn't look like it to me. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/