Re: Copy From csv file with double quotes as null

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Copy From csv file with double quotes as null
Дата
Msg-id 4C882E8E.6080301@postnewspapers.com.au
обсуждение исходный текст
Ответ на Re: Copy From csv file with double quotes as null  ("Donald Catanzaro, PhD" <dgcatanzaro@gmail.com>)
Список pgsql-general
On 9/09/2010 2:48 AM, Donald Catanzaro, PhD wrote:
> So, latitude is a double precision column and I think that PostgreSQL is
> interpreting the double quote as a NULL string

No, it's interpreting it as an empty string, not NULL. I suspect that's
what you meant, but "NULL string" is still NULL, you're getting an
empty(zero-length) *non-null* string.

See: http://www.postgresql.org/docs/8.4/static/sql-copy.html

I'm not sure COPY ... CSV knows how to treat "" as null. It has options
to force empty unquoted text input columns to be *non* null, but I don't
see the reverse, a way to force the empty string to be treated as null.

While the documentation notes that "many programs produce strange and
occasionally perverse CSV files", this particular format quirk doesn't
seem that strange and I'm surprised to see no obvious way to handle it.

Perhaps this is a case where you might need to pre-process the csv input
to clean it up a bit? I usually use Python's "csv" module for that. I
imagine ETL tools like Pentaho or Talend can do the job, but have never
used them.

> Issue B) I have an associated issue with a text value where the NULL in
> the data being represented by a double quote (e.g. "") is being inputed
> as a quote. I can not use the switch NULL AS '"' because PostgreSQL says
> "the quote character must not appear in the NULL specification"

Again, PostgreSQL knows how to handle the reverse, where the empty
unquoted string is null, and "" is the zero-length non-null string. It
doesn't seem to be able to deal with "" as a representation of NULL, though.

At least this one, unlike the double precision issue, can be handled by
dropping the NOT NULL constraint on the text column concerned, then
running an
   UPDATE tablename SET col=NULL WHERE col=""
and restoring the constraint.

> Given the file sizes are huge, I would rather not have to try to
> preprocess the data. Is there anyway the COPY FROM command can handle
> this data smoothly ?

It doesn't look like it to me.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: variable name in plpgsql
Следующее
От: Darren Duncan
Дата:
Сообщение: Re: variable name in plpgsql