Обсуждение: Problem with COPY CSV

Поиск
Список
Период
Сортировка

Problem with COPY CSV

От
Christopher Kings-Lynne
Дата:
Attached is a small test extract from the USDA nutrient database.

The problem is that the script won't load the COPY data correctly.  This
is with CVS HEAD (and 8.1).

It is the 4th column in the table that gives the problem (nutr_no
integer).  Each of the 3 COPY rows has a different way of specifying a
number:

* As an unquoted empty value
* As a quoted number
* As a quoted empty value

Now, I'm specifying "NULL AS ''" in the COPY command but it refuses to
recognize the quoted empty value as NULL.

Anyone have any ideas?  Is this a bug?  The manual even says that using
"NULL AS ''" lets you make no distinction between empty and quoted empty
values.

Chris

Вложения

Re: Problem with COPY CSV

От
Andrew Dunstan
Дата:

Christopher Kings-Lynne wrote:

> Attached is a small test extract from the USDA nutrient database.
>
> The problem is that the script won't load the COPY data correctly.  
> This is with CVS HEAD (and 8.1).
>
> It is the 4th column in the table that gives the problem (nutr_no 
> integer).  Each of the 3 COPY rows has a different way of specifying a 
> number:
>
> * As an unquoted empty value
> * As a quoted number
> * As a quoted empty value
>
> Now, I'm specifying "NULL AS ''" in the COPY command but it refuses to 
> recognize the quoted empty value as NULL.
>
> Anyone have any ideas?  Is this a bug?  The manual even says that 
> using "NULL AS ''" lets you make no distinction between empty and 
> quoted empty values.


It is not a bug. It is working as designed and as documented. The rule 
is basically that a null value is never quoted and a quoted value is 
never null. The docs say:

"The CSV format has no standard way to distinguish a NULL value from an 
empty string. PostgreSQL's COPY handles this by quoting. A NULL is 
output as the NULL string and is not quoted, while a data value matching 
the NULL string is quoted. Therefore, using the default settings, a NULL 
is written as an unquoted empty string, while an empty string is written 
with double quotes (""). Reading values follows similar rules. You can 
use FORCE NOT NULL to prevent NULL input comparisons for specific columns."

The corollary of this is that you can't make a quoted value be accepted 
as null (FORCE NOT NULL does the converse, i.e. it makes an unquoted 
value be considered as not null).

The handling of nulls in CSV mode was discussed in almost nauseous 
detail at the time we did CSV about 18 months ago, so there should be no 
surprise here.

For data as irregular as this I suggest that you import it into a text 
field and then update the value of that field to NULL where it's empty - 
you could then do ALTER TYPE ... USING ... . The other possibility would 
be to preprocess the data.

I at least am not inclined to tinker too much more with CSV mode - we 
could end up catering for every weird output format in the world if 
we're not careful.

cheers

andrew