Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode

Поиск
Список
Период
Сортировка
От Ian Lawrence Barwick
Тема Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Дата
Msg-id CAB8KJ=jVfVi3U8DTn_K05spir6jJoSGFnWVVvvrzL7tK9dOUeQ@mail.gmail.com
обсуждение исходный текст
Ответ на Review: Patch FORCE_NULL option for copy COPY in CSV mode  (Payal Singh <payal@omniti.com>)
Ответы Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-hackers
2013-11-01 Payal Singh <payal@omniti.com>:
> The post was made before I subscribed to the mailing list, so posting my
> review separately. The link to the original patch mail is
> http://www.postgresql.org/message-id/CAB8KJ=jS-Um4TGwenS5wLUfJK6K4rNOm_V6GRUj+tcKekL2=GQ@mail.gmail.com
>
>>
>> Hi,
>>
>> This patch implements the following TODO item:
>>
>>   Allow COPY in CSV mode to control whether a quoted zero-length
>>   string is treated as NULL
>>
>>       Currently this is always treated as a zero-length string,
>>       which generates an error when loading into an integer column
>>
>>         Re: [PATCHES] allow CSV quote in NULL
>>         http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php
>>
>>
>>   http://wiki.postgresql.org/wiki/Todo#COPY
>>
>>
>> I had a very definite use-case for this functionality recently while
>> importing
>> CSV files generated by Oracle, and was somewhat frustrated by the
>> existence
>> of a FORCE_NOT_NULL option for specific columns, but not one for
>> FORCE_NULL.
>>
>> I'll add this to the November commitfest.
>>
>>
>> Regards
>>
>> Ian Barwick
>
>
> ==================
> Contents & Purpose
> ==================
>
> This patch introduces a new 'FORCE_NULL' option which has the opposite
> functionality of the already present 'FORCE_NOT_NULL' option for the COPY
> command. Prior to this option there was no way to convert a zeroed-length
> quoted value to a NULL value when COPY FROM is used to import data from CSV
> formatted files.
>
> ==================
> Submission Review
> ==================
>
> The patch is in the correct context diff format. It includes changes to the
> documentation as well as additional regression tests. The description has
> been discussed and defined in the previous mails leading to this patch.
>
> =====================
> Functionality Review
> =====================
>
> CORRECTION NEEDED - Due to a minor error in code (details in 'Code Review'
> section below), force_null option is not limited to COPY FROM, and works
> even when COPY TO is used. This should instead give an error message.
>
> The updated documentation describes the added functionality clearly.
>
> All regression tests passed successfully.
>
> Code compilation after including patch was successful. No warnings either.
>
> Manually tested COPY FROM with FORCE_NULL for a number of scenarios, all
> with expected outputs. No issues.
>
> Been testing the patch for a few days, no crashes or weird behavior
> observed.
>
> =============================================
> Code Formatting Review (Needs Improvement)
> =============================================
>
> Looks good, the tabs between variable declaration and accompanying comment
> can be improved.
>
> =================================
> Code Review (Needs Improvement)
> =================================
>
> 1. There is a " NOTE: force_not_null option are not applied to the returned
> fields." before COPY FROM block. Similar note should be added for force_null
> option too.
>
> 2. One of the conditions to check and give an error if force_null is true
> and copy from is false is wrong, cstate->force_null should be checked
> instead of cstate->force_notnull:
>
>         /* Check force_notnull */
>         if (!cstate->csv_mode && cstate->force_notnull != NIL)
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                                  errmsg("COPY force not null available only
> in CSV mode")));
>         if (cstate->force_notnull != NIL && !is_from)
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                           errmsg("COPY force not null only available using
> COPY FROM")));
>
>         /* Check force_null */
>         if (!cstate->csv_mode && cstate->force_null != NIL)
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                                  errmsg("COPY force null available only in
> CSV mode")));
>
> ==>  if (cstate->force_notnull != NIL && !is_from)
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                           errmsg("COPY force null only available using COPY
> FROM")));
>
> ===============================
> Suggested Changes & Conclusion
> ===============================
>
> The Above mentioned error condition should be corrected. Minor comments and
> tab changes are upto the author.
>
> In all, suggested modifications aside, the patch works well and in my
> opinion, would be a useful addition to the COPY command.

Hi Payal

Many thanks for the review, and my apologies for not getting back to
you earlier.

Updated version of the patch attached with suggested corrections. I'm not
sure about the tabs in the variable declarations - the whole section
seems to be all over the place (regardless of whether tabs are set to 4 or
8 spaces) and could do with tidying up, however I didn't want to expand the
scope of the patch too much.

Quick recap of the reasons behind this patch - we had a bunch of CSV files
(and by "bunch" I mean totalling hundreds of millions of lines) exported
from a data source which was unable to distinguish between an empty string
and a null value. Too late we realized we had a ridiculous number of
comma separated rows with some empty strings which should be kept as such,
and some empty strings which should be imported as NULLs. "Wait", I said,
for I remembered COPY FROM had some kind of option involving specifying
the NULL status of certain columns. Alas it turned out to be the opposite
of what we required, and we found another workaround, but I thought as
it's likely we would face a similar situation in the future, it would
be handy to have the option available.

Example:

Given a table like this (a very stripped-down version of the original use
case):

    CREATE TABLE nulltest (
      prefix TEXT NOT NULL DEFAULT '',
      altname TEXT DEFAULT NULL
    );

I want to be able to do this:

    postgres=# COPY nulltest FROM STDIN WITH (FORMAT CSV, FORCE_NULL (altname));
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> "",""
    >> \.
    postgres=# \pset null NULL
    Null display (null) is "NULL".
    postgres=# SELECT * FROM nulltest ;
     prefix | altname
    --------+---------
            | NULL
    (1 row)


I don't have any strong feelings about the syntax - as is it's just
the logical opposite of what's already available.

Regards

Ian Barwick

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Cédric Villemain
Дата:
Сообщение: Re: alternative back-end block formats
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Retain dynamic shared memory segments for postmaster lifetime