Обсуждение: Import: empty string and expression

Поиск
Список
Период
Сортировка

Import: empty string and expression

От
Quan Zongliang
Дата:
Hi, all

String:
How to distinguish NULL and empty?

In the COPY command, it conside no quoted empty as NULL and quoted ones as empty.
This rule not defined in the CSV standard(RFC4180).
For example:
   A,,B  has three item: 'A', NULL and 'B'
   A,"",B has three item: 'A', empty and 'B'
Should we follow this rule?
I think it is good.

Some code must be rewrited, because I treat both as empty now.
And with wxString, null and empty can't be distiguished.

Expression:
If end users decide to import with INSERT statement, they can define expressions.
(With other cases, only the constants can be used.
The pgLoader support this option. I forgot to support constants.
Thanks Guillaume, he introduced this good tool to me.)
Those ones must has one column and one row.
That is, it must can be valuated by SELECT statement emitted FROM clauses.

Now, I use <column-name> style in the expression.
The delimiter option use this style too. (<TAB> means TAB char.)
Exscaped char is XML-style, & < and > , and so on.
Because I noticed pgAdmin has a option "Show NULL values as <NULL>".
Is it appropriate?

Regards.

-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Dave Page
Дата:
On Tue, Feb 24, 2009 at 9:23 AM, Quan Zongliang <quanzongliang@gmail.com> wrote:
> Hi, all
>
> String:
> How to distinguish NULL and empty?
>
> In the COPY command, it conside no quoted empty as NULL and quoted ones as empty.
> This rule not defined in the CSV standard(RFC4180).
> For example:
>   A,,B  has three item: 'A', NULL and 'B'
>   A,"",B has three item: 'A', empty and 'B'
> Should we follow this rule?
> I think it is good.

Sounds good to me.


> Now, I use <column-name> style in the expression.
> The delimiter option use this style too. (<TAB> means TAB char.)
> Exscaped char is XML-style, & < and > , and so on.

Why do we need to escape those characters? If I follow correctly, this
will be in an expression that the user enters themselves?

> Because I noticed pgAdmin has a option "Show NULL values as <NULL>".
> Is it appropriate?

For entering non-printable characters shouldn't we just accept
whatever PostgreSQL will? Or am I missing the point?

--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Import: empty string and expression

От
Quan Zongliang
Дата:
> Why do we need to escape those characters? If I follow correctly, this
> will be in an expression that the user enters themselves?
I am sorry, Dave.
That description is not clear. You know, my English is bad.

There, the column mean CSV file's column, not table column.
The end user can set a option "First row is header".
CSV's first row will be consided column names.
If not, they will be "Column #1, Column #2, Column #3 ..."
Then, in the expression, the end user can input "<Column #1> || substring(<Column #3> from 2 for 1)".
It will be convert to corresponding row's data for every statement.

For example:
A1,B1,C1,D1,E1,F1 -->  'A1' || 'C1'
A2,B2,C2,D2,E2,F2 -->  'A2' || 'C2'
...

INSERT INTO t_name(col_list) VALUES( [some values], 'A1' || substring('C1' from 2 for 1), [some values]);
INSERT INTO t_name(col_list) VALUES( [some values], 'A2' || substring('C2' from 2 for 1), [some values]);
...

I think it is useful when the end user want to import the CSV data with different columns to table.

Regards.

-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Quan Zongliang
Дата:
> For example:
> A1,B1,C1,D1,E1,F1 -->  'A1' || 'C1'
> A2,B2,C2,D2,E2,F2 -->  'A2' || 'C2'
> ...
should be:
A1,B1,C1,D1,E1,F1 -->  'A1' || substring('C1' from 2 for 1)
A2,B2,C2,D2,E2,F2 -->  'A2' || substring('C2' from 2 for 1)

I am careless.

-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Quan Zongliang
Дата:
> > String:
> > How to distinguish NULL and empty?
> >
> > In the COPY command, it conside no quoted empty as NULL and quoted ones as empty.
> > This rule not defined in the CSV standard(RFC4180).
> > For example:
> >   A,,B  has three item: 'A', NULL and 'B'
> >   A,"",B has three item: 'A', empty and 'B'
> > Should we follow this rule?
> > I think it is good.
>
> Sounds good to me.

With the width-delimiter, user can set empty values by "5,,12" or "5,0,12".
Should we add NULL-value support? Like: "5,<NULL>,12"

How about other types? boolean, integer, float, and so on.
This is my idea:
For String, the user can select one chioce:
    "empty"     (generate "FORCE NOT NULL" option for COPY or '' for INSERT),
    "NULL"      (left nochange, for column nullable column)
    "ERROR"     (don't generate statement, log only).
    "CONSTANT"  (user-defined string)
All of them act on NULL.
For others:
    "NULL"      (left nochange)
    "ERROR"     (don't generate statement, log only).
    "CONSTANT"  (user-defined string) or
    [other options depend on type]
All of them act on NULL or empty.

Until now, I have no idea with array type.


-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Dave Page
Дата:
On Thu, Feb 26, 2009 at 1:44 AM, Quan Zongliang <quanzongliang@gmail.com> wrote:
>> Why do we need to escape those characters? If I follow correctly, this
>> will be in an expression that the user enters themselves?
> I am sorry, Dave.
> That description is not clear. You know, my English is bad.
>
> There, the column mean CSV file's column, not table column.
> The end user can set a option "First row is header".
> CSV's first row will be consided column names.
> If not, they will be "Column #1, Column #2, Column #3 ..."
> Then, in the expression, the end user can input "<Column #1> || substring(<Column #3> from 2 for 1)".
> It will be convert to corresponding row's data for every statement.

Right - so why do we need to escape anything in the expression?
Shouldn't the user just be able to free-type whatever they want (in
the same way they would if they were writing the expression as part of
a query in the Query Tool?

--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Import: empty string and expression

От
Dave Page
Дата:
On Thu, Feb 26, 2009 at 6:30 AM, Quan Zongliang <quanzongliang@gmail.com> wrote:

> With the width-delimiter, user can set empty values by "5,,12" or "5,0,12".
> Should we add NULL-value support? Like: "5,<NULL>,12"

No, I think we should just use:

5,,6 = 5, NULL, 6

and

5,'',6 = 5, EMPTY STRING, 6

> How about other types? boolean, integer, float, and so on.

Well they don't have an 'empty' value anyway, so no value would always
equal NULL.

> This is my idea:
> For String, the user can select one chioce:
>    "empty"     (generate "FORCE NOT NULL" option for COPY or '' for INSERT),
>    "NULL"      (left nochange, for column nullable column)
>    "ERROR"     (don't generate statement, log only).
>    "CONSTANT"  (user-defined string)

Have a set  of radio buttons and a textbox perhaps:

NULL AS:  NULL  O   empty string O    constant   O    [                ]


> All of them act on NULL.
> For others:
>    "NULL"      (left nochange)
>    "ERROR"     (don't generate statement, log only).
>    "CONSTANT"  (user-defined string) or
>    [other options depend on type]
> All of them act on NULL or empty.

NULL AS:  NULL  O    constant   O    [                ]

I wouldn't bother with ERROR for any type - let the server detect
errors and reject data accordingly.

> Until now, I have no idea with array type.

I wouldn't worry about those yet. Let's learn to walk first, and fly later :-)

--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Import: empty string and expression

От
Quan Zongliang
Дата:
> Right - so why do we need to escape anything in the expression?
> Shouldn't the user just be able to free-type whatever they want (in
> the same way they would if they were writing the expression as part of
> a query in the Query Tool?
I have to convert column names in expression to data for every row
before generate statement. Without escape char, I don't know
what is column-name, what is lexical. Once the user want to input
a lexical same as one of column-names, confusion occur.

For example, with CSV row "A1,B1,C1,D1":
'Column #1' || <Column #3> will be converted to 'Column #1' || 'C1'
The first is a lexical, the second is CSV column.

But 'Column #1' || Column #3 can't be converted properly.


-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Dave Page
Дата:
On Thu, Feb 26, 2009 at 9:17 AM, Quan Zongliang <quanzongliang@gmail.com> wrote:
>> Right - so why do we need to escape anything in the expression?
>> Shouldn't the user just be able to free-type whatever they want (in
>> the same way they would if they were writing the expression as part of
>> a query in the Query Tool?
> I have to convert column names in expression to data for every row
> before generate statement. Without escape char, I don't know
> what is column-name, what is lexical. Once the user want to input
> a lexical same as one of column-names, confusion occur.
>
> For example, with CSV row "A1,B1,C1,D1":
> 'Column #1' || <Column #3> will be converted to 'Column #1' || 'C1'
> The first is a lexical, the second is CSV column.
>
> But 'Column #1' || Column #3 can't be converted properly.

Oh I see what you mean. Let's just use some notation like:

"$(#1) || substring($(#3) from 2 for 1)"

for positional notation, or

"$(col_1) || substring($(col_3) from 2 for 1)"

for name based. You can't have an operator called $( so there
shouldn't be any collision with PG operators in the expression - and
to allow the user to include a constant of $(...), we can tell them to
use \$(...} or $$(...)

--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com

Re: Import: empty string and expression

От
Quan Zongliang
Дата:
> Have a set  of radio buttons and a textbox perhaps:
> NULL AS:  NULL  O   empty string O    constant   O    [                ]
> NULL AS:  NULL  O    constant   O    [                ]
My plan is the user can set this option for every column.
Because different column has different set(nullable or not) and different type.

> I wouldn't bother with ERROR for any type - let the server detect
> errors and reject data accordingly.
With COPY command, it stops operation at the first error.
Maybe the user don't want this. Especially when they want to import a huge size file.

> I wouldn't worry about those yet. Let's learn to walk first, and fly later :-)
Yes, I think so, too.

-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Quan Zongliang
Дата:
> Oh I see what you mean. Let's just use some notation like:
>
> "$(#1) || substring($(#3) from 2 for 1)"
>
> for positional notation, or
>
> "$(col_1) || substring($(col_3) from 2 for 1)"
>
> for name based. You can't have an operator called $( so there
> shouldn't be any collision with PG operators in the expression - and
> to allow the user to include a constant of $(...), we can tell them to

Got it.

-----------------------------------------------
Quan Zongliang
quanzongliang@gmail.com
CIT Japan:  http://www.cit.co.jp
CIT China:  http://www.citbj.com.cn


Re: Import: empty string and expression

От
Dave Page
Дата:
On Thu, Feb 26, 2009 at 9:46 AM, Quan Zongliang <quanzongliang@gmail.com> wrote:
>> Have a set  of radio buttons and a textbox perhaps:
>> NULL AS:  NULL  O   empty string O    constant   O    [                ]
>> NULL AS:  NULL  O    constant   O    [                ]
> My plan is the user can set this option for every column.
> Because different column has different set(nullable or not) and different type.
>
>> I wouldn't bother with ERROR for any type - let the server detect
>> errors and reject data accordingly.
> With COPY command, it stops operation at the first error.
> Maybe the user don't want this. Especially when they want to import a huge size file.

Hmm, OK.

The common way to handle errors in COPY style imports btw, is to batch
the imports, and when one batch fails, split it in two and try again.
Keep dividing it down until the error lines are found, and everything
else is imported. Of course, if you have lots of errors that can
pretty inefficient.


--
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com