Обсуждение: pgsql: Add regression tests for CSV and \., and add automatic quoting of
pgsql: Add regression tests for CSV and \., and add automatic quoting of
От
momjian@postgresql.org (Bruce Momjian)
Дата:
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.
Backpatch to 8.1.X.
Tags:
----
REL8_1_STABLE
Modified Files:
--------------
pgsql/doc/src/sgml/ref:
copy.sgml (r1.70 -> r1.70.2.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/copy.sgml.diff?r1=1.70&r2=1.70.2.1)
pgsql/src/backend/commands:
copy.c (r1.254.2.2 -> r1.254.2.3)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.254.2.2&r2=1.254.2.3)
pgsql/src/test/regress/expected:
copy2.out (r1.22 -> r1.22.2.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/copy2.out.diff?r1=1.22&r2=1.22.2.1)
pgsql/src/test/regress/sql:
copy2.sql (r1.13 -> r1.13.2.1)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/copy2.sql.diff?r1=1.13&r2=1.13.2.1)
Re: pgsql: Add regression tests for CSV and \., and add automatic quoting of
От
"Andrew Dunstan"
Дата:
Andrew Dunstan said:
> Bruce Momjian said:
>> Log Message:
>> -----------
>> Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I also
added documentation describing this issue.
>>
>
> This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.
>
Now that I've woken up properly I realise that it's also just wrong - it
will miss the case we need to catch of the first column of a multi-column
line beginning with \. - just treat them all the same and all will be well.
Also, this test is suspicious:
strcmp(string, "\\.") == 0
Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.
cheers
andrew
Andrew Dunstan wrote:
> Andrew Dunstan said:
> > Bruce Momjian said:
> >> Log Message:
> >> -----------
> >> Add regression tests for CSV and \., and add automatic quoting of a
> single column dump that has a \. value, so the load works properly. I also
> added documentation describing this issue.
> >>
> >
> > This seems unnecessarily elaborate, in code that is already byzantine. I
> think we can safely quote *any* field that has \. regardless of whether or
> not it is a singleton. There's no need to make a single column a special
> case - if it's valid for a singleton it's valid for any, and vice versa.
> >
>
>
> Now that I've woken up properly I realise that it's also just wrong - it
> will miss the case we need to catch of the first column of a multi-column
> line beginning with \. - just treat them all the same and all will be well.
>
> Also, this test is suspicious:
>
> strcmp(string, "\\.") == 0
>
> Don't we also want to quote it if the field reads \.x ?
> strncmp(string, "\\.",2) == 0
> seems like it would be a better test.
Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:
COPY testeoc FROM stdin CSV;
a\.
\.b
c\.d
"\."
\.
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
Our documentation says \. must appear alone on a line. With non-CSV, we
allow \. to appear on the end of a line too because it can not be a data
value, but for CSV, we have to enforce that.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian said: >> Now that I've woken up properly I realise that it's also just wrong - >> it will miss the case we need to catch of the first column of a >> multi-column line beginning with \. - just treat them all the same and >> all will be well. >> >> Also, this test is suspicious: >> >> strcmp(string, "\\.") == 0 >> >> Don't we also want to quote it if the field reads \.x ? >> strncmp(string, "\\.",2) == 0 >> seems like it would be a better test. > > Have you looked at the regression tests I added? \.x will no longer be > interpreted as an end-of-data marker: > Oh. I didn't realise that part of the behaviour had changed. Does that alter our robustness? I still think it seems odd to quote the field depending on it being the only field. That feels more inconsistent than quoting it whenever it is \. But I guess it's a corner case that is hardly likely to happen in real life. cheers andrew
Andrew Dunstan wrote: > Bruce Momjian said: > >> Now that I've woken up properly I realise that it's also just wrong - > >> it will miss the case we need to catch of the first column of a > >> multi-column line beginning with \. - just treat them all the same and > >> all will be well. > >> > >> Also, this test is suspicious: > >> > >> strcmp(string, "\\.") == 0 > >> > >> Don't we also want to quote it if the field reads \.x ? > >> strncmp(string, "\\.",2) == 0 > >> seems like it would be a better test. > > > > Have you looked at the regression tests I added? \.x will no longer be > > interpreted as an end-of-data marker: > > > > Oh. I didn't realise that part of the behaviour had changed. Does that alter > our robustness? It improves our robustness, for sure. > I still think it seems odd to quote the field depending on it being the only > field. That feels more inconsistent than quoting it whenever it is \. > > But I guess it's a corner case that is hardly likely to happen in real life. That is why I did it only for the single-column case, so the quoting would happen only when required, and it would limit confusion. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073