Discussion: COPY FROM is not 8bit clean

COPY FROM is not 8bit clean

From
Darcy Buskermolen
Date:
ACK!!!!! Must remember which MTA I'm using...
When using COPY FROM 'file' DELIMITER '\254', copyfrom reads past the
delimiter and ends up with parse errors when trying to do the insert.


What the?? Why didn't that go through with the body of the text... *sigh*
I'll resend in the AM.

Re: COPY FROM is not 8bit clean

From
Tatsuo Ishii
Date:
> When using COPY FROM 'file' DELIMITER '\254', copyfrom reads past the
> delimiter and ends up with parse errors when trying to do the insert.
> 
> 
> What the?? Why didn't that go through with the body of the text... *sigh*
> I'll resend in the AM.

Good catch. It's definitely a bug in the COPY command. Please try
the following patch (this is against 7.2).

*** src/backend/commands/copy.c.orig    Tue Feb 26 21:11:05 2002
--- src/backend/commands/copy.c    Tue Feb 26 21:11:35 2002
***************
*** 1024,1030 ****
  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
  {
      int            c;
!     int            delimc = delim[0];
  
  #ifdef MULTIBYTE
      int            mblen;
--- 1024,1030 ----
  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
  {
      int            c;
!     int            delimc = (unsigned char)delim[0];
  
  #ifdef MULTIBYTE
      int            mblen;


Re: COPY FROM is not 8bit clean

From
Darcy Buskermolen
Date:
Postgres was not compiled with multibyte support. If I replace the if (delimc == c)
with if (strstr(delim,c)), it works as expected. According to the CVS log,
this change was made for performance reasons.



At 11:57 PM 2/25/02 -0500, Tom Lane wrote:
>Darcy Buskermolen <darcy@ok-connect.com> writes:
>> When using COPY FROM 'file' DELIMITER '\254', copyfrom reads past the
>> delimiter and ends up with parse errors when trying to do the insert.
>
>Are you perhaps operating in a multibyte encoding in which \254 is
>just the first byte of a multibyte character?
>
>I'm not sure what we do in such a case, and even less sure what we
>should do ... but I am entirely prepared to believe that we don't
>do the Right Thing ...
>
>            regards, tom lane
>
>

Re: COPY FROM is not 8bit clean

From
Darcy Buskermolen
Date:
This patch solves the problem.

At 09:16 PM 2/26/02 +0900, Tatsuo Ishii wrote:
>> When using COPY FROM 'file' DELIMITER '\254', copyfrom reads past the
>> delimiter and ends up with parse errors when trying to do the insert.
>> 
>> 
>> What the?? Why didn't that go through with the body of the text... *sigh*
>> I'll resend in the AM.
>
>Good catch. It's definitely a bug in the COPY command. Please try
>the following patch (this is against 7.2).
>
>*** src/backend/commands/copy.c.orig    Tue Feb 26 21:11:05 2002
>--- src/backend/commands/copy.c    Tue Feb 26 21:11:35 2002
>***************
>*** 1024,1030 ****
>  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
>  {
>      int            c;
>!     int            delimc = delim[0];
>  
>  #ifdef MULTIBYTE
>      int            mblen;
>--- 1024,1030 ----
>  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
>  {
>      int            c;
>!     int            delimc = (unsigned char)delim[0];
>  
>  #ifdef MULTIBYTE
>      int            mblen;
>
>


Re: COPY FROM is not 8bit clean

From
Tom Lane
Date:
Darcy Buskermolen <darcy@ok-connect.com> writes:
> Postgres was not compiled with multibyte support. If I replace the if (delimc == c)
> with if (strstr(delim,c)), it works as expected. According to the CVS log,
> this change was made for performance reasons.

Yeah, my error :-(.  See Tatsuo's reply for the correct fix.

            regards, tom lane

Re: COPY FROM is not 8bit clean

From
Bruce Momjian
Date:
Can someone explain why this fixes the problem.  I thought it was safe
to assign a char to an int and do a compare.  The compare I see is:
    if (c == delimc)
        break;


---------------------------------------------------------------------------

Darcy Buskermolen wrote:
> This patch solves the problem.
> 
> At 09:16 PM 2/26/02 +0900, Tatsuo Ishii wrote:
> >> When using COPY FROM 'file' DELIMITER '\254', copyfrom reads past the
> >> delimiter and ends up with parse errors when trying to do the insert.
> >> 
> >> 
> >> What the?? Why didn't that go through with the body of the text... *sigh*
> >> I'll resend in the AM.
> >
> >Good catch. It's definitely a bug in the COPY command. Please try
> >the following patch (this is against 7.2).
> >
> >*** src/backend/commands/copy.c.orig    Tue Feb 26 21:11:05 2002
> >--- src/backend/commands/copy.c    Tue Feb 26 21:11:35 2002
> >***************
> >*** 1024,1030 ****
> >  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
> >  {
> >      int            c;
> >!     int            delimc = delim[0];
> >  
> >  #ifdef MULTIBYTE
> >      int            mblen;
> >--- 1024,1030 ----
> >  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
> >  {
> >      int            c;
> >!     int            delimc = (unsigned char)delim[0];
> >  
> >  #ifdef MULTIBYTE
> >      int            mblen;
> >
> >

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: COPY FROM is not 8bit clean

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Can someone explain why this fixes the problem.

Think about a machine where char is signed by default.  Extracting \254
into an int will produce -84, which will not equal the 172 (\254) returned by getc.
        regards, tom lane
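The arithmetic behind the mismatch, for the delimiter used in this thread, reading \254 as a C octal escape and assuming an 8-bit two's-complement char:

$$
254_8 = 2 \cdot 8^2 + 5 \cdot 8 + 4 = 128 + 40 + 4 = 172 = \mathtt{0xAC}
$$

$$
\text{value as a signed char} = 172 - 2^8 = 172 - 256 = -84 \neq 172 = \text{value returned by getc()}
$$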


Re: COPY FROM is not 8bit clean

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Can someone explain why this fixes the problem.
> 
> Think about a machine where char is signed by default.  Extracting \254
> into an int will produce -84, which will not equal the 172 (\254) returned by getc.

Oh, I thought that the int returned by getc already had that sign
extension, but now I remember it doesn't.  In fact, it specifically
returns an int so -1 (EOF) can be identified.  Got it.  Seems I am forgetting
some of my C.
