Re: EOL characters and multibyte encodings

From Tom Lane
Subject Re: EOL characters and multibyte encodings
Date
Msg-id 8064.1182465526@sss.pgh.pa.us
In response to EOL characters and multibyte encodings  (Joe Conway <mail@joeconway.com>)
Responses Re: EOL characters and multibyte encodings  (Joe Conway <mail@joeconway.com>)
List pgsql-hackers
Joe Conway <mail@joeconway.com> writes:
> I finally was able to get PL/R to compile and run on Windows recently. This has
> led to people using a Windows-based client (typically PgAdmin III) to
> create PL/R functions. Immediately I started to receive reports of 
> failures that turned out to be due to the carriage return (\r) used in 
> standard Win32 EOLs (\r\n). It seems that the R parser only accepts 
> newlines (\n), even on Win32 (confirmed on r-devel list with a core 
> developer).

> My first thought on fixing this issue was to simply replace all 
> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the 
> R parser. As far as I know, any instances of '\r' embedded in a 
> syntactically valid R statement must be escaped (i.e. literally the 
> characters "\" and "r"), so that should not be a problem. But I am 
> concerned about how this potentially plays against multibyte characters. 
> Is it safe to do this, or do I need to use a mb-aware replace algorithm?

It's safe, because you'll be dealing with prosrc inside the backend,
therefore using a backend-legal encoding, and those don't have any ASCII
aliasing problems (all bytes of an MB character must have high bit set).
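
A minimal sketch of the property described above, assuming UTF-8 as the server encoding. The function names are hypothetical and are not taken from PostgreSQL or PL/R. Because every byte of a multibyte character has its high bit set, a plain byte scan for 0x0D can only match a real carriage return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if every byte in buf[0..len) has its high bit set (value >= 0x80). */
bool all_high_bit_set(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((buf[i] & 0x80) == 0)
            return false;
    return true;
}

/* Count bytes equal to '\r' (0x0D) in a NUL-terminated string. */
size_t count_cr_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == '\r')
            n++;
    return n;
}

/* Hypothetical self-check: U+00E9 is encoded in UTF-8 as 0xC3 0xA9. */
void aliasing_demo(void)
{
    const unsigned char e_acute[] = {0xC3, 0xA9};

    /* Neither byte can be mistaken for an ASCII character such as '\r'. */
    assert(all_high_bit_set(e_acute, sizeof e_acute));

    /* Only the two real carriage returns are found by the byte scan. */
    assert(count_cr_bytes("x <- \"\xC3\xA9\"\r\ny <- 1\r\n") == 2);
}
```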

However I dislike doing it exactly that way because line numbers in the
R script will all get doubled.  Unless R never reports errors in terms
of line numbers, you'd be better off to either delete the \r characters
or replace them with spaces.
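
The two alternatives suggested above could be sketched as follows. This is a rough illustration on a writable, NUL-terminated copy of the function source, not PL/R's actual code, and the function names are hypothetical. Turning "\r\n" into "\n\n" would add one empty line per source line, whereas both variants below leave the number of '\n' characters unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace each '\r' with a space, in place; the string length is unchanged. */
void cr_to_space(char *src)
{
    for (; *src != '\0'; src++)
        if (*src == '\r')
            *src = ' ';
}

/* Delete each '\r' in place, compacting the string; returns the new length. */
size_t strip_cr(char *src)
{
    char *out = src;

    for (const char *in = src; *in != '\0'; in++)
        if (*in != '\r')
            *out++ = *in;
    *out = '\0';
    return (size_t) (out - src);
}

/* Hypothetical usage on a two-line script with Win32 line endings. */
void cr_demo(void)
{
    char a[] = "x <- 1\r\ny <- 2\r\n";
    char b[] = "x <- 1\r\ny <- 2\r\n";

    cr_to_space(a);
    assert(strcmp(a, "x <- 1 \ny <- 2 \n") == 0);

    strip_cr(b);
    assert(strcmp(b, "x <- 1\ny <- 2\n") == 0);
}
```

Replacing with a space keeps the buffer length and byte offsets intact, while deleting yields the same text a Unix client would have sent.
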
        regards, tom lane

