Re: [GENERAL] trouble with to_char('L')

From Bruce Momjian
Subject Re: [GENERAL] trouble with to_char('L')
Msg-id 201003222014.o2MKErr17486@momjian.us
In reply to Re: [GENERAL] trouble with to_char('L')  (Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>)
Responses Re: [GENERAL] trouble with to_char('L')  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers

Takahiro Itagaki wrote:
> 
> Bruce Momjian <bruce@momjian.us> wrote:
> 
> > Takahiro Itagaki wrote:
> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > > db_encoding_strdup() with the function. Like this:
> > 
> > OK, I don't have any Win32 people testing this patch so if we want this
> > fixed for 9.0 someone is going to have to test my patch to see that it
> > works.  Can you make the adjustments suggested above to my patch and
> > test it to see that it works so we can apply it for 9.0?
> 
> Here is a full patch that can be applied cleanly to HEAD.
> Can anyone test it on Windows?
> 
> I'm not sure why temporary changes of lc_ctype was required in the
> original patch. The codes are not included in my patch, but please
> notice me it is still needed.

Sorry for the delay in replying to you.

I considered your idea of using the existing Postgres encoding
conversion routines to do the conversion of localeconv() strings, but
found two problems.

First, GetPlatformEncoding() caches its result, so it assumes that
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE.  We could avoid the caching, but that
then requires table lookups on every call, which seems overly complex:

+       /* convert the string to the database encoding */
+       pstr = (char *) pg_do_encoding_conversion(
+                                               (unsigned char *) str, strlen(str),
+                                               GetPlatformEncoding(), GetDatabaseEncoding());
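
To make the caching concern concrete, here is a toy model of a cached
lookup going stale.  Nothing in it is PostgreSQL code: the toy_* names,
the encoding constants, and the locale-name parsing are all assumptions
made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Toy encoding IDs; hypothetical, not PostgreSQL's pg_enc values. */
enum toy_encoding
{
    TOY_ENC_UNKNOWN = -1,
    TOY_ENC_SQL_ASCII,
    TOY_ENC_WIN1252,
    TOY_ENC_UTF8
};

/* Stand-in for the process-wide LC_CTYPE setting. */
static const char *toy_lc_ctype = "C";

void
toy_set_lc_ctype(const char *name)
{
    assert(name != NULL);
    toy_lc_ctype = name;
}

/* Uncached lookup: map the part after '.' in a locale name to an encoding. */
int
toy_lookup_encoding(const char *ctype)
{
    const char *dot = strrchr(ctype, '.');

    if (dot == NULL)
        return TOY_ENC_SQL_ASCII;
    if (strcmp(dot + 1, "1252") == 0)
        return TOY_ENC_WIN1252;
    if (strcmp(dot + 1, "UTF-8") == 0)
        return TOY_ENC_UTF8;
    return TOY_ENC_UNKNOWN;
}

/* Cached lookup: computed once, never refreshed when LC_CTYPE changes. */
int
toy_get_platform_encoding(void)
{
    static int  cached = 0;
    static int  value;

    if (!cached)
    {
        value = toy_lookup_encoding(toy_lc_ctype);
        cached = 1;
    }
    return value;
}
```

Once toy_get_platform_encoding() has been called, a later
toy_set_lc_ctype() no longer affects its result, so a caller that
switches LC_CTYPE around localeconv() would convert from the wrong
source encoding.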

Second, having our backend routines do the conversion seems wrong
because it is possible for someone to set LC_MONETARY to an encoding
that our database does not understand, e.g. UTF16, but one that WIN32
can convert to a valid encoding.

The reason we are doing all this is explained in this updated comment in
my patch:
ftp://momjian.us/pub/postgresql/mypatches/pg_locale

+    *  Ideally, monetary and numeric local symbols could be returned in
+    *  any server encoding.  Unfortunately, the WIN32 API does not allow
+    *  setlocale() to return values in a codepage/CTYPE that uses more
+    *  than two bytes per character, like UTF-8:
+    *
+    *      http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+    *
+    *  Evidently, LC_CTYPE allows us to control the encoding used
+    *  for strings returned by localeconv().  The Open Group
+    *  standard, mentioned at the top of this C file, doesn't
+    *  explicitly state this.
+    *
+    *  Therefore, we set LC_CTYPE to match LC_NUMERIC and
+    *  LC_MONETARY, call localeconv(), and use mbstowcs() to
+    *  convert the locale-aware string, e.g. Euro symbol (which
+    *  is not in UTF-8), to the server encoding.
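
As a rough sketch of the sequence that comment describes, using only
standard C calls (setlocale(), localeconv(), mbstowcs()): the helper
names copy_locale_name() and currency_symbol_wide() are hypothetical and
are not taken from the patch, and the final step of converting the wide
string to the server encoding is left out.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Copy the current name of a category; later setlocale() calls may
 * overwrite the buffer that setlocale(category, NULL) returns. */
static char *
copy_locale_name(int category)
{
    const char *name = setlocale(category, NULL);
    char       *copy;

    if (name == NULL)
        return NULL;
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}

/*
 * Hypothetical helper: set LC_CTYPE to match LC_MONETARY, fetch the
 * currency symbol with localeconv(), convert it to wide characters,
 * then restore LC_CTYPE.  Returns the number of wide characters
 * stored (excluding the terminator), or (size_t) -1 on failure.
 */
size_t
currency_symbol_wide(wchar_t *dst, size_t dstlen)
{
    char       *save_ctype = copy_locale_name(LC_CTYPE);
    char       *monetary = copy_locale_name(LC_MONETARY);
    size_t      n = (size_t) -1;

    assert(dst != NULL && dstlen > 0);

    if (save_ctype != NULL && monetary != NULL &&
        setlocale(LC_CTYPE, monetary) != NULL)
    {
        struct lconv *lc = localeconv();

        /* mbstowcs() interprets the bytes according to LC_CTYPE */
        n = mbstowcs(dst, lc->currency_symbol, dstlen - 1);
        if (n != (size_t) -1)
            dst[n] = L'\0';
    }

    /* always put LC_CTYPE back the way we found it */
    if (save_ctype != NULL)
        setlocale(LC_CTYPE, save_ctype);
    free(save_ctype);
    free(monetary);
    return n;
}
```

The save and restore around the call is the part that conflicts with a
cached platform encoding: LC_CTYPE really does change, if only briefly.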

One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
on Win32 and then just convert that always to the server encoding with
win32_wchar_to_db_encoding(), instead of using the encoding from
LC_MONETARY to set LC_CTYPE and having to do double-conversion.
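
For the case where the server encoding is UTF8, the single conversion
step could look roughly like the sketch below.  It is a portable,
hand-written UTF-16 to UTF-8 encoder meant only to illustrate the idea;
the function name utf16_to_utf8() is hypothetical, and the real code
would presumably go through the Win32 API and handle other server
encodings as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert srclen UTF-16 code units to NUL-terminated UTF-8 in dst.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t) -1 on an unpaired surrogate or insufficient space.
 */
size_t
utf16_to_utf8(const uint16_t *src, size_t srclen, char *dst, size_t dstlen)
{
    size_t      out = 0;
    size_t      i = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return (size_t) -1;

    while (i < srclen)
    {
        uint32_t    cp = src[i++];
        size_t      need;

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            /* a high surrogate must be followed by a low surrogate */
            if (i >= srclen || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return (size_t) -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i++] - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return (size_t) -1;     /* unpaired low surrogate */

        need = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : (cp < 0x10000) ? 3 : 4;
        if (out + need >= dstlen)   /* keep one byte for the NUL */
            return (size_t) -1;

        if (need == 1)
            dst[out++] = (char) cp;
        else if (need == 2)
        {
            dst[out++] = (char) (0xC0 | (cp >> 6));
            dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            dst[out++] = (char) (0xE0 | (cp >> 12));
            dst[out++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            dst[out++] = (char) (0xF0 | (cp >> 18));
            dst[out++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
    }

    dst[out] = '\0';
    return out;
}
```

With this, the Euro sign (U+20AC) arrives as one UTF-16 code unit and
leaves as three UTF-8 bytes, with no intermediate codepage involved.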

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do