Re: More message encoding woes

From Heikki Linnakangas
Subject Re: More message encoding woes
Date
Msg-id 49DB2666.1050800@enterprisedb.com
In response to Re: More message encoding woes  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: More message encoding woes  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Peter Eisentraut wrote:
> On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote:
>> Using the name for the latin1 encoding in the currently Windows-only
>> mapping table, "LATIN1", you get no translation because that name is not
>> recognized by the system. Using the other name "ISO-8859-1", it works.
>> "LATIN1" is not listed in the output of locale -m either.
>
> You are looking in the wrong place.  What we need is for iconv to recognize
> the encoding name used by PostgreSQL.  iconv --list is the primary hint for
> that.
>
> The locale names provided by the operating system are arbitrary and unrelated.

Oh, ok. I guess we can do the simple fix you proposed then.
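
For illustration, the "does iconv recognize this name" question can also be asked programmatically instead of reading the output of iconv --list. The following is only a sketch: the helper name iconv_recognizes is made up for this example and is not part of PostgreSQL, and the result depends on the iconv implementation installed on the system.

```c
#include <iconv.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns 1 if the local
 * iconv implementation accepts "name" as a source encoding for a
 * conversion to UTF-8, and 0 otherwise.
 */
int
iconv_recognizes(const char *name)
{
    iconv_t     cd = iconv_open("UTF-8", name);

    /* iconv_open() reports an unsupported conversion as (iconv_t) -1 */
    if (cd == (iconv_t) -1)
        return 0;

    iconv_close(cd);
    return 1;
}
```

Calling it with both "LATIN1" and "ISO-8859-1" would show whether a given platform accepts either spelling; no particular result is assumed here.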

Patch attached. Instead of checking for LC_CTYPE == C, I'm checking
"pg_get_encoding_from_locale(NULL) == encoding", which is closer to
what we actually want. The downside is that
pg_get_encoding_from_locale(NULL) isn't exactly free, but the upside is
that we don't need to keep this in sync with the rules we have in CREATE
DATABASE that enforce that the locale matches the encoding.
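
As a rough standalone sketch of that decision (not the patch itself): every demo_* name below is hypothetical, the two-entry table stands in for the real mapping table, and demo_encoding_from_codeset is only a crude stand-in for pg_get_encoding_from_locale(NULL). The locale's codeset is passed in as a string so the logic can be exercised without changing the process locale.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-ins for PostgreSQL's encoding IDs. */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_UTF8,
    DEMO_LATIN1
} demo_encoding;

typedef struct
{
    demo_encoding encoding;
    const char *codeset;        /* name handed to gettext/iconv */
} demo_codeset_map;

static const demo_codeset_map demo_map[] = {
    {DEMO_UTF8, "UTF-8"},
    {DEMO_LATIN1, "ISO-8859-1"}
};

/* Crude stand-in for pg_get_encoding_from_locale(NULL). */
demo_encoding
demo_encoding_from_codeset(const char *codeset)
{
    if (strcasecmp(codeset, "UTF-8") == 0 || strcasecmp(codeset, "utf8") == 0)
        return DEMO_UTF8;
    if (strcasecmp(codeset, "ISO-8859-1") == 0 || strcasecmp(codeset, "ISO8859-1") == 0)
        return DEMO_LATIN1;
    return DEMO_SQL_ASCII;      /* anything unrecognized */
}

/*
 * Returns the codeset name that should be bound, or NULL when no bind
 * is needed: either the locale's codeset already matches the database
 * encoding, or the table has no entry for that encoding.
 */
const char *
demo_codeset_to_bind(const char *locale_codeset, demo_encoding db_encoding)
{
    size_t      i;

    /* Same encoding as LC_CTYPE: gettext's default is already right. */
    if (demo_encoding_from_codeset(locale_codeset) == db_encoding)
        return NULL;

    for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++)
    {
        if (demo_map[i].encoding == db_encoding)
            return demo_map[i].codeset;
    }
    return NULL;
}
```

In a real program the first argument would typically come from nl_langinfo(CODESET) after setlocale(LC_CTYPE, ""), and a non-NULL result would be passed on to bind_textdomain_codeset().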

This doesn't include the cleanup to make the mapping table easier to
maintain that Magnus was going to have a look at before I started this
thread.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,896 **** cliplen(const char *str, int len, int limit)
      return l;
  }

! #if defined(ENABLE_NLS) && defined(WIN32)
  static const struct codeset_map {
      int    encoding;
      const char *codeset;
--- 890,896 ----
      return l;
  }

! #if defined(ENABLE_NLS)
  static const struct codeset_map {
      int    encoding;
      const char *codeset;
***************
*** 929,935 **** static const struct codeset_map {
      {PG_EUC_TW, "EUC-TW"},
      {PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* WIN32 */

  void
  SetDatabaseEncoding(int encoding)
--- 929,935 ----
      {PG_EUC_TW, "EUC-TW"},
      {PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* ENABLE_NLS */

  void
  SetDatabaseEncoding(int encoding)
***************
*** 946,960 **** SetDatabaseEncoding(int encoding)
  }

  /*
!  * On Windows, we need to explicitly bind gettext to the correct
!  * encoding, because gettext() tends to get confused.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS) && defined(WIN32)
      int     i;

      for (i = 0; i < lengthof(codeset_map_array); i++)
      {
          if (codeset_map_array[i].encoding == encoding)
--- 946,975 ----
  }

  /*
!  * Bind gettext to the correct encoding.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS)
      int     i;

+     /*
+      * gettext() uses the encoding specified by LC_CTYPE by default,
+      * so if that matches the database encoding, we don't need to do
+      * anything. This is not for performance, but because if
+      * bind_textdomain_codeset() doesn't recognize the codeset name we
+      * pass it, it will fall back to English and we don't want that to
+      * happen unnecessarily.
+      *
+      * On Windows, though, gettext() tends to get confused so we always
+      * bind it.
+      */
+ #ifndef WIN32
+     if (pg_get_encoding_from_locale(NULL) == encoding)
+         return;
+ #endif
+
      for (i = 0; i < lengthof(codeset_map_array); i++)
      {
          if (codeset_map_array[i].encoding == encoding)
