Re: [HACKERS] initdb initalization failure for collation "ja_JP"

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] initdb initalization failure for collation "ja_JP"
Дата
Msg-id 8599.1498250322@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] initdb initalization failure for collation "ja_JP"  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] initdb initalization failure for collation "ja_JP"  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> One question that I've got is why the ICU portion refuses to load
> any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
> Surely this is completely wrong?  I should think that what we load into
> pg_collation ought to be independent of template1's encoding, the same
> as it is for libc collations, and the right place to be making a test
> like that is where somebody attempts to use an ICU collation.  But
> I've not tried to test it.

So I did test that, and found out the presumable reason why that's there:
icu_from_uchar() falls over if the database encoding is unsupported, and
we use that to convert ICU "display names" for use as comments for the
ICU collations.  But that's not very much less wrongheaded, because it will
allow non-ASCII characters into the initial database contents, which is
absolutely not acceptable.  We assume we can bit-copy the contents of
template0 and it will be valid in any encoding.

Therefore, I think the right thing to do is remove that test and change
get_icu_locale_comment() so that it rejects non-ASCII text, making the
encoding conversion trivial, as in the attached patch.

On my Fedora 25 laptop, the only collations that go without a comment
in this approach are the "nb" ones (Norwegian Bokmål).  As I recall,
that locale is a second-class citizen for other reasons already,
precisely because of its loony insistence on a non-ASCII name even
when we're asking for an Anglicized version.

I'm inclined to add a test to reject non-ASCII in the ICU locale names as
well as the comments.  We've had to do that for libc locale names, and
this experience shows that the ICU locale maintainers don't have their
heads screwed on any straighter.  But this patch doesn't do that.

            regards, tom lane

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 1c43f0b..6febe63 100644
*** a/src/backend/commands/collationcmds.c
--- b/src/backend/commands/collationcmds.c
*************** get_icu_language_tag(const char *localen
*** 431,437 ****

  /*
   * Get a comment (specifically, the display name) for an ICU locale.
!  * The result is a palloc'd string.
   */
  static char *
  get_icu_locale_comment(const char *localename)
--- 431,439 ----

  /*
   * Get a comment (specifically, the display name) for an ICU locale.
!  * The result is a palloc'd string, or NULL if we can't get a comment
!  * or find that it's not all ASCII.  (We can *not* accept non-ASCII
!  * comments, because the contents of template0 must be encoding-agnostic.)
   */
  static char *
  get_icu_locale_comment(const char *localename)
*************** get_icu_locale_comment(const char *local
*** 439,444 ****
--- 441,447 ----
      UErrorCode    status;
      UChar        displayname[128];
      int32        len_uchar;
+     int32        i;
      char       *result;

      status = U_ZERO_ERROR;
*************** get_icu_locale_comment(const char *local
*** 446,456 ****
                                      displayname, lengthof(displayname),
                                      &status);
      if (U_FAILURE(status))
!         ereport(ERROR,
!                 (errmsg("could not get display name for locale \"%s\": %s",
!                         localename, u_errorName(status))));

!     icu_from_uchar(&result, displayname, len_uchar);

      return result;
  }
--- 449,468 ----
                                      displayname, lengthof(displayname),
                                      &status);
      if (U_FAILURE(status))
!         return NULL;            /* no good reason to raise an error */

!     /* Check for non-ASCII comment */
!     for (i = 0; i < len_uchar; i++)
!     {
!         if (displayname[i] > 127)
!             return NULL;
!     }
!
!     /* OK, transcribe */
!     result = palloc(len_uchar + 1);
!     for (i = 0; i < len_uchar; i++)
!         result[i] = displayname[i];
!     result[len_uchar] = '\0';

      return result;
  }
*************** pg_import_system_collations(PG_FUNCTION_
*** 642,655 ****

      /* Load collations known to ICU */
  #ifdef USE_ICU
-     if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
-     {
-         ereport(NOTICE,
-                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                  errmsg("encoding \"%s\" not supported by ICU",
-                         pg_encoding_to_char(GetDatabaseEncoding()))));
-     }
-     else
      {
          int            i;

--- 654,659 ----
*************** pg_import_system_collations(PG_FUNCTION_
*** 661,666 ****
--- 665,671 ----
          {
              const char *name;
              char       *langtag;
+             char       *icucomment;
              const char *collcollate;
              UEnumeration *en;
              UErrorCode    status;
*************** pg_import_system_collations(PG_FUNCTION_
*** 686,693 ****

                  CommandCounterIncrement();

!                 CreateComments(collid, CollationRelationId, 0,
!                                get_icu_locale_comment(name));
              }

              /*
--- 691,700 ----

                  CommandCounterIncrement();

!                 icucomment = get_icu_locale_comment(name);
!                 if (icucomment)
!                     CreateComments(collid, CollationRelationId, 0,
!                                    icucomment);
              }

              /*
*************** pg_import_system_collations(PG_FUNCTION_
*** 720,727 ****

                      CommandCounterIncrement();

!                     CreateComments(collid, CollationRelationId, 0,
!                                    get_icu_locale_comment(localeid));
                  }
              }
              if (U_FAILURE(status))
--- 727,736 ----

                      CommandCounterIncrement();

!                     icucomment = get_icu_locale_comment(name);
!                     if (icucomment)
!                         CreateComments(collid, CollationRelationId, 0,
!                                        icucomment);
                  }
              }
              if (U_FAILURE(status))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] Logical replication: stuck spinlock atReplicationSlotRelease
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease