Thread: BUG #3766: tsearch2 index creation error

BUG #3766: tsearch2 index creation error

From
"Thomas Haegi"
Date:
The following bug has been logged online:

Bug reference:      3766
Logged by:          Thomas Haegi
Email address:      me@alternize.com
PostgreSQL version: 8.3b3
Operating system:   Windows 2003
Description:        tsearch2 index creation error
Details:

when following the documentation
(http://www.postgresql.org/docs/8.3/static/textsearch-tables.html), the
creation of a gin tsearch index fails:

CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
p_msg_clean));

ERROR:  translation from wchar_t to server encoding failed: No error

********** Error **********

ERROR: translation from wchar_t to server encoding failed: No error
SQL state: 22021



field p_msg_clean is TEXT (unlimited), db encoding is UTF8.

- thomas

Re: BUG #3766: tsearch2 index creation error

From
Tom Lane
Date:
"Thomas Haegi" <me@alternize.com> writes:
> Operating system:   Windows 2003

> CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
> p_msg_clean));
> ERROR:  translation from wchar_t to server encoding failed: No error

Hmm.  That error message is close to some code that is specific to the
Windows-and-UTF8 case, which might explain why I don't see it.

Can any Windows hackers check into whether the WIN32 coding in
wchar2char() and char2wchar() in ts_locale.c is sane?

            regards, tom lane

Re: BUG #3766: tsearch2 index creation error

From
"Thomas H."
Date:
Tom Lane wrote:
>> Operating system:   Windows 2003
>
>> CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
>> p_msg_clean));
>> ERROR:  translation from wchar_t to server encoding failed: No error
>
> Hmm.  That error message is close to some code that is specific to the
> Windows-and-UTF8 case, which might explain why I don't see it.
>
> Can any Windows hackers check into whether the WIN32 coding in
> wchar2char() and char2wchar() in ts_locale.c is sane?

has anyone had the chance to look into that problem? i'd be more than
willing to help testing an updated build if needed.

thanks,
thomas

Re: BUG #3766: tsearch2 index creation error

From
Tom Lane
Date:
"Thomas H." <me@alternize.com> writes:
> Tom Lane wrote:
>> Can any Windows hackers check into whether the WIN32 coding in
>> wchar2char() and char2wchar() in ts_locale.c is sane?

> has anyone had the chance to look into that problem? i'd be more than
> willing to help testing an updated build if needed.

After re-reading Microsoft's man pages I think I see the problem ---
attached patch is applied.

            regards, tom lane

Index: src/backend/tsearch/ts_locale.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tsearch/ts_locale.c,v
retrieving revision 1.4
diff -c -r1.4 ts_locale.c
*** src/backend/tsearch/ts_locale.c    15 Nov 2007 21:14:38 -0000    1.4
--- src/backend/tsearch/ts_locale.c    24 Nov 2007 21:14:49 -0000
***************
*** 23,29 ****
   * wchar2char --- convert wide characters to multibyte format
   *
   * This has the same API as the standard wcstombs() function; in particular,
!  * tolen is the maximum number of bytes to store at *to, and *from should be
   * zero-terminated.  The output will be zero-terminated iff there is room.
   */
  size_t
--- 23,29 ----
   * wchar2char --- convert wide characters to multibyte format
   *
   * This has the same API as the standard wcstombs() function; in particular,
!  * tolen is the maximum number of bytes to store at *to, and *from must be
   * zero-terminated.  The output will be zero-terminated iff there is room.
   */
  size_t
***************
*** 73,93 ****
      {
          int            r;

!         r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen);
!
!         if (r <= 0)
          {
!             pg_verifymbstr(from, fromlen, false);
!             ereport(ERROR,
!                     (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                      errmsg("invalid multibyte character for locale"),
!                      errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
          }

!         Assert(r <= tolen);

!         /* Microsoft counts the zero terminator in the result */
!         return r - 1;
      }
  #endif   /* WIN32 */

--- 73,100 ----
      {
          int            r;

!         /* stupid Microsloth API does not work for zero-length input */
!         if (fromlen == 0)
!             r = 0;
!         else
          {
!             r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen - 1);
!
!             if (r <= 0)
!             {
!                 /* see notes in oracle_compat.c about error reporting */
!                 pg_verifymbstr(from, fromlen, false);
!                 ereport(ERROR,
!                         (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                          errmsg("invalid multibyte character for locale"),
!                          errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
!             }
          }

!         Assert(r < tolen);
!         to[r] = 0;

!         return r;
      }
  #endif   /* WIN32 */
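
For readers without a Windows build at hand, the second hunk appears to change three things: it skips the MultiByteToWideChar() call when the input is empty, it passes tolen - 1 so one slot stays free for the terminator, and it writes the terminator itself and returns r rather than r - 1. Below is a minimal portable sketch of that pattern in C. It is an illustration only, not the PostgreSQL code: counted_convert() is a hypothetical ASCII-only stand-in for a counted-length conversion call, and to_wide_terminated() and its -1 error return are assumed names and conventions chosen for the example (the real code raises an error with ereport()).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for a counted-length conversion call such as
 * MultiByteToWideChar(): ASCII only, rejects zero-length input, never
 * writes a terminator, and returns the number of wide characters stored
 * (0 on failure).
 */
static int
counted_convert(const char *from, int fromlen, wchar_t *to, int tolen)
{
    int         i;

    if (fromlen <= 0 || fromlen > tolen)
        return 0;               /* failure: empty input or no room */
    for (i = 0; i < fromlen; i++)
    {
        if ((unsigned char) from[i] > 0x7F)
            return 0;           /* this sketch handles ASCII only */
        to[i] = (wchar_t) from[i];
    }
    return i;
}

/*
 * Wrapper following the shape of the patched branch: returns the number
 * of wide characters stored, not counting the terminator, or -1 on
 * failure (an assumed convention for this sketch).
 */
static int
to_wide_terminated(const char *from, int fromlen, wchar_t *to, int tolen)
{
    int         r;

    if (tolen <= 0)
        return -1;              /* no room even for the terminator */

    if (fromlen == 0)
        r = 0;                  /* skip the call that would fail */
    else
    {
        /* reserve one output slot for the terminator */
        r = counted_convert(from, fromlen, to, tolen - 1);
        if (r <= 0)
            return -1;
    }

    assert(r < tolen);
    to[r] = 0;                  /* terminate explicitly */

    return r;
}
```

With an 8-element buffer, an empty input yields 0 and a terminated empty string, "abc" yields 3, and an 8-character input is rejected because no slot would remain for the terminator.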

Re: BUG #3766: tsearch2 index creation error

From
"Thomas H."
Date:
tom lane wrote:
>>> Can any Windows hackers check into whether the WIN32 coding in
>>> wchar2char() and char2wchar() in ts_locale.c is sane?
>
>> has anyone had the chance to look into that problem? i'd be more than
>> willing to help testing an updated build if needed.
>
> After re-reading Microsoft's man pages I think I see the problem ---
> attached patch is applied.
>

thank you for taking a shot at the problem. unfortunately, i still
haven't gotten around to getting a mvc build environment up & running so i
can not compile the patch myself.

if any of the win32-hackers (magnus?) can provide me with a binary, i
can test it. else i'll wait for the next official build.

thanks,
thomas

Re: BUG #3766: tsearch2 index creation error

From
"Thomas H."
Date:
Tom Lane wrote:
> "Thomas H." <me@alternize.com> writes:
>> Tom Lane wrote:
>>> Can any Windows hackers check into whether the WIN32 coding in
>>> wchar2char() and char2wchar() in ts_locale.c is sane?
>
>> has anyone had the chance to look into that problem? i'd be more than
>> willing to help testing an updated build if needed.
>
> After re-reading Microsoft's man pages I think I see the problem ---
> attached patch is applied.
>
>             regards, tom lane

tsearch2 works fine now in the official win32 b4 build

thanks,
thomas