Discussion: BUG #3766: tsearch2 index creation error
The following bug has been logged online:

Bug reference: 3766
Logged by: Thomas Haegi
Email address: me@alternize.com
PostgreSQL version: 8.3b3
Operating system: Windows 2003
Description: tsearch2 index creation error
Details:

when following the documentation
(http://www.postgresql.org/docs/8.3/static/textsearch-tables.html), the
creation of a gin tsearch index fails:

CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english', p_msg_clean));
ERROR: translation from wchar_t to server encoding failed: No error

********** Error **********

ERROR: translation from wchar_t to server encoding failed: No error
SQL state: 22021

field p_msg_clean is TEXT (unlimited), db encoding is UTF8.

- thomas
"Thomas Haegi" <me@alternize.com> writes:
> Operating system: Windows 2003
> CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
> p_msg_clean));
> ERROR: translation from wchar_t to server encoding failed: No error
Hmm. That error message is close to some code that is specific to the
Windows-and-UTF8 case, which might explain why I don't see it.
Can any Windows hackers check into whether the WIN32 coding in
wchar2char() and char2wchar() in ts_locale.c is sane?
regards, tom lane
Tom Lane wrote:
>> Operating system: Windows 2003
>
>> CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
>> p_msg_clean));
>> ERROR: translation from wchar_t to server encoding failed: No error
>
> Hmm. That error message is close to some code that is specific to the
> Windows-and-UTF8 case, which might explain why I don't see it.
>
> Can any Windows hackers check into whether the WIN32 coding in
> wchar2char() and char2wchar() in ts_locale.c is sane?
has anyone had the chance to look into that problem? i'd be more than
willing to help testing an updated build if needed.
thanks,
thomas
"Thomas H." <me@alternize.com> writes:
> Tom Lane wrote:
>> Can any Windows hackers check into whether the WIN32 coding in
>> wchar2char() and char2wchar() in ts_locale.c is sane?
> has anyone had the chance to look into that problem? i'd be more than
> willing to help testing an updated build if needed.
After re-reading Microsoft's man pages I think I see the problem ---
attached patch is applied.
regards, tom lane
Index: src/backend/tsearch/ts_locale.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tsearch/ts_locale.c,v
retrieving revision 1.4
diff -c -r1.4 ts_locale.c
*** src/backend/tsearch/ts_locale.c 15 Nov 2007 21:14:38 -0000 1.4
--- src/backend/tsearch/ts_locale.c 24 Nov 2007 21:14:49 -0000
***************
*** 23,29 ****
* wchar2char --- convert wide characters to multibyte format
*
* This has the same API as the standard wcstombs() function; in particular,
! * tolen is the maximum number of bytes to store at *to, and *from should be
* zero-terminated. The output will be zero-terminated iff there is room.
*/
size_t
--- 23,29 ----
* wchar2char --- convert wide characters to multibyte format
*
* This has the same API as the standard wcstombs() function; in particular,
! * tolen is the maximum number of bytes to store at *to, and *from must be
* zero-terminated. The output will be zero-terminated iff there is room.
*/
size_t
***************
*** 73,93 ****
{
int r;
! r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen);
!
! if (r <= 0)
{
! pg_verifymbstr(from, fromlen, false);
! ereport(ERROR,
! (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
! errmsg("invalid multibyte character for locale"),
! errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
}
! Assert(r <= tolen);
! /* Microsoft counts the zero terminator in the result */
! return r - 1;
}
#endif /* WIN32 */
--- 73,100 ----
{
int r;
! /* stupid Microsloth API does not work for zero-length input */
! if (fromlen == 0)
! r = 0;
! else
{
! r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen - 1);
!
! if (r <= 0)
! {
! /* see notes in oracle_compat.c about error reporting */
! pg_verifymbstr(from, fromlen, false);
! ereport(ERROR,
! (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
! errmsg("invalid multibyte character for locale"),
! errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
! }
}
! Assert(r < tolen);
! to[r] = 0;
! return r;
}
#endif /* WIN32 */
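For readers following the patch, here is a minimal portable sketch of the pattern it appears to adopt: special-case zero-length input, reserve one output slot for the terminator, and write the terminator by hand instead of assuming the converter counted it. `counted_convert` is a hypothetical ASCII-only stand-in, not the Windows API; it only imitates the two behaviours the patch works around (failure on empty input, and no terminator when an explicit length is passed). `convert_terminated` is likewise an illustrative name, and it returns -1 where the real code raises an error with ereport().

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * HYPOTHETICAL stand-in for a length-counted converter (ASCII only).
 * Assumed behaviour, modelled on what the patch works around:
 *   - zero-length input is reported as failure (returns 0);
 *   - exactly fromlen units are written and no terminator is appended;
 *   - the returned count does not include a terminator.
 */
static int
counted_convert(const char *from, int fromlen, wchar_t *to, int tolen)
{
	int			i;

	if (fromlen <= 0 || fromlen > tolen)
		return 0;				/* failure */
	for (i = 0; i < fromlen; i++)
		to[i] = (wchar_t) (unsigned char) from[i];
	return fromlen;
}

/*
 * Wrapper in the style of the patched code. Returns the number of wide
 * characters stored (excluding the terminator), or -1 on failure.
 */
static int
convert_terminated(const char *from, int fromlen, wchar_t *to, int tolen)
{
	int			r;

	if (tolen <= 0)
		return -1;				/* no room even for the terminator */

	if (fromlen == 0)
		r = 0;					/* do not call the converter on empty input */
	else
	{
		/* leave one slot free for the terminator we add below */
		r = counted_convert(from, fromlen, to, tolen - 1);
		if (r <= 0)
			return -1;
	}

	assert(r < tolen);
	to[r] = 0;					/* terminate explicitly */
	return r;
}
```

With an 8-element buffer, calling `convert_terminated("", 0, buf, 8)` yields 0 and an empty terminated string rather than an error, which matches the empty-input case the patch handles explicitly.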
tom lane wrote:
>>> Can any Windows hackers check into whether the WIN32 coding in
>>> wchar2char() and char2wchar() in ts_locale.c is sane?
>
>> has anyone had the chance to look into that problem? i'd be more than
>> willing to help testing an updated build if needed.
>
> After re-reading Microsoft's man pages I think I see the problem ---
> attached patch is applied.
>

thank you for taking a shot at the problem. unfortunately, i still
haven't gotten around to getting a mvc build environment up & running, so
i cannot compile the patch myself. if any of the win32-hackers (magnus?)
can provide me with a binary, i can test it. else i'll wait for the next
official build.

thanks,
thomas
Tom Lane wrote:
> "Thomas H." <me@alternize.com> writes:
>> Tom Lane wrote:
>>> Can any Windows hackers check into whether the WIN32 coding in
>>> wchar2char() and char2wchar() in ts_locale.c is sane?
>
>> has anyone had the chance to look into that problem? i'd be more than
>> willing to help testing an updated build if needed.
>
> After re-reading Microsoft's man pages I think I see the problem ---
> attached patch is applied.
>
> regards, tom lane

tsearch2 works fine now in the official win32 b4 build

thanks,
thomas