Discussion: BUG #3766: tsearch2 index creation error
The following bug has been logged online:

Bug reference:      3766
Logged by:          Thomas Haegi
Email address:      me@alternize.com
PostgreSQL version: 8.3b3
Operating system:   Windows 2003
Description:        tsearch2 index creation error
Details:

when following the documentation
(http://www.postgresql.org/docs/8.3/static/textsearch-tables.html), the
creation of a gin tsearch index fails:

CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english', p_msg_clean));

ERROR: translation from wchar_t to server encoding failed: No error

********** Error **********

ERROR: translation from wchar_t to server encoding failed: No error
SQL state: 22021

field p_msg_clean is TEXT (unlimited), db encoding is UTF8.

- thomas

"Thomas Haegi" <me@alternize.com> writes:
> Operating system: Windows 2003

> CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
> p_msg_clean));
> ERROR: translation from wchar_t to server encoding failed: No error

Hmm. That error message is close to some code that is specific to the
Windows-and-UTF8 case, which might explain why I don't see it.

Can any Windows hackers check into whether the WIN32 coding in
wchar2char() and char2wchar() in ts_locale.c is sane?

regards, tom lane
Tom Lane wrote:
>> Operating system: Windows 2003
>
>> CREATE INDEX posts_fts_idx ON forum.posts USING gin(to_tsvector('english',
>> p_msg_clean));
>> ERROR: translation from wchar_t to server encoding failed: No error
>
> Hmm. That error message is close to some code that is specific to the
> Windows-and-UTF8 case, which might explain why I don't see it.
>
> Can any Windows hackers check into whether the WIN32 coding in
> wchar2char() and char2wchar() in ts_locale.c is sane?

has anyone had the chance to look into that problem? i'd be more than
willing to help testing an updated build if needed.

thanks,
thomas

"Thomas H." <me@alternize.com> writes:
> Tom Lane wrote:
>> Can any Windows hackers check into whether the WIN32 coding in
>> wchar2char() and char2wchar() in ts_locale.c is sane?

> has anyone had the chance to look into that problem? i'd be more than
> willing to help testing an updated build if needed.

After re-reading Microsoft's man pages I think I see the problem ---
attached patch is applied.

regards, tom lane

Index: src/backend/tsearch/ts_locale.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tsearch/ts_locale.c,v
retrieving revision 1.4
diff -c -r1.4 ts_locale.c
*** src/backend/tsearch/ts_locale.c 15 Nov 2007 21:14:38 -0000 1.4
--- src/backend/tsearch/ts_locale.c 24 Nov 2007 21:14:49 -0000
***************
*** 23,29 ****
   * wchar2char --- convert wide characters to multibyte format
   *
   * This has the same API as the standard wcstombs() function; in particular,
!  * tolen is the maximum number of bytes to store at *to, and *from should be
   * zero-terminated. The output will be zero-terminated iff there is room.
   */
  size_t
--- 23,29 ----
   * wchar2char --- convert wide characters to multibyte format
   *
   * This has the same API as the standard wcstombs() function; in particular,
!  * tolen is the maximum number of bytes to store at *to, and *from must be
   * zero-terminated. The output will be zero-terminated iff there is room.
   */
  size_t
***************
*** 73,93 ****
      {
          int r;

!         r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen);
!
!         if (r <= 0)
          {
!             pg_verifymbstr(from, fromlen, false);
!             ereport(ERROR,
!                     (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                      errmsg("invalid multibyte character for locale"),
!                      errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
          }

!         Assert(r <= tolen);

!         /* Microsoft counts the zero terminator in the result */
!         return r - 1;
      }
  #endif /* WIN32 */

--- 73,100 ----
      {
          int r;

!         /* stupid Microsloth API does not work for zero-length input */
!         if (fromlen == 0)
!             r = 0;
!         else
          {
!             r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen - 1);
!
!             if (r <= 0)
!             {
!                 /* see notes in oracle_compat.c about error reporting */
!                 pg_verifymbstr(from, fromlen, false);
!                 ereport(ERROR,
!                         (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                          errmsg("invalid multibyte character for locale"),
!                          errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
!             }
          }

!         Assert(r < tolen);
!         to[r] = 0;

!         return r;
      }
  #endif /* WIN32 */
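For readers without a Windows build at hand, the second hunk of the patch can be approximated by the portable sketch below. It is not the PostgreSQL code. fake_convert() and sketch_char2wchar() are hypothetical names, and fake_convert() is only a stand-in that mimics what MultiByteToWideChar appears to do, as far as I read Microsoft's documentation, when it is given an explicit input length: it treats zero-length input as a failure, writes no terminator, and returns a count that excludes the terminator. It copies ASCII bytes only. The tolen == 0 guard and the (size_t) -1 return in place of ereport() are additions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a length-based converter (NOT the Windows API).
 * Assumed behavior: zero-length input is reported as failure (0), no zero
 * terminator is written, and the returned count excludes any terminator.
 * Only copies bytes one-to-one, so it is meaningful for ASCII input only.
 */
static int
fake_convert(const char *from, int fromlen, wchar_t *to, int tolen)
{
    int i;

    if (fromlen <= 0)
        return 0;               /* zero-length input counts as an error */
    if (fromlen > tolen)
        return 0;               /* output buffer too small: also an error */
    for (i = 0; i < fromlen; i++)
        to[i] = (wchar_t) (unsigned char) from[i];
    return fromlen;             /* no terminator written or counted */
}

/*
 * Sketch of the patched logic: special-case empty input, reserve one slot
 * for the terminator, terminate by hand, and return the length without
 * the terminator. Returns (size_t) -1 where the real code would ereport().
 */
static size_t
sketch_char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen)
{
    int r;

    if (tolen == 0)
        return (size_t) -1;     /* sketch-only guard: no room for terminator */

    if (fromlen == 0)
        r = 0;                  /* do not call the converter at all */
    else
    {
        /* pass tolen - 1 so one slot always remains for the terminator */
        r = fake_convert(from, (int) fromlen, to, (int) tolen - 1);
        if (r <= 0)
            return (size_t) -1;
    }

    assert((size_t) r < tolen);
    to[r] = 0;                  /* the converter did not terminate for us */
    return (size_t) r;
}
```

With an 8-element buffer, sketch_char2wchar(buf, 8, "abc", 3) stores three wide characters plus a terminator and returns 3, and empty input returns 0 instead of failing. The pre-patch code returned r - 1, which under the behavior assumed above comes out one short and leaves the buffer unterminated.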
tom lane wrote:
>>> Can any Windows hackers check into whether the WIN32 coding in
>>> wchar2char() and char2wchar() in ts_locale.c is sane?
>
>> has anyone had the chance to look into that problem? i'd be more than
>> willing to help testing an updated build if needed.
>
> After re-reading Microsoft's man pages I think I see the problem ---
> attached patch is applied.
>

thank you for taking a shot at the problem. unfortunately, i still haven't
gotten around to getting an msvc build environment up & running, so i
cannot compile the patch myself. if any of the win32-hackers (magnus?)
can provide me with a binary, i can test it. else i'll wait for the next
official build.

thanks,
thomas
Tom Lane wrote:
> "Thomas H." <me@alternize.com> writes:
>> Tom Lane wrote:
>>> Can any Windows hackers check into whether the WIN32 coding in
>>> wchar2char() and char2wchar() in ts_locale.c is sane?
>
>> has anyone had the chance to look into that problem? i'd be more than
>> willing to help testing an updated build if needed.
>
> After re-reading Microsoft's man pages I think I see the problem ---
> attached patch is applied.
>
> regards, tom lane

tsearch2 works fine now in the official win32 b4 build

thanks,
thomas