> pg_mb2wchar_with_len() converts server encoded strings to pg_wchar
> strings. But pg_wchar is typedef'd as unsigned int which is not the
> same as wchar_t at least on Windows (unsigned short).
Oops, the problem is here: TParserInit allocates only half the memory it needs.
This happens when sizeof(wchar_t) < sizeof(pg_wchar) and the locale is C on a
non-Windows box. On Windows, the encoding must also be non-UTF-8. All p_is*
functions are broken in this case because they work with the wrong data.
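To make the size mismatch concrete, here is a minimal sketch, not the actual TParserInit code. The typedefs and the demo_* helpers are hypothetical stand-ins: demo_pg_wchar mirrors pg_wchar (unsigned int) and demo_wchar16 mirrors a 2-byte wchar_t as on Windows. It assumes the buffer is sized for wchar_t elements but then filled with pg_wchar elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, for illustration only. */
typedef unsigned int   demo_pg_wchar;  /* mirrors pg_wchar */
typedef unsigned short demo_wchar16;   /* mirrors a 2-byte wchar_t */

/* Bytes needed to hold nchars characters plus a terminator,
 * when each element is elem_size bytes wide. */
static size_t
demo_buf_bytes(size_t nchars, size_t elem_size)
{
    assert(elem_size > 0);
    return (nchars + 1) * elem_size;
}

/* Bytes missing when the buffer is sized for demo_wchar16 elements
 * but written with demo_pg_wchar elements. */
static size_t
demo_shortfall(size_t nchars)
{
    size_t need = demo_buf_bytes(nchars, sizeof(demo_pg_wchar));
    size_t have = demo_buf_bytes(nchars, sizeof(demo_wchar16));

    return (need > have) ? need - have : 0;
}
```

On a platform where unsigned int is 4 bytes and unsigned short is 2, demo_shortfall(n) equals the allocated size, i.e. the buffer is half of what the converted string needs.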
> I modified it corresponding to the change in char2wchar() so that
> wchar2char(char2wchar(x)) becomes x. Though I'm not sure if it is
mbstowcs/wcstombs don't work with the C locale on other OSes either, so that's
not needed.
> If there's an effective function like pg_wchar2mb_with_len() which
> converts wchar_t strings to server encoded strings, we had better
> simply call it for char2wchar().
I don't see a way to produce a correct result from char2wchar with the C locale
and sizeof(wchar_t) = 2.
In summary, I suggest removing C-locale support from the char2wchar function;
tsearch's parser should call pg_mb2wchar_with_len() directly in the case of the
C locale and a multibyte encoding. Everywhere else, char2wchar is called only
for non-C locales.
Please test the attached patch.
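The proposed split could be sketched roughly as below. This is an illustration of the decision only, not the attached patch: demo_path and demo_choose_path are hypothetical names, and the single-byte branch is my assumption about how a non-multibyte encoding would be handled (it is not discussed above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the conversion path the parser would take. */
typedef enum
{
    DEMO_USE_SINGLE_BYTE,   /* assumption: no wide conversion needed */
    DEMO_USE_PG_MB2WCHAR,   /* call pg_mb2wchar_with_len() directly */
    DEMO_USE_CHAR2WCHAR     /* call char2wchar(), non-C locale only */
} demo_path;

/*
 * Pick a conversion path from the locale and the maximum number of
 * bytes per character in the server encoding.
 */
static demo_path
demo_choose_path(bool locale_is_c, int max_encoding_len)
{
    assert(max_encoding_len >= 1);

    if (max_encoding_len == 1)
        return DEMO_USE_SINGLE_BYTE;
    if (locale_is_c)
        return DEMO_USE_PG_MB2WCHAR;
    return DEMO_USE_CHAR2WCHAR;
}
```

With this shape, char2wchar never sees the C locale, so it no longer needs a C-locale branch of its own.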
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/