Re: Windows and locales and UTF-8 (oh my)

From Magnus Hagander
Subject Re: Windows and locales and UTF-8 (oh my)
Msg-id 20071015114010.GD5806@svr2.hagander.net
In reply to Re: Windows and locales and UTF-8 (oh my)  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Windows and locales and UTF-8 (oh my)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Oct 15, 2007 at 01:26:00PM +0200, Magnus Hagander wrote:
> On Mon, Oct 15, 2007 at 11:09:54AM +0200, Magnus Hagander wrote:
> > On Sat, Oct 06, 2007 at 01:53:31PM -0400, Tom Lane wrote:
> > > I am thinking that Dave's discovery explains some previously unsolved
> > > bug reports, such as
> > > http://archives.postgresql.org/pgsql-bugs/2007-05/msg00260.php
> > > If Windows returns LC_CTYPE=C in a situation like this, then
> > > the various single-byte-charset optimization paths that are enabled by
> > > lc_ctype_is_c() would be mistakenly used, leading to misbehavior in
> > > upper()/lower() and other places.  ISTM we had better hack
> > > lc_ctype_is_c() so that on Windows (only), if the database encoding
> > > is UTF-8 then it returns FALSE regardless of what setlocale says.
> > 
> > Yes, I think we need a change to that routine.
> > 
> > But. What about the case when we actually *have* locale=C and
> > encoding=UTF8. We need to care for that one somehow. Perhaps we should look
> > at LC_COLLATE instead (again, on Windows only. Possibly even only in the
> > windows+locale_returns_c+encoding=utf8 case, to distinguish these two)?
> 
> Hmm. Looking more at that, may there be another problem? Looking at
> WriteControlFile(), it writes out what setlocale(LC_CTYPE) returns, which
> will then be "C" - even if the database isn't in C.
> 
> But I don't really know when that code is called, or if I'm just looking at
> things wrong. Just starting up and shutting down the database leaves it at
> Swedish_Sweden.1252, not C.
> (1252 is still the wrong encoding specifier, but it'll work anyway since we
> convert to UTF16)

Gah, got that backwards. Of course it does, because it only returns "C" if
we set to Swedish_Sweden.65001, and we don't *do* that with the patch I
sent in earlier. We set it to Swedish_Sweden, which is a perfectly valid
LC_CTYPE.
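
To check what setlocale() accepts and what it reports back on a given box, something like the following can be used. This is only a minimal sketch: probe_ctype() is a made-up helper name, not anything in the tree, and the sketch assumes nothing about what a particular platform will answer for a particular name.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): try to set LC_CTYPE to "name".
 * Returns 1 and copies the string setlocale() reports into buf if the
 * runtime accepts the name, 0 if setlocale() rejects it (returns NULL).
 * The previous LC_CTYPE setting is restored before returning.
 */
static int
probe_ctype(const char *name, char *buf, size_t buflen)
{
    char        saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    const char *res;
    int         ok = 0;

    /* Copy the current name: a later setlocale() call may overwrite it. */
    snprintf(saved, sizeof(saved), "%s", cur ? cur : "C");

    res = setlocale(LC_CTYPE, name);
    if (res != NULL)
    {
        /* Accepted: record what the runtime says the locale now is. */
        snprintf(buf, buflen, "%s", res);
        ok = 1;
    }

    /* Put things back the way we found them. */
    setlocale(LC_CTYPE, saved);
    return ok;
}
```

Calling it once with Swedish_Sweden and once with Swedish_Sweden.65001 on the Windows machine in question shows whether each name is accepted and, if so, which LC_CTYPE string comes back.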

And given that, do we even need to special-case lc_ctype_is_c() at all, if
we never pass in a .65001 locale (which we don't, because it fails)?
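
For reference, the special case being discussed would look roughly like this. It is a standalone sketch, not the actual lc_ctype_is_c() code: the function names are invented, and the platform and encoding are passed in as plain flags (an assumption made so the logic can be read and tested in isolation, where the real routine would presumably use an #ifdef WIN32 and the database encoding).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for "does this LC_CTYPE name mean the C locale?" */
static bool
ctype_name_is_c(const char *ctype)
{
    return strcmp(ctype, "C") == 0 || strcmp(ctype, "POSIX") == 0;
}

/*
 * Sketch of the proposed special case: on Windows with a UTF-8 database,
 * never report the C locale, regardless of what setlocale() said.
 * Both flags are illustrative parameters, not the real interface.
 */
static bool
lc_ctype_is_c_sketch(const char *ctype, bool encoding_is_utf8, bool on_windows)
{
    if (on_windows && encoding_is_utf8)
        return false;           /* distrust a reported "C" in this combination */

    return ctype_name_is_c(ctype);
}
```

Note that with this rule a database that really is locale=C with encoding=UTF8 on Windows also gets false, which is the case raised in the quoted text above.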

//Magnus

