Re: garbage in psql -l
From | Roger Leigh |
---|---|
Subject | Re: garbage in psql -l |
Date | |
Msg-id | 20091125001431.GD14791@codelibre.net |
In reply to | Re: garbage in psql -l (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: garbage in psql -l |
List | pgsql-hackers |
On Tue, Nov 24, 2009 at 05:43:00PM -0500, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
> > On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:
> >> I wonder whether the most prudent solution wouldn't be to prevent
> >> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
>
> > This problem is caused when there's a mismatch between the
> > client encoding and the user's locale. We can detect this at
> > runtime and fall back to ASCII if we know they are incompatible.
>
> Well, no, that is *one* of the possible failure modes. I've hit others
> already in the short time that the patch has been installed. The one
> that's bit me most is that the locale environment seen by psql doesn't
> necessarily match what my xterm at the other end of an ssh connection
> is prepared to do --- which is something that psql simply doesn't have
> a way to detect. Again, this is something that's never mattered before
> unless one was really pushing non-ASCII data around, and even then it
> was often possible to be sloppy.

Sure, but this type of misconfiguration is entirely outside the
purview of psql. Everything else on the system, from man(1) to gcc,
emacs and vi, will be sending UTF-8 codes to your terminal for any
non-ASCII character it displays. While psql using UTF-8 for its
tables certainly exposes the problem, in reality the setup was already
broken, and it's not psql's "fault" for using functionality the system
said was available. It would equally break if you stored non-ASCII
characters in your UTF-8-encoded database and then ran a SELECT query,
since UTF-8 codes would again be sent to the terminal.

For the specific case here, where the locale is KOI8-R, we can
determine at runtime that this isn't a UTF-8 locale and stay with
ASCII. I'll be happy to send in a patch to correct this specific case.
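A runtime check of this kind might look roughly like the sketch below. It is only an illustration, not the patch itself and not psql's actual code: the helper names codeset_is_utf8() and default_linestyle() are hypothetical, and the loose matching of "UTF-8" versus "utf8" is an assumption about how different platforms spell the codeset name.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: true if the codeset name denotes UTF-8.
 * Platforms spell the name differently ("UTF-8", "utf8"), so the
 * comparison is case-insensitive and accepts both forms.
 */
static bool
codeset_is_utf8(const char *codeset)
{
    if (codeset == NULL)
        return false;
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}

/*
 * Hypothetical helper: choose a default border style from the locale.
 * Falls back to "ascii" whenever the locale is not known to be UTF-8,
 * which covers a KOI8-R locale.
 */
static const char *
default_linestyle(void)
{
    /* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
    if (setlocale(LC_CTYPE, "") == NULL)
        return "ascii";

    return codeset_is_utf8(nl_langinfo(CODESET)) ? "unicode" : "ascii";
}
```

This only tells us what the locale claims; it cannot see what the terminal at the far end of an ssh connection will actually render.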
At least on GNU/Linux, checking nl_langinfo(CODESET) is considered
definitive for testing which character set is available, and it's
the standard SUS/POSIX interface for querying the locale.

> I'd be more excited about finding a way to use linestyle=unicode by
> default if it had anything beyond cosmetic benefits. But it doesn't,
> and it's hard to justify ratcheting up the requirements for users to get
> their configurations exactly straight when that's all they'll get for it.

The only thing missing is the nl_langinfo check; once that is added we
will be going out of our way to make sure that the system is capable
of handling UTF-8. This is, IMHO, the limit of how far /any/ tool
should go to handle things. Worrying about misconfigured terminals,
something which is entirely the user's responsibility, is I think a
step too far: going down this road means you'll be artificially
limited to ASCII, and the whole point of using nl_langinfo is to allow
sensible autoconfiguration, which almost all programs do nowadays.

I don't think it makes sense to "penalise" the majority of users with
correctly-configured systems because a small minority have a
misconfigured terminal input encoding. It is 2009, and all
contemporary systems support Unicode, and for the majority it is the
default. Every one of the GNU utilities, plus most other free
software, localises itself using gettext, which in a UTF-8 locale,
even an English one, will transparently recode its output into the
locale codeset. This hasn't resulted in major problems for people
using these tools; it's been this way for years now.


Regards,
Roger

-- 
 .''`.  Roger Leigh
: :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
`. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
  `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.