Re: Character Encoding Question

From: Don Parris
Subject: Re: Character Encoding Question
Date:
Msg-id: CAJ-7yom8TOnO3=87BJRPBCOiwBUY=0iiwkfBQZVJc36Se_Er2g@mail.gmail.com
In reply to: Re: Character Encoding Question  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
List: psycopg
On Fri, Mar 29, 2013 at 5:35 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
On Fri, Mar 29, 2013 at 2:01 AM, Don Parris <parrisdc@gmail.com> wrote:

> Aha!  As it turns out, I started looking into the character set support in
> the postgresql documentation, and discovered the psql -l command.  It showed
> this test database is actually *not* encoded in UTF-8 at all, but rather in
> ASCII.  I am not sure how I managed to do that, but I did.  I was sure I had
> used the same DB creation script and just changed the DB name, but clearly,
> I missed something.  I am not sure if it is necessary to drop and re-create
> the database to correct this, but that is what I have done.
>
> When I tried using \encoding or SET client_encoding, I got no errors, but I
> still saw this test DB set as ASCII when running the psql -l command.
> Anyway, I'll have to pursue this further later.  Many thanks for the help!
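
A note on the observation quoted above: \encoding and SET client_encoding change only the client encoding of the current session, while the database's own encoding is fixed when the database is created. That would explain why psql -l kept reporting the same value. A minimal sketch of checking both from Python follows; the helper name show_encodings and the DSN in the comments are hypothetical, and conn is assumed to be an open DB-API connection such as one returned by psycopg2.connect.

```python
def show_encodings(conn):
    """Return the server and client encodings for an open DB-API connection.

    `conn` is assumed to behave like a psycopg2 connection; this helper is
    illustrative and not part of psycopg.
    """
    cur = conn.cursor()
    try:
        cur.execute("SHOW server_encoding")  # fixed when the database was created
        server = cur.fetchone()[0]
        cur.execute("SHOW client_encoding")  # per session, changed by SET client_encoding
        client = cur.fetchone()[0]
    finally:
        cur.close()
    return {"server": server, "client": client}


# Assumed usage (needs a reachable database, so it is left commented out):
# import psycopg2
# conn = psycopg2.connect("dbname=test")   # hypothetical DSN
# print(show_encodings(conn))
# conn.set_client_encoding("UTF8")         # affects this session only
# print(show_encodings(conn))              # the server value should be unchanged
```
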

In this case you should convert your database to UTF-8 (because it
contains UTF-8 data) ASAP. SQL_ASCII doesn't actually mean ASCII: it
means "store whatever octets you throw at it, as they are", so it is
more akin to binary data (but without the possibility of storing 0x00).
From your examples, and with some luck, your database may contain only
UTF-8 data, but if you connect with different clients or encodings and
feed it some Latin-1 etc., the database will happily accept everything,
no questions asked; it will just be a nightmare to read the data back
or to make it uniform later.
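
To illustrate the point about mixed encodings, here is a small sketch in plain Python (no database involved). It assumes two clients store the same text, one sending UTF-8 bytes and one sending Latin-1 bytes, and that the database keeps both byte strings unchanged, as SQL_ASCII would. The sample text and the helper try_decode are made up for the illustration.

```python
text = "Perú, niño"  # sample text, chosen only for illustration

utf8_bytes = text.encode("utf-8")      # what a UTF-8 client would send
latin1_bytes = text.encode("latin-1")  # what a Latin-1 client would send

# The same text becomes two different byte strings.
print(utf8_bytes)    # b'Per\xc3\xba, ni\xc3\xb1o'
print(latin1_bytes)  # b'Per\xfa, ni\xf1o'


def try_decode(raw, encoding):
    """Decode raw bytes, returning None when they are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Reading everything back as UTF-8 fails outright on the Latin-1 row...
print(try_decode(latin1_bytes, "utf-8"))  # None
# ...and reading everything back as Latin-1 silently garbles the UTF-8 row.
print(try_decode(utf8_bytes, "latin-1"))  # PerÃº, niÃ±o
```

Neither single encoding reads both rows correctly, which is why the data is hard to make uniform once the two have been mixed.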

If you aren't familiar with encodings and the problems related to them,
the Spolsky article is a nice introduction
<http://www.joelonsoftware.com/articles/Unicode.html>.



Thanks Daniele,

I think I sent a follow-up post to this one saying that I have now converted this DB to UTF-8.  I appreciate your help in tracking down what the problem was, as well as the link to this article.  Good reading for sure.  If I understand the article correctly, I can handle pretty much any language (Korean, Bulgarian, Arabic, etc.) by using the UTF-8 encoding.  Is that correct?
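
As a quick check of that reading of the article, the sketch below round-trips a few sample strings through UTF-8 in plain Python. UTF-8 can encode every Unicode code point, so each string should come back unchanged. The sample phrases and the helper utf8_round_trip are chosen only for illustration.

```python
samples = {
    "Korean": "안녕하세요",
    "Bulgarian": "Здравей",
    "Arabic": "مرحبا",
    "Spanish": "¿Cómo estás?",
}


def utf8_round_trip(text):
    """Encode text to UTF-8 bytes and decode it back again."""
    raw = text.encode("utf-8")
    return raw, raw.decode("utf-8")


for language, phrase in samples.items():
    raw, decoded = utf8_round_trip(phrase)
    # Characters outside ASCII take more than one byte each in UTF-8.
    print(language, len(phrase), "characters,", len(raw), "bytes,",
          "round trip ok" if decoded == phrase else "MISMATCH")
```
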

Incidentally, my code actually broke on records that were only in English.  Or at least that is how it appears.  The particular table I was searching on contains no non-English letters.  It probably will contain non-English characters in the future, but does not now.

I am very interested in being able to support multiple languages, as my wife and I speak Castellano (Peruvian-flavored) and I speak a little German and a few words in other languages.  That's a topic for another day and probably for another list, however.  :-)

Again, many thanks to all of you for the help!



--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE
