Character encoding problems
From | Bruce Clay |
---|---|
Subject | Character encoding problems |
Date | |
Msg-id | 35b888aa-eac8-4b23-9f17-a04feb58854b@Mariah |
Responses | Re: Character encoding problems |
List | pgsql-general |
Sorry for the duplicate postings. I have only received one reply so far, and that was a suggestion to post to this forum.

I am trying to build a database to support natural language processing from a variety of data files posted on the internet. Many of them are identified as using UTF-8 encoding. Some of these are dictionary files from WinEdt; some are from an open source multilingual health care package. When I try to build a table from several of the different languages, I get the following error:

ERROR: invalid byte sequence for encoding "UTF8": 0x82

I checked the encoding, and it is indeed set up for UTF-8. I tried to create databases using a variety of other encoding types, such as WIN1252, and I got the same error message from all of them except SQL_ASCII. When I created the database using SQL_ASCII, I received a warning that the database could only store 7-bit data. When I loaded the data into this database I did not get any errors, and when I look at the data it seems to be the same as in the original text file.

Is there a "proper" encoding type that I should use to load the word lists so that they are interoperable with the WordNet dataset, which happily uses the UTF8 encoding?

Bruce
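For context on the error: in UTF-8, bytes 0x80 to 0xBF are continuation bytes and can never start a character, so a lone 0x82 means the file is not really UTF-8 at that point. The following is a minimal sketch, not a PostgreSQL tool, that finds the offending lines in a word list and re-encodes the data from an assumed legacy code page before loading. The function names are hypothetical, and the choice of cp437 (where 0x82 is "é") is only an example. The post does not say which encoding the files actually use. Guessing wrong fails silently: in cp1252 the same byte decodes to a low quotation mark.

```python
from typing import List, Tuple


def find_invalid_utf8_lines(data: bytes) -> List[Tuple[int, int, str]]:
    """Return (line_number, byte_offset_in_line, bad_byte_hex) for each
    line of `data` that does not decode as UTF-8."""
    problems = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that broke decoding
            problems.append((lineno, exc.start, f"0x{raw[exc.start]:02x}"))
    return problems


def transcode_to_utf8(data: bytes, assumed_encoding: str) -> bytes:
    """Decode `data` with an ASSUMED source encoding and re-encode as UTF-8.
    The assumption must be checked against the real file; this cannot
    detect a wrong guess on its own."""
    return data.decode(assumed_encoding).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: "café" saved in the cp437 code page,
    # where "é" is stored as the single byte 0x82.
    sample = "café\n".encode("cp437")
    print(find_invalid_utf8_lines(sample))  # lines that would be rejected
    fixed = transcode_to_utf8(sample, "cp437")
    print(find_invalid_utf8_lines(fixed))  # empty list once converted
```

Run against a real word list (read with `open(path, "rb").read()`), the first function shows where the server would reject the data. The converted bytes can then be loaded into the UTF8 database. Alternatively, PostgreSQL can do the conversion itself if `client_encoding` is set to the file's true encoding before loading.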