Re: LATIN9 - hex in varchar after convert

Поиск
Список
Период
Сортировка
От Steve Tucknott (TuSol)
Тема Re: LATIN9 - hex in varchar after convert
Дата
Msg-id 2c0310b2f49cf950c266334a1c3cfa2ed747395f.camel@tusol.co.uk
обсуждение исходный текст
Ответ на Re: LATIN9 - hex in varchar after convert  (Laurenz Albe <laurenz.albe@cybertec.at>)
Список pgsql-novice
aaa
On Sun, 2020-04-26 at 04:24 +0200, Laurenz Albe wrote:
On Sat, 2020-04-25 at 10:41 +0100, Steve Tucknott (TuSol) wrote:
I have a table with a varchar(5000) that contains general text. The table is typically
maintained via a GUI, but on this occasion I received a spreadsheet with data and
loaded it - via copy - from a csv extracted from that. The data looked fine in psql,
but when looking at the data in the GUI, characters such as single quote marks (')
appeared as a series of special characters. I assumed that the spreadsheet then had
some different encoding (UTF8?) and that I then needed to 'translate' the characters.

Very likely, the characters were not really single quotes, but "curly quotes"
(UNICODE 201C and 201E) characters.

One of the following scenarios must have taken place:

1. The file was encoded in UTF-8, but when you copied the data in, the encoding
   you specified (or had by default) was a single-gyte encoding like LATIN9.

   The curly quotes are more than one byte in UTF-8, but each byte was interpreted as
   a LATIN9 character.

   The solution would be to specify ENCODING 'UTF8' with COPY.

2. The characters are actually fine in the database, and you loaded them correctly,
   and your database client encoding is UTF8, but your terminal is in LATIN9.

   The characters were displayed correctly, but your terminal interpreted each
   byte as a character.

To determine which was the case, look what bytes are in the database:

SELECT badcol, badcol::bytea FROM tab WHERE id = 12345;

Yours,
Laurenz Albe

Laurenz,
Thank you for the reply. After correcting my 'convert' mistake (thanks again to Tom), I decided to drop the rows loaded from the spreadsheet and correct the source data and reload - only as Libreoffice has a simple 'find and replace' - so I could copy the 'naff' quote character into the find field and then replace with a standard quote. Looks fine so far!

Thank you all for your help,
Steve

В списке pgsql-novice по дате отправления:

Предыдущее
От: Laurenz Albe
Дата:
Сообщение: Re: LATIN9 - hex in varchar after convert
Следующее
От: JORGE MALDONADO
Дата:
Сообщение: Advice on a table structure