Re: How to find freak UTF-8 character?

From: Leif Biberg Kristensen
Subject: Re: How to find freak UTF-8 character?
Msg-id: 201110012316.06162.leif@solumslekt.org
In reply to: Re: How to find freak UTF-8 character?  (Andrew Sullivan <ajs@crankycanuck.ca>)
List: pgsql-general

On Saturday 1. October 2011 21.29.45 Andrew Sullivan wrote:
> I see you found it, but note that it's _not_ a spurious UTF-8
> character: it's a right-to-left mark, and is a perfectly ok UTF-8 code
> point.

Andrew,
thank you for your reply. Yes, I know that this is a perfectly legal UTF-8
character. It crept into my database as a result of a copy-and-paste job from
a web site. The point is that it doesn't have a counterpart in ISO-8859-1 to
which I regularly have to export the data.
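
A minimal sketch of how such characters could be located before an export, written in Python rather than in the SQL or PHP used here. The function name find_unencodable and the sample string are made up for illustration; the sample assumes the stray character is U+200E.

```python
import unicodedata

def find_unencodable(text, encoding="iso-8859-1"):
    """Return (index, char, Unicode name) for each character that cannot be encoded."""
    hits = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # The character has no counterpart in the target encoding.
            hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical sample: a name followed by an ID wrapped in invisible marks.
sample = "Fjågesund \u200e(I2914)\u200e"
for pos, ch, name in find_unencodable(sample):
    print(pos, f"U+{ord(ch):04X}", name)
```

Running something like this over each text column before the export would point at the exact position of anything that ISO-8859-1 cannot represent, while ordinary Norwegian letters such as å pass through untouched.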

The offending character came from this URL:
<http://www.soge.kviteseid.no/individual.php?pid=I2914&ged=Kviteseid.GED&tab=0>

and the text that I copied and pasted from the page looks like this in the
source code:

Aslaug Steinarsdotter Fjågesund  &lrm;(I2914)&lrm;

I'm going to write to the webmaster of the site and ask why that character,
represented in the HTML as the &lrm; entity, has to appear on a Norwegian web
site which should never have to display text in anything but left-to-right
order.

> If you need a subset of the UTF-8 character set, you want to make sure
> you have some sort of constraint in your application or your database
> that prevents insertion of anything at all in UTF-8.  This is a need
> people often forget when working in an internationalized setting,
> because there's a lot of crap that comes from the client side in a
> UTF-8 setting that might not come in other settings (like LATIN1).

I don't want any constraint of that sort. I'm perfectly happy with UTF-8. And
now that I've found out how to spot problematic characters that will crash my
export script, it's really not an issue anymore. The character printed
neither in psql nor in my PHP frontend, so I just removed the problematic text
and re-entered it by hand. Problem solved.

But thank you for the idea, I think that I will strip out at least any &lrm;
entities from text entered into the database.
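
Such stripping might look roughly like the following Python sketch. It assumes the character in question is U+200E (the one the &lrm; entity stands for) and removes both the literal character and its common HTML spellings; strip_lrm and LRM_PATTERN are illustrative names, not anything from my actual code.

```python
import re

# Assumption: the unwanted mark is U+200E. The pattern covers the literal
# character plus the named, decimal and hexadecimal HTML references to it.
LRM_PATTERN = re.compile(r"\u200e|&lrm;|&#8206;|&#[xX]200[eE];")

def strip_lrm(text):
    """Remove left-to-right marks and their HTML entity forms from text."""
    return LRM_PATTERN.sub("", text)

print(strip_lrm("Fjågesund &lrm;(I2914)&lrm;"))
```

Applying this to pasted text before it is stored would keep the mark out of the database without restricting any other UTF-8 input.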

By the way, is there a setting in psql that will output unprintable characters
as question marks or something?
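
Independent of any psql setting, a client-side sketch of the idea in Python: it swaps every control or format character for a visible <U+XXXX> tag. The function name show_invisible is made up for illustration, and note that newlines and tabs count as control characters here and would be tagged too.

```python
import unicodedata

def show_invisible(text):
    """Replace control (Cc) and format (Cf) characters with a visible <U+XXXX> tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Cc", "Cf"):
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

print(show_invisible("Fjågesund \u200e(I2914)\u200e"))
```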

regards, Leif.
