Matthew Peter wrote:
> Is it possible to automatically detect the language encoding of incoming
> data? For instance if Japanese is used, is there a way to know it is
> Japanese from a bit in the charset, a dictionary-based evaluation or
> otherwise?
>
Have a look at http://www.mozilla.org/projects/intl/chardet.html and
http://chardet.feedparser.org/ for some implementations of this idea.
These detectors are often inaccurate, though, and sometimes fail
completely; see the warning at the bottom of
http://chardet.feedparser.org/docs/supported-encodings.html
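
To give a feel for why guessing is hard, here is a minimal sketch using
only the Python standard library. It is not what the detectors above do
(they use statistical models); it simply tries a list of candidate codecs
in order and reports the first one that decodes without error. The
function name guess_encoding and the default candidate list are my own
assumptions, picked for illustration.

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "shift_jis", "euc_jp")):
    """Return the first candidate codec that decodes `data` cleanly, else None.

    Trial decoding only: a clean decode proves the bytes are *legal* in
    that codec, not that the codec is the one the sender actually used.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this codec, try the next one
        return name
    return None  # none of the candidates fit


if __name__ == "__main__":
    sample = "日本語"
    print(guess_encoding(sample.encode("utf-8")))
    print(guess_encoding(sample.encode("shift_jis")))
    print(guess_encoding(b"plain text"))
```

The result depends heavily on the order of the candidates, because many
byte sequences are legal in more than one encoding. That ambiguity is one
reason even the statistical detectors get it wrong on short input.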
Regards,
LL