On 26 Dec 2011, at 8:22, Adarsh Sharma wrote:
> Dear all,
>
> I am facing a unique issue when I try to load an sql into a postgresql database :-
Actually, your issue isn't unique at all. You'll find it recurs on this list regularly, although perhaps less
frequently these days.
> I faced an issue some days ago & I solved the issue by the below command :
> cat backup.sql | recode iso-8859-1..u8 > backup.sql
That command assumes that every string in the SQL file is encoded as ISO-8859-1 (unless it already is Unicode).
> But this time the byte sequence changes to Japanese, & I fail to solve the issue. Please let me know how to solve
> the issue as typing the error in Google shows only one link:
> ( http://blog.e-shell.org/134 )
The above recode command works for the guys in the blog post you linked, as they were converting a database with
Spanish data to UTF-8. They knew what encoding they were coming from.
In your case, you have a mixed bag of encodings, going all the way from Latin encodings to Japanese.
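To get a feel for how mixed the bag is, something like the following Python sketch could list the lines of a dump that are not valid UTF-8. The function name and the sample data are made up for illustration, and the check only tells you that a line is not UTF-8, not which encoding it actually is.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    suspects = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            suspects.append((number, line))
    return suspects


# Hypothetical sample: one UTF-8 line, one ISO-8859-1 line, one EUC-JP line.
sample = b"\n".join([
    "INSERT INTO t VALUES ('año');".encode("utf-8"),
    "INSERT INTO t VALUES ('año');".encode("iso-8859-1"),
    "INSERT INTO t VALUES ('日本語');".encode("euc_jp"),
])

for number, line in find_non_utf8_lines(sample):
    print(number, line)
```

For a real dump you would read the file in binary mode, e.g. open("backup.sql", "rb").read(), and pass the result to the function. Keep in mind that some non-UTF-8 byte sequences happen to be valid UTF-8 as well, so a clean result is not proof that every line is fine.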
I'm not sure what recode would do to data that's in a different encoding than the specified source encoding - I expect
that it will just assume it's in the specified source encoding (it cannot know that this isn't the case for a particular
string) and attempt to convert it to UTF-8 _using that encoding_.
Chances are you just converted valid data in a different encoding (than the source encoding you specified) into garbage
in UTF-8... I seem to recall that if recode runs into problems recoding a string to UTF-8 it will leave it untouched,
but that will NOT happen in all cases. Sometimes it will succeed, even though the result has no meaning to a human.
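The "succeeds but means nothing" case can be illustrated with a small Python sketch. This uses Python's codecs rather than recode itself, so recode's exact behaviour may differ, and the choice of EUC-JP for the Japanese data is purely an assumption for the example.

```python
# Assumed example data: Japanese text stored as EUC-JP bytes.
original = "日本語"
raw = original.encode("euc_jp")

# Treat the bytes as ISO-8859-1, then write them out as UTF-8.
# ISO-8859-1 maps every byte value to a character, so this never raises.
converted = raw.decode("iso-8859-1").encode("utf-8")

# The result is perfectly valid UTF-8, but it is no longer the original text.
print(converted.decode("utf-8") == original)  # False

# Undoing the damage requires knowing both the wrongly assumed encoding
# and the real one.
recovered = converted.decode("utf-8").encode("iso-8859-1").decode("euc_jp")
print(recovered == original)  # True
```

No error is raised anywhere in the conversion, which is why this kind of corruption tends to go unnoticed until someone actually reads the data.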
That's a nasty problem you ran into; I hope the archives provide the wisdom you need.
Alban Hertroys
--
Screwing up is an excellent way to attach something to the ceiling.