Обсуждение: Easy way to convert a database from WIN1252 to UTF8?

Поиск
Список
Период
Сортировка

Easy way to convert a database from WIN1252 to UTF8?

От
Mike Christensen
Дата:
I'd like to convert a small database to UTF8 before it becomes too
large.  I'm running on 8.3.x on Windows.  It doesn't seem that pgAdmin
has any native way of doing this, what's the easiest way to go about
doing this?  Thanks!

Mike

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Justin Graf
Дата:

On 7/1/2010 11:08 AM, Mike Christensen wrote:
> I'd like to convert a small database to UTF8 before it becomes too
> large.  I'm running on 8.3.x on Windows.  It doesn't seem that pgAdmin
> has any native way of doing this, what's the easiest way to go about
> doing this?  Thanks!
>
> Mike
>

Dump/Backup the database , then create a new database using utf-8
then restore the database.




All legitimate Magwerks Corporation quotations are sent in a .PDF file attachment with a unique ID number generated by
ourproprietary quotation system. Quotations received via any other form of communication will not be honored. 

CONFIDENTIALITY NOTICE: This e-mail, including attachments, may contain legally privileged, confidential or other
informationproprietary to Magwerks Corporation and is intended solely for the use of the individual to whom it
addresses.If the reader of this e-mail is not the intended recipient or authorized agent, the reader is hereby notified
thatany unauthorized viewing, dissemination, distribution or copying of this e-mail is strictly prohibited. If you have
receivedthis e-mail in error, please notify the sender by replying to this message and destroy all occurrences of this
e-mailimmediately. 
Thank you.

Вложения

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Mike Christensen
Дата:
On Thu, Jul 1, 2010 at 9:31 AM, Mike Christensen <mike@kitchenpc.com> wrote:
> On Thu, Jul 1, 2010 at 10:22 AM, Justin Graf <justin@magwerks.com> wrote:
>>
>>
>> On 7/1/2010 11:08 AM, Mike Christensen wrote:
>>> I'd like to convert a small database to UTF8 before it becomes too
>>> large.  I'm running on 8.3.x on Windows.  It doesn't seem that pgAdmin
>>> has any native way of doing this, what's the easiest way to go about
>>> doing this?  Thanks!
>>>
>>> Mike
>>>
>>
>> Dump/Backup the database , then create a new database using utf-8
>> then restore the database.
>>
>
> This is what I'm trying to do, but it's a total nightmare..
>
> First, I did a:
>
> pg_dump -U root MyDB > c:\DB.dbs.out
>
> which appears to have worked..  Then, I edited the file in Notepad and
> saved it as UTF8 which also appears to have worked.  Next, I created
> the new DB using UTF8, and ran:
>
> psql -U root MyDB2 < c:\DB.dbs.out
>
> I get pages and pages of errors about foreign key restraint violations
> and other stuff.  Looking at the DB after, almost all tables are
> empty.  When I look at the DB.dbs.out file more carefully, the problem
> is fairly obvious.  It attempts to create all the tables in
> alphabetical order.  First, it inserts data into the "A" table which
> has a FK restraint on the B table which isn't populated yet.
> Obviously, this is going to cause problems.  Perhaps when restoring a
> DB you're supposed to drop all restraints first?  I suppose I could
> hack this into working eventually, but I was hoping there was an
> easier way..  Thanks..
>
> Mike
>

Ok first off, my bad it seems the db_dump scripts are smart enough to
add constraints last..

Here's the actual errors I get while restoring, which seem to result
in several tables being empty.  Does this mean that the data in my
source database has somehow lost integrity, even though I have
constraints in place?

ERROR:  character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
CONTEXT:  COPY notifications, line 170
ERROR:  character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
CONTEXT:  COPY recipes, line 2
ERROR:  character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
CONTEXT:  COPY wallposts, line 13
ERROR:  insert or update on table "reciperatings" violates foreign key constrain
t "fk2f0492774b4d795a"
DETAIL:  Key (recipeid)=(1202a454-ca32-49e0-9f38-0d973d80b2a9) is not present in
 table "recipes".
ERROR:  insert or update on table "recipeingredients" violates foreign key const
raint "fk673433f4b4d795a"
DETAIL:  Key (recipeid)=(139ff23b-bead-48ac-b606-fc12467ff940) is not present in
 table "recipes".
ERROR:  insert or update on table "calendars" violates foreign key constraint "f
kb1f25cf24b4d795a"
DETAIL:  Key (recipeid)=(2f944c26-efd1-406c-bd27-0be760e09f19) is not present in
 table "recipes".
ERROR:  insert or update on table "recipetags" violates foreign key constraint "
fkeabd5f754b4d795a"
DETAIL:  Key (recipeid)=(f9ebec00-6ebd-459d-a093-060bc3a755af) is not present in
 table "recipes".
ERROR:  insert or update on table "recipecomments" violates foreign key constrai
nt "fkec92d3c74b4d795a"
DETAIL:  Key (recipeid)=(1af68f09-0cba-4fae-8a08-9e56a3e72676) is not present in
 table "recipes".
ERROR:  insert or update on table "favorites" violates foreign key constraint "f
kf92d78404b4d795a"
DETAIL:  Key (recipeid)=(86c9e722-3508-4910-8a3e-10a705271cbb) is not present in
 table "recipes".

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Mike Christensen
Дата:
On Thu, Jul 1, 2010 at 10:22 AM, Justin Graf <justin@magwerks.com> wrote:
>
>
> On 7/1/2010 11:08 AM, Mike Christensen wrote:
>> I'd like to convert a small database to UTF8 before it becomes too
>> large.  I'm running on 8.3.x on Windows.  It doesn't seem that pgAdmin
>> has any native way of doing this, what's the easiest way to go about
>> doing this?  Thanks!
>>
>> Mike
>>
>
> Dump/Backup the database , then create a new database using utf-8
> then restore the database.
>

This is what I'm trying to do, but it's a total nightmare..

First, I did a:

pg_dump -U root MyDB > c:\DB.dbs.out

which appears to have worked..  Then, I edited the file in Notepad and
saved it as UTF8 which also appears to have worked.  Next, I created
the new DB using UTF8, and ran:

psql -U root MyDB2 < c:\DB.dbs.out

I get pages and pages of errors about foreign key restraint violations
and other stuff.  Looking at the DB after, almost all tables are
empty.  When I look at the DB.dbs.out file more carefully, the problem
is fairly obvious.  It attempts to create all the tables in
alphabetical order.  First, it inserts data into the "A" table which
has a FK restraint on the B table which isn't populated yet.
Obviously, this is going to cause problems.  Perhaps when restoring a
DB you're supposed to drop all restraints first?  I suppose I could
hack this into working eventually, but I was hoping there was an
easier way..  Thanks..

Mike

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Sam Mason
Дата:
On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
> Then, I edited the file in Notepad and saved it as UTF8 which also
> appears to have worked.

I don't think you want to be doing this. The file should contain a "set
client_encoding" of the correct value which will let PG to do the "right
thing" when you import it back into a UTF8 database.

--
  Sam  http://samason.me.uk/

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Mike Christensen
Дата:
On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote:
> On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
>> Then, I edited the file in Notepad and saved it as UTF8 which also
>> appears to have worked.
>
> I don't think you want to be doing this. The file should contain a "set
> client_encoding" of the correct value which will let PG to do the "right
> thing" when you import it back into a UTF8 database.
>
> --
>  Sam  http://samason.me.uk/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Yup, that fixed it.  I had to change the DB.dbs.out file to use the
new encoding.  Duh!  Everything's working now.

Mike

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Sam Mason
Дата:
On Thu, Jul 01, 2010 at 09:47:03AM -0700, Mike Christensen wrote:
> On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote:
> > On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
> >> Then, I edited the file in Notepad and saved it as UTF8 which also
> >> appears to have worked.
> >
> > I don't think you want to be doing this. The file should contain a "set
> > client_encoding" of the correct value which will let PG to do the "right
> > thing" when you import it back into a UTF8 database.
>
> Yup, that fixed it.  I had to change the DB.dbs.out file to use the
> new encoding.  Duh!  Everything's working now.

AFAIK, you should be able to leave the file exactly as it came out of
pg_dump and PG will do all the conversions automatically.

If you've not checked already, it may be worth checking through to see
what row was on line 170 and see if it's got the right thing in there
now.

--
  Sam  http://samason.me.uk/

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Mike Christensen
Дата:
On Thu, Jul 1, 2010 at 9:54 AM, Sam Mason <sam@samason.me.uk> wrote:
> On Thu, Jul 01, 2010 at 09:47:03AM -0700, Mike Christensen wrote:
>> On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote:
>> > On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
>> >> Then, I edited the file in Notepad and saved it as UTF8 which also
>> >> appears to have worked.
>> >
>> > I don't think you want to be doing this. The file should contain a "set
>> > client_encoding" of the correct value which will let PG to do the "right
>> > thing" when you import it back into a UTF8 database.
>>
>> Yup, that fixed it.  I had to change the DB.dbs.out file to use the
>> new encoding.  Duh!  Everything's working now.
>
> AFAIK, you should be able to leave the file exactly as it came out of
> pg_dump and PG will do all the conversions automatically.
>
> If you've not checked already, it may be worth checking through to see
> what row was on line 170 and see if it's got the right thing in there
> now.
>
> --
>  Sam  http://samason.me.uk/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Yup, the problem is line 170 doesn't actually match up to the
DB.dbs.out file line 170 (which is a blank line).  I believe it means
line 170 from the stdin pipe it was processing for the copy command.
Suffice to say, there was some weird character in my database that PG
can't automatically translate from WIN1252 to UTF8, and apparently it
will drop that /entire/ COPY command (the entire table doesn't get
populated!)..

As to what character was the culprit, I'm not entirely sure how to
figure this out.  I guess I could look for that hex value?  However,
if I set the encoding in the script itself, everything works
perfectly.

I'm not sure this is a PG bug or anything, but having more detail on
this error might be helpful.

Mike

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Sam Mason
Дата:
On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote:
> Yup, the problem is line 170 doesn't actually match up to the
> DB.dbs.out file line 170 (which is a blank line).  I believe it means
> line 170 from the stdin pipe it was processing for the copy command.

Doh, that's annoying.  It would be nice to know that it's done the right
thing rather than "some" thing.

> Suffice to say, there was some weird character in my database that PG
> can't automatically translate from WIN1252 to UTF8, and apparently it
> will drop that /entire/ COPY command (the entire table doesn't get
> populated!)..

Yup, this is deliberate.  You can also run psql with "-1" to put the
whole lot (i.e. every table/view/... creation and data insert) in a
transaction which will cause the whole restore to be rolled back if
something doesn't look right as well.

> As to what character was the culprit, I'm not entirely sure how to
> figure this out.  I guess I could look for that hex value?  However,
> if I set the encoding in the script itself, everything works
> perfectly.

PG is doing the right thing, 9D is undefined in Win1252.  I guess you've
either got other problems or this was just an artifact of converting
from Win1252 to UTF8 external to PG and then not telling it that you'd
done that.

--
  Sam  http://samason.me.uk/

Re: Easy way to convert a database from WIN1252 to UTF8?

От
Mike Christensen
Дата:
On Thu, Jul 1, 2010 at 10:07 AM, Sam Mason <sam@samason.me.uk> wrote:
> On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote:
>> Yup, the problem is line 170 doesn't actually match up to the
>> DB.dbs.out file line 170 (which is a blank line).  I believe it means
>> line 170 from the stdin pipe it was processing for the copy command.
>
> Doh, that's annoying.  It would be nice to know that it's done the right
> thing rather than "some" thing.
>
>> Suffice to say, there was some weird character in my database that PG
>> can't automatically translate from WIN1252 to UTF8, and apparently it
>> will drop that /entire/ COPY command (the entire table doesn't get
>> populated!)..
>
> Yup, this is deliberate.  You can also run psql with "-1" to put the
> whole lot (i.e. every table/view/... creation and data insert) in a
> transaction which will cause the whole restore to be rolled back if
> something doesn't look right as well.
>
>> As to what character was the culprit, I'm not entirely sure how to
>> figure this out.  I guess I could look for that hex value?  However,
>> if I set the encoding in the script itself, everything works
>> perfectly.
>
> PG is doing the right thing, 9D is undefined in Win1252.  I guess you've
> either got other problems or this was just an artifact of converting
> from Win1252 to UTF8 external to PG and then not telling it that you'd
> done that.
>
> --
>  Sam  http://samason.me.uk/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Yeah, looking at the lines in question I don't really see anything
wrong with them.  Everything is going into the database as UTF8 so
maybe some weird characters got stuck in there somehow with the old
default encoding.  This is the main reason why I'm converting to UTF8
now, so data will be consistent across all layers..  Good to get these
bugs out of the way while the data set is relatively small.

If anyone wants me to do any more debugging, I'd be more than happy to
but I'm satisfied with the results.  Thanks!

Mike