Обсуждение: Easy way to convert a database from WIN1252 to UTF8?
I'd like to convert a small database to UTF8 before it becomes too large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin has any native way of doing this, what's the easiest way to go about doing this? Thanks! Mike
On 7/1/2010 11:08 AM, Mike Christensen wrote: > I'd like to convert a small database to UTF8 before it becomes too > large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin > has any native way of doing this, what's the easiest way to go about > doing this? Thanks! > > Mike > Dump/Backup the database , then create a new database using utf-8 then restore the database. All legitimate Magwerks Corporation quotations are sent in a .PDF file attachment with a unique ID number generated by ourproprietary quotation system. Quotations received via any other form of communication will not be honored. CONFIDENTIALITY NOTICE: This e-mail, including attachments, may contain legally privileged, confidential or other informationproprietary to Magwerks Corporation and is intended solely for the use of the individual to whom it addresses.If the reader of this e-mail is not the intended recipient or authorized agent, the reader is hereby notified thatany unauthorized viewing, dissemination, distribution or copying of this e-mail is strictly prohibited. If you have receivedthis e-mail in error, please notify the sender by replying to this message and destroy all occurrences of this e-mailimmediately. Thank you.
Вложения
On Thu, Jul 1, 2010 at 9:31 AM, Mike Christensen <mike@kitchenpc.com> wrote: > On Thu, Jul 1, 2010 at 10:22 AM, Justin Graf <justin@magwerks.com> wrote: >> >> >> On 7/1/2010 11:08 AM, Mike Christensen wrote: >>> I'd like to convert a small database to UTF8 before it becomes too >>> large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin >>> has any native way of doing this, what's the easiest way to go about >>> doing this? Thanks! >>> >>> Mike >>> >> >> Dump/Backup the database , then create a new database using utf-8 >> then restore the database. >> > > This is what I'm trying to do, but it's a total nightmare.. > > First, I did a: > > pg_dump -U root MyDB > c:\DB.dbs.out > > which appears to have worked.. Then, I edited the file in Notepad and > saved it as UTF8 which also appears to have worked. Next, I created > the new DB using UTF8, and ran: > > psql -U root MyDB2 < c:\DB.dbs.out > > I get pages and pages of errors about foreign key restraint violations > and other stuff. Looking at the DB after, almost all tables are > empty. When I look at the DB.dbs.out file more carefully, the problem > is fairly obvious. It attempts to create all the tables in > alphabetical order. First, it inserts data into the "A" table which > has a FK restraint on the B table which isn't populated yet. > Obviously, this is going to cause problems. Perhaps when restoring a > DB you're supposed to drop all restraints first? I suppose I could > hack this into working eventually, but I was hoping there was an > easier way.. Thanks.. > > Mike > Ok first off, my bad it seems the db_dump scripts are smart enough to add constraints last.. Here's the actual errors I get while restoring, which seem to result in several tables being empty. Does this mean that the data in my source database has somehow lost integrity, even though I have constraints in place? ERROR: character 0x9d of encoding "WIN1252" has no equivalent in "UTF8" CONTEXT: COPY notifications, line 170 ERROR: character 0x9d of encoding "WIN1252" has no equivalent in "UTF8" CONTEXT: COPY recipes, line 2 ERROR: character 0x9d of encoding "WIN1252" has no equivalent in "UTF8" CONTEXT: COPY wallposts, line 13 ERROR: insert or update on table "reciperatings" violates foreign key constrain t "fk2f0492774b4d795a" DETAIL: Key (recipeid)=(1202a454-ca32-49e0-9f38-0d973d80b2a9) is not present in table "recipes". ERROR: insert or update on table "recipeingredients" violates foreign key const raint "fk673433f4b4d795a" DETAIL: Key (recipeid)=(139ff23b-bead-48ac-b606-fc12467ff940) is not present in table "recipes". ERROR: insert or update on table "calendars" violates foreign key constraint "f kb1f25cf24b4d795a" DETAIL: Key (recipeid)=(2f944c26-efd1-406c-bd27-0be760e09f19) is not present in table "recipes". ERROR: insert or update on table "recipetags" violates foreign key constraint " fkeabd5f754b4d795a" DETAIL: Key (recipeid)=(f9ebec00-6ebd-459d-a093-060bc3a755af) is not present in table "recipes". ERROR: insert or update on table "recipecomments" violates foreign key constrai nt "fkec92d3c74b4d795a" DETAIL: Key (recipeid)=(1af68f09-0cba-4fae-8a08-9e56a3e72676) is not present in table "recipes". ERROR: insert or update on table "favorites" violates foreign key constraint "f kf92d78404b4d795a" DETAIL: Key (recipeid)=(86c9e722-3508-4910-8a3e-10a705271cbb) is not present in table "recipes".
On Thu, Jul 1, 2010 at 10:22 AM, Justin Graf <justin@magwerks.com> wrote: > > > On 7/1/2010 11:08 AM, Mike Christensen wrote: >> I'd like to convert a small database to UTF8 before it becomes too >> large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin >> has any native way of doing this, what's the easiest way to go about >> doing this? Thanks! >> >> Mike >> > > Dump/Backup the database , then create a new database using utf-8 > then restore the database. > This is what I'm trying to do, but it's a total nightmare.. First, I did a: pg_dump -U root MyDB > c:\DB.dbs.out which appears to have worked.. Then, I edited the file in Notepad and saved it as UTF8 which also appears to have worked. Next, I created the new DB using UTF8, and ran: psql -U root MyDB2 < c:\DB.dbs.out I get pages and pages of errors about foreign key restraint violations and other stuff. Looking at the DB after, almost all tables are empty. When I look at the DB.dbs.out file more carefully, the problem is fairly obvious. It attempts to create all the tables in alphabetical order. First, it inserts data into the "A" table which has a FK restraint on the B table which isn't populated yet. Obviously, this is going to cause problems. Perhaps when restoring a DB you're supposed to drop all restraints first? I suppose I could hack this into working eventually, but I was hoping there was an easier way.. Thanks.. Mike
On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote: > Then, I edited the file in Notepad and saved it as UTF8 which also > appears to have worked. I don't think you want to be doing this. The file should contain a "set client_encoding" of the correct value which will let PG to do the "right thing" when you import it back into a UTF8 database. -- Sam http://samason.me.uk/
On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote: > On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote: >> Then, I edited the file in Notepad and saved it as UTF8 which also >> appears to have worked. > > I don't think you want to be doing this. The file should contain a "set > client_encoding" of the correct value which will let PG to do the "right > thing" when you import it back into a UTF8 database. > > -- > Sam http://samason.me.uk/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > Yup, that fixed it. I had to change the DB.dbs.out file to use the new encoding. Duh! Everything's working now. Mike
On Thu, Jul 01, 2010 at 09:47:03AM -0700, Mike Christensen wrote: > On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote: > > On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote: > >> Then, I edited the file in Notepad and saved it as UTF8 which also > >> appears to have worked. > > > > I don't think you want to be doing this. The file should contain a "set > > client_encoding" of the correct value which will let PG to do the "right > > thing" when you import it back into a UTF8 database. > > Yup, that fixed it. I had to change the DB.dbs.out file to use the > new encoding. Duh! Everything's working now. AFAIK, you should be able to leave the file exactly as it came out of pg_dump and PG will do all the conversions automatically. If you've not checked already, it may be worth checking through to see what row was on line 170 and see if it's got the right thing in there now. -- Sam http://samason.me.uk/
On Thu, Jul 1, 2010 at 9:54 AM, Sam Mason <sam@samason.me.uk> wrote: > On Thu, Jul 01, 2010 at 09:47:03AM -0700, Mike Christensen wrote: >> On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote: >> > On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote: >> >> Then, I edited the file in Notepad and saved it as UTF8 which also >> >> appears to have worked. >> > >> > I don't think you want to be doing this. The file should contain a "set >> > client_encoding" of the correct value which will let PG to do the "right >> > thing" when you import it back into a UTF8 database. >> >> Yup, that fixed it. I had to change the DB.dbs.out file to use the >> new encoding. Duh! Everything's working now. > > AFAIK, you should be able to leave the file exactly as it came out of > pg_dump and PG will do all the conversions automatically. > > If you've not checked already, it may be worth checking through to see > what row was on line 170 and see if it's got the right thing in there > now. > > -- > Sam http://samason.me.uk/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > Yup, the problem is line 170 doesn't actually match up to the DB.dbs.out file line 170 (which is a blank line). I believe it means line 170 from the stdin pipe it was processing for the copy command. Suffice to say, there was some weird character in my database that PG can't automatically translate from WIN1252 to UTF8, and apparently it will drop that /entire/ COPY command (the entire table doesn't get populated!).. As to what character was the culprit, I'm not entirely sure how to figure this out. I guess I could look for that hex value? However, if I set the encoding in the script itself, everything works perfectly. I'm not sure this is a PG bug or anything, but having more detail on this error might be helpful. Mike
On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote: > Yup, the problem is line 170 doesn't actually match up to the > DB.dbs.out file line 170 (which is a blank line). I believe it means > line 170 from the stdin pipe it was processing for the copy command. Doh, that's annoying. It would be nice to know that it's done the right thing rather than "some" thing. > Suffice to say, there was some weird character in my database that PG > can't automatically translate from WIN1252 to UTF8, and apparently it > will drop that /entire/ COPY command (the entire table doesn't get > populated!).. Yup, this is deliberate. You can also run psql with "-1" to put the whole lot (i.e. every table/view/... creation and data insert) in a transaction which will cause the whole restore to be rolled back if something doesn't look right as well. > As to what character was the culprit, I'm not entirely sure how to > figure this out. I guess I could look for that hex value? However, > if I set the encoding in the script itself, everything works > perfectly. PG is doing the right thing, 9D is undefined in Win1252. I guess you've either got other problems or this was just an artifact of converting from Win1252 to UTF8 external to PG and then not telling it that you'd done that. -- Sam http://samason.me.uk/
On Thu, Jul 1, 2010 at 10:07 AM, Sam Mason <sam@samason.me.uk> wrote: > On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote: >> Yup, the problem is line 170 doesn't actually match up to the >> DB.dbs.out file line 170 (which is a blank line). I believe it means >> line 170 from the stdin pipe it was processing for the copy command. > > Doh, that's annoying. It would be nice to know that it's done the right > thing rather than "some" thing. > >> Suffice to say, there was some weird character in my database that PG >> can't automatically translate from WIN1252 to UTF8, and apparently it >> will drop that /entire/ COPY command (the entire table doesn't get >> populated!).. > > Yup, this is deliberate. You can also run psql with "-1" to put the > whole lot (i.e. every table/view/... creation and data insert) in a > transaction which will cause the whole restore to be rolled back if > something doesn't look right as well. > >> As to what character was the culprit, I'm not entirely sure how to >> figure this out. I guess I could look for that hex value? However, >> if I set the encoding in the script itself, everything works >> perfectly. > > PG is doing the right thing, 9D is undefined in Win1252. I guess you've > either got other problems or this was just an artifact of converting > from Win1252 to UTF8 external to PG and then not telling it that you'd > done that. > > -- > Sam http://samason.me.uk/ > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > Yeah, looking at the lines in question I don't really see anything wrong with them. Everything is going into the database as UTF8 so maybe some weird characters got stuck in there somehow with the old default encoding. This is the main reason why I'm converting to UTF8 now, so data will be consistent across all layers.. Good to get these bugs out of the way while the data set is relatively small. If anyone wants me to do any more debugging, I'd be more than happy to but I'm satisfied with the results. Thanks! Mike