Обсуждение: Change server encoding after the fact
I would like to change my server_encoding which is currently SQL_ASCII to UTF8.
I have existing data that I would like to keep.
From my understanding of the steps I need to:
1) alter the template1 database encoding via
UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');
2) Dump my current database
pg_dump -Fc foo > foo.db
3) Drop my current database
drop database foo;
4) recreate it with the proper encoding
create database foo with template = template1 encoding = 'UTF-8';
5) restore from backup
pg_restore -d foo foo.db
Are these the correct steps to perform or is there an easier / in-place way?
Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding
andits going into a UTF-8 database)?
Thank you
/Cody
On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
> I would like to change my server_encoding which is currently SQL_ASCII to UTF8.
>
> I have existing data that I would like to keep.
>
> From my understanding of the steps I need to:
>
> 1) alter the template1 database encoding via
>
> UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');
Just create database using template0 as template and you can skip this step ^^
> Are these the correct steps to perform or is there an easier / in-place way?
> Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII
encodingand its going into a UTF-8 database)?
You might need to set client encoding when restoring. Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.
Thanks Scott. See below:
On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:Just create database using template0 as template and you can skip this step ^^
> I would like to change my server_encoding which is currently SQL_ASCII to UTF8.
>
> I have existing data that I would like to keep.
>
> From my understanding of the steps I need to:
>
> 1) alter the template1 database encoding via
>
> UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');
Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
----------------------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
So it appears both template0 & template1 are SQL_ASCII, so how would creating from a new DB from template0 be any different than template1?
> Are these the correct steps to perform or is there an easier / in-place way?You might need to set client encoding when restoring. Or use iconv to
> Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding and its going into a UTF-8 database)?
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.
Yes, I see this might be the case. From my playing around with iconv I cannot even properly do the conversion:
$ pg_dump -Fp foo > foo.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$ iconv -f utf-8 foo.sql > utf8.sql
iconv: illegal input sequence at position 2512661
Uh oh... I cannot event convert it?
Whats my next step at this point if I cannot even convert my data? I'd be OK with some lossiness.
Thanks again
/Cody
On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Thanks Scott. See below:
>
> On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > I would like to change my server_encoding which is currently SQL_ASCII
>> > to UTF8.
>> >
>> > I have existing data that I would like to keep.
>> >
>> > From my understanding of the steps I need to:
>> >
>> > 1) alter the template1 database encoding via
>> >
>> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> > 'template1');
>>
>> Just create database using template0 as template and you can skip this
>> step ^^
>
>
> Wouldn't this only work if my template0 was UTF8 itself?
> => select datname, pg_encoding_to_char(encoding) from pg_database;
> datname | pg_encoding_to_char
> ----------------------+---------------------
> template1 | SQL_ASCII
> template0 | SQL_ASCII
> postgres | SQL_ASCII
>
> So it appears both template0 & template1 are SQL_ASCII, so how would
> creating from a new DB from template0 be any different than template1?
Well, let's try, shall we? From a freshly created cluster on my
laptop, running 8.4:
smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
smarlowe | SQL_ASCII
(4 rows)
smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE
Seems to work.
P.s. I'm not sure why it works, I just know that it does. :)
>> > Are these the correct steps to perform or is there an easier / in-place
>> > way?
>>
>> > Also, when I dump my old DB and restore it, will it be converted
>> > appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
>> > UTF-8 database)?
>>
>> You might need to set client encoding when restoring. Or use iconv to
>> convert from one encoding to another, which is what I usually do.
>> Note that it's VERY likely you'll have data in a SQL_ASCII db that
>> won't go into a UTF8 database without some lossiness.
>
>
> Yes, I see this might be the case. From my playing around with iconv I
> cannot even properly do the conversion:
> $ pg_dump -Fp foo > foo.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $ iconv -f utf-8 foo.sql > utf8.sql
> iconv: illegal input sequence at position 2512661
I think you got it backwards, the -f should be somthing other than
utf-8 right? That's what the -t should be right? Try iconv without a
-f switch and a -t of utf-8 and see what happens...
Scott Marlowe <scott.marlowe@gmail.com> writes:
> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> So it appears both template0 & template1 are SQL_ASCII, so how would
>> creating from a new DB from template0 be any different than template1?
> P.s. I'm not sure why it works, I just know that it does. :)
CREATE DATABASE assumes that template0 cannot contain any non-ASCII
data, so it's okay to clone it and then pretend that the result has some
other encoding. The same assumption cannot be made for template1, since
that's user-modifiable.
regards, tom lane
Please see below.
On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:Well, let's try, shall we? From a freshly created cluster on my
> Thanks Scott. See below:
>
> On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > I would like to change my server_encoding which is currently SQL_ASCII
>> > to UTF8.
>> >
>> > I have existing data that I would like to keep.
>> >
>> > From my understanding of the steps I need to:
>> >
>> > 1) alter the template1 database encoding via
>> >
>> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> > 'template1');
>>
>> Just create database using template0 as template and you can skip this
>> step ^^
>
>
> Wouldn't this only work if my template0 was UTF8 itself?
> => select datname, pg_encoding_to_char(encoding) from pg_database;
> datname | pg_encoding_to_char
> ----------------------+---------------------
> template1 | SQL_ASCII
> template0 | SQL_ASCII
> postgres | SQL_ASCII
>
> So it appears both template0 & template1 are SQL_ASCII, so how would
> creating from a new DB from template0 be any different than template1?
laptop, running 8.4:
smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;datname | pg_encoding_to_charsmarlowe | SQL_ASCII
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
(4 rows)
smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE
Seems to work.
P.s. I'm not sure why it works, I just know that it does. :)
Ok, I see what you mean. This would create a new DB with the proper encoding. Which is "fine", and probably what I will do. I guess I see an ideal scenario being one where we permanently convert the template encoding to UTF8 so going forward I dont have to worry about forgetting to adding the encoding= 'UTF8' for every new DB I create.
>> > Are these the correct steps to perform or is there an easier / in-placeI think you got it backwards, the -f should be somthing other than
>> > way?
>>
>> > Also, when I dump my old DB and restore it, will it be converted
>> > appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
>> > UTF-8 database)?
>>
>> You might need to set client encoding when restoring. Or use iconv to
>> convert from one encoding to another, which is what I usually do.
>> Note that it's VERY likely you'll have data in a SQL_ASCII db that
>> won't go into a UTF8 database without some lossiness.
>
>
> Yes, I see this might be the case. From my playing around with iconv I
> cannot even properly do the conversion:
> $ pg_dump -Fp foo > foo.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $ iconv -f utf-8 foo.sql > utf8.sql
> iconv: illegal input sequence at position 2512661
utf-8 right? That's what the -t should be right? Try iconv without a
-f switch and a -t of utf-8 and see what happens...
You're right, I had -f when I needed -t. I tried it again with the same error:
$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661
On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Thanks Scott. See below:
>> >
>> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > I would like to change my server_encoding which is currently
>> >> > SQL_ASCII
>> >> > to UTF8.
>> >> >
>> >> > I have existing data that I would like to keep.
>> >> >
>> >> > From my understanding of the steps I need to:
>> >> >
>> >> > 1) alter the template1 database encoding via
>> >> >
>> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> > 'template1');
>> >>
>> >> Just create database using template0 as template and you can skip this
>> >> step ^^
>> >
>> >
>> > Wouldn't this only work if my template0 was UTF8 itself?
>> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> > datname | pg_encoding_to_char
>> > ----------------------+---------------------
>> > template1 | SQL_ASCII
>> > template0 | SQL_ASCII
>> > postgres | SQL_ASCII
>> >
>> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> > creating from a new DB from template0 be any different than template1?
>>
>> Well, let's try, shall we? From a freshly created cluster on my
>> laptop, running 8.4:
>>
>> smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
>> datname | pg_encoding_to_char
>> -----------+---------------------
>> template1 | SQL_ASCII
>> template0 | SQL_ASCII
>> postgres | SQL_ASCII
>> smarlowe | SQL_ASCII
>> (4 rows)
>>
>> smarlowe=# create database j template template0 encoding 'UTF8';
>> CREATE DATABASE
>>
>> Seems to work.
>>
>> P.s. I'm not sure why it works, I just know that it does. :)
>>
>
> Ok, I see what you mean. This would create a new DB with the proper
> encoding. Which is "fine", and probably what I will do. I guess I see an
> ideal scenario being one where we permanently convert the template encoding
> to UTF8 so going forward I dont have to worry about forgetting to adding the
> encoding= 'UTF8' for every new DB I create.
Ah ok. The way I fix that is this:
update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';
But your way would likely work too.
>> I think you got it backwards, the -f should be somthing other than
>> utf-8 right? That's what the -t should be right? Try iconv without a
>> -f switch and a -t of utf-8 and see what happens...
>
> You're right, I had -f when I needed -t. I tried it again with the same
> error:
> $ iconv -t utf-8 foo.sql > utf.sql
> iconv: illegal input sequence at position 2512661
Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.
Cody Caughlan <toolbag@gmail.com> writes: > Ok, I see what you mean. This would create a new DB with the proper > encoding. Which is "fine", and probably what I will do. I guess I see an > ideal scenario being one where we permanently convert the template encoding > to UTF8 so going forward I dont have to worry about forgetting to adding the > encoding= 'UTF8' for every new DB I create. Well, if you're feeling brave you can mark template0 as having utf8 encoding via a manual update to pg_database. In theory that should be safe enough. If you know template1 doesn't, and never has, contained any non-ASCII data, you could do the same to it ... but it would be a lot safer to drop it and recreate from template0. See http://wiki.postgresql.org/wiki/Adventures_in_PostgreSQL,_Episode_1 for some context. regards, tom lane
On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Ah ok. The way I fix that is this:On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Thanks Scott. See below:
>> >
>> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > I would like to change my server_encoding which is currently
>> >> > SQL_ASCII
>> >> > to UTF8.
>> >> >
>> >> > I have existing data that I would like to keep.
>> >> >
>> >> > From my understanding of the steps I need to:
>> >> >
>> >> > 1) alter the template1 database encoding via
>> >> >
>> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> > 'template1');
>> >>
>> >> Just create database using template0 as template and you can skip this
>> >> step ^^
>> >
>> >
>> > Wouldn't this only work if my template0 was UTF8 itself?
>> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> > datname | pg_encoding_to_char
>> > ----------------------+---------------------
>> > template1 | SQL_ASCII
>> > template0 | SQL_ASCII
>> > postgres | SQL_ASCII
>> >
>> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> > creating from a new DB from template0 be any different than template1?
>>
>> Well, let's try, shall we? From a freshly created cluster on my
>> laptop, running 8.4:
>>
>> smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
>> datname | pg_encoding_to_char
>> -----------+---------------------
>> template1 | SQL_ASCII
>> template0 | SQL_ASCII
>> postgres | SQL_ASCII
>> smarlowe | SQL_ASCII
>> (4 rows)
>>
>> smarlowe=# create database j template template0 encoding 'UTF8';
>> CREATE DATABASE
>>
>> Seems to work.
>>
>> P.s. I'm not sure why it works, I just know that it does. :)
>>
>
> Ok, I see what you mean. This would create a new DB with the proper
> encoding. Which is "fine", and probably what I will do. I guess I see an
> ideal scenario being one where we permanently convert the template encoding
> to UTF8 so going forward I dont have to worry about forgetting to adding the
> encoding= 'UTF8' for every new DB I create.
update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';
But your way would likely work too.Any idea waht the actual encoding of your source database is?
>> I think you got it backwards, the -f should be somthing other than
>> utf-8 right? That's what the -t should be right? Try iconv without a
>> -f switch and a -t of utf-8 and see what happens...
>
> You're right, I had -f when I needed -t. I tried it again with the same
> error:
> $ iconv -t utf-8 foo.sql > utf.sql
> iconv: illegal input sequence at position 2512661
SQL_ASCII is basically not really ascii, more like anything goes.
How would I find this? pg_database says my DB is SQL_ASCII.
"show all" says
client_encoding = SQL_ASCII
server_encoding = SQL_ASCII
On Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>
>
> On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Please see below.
>> >
>> > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > Thanks Scott. See below:
>> >> >
>> >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> >> > <scott.marlowe@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> >> wrote:
>> >> >> > I would like to change my server_encoding which is currently
>> >> >> > SQL_ASCII
>> >> >> > to UTF8.
>> >> >> >
>> >> >> > I have existing data that I would like to keep.
>> >> >> >
>> >> >> > From my understanding of the steps I need to:
>> >> >> >
>> >> >> > 1) alter the template1 database encoding via
>> >> >> >
>> >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> >> > 'template1');
>> >> >>
>> >> >> Just create database using template0 as template and you can skip
>> >> >> this
>> >> >> step ^^
>> >> >
>> >> >
>> >> > Wouldn't this only work if my template0 was UTF8 itself?
>> >> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >> > datname | pg_encoding_to_char
>> >> > ----------------------+---------------------
>> >> > template1 | SQL_ASCII
>> >> > template0 | SQL_ASCII
>> >> > postgres | SQL_ASCII
>> >> >
>> >> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> >> > creating from a new DB from template0 be any different than
>> >> > template1?
>> >>
>> >> Well, let's try, shall we? From a freshly created cluster on my
>> >> laptop, running 8.4:
>> >>
>> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from
>> >> pg_database;
>> >> datname | pg_encoding_to_char
>> >> -----------+---------------------
>> >> template1 | SQL_ASCII
>> >> template0 | SQL_ASCII
>> >> postgres | SQL_ASCII
>> >> smarlowe | SQL_ASCII
>> >> (4 rows)
>> >>
>> >> smarlowe=# create database j template template0 encoding 'UTF8';
>> >> CREATE DATABASE
>> >>
>> >> Seems to work.
>> >>
>> >> P.s. I'm not sure why it works, I just know that it does. :)
>> >>
>> >
>> > Ok, I see what you mean. This would create a new DB with the proper
>> > encoding. Which is "fine", and probably what I will do. I guess I see an
>> > ideal scenario being one where we permanently convert the template
>> > encoding
>> > to UTF8 so going forward I dont have to worry about forgetting to adding
>> > the
>> > encoding= 'UTF8' for every new DB I create.
>>
>> Ah ok. The way I fix that is this:
>>
>> update pg_database set datistemplate = false where datname='template1';
>> drop database template1;
>> create database template1 template template0 encoding 'UTF8';
>>
>> But your way would likely work too.
>>
>> >> I think you got it backwards, the -f should be somthing other than
>> >> utf-8 right? That's what the -t should be right? Try iconv without a
>> >> -f switch and a -t of utf-8 and see what happens...
>> >
>> > You're right, I had -f when I needed -t. I tried it again with the same
>> > error:
>> > $ iconv -t utf-8 foo.sql > utf.sql
>> > iconv: illegal input sequence at position 2512661
>>
>> Any idea waht the actual encoding of your source database is?
>> SQL_ASCII is basically not really ascii, more like anything goes.
>
>
> How would I find this? pg_database says my DB is SQL_ASCII.
> "show all" says
> client_encoding = SQL_ASCII
> server_encoding = SQL_ASCII
It would have been set by the application accessing postgresql and
inserting the data. I.e. was it a windows app using a typical windows
encoding? etc.
Its a Rails app and I do have:
encoding: utf8
Set in my DB configuration.
On Fri, Sep 30, 2011 at 12:38 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
It would have been set by the application accessing postgresql andOn Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>
>
> On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > Please see below.
>> >
>> > On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
>> > <scott.marlowe@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
>> >> wrote:
>> >> > Thanks Scott. See below:
>> >> >
>> >> > On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
>> >> > <scott.marlowe@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
>> >> >> wrote:
>> >> >> > I would like to change my server_encoding which is currently
>> >> >> > SQL_ASCII
>> >> >> > to UTF8.
>> >> >> >
>> >> >> > I have existing data that I would like to keep.
>> >> >> >
>> >> >> > From my understanding of the steps I need to:
>> >> >> >
>> >> >> > 1) alter the template1 database encoding via
>> >> >> >
>> >> >> > UPDATE pg_database SET encoding = 6 where datname IN ('template0',
>> >> >> > 'template1');
>> >> >>
>> >> >> Just create database using template0 as template and you can skip
>> >> >> this
>> >> >> step ^^
>> >> >
>> >> >
>> >> > Wouldn't this only work if my template0 was UTF8 itself?
>> >> > => select datname, pg_encoding_to_char(encoding) from pg_database;
>> >> > datname | pg_encoding_to_char
>> >> > ----------------------+---------------------
>> >> > template1 | SQL_ASCII
>> >> > template0 | SQL_ASCII
>> >> > postgres | SQL_ASCII
>> >> >
>> >> > So it appears both template0 & template1 are SQL_ASCII, so how would
>> >> > creating from a new DB from template0 be any different than
>> >> > template1?
>> >>
>> >> Well, let's try, shall we? From a freshly created cluster on my
>> >> laptop, running 8.4:
>> >>
>> >> smarlowe=# select datname, pg_encoding_to_char(encoding) from
>> >> pg_database;
>> >> datname | pg_encoding_to_char
>> >> -----------+---------------------
>> >> template1 | SQL_ASCII
>> >> template0 | SQL_ASCII
>> >> postgres | SQL_ASCII
>> >> smarlowe | SQL_ASCII
>> >> (4 rows)
>> >>
>> >> smarlowe=# create database j template template0 encoding 'UTF8';
>> >> CREATE DATABASE
>> >>
>> >> Seems to work.
>> >>
>> >> P.s. I'm not sure why it works, I just know that it does. :)
>> >>
>> >
>> > Ok, I see what you mean. This would create a new DB with the proper
>> > encoding. Which is "fine", and probably what I will do. I guess I see an
>> > ideal scenario being one where we permanently convert the template
>> > encoding
>> > to UTF8 so going forward I dont have to worry about forgetting to adding
>> > the
>> > encoding= 'UTF8' for every new DB I create.
>>
>> Ah ok. The way I fix that is this:
>>
>> update pg_database set datistemplate = false where datname='template1';
>> drop database template1;
>> create database template1 template template0 encoding 'UTF8';
>>
>> But your way would likely work too.
>>
>> >> I think you got it backwards, the -f should be somthing other than
>> >> utf-8 right? That's what the -t should be right? Try iconv without a
>> >> -f switch and a -t of utf-8 and see what happens...
>> >
>> > You're right, I had -f when I needed -t. I tried it again with the same
>> > error:
>> > $ iconv -t utf-8 foo.sql > utf.sql
>> > iconv: illegal input sequence at position 2512661
>>
>> Any idea waht the actual encoding of your source database is?
>> SQL_ASCII is basically not really ascii, more like anything goes.
>
>
> How would I find this? pg_database says my DB is SQL_ASCII.
> "show all" says
> client_encoding = SQL_ASCII
> server_encoding = SQL_ASCII
inserting the data. I.e. was it a windows app using a typical windows
encoding? etc.
On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Its a Rails app and I do have: > encoding: utf8 Hmmm, if you try this does it work (mostly)? iconv -f utf-8 -t utf-8 -c < infile > outfile
On Fri, Sep 30, 2011 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> Its a Rails app and I do have: >> encoding: utf8 > > > Hmmm, if you try this does it work (mostly)? > > iconv -f utf-8 -t utf-8 -c < infile > outfile If that doesn't work try: iconv -f utf-8 -t utf-8//IGNORE -c < infile > outfile
That worked, but "file" shows no difference:
$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$file -i utf.sql
utf.sql: text/plain; charset=us-ascii
So iconv didnt actually convert the file OR does is the "file" command just ignorant?
On Fri, Sep 30, 2011 at 12:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:Hmmm, if you try this does it work (mostly)?
> Its a Rails app and I do have:
> encoding: utf8
iconv -f utf-8 -t utf-8 -c < infile > outfile
On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote: > That worked, but "file" shows no difference: > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql > $ file -i foo.sql > foo.sql: text/plain; charset=us-ascii > $file -i utf.sql > utf.sql: text/plain; charset=us-ascii > So iconv didnt actually convert the file OR does is the "file" command just > ignorant? Not sure. try loading the dump into the UTF-8 DB in postgres and see what happens I guess?
Please see below.
On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:Not sure. try loading the dump into the UTF-8 DB in postgres and see
> That worked, but "file" shows no difference:
> $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
> $ file -i foo.sql
> foo.sql: text/plain; charset=us-ascii
> $file -i utf.sql
> utf.sql: text/plain; charset=us-ascii
> So iconv didnt actually convert the file OR does is the "file" command just
> ignorant?
what happens I guess?
Uh oh.
On the remote machine:
$ pg_dump -Fc -E UTF8 foo > foo.sql
Then I've created a new local DB with UTF8 encoding and I try to restore this dump into it:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA wine_books vinosmith
pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR: invalid byte sequence for encoding "UTF8": 0xc309
CONTEXT: COPY wine_books, line 1147
WARNING: errors ignored on restore: 1
And sure enough the table "wine_books" is empty. Not good.
On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote: > Please see below. > > On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com> > wrote: >> >> On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote: >> > That worked, but "file" shows no difference: >> > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql >> > $ file -i foo.sql >> > foo.sql: text/plain; charset=us-ascii >> > $file -i utf.sql >> > utf.sql: text/plain; charset=us-ascii >> > So iconv didnt actually convert the file OR does is the "file" command >> > just >> > ignorant? >> >> Not sure. try loading the dump into the UTF-8 DB in postgres and see >> what happens I guess? > > > Uh oh. > On the remote machine: > $ pg_dump -Fc -E UTF8 foo > foo.sql > Then I've created a new local DB with UTF8 encoding and I try to restore > this dump into it: > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA > wine_books vinosmith > pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR: > invalid byte sequence for encoding "UTF8": 0xc309 > CONTEXT: COPY wine_books, line 1147 > WARNING: errors ignored on restore: 1 > And sure enough the table "wine_books" is empty. Not good. You may have to hunt down that one bad line (1147) and chop it out / edit it so it works.
Thanks y'all for your help on this.
I took this opportunity to upgrade to 9.1.1 which is UTF8 by default and I ended up manually cleaning up the borked data by hand (there wasn't that much).
So all is well now.
Thanks again.
/Cody
On Fri, Sep 30, 2011 at 3:37 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
You may have to hunt down that one bad line (1147) and chop it out /On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote:
> Please see below.
>
> On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:
>> > That worked, but "file" shows no difference:
>> > $ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
>> > $ file -i foo.sql
>> > foo.sql: text/plain; charset=us-ascii
>> > $file -i utf.sql
>> > utf.sql: text/plain; charset=us-ascii
>> > So iconv didnt actually convert the file OR does is the "file" command
>> > just
>> > ignorant?
>>
>> Not sure. try loading the dump into the UTF-8 DB in postgres and see
>> what happens I guess?
>
>
> Uh oh.
> On the remote machine:
> $ pg_dump -Fc -E UTF8 foo > foo.sql
> Then I've created a new local DB with UTF8 encoding and I try to restore
> this dump into it:
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
> wine_books vinosmith
> pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
> invalid byte sequence for encoding "UTF8": 0xc309
> CONTEXT: COPY wine_books, line 1147
> WARNING: errors ignored on restore: 1
> And sure enough the table "wine_books" is empty. Not good.
edit it so it works.