Обсуждение: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

Поиск
Список
Период
Сортировка
Hi all,

while doing `cat pg_dump.dump | psql` I get the above message. Note that 
`pg_dump.dump` contains:

     CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING = 
'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';

What is exactly the problem? I understand that en_US.UTF-8 and 
en_US.utf-8 is not *exactly* the same string.

However I do not understand how the difference came to be. And I do not 
know what the "right" way is and how to proceed from here.

If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) I 
get:

     CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';

When I do the same on the old server (12.8-1.pgdg20.04+1) I get:

     CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
LC_CTYPE = 'en_US.UTF-8';

Any hints or help?
*t



On 6/22/22 12:17, Tomas Pospisek wrote:
> Hi all,
> 
> while doing `cat pg_dump.dump | psql` I get the above message. Note that 
> `pg_dump.dump` contains:
> 
>      CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING = 
> 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
> 
> What is exactly the problem? I understand that en_US.UTF-8 and 
> en_US.utf-8 is not *exactly* the same string.
> 
> However I do not understand how the difference came to be. And I do not 
> know what the "right" way is and how to proceed from here.
> 
> If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) I 
> get:
> 
>      CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';
> 
> When I do the same on the old server (12.8-1.pgdg20.04+1) I get:
> 
>      CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
> LC_CTYPE = 'en_US.UTF-8';
> 
> Any hints or help?

Are dumping/restoring from one version of Postgres to another?

If from older to newer then use the new version of pg_dump(13) to dump 
the older(12) database. Then the 13 version of restore to load the 
version 13 database.


> *t
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On 22.06.22 21:25, Adrian Klaver wrote:
> On 6/22/22 12:17, Tomas Pospisek wrote:
>> Hi all,
>>
>> while doing `cat pg_dump.dump | psql` I get the above message. Note 
>> that `pg_dump.dump` contains:
>>
>>      CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING 
>> = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
>>
>> What is exactly the problem? I understand that en_US.UTF-8 and 
>> en_US.utf-8 is not *exactly* the same string.
>>
>> However I do not understand how the difference came to be. And I do 
>> not know what the "right" way is and how to proceed from here.
>>
>> If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) 
>> I get:
>>
>>      CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';
>>
>> When I do the same on the old server (12.8-1.pgdg20.04+1) I get:
>>
>>      CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
>> LC_CTYPE = 'en_US.UTF-8';
>>
>> Any hints or help?
> 
> Are dumping/restoring from one version of Postgres to another?

Yes, indeed!

> If from older to newer then use the new version of pg_dump(13) to dump 
> the older(12) database. Then the 13 version of restore to load the 
> version 13 database.

I will. Thanks a lot Adrian!
*t



Tomas Pospisek <tpo2@sourcepole.ch> writes:
> On 22.06.22 21:25, Adrian Klaver wrote:
>> On 6/22/22 12:17, Tomas Pospisek wrote:
>>> If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) 
>>> I get:
>>> 
>>>      CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';
>>> 
>>> When I do the same on the old server (12.8-1.pgdg20.04+1) I get:
>>> 
>>>      CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
>>> LC_CTYPE = 'en_US.UTF-8';

>> Are dumping/restoring from one version of Postgres to another?

> Yes, indeed!

This is probably more about dumping from different operating systems.
The spelling of the locale name is under the control of the OS,
and Postgres doesn't know very much about the semantics of it
(so I think we conservatively assume that any difference in
spelling is significant).

Best bet might be to edit the dump file to adjust the locale
spellings to match your new system.

            regards, tom lane



On 22.06.22 22:18, Tomas Pospisek wrote:
> On 22.06.22 21:25, Adrian Klaver wrote:
>> On 6/22/22 12:17, Tomas Pospisek wrote:
>>> Hi all,
>>>
>>> while doing `cat pg_dump.dump | psql` I get the above message. Note 
>>> that `pg_dump.dump` contains:
>>>
>>>      CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING 
>>> = 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
>>>
>>> What is exactly the problem? I understand that en_US.UTF-8 and 
>>> en_US.utf-8 is not *exactly* the same string.
>>>
>>> However I do not understand how the difference came to be. And I do 
>>> not know what the "right" way is and how to proceed from here.
>>>
>>> If I `pg_dump --create` some DB on the new server 
>>> (13.7-1.pgdg18.04+1) I get:
>>>
>>>      CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';
>>>
>>> When I do the same on the old server (12.8-1.pgdg20.04+1) I get:
>>>
>>>      CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
>>> LC_CTYPE = 'en_US.UTF-8';
>>>
>>> Any hints or help?
>>
>> Are dumping/restoring from one version of Postgres to another?
> 
> Yes, indeed!
> 
>> If from older to newer then use the new version of pg_dump(13) to dump 
>> the older(12) database. Then the 13 version of restore to load the 
>> version 13 database.
> 
> I will. Thanks a lot Adrian!

So I used both pg_dump and pg_restore from the newer machine. Result is 
still the same.  So I'll use Tom Lane's suggestion too and fix the 
'UTF-8' spelling in the dump file:

Tom Lane wrote:

 > This is probably more about dumping from different operating systems.
 > The spelling of the locale name is under the control of the OS,
 > and Postgres doesn't know very much about the semantics of it
 > (so I think we conservatively assume that any difference in
 > spelling is significant).
 >
 > Best bet might be to edit the dump file to adjust the locale
 > spellings to match your new system.

Many thanks to both Tom & Adrian!!!
*t



On 6/23/22 00:37, Tomas Pospisek wrote:
> On 22.06.22 22:18, Tomas Pospisek wrote:
>> On 22.06.22 21:25, Adrian Klaver wrote:
>>> On 6/22/22 12:17, Tomas Pospisek wrote:

> 
> So I used both pg_dump and pg_restore from the newer machine. Result is 
> still the same.  So I'll use Tom Lane's suggestion too and fix the 
> 'UTF-8' spelling in the dump file:

Not sure why that is necessary? Is seems this is low hanging fruit that 
could dealt with by the equivalent of lower('en_US.UTF-8') = 
lower('en_US.utf-8').

> 
> Tom Lane wrote:
> 
>  > This is probably more about dumping from different operating systems.
>  > The spelling of the locale name is under the control of the OS,
>  > and Postgres doesn't know very much about the semantics of it
>  > (so I think we conservatively assume that any difference in
>  > spelling is significant).
>  >
>  > Best bet might be to edit the dump file to adjust the locale
>  > spellings to match your new system.
> 
> Many thanks to both Tom & Adrian!!!
> *t
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On 6/23/22 10:11, Adrian Klaver wrote:
> On 6/23/22 00:37, Tomas Pospisek wrote:
>> On 22.06.22 22:18, Tomas Pospisek wrote:
>>> On 22.06.22 21:25, Adrian Klaver wrote:
>>>> On 6/22/22 12:17, Tomas Pospisek wrote:
> 
>>
>> So I used both pg_dump and pg_restore from the newer machine. Result 
>> is still the same.  So I'll use Tom Lane's suggestion too and fix the 
>> 'UTF-8' spelling in the dump file:
> 
> Not sure why that is necessary? Is seems this is low hanging fruit that 
> could dealt with by the equivalent of lower('en_US.UTF-8') = 
> lower('en_US.utf-8').

Well that was clear as mud. My point was that I don't see why the end 
user should have to do this when it could be handled internally in the 
pg_restore code.

> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On 23.06.22 20:21, Adrian Klaver wrote:
> On 6/23/22 10:11, Adrian Klaver wrote:
>> On 6/23/22 00:37, Tomas Pospisek wrote:
>>> On 22.06.22 22:18, Tomas Pospisek wrote:
>>>> On 22.06.22 21:25, Adrian Klaver wrote:
>>>>> On 6/22/22 12:17, Tomas Pospisek wrote:
>>
>>>
>>> So I used both pg_dump and pg_restore from the newer machine. Result 
>>> is still the same.  So I'll use Tom Lane's suggestion too and fix the 
>>> 'UTF-8' spelling in the dump file:
>>
>> Not sure why that is necessary? Is seems this is low hanging fruit 
>> that could dealt with by the equivalent of lower('en_US.UTF-8') = 
>> lower('en_US.utf-8').
> 
> Well that was clear as mud.

:-D

> My point was that I don't see why the end 
> user should have to do this when it could be handled internally in the 
> pg_restore code.

That would indeed be very friendly of pg_restore if it'd take that 
little task off the user :-)

+1 of course :-)

*t



On 23.06.22 20:21, Adrian Klaver wrote:
>> Not sure why that is necessary? Is seems this is low hanging fruit 
>> that could dealt with by the equivalent of lower('en_US.UTF-8') = 
>> lower('en_US.utf-8').
> 
> Well that was clear as mud. My point was that I don't see why the end 
> user should have to do this when it could be handled internally in the 
> pg_restore code.

There are also cases where on one system the locale is spelled 
xx_YY.UTF8 and on another xx_YY.UTF-8.  If we start adjusting for case, 
should we adjust for this as well?  What other cases are there?

So this would need a bit more research.



Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> There are also cases where on one system the locale is spelled 
> xx_YY.UTF8 and on another xx_YY.UTF-8.  If we start adjusting for case, 
> should we adjust for this as well?  What other cases are there?
> So this would need a bit more research.

Yeah.  I would tend to define it as "if the locale names both end
in dot followed by a recognizable encoding name, then (a) make
sure that the encoding values are the same, then (b) strip off
the dot and encoding before comparing the rest".  We already have
code to look up encoding names, and it handles all these variants.
So this doesn't seem like it should take much new code, though
maybe some refactoring would be needed to make the lookup code
available where this check is happening.

I'm not personally very excited about this, but if someone wanted
to prepare a patch...

            regards, tom lane