Обсуждение: Help. The database was created using collation version 2.17, but the operating system provides version 2.34.

Поиск
Список
Период
Сортировка
Hello,

I am in the process of migrating DB to Alma9 host. The databse
is rather large - few TBs.

I have run pg_basebackup on Alma9 host and established replication from production to it. The idea is to quickly switch
frommaster to this new host during downtime.  

Establishing replication went fine. Source postgresql version is 15.6, destination is 15.7

When I psql into replica I get:

WARNING:  database "xxx" has a collation version mismatch
DETAIL:  The database was created using collation version 2.17, but the operating system provides version 2.34.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE xxx REFRESH COLLATION
VERSION,or build PostgreSQL with the right library version. 

Looking up the issue the solution seems to be

  REINDEX database xxx
  ALTER DATABASE xxx REFRESH COLLATION VERSION

But this defeats the whole idea of having short downtime because REINDEX will take forever.

What is this "or build PostgreSQL with the right library version"?
Is this about 15.7 vs 15.6 or is it about different glibc version between RH7 and Alma9?

Is there a better way to handle it? I cannot afford long downtime.
This came up rather unexpectedly and I am now in a tight situation having to find solution fast. I do not recall having
similarissue when going from RH6 to RH7.  

Thank you for your help.



hi

On 6/20/24 10:23, Dmitry O Litvintsev wrote:
> Hello,
>
> I am in the process of migrating DB to Alma9 host. The databse
> is rather large - few TBs.
>
> I have run pg_basebackup on Alma9 host and established replication from production to it. The idea is to quickly
switchfrom master to this new host during downtime.
 
>
> Establishing replication went fine. Source postgresql version is 15.6, destination is 15.7

You mean physical replication or logical ? In case of logical how did 
you initdb ?

Did you build postgresql from source or using a RH package ?

sorry for not being able to provide anything helpful.

>
> When I psql into replica I get:
>
> WARNING:  database "xxx" has a collation version mismatch
> DETAIL:  The database was created using collation version 2.17, but the operating system provides version 2.34.
> HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE xxx REFRESH
COLLATIONVERSION, or build PostgreSQL with the right library version.
 
>
> Looking up the issue the solution seems to be
>
>    REINDEX database xxx
>    ALTER DATABASE xxx REFRESH COLLATION VERSION
>
> But this defeats the whole idea of having short downtime because REINDEX will take forever.
>
> What is this "or build PostgreSQL with the right library version"?
> Is this about 15.7 vs 15.6 or is it about different glibc version between RH7 and Alma9?
>
> Is there a better way to handle it? I cannot afford long downtime.
> This came up rather unexpectedly and I am now in a tight situation having to find solution fast. I do not recall
havingsimilar issue when going from RH6 to RH7.
 
>
> Thank you for your help.
>
>



> I am in the process of migrating DB to Alma9 host. The databse
> is rather large - few TBs.
>
> I have run pg_basebackup on Alma9 host and established replication from production to it. The idea is to quickly
switchfrom master to this new host during downtime.
 
>
> Establishing replication went fine. Source postgresql version is 15.6, destination is 15.7

What replication did you use? If it's streaming, you will have this
problem. If you can use logical replication and something like:

https://github.com/dimitri/pgcopydb

It will allow you to use the logical replication trick to lower
downtime. It involves copying the database using a parallel transfer
(and some pg_dump/pg_restore wrapping for the structural stuff), and
using logical replication to keep the target up to date. It will allow
you to keep writing to your source DB while it moves, but at the cost
of disabling DDL while it's happening.

Look at the pgcopydb clone --follow documentation for more info.

I'm sure there are other options people can recommend also.

I would test it thoroughly beforehand to make sure it's a fit.

> When I psql into replica I get:
>
> WARNING:  database "xxx" has a collation version mismatch
> DETAIL:  The database was created using collation version 2.17, but the operating system provides version 2.34.
> HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE xxx REFRESH
COLLATIONVERSION, or build PostgreSQL with the right library version.
 
>
> Looking up the issue the solution seems to be
>
>   REINDEX database xxx
>   ALTER DATABASE xxx REFRESH COLLATION VERSION

Here we've taken two approaches when we just "copied the disks over"
so to speak:

* Backport the collation library (tricky, I can explain this deeper,
but it's tricky)
* Reindex after migration (slow but less tricky)

We had this problem going from xenial to focal, and we had to pin to
something compatible with the xenial libc.

> But this defeats the whole idea of having short downtime because REINDEX will take forever.
>
> What is this "or build PostgreSQL with the right library version"?
> Is this about 15.7 vs 15.6 or is it about different glibc version between RH7 and Alma9?
>
> Is there a better way to handle it? I cannot afford long downtime.
> This came up rather unexpectedly and I am now in a tight situation having to find solution fast. I do not recall
havingsimilar issue when going from RH6 to RH7.
 

Unfortunately you've hit a bad problem that a few of us have probably
already been through. At least you don't have thousands of these
things :-).

ken.



    Dmitry O Litvintsev wrote:

> WARNING:  database "xxx" has a collation version mismatch
> DETAIL:  The database was created using collation version 2.17, but the
> operating system provides version 2.34.
> HINT:  Rebuild all objects in this database that use the default collation
> and run ALTER DATABASE xxx REFRESH COLLATION VERSION, or build PostgreSQL
> with the right library version.

This upgrade comprises the major change in GNU libc 2.28,
so indeed text indexes created by 2.17 are very likely unsafe to use
on your new server.
See https://wiki.postgresql.org/wiki/Locale_data_changes

>  REINDEX database xxx
> ALTER DATABASE xxx REFRESH COLLATION VERSION
> But this defeats the whole idea of having short downtime because REINDEX
> will take forever.

The indexes that don't involve collatable types (text,varchar), and those
that use the C collation don't need to be reindexed.
Maybe you can reduce significantly the downtime by including only
the ones that matter.

The wiki page gives the query to obtain the list of affected indexes:

SELECT DISTINCT indrelid::regclass::text, indexrelid::regclass::text,
collname, pg_get_indexdef(indexrelid)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index,
generate_subscripts(indcollation, 1) g(i)) s
  JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');

> I do not recall having similar issue when going from RH6 to RH7.

This warning was added relatively recently, in Postgres 15 (october 2022).


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



On Thu, Jun 20, 2024 at 3:23 AM Dmitry O Litvintsev <litvinse@fnal.gov> wrote:
Hello,

I am in the process of migrating DB to Alma9 host. The databse
is rather large - few TBs.

I have run pg_basebackup on Alma9 host and established replication from production to it. The idea is to quickly switch from master to this new host during downtime.

Establishing replication went fine. Source postgresql version is 15.6, destination is 15.7

When I psql into replica I get:

WARNING:  database "xxx" has a collation version mismatch
DETAIL:  The database was created using collation version 2.17, but the operating system provides version 2.34.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE xxx REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

Looking up the issue the solution seems to be

  REINDEX database xxx
  ALTER DATABASE xxx REFRESH COLLATION VERSION

But this defeats the whole idea of having short downtime because REINDEX will take forever.

What is this "or build PostgreSQL with the right library version"?
Is this about 15.7 vs 15.6 or is it about different glibc version between RH7 and Alma9?

Is there a better way to handle it? I cannot afford long downtime.

You "only" need to REINDEX indices with TEXT (including CHAR and VARCHAR) columns. That may be most of your indices, or very few.

I use this view and query to find such indices:

create or replace view dba.all_indices_types as
    select tbcl.relnamespace::regnamespace::text||'.'||tbcl.relname as table_name
            , ndcl.relname as index_name
            , array_agg(ty.typname order by att.attnum) as index_types
    from pg_class ndcl
        inner join pg_index nd
            on (ndcl.oid = nd.indexrelid and ndcl.relkind = 'i')
        inner join pg_class tbcl
            on (nd.indrelid = tbcl.oid and tbcl.relkind = 'r')
        inner join pg_attribute att
            on att.attrelid = nd.indexrelid
        inner join pg_type ty
            on att.atttypid = ty.oid
    where tbcl.relnamespace::regnamespace::text != 'pg_catalog'
    group by tbcl.relnamespace::regnamespace::text||'.'||tbcl.relname
            , ndcl.relname
    order by 1, 2;

select * 
from dba.all_indices_types 
where index_types && '{"text","varchar","char"}';


Thank you very much for help and pointers to useful information.

Just want to make clear (sorry I am slow on uptake). I should first REINDEX and then ALTER DATABASE xxx REFRESH
COLLATIONVERSION, or first ALTER and then REINDEX or does the order of these action matter at all? 

Thank you,
Dmitry

________________________________________
From: Daniel Verite <daniel@manitou-mail.org>
Sent: Thursday, June 20, 2024 7:02 AM
To: Dmitry O Litvintsev
Cc: pgsql-generallists.postgresql.org
Subject: Re: Help. The database was created using collation version 2.17, but the operating system provides version
2.34.

[EXTERNAL] – This message is from an external sender

        Dmitry O Litvintsev wrote:

> WARNING:  database "xxx" has a collation version mismatch
> DETAIL:  The database was created using collation version 2.17, but the
> operating system provides version 2.34.
> HINT:  Rebuild all objects in this database that use the default collation
> and run ALTER DATABASE xxx REFRESH COLLATION VERSION, or build PostgreSQL
> with the right library version.

This upgrade comprises the major change in GNU libc 2.28,
so indeed text indexes created by 2.17 are very likely unsafe to use
on your new server.
See
https://urldefense.proofpoint.com/v2/url?u=https-3A__wiki.postgresql.org_wiki_Locale-5Fdata-5Fchanges&d=DwIFaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=7PHi3TDlwkvpc07MjENbOxVFl0u_sEurf250JnUFWCU&m=NhYMQQwCXq5TXHFVjyJeB1eB9KkyHC8FU3lyswgVpUMpAzADq2cffu-xjSUeNCND&s=WKdD4hr8nBJTkQtIcLMagxuGK1yAPTyU2VOmQARksl8&e=

>  REINDEX database xxx
> ALTER DATABASE xxx REFRESH COLLATION VERSION
> But this defeats the whole idea of having short downtime because REINDEX
> will take forever.

The indexes that don't involve collatable types (text,varchar), and those
that use the C collation don't need to be reindexed.
Maybe you can reduce significantly the downtime by including only
the ones that matter.

The wiki page gives the query to obtain the list of affected indexes:

SELECT DISTINCT indrelid::regclass::text, indexrelid::regclass::text,
collname, pg_get_indexdef(indexrelid)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index,
generate_subscripts(indcollation, 1) g(i)) s
  JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');

> I do not recall having similar issue when going from RH6 to RH7.

This warning was added relatively recently, in Postgres 15 (october 2022).


Best regards,
--
Daniel Vérité

https://urldefense.proofpoint.com/v2/url?u=https-3A__postgresql.verite.pro_&d=DwIFaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=7PHi3TDlwkvpc07MjENbOxVFl0u_sEurf250JnUFWCU&m=NhYMQQwCXq5TXHFVjyJeB1eB9KkyHC8FU3lyswgVpUMpAzADq2cffu-xjSUeNCND&s=yED6Nru4eGTULRzJymNtMgJjXhgirkjOuDzCQnae9Go&e=
Twitter: @DanielVerite



    Dmitry O Litvintsev wrote:

> Just want to make clear (sorry I am slow on uptake). I should first
> REINDEX and then ALTER DATABASE xxx REFRESH COLLATION VERSION, or
> first ALTER and then REINDEX or does the order of these action
> matter at all?

The order does not matter. The ALTER DATABASE command will simply
update the pg_database.datcollversion field with the current version
of libc. That will stop the warning being issued, but it doesn't have
any other concrete effect.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite