Обсуждение: BUG #17923: Excessive warnings of collation version mismatch in logs
The following bug has been logged on the website: Bug reference: 17923 Logged by: Hans Buschmann Email address: buschmann@nidsa.net PostgreSQL version: 15.2 Operating system: Fedora 38 x86-64 Description: We recently upgraded our production system from FEDORA 37 to FEDORA 38 (x86-64). This also upgraded the glibc from 2.36 to 2.37. Quite a period later we observed excessive logging (every 30 seconds) of the following messages: LSAM 2023-04-18 17:10:08 CEST 01000 1:> WARNING: database "template1" has a collation version mismatch LSAM 2023-04-18 17:10:08 CEST 01000 2:> DETAIL: The database was created using collation version 2.36, but the operating system provides version 2.37. LSAM 2023-04-18 17:10:08 CEST 01000 3:> HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version. LSAM 2023-04-18 17:10:38 CEST 01000 1:> WARNING: database "postgres" has a collation version mismatch LSAM 2023-04-18 17:10:38 CEST 01000 2:> DETAIL: The database was created using collation version 2.36, but the operating system provides version 2.37. LSAM 2023-04-18 17:10:38 CEST 01000 3:> HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version. ... etc. Our production database has collation c, so it is not mentioned in the logs. Unfortunately the frequent repetition of these messages caused more the 48000 identical entries in about 14 days. The following questions arise: 1. Why isn't it sufficient to report this message only at server startup ? The messages are sent to psql too, but only when logging into one of these mostly unused databases, so it is not observed in a regular routine. 2. Why is this message reported for databases with no objects or similar cases? These empty databases have no objects, other databases may have only objects where a collation version change doesn't matter, such as indexes on non-string data types like integers. In these cases no warning is necessary. 3. Is there a real collation change in glibc 2.37 or is it only a version number mismatch with changes only in other parts of the library? 4. Could the version information be shown in \l output of psql? 5. Wouldn't it make sense to automatically update the collation version number of such databases (without any collation-related objects) at server startup? This resembles the praxis of initdb, which quietly sets the collation version number at creation time mostly without any user intervention. Thanks for looking Hans Buschmann
On 07.05.23 12:13, PG Bug reporting form wrote: > 1. Why isn't it sufficient to report this message only at server startup ? > The messages are sent to psql too, but only when logging into one of these > mostly unused databases, so it is not observed in a regular routine. Each database is separate. The server doesn't look at each database on startup. This is only done when you connect to a given database. > 2. Why is this message reported for databases with no objects or similar > cases? When you connect to a database, the server doesn't scan all objects in the database to see if they might be affected. That would be too slow. If you have no objects in the database, you can just run the provided REFRESH command to make the warning go away. > 3. Is there a real collation change in glibc 2.37 or is it only a version > number mismatch with changes only in other parts of the library? We don't know, because glibc isn't reliable at telling us about it. So we take the safe course. > 4. Could the version information be shown in \l output of psql? That's a valid idea. > 5. Wouldn't it make sense to automatically update the collation version > number of such databases (without any collation-related objects) at server > startup? See #2. We are not going to start scanning each database on each connection to check whether it might be empty.
Hello Peter,
Thank you for looking at.
I think, there is a little misunderstanding:
The problem is not at connection time (I only mentioned it for info).
The problem is the endless repeating of these log entries without user login (cut a little bit at line end):
17:09:08 CEST 01000 1:> WARNING: database "template1" has a collation version mismatch
17:09:08 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:09:08 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:09:38 CEST 01000 1:> WARNING: database "postgres" has a collation version mismatch
17:09:38 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:09:38 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:10:08 CEST 01000 1:> WARNING: database "template1" has a collation version mismatch
17:10:08 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:10:08 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:10:38 CEST 01000 1:> WARNING: database "postgres" has a collation version mismatch
17:10:38 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:10:38 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:11:08 CEST 01000 1:> WARNING: database "template1" has a collation version mismatch
17:11:08 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:11:08 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:11:38 CEST 01000 1:> WARNING: database "postgres" has a collation version mismatch
17:11:38 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:11:38 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:12:08 CEST 01000 1:> WARNING: database "template1" has a collation version mismatch
17:12:08 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:12:08 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:12:38 CEST 01000 1:> WARNING: database "postgres" has a collation version mismatch
17:12:38 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:12:38 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
17:13:08 CEST 01000 1:> WARNING: database "template1" has a collation version mismatch
17:13:08 CEST 01000 2:> DETAIL: The database was created using collation version 2.36,
17:13:08 CEST 01000 3:> HINT: Rebuild all objects in this database that use the defaul
Let me explain my suggestions (Please remind that I am not familiar with the source code, internal data structures or extensive C hackking, so I'll present it in a more conceptual way):
1. Current situation:
The server checks every minute for every database if there is a collation version mismatch.
If it doesn't match, it emits every minute a warning (without user login).
This hits performance and floods the logs.
Check this mismatch only at server startup (the glic cannot be exchanged on a running server!)
and report the warnings only once per database.
It should be investigated, by which algorithm the server checks that every minute and this should be avoided if possible.
3. Further suggestion:
3. Further suggestion:
When the mismatch occurs at server startup, check if the database has any objects (more selective: if it has any indexes or materialized views)
This is a very short one-time operation only at server startup.
If there are no such objects in the mentioned database, change the collation version silently to the current version (do the ALTER DATABASE xxxxx REFRESH COLLATION VERSION automatically)
4. Elaborated premium solution with focus to easy administration:
With the ever growing databases easy recovery of collation problems is not easy for the user (see other discussions on ICU defaults etc. on hackers).
With the ever growing databases easy recovery of collation problems is not easy for the user (see other discussions on ICU defaults etc. on hackers).
So for every collation a hash value could be computed and stored in the catalogs.
If there is now a mismatch, the warnings can inform the users of the changed collations (not every collation is subject to frequent changes!).
This circumvents the indeterminable changes of a glibc version change.
A query to the catalogs for such a database can then provide the objects where the altered collations are used and the user can take appropriate action (reindexing a big database is also very expensive!).
This certainly is much more work then only stopping the endless warnings but gives a fair amount of self-administration guide to the user.
PS: perhaps you may move the discussion to hackers to reach a broader audience.
Hans Buschmann
Von: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Gesendet: Mittwoch, 10. Mai 2023 07:30
An: Hans Buschmann; pgsql-bugs@lists.postgresql.org
Betreff: Re: BUG #17923: Excessive warnings of collation version mismatch in logs
Gesendet: Mittwoch, 10. Mai 2023 07:30
An: Hans Buschmann; pgsql-bugs@lists.postgresql.org
Betreff: Re: BUG #17923: Excessive warnings of collation version mismatch in logs
On 07.05.23 12:13, PG Bug reporting form wrote:
> 1. Why isn't it sufficient to report this message only at server startup ?
> The messages are sent to psql too, but only when logging into one of these
> mostly unused databases, so it is not observed in a regular routine.
Each database is separate. The server doesn't look at each database on
startup. This is only done when you connect to a given database.
> 2. Why is this message reported for databases with no objects or similar
> cases?
When you connect to a database, the server doesn't scan all objects in
the database to see if they might be affected. That would be too slow.
If you have no objects in the database, you can just run the provided
REFRESH command to make the warning go away.
> 3. Is there a real collation change in glibc 2.37 or is it only a version
> number mismatch with changes only in other parts of the library?
We don't know, because glibc isn't reliable at telling us about it. So
we take the safe course.
> 4. Could the version information be shown in \l output of psql?
That's a valid idea.
> 5. Wouldn't it make sense to automatically update the collation version
> number of such databases (without any collation-related objects) at server
> startup?
See #2. We are not going to start scanning each database on each
connection to check whether it might be empty.
> 1. Why isn't it sufficient to report this message only at server startup ?
> The messages are sent to psql too, but only when logging into one of these
> mostly unused databases, so it is not observed in a regular routine.
Each database is separate. The server doesn't look at each database on
startup. This is only done when you connect to a given database.
> 2. Why is this message reported for databases with no objects or similar
> cases?
When you connect to a database, the server doesn't scan all objects in
the database to see if they might be affected. That would be too slow.
If you have no objects in the database, you can just run the provided
REFRESH command to make the warning go away.
> 3. Is there a real collation change in glibc 2.37 or is it only a version
> number mismatch with changes only in other parts of the library?
We don't know, because glibc isn't reliable at telling us about it. So
we take the safe course.
> 4. Could the version information be shown in \l output of psql?
That's a valid idea.
> 5. Wouldn't it make sense to automatically update the collation version
> number of such databases (without any collation-related objects) at server
> startup?
See #2. We are not going to start scanning each database on each
connection to check whether it might be empty.
Re: AW: BUG #17923: Excessive warnings of collation version mismatch in logs
От
Peter Eisentraut
Дата:
On 11.05.23 12:07, Hans Buschmann wrote: > The problem is not at connection time (I only mentioned it for info). > > > The problem is the endless repeating of these log entries without user > login (cut a little bit at line end): > > > 17:09:08 CEST 01000 1:> WARNING: database "template1" has a collation > version mismatch > 17:09:08 CEST 01000 2:> DETAIL: The database was created using > collation version 2.36, > 17:09:08 CEST 01000 3:> HINT: Rebuild all objects in this database > that use the defaul I guess this is from the autovacuum background processes. Maybe we should disable the warnings for those?
AW: AW: BUG #17923: Excessive warnings of collation version mismatch in logs
От
Hans Buschmann
Дата:
+1 for disabling the warnings there!
But I wonder, why this test is necessary at every autovacuum call at all?
Woudn't it be sufficient to test it at server start time only once?
Are there perhaps other places with this test?
Hans Buschmann
Re: AW: AW: BUG #17923: Excessive warnings of collation version mismatch in logs
От
Peter Eisentraut
Дата:
On 22.05.23 09:11, Hans Buschmann wrote: > But I wonder, why this test is necessary at every autovacuum call at all? > > Woudn't it be sufficient to test it at server start time only once? The main server process (postmaster) cannot access the database contents. So we can only do it when we start a session process.