Обсуждение: DROP COLLATION vs pg_collation question

Поиск
Список
Период
Сортировка

DROP COLLATION vs pg_collation question

От
Karsten Hilbert
Дата:
Dear list members,

maybe a naive question but I was unable to find an answer in
the fine manual (sv_SE being an example)

Does running

    DROP COLLATION IF EXISTS pg_catalog."sv_SE"

also remove the corresponding row from pg_collation (assuming
nothing depends on collation sv_SE) ?

Experimentation seems to hint that way but I was unable to
confirm.

Thanks
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: DROP COLLATION vs pg_collation question

От
Laurenz Albe
Дата:
On Tue, 2024-06-11 at 23:15 +0200, Karsten Hilbert wrote:
> maybe a naive question but I was unable to find an answer in
> the fine manual (sv_SE being an example)
>
> Does running
>
>     DROP COLLATION IF EXISTS pg_catalog."sv_SE"
>
> also remove the corresponding row from pg_collation (assuming
> nothing depends on collation sv_SE) ?
>
> Experimentation seems to hint that way but I was unable to
> confirm.

Yes, that will delete a row from "pg_collation".  System catalogs
like "pg_collation" is where the database metadata are actually
stored.

Note that with DROP COLLATION you can only remove collations
that belong to the encoding of your current database.

Yours,
Laurenz Albe



Re: DROP COLLATION vs pg_collation question

От
Karsten Hilbert
Дата:
> >     DROP COLLATION IF EXISTS pg_catalog."...."
>
> Yes, that will delete a row from "pg_collation".

Many thanks.

> Note that with DROP COLLATION you can only remove collations
> that belong to the encoding of your current database.

A-ha !  Can that bit be found anywhere in the docs ?

IOW, the following code is exactly useless ?

(because of the "collencoding <> _db_encoding" business ;-)

    create function gm.remove_unneeded_collations()
        returns void
        language plpgsql
        security definer
        as '
    DECLARE
        _rec record;
        _db_name text;
        _db_encoding integer;
    BEGIN
        SELECT pg_catalog.current_database() INTO _db_name;
        SELECT encoding INTO _db_encoding FROM pg_database WHERE datname = _db_name;
        RAISE NOTICE ''database [%]: removing collations for encodings other than the database encoding [%]'',
_db_name,pg_catalog.pg_encoding_to_char(_db_encoding); 
        FOR _rec IN (
            SELECT oid, collnamespace, collname, collencoding
            FROM pg_collation
            WHERE
                oid > 1000
                    AND
                collencoding IS NOT NULL
                    AND
                collencoding <> -1
                    AND
                collencoding <> _db_encoding
        ) LOOP
            RAISE NOTICE ''dropping collation #% "%.%" (encoding: %)'', _rec.oid, _rec.collnamespace::regnamespace,
_rec.collname,pg_catalog.pg_encoding_to_char(_rec.collencoding); 
            BEGIN
                EXECUTE ''DROP COLLATION IF EXISTS '' || _rec.collnamespace::regnamespace || ''."'' || _rec.collname ||
''"'';
            EXCEPTION
                WHEN undefined_object THEN RAISE NOTICE ''collation does not seem to exist (perhaps for the DB encoding
?)'';
            END;
        END LOOP;
    END;';


The reason for this being the wish to reduce the risk surface
for locale version information changes at the OS level by
removing collations not relevant to a given database.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: DROP COLLATION vs pg_collation question

От
Laurenz Albe
Дата:
On Wed, 2024-06-12 at 21:13 +0200, Karsten Hilbert wrote:
> > Note that with DROP COLLATION you can only remove collations
> > that belong to the encoding of your current database.
>
> A-ha !  Can that bit be found anywhere in the docs ?

src/backend/catalog/namespace.c:

  /*
   * get_collation_oid - find a collation by possibly qualified name
   *
   * Note that this will only find collations that work with the current
   * database's encoding.
   */
  Oid
  get_collation_oid(List *collname, bool missing_ok)

Yours,
Laurenz Albe



Re: DROP COLLATION vs pg_collation question

От
Karsten Hilbert
Дата:
Am Thu, Jun 13, 2024 at 09:49:46AM +0200 schrieb Laurenz Albe:

> > > Note that with DROP COLLATION you can only remove collations
> > > that belong to the encoding of your current database.
>
> src/backend/catalog/namespace.c:
>
>   /*
>    * get_collation_oid - find a collation by possibly qualified name
>    *
>    * Note that this will only find collations that work with the current
>    * database's encoding.
>    */
>   Oid
>   get_collation_oid(List *collname, bool missing_ok)

Thanks.

Are collations per-database or per-cluster objects ?

I am asking because I seem to not be enabled to

1) use a collation that's intended for an encoding different
   from the database encoding

    -> makes sense

2) remove a collation that's intended for an encoding
   different from the database encoding

    -> so far so good, ignore them, but

3) update collation version information in pg_collations for
   collations intended for an encoding different from the
   database encoding (ALTER COLLATION ... REFRESH VERSION fails)

which in effect would mean that -- upon change of collation
versions in the underlying operating system (ICU update, libc
update) -- one would have to live with outdated version
information in pg_collations short of dump/sed/restore or
some such ?

I'm pretty sure I am overlooking something.

Best,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: DROP COLLATION vs pg_collation question

От
Laurenz Albe
Дата:
On Fri, 2024-06-14 at 22:08 +0200, Karsten Hilbert wrote:
> Are collations per-database or per-cluster objects ?

Each database has its own "pg_collation" catalog table.

So they are local to the database, but the collations themselves
are defined by an external library, so the implementation is shared.

> I am asking because I seem to not be enabled to
>
> 3) update collation version information in pg_collations for
>    collations intended for an encoding different from the
>    database encoding (ALTER COLLATION ... REFRESH VERSION fails)
>
> which in effect would mean that -- upon change of collation
> versions in the underlying operating system (ICU update, libc
> update) -- one would have to live with outdated version
> information in pg_collations short of dump/sed/restore or
> some such ?

That should not happen.  What error do you get when you

  ALTER COLLATION ... REFRESH VERSION

Does the following give you the same error?

  ALTER DATABASE ... REFRESH COLLATION VERSION

Yours,
Laurenz Albe



Re: DROP COLLATION vs pg_collation question

От
Karsten Hilbert
Дата:
Am Sun, Jun 16, 2024 at 06:53:31AM +0200 schrieb Laurenz Albe:

> On Fri, 2024-06-14 at 22:08 +0200, Karsten Hilbert wrote:
> > Are collations per-database or per-cluster objects ?
>
> Each database has its own "pg_collation" catalog table.
>
> So they are local to the database,

I would have thought so, thanks for confirming.

> but the collations themselves
> are defined by an external library, so the implementation is shared.

Which in turn means I cannot at all truly _remove_ collations
from a cluster at the SQL level, only make them invisible
(and thereby not-to-be-used) inside a particular database by
removing them from pg_collations via DROP COLLATION, right ?

> > 3) update collation version information in pg_collations for
> >    collations intended for an encoding different from the
> >    database encoding (ALTER COLLATION ... REFRESH VERSION fails)
> >
> > which in effect would mean that -- upon change of collation
> > versions in the underlying operating system (ICU update, libc
> > update) -- one would have to live with outdated version
> > information in pg_collations short of dump/sed/restore or
> > some such ?
>
> That should not happen.  What error do you get when you
>
>   ALTER COLLATION ... REFRESH VERSION

The error I got was to the effect of insufficient permissions
(the connected user wasn't the owner of the collation).
ALTERing as superuser updated collation version information
just fine, so PEBKAC.

For the record: Yes, collation versions CAN be updated
regardless of whether a given collation applies to the
database's encoding.

> Does the following give you the same error?
>
>   ALTER DATABASE ... REFRESH COLLATION VERSION

That did not show the same error because I ran it as database
owner (?). The database owner does not own the collations,
however, which made the above fail.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: DROP COLLATION vs pg_collation question

От
Laurenz Albe
Дата:
On Sun, 2024-06-16 at 11:27 +0200, Karsten Hilbert wrote:
> > the collations themselves
> > are defined by an external library, so the implementation is shared.
>
> Which in turn means I cannot at all truly _remove_ collations
> from a cluster at the SQL level, only make them invisible
> (and thereby not-to-be-used) inside a particular database by
> removing them from pg_collations via DROP COLLATION, right ?

As far as PostgreSQL is concerned, you can remove them.

You cannot remove the C library, but a PostgreSQL user can only use
the collation if there is a collation defined in PostgreSQL.

(Actually, that's not quite true: in CREATE DATABASE, you can use
ICU collations.  That does not depend on the collations defined in
pg_collation.)

Also, there is nothing that keeps a user from running CREATE COLLATION
to create a collation in a schema where the user can CREATE objects.

I wouldn't try too hard to make it impossible for users to use a
collation you don't want.  Dropping the collations is good enough to
keep a user from using the wrong collation by mistake.

Yours,
Laurenz Albe



Re: DROP COLLATION vs pg_collation question

От
Tom Lane
Дата:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sun, 2024-06-16 at 11:27 +0200, Karsten Hilbert wrote:
>> Which in turn means I cannot at all truly _remove_ collations
>> from a cluster at the SQL level, only make them invisible
>> (and thereby not-to-be-used) inside a particular database by
>> removing them from pg_collations via DROP COLLATION, right ?

> As far as PostgreSQL is concerned, you can remove them.

It's really kind of moot, since you can't change the encoding
of an existing database.  So any pg_collation entries that are
for an incompatible encoding cannot be used for anything in that
database, and they might as well not be there.  The reason they
are there is merely an implementation detail: CREATE DATABASE clones
those catalogs from the single copy of pg_collation in template0,
which therefore had better include all collations that might be
needed.

            regards, tom lane



Re: DROP COLLATION vs pg_collation question

От
Karsten Hilbert
Дата:
Am Sun, Jun 16, 2024 at 04:38:49PM -0400 schrieb Tom Lane:

> It's really kind of moot, since you can't change the encoding
> of an existing database.  So any pg_collation entries that are
> for an incompatible encoding cannot be used for anything in that
> database, and they might as well not be there.  The reason they
> are there is merely an implementation detail: CREATE DATABASE clones
> those catalogs from the single copy of pg_collation in template0,
> which therefore had better include all collations that might be
> needed.

I see, and since any database can be used as a template for
more databases, which can be create with an encoding
different from the template, it doesn't really make too much
sense to be able to remove even pg_collation entries.

So, DROP COLLATION is somewhat of a smoking gun pointed at my
foot :-)

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: DROP COLLATION vs pg_collation question

От
Karsten Hilbert
Дата:
Am Tue, Jun 18, 2024 at 03:02:56PM +0200 schrieb Karsten Hilbert:

> I see, and since any database can be used as a template for
> more databases, which can be create with an encoding
> different from the template,

Proving myself wrong:


root@hermes:~/tmp# sudo -u postgres psql -e -f /tmp/db.sql
Output format is wrapped.
drop database if exists new_tmpl;
DROP DATABASE
create database new_tmpl with encoding = 'utf8';
CREATE DATABASE
You are now connected to database "new_tmpl" as user "postgres".
select current_database();
 current_database
------------------
 new_tmpl
(1 row)

drop database if exists new_latin1;
psql:/tmp/db.sql:8: HINWEIS:  Datenbank »new_latin1« existiert nicht, wird übersprungen
DROP DATABASE
create database new_latin1 with template = new_tmpl encoding = 'latin1' locale = 'de_DE@latin1';
psql:/tmp/db.sql:9: FEHLER:  neue Kodierung (LATIN1) ist inkompatibel mit der Kodierung der Template-Datenbank (UTF8)
TIP:  Verwenden Sie die gleiche Kodierung wie die Template-Datenbank oder verwenden Sie template0 als Template.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: DROP COLLATION vs pg_collation question

От
Adrian Klaver
Дата:
On 6/18/24 06:32, Karsten Hilbert wrote:
> Am Tue, Jun 18, 2024 at 03:02:56PM +0200 schrieb Karsten Hilbert:
> 
>> I see, and since any database can be used as a template for
>> more databases, which can be create with an encoding
>> different from the template,
> 
> Proving myself wrong:
> 
> 
> root@hermes:~/tmp# sudo -u postgres psql -e -f /tmp/db.sql
> Output format is wrapped.
> drop database if exists new_tmpl;
> DROP DATABASE
> create database new_tmpl with encoding = 'utf8';
> CREATE DATABASE
> You are now connected to database "new_tmpl" as user "postgres".
> select current_database();
>   current_database
> ------------------
>   new_tmpl
> (1 row)
> 
> drop database if exists new_latin1;
> psql:/tmp/db.sql:8: HINWEIS:  Datenbank »new_latin1« existiert nicht, wird übersprungen
> DROP DATABASE
> create database new_latin1 with template = new_tmpl encoding = 'latin1' locale = 'de_DE@latin1';
> psql:/tmp/db.sql:9: FEHLER:  neue Kodierung (LATIN1) ist inkompatibel mit der Kodierung der Template-Datenbank
(UTF8)
> TIP:  Verwenden Sie die gleiche Kodierung wie die Template-Datenbank oder verwenden Sie template0 als Template.

That is covered here:

https://www.postgresql.org/docs/16/manage-ag-templatedbs.html

"Another common reason for copying template0 instead of template1 is 
that new encoding and locale settings can be specified when copying 
template0, whereas a copy of template1 must use the same settings it 
does. This is because template1 might contain encoding-specific or 
locale-specific data, while template0 is known not to."

Substitute <db_name other then template0> for template1 and you can get 
the error you received.

> 
> Karsten
> --
> GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com