Обсуждение: Seeking the correct term of art for the (unique) role that is usually called "postgres"—and the mental model that underlies it all

Поиск
Список
Период
Сортировка
The descriptive designation "the role that owns the SQL part of the implementation of PostgreSQL" is too much of a mouthful for daily use.  And anyway, this notion captures only part of the story that makes "postgres" uniquely what it is—at least on Ubuntu.

MORE...

Here's what my empirical observations told me:

It's easy to characterize this role by describing the way that you get it and the conspicuous unique emergent properties that it has.

* You specify its name using the flag that's shown thus in response to "initdb —help"

  -U, --username=NAME       database superuser name

* It's listed as the owner of the pg_catalog schema, the objects in it, other related schemas in every existing and yet-to-be-created database, and some global things too. Loosely, it's the owner of the SQL part of the implementation of PostgreSQL.

The conventional choice is "postgres". I just did a brand-new PG installation in a brand new Ubuntu VM and I simply ended up with this name when the installation finished. (There was no chance in the installation flow to choose the name.) However, an informal survey among contacts who have PG installations on macOS showed that this "special" role ends up with the name that you gave when you first configured your new macOS for the admin O/S user. It's usually a cryptic form of one's own name—as is my "Bllewell".

But the name "database superuser name" (in "initdb" speak) is useless as a term of art for naming the phenomenon because you can have an unlimited number of roles that are created "with superuser" in a PG cluster.

In another context, the comments in the shipped "pg_hba.conf" file (at least on Ubuntu) include these:

# Database administrative login by Unix domain socket

for (in my case) this line:

local   all             postgres                                peer

(I failed when I tried to add a new one of my own. See below. But I assume that it must be possible—also for a superuser.)

I noticed that in my case, the bare "psql" O/S command connects me to "-d postgres -U postgres" without a password challenge. And the setup had been done by the installation. Is "postgres" role uniquely able to connect in this way with no password challenge? And might "the administrative role" be the term of art that I'm seeking?

— — — — — — — — — — — — — — — — — — — — 

* B.t.w., I tried to set up "peer" authentication for a brand new O/S user that I called "usr" to match a brand new cluster role that I also called "usr". I added a new line in "pg_hba.conf" thus:

local   all             usr                                     peer

(But there already is such a line for the special name "all".)

And I added a new line in "pg_ident.conf" (before, there were none at all) thus:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME

usr             usr                     usr

But this attempt to connect:

psql -d postgres -U usr

failed with this error:

connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "usr"

while this attempt:

psql -h localhost -p 5432 -d postgres -U usr

happily suceeded. I clearly missed some essential other steps. But the doc didn't x-ref me to these.

I also tried this:

initdb \
  -U usr --encoding UTF8 --locale=C --lc-collate=C --lc-ctype=en_US.UTF-8 \
  -A md5 --pwfile=/etc/ybmt-code/misc/.superuser-passwd-for-initdb \
  -D /var/lib/postgresql/11/main

It succeeded. And, after re-start, I could connect as "usr". But I still could not do this using the "peer" method. I saw that, now, "usr" owns the within-cluster PG implementation artifacts.

However, while "initdb" was working, it said this:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

So "postgres" is clearly special in ways other than just as the name of the owner of the 
within-cluster implementation. And this was thrust upon me simply by using the recommended "apt install postgresql-11 method. I had no say at all in the choice of this name. (as it happens, I did have a Linux user called "postgres" before I started. But I seem to recall, from PG installations on Ubuntu that I did a few years ago, that the Linux user "postgres" was simply created for me when I didn't already have it.

Where can I read a nice, linear, soup-to-nuts acount of this whole business that introduces, and that consistently uses, the proper terms of art?

On Wed, Oct 26, 2022 at 6:33 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
The descriptive designation "the role that owns the SQL part of the implementation of PostgreSQL" is too much of a mouthful for daily use. 

Don't think it's documented but I like "bootstrap user" which I've seen bandied about here a bit.

It isn't that special but if the bootstrap user name and o/s user name are not the same name then you've broken an almost universal convention that exists to make stuff like logging it with peer authentication work better.

David J.


2022年10月27日(木) 11:00 David G. Johnston <david.g.johnston@gmail.com>:
>
> On Wed, Oct 26, 2022 at 6:33 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
>>
>> The descriptive designation "the role that owns the SQL part of the implementation of PostgreSQL" is too much of a
mouthfulfor daily use. 
>
>
> Don't think it's documented but I like "bootstrap user" which I've seen bandied about here a bit.

"bootstrap superuser" is also mentioned a few times in the docs, see e.g.:

    https://www.postgresql.org/docs/devel/sql-grant.html

This recent commit: e530be2c5ce77475d56ccf8f4e0c4872b666ad5f [1] might
also be of interest
to anyone considering the "special-ness" of this role.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e530be2c5ce77475d56ccf8f4e0c4872b666ad5f
Regards

Ian Barwick



Can you stop sending me message idk why I need my email back thanks

Sent from my iPhone

> On Oct 26, 2022, at 7:11 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
>
> 2022年10月27日(木) 11:00 David G. Johnston <david.g.johnston@gmail.com>:
>>
>>> On Wed, Oct 26, 2022 at 6:33 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
>>>
>>> The descriptive designation "the role that owns the SQL part of the implementation of PostgreSQL" is too much of a
mouthfulfor daily use. 
>>
>>
>> Don't think it's documented but I like "bootstrap user" which I've seen bandied about here a bit.
>
> "bootstrap superuser" is also mentioned a few times in the docs, see e.g.:
>
>    https://www.postgresql.org/docs/devel/sql-grant.html
>
> This recent commit: e530be2c5ce77475d56ccf8f4e0c4872b666ad5f [1] might
> also be of interest
> to anyone considering the "special-ness" of this role.
>
> [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e530be2c5ce77475d56ccf8f4e0c4872b666ad5f
> Regards
>
> Ian Barwick
>
>



On 10/26/22 18:33, Bryn Llewellyn wrote:
> The descriptive designation "the role that owns the SQL part of the 
> implementation of PostgreSQL" is too much of a mouthful for daily use. 
>   And anyway, this notion captures only part of the story that makes 
> "postgres" uniquely what it is—at least on Ubuntu.
> 
> MORE...
> 
> Here's what my empirical observations told me:
> 
> It's easy to characterize this role by describing the way that you get 
> it and the conspicuous unique emergent properties that it has.
> 
> * You specify its name using the flag that's shown thus in response to 
> "initdb —help"
> 
>    -U, --username=NAME       database superuser name

How much time would it have taken to go to the docs?:

https://www.postgresql.org/docs/current/app-initdb.html

"-U username
--username=username

     Selects the user name of the database superuser. This defaults to 
the name of the effective user running initdb. It is really not 
important what the superuser's name is, but one might choose to keep the 
customary name postgres, even if the operating system user's name is 
different.
"

> 
> * It's listed as the owner of the pg_catalog schema, the objects in it, 
> other related schemas in every existing and yet-to-be-created database,

You need to define 'other related schemas'.


> and some global things too. Loosely, it's the owner of the SQL part of 
> the implementation of PostgreSQL.
> 
> The conventional choice is "postgres". I just did a brand-new PG 
> installation in a brand new Ubuntu VM and I simply ended up with this 
> name when the installation finished. (There was no chance in the 
> installation flow to choose the name.) However, an informal survey among 
> contacts who have PG installations on macOS showed that this "special" 
> role ends up with the name that you gave when you first configured your 
> new macOS for the admin O/S user. It's usually a cryptic form of one's 
> own name—as is my "Bllewell".
> 
> But the name "database superuser name" (in "initdb" speak) is useless as 
> a term of art for naming the phenomenon because you can have an 
> unlimited number of roles that are created "with superuser" in a PG cluster.
> 
> In another context, the comments in the shipped "pg_hba.conf" file (at 
> least on Ubuntu) include these:
> 
> # Database administrative login by Unix domain socket
> 
> for (in my case) this line:
> 
> local   all             postgres                                peer
> 
> (I failed when I tried to add a new one of my own. See below. But I 
> assume that it must be possible—also for a superuser.)
> 
> I noticed that in my case, the bare "psql" O/S command connects me to 
> "-d postgres -U postgres" without a password challenge. And the setup 
> had been done by the installation. Is "postgres" role uniquely able to 
> connect in this way with no password challenge? And might "the 
> administrative role" be the term of art that I'm seeking?

Nothing unique. The OS user postgres has been setup to login as database 
role user by the installation.

For it to  work you have to be operating as the OS user postgres. I'm 
guessing that is why your attempt as usr failed, you where not running 
as the OS user usr.

> 
> — — — — — — — — — — — — — — — — — — — —
> 
> * B.t.w., I tried to set up "peer" authentication for a brand new O/S 
> user that I called "usr" to match a brand new cluster role that I also 
> called "usr". I added a new line in "pg_hba.conf" thus:
> 
> local   all             usr                                     peer
> 
> (But there already is such a line for the special name "all".)

Read:

https://www.postgresql.org/docs/current/auth-peer.html

"

map

     Allows for mapping between system and database user names. See 
Section 21.2 for details.

"

Section 21.2

https://www.postgresql.org/docs/current/auth-username-maps.html

"The pg_ident.conf file is read on start-up and when the main server 
process receives a SIGHUP signal. If you edit the file on an active 
system, you will need to signal the postmaster (using pg_ctl reload, 
calling the SQL function pg_reload_conf(), or using kill -HUP) to make 
it re-read the file."
> 
> And I added a new line in "pg_ident.conf" (before, there were none at 
> all) thus:
> 
> # MAPNAME       SYSTEM-USERNAME         PG-USERNAME
> 
> usr             usr                     usr
> 
> But this attempt to connect:
> 
> psql -d postgres -U usr
> 
> failed with this error:
> 
> connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" 
> failed: FATAL:  Peer authentication failed for user "usr"
> 
> while this attempt:
> 
> psql -h localhost -p 5432 -d postgres -U usr
> 
> happily suceeded. I clearly missed some essential other steps. But the 
> doc didn't x-ref me to these.
> 
> I also tried this:
> 
> initdb \
>    -U usr --encoding UTF8 --locale=C --lc-collate=C --lc-ctype=en_US.UTF-8 \
>    -A md5 --pwfile=/etc/ybmt-code/misc/.superuser-passwd-for-initdb \
>    -D /var/lib/postgresql/11/main
> 
> It succeeded. And, after re-start, I could connect as "usr". But I still 
> could not do this using the "peer" method. I saw that, now, "usr" owns 
> the within-cluster PG implementation artifacts.


Again because you probably where not running as OS user usr.

> 
> However, while "initdb" was working, it said this:
> 
> The files belonging to this database system will be owned by user 
> "postgres".
> This user must also own the server process.
> 
> So "postgres" is clearly special in ways other than just as the name of 
> the owner of the within-cluster implementation. And this was thrust upon 
> me simply by using the recommended "apt install postgresql-11 method. I 
> had no say at all in the choice of this name. (as it happens, I did have 
> a Linux user called "postgres" before I started. But I seem to recall, 
> from PG installations on Ubuntu that I did a few years ago, that the 
> Linux user "postgres" was simply created for me when I didn't already 
> have it.

Again you are not getting the distinction between OS and database user.
The directory /var/lib/postgresql/11/main is owned by postgres.postgres 
so any files created in it will be, as the message stated, owned by OS 
user postgres.

Going back to this:

"
-U username
--username=username

     Selects the user name of the database superuser. This defaults to 
the name of the effective user running initdb. It is really not 
important what the superuser's name is, but one might choose to keep the 
customary name postgres, even if the operating system user's name is 
different.
"

in the database cluster(the SQL part) itself the 'owning' database role 
will be usr.

The package  installation set up an OS user postgres that runs the OS 
side of the operation e.g the server code. It also by default uses that 
same name as the database superuser when creating a new cluster. This 
user then owns the SQL side. You can, however, change the SQL 'owner' 
for new cluster as you did.

> 
> *Where can I read a nice, linear, soup-to-nuts acount of this whole 
> business that introduces, and that consistently uses, the proper terms 
> of art?*
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




(David and Ian, I'm resending this because, I see that I managed to omit "pgsql-general@lists.postgresql.org" from the addressee list. So, of course, it didn't show up in the "pgsql-general" archive.)

barwick@gmail.com wrote:

david.g.johnston@gmail.com:

bryn@yugabyte.com wrote:

The descriptive designation "the role that owns the SQL part of the implementation of PostgreSQL" is too much of a mouthful for daily use.

Don't think it's documented but I like "bootstrap user" which I've seen bandied about here a bit. It isn't that special but if the bootstrap user name and o/s user name are not the same name then you've broken an almost universal convention that exists to make stuff like logging in with peer authentication work better.

"bootstrap superuser" is also mentioned a few times in the [upcoming] docs, see e.g.

https://www.postgresql.org/docs/devel/sql-grant.html

This recent commit: e530be2c5ce77475d56ccf8f4e0c4872b666ad5f [1] might also be of interest to anyone considering the "special-ness" of this role.

[1] https://git.postgresql.org/gitweb/?p%3Dpostgresql.git;a%3Dcommit;h%3De530be2c5ce77475d56ccf8f4e0c4872b666ad5f

Thanks. David and Ian. I'll take this:

If GRANTED BY is specified, the grant is recorded as having been done by the specified role. A user can only attribute a grant to another role if they possess the privileges of that role. The role recorded as the grantor must have ADMIN OPTION on the target role, unless it is the bootstrap superuser. When a grant is recorded as having a grantor other than the bootstrap superuser, it depends on the grantor continuing to possess ADMIN OPTION on the role; so, if ADMIN OPTION is revoked, dependent grants must be revoked as well.

from an upcoming version of the "grant" statement doc, to be a sufficient establishment of the canonical status of the term of art that I sought. I'll adopt the term "bootstrap superuser" (and not plain "bootstrap user") and I'll assume that everybody on this list (at least anybody who might answer my questions) shares the same, and immediate, understanding of the term—which implies this:

This invariant must hold if an "ordinary" within-cluster  superuser is to qualify as the cluster's "bootstrap superuser":

the name of the bootstrap superuser's within-cluster role

AND

the name of the O/S user that owns lots of (but not all*) the software files that define the PostgreSQL RDBMS, together with the various files that represent what users create

are identical.


[*] I see that, in my Ubuntu installation, critical programs like "postgres" itself, "initdb", "pg_ctl", "pg_dump" and so on are owned by "root".

The fact that the "bootstrap superuser" term of art denotes a matching pair of two principals (an O/S user and a within-cluster role) means that some sentences will require extra verbiage to identify which half of the pair the sentence treats. I'm open to suggestions. But I'll start with these these I'm corrected: the "bootstrap (regular) OS-user" and the "bootstrap within cluster superuser role". Sadly, the fact that "super" is baked into the term of art makes it difficult to name the O/S half of the phenomenon.

I can now characterize what I'd observed more clearly, thus: only a bootstrap super user (as defined above) can start a session without mentioning the name of the database to which to connect and the name of the within-cluster role to connect as—and without supplying a password. And it can do this only from as O/S session where the effective O/S user is the bootstrap superuser.

It seems, too, that one would be stupid to call the bootstrap superuser anything other than "postgres". Notice that this implies that the typical macOS regime (where my bootstrap super user is called "Bllewell" and has to be double-quoted in SQL, and yours is called "sagrawal") is, indeed, stupidly unconventional.

All this implies a little test. Here, I'll save typing by saying that my bootstrap superuser is called "postgres". I created a second database in a freshly created cluster called "x". And then, from the O/S, I tried this:

psql -d x

That worked fine. But, having said this, it would seem that it would be so very unconventional (given that you've already agreed to call your bootstrap superuser "postgres", not to make a database called "postgres" available too. (I tested that by dropping my "postgres" database. (This is the freedom that a VM with a nice snapshot together with the ability to start afresh with "initdb" brings.) Now, the bare "psql" causes the error "FATAL:  database "postgres" does not exist".


The descriptive designation "the role that owns the SQL part of the implementation of PostgreSQL" is too much of a mouthful for daily use. And anyway, this notion captures only part of the story that makes "postgres" uniquely what it is—at least on Ubuntu.

How much time would it have taken to go to the docs:

«
https://www.postgresql.org/docs/current/app-initdb.html

Selects the user name of the database superuser. This defaults to the name of the effective user running initdb. It is really not important what the superuser's name is, but one might choose to keep the customary name postgres, even if the operating system user's name is different.
»

I HAD read that. The phrase occurs three times on that page. But the account doesn't define the term. Rather, it's used as if everybody knows what it means. Yet there's no x-ref to where the definition is. I did refer, albeit implicitly, to this doc by citing the text that "initdb --help" gives. The doc and the help say the same thing. You, Adrian, often accuse me of being too wordy. So I  catered to you by not using words to spell out what I just spelled out here.

A PG cluster has lots of databases. Lots of things have names whose uniqueness scope is (maximally) a single database. Just a couple of things, and roles in particular, need names that are unique in the cluster as a whole. Therefore, the term "database superuser" is tautologically wrong. It should, at least, be "cluster superuser". But then, like I said, you can have as many superusers as you please in a single cluster. So the idea that one is singled out as *THE* [cluster] superuser didn't make sense to me. Moreover, the "initdb" doc says that it doesn't really matter if what it calls the "database superuser" has the same name as the O/S ussr that owns (most of) the PG installation and cluster content. But David pointed out here:


that you lose a lot if these two sides of the same coin don't have the same name. (So the "initdb" doc would be improved by an x-ref to the discussion of the consequences of the name choices here.)

This implies that there's still a missing term of art that denotes the nicely matched *pair* of within-cluster role and O/S user.

I said all this in my reply to David and Ian Barwick. I did send it about half an hour before you wrote this. But I see now that I'd managed to omit "pgsql-general@lists.postgresql.org" from the addressee list. (Yes, another of my notorious typos.) I resent it moments ago. It's here:


...listed as the owner of the pg_catalog schema, the objects in it, other related schemas...

You need to define 'other related schemas'.

select nspname
from pg_namespace n inner join pg_roles r on n.nspowner = r.oid
where r.rolname = 'postgres';

produces this:

pg_toast
pg_catalog
information_schema
...

Same point as before. You and others in the cohort of "the pgsql-general list lawmakers" have made me nervous about spelling things out 'cos doing so uses words and code—and often I've been told off for being too wordy. This is a pity because accuracy and precision inevitably compete with brevity.

...For it to  work you have to be operating as the OS user postgres. I'm guessing that is why your attempt as usr failed, you where not running as the OS user usr.

No, I've been super-aware of the current identity of the O/S user in all tests, I've typed "whoami" more times in the last few days than before in my whole life to date.

I tried to set up "peer" authentication for a brand new O/S user that I called "usr" to match a brand new cluster role that I also called "usr". I added a new line in "pg_hba.conf" thus:
local   all             usr                                     peer

Read:
https://www.postgresql.org/docs/current/auth-peer.html

I had. And I'd followed the link to Section 21.2:
«
"The pg_ident.conf file is read on start-up and when the main server process receives a SIGHUP signal. If you edit the file on an active system, you will need to signal the postmaster (using pg_ctl reload, calling the SQL function pg_reload_conf(), or using kill -HUP) to make it re-read the file.
»

To be sure, I did the whole thing again now. (And, yes, my O/S user is "postgres", at the start of this account.) Here's the relevant part of the output from "cat /etc/passwd": 

postgres:x:1001:1001:,,,:/home/postgres:/bin/bash
usr:x:1002:1001:,,,:/home/usr:/bin/bas
h

I did "sudo systemctl stop postgresql". Then I made sure that this line:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
usr             usr                     usr

was present in this file:

/etc/postgresql/11/main/pg_ident.conf

Then I did this:

rm -Rf /var/lib/postgresql/11/main

initdb \
  -U usr --encoding UTF8 --locale=C --lc-collate=C --lc-ctype=en_US.UTF-8 \
  -A md5 --pwfile=/etc/ybmt-code/misc/.superuser-passwd-for-initdb \
  -D /var/lib/postgresql/11/main


It finished with "Success. You can now start the database server...". That should read:

« can now start the cluster server »

and not "database server", yes? Anyway, I used "sudo systemctl start postgresql" to start it and not "pg_ctl ... start" like the prompt says. (That's a different area of concern. But I won't go there now.)

Then I did this (even though it seems to me that having stopped and blown away the old cluster and then having started a brand new one, this would be superfluous):

pg_ctl reload -D /var/lib/postgresql/11/main

It responded with "server signaled" (and nothing else).

Next, I did "su usr" and confirmed that I'd got where I intended to with "whomai". Then I started a session with this command:

psql -h localhost -p 5432 -d postgres -U usr

That worked fine, so I did this:

select nspname
from pg_namespace n inner join pg_roles r on n.nspowner = r.oid
where r.rolname = 'usr';

It produced this:

pg_toast
pg_temp_1
pg_toast_temp_1
pg_catalog
public
information_schema

like I'd expected. So, according to the "initdb" doc, "usr" is very definitely the database superuser.

So then I tried the bare "psql". It failed with this error:

connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  database "usr" does not exist

This is exactly what I described in my email to David and Ian (URL above).

Again because you probably where not running as OS user usr.

No, like I just showed you, I was running as "usr". 

However, while "initdb" was working, it said « The files belonging to this database system will be owned by user "postgres". This user must also own the server process.» So "postgres" is clearly special in ways other than just as the name of the owner of the within-cluster implementation. [but the name "postgres" from the O/S user] was thrust upon me simply by using the recommended "apt install postgresql-11" method. I had no say at all in the choice of this name. (as it happens, I did have a Linux user called "postgres" before I started. But I seem to recall, from PG installations on Ubuntu that I did a few years ago, that the Linux user "postgres" was simply created for me when I didn't already have it.

Again you are not getting the distinction between OS and database user. The directory /var/lib/postgresql/11/main is owned by postgres.postgres so any files created in it will be, as the message stated, owned by OS user postgres. Going back to this:

«
-U username
--username=username

   Selects the user name of the database superuser. This defaults to the name of the effective user running initdb. It is really not important what the superuser's name is, but one might choose to keep the customary name postgres, even if the operating system user's name is different.
«

in the database cluster (the SQL part) itself the 'owning' database role will be usr. The package  installation set up an OS user postgres that runs the OS side of the operation e.g the server code. It also by default uses that same name as the database superuser when creating a new cluster. This user then owns the SQL side. You can, however, change the SQL 'owner' for new cluster as you did.

That's unfair. I do appreciate the distinction. And I tried my best to show this in what I wrote. Moreover, my empirical tests seem to show that you can start a session without specifying the name of the cluster role as which to authorize, its password, and the name of the database to which to connect ONLY when these things are true:

1. The within-cluster, uniquely special, role that owns the catalogs and similar (designated as the "database superuser") has a certain name, say "pg_system".

2.  The O/S user that owns (most of) the O/S presence of the cluster and the software that accesses it has the identical name "pg_system".

3. The current O/S user when you make the attempt to connect is "pg_system".

It's uncomfortable when I've merely speculated that this is the rule. And I want to read the definitive account. That's why I asked this:

*Where can I read a nice, linear, soup-to-nuts account of this whole business that introduces, and that consistently uses, the proper terms of art?*

Maybe there simply is no such account. And maybe one simply has to pick up the correct "do this, and this happens" understanding—without the support of an explicated mental model with suitable associated terms of art. But I think that my question is fair—and that it deserves an answer.

It's essential to understand this when you perform the Ubuntu installation as the doc specifies. If you don't, then you'll be stuck when you want to try the first, and critical, test for a successful installation—to start a session using psql. If you know what conventions the installation follows, and if you know the mental model, then you'll know that you can do a bare "psql" or, if you prefer, this as the "pstgres" O/S user

psql -c "alter role postgres with password 'x'";

where, here, the names of both halves of the coin were non-negotiably chosen by the installation flow. But if you don't know these things, then you'll be stuck.


I can now characterize what I'd observed more clearly, thus: only a bootstrap super user (as defined above) can start a session without mentioning the name of the database to which to connect and the name of the within-cluster role to connect as—and without supplying a password. And it can do this only from as O/S session where the effective O/S user is the bootstrap superuser.


I don't believe this is correct.  psql is using libpq.  A hint to that is given in the section under the \c command:

When the command neither specifies nor reuses a particular parameter, the libpq default is used.

The libpq docs show more info on the connection parameters (https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS)

If not specified, both the user and the dbname default to the name of your operating system user.  So, this could work if your operating system user, the database name, and the database username are all postgres, but they could also all be bob.

As to the password requirement - this depends on the settings in pg_hba.conf.  You could set all connections to trust and then no one would need a password, but I wouldn't recommend that.

  -Jeremy
 
 
On Thu, Oct 27, 2022 at 12:09 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
[*] I see that, in my Ubuntu installation, critical programs like "postgres" itself, "initdb", "pg_ctl", "pg_dump" and so on are owned by "root".

And they exist in a "bin" directory so that any user on the system can actually execute them.  It doesn't really matter who owns the binaries so far as the database is concerned, it matters who executes them.


The fact that the "bootstrap superuser" term of art denotes a matching pair of two principals (an O/S user and a within-cluster role)

No, it does not.  It denotes only the PostgreSQL role.  "service user" is probably a better term for the O/S side of things.  Though, frankly, aside from trying to distinguish things when talking about logging in, the necessity to even care about the O/S user is fairly minimal.
 
means that some sentences will require extra verbiage to identify which half of the pair the sentence treats. I'm open to suggestions. But I'll start with these these I'm corrected: the "bootstrap (regular) OS-user" and the "bootstrap within cluster superuser role". Sadly, the fact that "super" is baked into the term of art makes it difficult to name the O/S half of the phenomenon.

I can now characterize what I'd observed more clearly, thus: only a bootstrap super user (as defined above) can start a session without mentioning the name of the database to which to connect and the name of the within-cluster role to connect as—and without supplying a password. And it can do this only from as O/S session where the effective O/S user is the bootstrap superuser.

That is generally wrong - it is only correct when considering a newly initialized cluster - since then the only database that exists is the postgres database and you cannot connect to a cluster without specifying an existing database (so if you don't want to explicitly specify one you better arrange things so the default you end up using is postgres, which means your O/S user has to be postgres).  Otherwise, as your "usr" example demonstrates, just create a database named "usr" and you won't get the "database usr not found" error message anymore and the login will succeed.

David J.

jeremy@musicsmith.net wrote:

bryn@yugabyte.com wrote:

I can now characterize what I'd observed more clearly, thus: only a bootstrap super user (as defined above) can start a session without mentioning the name of the database to which to connect and the name of the within-cluster role to connect as—and without supplying a password. And it can do this only from as O/S session where the effective O/S user is the bootstrap superuser.

I don't believe this is correct.  psql is using libpq.  A hint to that is given in the section under the \c command:

When the command neither specifies nor reuses a particular parameter, the libpq default is used.

The libpq docs show more info on the connection parameters (https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS)

If not specified, both the user and the dbname default to the name of your operating system user.  So, this could work if your operating system user, the database name, and the database username are all postgres, but they could also all be bob.

As to the password requirement - this depends on the settings in pg_hba.conf.  You could set all connections to trust and then no one would need a password, but I wouldn't recommend that.

Thank you for this extra information, Jeremy. I'm afraid that what I wrote was insufficiently precise. I should have added these riders:

(1) My O/S env is in a brand new VM (it happens to be Parallels on my Big Sur Mac Book) that was created by a single button press that selected Ubuntu 20.04. (I followed that with what I hope is only benign customization for terminal colors and the like.)

(2) My PG regime is what I end up with in this VM immediately following the use of "apt install postgresql-11". (There's a good reason why I want that old version. I hope that its age isn't a distraction here.) Notably, the installation flow offers no opportunity to express choices.

(3) This gives me the PG software (largely owned by the O/S user "postgres", but with some programs owned by "root") and an already started cluster.

(4) I am able to start a plsql session, when my O/S user is "postgres" simply by typing the bare command "psql". Yes, I'm implicitly selecting various libpq default values—just as you described. But those defaults don't include a default for the password.

(5) When I start a session in this way, I see that I have this regime: a single role with the name "postgres" and the status "superuser"; and a single database non-template database also with the name "postgres", together with the usual "template1" and "template2". Further, this query (when connected to the "postgres" database):

select nspname
from pg_namespace n inner join pg_roles r on n.nspowner = r.oid
where r.rolname = 'postgres'
order by 1;

shows me this:

pg_toast
pg_temp_1
pg_toast_temp_1
pg_catalog
public
information_schema

When I asked how to refer to this clearly special cluster-role, David said:

Don't think it's documented but I like "bootstrap user"

and Ian said:

"bootstrap superuser" is also mentioned.

Meanwhile, in a separate thread, Adrian pointed me to the "initdb" doc (and command line help) where the term "database superuser" is used.

This means that I'm so far denied the possibility to use a single term that everybody agrees on. I may as well call it the "catalog owning role" here because at least that term is unambiguously descriptive.

I stated in my reply to Adrian that  I had formed this hypothesis (reworded slightly here).

When he environment is what I described at the start (which env. brings a "pg_hba.conf" file that requires password authentication by NOT specifying "trust"),

«
You can start a session without specifying the name of the cluster role as which to authorize, its password, and the name of the database to which to connect, ONLY when these things are true:

1. The within-cluster catalog-owning role has a certain name, say "pg_system" (or "bob").

2.  The O/S user that owns (most of) the O/S presence of the cluster and the software that accesses it has the identical name "pg_system"
 (or "bob").

3. The current O/S user when you make the attempt to connect is "pg_system"
 (or "bob").
»

I want to know if my hypothesis is correct. And, more importantly, I want to know where I can read a nicely written linear account of what *is* correct tha defines and then uses the official terms of art.

On Thu, Oct 27, 2022 at 3:24 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
«
You can start a session without specifying the name of the cluster role as which to authorize, its password, and the name of the database to which to connect, ONLY when these things are true:

1. The within-cluster catalog-owning role has a certain name, say "pg_system" (or "bob").

2.  The O/S user that owns (most of) the O/S presence of the cluster and the software that accesses it has the identical name "pg_system"
 (or "bob").

3. The current O/S user when you make the attempt to connect is "pg_system"
 (or "bob").
»

I want to know if my hypothesis is correct.

It is not.  "By default" probably, but not "only".

Peer authentication means:

If the local O/S user (bob) running "psql" requests to login to the database using that same* role name (bob) and the role exists in the cluster, accept the authentication attempt.

* You can implement aliases by using an identity mapping.

Nothing more, nothing less.
 
And, more importantly, I want to know where I can read a nicely written linear account of what *is* correct tha defines and then uses the official terms of art.

It doesn't exist, deal with it.  Most people just call the "bootstrap" role "postgres" when not talking about a specific installed cluster that happens to use something different.  In any case, no matter what terminology is used everyone seems to figure out what is being referred to from context at least and largely don't make a big deal about it.  Unless you actually want to write the documentation just pick something you like and go with it.


On Thu, Oct 27, 2022 at 12:09 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:

This invariant must hold if an "ordinary" within-cluster  superuser is to qualify as the cluster's "bootstrap superuser":

the name of the bootstrap superuser's within-cluster role

AND

the name of the O/S user that owns lots of (but not all*) the software files that define the PostgreSQL RDBMS, together with the various files that represent what users create

are identical.


Nope, the name of the bootstrap user is the one supplied to initdb via the --username argument.  Period.  It need not match any name on the host operating system and it will still be the bootstrap superuser's role name.

Yes, the description for --username probably should be modified to read:

"Selects the user name of the cluster's bootstrap superuser."  Or just consider a "cluster superuser" the term d'art...since most people would just refer to any old role having superuser authorization as being plain ole "superuser".  The fact that is says "database superuser" is the same holdover effect as the fact that "init db" means "init database" even though it actually initializes a cluster.

David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Yes, the description for --username probably should be modified to read:

> "Selects the user name of the cluster's bootstrap superuser."

Yeah, perhaps.  The term "bootstrap superuser" is reasonably well
established by now --- I count half a dozen uses in our SGML docs
and another dozen or so in the code --- and it's certainly more
specific than "database superuser".  We should probably create
a glossary entry for it and then change all the uses of "database
superuser" as appropriate.

However ... it looks to me like some of those uses just mean to
distinguish between Postgres-specific superuser-dom as opposed
to whatever the term might mean out in the operating system.
But I'm not sure that anybody really uses that term for an OS-level
concept on any popular OS, so it feels a bit pedantic as well
as confusing.  Should we leave those usages alone, or reduce them
to just "superuser"?

            regards, tom lane



david.g.johnston@gmail.com wrote:

bryn@yugabyte.com wrote:

The fact that the "bootstrap superuser" term of art denotes a matching pair of two principals (an O/S user and a within-cluster role)

No, it does not.  It denotes only the PostgreSQL role.  "service user" is probably a better term for the O/S side of things.  Though, frankly, aside from trying to distinguish things when talking about logging in, the necessity to even care about the O/S user is fairly minimal.

[about your "usr" example] just create a database named "usr" and you won't get the "database usr not found" error message anymore and the login will succeed.

Thank you very much David. The scales have now finally fallen from my eyes. I know now that in order to be able to start a client session from the O/S of the machine where the PG software and cluster live, without needing to supply a password even when "pg_hba.conf" asks for password authentication, it's sufficient to do this (using my "usr" example):

(0) Simply leave the regime in place where the catalog-owning role is called "postgres" and the cluster's data files and other config files are owned by postgres.

(1) create a new database role thus (where "password null" is just so that I can prove a point here):

create role usr with login password null;

(2) Add this line under the existing final comment in the shipped copy of "pg_ident.conf" thus:

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
usr             usr                     usr          # Added by Bryn

(It seems that I could set the first field of this line to "dog"—but I won't test that.

(3) Add this line between the existing two in the shipped copy of "pg_hba.conf" thus:

local   all             postgres                                peer # See the essay at the start.
local   all             usr                                     peer # 
Added by Bryn
local   all             all                                     peer

(My copy of this file specifies "md5" and not "trust".)

I'd've thought that "all" would mean any O/S user existing, or yet-to-be-created. But the comment in the shipped "pg_hba.conf" says this:

# If you change this first entry you will need to make sure that the
# database superuser can access the database using some other method.
...
# Database administrative login by Unix domain socket
local   all             postgres                                peer

(So two terms for the one notion just a couple of lines apart!) I'll do the empirical test presently. Anyway, with these conditions met, I can "su usr" and then start a session like this:

psql -d postgres

Yes, your point about what artifacts exist the moment after "initdb" finishes is taken. So I finished my test by (after authorizing as "postgres") creating a database "usr" and granting "connect" on it to "usr".) Then I could create a new session from the O/S prompt when "whoami" shows "user" with the bare "psql"—just as I could the moment after the PG install finished from the O/S prompt when "whoami" shows "postgres".

I did think that I'd tried all this at the outset. But clearly I must've missed one of those steps or done a typo.

> david.g.johnston@gmail.com wrote:
>
>> bryn@yugabyte.com wrote:
>>
>> «
>> You can start a session without specifying the name of the cluster role as which to authorize, its password, and the
nameof the database to which to connect, ONLY when these things are true: 
>>
>> 1. The within-cluster catalog-owning role has a certain name, say "pg_system" (or "bob").
>>
>> 2.  The O/S user that owns (most of) the O/S presence of the cluster and the software that accesses it has the
identicalname "pg_system" (or "bob"). 
>>
>> 3. The current O/S user when you make the attempt to connect is "pg_system" (or "bob").
>> »
>>
>> I want to know if my hypothesis is correct.
>
> It is not.  "By default" probably, but not "only". Peer authentication means:
>
> If the local O/S user (bob) running "psql" requests to login to the database using that same* role name (bob) and the
roleexists in the cluster, accept the authentication attempt. 
>
> * You can implement aliases by using an identity mapping.
>
> Nothing more, nothing less.
>
>> And, more importantly, I want to know where I can read a nicely written linear account of what *is* correct tha
definesand then uses the official terms of art. 
>
> It doesn't exist, deal with it. Most people just call the "bootstrap" role "postgres" when not talking about a
specificinstalled cluster that happens to use something different. In any case, no matter what terminology is used
everyoneseems to figure out what is being referred to from context at least and largely don't make a big deal about it.
Unless you actually want to write the documentation just pick something you like and go with it. 

Yes, all is clear now. Thanks again. And thanks for the dispensation to choose my term of art. I'll use "catalog-owning
role"from now on. 



This invariant must hold if an "ordinary" within-cluster  superuser is to qualify as the cluster's "bootstrap superuser":

the name of the bootstrap superuser's within-cluster role

AND

the name of the O/S user that owns lots of (but not all*) the software files that define the PostgreSQL RDBMS, together with the various files that represent what users create

are identical.

Nope, the name of the bootstrap user is the one supplied to initdb via the --username argument.  Period. It need not match any name on the host operating system and it will still be the bootstrap superuser's role name.

Yes, the description for --username probably should be modified to read:

"Selects the user name of the cluster's bootstrap superuser." Or just consider a "cluster superuser" the term d'art...since most people would just refer to any old role having superuser authorization as being plain ole "superuser".  The fact that is says "database superuser" is the same holdover effect as the fact that "init db" means "init database" even though it actually initializes a cluster.

Thanks again, David. And once again, all is clear now.

> tgl@sss.pgh.pa.us> wrote:
>
>> david.g.johnston@gmail.com wrote:
>>
>> Yes, the description for --username probably should be modified to read:
>>
>> "Selects the user name of the cluster's bootstrap superuser."
>
> Yeah, perhaps. The term "bootstrap superuser" is reasonably well established by now --- I count half a dozen uses in
ourSGML docs and another dozen or so in the code --- and it's certainly more specific than "database superuser". We
shouldprobably create a glossary entry for it and then change all the uses of "database superuser" as appropriate. 
>
> However ... it looks to me like some of those uses just mean to distinguish between Postgres-specific superuser-dom
asopposed to whatever the term might mean out in the operating system. But I'm not sure that anybody really uses that
termfor an OS-level 
> concept on any popular OS, so it feels a bit pedantic as well as confusing.  Should we leave those usages alone, or
reducethem to just "superuser"? 

Thanks, Tom. I'd certainly appreciate an entry in "Appendix M. Glossary" for the term of art that I've been struggling
toname. Until I hear what the experts decide, I'll use "catalog-owning role". Then I'll switch to the newly blessed
term.

I'm afraid that I didn't get your point in your last paragraph. The terms "within-cluster role" and "O/S user" seem to
capturethe distinction when the context doesn't make it clear. I aim never to use the term "user" for the
within-clusterphenomenon. After all, an existing within-cluster role can flip between "with nologin" and "with login"
atthe drop of a hat of the guy "with createrole". 


On 10/27/22 17:20, Bryn Llewellyn wrote:
>> david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com> wrote:
>>
>>> bryn@yugabyte.com <mailto:bryn@yugabyte.com> wrote:
>>>
>>> The fact that the "bootstrap superuser" term of art denotes a 
>>> matching pair of two principals (an O/S user and a within-cluster role)
>>
>> No, it does not.  It denotes only the PostgreSQL role.  "service user" 
>> is probably a better term for the O/S side of things.  Though, 
>> frankly, aside from trying to distinguish things when talking about 
>> logging in, the necessity to even care about the O/S user is fairly 
>> minimal.
>>
>> [about your "usr" example] just create a database named "usr" and you 
>> won't get the "database usr not found" error message anymore and 
>> the login will succeed.
> 
> Thank you very much David. The scales have now finally fallen from my 
> eyes. I know now that in order to be able to start a client session from 
> the O/S of the machine where the PG software and cluster live, without 
> needing to supply a password even when "pg_hba.conf" asks for password 
> authentication, it's sufficient to do this (using my "usr" example):
> 
> (0) Simply leave the regime in place where the catalog-owning role is 
> called "postgres" and the cluster's data files and other config files 
> are owned by postgres.
> 
> (1) create a new database role thus (where "password null" is just so 
> that I can prove a point here):
> 
> create role usr with login password null;
> 
> (2) Add this line under the existing final comment in the shipped copy 
> of "pg_ident.conf" thus:
> 
> # MAPNAME       SYSTEM-USERNAME         PG-USERNAME
> usr             usr                     usr          # Added by Bryn
> 
> (It seems that I could set the first field of this line to "dog"—but I 
> won't test that.

The above is not contributing to the below(pg_hba.conf) and would be 
redundant any way as it just says OS user usr = Pg user usr and peer 
means that anyway. The purpose of mapping would be to do something like 
map OS user foo to PG user usr.

References:

https://www.postgresql.org/docs/current/auth-peer.html

"map

     Allows for mapping between system and database user names. See 
Section 21.2 for details.
"

Section 21.2

"The map-name is an arbitrary name that will be used to refer to this 
mapping in pg_hba.conf."

This example below id for the ident auth method but the same syntax 
applies to peer.

https://www.postgresql.org/docs/current/auth-pg-hba-conf.html

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             192.168.0.0/16          ident 
map=omicron


> 
> (3) Add this line between the existing two in the shipped copy of 
> "pg_hba.conf" thus:
> 
> local   all             postgres                                peer # 
> See the essay at the start.
> local   all             usr                                     peer # 
> Added by Bryn
> local   all             all                                     peer


As noted above your pg_ident.conf will not do anything for the above. It 
will work though if you are logged in as OS user usr as it will connect 
as PG user usr.

> 
> (My copy of this file specifies "md5" and not "trust".)
> 
> I'd've thought that "all" would mean any O/S user existing, or 
> yet-to-be-created. But the comment in the shipped "pg_hba.conf" says this:
> 
> # If you change this first entry you will need to make sure that the
> # database superuser can access the database using some other method.
> ...
> # Database administrative login by Unix domain socket
> local   all             postgres                                peer
> 
> (So two terms for the one notion just a couple of lines apart!) I'll do 
> the empirical test presently. Anyway, with these conditions met, I can 
> "su usr" and then start a session like this:
> 
> psql -d postgres
> 
> Yes, your point about what artifacts exist the moment after "initdb" 
> finishes is taken. So I finished my test by (after authorizing as 
> "postgres") creating a database "usr" and granting "connect" on it to 
> "usr".) Then I could create a new session from the O/S prompt when 
> "whoami" shows "user" with the bare "psql"—just as I could the moment 
> after the PG install finished from the O/S prompt when "whoami" shows 
> "postgres".
> 
> I did think that I'd tried all this at the outset. But clearly I must've 
> missed one of those steps or done a typo.
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




On Thu, Oct 27, 2022 at 4:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Yes, the description for --username probably should be modified to read:

> "Selects the user name of the cluster's bootstrap superuser."

Yeah, perhaps.  The term "bootstrap superuser" is reasonably well
established by now --- I count half a dozen uses in our SGML docs
and another dozen or so in the code --- and it's certainly more
specific than "database superuser".  We should probably create
a glossary entry for it and then change all the uses of "database
superuser" as appropriate.

+1

However ... it looks to me like some of those uses just mean to
distinguish between Postgres-specific superuser-dom as opposed
to whatever the term might mean out in the operating system.
But I'm not sure that anybody really uses that term for an OS-level
concept on any popular OS, so it feels a bit pedantic as well
as confusing.  Should we leave those usages alone, or reduce them
to just "superuser"?


Upon a third reflection I decided that leaving "database superuser" in place is preferred; it is fairly pervasive in the code, docs, and translations.  I would suggest documenting both "bootstrap superuser" and "database superuser", making it clear that "database superuser" means any role in the cluster that has the superuser attribute while "bootstrap superuser" is specifically that superuser which was created by initdb and thus owns all initialized objects including the catalogs in all databases in the cluster.

I'm not sure what you are referring to with respect to OS-level references but those ideally will not refer to superuser at all - reserving the concept for the product.  admin/root/sudo or even just os-user suffice for the few places where the two worlds intersect.

David J.

On Wed, Oct 26, 2022 at 09:57:50PM -0700, Adrian Klaver wrote:
> On 10/26/22 18:33, Bryn Llewellyn wrote:
> > The descriptive designation "the role that owns the SQL part of the
> > implementation of PostgreSQL" is too much of a mouthful for daily use.
> >  And anyway, this notion captures only part of the story that makes
> > "postgres" uniquely what it is—at least on Ubuntu.
> > 
> > MORE...
> > 
> > Here's what my empirical observations told me:
> > 
> > It's easy to characterize this role by describing the way that you get
> > it and the conspicuous unique emergent properties that it has.
> > 
> > * You specify its name using the flag that's shown thus in response to
> > "initdb —help"
> > 
> >    -U, --username=NAME       database superuser name
> 
> How much time would it have taken to go to the docs?:
> 
> https://www.postgresql.org/docs/current/app-initdb.html

I know I am replying late here, but isn't it the database _cluster_
superuser?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



On Tue, Nov 29, 2022 at 9:31 AM Bruce Momjian <bruce@momjian.us> wrote:

I know I am replying late here, but isn't it the database _cluster_
superuser?


The "cluster" being implied doesn't seem like a big deal.  The shorter term is nice.  It doesn't seem worth changing all the many, many, places in the documentation where just "database superuser' is used - and so we've updated the glossary to be just that.

David J.



"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tue, Nov 29, 2022 at 9:31 AM Bruce Momjian <bruce@momjian.us> wrote:
>> I know I am replying late here, but isn't it the database _cluster_
>> superuser?

> The "cluster" being implied doesn't seem like a big deal.  The shorter term
> is nice.  It doesn't seem worth changing all the many, many, places in the
> documentation where just "database superuser' is used - and so we've
> updated the glossary to be just that.

Yeah, I don't see that "cluster" adds anything.  It's sometimes worth
saying "database superuser" to ensure that you don't confuse people
who might think of some external-to-Postgres meaning of "superuser",
but otherwise plain "superuser" is fine.  And we've settled on
"bootstrap superuser" as the best term for the role with OID 10.
So the present set of glossary entries looks fine to me.

            regards, tom lane



On Tue, Nov 29, 2022 at 12:01:01PM -0500, Tom Lane wrote:
> Yeah, I don't see that "cluster" adds anything.  It's sometimes worth
> saying "database superuser" to ensure that you don't confuse people
> who might think of some external-to-Postgres meaning of "superuser",
> but otherwise plain "superuser" is fine.  And we've settled on
> "bootstrap superuser" as the best term for the role with OID 10.
> So the present set of glossary entries looks fine to me.

Thanks, agreed.  I just wanted to suggest a clarification in case it was
useful.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.