Обсуждение: Re: [SQL] createdb -D xxxx not working
[hackers added to cc: list]
Peter Eisentraut <peter_e@gmx.net> writes:
> On 2000-01-08, Tom Lane mentioned:
>> Chris Griffin <cgriffin@websales.com> writes:
>>>> # setenv PGDATA2 /web/sites/1.192/db-data
>>>> # initlocation PGDATA2
>>>> # createdb -D PGDATA2
>> 
>> Surely you want $PGDATA2 in the latter two lines?
> You might recall me mentioning that the whole alternative location
> business doesn't work in the first place. I'll fix that this week; Chris
> will have to wait until the next release.
Fine.  BTW, am I the only one who thinks that it's really silly for
initlocation to expect to be given an unexpanded environment variable
name?  That's not normal practice (either elsewhere in Postgres, or
anywhere else that I know of).  It's confusing because it's not how
an ordinary Unix user would expect a shell command to behave, and it
doesn't buy any functionality that I can see.  I'd vote for taking
out the auto-expansion, so that the correct invocation becomesinitlocation $PGDATA2
which is what an average user would expect.
Actually, after looking at what initlocation really does, which is
darn near nothing, I wonder whether we shouldn't dispense with it
altogether...
        regards, tom lane
			
		> >> Surely you want $PGDATA2 in the latter two lines?
> > You might recall me mentioning that the whole alternative location
> > business doesn't work in the first place.
I'm not recalling the details here; what is the problem? It works for
me (but then I wrote it ;)
> Fine.  BTW, am I the only one who thinks that it's really silly for
> initlocation to expect to be given an unexpanded environment variable
> name?  That's not normal practice (either elsewhere in Postgres, or
> anywhere else that I know of).  It's confusing because it's not how
> an ordinary Unix user would expect a shell command to behave, and it
> doesn't buy any functionality that I can see.
It is not silly at all, unless you want to shoot holes in the basic
premise of the alternate location. This is described in the docs, but
the short form is:
initlocation is used to create the directory structure *with correct
permissions* for an alternate location. It takes an environment
variable because usually the backend should have that same environment
variable defined to ensure consistancy and as a security measure. The
environment variable can be expanded or not; initlocation does the
right thing in either case.
createdb takes a "-D" argument, which specifies the alternate
location. Unless allowed at compile-time, via the
ALLOW_ABSOLUTE_DBPATHS variable, all references to an alternate
location must refer to an environment variable, to give the dbadmin
control over *where* alternate locations will appear. The mechanism is
enforced by the backend, by looking for a directory delimiter in the
datpath field of pg_database, then expanding the first field as an
environment variable. On my system, with one database in an alternate
location, this table looks like:
test=# select * from pg_database; datname   | datdba | encoding |   datpath    
------------+--------+----------+--------------template1  |    100 |        0 | template1postgres   |    100 |        0
|postgresregression |    100 |        0 | regressiontest       |    100 |        0 | PGDATA2/test
 
(4 rows)
So, this works:
[postgres@golem parser]$ createdb -D PGDATA2 test2
CREATE DATABASE
But this does not (but can be enabled at compile-time if the sysadmin
wants to allow users to scatter databases everywhere (?!!)):
[postgres@golem regress]$ createdb -D /opt/postgres/data2 test2
ERROR:  The path '/opt/postgres/data2/test2' is invalid.
This may be due to a missing environment variable on the server.
createdb: Database creation failed.
> I'd vote for taking
> out the auto-expansion, so that the correct invocation becomes
>         initlocation $PGDATA2
> which is what an average user would expect.
But the average user does not necessarily have access to the PGDATA2
environment variable! This is usually a sysadmin function.
> Actually, after looking at what initlocation really does, which is
> darn near nothing, I wonder whether we shouldn't dispense with it
> altogether...
?? I hope the explanation above helps...
                   - Thomas
-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California
			
		On 2000-01-11, Thomas Lockhart mentioned: > > >> Surely you want $PGDATA2 in the latter two lines? > > > You might recall me mentioning that the whole alternative location > > > business doesn't work in the first place. > > I'm not recalling the details here; what is the problem? It works for > me (but then I wrote it ;) > Darn, now I already rewrote the thing to where I considered it working. We had several on-and-off discussions (mostly Tom and I) about it during the last two months at least, including users complaining. The sort of scheme I came up with scraps the environment variable stuff (since it's not quite safe either and has several bugs in the code that create a bunch of problems along the way) and lets the following happen (example paths): CREATE DATABASE foo; --> /usr/local/pgsql/data/base/foo/pg_class CREATE DATABASE foo WITH LOCATION 'bar'; --> /usr/local/pgsql/data/base/bar/pg_class CREATE DATABASE foo WITH LOCATION '/some/where'; --> /some/where/pg_class CREATE DATABASE foo WITH LOCATION 'foo/bar'; is disabled. I suppose I could stick the environment variable deal back in so that users don't have to remember any complicated paths, but this is the dbadmin's job anyway, so he should be able to remember it. Anyway, in order for a path to be allowed it has to be listed in PG_ALTLOC, which is a colon-delimited, marginally wildcard enabled list of allowed locations. In some future life this could be included in a configuration file. > initlocation is used to create the directory structure *with correct > permissions* for an alternate location. It takes an environment I think one of the problems was actually that the path initlocation assumed and the one createdb worked with were different. > variable because usually the backend should have that same environment > variable defined to ensure consistancy and as a security measure. The > environment variable can be expanded or not; initlocation does the > right thing in either case. It's false security though, since for complete security the dbadmin would have to delete all other environment variables altogether. > > I'd vote for taking > > out the auto-expansion, so that the correct invocation becomes > > initlocation $PGDATA2 > > which is what an average user would expect. > > But the average user does not necessarily have access to the PGDATA2 > environment variable! This is usually a sysadmin function. The average user doesn't even have access to the machine the database is running on, so he can't do any initlocation either way. Then again the average user doesn't create databases either. But there is no real point for initlocation since a mere mkdir() call in createdb() will do the job. Now I just got done with that coding 10 minutes ago but now that someone actually spoke up in defence of this mechanism I'm going to wait and see what you think about the revised (or any) scheme. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> Darn, now I already rewrote the thing to where I considered it working. We
> had several on-and-off discussions (mostly Tom and I) about it during the
> last two months at least, including users complaining.
afaik I've been on the hackers list most of the time, and don't recall
significant discussion.
> The sort of scheme I came up with scraps the environment variable stuff
> (since it's not quite safe either and has several bugs in the code that
> create a bunch of problems along the way) and lets the following
> happen (example paths):
> 
> CREATE DATABASE foo;    --> /usr/local/pgsql/data/base/foo/pg_class
> CREATE DATABASE foo WITH LOCATION 'bar';
>                         --> /usr/local/pgsql/data/base/bar/pg_class
> CREATE DATABASE foo WITH LOCATION '/some/where';
>                         --> /some/where/pg_class
> CREATE DATABASE foo WITH LOCATION 'foo/bar';
>                         is disabled. I suppose I could stick the
> environment variable deal back in so that users don't have to remember any
> complicated paths, but this is the dbadmin's job anyway, so he should be
> able to remember it.
Huh? The "environment variable deal" is intended to provide security
and easier operation. The examples above which allow specifying
absolute paths (e.g. "/some/where/pg_class") are specifically
disallowed by default, for a reason. If there were bugs in that, let's
talk about it, but I'm *very* uncomfortable making wholesale changes
without a discussion. And what exactly were "several bugs in the
code"?? Better be specific; that's my code we're talking about :/
It sounds like you and Tom Lane have had discussions on this, but I'm
not certain it was with a clear understanding of what the environment
variables provided. Before killing the capability, I'd like to make
sure that we understand what it did.
otoh, you have a new proposal (the first I've heard of it afaik)...
> Anyway, in order for a path to be allowed it has to be listed in
> PG_ALTLOC, which is a colon-delimited, marginally wildcard enabled list of
> allowed locations. In some future life this could be included in a
> configuration file.
That sounds like a good capability. There is no reason (yet) why the
environment variable mechanism can not use this too; e.g.
 setenv PG_ALTLOC PGDATA2:/home/peter/unsafe/unprotected/open/path
(just had to put that last one in :)))
> > initlocation is used to create the directory structure *with correct
> > permissions* for an alternate location. It takes an environment
> I think one of the problems was actually that the path initlocation
> assumed and the one createdb worked with were different.
I'm not sure to what you are referring. To the fact that there is an
implicit "/base" in the actual path? e.g. 
initlocation PGDATA2
createdb -D PGDATA2 test
gives an actual path $PGDATA2/base/pg_class ? That's a
security/integrity benefit since:
1) there is not likely to be a random environment variable which
happens to point to a valid directory which *also* has a subdirectory
called "base".
2) it reduces the chance that a user/dbadmin will not use initlocation
to create the database area, hence reducing the chance that a
user/dbadmin has not set the permissions correctly.
> > variable because usually the backend should have that same environment
> > variable defined to ensure consistancy and as a security measure. The
> > environment variable can be expanded or not; initlocation does the
> > right thing in either case.
> It's false security though, since for complete security the dbadmin would
> have to delete all other environment variables altogether.
Theoretically true (though low-risk per above discussion), but this
would be addressed by your PG_ALTLOC augmentation, which I like btw.
> > > I'd vote for taking
> > > out the auto-expansion, so that the correct invocation becomes
> > >         initlocation $PGDATA2
> > > which is what an average user would expect.
> > But the average user does not necessarily have access to the PGDATA2
> > environment variable! This is usually a sysadmin function.
> The average user doesn't even have access to the machine the database is
> running on, so he can't do any initlocation either way. Then again the
> average user doesn't create databases either. But there is no real point
> for initlocation since a mere mkdir() call in createdb() will do the job.
Hmm. Until I see what you are actually doing, I can't comment on
whether this is indeed adequate. But decoupling the location creation
from database creation give the dbadmin control over how permissions
are set and where databases can be created. I'm not sure that is the
case if it is all done within the backend at database creation time.
> Now I just got done with that coding 10 minutes ago but now that someone
> actually spoke up in defence of this mechanism I'm going to wait and see
> what you think about the revised (or any) scheme.
PG_ALTLOC seems to be a great feature. But afaict the environment
variable feature is useful, safe (as much or more so than absolute
paths, anyway), can coexist, and is a convenience. 
I'd like to continue the discussion until I'm convinced, or least
beaten into submission ;)
                         - Thomas
-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California
			
		On 2000-01-12, Thomas Lockhart mentioned:
> Huh? The "environment variable deal" is intended to provide security
> and easier operation. The examples above which allow specifying
> absolute paths (e.g. "/some/where/pg_class") are specifically
> disallowed by default, for a reason. If there were bugs in that, let's
> talk about it, but I'm *very* uncomfortable making wholesale changes
> without a discussion. And what exactly were "several bugs in the
> code"?? Better be specific; that's my code we're talking about :/
The starting point for my investigation was actually the issue of
arbitrary characters in database names, which lead me to fix up the
createdb and dropdb code to make it more error proof, which led me to all
kinds of areas of the code that looked ages old, with comments not
matching the code, dead code, etc. The location issue was just one of
those. (Especially those /* this is work arround only !!! */ sections from
more than 5 years ago concern me a little ...) It's my idea of a learning
experience.
> > > initlocation is used to create the directory structure *with correct
> > > permissions* for an alternate location. It takes an environment
If I create a database with a normal name, the code makes a directory
/usr/local/pgsql/data/base/testdb with the proper permissions. There's no
reason why it wouldn't be able to do the same somewhere else. It's
redundant.
> 1) there is not likely to be a random environment variable which
> happens to point to a valid directory which *also* has a subdirectory
> called "base".
Forgive me, but "not likely" means zero security to me. If you want to
make it secure, be for real. Perhaps I'm a little envvar-phobic in
general. I got my reasons, but that shouldn't stand in other's ways.
> 2) it reduces the chance that a user/dbadmin will not use initlocation
> to create the database area, hence reducing the chance that a
> user/dbadmin has not set the permissions correctly.
The prototype code I got lying around insists on creating the directory
itself, so the chances of using an insecure directory are zero. But there
are also other ways to ensure this rather than another utility.
Okay, now to the "formal" proposal:
* A database name can contain any character (up to NAMEDATALEN)
* By default, the database name is appended to DataDir/base/ to form the
file system location of the database.
* There are certain characters that will not be allowed in database paths
because they are potentially "shell'ishly dangerous". In my current layout
this includes everything from 1 to 31 ascii as well as single-quote
(') and dot (.). If you choose to name a database this way, you need to
override the path to something else.
* If you override the path to a name not containing a slash, the name will
in the same fashion be appended to DataDir/base/. Any future attempts to
use the same path will fail.
* If you override the path to a name including a slash but not at the
start, the part up to the first slash will be interpreted as an
environment variable. The database directory will be the immediate
directory specified and will be created by createdb() (which must have
permission to do so). If it already exists, it will attempt to fix the
permissions.
* If you specify an absolute path then it will use that very path and it
will create the directory as above.
* Either way, in order to use a path outside DataDir, it must be listed in
some configuration option (such as PG_ALTLOC). Environment variable based
paths can be included in the natural way, e.g., to allow a path
'PGDATA2/foo' write PG_ALTLOC=$PGDATA/foo, to allow anything under
PGDATA2, write PG_ALTLOC=$PGDATA/*.
In practice that would mean:
1. If you don't use this feature, nothing changes.
2. If you want to use this feature to "recode" funny characters, you
may. (e.g., CREATE DATABASE "bbb'''..." WITH LOCATON = 'bbb______';)
3. If you want to stick one particular database somewhere else, create the
directory (or at least the directory above it), include it in PG_ALTLOC
and go ahead.
4. If you want to provide users the option of storing databases at several
alternative locations, set up mnemonic environment variables, create the
directories corresponding to them, and put something like
PGDATA2/*:PGDATA3/*:... in PG_ALTLOC.
Are there other circumstances where this would be used?
What could be discussed it the exact interpretation of a path such as
LOCATION = '/mnt/somewhere/foo' with regards to whether 'base' should be
appended or not or the name of the database should be appended to it or
not or how one could otherwise do the name "recoding" in that
circumstance.
-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden