Thread: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error

From: "Mark Kramer"
The following bug has been logged online:

Bug reference:      4787
Logged by:          Mark Kramer
Email address:      root@asarian-host.net
PostgreSQL version: 8.3.7
Operating system:   FreeBSD 7.1
Description:        Hardlink (ln) causes startup failure with bizarre
"timezone_abbreviations" error
Details:

I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for
updates, instead of just /usr/local). As a result, I made hard-links like
this,

cd /usr/local/bin/
ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl

Etc. Seems PostgreSQL can't handle the fact. I try and start the server,
with:

/usr/bin/su -l pgsql -c "exec /usr/local/bin/pg_ctl start -D /var/db/PostgreSQL -w -s -m fast"

I get this error, though:

May  1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL:  invalid value for
parameter "timezone_abbreviations": "Default"

Which is a silly error, because it's rather untrue, and it's rather strange,
honestly, for PostgreSQL to want to be started from a hardcoded location.
Starting up the usual way, with:

/usr/bin/su -l pgsql -c "exec /usr/local/PostgreSQL/bin/pg_ctl start -D /var/db/PostgreSQL -w -s -m fast"

Works just fine. So, at the very least, change the error message to
something that actually makes sense, like: "FATAL: Binary started from
location other than the one used at compile-time;" or something to that
effect. But better still, there's no need for PostgreSQL to have this hard
location requirement: its lib paths have been set (at boot) with ldconfig, so
it should find whatever libs it needs, regardless of where the binary
that I use to start the server resides.
"Mark Kramer" <root@asarian-host.net> writes:
> I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for
> updates, instead of just /usr/local) As a result, I made hard-links like
> this,

> cd /usr/local/bin/
> ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl

This isn't going to work because pg_ctl assumes it can find postgres in
the same directory it is in.  Try using a symlink instead.  (It'll be
less likely to fail miserably after an upgrade, too.)

> I get this error, though:
> May  1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL:  invalid value for
> parameter "timezone_abbreviations": "Default"

I agree this is an odd error message though.  Perhaps you hardlinked a
few other things you didn't tell us about?  I'm not sure what it would
take to make this be the first complaint.  What is probably happening is
that postgres is trying to find /usr/local/PostgreSQL/share/ relative
to itself, but I'd have thought it would notice the problem sooner.

            regards, tom lane
-----Original Message-----
From: pgsql-bugs-owner@postgresql.org
[mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Friday, 1 May 2009 17:46
To: Mark Kramer
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with
bizarre "timezone_abbreviations" error

"Mark Kramer" <root@asarian-host.net> writes:

> > I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for
> > updates, instead of just /usr/local) As a result, I made hard-links
> > like this,

> > cd /usr/local/bin/
> > ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl

> This isn't going to work because pg_ctl assumes it can find postgres in
> the same directory it is in. Try using a symlink instead.  (It'll be
> less likely to fail miserably after an upgrade, too.)

I tried a symlink as well. Then pg_ctl *can* start the server (which is
kinda odd, by itself, that it can do so now, whereas not with a hardlink;
unless pg_ctl actually reads the symlink content, which is very unlikely),
but it reports a spurious error nonetheless: "could not start server"
(whilst it DOES start the server just fine).

As for pg_ctl assuming it can find postgres in the same directory it is
in, it SHOULD. :) Basically, I hard-linked all files in
/usr/local/PostgreSQL/bin/ to /usr/local/bin/. So, even when pg_ctl got
started from /usr/local/bin/, it should have found /usr/local/bin/postgres
right under its very nose! Also, the error message actually DOES seem to
come from postgres (postgres[9742]: [6-1] FATAL), but that may well be an
optical illusion on my end (as pg_ctl could log as 'postgres' too: haven't
examined that yet).

Clearly, seems PostgreSQL just really wants to be started from its
original install-location.

> > I get this error, though:
> > May  1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL: invalid
> > value for parameter "timezone_abbreviations": "Default"

> I agree this is an odd error message though.  Perhaps you hardlinked a
> few other things you didn't tell us about?  I'm not sure what it would
> take to make this be the first complaint. What is probably happening is
> that postgres is trying to find /usr/local/PostgreSQL/share/ relative
> to itself, but I'd have thought it would notice the problem sooner.

The /share/ thingy is what I strongly suspected too; but since the bug
report FAQ strongly discourages one from writing your assumptions about
what you *think* might be the issue, I refrained from mentioning it. :)
But yes, that seems like a logical place to look.

- Mark
Mark <admin@asarian-host.net> writes:
> As for pg_ctl assuming it can find postgres in the same directory it is
> in, it SHOULD. :) Basically, I hard-linked all files in
> /usr/local/PostgreSQL/bin/ to /usr/local/bin/. So, even when pg_ctl got
> started from /usr/local/bin/, it should have found /usr/local/bin/postgres
> right under its very nose!

Well, it did (else you'd not have got as far as you did).  The point you
are missing is that other components of the distribution, such as the
share/ directory, are expected to be found relative to where the
binaries are.  (This behavior isn't a bug, but intentional to allow
relocatable distribution packages.)  If postgres is executed via a
symlink then it will correctly determine its own location and
successfully locate the share/ directory; otherwise not so much.
I think pg_ctl needs to be able to find share/ as well, though that
might depend on other things such as whether you have NLS enabled.
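
A minimal sketch of the relative lookup described above, assuming a plain <prefix>/bin and <prefix>/share layout. The helper name share_dir_from_exec is hypothetical, and the code does not reproduce PostgreSQL's own get_share_path; it only shows why the directory a binary is found in decides where share/ is looked for:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT PostgreSQL's get_share_path(): derive
 * "<prefix>/share" from the full path of an executable that is assumed
 * to live in "<prefix>/bin/". Returns 0 on success, -1 on failure.
 */
static int
share_dir_from_exec(const char *exec_path, char *out, size_t outlen)
{
    const char *last;   /* slash in front of the executable name */
    const char *prev;   /* slash in front of the bin directory name */
    int         n;

    assert(exec_path != NULL && out != NULL);

    last = strrchr(exec_path, '/');
    if (last == NULL || last == exec_path)
        return -1;      /* need at least "/dir/name" */

    /* scan backwards for the slash that starts the bin directory */
    prev = last - 1;
    while (prev > exec_path && *prev != '/')
        prev--;
    if (*prev != '/')
        return -1;      /* relative path with only one directory */

    /* the prefix is everything before prev; append "/share" to it */
    n = snprintf(out, outlen, "%.*s/share",
                 (int) (prev - exec_path), exec_path);
    if (n < 0 || (size_t) n >= outlen)
        return -1;      /* output buffer too small */
    return 0;
}
```

Given "/usr/local/PostgreSQL/bin/postgres" this produces "/usr/local/PostgreSQL/share". Given a hard link invoked as "/usr/local/bin/postgres" it produces "/usr/local/share", a directory that has nothing to do with the installation.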

I was under the impression that there was some code in there to complain
if the path-finding code failed, but maybe it's being executed too late.
Anyway the bug here is an inadequate error message, not that we should
support the configuration.

            regards, tom lane
I wrote:
> I was under the impression that there was some code in there to complain
> if the path-finding code failed, but maybe it's being executed too late.

I looked at this a bit more, and found that there is no such code.
Mark's complaint is easy to reproduce if you move (or hardlink) the
postgres executable into some other directory away from the share
directory and then try to start it on a valid data directory.  (If
it doesn't find postgresql.conf it'll fail sooner.)

initdb behaves a bit more sanely under similar circumstances:

$ initdb
initdb: file "/home/tgl/trial/share/postgresql/postgres.bki" does not exist
This might mean you have a corrupted installation or identified
the wrong directory with the invocation option -L.
$

The postmaster however is much less dependent on the contents of the
share dir than initdb is, so the first time it really notices something
is wrong is when it tries to find the file that the
timezone_abbreviations GUC is supposed to reference.  And when we get
there, in perhaps an overabundance of brevity we intentionally don't
report the file path:

    get_share_path(my_exec_path, share_path);
    snprintf(file_path, sizeof(file_path), "%s/timezonesets/%s",
             share_path, filename);
    tzFile = AllocateFile(file_path, "r");
    if (!tzFile)
    {
        /* at level 0, if file doesn't exist, guc.c's complaint is enough */
        if (errno != ENOENT || depth > 0)
            ereport(tz_elevel,
                    (errcode_for_file_access(),
                     errmsg("could not read time zone file \"%s\": %m",
                            filename)));
        return -1;
    }

So there are a number of things we could consider doing about this,
including just tweaking the above bit of code.  But that only helps
so long as this is the first such reference to fail during startup
--- which is surely pretty coincidental.

What I'm inclined to do is modify PostmasterMain so that immediately
after find_my_exec, it checks that get_share_path returns the name of
a readable directory.  (I see that it's already invoking get_pkglib_path
at that point, but not checking that the result points to anything ---
maybe we should check that too?)  The error message would then be
something similar to what initdb is saying above, ie, misconfigured
installation.  Maybe initdb should have an explicit test of this
nature too, because the message quoted above could still be
misinterpreted.
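
A rough sketch of the kind of check proposed here, assuming only POSIX stat() and access(). The function name check_readable_dir is hypothetical and the code is not taken from the PostgreSQL source:

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical check: returns 0 if path names a directory that the
 * calling process may read and search, otherwise -1 with errno set.
 */
static int
check_readable_dir(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) != 0)
        return -1;              /* errno set by stat(), e.g. ENOENT */

    if (!S_ISDIR(st.st_mode))
    {
        errno = ENOTDIR;        /* exists, but is not a directory */
        return -1;
    }

    /* R_OK to list entries, X_OK to open files beneath the directory */
    return access(path, R_OK | X_OK);
}
```

A startup routine could call this on the computed share path and, on failure, report the full path in a "misconfigured installation" style message instead of failing later on an unrelated setting.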

Or maybe this is more work than it's worth.  I don't recall many similar
complaints previously.

Comments?

            regards, tom lane
-----Original Message-----
From: pgsql-bugs-owner@postgresql.org
[mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Friday, 1 May 2009 23:57
To: Mark; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with
bizarre "timezone_abbreviations" error

> What I'm inclined to do is modify PostmasterMain so that immediately
> after find_my_exec, it checks that get_share_path returns the name of
> a readable directory.

I understand the rationale for relocatable packages. So, I guess hardlinks
are out. But, barring hardlinks, perhaps, in the existence of a symlink, a
simple 'readlink' function could be done to auto-correct PostgreSQL's
base-location? Ala:

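#include <unistd.h>     /* readlink(), ssize_t */

char buf[1024];
ssize_t len;
....
/* readlink() does not NUL-terminate, so leave room and terminate by hand */
if ((len = readlink("/usr/local/bin/pg_ctl", buf, sizeof(buf) - 1)) != -1)
    buf[len] = '\0';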

Symlinks are used quite often, *especially* when dealing with relocatable
packages (read: that will likely not reside in /usr/local/, etc.). And it
would only require two or three extra lines of code, no?

At any rate, I appreciate you looking into this.

- Mark
Mark <admin@asarian-host.net> writes:
> I understand the rationale for relocatable packages. So, I guess hardlinks
> are out. But, barring hardlinks, perhaps, in the existence of a symlink, a
> simple 'readlink' function could be done to auto-correct PostgreSQL's
> base-location? Ala:

That's exactly what it already does, and why it would've worked if you'd
used symlinks not hardlinks.
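
The difference between the two kinds of link can be seen with readlink() itself: it returns the target for a symbolic link and fails with EINVAL for anything that is not one, including a hard link, which is just another directory entry for the same file. A minimal sketch, with a hypothetical helper name and no claim to match PostgreSQL's own path-finding code:

```c
/* ask for POSIX.1-2008 declarations such as readlink() */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the target of path if it is a symbolic link.
 * Returns  1 and fills buf when path is a symlink,
 *          0 when path exists but is not a symlink (readlink() fails
 *            with EINVAL; this is the hard link case),
 *         -1 on any other error, or if the target may not fit in buf.
 */
static int
read_link_target(const char *path, char *buf, size_t buflen)
{
    ssize_t len;

    assert(path != NULL && buf != NULL && buflen > 0);

    len = readlink(path, buf, buflen - 1);
    if (len < 0)
        return (errno == EINVAL) ? 0 : -1;

    if ((size_t) len == buflen - 1)
        return -1;              /* target possibly truncated */

    buf[len] = '\0';            /* readlink() does not NUL-terminate */
    return 1;
}
```

With a result of 1 the caller can continue from the link target's directory. With a result of 0 there is nothing to follow, so the only location available is the directory the program was started from.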

            regards, tom lane
On Sat, 02 May 2009 14:47:48 GMT, Tom Lane wrote

> Mark <admin@asarian-host.net> writes:

> > I understand the rationale for relocatable packages. So,
> > I guess hardlinks are out. But, barring hardlinks,
> > perhaps, in the existence of a symlink, a simple 'readlink'
> > function could be done to auto-correct PostgreSQL's
> > base-location? Ala:
>
> That's exactly what it already does, and why it would've worked
> if you'd used symlinks not hardlinks.

Interesting. Yet, as I reported earlier, whilst a symlink does seem to start
the server, pg_ctl takes a long time to do so, and then reports "could not
start server" anyway. But it actually *does* get started. So I figured maybe
something was not entirely right with the symlink, either.

- Mark
Mark <admin@asarian-host.net> writes:
> Interesting. Yet, as I reported earlier, whilst a symlink does seem to start
> the server, pg_ctl takes a long time to do so, and then report: "could not
> start server" anyway. But it actually *does* get started. So I figured maybe
> something was not entirely right with the symlink, either.

That sounds like pg_ctl isn't finding the postmaster's socket file ...
were you playing games with the location of that, too?  pg_ctl is not
terribly bright about relocated socket files (in particular, it does
not read the postmaster's postgresql.conf, so a nonstandard setting
there for unix_socket_directory will confuse it).
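
A small sketch of the mismatch described above. PostgreSQL names its Unix-domain socket file .s.PGSQL.<port> inside the socket directory; the helper names below are hypothetical, and the directories passed in are whatever the caller assumes, not values read from any configuration file:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<socket_dir>/.s.PGSQL.<port>".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int
socket_file_path(const char *socket_dir, int port, char *out, size_t outlen)
{
    int n;

    assert(socket_dir != NULL && out != NULL);

    n = snprintf(out, outlen, "%s/.s.PGSQL.%d", socket_dir, port);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/*
 * Hypothetical helper: returns 1 if a client that assumes client_dir
 * would look at the same socket file that a server configured with
 * server_dir creates, 0 if the paths differ, -1 on error.
 */
static int
client_finds_socket(const char *client_dir, const char *server_dir, int port)
{
    char client_path[1024];
    char server_path[1024];

    if (socket_file_path(client_dir, port, client_path, sizeof(client_path)) != 0 ||
        socket_file_path(server_dir, port, server_path, sizeof(server_path)) != 0)
        return -1;

    return strcmp(client_path, server_path) == 0;
}
```

If the server's unix_socket_directory is changed while the client keeps assuming another directory, client_finds_socket returns 0: the server is running, but a waiting client never sees its socket and eventually gives up.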

            regards, tom lane