Discussion: postmaster.pid file auto-clean up?
I vaguely remember reading in the release notes (around the time 9.x was released) something about the server automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server; however, I cannot find any reference to this anymore.
Was this something that did, in fact, exist at one point, and was pulled?
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.

It's always done that.

We occasionally see startup scripts that "helpfully" remove the .pid file. They are, without exception, wrong and dangerous. The postmaster is much more likely to get this right by itself.

regards, tom lane
Is this mechanism documented anywhere (besides source code)?

It looks like PG will only clean it up if no process at all is running with the PID listed in the postmaster.pid file. It leaves the file in place even when the process with that PID isn't a PG process, or when no server is running on the data directory (as per `pg_ctl status`).
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
> Is this mechanism documented anywhere (besides source code)?

No, not really.

> It looks like PG will only clean it up if there's no other process running at all on the pid listed in the postmaster.pid file, even if any process running on that pid isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).

Not sure what you're looking at, but the above is wrong in at least one critical detail, namely that there's a process-ownership check via kill(). There are also checks to ensure no children of the previous postmaster are still alive. These are not things you want to lightly bypass: two sets of postmaster children running against the same data directory *will* result in unrecoverable data corruption.

If you're trying to claim you've seen a false-positive situation, it would be interesting to hear actual details.

regards, tom lane
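The kill()-based check Tom mentions relies on a standard POSIX idiom: sending signal 0 delivers nothing, but the kernel still performs its existence and permission checks. The following is only a minimal Python sketch of that idiom, not PostgreSQL's actual code; the function name `probe_pid` and its return labels are made up for illustration, and how the postmaster acts on each outcome is not spelled out in this thread.

```python
import os


def probe_pid(pid):
    """Classify a PID with the null signal, i.e. kill(pid, 0).

    POSIX only. Do not run this on Windows, where os.kill() with an
    arbitrary signal number terminates the target process.
    The return labels are hypothetical names chosen for this sketch.
    """
    if pid <= 0:
        # kill() gives 0 and negative values a special meaning
        # (process groups), so refuse them here.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: permission/existence check only
    except ProcessLookupError:
        # errno ESRCH: no process with this PID exists.
        return "no_such_process"
    except PermissionError:
        # errno EPERM: the process exists but belongs to another user.
        return "other_owner"
    # The process exists and the caller is allowed to signal it, which
    # for an unprivileged caller means it runs under the same user.
    return "signalable"
```

Note that the check says nothing about which program owns the PID: any process running under the same user, whatever it is, ends up in the `signalable` bucket.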
On Mon, Aug 20, 2012 at 11:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
>> Is this mechanism documented anywhere (besides source code)?
>
> No, not really.
>
>> It looks like PG will only clean it up if there's no other process running at all on the pid listed in the postmaster.pid file, even if any process running on that pid isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).
>
> Not sure what you're looking at, but the above is wrong in at least one
> critical detail, namely that there's a process-ownership check via
> kill(). There are also checks to ensure no children of the previous
> postmaster are still alive. These are not things you want to lightly
> bypass: two sets of postmaster children running against the same data
> directory *will* result in unrecoverable data corruption.
>
> If you're trying to claim you've seen a false-positive situation, it
> would be interesting to hear actual details.

Hello, I work with Seb, and have been investigating this deeper.
It does in fact appear that we are getting false-positives.
When trying to start PG using pg_ctl, I am getting this response:
pg_ctl: another server might be running; trying to start server anyway
2012-08-26 04:46:02.211 GMT [] - FATAL: lock file "postmaster.pid" already exists
2012-08-26 04:46:02.211 GMT [] - HINT: Is another postmaster (PID 8574) running in data directory "/Users/mclark/Library/Application Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?
pg_ctl: this data directory appears to be running a pre-existing postmaster
pg_ctl: could not start server
Examine the log output.
PID 8574 is actually iTunes, not PG. PG was cleanly brought down on its last run, and there are no child processes running.
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop PG, copy the copied pid file back to the data dir, and edit it, replacing the old PID with that of another running process.
At first we thought our software was to blame, because it checks the PID from PG's pid file to see if a process is running with that PID. If none is found we call pg_ctl; otherwise we just continue launching our software and trying to connect to PG.
I just added an additional check to see if the process name for the PID is postgres, and if not, to try to start PG with pg_ctl, thinking it would figure it out and remove the pid file as it would if there were no process running with that PID.
Is this considered a bug? Should PG do a similar check on the process name, or is the way we contrived this doing something unexpected?
Thanks,
Michael.
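
The client-side check Michael describes above could look roughly like the sketch below. It is an illustration under stated assumptions, not the Daylite code: the function names are invented, the PID is assumed to be on the first line of postmaster.pid, and the process name is looked up with `ps -p PID -o comm=`, which must be available on the system. The sketch only decides whether to call pg_ctl; it never deletes the pid file, and as the output above shows, pg_ctl may still refuse to start.

```python
import os
import subprocess
from pathlib import Path


def read_pidfile_pid(data_dir):
    """Return the PID on the first line of postmaster.pid, or None if the
    file is missing, empty, or does not start with an integer."""
    try:
        lines = (Path(data_dir) / "postmaster.pid").read_text().splitlines()
        return int(lines[0].strip())
    except (OSError, IndexError, ValueError):
        return None


def process_name(pid):
    """Return the command name for pid as reported by ps, or None if ps
    reports no such process. (Assumes a POSIX-style ps on PATH.)"""
    result = subprocess.run(
        ["ps", "-p", str(pid), "-o", "comm="],
        capture_output=True,
        text=True,
    )
    name = result.stdout.strip()
    if result.returncode != 0 or not name:
        return None
    # Some systems print the full executable path; keep the last component.
    return os.path.basename(name)


def should_try_pg_ctl_start(data_dir, name_of=process_name):
    """Hypothetical decision helper: True if the application should go on
    to call pg_ctl start, False if a postgres process already holds the
    PID recorded in the pid file."""
    pid = read_pidfile_pid(data_dir)
    if pid is None:
        return True  # no usable pid file, so nothing appears to be running
    name = name_of(pid)
    return name is None or not name.startswith("postgres")
```

The `name_of` parameter exists only so the lookup can be replaced in tests. A name match is a heuristic at best, since it cannot tell which data directory a postgres process is serving.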
On 08/25/12 9:56 PM, Michael Clark wrote:
> PID 8574 is actually iTunes, not PG, and PG was cleanly brought down
> on it's last run, there are no children processes running.

When postgres is cleanly brought down, the postmaster.pid file is supposed to be removed. That file contains the PID that pg_ctl uses.

Could you be running a pg_ctl from a different version, or in the wrong directory?

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Michael Clark <codingninja@gmail.com> writes:
> It does in fact appear that we are getting false-positives.
> When trying to start PG using pg_ctl, I am getting this response:
> pg_ctl: another server might be running; trying to start server anyway
> 2012-08-26 04:46:02.211 GMT [] - FATAL: lock file "postmaster.pid" already exists
> 2012-08-26 04:46:02.211 GMT [] - HINT: Is another postmaster (PID 8574) running in data directory "/Users/mclark/Library/Application Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?
> PID 8574 is actually iTunes, not PG,

iTunes? What is that doing running under PG's userid?

If you mean that you're launching PG under some random user's UID, you might want to think about giving it a dedicated UID instead, so as to improve the selectivity of the same-UID check. This would also give a good deal more protection to the database files.

> and PG was cleanly brought down on
> it's last run, there are no children processes running.

As John pointed out, if PG was in fact stopped cleanly, the pid file would not be there. The symptoms you've described so far seem consistent with the idea that PG was not stopped "cleanly", but rather by kill -9 on the postmaster (with the child processes exiting either on their own, or as soon as they noticed they were orphans). This is not recommended practice.

> Seb figured out how to contrive this situation.
> Run PG, copy the pid file, stop pg, copy the copied pid file back to the
> data dir and edit it, replacing the old PID with that of another running
> process.

You're kidding, right? If you intentionally set out to break the postmaster interlock, you will doubtless be able to do that, and would still be able to break any other algorithm we might devise. Let's confine this discussion to scenarios that could arise without intentional interference.

regards, tom lane
On Sun, Aug 26, 2012 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Clark <codingninja@gmail.com> writes:
>> PID 8574 is actually iTunes, not PG,
>
> iTunes? What is that doing running under PG's userid?

We back our client application with PG; each OSX user gets their own instance of PG.
It runs as that OSX user.

>> Seb figured out how to contrive this situation.
>> Run PG, copy the pid file, stop pg, copy the copied pid file back to the
>> data dir and edit it, replacing the old PID with that of another running
>> process.
>
> You're kidding, right? If you intentionally set out to break the
> postmaster interlock, you will doubtless be able to do that, and would
> still be able to break any other algorithm we might devise. Let's
> confine this discussion to scenarios that could arise without
> intentional interference.

We were presented with a problem we didn't understand.
We set out to try and figure out how we could replicate the problem, for debugging purposes.
We managed to do so to see how our application behaves, and to see how PG behaves.
In the wild this scenario has arisen without intentional interference. In debugging, yes, we contrived the situation to replicate the behaviour. Mind you, we may be using PG in an environment that isn't advisable.
We just started this discussion to learn and understand, and to see if this is a situation that would be expected to be handled.
Thanks,
Michael.
On 26 Aug 2012, at 17:21, Michael Clark wrote:
> We back our client application with PG,
> each OSX user gets their own instance of PG.

Are you certain that's necessary? It's generally a better idea to run a single PG server with a database for each user. Having multiple copies running has its use cases, but the necessity is quite uncommon.

You could compare what you're doing to giving every user their own copy of OS X. There are situations in which you'd want that, but generally it's considered a bad idea.

You'd never have even thought to do that if you were, for example, using Oracle for the database. That's a hugely expensive database license for every user on the system, while you really only need one.

> In the wild this scenario has arisen without intentional interference. In debugging, yes, we contrived the situation to replicate the behaviour. Mind you, we may be using PG in an environment that isn't advisable.

What you replicated is not what happens when your problem occurs. Processes don't do things like that with each other's PID files.

What's probably happening in your case is that there's a conflict with another copy of Postgres running; perhaps it's running under the same user ID twice (or more), or on the same port?

My suggestion would be to get rid of those extra copies of PG and just run one instance.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
On Sun, Aug 26, 2012 at 1:25 PM, Alban Hertroys <haramrae@gmail.com> wrote:
>> We back our client application with PG,
>> each OSX user gets their own instance of PG.
>
> Are you certain that's necessary?

It was a decision made, weighing various trade-offs, 4 years ago now.

>> In the wild this scenario has arisen without intentional interference. In debugging, yes, we contrived the situation to replicate the behaviour. Mind you, we may be using PG in an environment that isn't advisable.
>
> What you replicated is not what happens when your problem occurs. Processes don't do things like that with each others PID files.

That is true.
But the system does recycle pids, especially after a reboot.
I appreciate all the feedback and input from everyone who responded.
Thank you!! You have answered our questions, and it gives us food for thought.
Michael.