Обсуждение: Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
On Sat, Jan 19, 2013 at 09:56:48PM -0500, Bruce Momjian wrote: > > (But, at least with the type of packaging I'm using in Fedora, he would > > first have to go through a package downgrade/reinstallation process, > > because the packaging provides no simple scripted way of manually > > starting the old postgres executable, only the new one. Moreover, what > > pg_upgrade is printing provides no help in figuring out whether that's > > the next step.) > > > > I do sympathize with taking a paranoid attitude here, but I'm failing > > to see what advantage there is in not attempting to start the old > > postmaster. In the *only* case that pg_upgrade is successfully > > protecting against with this logic, namely there's-an-active-postmaster- > > already, the postmaster is equally able to protect itself. In other > > cases it would be more helpful not less to let the postmaster analyze > > the situation. > > > > > The other problem is that if the server start fails, how do we know if > > > the failure was due to a running postmaster? > > > > Because we read the postmaster's log file, or at least tell the user to > > do so. That report would be unambiguous, unlike pg_upgrade's. > > Attached is a WIP patch to give you an idea of how I am going to solve > this problem. This comment says it all: Attached is a ready-to-apply version of the patch. I continued to use postmaster.pid to determine if I need to try the start/stop test --- that allows me to know which servers _might_ be running, because a server start failure might be for many reasons and I would prefer not to suggest the server is running if I can avoid it, and the pid file gives me that. The patch is longer than I expected because the code needed to be reordered. The starting banner indicates if it a live check, but to check for a live server we need to start/stop the servers and we needed to know where the binaries are, and we didn't do that until after the start banner. I removed the 'check' line for binary checks, and moved that before the banner printing. postmaster_start also now needs an option to supress an error. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Вложения
On Wed, Jan 23, 2013 at 04:58:57PM -0500, Bruce Momjian wrote: > Attached is a ready-to-apply version of the patch. I continued to use > postmaster.pid to determine if I need to try the start/stop test --- > that allows me to know which servers _might_ be running, because a > server start failure might be for many reasons and I would prefer not to > suggest the server is running if I can avoid it, and the pid file gives > me that. > > The patch is longer than I expected because the code needed to be > reordered. The starting banner indicates if it a live check, but to > check for a live server we need to start/stop the servers and we needed > to know where the binaries are, and we didn't do that until after the > start banner. I removed the 'check' line for binary checks, and moved > that before the banner printing. postmaster_start also now needs an > option to supress an error. Applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
The latest update on original bug report at Red Hat bugzilla shows that the reproducer is just disable the peer access for 'postgres' user in pg_hba.conf. So — the old server was most probably still running for OP (not shut down properly as was originally said). But basically, this fix is relevant to this thread so posting here. 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8<
[ fixed hackers CC address ] On Mon, Aug 12, 2013 at 12:31:10PM +0200, Pavel Raiskup wrote: > The latest update on original bug report at Red Hat bugzilla shows that the > reproducer is just disable the peer access for 'postgres' user in > pg_hba.conf. So — the old server was most probably still running for OP > (not shut down properly as was originally said). > > But basically, this fix is relevant to this thread so posting here. > > 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< > > >From f2df7fb281b6346f3feeb5f0f8d2d0ee7fb13f6c Mon Sep 17 00:00:00 2001 > From: Pavel Raiskup <praiskup@redhat.com> > Date: Mon, 12 Aug 2013 10:45:08 +0200 > Subject: [PATCH] pg_upgrade: stop postmaster properly > > Setup the os_info.running_cluster little bit earlier just to allow the > stop_postmaster_atexit callback success when error occurs during the > authentication checks. > > https://bugzilla.redhat.com/show_bug.cgi?id=896161 > http://www.postgresql.org/message-id/E1TwKHs-0005yp-39@wrigleys.postgresql.org > --- > contrib/pg_upgrade/server.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c > index c1d459d..c082d0e 100644 > --- a/contrib/pg_upgrade/server.c > +++ b/contrib/pg_upgrade/server.c > @@ -239,6 +239,8 @@ start_postmaster(ClusterInfo *cluster, bool throw_error) > if (!pg_ctl_return && !throw_error) > return false; > > + os_info.running_cluster = cluster; > + > /* Check to see if we can connect to the server; if not, report it. */ > if ((conn = get_db_conn(cluster, "template1")) == NULL || > PQstatus(conn) != CONNECTION_OK) > @@ -258,8 +260,6 @@ start_postmaster(ClusterInfo *cluster, bool throw_error) > pg_log(PG_FATAL, "pg_ctl failed to start the %s server, or connection failed\n", > CLUSTER_NAME(cluster)); > > - os_info.running_cluster = cluster; > - > return true; > } Wow, this is an interesting bug report, and it certainly is a bug. What is complex is the solution. The existing code is written so we only shutdown the server on exit when we know we have started the server _and_ can connect to it. As you stated, this leaves the server running in cases where we _did_ start the server, but can't connect to it, e.g. used MD5 authentication. The patch moves the atexit setting up, as you suggested, but only does that when pg_ctl succeeds (we know we started the server), because we don't want to stop the server on exit if we didn't start it. PG 9.1+ will allow pg_ctl -w start to succeed even if there are permissions problems; earlier versions will not and will keep the server running --- the user will have to stop the server after pg_upgrade says it is running. I am not going to backpatch this beyond 9.3 as it is risky code. I have improved the comments in this area. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Вложения
> The patch moves the atexit setting up, as you suggested, but only does > that when pg_ctl succeeds (we know we started the server), Yes, of course! > PG 9.1+ will allow pg_ctl -w start to succeed even if there are > permissions problems; earlier versions will not and will keep the > server running --- the user will have to stop the server after > pg_upgrade says it is running. This makes it a complex, really.. We may not easily make the stop_postmaster resistant to non-running server. Thus your solution must be good enough. > I am not going to backpatch this beyond 9.3 as it is risky code. I have > improved the comments in this area. Agree, it is OK for me — thanks for your work. Pavel
On Mon, Aug 12, 2013 at 10:08:07PM +0200, Pavel Raiskup wrote: > > The patch moves the atexit setting up, as you suggested, but only does > > that when pg_ctl succeeds (we know we started the server), > > Yes, of course! > > > PG 9.1+ will allow pg_ctl -w start to succeed even if there are > > permissions problems; earlier versions will not and will keep the > > server running --- the user will have to stop the server after > > pg_upgrade says it is running. > > This makes it a complex, really.. We may not easily make the > stop_postmaster resistant to non-running server. Thus your solution must > be good enough. Well, stop_postmaster can run just fine with a stopped server, as we allow the atexit() shutdown to ignore errors. The larger question is whether we should ever stop a server we are not sure we started. The existing pg_upgrade logic checks if the servers are running first with start_postmaster(throw_error = false), so in our existing code, we could probably unconditionally shutdown the server even with a pg_ctl error when using throw_error = true, but pg_upgrade is complex so I am hesitant to make such a bold change. Does anyone else have an opinion? > > I am not going to backpatch this beyond 9.3 as it is risky code. I have > > improved the comments in this area. > > Agree, it is OK for me — thanks for your work. Sure. You gave me something to study today, and highlighted an area of the code that was very unclear. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Mon, Aug 12, 2013 at 04:44:12PM -0400, Bruce Momjian wrote: > On Mon, Aug 12, 2013 at 10:08:07PM +0200, Pavel Raiskup wrote: > > > The patch moves the atexit setting up, as you suggested, but only does > > > that when pg_ctl succeeds (we know we started the server), > > > > Yes, of course! > > > > > PG 9.1+ will allow pg_ctl -w start to succeed even if there are > > > permissions problems; earlier versions will not and will keep the > > > server running --- the user will have to stop the server after > > > pg_upgrade says it is running. > > > > This makes it a complex, really.. We may not easily make the > > stop_postmaster resistant to non-running server. Thus your solution must > > be good enough. > > Well, stop_postmaster can run just fine with a stopped server, as we > allow the atexit() shutdown to ignore errors. The larger question is > whether we should ever stop a server we are not sure we started. > > The existing pg_upgrade logic checks if the servers are running first > with start_postmaster(throw_error = false), so in our existing code, we > could probably unconditionally shutdown the server even with a pg_ctl > error when using throw_error = true, but pg_upgrade is complex so I am > hesitant to make such a bold change. Does anyone else have an opinion? > > > > I am not going to backpatch this beyond 9.3 as it is risky code. I have > > > improved the comments in this area. > > > > Agree, it is OK for me — thanks for your work. > > Sure. You gave me something to study today, and highlighted an area of > the code that was very unclear. I have applied a patch to shutdown the server on successful pg_ctl start, but authentication failure. I have also added code that we might want to be more aggressive someday. I backpatched this to 9.3, but not further back as this is a risky area of the code. Does anyone want to advocate further backpatching? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +