Обсуждение: Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve

Поиск
Список
Период
Сортировка
On Sat, Jan 19, 2013 at 09:56:48PM -0500, Bruce Momjian wrote:
> > (But, at least with the type of packaging I'm using in Fedora, he would
> > first have to go through a package downgrade/reinstallation process,
> > because the packaging provides no simple scripted way of manually
> > starting the old postgres executable, only the new one.  Moreover, what
> > pg_upgrade is printing provides no help in figuring out whether that's
> > the next step.)
> >
> > I do sympathize with taking a paranoid attitude here, but I'm failing
> > to see what advantage there is in not attempting to start the old
> > postmaster.  In the *only* case that pg_upgrade is successfully
> > protecting against with this logic, namely there's-an-active-postmaster-
> > already, the postmaster is equally able to protect itself.  In other
> > cases it would be more helpful not less to let the postmaster analyze
> > the situation.
> >
> > > The other problem is that if the server start fails, how do we know if
> > > the failure was due to a running postmaster?
> >
> > Because we read the postmaster's log file, or at least tell the user to
> > do so.  That report would be unambiguous, unlike pg_upgrade's.
>
> Attached is a WIP patch to give you an idea of how I am going to solve
> this problem.  This comment says it all:

Attached is a ready-to-apply version of the patch.  I continued to use
postmaster.pid to determine if I need to try the start/stop test ---
that allows me to know which servers _might_ be running, because a
server start failure might be for many reasons and I would prefer not to
suggest the server is running if I can avoid it, and the pid file gives
me that.

The patch is longer than I expected because the code needed to be
reordered.  The starting banner indicates if it a live check, but to
check for a live server we need to start/stop the servers and we needed
to know where the binaries are, and we didn't do that until after the
start banner.  I removed the 'check' line for binary checks, and moved
that before the banner printing.  postmaster_start also now needs an
option to supress an error.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Вложения
On Wed, Jan 23, 2013 at 04:58:57PM -0500, Bruce Momjian wrote:
> Attached is a ready-to-apply version of the patch.  I continued to use
> postmaster.pid to determine if I need to try the start/stop test ---
> that allows me to know which servers _might_ be running, because a
> server start failure might be for many reasons and I would prefer not to
> suggest the server is running if I can avoid it, and the pid file gives
> me that.
> 
> The patch is longer than I expected because the code needed to be
> reordered.  The starting banner indicates if it a live check, but to
> check for a live server we need to start/stop the servers and we needed
> to know where the binaries are, and we didn't do that until after the
> start banner.  I removed the 'check' line for binary checks, and moved
> that before the banner printing.  postmaster_start also now needs an
> option to supress an error.

Applied.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



The latest update on original bug report at Red Hat bugzilla shows that the
reproducer is just disable the peer access for 'postgres' user in
pg_hba.conf.  So — the old server was most probably still running for OP
(not shut down properly as was originally said).

But basically, this fix is relevant to this thread so posting here.

8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8<


[ fixed hackers CC address ]

On Mon, Aug 12, 2013 at 12:31:10PM +0200, Pavel Raiskup wrote:
> The latest update on original bug report at Red Hat bugzilla shows that the
> reproducer is just disable the peer access for 'postgres' user in
> pg_hba.conf.  So — the old server was most probably still running for OP
> (not shut down properly as was originally said).
>
> But basically, this fix is relevant to this thread so posting here.
>
> 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8<
>
> >From f2df7fb281b6346f3feeb5f0f8d2d0ee7fb13f6c Mon Sep 17 00:00:00 2001
> From: Pavel Raiskup <praiskup@redhat.com>
> Date: Mon, 12 Aug 2013 10:45:08 +0200
> Subject: [PATCH] pg_upgrade: stop postmaster properly
>
> Setup the os_info.running_cluster little bit earlier just to allow the
> stop_postmaster_atexit callback success when error occurs during the
> authentication checks.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896161
> http://www.postgresql.org/message-id/E1TwKHs-0005yp-39@wrigleys.postgresql.org
> ---
>  contrib/pg_upgrade/server.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
> index c1d459d..c082d0e 100644
> --- a/contrib/pg_upgrade/server.c
> +++ b/contrib/pg_upgrade/server.c
> @@ -239,6 +239,8 @@ start_postmaster(ClusterInfo *cluster, bool throw_error)
>      if (!pg_ctl_return && !throw_error)
>          return false;
>
> +    os_info.running_cluster = cluster;
> +
>      /* Check to see if we can connect to the server; if not, report it. */
>      if ((conn = get_db_conn(cluster, "template1")) == NULL ||
>          PQstatus(conn) != CONNECTION_OK)
> @@ -258,8 +260,6 @@ start_postmaster(ClusterInfo *cluster, bool throw_error)
>          pg_log(PG_FATAL, "pg_ctl failed to start the %s server, or connection failed\n",
>                 CLUSTER_NAME(cluster));
>
> -    os_info.running_cluster = cluster;
> -
>      return true;
>  }

Wow, this is an interesting bug report, and it certainly is a bug.  What
is complex is the solution.

The existing code is written so we only shutdown the server on exit when
we know we have started the server _and_ can connect to it.  As you
stated, this leaves the server running in cases where we _did_ start the
server, but can't connect to it, e.g. used MD5 authentication.

The patch moves the atexit setting up, as you suggested, but only does
that when pg_ctl succeeds (we know we started the server), because we
don't want to stop the server on exit if we didn't start it.  PG 9.1+
will allow pg_ctl -w start to succeed even if there are permissions
problems;  earlier versions will not and will keep the server running
--- the user will have to stop the server after pg_upgrade says it is
running.

I am not going to backpatch this beyond 9.3 as it is risky code.  I have
improved the comments in this area.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Вложения
> The patch moves the atexit setting up, as you suggested, but only does
> that when pg_ctl succeeds (we know we started the server),

Yes, of course!

> PG 9.1+ will allow pg_ctl -w start to succeed even if there are
> permissions problems;  earlier versions will not and will keep the
> server running --- the user will have to stop the server after
> pg_upgrade says it is running.

This makes it a complex, really..  We may not easily make the
stop_postmaster resistant to non-running server.  Thus your solution must
be good enough.

> I am not going to backpatch this beyond 9.3 as it is risky code.  I have
> improved the comments in this area.

Agree, it is OK for me — thanks for your work.

Pavel




On Mon, Aug 12, 2013 at 10:08:07PM +0200, Pavel Raiskup wrote:
> > The patch moves the atexit setting up, as you suggested, but only does
> > that when pg_ctl succeeds (we know we started the server),
> 
> Yes, of course!
> 
> > PG 9.1+ will allow pg_ctl -w start to succeed even if there are
> > permissions problems;  earlier versions will not and will keep the
> > server running --- the user will have to stop the server after
> > pg_upgrade says it is running.
> 
> This makes it a complex, really..  We may not easily make the
> stop_postmaster resistant to non-running server.  Thus your solution must
> be good enough.

Well, stop_postmaster can run just fine with a stopped server, as we
allow the atexit() shutdown to ignore errors.  The larger question is
whether we should ever stop a server we are not sure we started.

The existing pg_upgrade logic checks if the servers are running first
with start_postmaster(throw_error = false), so in our existing code, we
could probably unconditionally shutdown the server even with a pg_ctl
error when using throw_error = true, but pg_upgrade is complex so I am
hesitant to make such a bold change.  Does anyone else have an opinion?

> > I am not going to backpatch this beyond 9.3 as it is risky code.  I have
> > improved the comments in this area.
> 
> Agree, it is OK for me — thanks for your work.

Sure.  You gave me something to study today, and highlighted an area of
the code that was very unclear.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



On Mon, Aug 12, 2013 at 04:44:12PM -0400, Bruce Momjian wrote:
> On Mon, Aug 12, 2013 at 10:08:07PM +0200, Pavel Raiskup wrote:
> > > The patch moves the atexit setting up, as you suggested, but only does
> > > that when pg_ctl succeeds (we know we started the server),
> > 
> > Yes, of course!
> > 
> > > PG 9.1+ will allow pg_ctl -w start to succeed even if there are
> > > permissions problems;  earlier versions will not and will keep the
> > > server running --- the user will have to stop the server after
> > > pg_upgrade says it is running.
> > 
> > This makes it a complex, really..  We may not easily make the
> > stop_postmaster resistant to non-running server.  Thus your solution must
> > be good enough.
> 
> Well, stop_postmaster can run just fine with a stopped server, as we
> allow the atexit() shutdown to ignore errors.  The larger question is
> whether we should ever stop a server we are not sure we started.
> 
> The existing pg_upgrade logic checks if the servers are running first
> with start_postmaster(throw_error = false), so in our existing code, we
> could probably unconditionally shutdown the server even with a pg_ctl
> error when using throw_error = true, but pg_upgrade is complex so I am
> hesitant to make such a bold change.  Does anyone else have an opinion?
> 
> > > I am not going to backpatch this beyond 9.3 as it is risky code.  I have
> > > improved the comments in this area.
> > 
> > Agree, it is OK for me — thanks for your work.
> 
> Sure.  You gave me something to study today, and highlighted an area of
> the code that was very unclear.

I have applied a patch to shutdown the server on successful pg_ctl
start, but authentication failure.  I have also added code that we might
want to be more aggressive someday.

I backpatched this to 9.3, but not further back as this is a risky area
of the code.  Does anyone want to advocate further backpatching?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +