Thread: Re: postgresql 6.3.2

Re: postgresql 6.3.2

From
Bruce Momjian
Date:
Sending this to the hackers list to see if anyone knows anything about
it.  One of our developers had the problem too, but I thought it was
fixed.

>
> Hello,
>
>   Sorry to send you this message directly to you, but I'm currently not
> subscribed to any psql mailing list and this is the only way I thought to
> know if my email has been received.
>   I've sent you some messages in the past regarding the Linux fflush() problem,
> the one that made psql receive a EPIPE signal.
>   I've downloaded the latest available version 6.3.2 and has the same
> problem.
>   If you don't remember, there was a fflush() called in fe-misc.c, exactly
> in pqPuts() function that receives EPIPE and it's currently not ignored.
>   I don't know if this is Linux specific or not, but it'd be great to fix
> the problem in the distribution. Do you agree?



--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: postgresql 6.3.2

From
"Thomas G. Lockhart"
Date:
> > I've sent you some messages in the past regarding the Linux fflush()
> > problem, the one that made psql receive a EPIPE signal.
> > I've downloaded the latest available version 6.3.2 and has the
> > same problem.
> > If you don't remember, there was a fflush() called in fe-misc.c,
> > exactly in pqPuts() function that receives EPIPE and it's currently
> > not ignored.
> > I don't know if this is Linux specific or not, but it'd be great to
> > fix the problem in the distribution. Do you agree?

Yes, I recall the "broken pipe" problem and thought that someone had
fixed it (most platforms didn't seem to see the problem, but Linux did).

I'm not currently running v6.3.2, having rev-locked on 980408 to get
some development done for v6.4. Did you supply a patch to fix the
problem earlier?

                    - Tom

Re: SIGPIPE gripe

From
Tom Lane
Date:
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
>>>> If you don't remember, there was a fflush() called in fe-misc.c,
>>>> exactly in pqPuts() function that receives EPIPE and it's currently
>>>> not ignored.

> Yes, I recall the "broken pipe" problem and thought that someone had
> fixed it (most platforms didn't seem to see the problem, but Linux did).

fe-connect.c is set up to ignore SIGPIPE while trying to shut down the
connection.  (The 6.3.2 release is broken on non-POSIX-signal platforms,
because it resets SIGPIPE to SIG_DFL afterwards, which may well not be
what the application wanted.  I've fixed that in the version that I plan
to submit soon.)

There is no equivalent code to ignore SIGPIPE during ordinary writes to
the backend.  I'm hesitant to add it on the following grounds:
  1. Speed: a write would need three kernel calls instead of one.
  2. I'm suspicious of code that alters signal settings during normal
     operation.  Especially in a library that can't know what else is
     going on in the application.  Disabling the application's signal
     handler, even for short intervals, is best avoided.
  3. It's only an issue if the backend crashes, which shouldn't happen
     anyway ... shouldn't happen anyway ... shouldn't ... ;-)

The real question is what scenario is causing SIGPIPE to be delivered
in the first place.  A search of the pgsql-hackers archives for
"SIGPIPE" yields only a mention of seeing SIGPIPE some of the time
(not always) when trying to connect to a nonexistent database.
If that's what's being complained of here, I'll try to look into it.

            regards, tom lane

Re: [HACKERS] Re: SIGPIPE gripe

From
"Thomas G. Lockhart"
Date:
> > Yes, I recall the "broken pipe" problem and thought that someone had
> > fixed it (most platforms didn't seem to see the problem, but Linux
> > did).
> fe-connect.c is set up to ignore SIGPIPE while trying to shut down the
> connection.  (The 6.3.2 release is broken on non-POSIX-signal
> platforms, because it resets SIGPIPE to SIG_DFL afterwards, which may
> well not be what the application wanted.  I've fixed that in the
> version that I plan to submit soon.)
> There is no equivalent code to ignore SIGPIPE during ordinary writes
> to the backend.  I'm hesitant to add it on the following grounds:
>   1. Speed: a write would need three kernel calls instead of one.
>   2. I'm suspicious of code that alters signal settings during normal
>      operation.  Especially in a library that can't know what else is
>      going on in the application.  Disabling the application's signal
>      handler, even for short intervals, is best avoided.
>   3. It's only an issue if the backend crashes, which shouldn't happen
>      anyway ... shouldn't happen anyway ... shouldn't ... ;-)
> The real question is what scenario is causing SIGPIPE to be delivered
> in the first place.  A search of the pgsql-hackers archives for
> "SIGPIPE" yields only a mention of seeing SIGPIPE some of the time
> (not always) when trying to connect to a nonexistent database.
> If that's what's being complained of here, I'll try to look into it.

golem$ psql nada
Connection to database 'nada' failed.
FATAL 1:  Database nada does not exist in pg_database
golem$ psql nada
Broken pipe
golem$

This is on a Linux box with Postgres code frozen on 980408. I assume
that full v6.3.2 exhibits the same...

                  - Tom

Re: SIGPIPE gripe

From
Tom Lane
Date:
I said:
> The real question is what scenario is causing SIGPIPE to be delivered
> in the first place.  A search of the pgsql-hackers archives for
> "SIGPIPE" yields only a mention of seeing SIGPIPE some of the time
> (not always) when trying to connect to a nonexistent database.

OK, I've been able to reproduce this; I understand the problem and
I have a proposed fix.

Here's the scenario.  On the server side, this happens:

    Postmaster receives new connection request from client

    (possible authentication cycle here)

    Postmaster sends "AUTH OK" to client

    Postmaster forks backend

    Backend discovers that database name is invalid

    Backend sends error message

    Backend closes connection and exits

Meanwhile, once the client receives the "AUTH OK" it initiates
an empty query cycle (which is commented as intending to discover
whether the database exists!):

    ...

    Client receives "AUTH_OK"

    Client sends "Q " query

    Client waits for response

The problem, of course, is that if the backend manages to exit
before the client gets to send its empty query, then the client
is writing on a closed connection.  Boom, SIGPIPE.
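
The race can be reproduced without a real backend. The sketch below is an illustration under stated assumptions, not the libpq code path: a local socket pair stands in for the client/backend connection, the "backend" end writes a single `E` byte and closes, and a forked child plays the client with the default SIGPIPE disposition. `signal_that_killed_client` is a hypothetical name.

```c
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the number of the signal that killed the simulated client,
 * 0 if it exited normally, or -1 if the setup failed.
 */
int signal_that_killed_client(void)
{
    int sv[2];
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* "Backend": send the error message, then close before the query arrives. */
    if (write(sv[1], "E", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        return -1;
    }
    if (pid == 0) {
        /* "Client": default disposition, as in a program that never touches SIGPIPE. */
        signal(SIGPIPE, SIG_DFL);
        if (write(sv[0], "Q ", 2) < 0)  /* the empty query, sent too late */
            _exit(1);
        _exit(0);                       /* reached only if no signal was raised */
    }

    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A caller would compare the return value with `SIGPIPE`; a shell reports a process killed this way as "Broken pipe".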

I thought about hacking around this by having the postmaster check
the validity of the database name before it does the authorization
cycle.  But that's a bad idea; first because it allows unauthorized
users to probe the validity of database names, and second because
it only fixes this particular instance of the problem.  The general
problem is that the FE/BE protocol does not make provision for errors
reported by the backend during startup.  ISTM there are many ways in
which the BE might fail during startup, not all of which could
reasonably be checked in advance by the postmaster.

So ... since we're altering the protocol anyway ... the right fix is
to alter the protocol a little more.  Remember that "Z" message that
the backend is now sending at the end of every query cycle?  What
we ought to do is make the BE send "Z" at completion of startup,
as well.  (In other words, "Z" will really mean "Ready for Query"
rather than "Query Done".  This is actually easier to implement in
postgres.c than the other way.)  Now the client's startup procedure
looks like

    ...

    Client receives "AUTH_OK"

    Client waits for "Z" ; if get "E" instead, BE startup failed.

I suspect it's not really necessary to do an empty query after this,
but we may as well leave that in there for additional reliability.
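
The revised client startup step could be sketched roughly as follows. This is a simplification for illustration, not the fe-connect.c implementation: it looks only at the first message-type byte, ignores the rest of the message, and treats end-of-file as its own failure case. `StartupResult`, `wait_for_ready` and `simulate_startup` are hypothetical names, and a local socket pair stands in for the backend.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum {
    STARTUP_READY,      /* got 'Z': backend is ready for a query */
    STARTUP_ERROR,      /* got 'E': backend reported a startup failure */
    STARTUP_EOF,        /* connection closed before any message arrived */
    STARTUP_UNEXPECTED  /* read error or an unknown message type */
} StartupResult;

/* Block until the backend says something, then classify the first byte. */
StartupResult wait_for_ready(int fd)
{
    unsigned char type;
    ssize_t n;

    do {
        n = read(fd, &type, 1);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (n == 0)
        return STARTUP_EOF;
    if (n < 0)
        return STARTUP_UNEXPECTED;

    switch (type) {
        case 'Z': return STARTUP_READY;
        case 'E': return STARTUP_ERROR;
        default:  return STARTUP_UNEXPECTED;
    }
}

/* Test harness: a fake backend sends `len` bytes and then closes. */
StartupResult simulate_startup(const char *backend_bytes, size_t len)
{
    int sv[2];
    StartupResult result;

    assert(backend_bytes != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return STARTUP_UNEXPECTED;

    if (len > 0 && write(sv[1], backend_bytes, len) != (ssize_t) len) {
        close(sv[0]);
        close(sv[1]);
        return STARTUP_UNEXPECTED;
    }
    close(sv[1]);

    result = wait_for_ready(sv[0]);
    close(sv[0]);
    return result;
}
```

Because the client only reads at this point, a backend that has already exited cannot trigger SIGPIPE here; the client sees either the queued message or end-of-file.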

            regards, tom lane

Re: [HACKERS] Re: SIGPIPE gripe

From
Bruce Momjian
Date:
>     Client receives "AUTH_OK"
>
>     Client waits for "Z" ; if get "E" instead, BE startup failed.
>
> I suspect it's not really necessary to do an empty query after this,
> but we may as well leave that in there for additional reliability.

I say go without the extra query.  We can always add it later if there
is a problem.  Backend startup time should be as fast as possible,
especially for short requests like www.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)

Re: [HACKERS] Re: SIGPIPE gripe

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> I suspect it's not really necessary to do an empty query after this,
>> but we may as well leave that in there for additional reliability.

> I say go without the extra query.  We can always add it later if there
> is a problem.  Backend startup time should be as fast as possible,

Good point.  OK, I'll leave the empty-query code in fe-connect.c,
but ifdef it out, and we'll see if anyone has any problems.

            regards, tom lane

Re: postgresql 6.3.2

From
Federico Schwindt
Date:
> > > I've sent you some messages in the past regarding the Linux fflush()
> > > problem, the one that made psql receive a EPIPE signal.
> > > I've downloaded the latest available version 6.3.2 and has the
> > > same problem.
> > > If you don't remember, there was a fflush() called in fe-misc.c,
> > > exactly in pqPuts() function that receives EPIPE and it's currently
> > > not ignored.
> > > I don't know if this is Linux specific or not, but it'd be great to
> > > fix the problem in the distribution. Do you agree?
>
> Yes, I recall the "broken pipe" problem and thought that someone had
> fixed it (most platforms didn't seem to see the problem, but Linux did).
>
> I'm not currently running v6.3.2, having rev-locked on 980408 to get
> some development done for v6.4. Did you supply a patch to fix the
> problem earlier?
>

  Well, kind of. I've tracked the problem down to PQexec() and suggested
some way to fix it.
  I've checked the current psql and the problem is when it calls to
PQconnectdb(). At this point EPIPE isn't ignored. Inside PQconnectdb()
there is a call to PQexec() to see if the database exists.
  Then PQexec() calls pqPuts() and you get the broken pipe.
  Before the call to PQconnectdb in psql there isn't any call to pqsignal.

  Federico Schwindt

Re: [HACKERS] Re: SIGPIPE gripe

From
dg@illustra.com (David Gould)
Date:
> So ... since we're altering the protocol anyway ... the right fix is
> to alter the protocol a little more.  Remember that "Z" message that
> the backend is now sending at the end of every query cycle?  What
> we ought to do is make the BE send "Z" at completion of startup,
> as well.  (In other words, "Z" will really mean "Ready for Query"
> rather than "Query Done".  This is actually easier to implement in
> postgres.c than the other way.)  Now the client's startup procedure
> looks like
>
>     ...
>
>     Client receives "AUTH_OK"
>
>     Client waits for "Z" ; if get "E" instead, BE startup failed.

BE fails, client gets SIGPIPE? or client waits forever?

-dg

David Gould           dg@illustra.com            510.628.3783 or 510.305.9468
Informix Software                      300 Lakeside Drive   Oakland, CA 94612
 - A child of five could understand this!  Fetch me a child of five.

Re: [HACKERS] Re: SIGPIPE gripe

From
Peter T Mount
Date:
On Mon, 4 May 1998, Tom Lane wrote:

[snip]

> Meanwhile, once the client receives the "AUTH OK" it initiates
> an empty query cycle (which is commented as intending to discover
> whether the database exists!):
>
>     ...
>
>     Client receives "AUTH_OK"
>
>     Client sends "Q " query
>
>     Client waits for response
>
> The problem, of course, is that if the backend manages to exit
> before the client gets to send its empty query, then the client
> is writing on a closed connection.  Boom, SIGPIPE.

[snip]

> So ... since we're altering the protocol anyway ... the right fix is
> to alter the protocol a little more.  Remember that "Z" message that
> the backend is now sending at the end of every query cycle?  What
> we ought to do is make the BE send "Z" at completion of startup,
> as well.  (In other words, "Z" will really mean "Ready for Query"
> rather than "Query Done".  This is actually easier to implement in
> postgres.c than the other way.)  Now the client's startup procedure
> looks like
>
>     ...
>
>     Client receives "AUTH_OK"
>
>     Client waits for "Z" ; if get "E" instead, BE startup failed.

This sounds fair enough. In fact we could then throw a more meaningful
error message when this occurs.

> I suspect it's not really necessary to do an empty query after this,
> but we may as well leave that in there for additional reliability.

In JDBC, I replaced (back in 6.2) the empty query with one to get the
current DateStyle (which the driver then uses to handle dates correctly).

--
Peter T Mount peter@retep.org.uk or petermount@earthling.net
Main Homepage: http://www.retep.org.uk
************ Someday I may rebuild this signature completely ;-) ************
Work Homepage: http://www.maidstone.gov.uk Work EMail: peter@maidstone.gov.uk


Re: [HACKERS] Re: SIGPIPE gripe

From
Tom Lane
Date:
dg@illustra.com (David Gould) writes:
>> So ... since we're altering the protocol anyway ... the right fix is
>> to alter the protocol a little more.
>>
>> Client waits for "Z" ; if get "E" instead, BE startup failed.

> BE fails, client gets SIGPIPE? or client waits forever?

Neither: the client detects EOF on its input and realizes that the
backend failed.  Already done and tested.

(SIGPIPE is only for *write* on a closed channel, not *read*.
Read just returns EOF.)

            regards, tom lane