Discussion: Continued problems with pgdump, Large Objects and crashing backends


Continued problems with pgdump, Large Objects and crashing backends

From: Peter T Mount
Date:
[I'm posting this to the hackers list, as I think this is something deep
in the backend, and not JDBC - Peter]

I've been talking to Jason Venner <jason@idiom.com> over the last couple
of days about an interesting problem.

He's got a small Java application that restores large objects from a
backup to a database. However, the backend seemed to segv at exactly the
same point every time.

This occurs with both 6.3.x and 6.4.x (can't remember what revision).

Last night, he sent me a copy of the app, and I ran it against a recent
(last Saturday) cvs copy of 6.5, and the same thing happens.

Now to the good bit ;-)

The first problem is with pgdump. His restore app is in two parts, a shell
script and a Java application. The shell script creates two databases
(edit and prod), restores them (from the output of pgdump), then calls
Java to load the large objects and repair the differing OIDs.

However, this fails when creating functions that contain more than one SQL
statement. He has some functions that insert into a table depending on
some arguments, then issue a select on the last argument, which is the
function's result. However, pgdump doesn't end that select with a ';', and
this causes the 6.5 backend to fail. Adding the ';' fixes the problem.

I don't know if it's a known problem, but someone may need to check.
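To illustrate the symptom, here's a minimal sketch. The function body below is purely hypothetical (not Jason's actual schema), and the fix-up helper is mine, not part of pg_dump; it just shows the one-character repair that made the restore work:

```python
# Hypothetical example of the dumped function body in question: pg_dump
# emits the final SELECT without a terminating ';', which the 6.5 backend
# rejects when the function is recreated.
dumped_body = (
    "INSERT INTO log (op, arg) VALUES ('add', $1);\n"
    "SELECT $2"  # <- no trailing ';' in the dumped definition
)

def terminate_statements(body: str) -> str:
    """Append the missing ';' so the last statement is terminated too."""
    body = body.rstrip()
    return body if body.endswith(";") else body + ";"

fixed_body = terminate_statements(dumped_body)
```

Running the restore with `fixed_body` in place of `dumped_body` is the workaround described above.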

Ok, that's the simple one. Now the harder two:

When the Java app runs, it causes the backend to segv repeatedly at the
same point (after about the 60th large object). Now, I've checked his code
and can't find anything obviously wrong, so I added some tracing to it,
and discovered that when the application closes (either explicitly closing
the application, or upon an error), the backend outputs the following to
stderr, and segv's:

pq_recvbuf: recv() failed, errno 2
pq_recvbuf: recv() failed, errno 0

Now, each of these is for one of the two open connections, and each appears
as soon as its respective connection closes. Remember, the connections are
to two different databases.

Running the backend with the -d2 flag, these expand to:

pq_recvbuf: recv() failed, errno 2
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 6731 exited with status 0
pq_recvbuf: recv() failed, errno 0
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 6730 exited with status 0

This is repeatable, and is not related at all to the large object being
loaded. Reversing the order in which the objects are loaded causes it to
fail on a different object.

Now, the first question: Does someone who knows the backend better than I
do know what could cause the recv() message to occur when disconnecting?

Now the third problem. The previous problem occurs outside of any
transactions. In JDBC, you use transactions by setting autocommit to
false; there are then methods to commit or roll back.

Ok, now the problem. When he sets autocommit to false, the JDBC driver
sends BEGIN to the backend. Ok so far, however, something then fails
during the first large object's load, and causes everything else to fail.

I haven't looked into this one fully, but it's identical on all three
major versions of the backend, which is a little surprising.

Now the weird thing is that the same errors occur when the connections are
closed.

I don't think it's a JDBC problem, as I can't reproduce it with any of my
code. Neither can I see anything wrong with Jason's code.

Anyhow, this is what's kept me busy the last few evenings, and it's got
me stumped.

Peter

--
Peter T Mount <peter@retep.org.uk>
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC FAQ: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf



Re: [HACKERS] Continued problems with pgdump, Large Objects and crashing backends

From: Tom Lane
Date:
Peter T Mount <peter@retep.org.uk> writes:
> However, this fails when creating functions that contain more than one SQL
> statement. He has some functions that insert into a table depending on
> some arguments, then issue a select on the last argument, which is the
> function's result. However, pgdump doesn't end that select with a ';', and
> this causes the 6.5 backend to fail. Adding the ';' fixes the problem.

What does 'fail' mean exactly?  Crash, or just reject the query?
It sounds like there is a pg_dump bug here (omitting a required
semicolon) but I don't understand whether there's also a backend bug.


> Running the backend with the -d2 flag, these expand to:

> pq_recvbuf: recv() failed, errno 2
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
> /usr/local/pgsql/bin/postmaster: reaping dead processes...
> /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6731 exited with status 0
> pq_recvbuf: recv() failed, errno 0
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
> /usr/local/pgsql/bin/postmaster: reaping dead processes...
> /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6730 exited with status 0

This doesn't look like a segv trace to me --- if the backend was
coredumping then the postmaster should see a nonzero exit status.

The recv() complaints probably indicate that the client application
disconnected ungracefully (ie, without sending the 'X' terminate
message).  It's curious that they're not both alike.
That might be a red herring however --- right now pq_recvbuf doesn't
distinguish plain EOF from a true error, and if it's plain EOF then
whatever errno was last set to gets printed.  Think I'll go fix that.
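The stale-errno effect Tom describes can be sketched in a few lines. This is an illustrative Python analogue, not the backend's C code: when the peer simply closes the connection, the reader sees plain EOF (zero bytes) with no error raised, and in C a recv() that returns 0 leaves errno untouched, so printing errno there shows whatever an earlier call left behind (e.g. 2 = ENOENT):

```python
import os

# Simulate an ungraceful client disconnect with a pipe: the "client" end
# closes without writing anything, and the "backend" end sees plain EOF.
r, w = os.pipe()
os.close(w)              # peer disconnects without sending anything
data = os.read(r, 1024)  # plain EOF: returns empty bytes, raises no error
os.close(r)
```

Since EOF is not reported through errno, a log line built from errno at that point is misleading, which is exactly the pq_recvbuf fix being discussed.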

Barring more evidence, all I see here is client disconnect, not a
backend failure.  What's your basis for claiming a segv crash?


> Ok, now the problem. When he sets autocommit to false, the JDBC driver
> sends BEGIN to the backend. Ok so far, however, something then fails
> during the first large object's load, and causes everything else to fail.

That's not a bug, it's a feature ... allegedly, anyway.  Any error
inside a transaction means the entire transaction is aborted.  And
the backend will keep reminding you so until you cooperate by ending
the transaction.  I don't like the behavior very much either, but
it's operating as designed.
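The behaviour can be sketched as a toy state machine. This is purely illustrative (my own model, not the backend's code, and the error text is paraphrased): once any statement inside a BEGIN block fails, every later statement is rejected until the client ends the transaction:

```python
# Toy model of the abort-until-end-of-transaction behaviour described above.
class ToyBackend:
    def __init__(self):
        self.in_txn = False
        self.aborted = False

    def execute(self, sql, fail=False):
        s = sql.strip().upper()
        if s == "BEGIN":
            self.in_txn, self.aborted = True, False
            return "BEGIN"
        if s in ("COMMIT", "ROLLBACK", "END"):
            self.in_txn, self.aborted = False, False
            return s
        if self.aborted:
            return "ERROR: transaction is aborted until you end it"
        if fail:
            if self.in_txn:
                self.aborted = True
            return "ERROR"
        return "OK"

be = ToyBackend()
responses = [
    be.execute("BEGIN"),
    be.execute("-- first large object load --", fail=True),  # hypothetical failure
    be.execute("SELECT 1"),    # rejected: transaction already aborted
    be.execute("ROLLBACK"),
    be.execute("SELECT 1"),    # accepted again after the txn ends
]
```

In this model, as in the scenario above, one failure during the first large object's load makes "everything else fail" until the JDBC code rolls back.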
        regards, tom lane


Re: [HACKERS] Continued problems with pgdump, Large Objects and crashing backends

From: Peter T Mount
Date:
On Wed, 17 Feb 1999, Tom Lane wrote:

> Peter T Mount <peter@retep.org.uk> writes:
> > However, this fails when creating functions that contain more than one SQL
> > statement. He has some functions that insert into a table depending on
> > some arguments, then issue a select on the last argument, which is the
> > function's result. However, pgdump doesn't end that select with a ';', and
> > this causes the 6.5 backend to fail. Adding the ';' fixes the problem.
> 
> What does 'fail' mean exactly?  Crash, or just reject the query?
> It sounds like there is a pg_dump bug here (omitting a required
> semicolon) but I don't understand whether there's also a backend bug.

I didn't say this was a backend bug; it was just one thing I came across
while looking at the following problem.

> > Running the backend with the -d2 flag, these expand to:
> 
> > pq_recvbuf: recv() failed, errno 2
> > proc_exit(0) [#0]
> > shmem_exit(0) [#0]
> > exit(0)
> > /usr/local/pgsql/bin/postmaster: reaping dead processes...
> > /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6731 exited with status 0
> > pq_recvbuf: recv() failed, errno 0
> > proc_exit(0) [#0]
> > shmem_exit(0) [#0]
> > exit(0)
> > /usr/local/pgsql/bin/postmaster: reaping dead processes...
> > /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6730 exited with status 0
> 
> This doesn't look like a segv trace to me --- if the backend was
> coredumping then the postmaster should see a nonzero exit status.
> 
> The recv() complaints probably indicate that the client application
> disconnected ungracefully (ie, without sending the 'X' terminate
> message).  It's curious that they're not both alike.
> That might be a red herring however --- right now pq_recvbuf doesn't
> distinguish plain EOF from a true error, and if it's plain EOF then
> whatever errno was last set to gets printed.  Think I'll go fix that.
> 
> Barring more evidence, all I see here is client disconnect, not a
> backend failure.

Hmmm, I've never seen the recv() problem before with any JDBC app, only
this one.

PS: Currently the JDBC driver still uses the 6.3.x protocol. When 6.4
came out, I didn't implement the CANCEL stuff, as I was concentrating on
getting more of the innards implemented.

Anyhow, if the terminate message is a problem, I'll upgrade the protocol.

> What's your basis for claiming a segv crash?

I think the segv came from Jason (who's run it against 6.3.x and 6.4.x).

> > Ok, now the problem. When he sets autocommit to false, the JDBC driver
> > sends BEGIN to the backend. Ok so far, however, something then fails
> > during the first large object's load, and causes everything else to fail.
> 
> That's not a bug, it's a feature ... allegedly, anyway.  Any error
> inside a transaction means the entire transaction is aborted.  And
> the backend will keep reminding you so until you cooperate by ending
> the transaction.  I don't like the behavior very much either, but
> it's operating as designed.

I'm going to overhaul the autocommit(false) code. I suspect it's broken,
but I need to sit down and figure out what is happening with this problem
first.

Peter

--
Peter T Mount <peter@retep.org.uk>
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC FAQ: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf



Re: [HACKERS] Continued problems with pgdump, Large Objects and crashing backends

From: Tom Lane
Date:
Peter T Mount <peter@retep.org.uk> writes:
>> The recv() complaints probably indicate that the client application
>> disconnected ungracefully (ie, without sending the 'X' terminate
>> message).  It's curious that they're not both alike.

> Hmmm, I've never seen the recv() problem before with any JDBC app, only
> this one.

That particular message is new in the 6.5 code (BTW, as of this morning
it should say "pq_recvbuf: unexpected EOF on client connection").

I was about to say that prior versions would also complain about an
unexpected client disconnect, but actually it looks like 6.4.2 doesn't
--- at least not in this low-level code.  I'm not inclined to remove the
message however.  I think we want it there to help detect more serious
problems, like disconnect in the middle of a COPY operation.

> PS: Currently the JDBC driver is still using the 6.3.x protocol. When 6.4
> came out I didn't implement the CANCEL stuff, as I was concentrating on
> getting more of the innards implemented.
> Anyhow, if the terminate message is a problem, I'll upgrade the protocol.

The terminate message is defined in the old protocol too; it's not new
for 6.4.  As for whether it's a "problem" not to send it, it's only 
a problem if you don't like complaints in the postmaster log ;-).
The backend will close up shop just fine without it.
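A graceful disconnect under the old protocol can be sketched in a few lines. This is an assumption-laden illustration, not driver code: in the pre-6.4 protocol the Terminate message is the single byte 'X' (no length word), sent just before the client closes its socket; a local socket pair stands in for the backend connection:

```python
import socket

TERMINATE = b"X"  # Terminate message in the old frontend/backend protocol

def close_gracefully(sock):
    """Send Terminate, then close, so the backend logs no recv() complaint."""
    sock.sendall(TERMINATE)
    sock.close()

# Demo: the "backend" end sees the 'X', then a clean EOF.
client, backend = socket.socketpair()
close_gracefully(client)
seen = backend.recv(16)   # the Terminate byte
eof = backend.recv(16)    # empty: orderly shutdown
backend.close()
```

Without the 'X', the backend still exits cleanly, as noted above; the only difference is the complaint in the postmaster log.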
        regards, tom lane


Re: [HACKERS] Continued problems with pgdump, Large Objects and crashing backends

From: Peter T Mount
Date:
On Thu, 18 Feb 1999, Tom Lane wrote:

> Peter T Mount <peter@retep.org.uk> writes:
> >> The recv() complaints probably indicate that the client application
> >> disconnected ungracefully (ie, without sending the 'X' terminate
> >> message).  It's curious that they're not both alike.
> 
> > Hmmm, I've never seen the recv() problem before with any JDBC app, only
> > this one.
> 
> That particular message is new in the 6.5 code (BTW, as of this morning
> it should say "pq_recvbuf: unexpected EOF on client connection").
> 
> I was about to say that prior versions would also complain about an
> unexpected client disconnect, but actually it looks like 6.4.2 doesn't
> --- at least not in this low-level code.  I'm not inclined to remove the
> message however.  I think we want it there to help detect more serious
> problems, like disconnect in the middle of a COPY operation.
> 
> > PS: Currently the JDBC driver is still using the 6.3.x protocol. When 6.4
> > came out I didn't implement the CANCEL stuff, as I was concentrating on
> > getting more of the innards implemented.
> > Anyhow, if the terminate message is a problem, I'll upgrade the protocol.
> 
> The terminate message is defined in the old protocol too; it's not new
> for 6.4.  As for whether it's a "problem" not to send it, it's only 
> a problem if you don't like complaints in the postmaster log ;-).
> The backend will close up shop just fine without it.

Looks like something that's been missing since the beginning. Ok, I'll add
the message to it tomorrow, as I'm planning some cleanups this weekend.

Peter

--
Peter T Mount <peter@retep.org.uk>
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC FAQ: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf