Discussion: Continued problems with pgdump, Large Objects and crashing backends
[I'm posting this to the hackers list, as I think this is something deep in the backend, and not JDBC - Peter]

I've been talking to Jason Venner <jason@idiom.com> over the last couple of days about an interesting problem.

He's got a small Java application that restores large objects from a backup to a database. However, the backend seems to segv at exactly the same point each time. This occurs with both 6.3.x and 6.4.x (can't remember which revision).

Last night, he sent me a copy of the app, and I ran it against a recent (last Saturday) cvs copy of 6.5, and the same thing happens.

Now to the good bit ;-)

The first problem is with pg_dump. His restore app is in two parts, a shell script and a Java application. The shell script creates two databases (edit and prod), restores them (from the output of pg_dump), then calls Java to load the large objects and repair the differing oids.

However, this fails when creating functions that have more than one SQL statement in them. He has some functions that insert into a table depending on some arguments, then issue a select on the last arg, which is the function's result. However, pg_dump doesn't end the select with a ; and this causes the 6.5 backend to fail. Adding the ; fixes the problem.

I don't know if it's a known problem, but someone may need to check.

Ok, that's the simple one. Now the harder two:

When the Java app runs, it causes the backend to segv repeatedly at the same point (after about the 60th large object). Now, I've checked his code and can't find anything obviously wrong, so I added some tracing to it, and discovered that when the application closes (either explicitly closing the application, or upon an error), the backend outputs the following to stderr, and segv's:

pq_recvbuf: recv() failed, errno 2
pq_recvbuf: recv() failed, errno 0

Now, there is one of these for each of the two open connections, and each appears as soon as its respective connection closes. Remember the connections are to two different databases.

Running the backend with the -d2 flag, these expand to:

pq_recvbuf: recv() failed, errno 2
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 6731 exited with status 0
pq_recvbuf: recv() failed, errno 0
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 6730 exited with status 0

This is repeatable, and is not related at all to the large object being loaded. Reversing the order in which the objects are loaded causes it to fail on a different object.

Now, the first question: Does someone who knows the backend better than I do know what could cause the recv() message to occur when disconnecting?

Now the third problem. The previous problem occurs outside of any transaction.

In JDBC, you use transactions by setting autocommit to false. Then, there are methods to commit or roll back the transaction.

Ok, now the problem. When he sets autocommit to false, the JDBC driver sends BEGIN to the backend. Ok so far; however, something then fails during the first large object's load, and causes everything else to fail.

I haven't looked into this one fully, but it's identical on all three major versions of the backend, which is a little surprising. Now the weird thing is that the same errors occur when the connections are closed.

I don't think it's a JDBC problem, as I can't reproduce it with any of my code. Neither can I see anything wrong with Jason's code.

Anyhow, this is what's kept me busy the last few evenings, and it's got me stumped.

Peter

--
Peter T Mount peter@retep.org.uk
Main Homepage: http://www.retep.org.uk
PostgreSQL JDBC Faq: http://www.retep.org.uk/postgres
Java PDF Generator: http://www.retep.org.uk/pdf
Peter T Mount <peter@retep.org.uk> writes:
> However, this fails when creating functions that have more than one sql
> statement in them. He has some functions that insert into a table
> depending on some arguments, then issue a select on the last arg which is
> the function's result. However, pg_dump doesn't end the select with a ; and
> this causes the 6.5 backend to fail. Adding the ; fixes the problem.

What does 'fail' mean exactly? Crash, or just reject the query?
It sounds like there is a pg_dump bug here (omitting a required
semicolon) but I don't understand whether there's also a backend bug.

> Running the backend with the -d2 flag, these expand to:
> pq_recvbuf: recv() failed, errno 2
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
> /usr/local/pgsql/bin/postmaster: reaping dead processes...
> /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6731 exited with status 0
> pq_recvbuf: recv() failed, errno 0
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
> /usr/local/pgsql/bin/postmaster: reaping dead processes...
> /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6730 exited with status 0

This doesn't look like a segv trace to me --- if the backend was
coredumping then the postmaster should see a nonzero exit status.

The recv() complaints probably indicate that the client application
disconnected ungracefully (ie, without sending the 'X' terminate
message). It's curious that they're not both alike.
That might be a red herring however --- right now pq_recvbuf doesn't
distinguish plain EOF from a true error, and if it's plain EOF then
whatever errno was last set to gets printed. Think I'll go fix that.

Barring more evidence, all I see here is client disconnect, not a
backend failure. What's your basis for claiming a segv crash?

> Ok, now the problem. When he sets autocommit to false, the JDBC driver
> sends BEGIN to the backend. Ok so far, however, something then fails
> during the first large object's load, and causes everything else to fail.

That's not a bug, it's a feature ... allegedly, anyway. Any error
inside a transaction means the entire transaction is aborted. And
the backend will keep reminding you so until you cooperate by ending
the transaction. I don't like the behavior very much either, but
it's operating as designed.

regards, tom lane
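[Editorial illustration] The transaction rule described in the reply above (one error aborts the whole transaction, and every later command is refused until the transaction is ended) can be sketched as a toy state machine. This is a minimal model in Python, not PostgreSQL or JDBC code: the class name `ToyBackend`, the `fails` flag and the status strings are all assumptions made up for the example.

```python
class ToyBackend:
    """Toy model of the abort rule; not PostgreSQL source, names are made up."""

    def __init__(self):
        self.in_transaction = False
        self.aborted = False

    def execute(self, command, fails=False):
        """Run one command; fails=True simulates an error in that command."""
        word = command.strip().upper()
        if word == "BEGIN":
            self.in_transaction, self.aborted = True, False
            return "ok"
        if word in ("END", "COMMIT", "ABORT", "ROLLBACK"):
            # Ending the transaction is the only way to clear the aborted state.
            self.in_transaction, self.aborted = False, False
            return "ok"
        if self.aborted:
            # Every later command is refused until the transaction is ended.
            return "error: transaction aborted, end it first"
        if fails:
            # Inside a transaction, one error poisons the whole transaction.
            self.aborted = self.in_transaction
            return "error: command failed"
        return "ok"


backend = ToyBackend()
backend.execute("BEGIN")
print(backend.execute("load object 1", fails=True))  # the failing load
print(backend.execute("load object 2"))              # refused, still aborted
backend.execute("ROLLBACK")
print(backend.execute("load object 2"))              # accepted again
```

In this model a failure in the first large object's load inside BEGIN makes everything after it fail too, which matches the symptom reported with autocommit set to false. The same failure outside a transaction affects only that one command.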
Re: [HACKERS] Continued problems with pgdump, Large Objects and crashing backends
From
Peter T Mount
Date:
On Wed, 17 Feb 1999, Tom Lane wrote:

> Peter T Mount <peter@retep.org.uk> writes:
> > However, this fails when creating functions that have more than one sql
> > statement in them. He has some functions that insert into a table
> > depending on some arguments, then issue a select on the last arg which is
> > the function's result. However, pg_dump doesn't end the select with a ; and
> > this causes the 6.5 backend to fail. Adding the ; fixes the problem.
>
> What does 'fail' mean exactly? Crash, or just reject the query?
> It sounds like there is a pg_dump bug here (omitting a required
> semicolon) but I don't understand whether there's also a backend bug.

I didn't say this was a backend bug; it was just one thing I came across while looking at the following problem.

> > Running the backend with the -d2 flag, these expand to:
> >
> > pq_recvbuf: recv() failed, errno 2
> > proc_exit(0) [#0]
> > shmem_exit(0) [#0]
> > exit(0)
> > /usr/local/pgsql/bin/postmaster: reaping dead processes...
> > /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6731 exited with status 0
> > pq_recvbuf: recv() failed, errno 0
> > proc_exit(0) [#0]
> > shmem_exit(0) [#0]
> > exit(0)
> > /usr/local/pgsql/bin/postmaster: reaping dead processes...
> > /usr/local/pgsql/bin/postmaster: CleanupProc: pid 6730 exited with status 0
>
> This doesn't look like a segv trace to me --- if the backend was
> coredumping then the postmaster should see a nonzero exit status.
>
> The recv() complaints probably indicate that the client application
> disconnected ungracefully (ie, without sending the 'X' terminate
> message). It's curious that they're not both alike.
> That might be a red herring however --- right now pq_recvbuf doesn't
> distinguish plain EOF from a true error, and if it's plain EOF then
> whatever errno was last set to gets printed. Think I'll go fix that.
>
> Barring more evidence, all I see here is client disconnect, not a
> backend failure.

Hmmm, I've never seen the recv() problem before with any JDBC app, only this one.

PS: Currently the JDBC driver is still using the 6.3.x protocol. When 6.4 came out I didn't implement the CANCEL stuff, as I was concentrating on getting more of the innards implemented.

Anyhow, if the terminate message is a problem, I'll upgrade the protocol.

> What's your basis for claiming a segv crash?

I think the segv came from Jason (who's run it against 6.3.x and 6.4.x).

> > Ok, now the problem. When he sets autocommit to false, the JDBC driver
> > sends BEGIN to the backend. Ok so far, however, something then fails
> > during the first large object's load, and causes everything else to fail.
>
> That's not a bug, it's a feature ... allegedly, anyway. Any error
> inside a transaction means the entire transaction is aborted. And
> the backend will keep reminding you so until you cooperate by ending
> the transaction. I don't like the behavior very much either, but
> it's operating as designed.

I'm going to overhaul the autocommit(false) code. I suspect it's broken, but I need to sit down and figure out what is happening with this problem first.

Peter
Peter T Mount <peter@retep.org.uk> writes:
>> The recv() complaints probably indicate that the client application
>> disconnected ungracefully (ie, without sending the 'X' terminate
>> message). It's curious that they're not both alike.

> Hmmm, I've never seen the recv() problem before with any JDBC app, only
> this one.

That particular message is new in the 6.5 code (BTW, as of this morning
it should say "pq_recvbuf: unexpected EOF on client connection").

I was about to say that prior versions would also complain about an
unexpected client disconnect, but actually it looks like 6.4.2 doesn't
--- at least not in this low-level code. I'm not inclined to remove the
message however. I think we want it there to help detect more serious
problems, like disconnect in the middle of a COPY operation.

> PS: Currently the JDBC driver is still using the 6.3.x protocol. When 6.4
> came out I didn't implement the CANCEL stuff, as I was concentrating on
> getting more of the innards implemented.
> Anyhow, if the terminate message is a problem, I'll upgrade the protocol.

The terminate message is defined in the old protocol too; it's not new
for 6.4. As for whether it's a "problem" not to send it, it's only
a problem if you don't like complaints in the postmaster log ;-).
The backend will close up shop just fine without it.

regards, tom lane
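[Editorial illustration] The distinction behind the reworded log message is that a receive call which returns no bytes means the peer closed the connection, which is not an error, so errno has nothing to do with it. The sketch below shows that distinction with Python's `socket` module rather than the backend's C code; the function name `read_from_client` and the tuple it returns are assumptions made up for the example.

```python
import socket


def read_from_client(conn):
    """Classify one recv() call as data, plain EOF, or a real error."""
    try:
        data = conn.recv(8192)
    except OSError as exc:
        # Only in this branch is errno meaningful: it comes from the failed call.
        return ("error", exc.errno)
    if data == b"":
        # No bytes from a stream socket means the peer closed the connection.
        # Nothing failed, so there is no errno worth reporting.
        return ("eof", None)
    return ("data", data)


# A connected pair of sockets stands in for client and backend.
client, server = socket.socketpair()
client.sendall(b"Q")
print(read_from_client(server))  # some data arrived
client.close()                   # client goes away without saying goodbye
print(read_from_client(server))  # plain EOF, not an error
server.close()
```

Printing a stale errno in the EOF branch would explain two otherwise identical disconnects showing different error numbers.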
Re: [HACKERS] Continued problems with pgdump, Large Objects and crashing backends
From
Peter T Mount
Date:
On Thu, 18 Feb 1999, Tom Lane wrote:

> Peter T Mount <peter@retep.org.uk> writes:
> >> The recv() complaints probably indicate that the client application
> >> disconnected ungracefully (ie, without sending the 'X' terminate
> >> message). It's curious that they're not both alike.
>
> > Hmmm, I've never seen the recv() problem before with any JDBC app, only
> > this one.
>
> That particular message is new in the 6.5 code (BTW, as of this morning
> it should say "pq_recvbuf: unexpected EOF on client connection").
>
> I was about to say that prior versions would also complain about an
> unexpected client disconnect, but actually it looks like 6.4.2 doesn't
> --- at least not in this low-level code. I'm not inclined to remove the
> message however. I think we want it there to help detect more serious
> problems, like disconnect in the middle of a COPY operation.
>
> > PS: Currently the JDBC driver is still using the 6.3.x protocol. When 6.4
> > came out I didn't implement the CANCEL stuff, as I was concentrating on
> > getting more of the innards implemented.
> > Anyhow, if the terminate message is a problem, I'll upgrade the protocol.
>
> The terminate message is defined in the old protocol too; it's not new
> for 6.4. As for whether it's a "problem" not to send it, it's only
> a problem if you don't like complaints in the postmaster log ;-).
> The backend will close up shop just fine without it.

Looks like something that's been missing since the beginning. Ok, I'll add the message to the driver tomorrow, as I'm planning some cleanups this weekend.

Peter
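[Editorial illustration] The client-side change agreed on above amounts to sending the terminate message before closing the socket. The sketch below shows the idea in Python, not in the JDBC driver's Java. The thread only names an 'X' terminate message, so treating it as a single byte with no further framing is an assumption, and `close_gracefully` is a made-up helper name.

```python
import socket

# Assumption: the terminate message is the single byte 'X', as named in the thread.
TERMINATE = b"X"


def close_gracefully(sock):
    """Tell the server we are done, then close the socket."""
    try:
        sock.sendall(TERMINATE)
    finally:
        # Close even if the send fails, for example on a connection already lost.
        sock.close()


# A connected pair of sockets stands in for driver and backend.
client, server = socket.socketpair()
close_gracefully(client)
first = server.recv(16)   # the goodbye message arrives first
second = server.recv(16)  # then the ordinary end of stream
print(first, second)
server.close()
```

With the goodbye sent first, the server sees an explicit terminate before the end of stream and has no reason to log an unexpected EOF.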