Discussion: backend process terminates
We've been wrestling with a problem where the backend process terminates with a SIGSEGV. We are having a hard time tracking this down, so I ran a batch gdb process that single-steps through the code until it crashes, and I am posting the output to the list with a request for assistance. The output file is 324k, so I'm putting it on a website rather than sending such a large file as an attachment. We would appreciate any assistance folks might have in helping us determine what is going on here.

The following query generated this segfault:

select pcm_getmiles_s('sparta, nc', 'buffalo, ny', 0);

We are building pcm_getmiles_s() into the backend process. This is PostgreSQL 7.4.17 on Red Hat Enterprise 4.

The output from the gdb batch process may be found here:

http://www.serioustechnology.com/gdbbatch.txt

Any help will be greatly appreciated.

-- 
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Geoffrey Myers <geof@serioustechnology.com> writes:
> The output from the gdb batch process may be found here:
> http://www.serioustechnology.com/gdbbatch.txt

gdb isn't telling you the whole truth, evidently --- how'd control get from line 781 to 912 with nothing in between? Recompiling the backend with -O0 or at most -O1 would be a good idea to get a more trustworthy gdb trace.

regards, tom lane
Tom Lane wrote:
> Geoffrey Myers <geof@serioustechnology.com> writes:
>> The output from the gdb batch process may be found here:
>> http://www.serioustechnology.com/gdbbatch.txt
>
> gdb isn't telling you the whole truth, evidently --- how'd control get
> from line 781 to 912 with nothing in between? Recompiling the backend
> with -O0 or at most -O1 would be a good idea to get a more trustworthy
> gdb trace.

Well, there are some third-party libraries we've built into the backend that we don't have the source for. We think there may be some memory corruption going on there.

-- 
Until later, Geoffrey
Tom Lane wrote:
> Geoffrey Myers <geof@serioustechnology.com> writes:
>> The output from the gdb batch process may be found here:
>> http://www.serioustechnology.com/gdbbatch.txt
>
> gdb isn't telling you the whole truth, evidently --- how'd control get
> from line 781 to 912 with nothing in between? Recompiling the backend
> with -O0 or at most -O1 would be a good idea to get a more trustworthy
> gdb trace.

As previously noted, we are building some third-party code into the backend. We don't have the source code, so it's difficult to know what might be going on there.

I don't know all the idiosyncrasies of how this works, so bear with me on this. The developer at the vendor indicated that he's narrowed the problem down to a set of wrapper routines in their code. They are named OpenFile(), CloseFile() and ReadFile(). He asked whether there might be routines in the PostgreSQL code with the same names that could be causing a conflict. Sure enough, I searched the PostgreSQL source code and found routines with the same names. I don't see how this could pose a problem, though, as it is my understanding that the compiler will properly address this issue.

Anyone think this might be a problem?

-- 
Until later, Geoffrey
On Tue, Aug 07, 2007 at 07:46:45AM -0400, Geoffrey wrote:
> I don't know all the idiosyncrasies of how this works, so bear with me
> on this. The developer at the vendor indicated that he's narrowed down
> the problem to a set of wrapper routines in their code. They are named
> OpenFile(), CloseFile() and ReadFile(); He inquired as to whether there
> might be routines in the Postgresql code with the same names that might
> be causing a conflict. Sure enough, I searched the Postgresql source
> code and found routines with the same names. I don't see how this could
> pose a problem though, as it is my understanding that the compiler will
> properly address this issue.

Yes, this could cause a problem. In general, when loading a library, any external references are first resolved against the main executable, then already loaded libraries, then the library being loaded. It's all in the ELF standard, if you're interested.

As for solutions:

1. In your third-party library, have the library built in such a way that the symbols are explicitly bound to the internal library version. There are various methods for dealing with that; it all depends on the toolchain used to build it. I suppose this product is actually several libraries that call each other? Namespacing would help here.

2. Make sure that any externally visible symbols in libraries are always prefixed by a tag, like libpq does (almost all symbols are pq*).

Running "nm -D" over the main postgres executable and your libraries should give you an idea of the scope of the problem.

Hope this helps,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
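The lookup order described above can be modelled in a few lines of C. This is only a toy sketch, not how the dynamic linker is actually implemented: the module names, the export tables and the toy_resolve() helper are all hypothetical, and placing pcm_getmiles_s in the vendor table is an assumption made purely for illustration. The only detail taken from the thread is that both the backend and the vendor code define OpenFile(), CloseFile() and ReadFile().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy "module": a name plus the list of symbol names it exports. */
struct module {
    const char *name;
    const char *const *symbols;
    size_t nsymbols;
};

/* Hypothetical export lists; only the three clashing names come from the thread. */
static const char *const backend_syms[] = { "OpenFile", "CloseFile", "ReadFile" };
static const char *const vendor_syms[] = {
    "OpenFile", "CloseFile", "ReadFile", "pcm_getmiles_s"
};

static const struct module backend = {
    "postgres", backend_syms, sizeof backend_syms / sizeof backend_syms[0]
};
static const struct module vendor = {
    "vendor_lib", vendor_syms, sizeof vendor_syms / sizeof vendor_syms[0]
};

/* Search scope in the order described: main executable first,
 * the library being loaded last. */
static const struct module *const scope[] = { &backend, &vendor };

/* Return the name of the first module in scope that defines `symbol`,
 * or NULL if no module defines it. */
const char *toy_resolve(const char *symbol)
{
    assert(symbol != NULL);
    for (size_t i = 0; i < sizeof scope / sizeof scope[0]; i++)
        for (size_t j = 0; j < scope[i]->nsymbols; j++)
            if (strcmp(scope[i]->symbols[j], symbol) == 0)
                return scope[i]->name;
    return NULL;
}
```

With these tables, toy_resolve("OpenFile") yields "postgres" no matter which module makes the call, which is the kind of conflict suspected in the thread. toy_resolve("pcm_getmiles_s") yields "vendor_lib" because only that table lists it.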
Martijn van Oosterhout wrote:
> On Tue, Aug 07, 2007 at 07:46:45AM -0400, Geoffrey wrote:
>> I don't know all the idiosyncrasies of how this works, so bear with me
>> on this. The developer at the vendor indicated that he's narrowed down
>> the problem to a set of wrapper routines in their code. They are named
>> OpenFile(), CloseFile() and ReadFile(); He inquired as to whether there
>> might be routines in the Postgresql code with the same names that might
>> be causing a conflict. Sure enough, I searched the Postgresql source
>> code and found routines with the same names. I don't see how this could
>> pose a problem though, as it is my understanding that the compiler will
>> properly address this issue.
>
> Yes, this could cause a problem. In general, when loading a library,
> any external references are first resolved against the main
> executable, then already loaded libraries, then the library being
> loaded. It's all in the ELF standard, if you're interested.

I will be checking them out. My compiler knowledge is a bit rusty, circa SVR4... ;)

> As for solutions:
> 1. In your third party library, have the library built in such a way
> that the symbols are explicitly bound to the internal library version.
> There are various methods for dealing with that, it all depends on the
> toolchain used to build it. I suppose this product is actually several
> libraries that call each other? Namespace would help here.

Correct on both counts. Many of the routines are wrapper routines used to assist in code portability.

> 2. Make sure that any externally visible symbols in libraries are
> always prefixed by a tag, like libpq does (almost all symbols are pq*).
>
> Running "nm -D" over the main postgres executable and your libraries
> should give you an idea of the scope of the problem.
>
> Hope this helps,

It appears that the common routine names were causing the problem. We are currently testing new versions of these libraries in which the vendor has renamed the common routines with unique names.

Thanks for the insights.

-- 
Until later, Geoffrey
On Wed, Aug 08, 2007 at 08:50:41AM -0400, Geoffrey wrote:
> Correct on both counts. Many of the routines are wrapper routines used
> to assist in code portability.

That's OK in programs, but shared libraries need to be careful not to use names likely to be used by the programs that load them.

FWIW, this document has lots of information about ELF shared libraries:

http://people.redhat.com/drepper/dsohowto.pdf

There's a lot of technical detail that you can skip, but it also covers scopes and how they are resolved, common problems, and how to fix them.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
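For what it's worth, the two remedies suggested earlier in the thread might look roughly like this in a library's C source. This is a minimal sketch under stated assumptions: the pcmlib_ prefix, the PCMLIB_HIDDEN macro and every function below are hypothetical names invented for illustration, not the vendor's actual API, and the bodies are placeholders that never touch the filesystem. The `visibility("hidden")` attribute is a GCC/Clang feature; other toolchains have their own mechanisms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: hide a symbol from the library's exported symbol
 * table when the compiler supports the GCC-style attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define PCMLIB_HIDDEN __attribute__((visibility("hidden")))
#else
#define PCMLIB_HIDDEN
#endif

/* File-private state: `static` gives it internal linkage, so it can
 * never clash with a name in the host program. */
static int next_handle = 1;

/* Library-private helper: usable from the library's other source files,
 * but not exported, so the host program cannot interpose on it. */
PCMLIB_HIDDEN int pcmlib_alloc_handle(void)
{
    return next_handle++;
}

/* Public entry points carry a library-specific prefix instead of a
 * generic name such as OpenFile() or CloseFile(). */
int pcmlib_OpenFile(const char *path)
{
    assert(path != NULL);
    /* Placeholder logic: reject an empty path, else hand out a handle. */
    return path[0] == '\0' ? -1 : pcmlib_alloc_handle();
}

int pcmlib_CloseFile(int handle)
{
    /* Placeholder logic: any positive handle is treated as valid. */
    return handle > 0 ? 0 : -1;
}
```

Once a library is built this way, running "nm -D" over it should list only the prefixed entry points, which makes a clash with the backend's own names much less likely.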