Обсуждение: Proposed Windows-specific change: Enable crash dumps (like core files)
Hi all After this recent "fun" trying to get a usable crash dump or stack trace from a crashing autovacuum worker on Windows, I'd like to make a quick proposal - one that'll be followed by a patch if there aren't any loud NACKs. Windows has a couple of built-in facilities to automatically generate crash dumps, much like UNIX core files. None of them (JIT debugger included) work well with PostgreSQL's process-based archiectecture and use of a private service account. The autovacuum daemon's launcher/worker split is especially tricky, because a problem worker process may not live long before crashing, as in the case I'm currently bashing my head against. Some other mechanism is needed. Dbghelp.dll, a redistributible DLL from the platform SDK, provides that mechanism. You can set a handler that's invoked if the process encounters an unhandled exception (which in Windows-land includes anything that'd be a fatal signal in UNIX, as well as "real" C++ exceptions). This handler uses LoadLibrary to load dbghelp.dll and generate a crash dump (core file) that can be debugged with Visual Studio, windbg, etc on the generating machine or any other with the right PostgreSQL binaries and debug symbols. It's kind of like loading gdb in-process and writing out a core, with the same obvious problem that it won't help you with problems caused by running totally out of memory or certain types of memory corruption. It's protected against recursive calls by the OS, though, so if your handler fails too there's no real harm in it. The big advantage is that, like when dealing with a core file on *nix, people can email/ftp/whatever crash dump files for analsys anywhere else that someone has the same binaries and debug symbols. That'd make it it awfully useful even in cases where it *is* easy to predict the crash and attach a debugger. Because of the potential disk space use of crash dumps I think it'd be undesirable to have it always enabled, so here's what I propose: - Always compile the crash dump feature in builds, so every build that goes to a user is capable of crash dumps. Becausea suitable version of dbghelp.dll has been included in Windows XP and above, it shouldn't be necessary to ship theDLL with Pg, though the license permits that. - Use a GUC option to control whether dumps are generated on a crash. It'd be a string value, with permitted values "off","normal" or "withdata". (These match the "MiniDumpNormal" and "MiniDumpWithDataSegs" calls in dbghelp.dll). The defaultis "off". If "off" is set then no unhandled exception handler will be registered and no crash dump will be generated. The crashdump code has no effect beyond checking the GUC and seeing that it's set to "off". With either of the other valuesa handler is registered; which is chosen controls whether a minimal or full crash dump will be generated. It may even turn out to be easy to make this togglable at runtime (superuser only) at least with a conf edit and reload,or possibly even per-session. I'm unsure of that as yet, though. - Hard-code the dump file output location, so there's no need to try to read a GUC from within the crash handler. If people want to change the dump file location, they can symlink/reparse/ whatever $DATADIR\crashdumps to their preferredlocation. - If the crash dump handler is enabled by setting the GUC, all backends register the handler during startup or (if it provespractical) when the GUC is changed. - When the handler is triggered by the OS trapping an unhandled exception, it loads dbghelp.dll, writes the appropriatedump format to the hardcoded path, and terminates the process. Comments? Thoughts? Would a patch along these lines have a chance of acceptance? (BCC'd Dave Page because of his involvement in Windows & Windows Pg support). -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
On 10/04/2010 07:50 AM, Craig Ringer wrote: > > - If the crash dump handler is enabled by setting the GUC, > all backends register the handler during startup or (if it > proves practical) when the GUC is changed. > > - When the handler is triggered by the OS trapping an unhandled > exception, it loads dbghelp.dll, writes the appropriate dump > format to the hardcoded path, and terminates the process. > > What is the performance impact of doing that? Specifically, how does it affect backend startup time? cheers andrew
On Mon, Oct 4, 2010 at 12:50 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > - Always compile the crash dump feature in builds, so every build that > goes to a user is capable of crash dumps. Because a suitable version > of dbghelp.dll has been included in Windows XP and above, it shouldn't > be necessary to ship the DLL with Pg, though the license permits that. We probably would want to - the version that comes with Windows is somewhat cut down, and MSDN recommends using the more recent versions from the debugging tools. Any idea of the performance overhead? > (BCC'd Dave Page because of his involvement in Windows & Windows Pg > support). That explains why it bypassed my filters :-) -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
On 4/10/2010 8:06 PM, Andrew Dunstan wrote: > > > On 10/04/2010 07:50 AM, Craig Ringer wrote: >> >> - If the crash dump handler is enabled by setting the GUC, >> all backends register the handler during startup or (if it >> proves practical) when the GUC is changed. >> >> - When the handler is triggered by the OS trapping an unhandled >> exception, it loads dbghelp.dll, writes the appropriate dump >> format to the hardcoded path, and terminates the process. >> >> > > What is the performance impact of doing that? Specifically, how does it > affect backend startup time? Without testing I can't say for sure. My expection based on how the handler works would be: near-zero, about as expensive as registering a signal handler, plus the cost of reading the GUC and doing one string compare to test the value. When disabled, it's just the GUC test. Is there a better mechanism to use for features that're going to be unused the great majority of the time? Perhaps something that does require a server restart, but doesn't have any cost at all when disabled? -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
On 4/10/2010 8:09 PM, Dave Page wrote: > Any idea of the performance overhead? A GUC read and string compare when off, and (untested as yet) I expect little more when on. Thinking about it, a possibly better alternative is to ship it as a library as is done with the pl/pgsql debugger, and use (I think) shared_preload_libraries to load it when desired. That way there's zero cost if disabled, though a somewhat higher cost if enabled. Hmm. That'll let me put a test version together that'll work with 9.0 as well, so that seems like a no-brainer really, it's just a better way to do it. Time for a Pg-on-windows build, yay. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Heikki Linnakangas
Дата:
On 04.10.2010 15:21, Craig Ringer wrote: > Thinking about it, a possibly better alternative is to ship it as a > library as is done with the pl/pgsql debugger, and use (I think) > shared_preload_libraries to load it when desired. That way there's zero > cost if disabled, though a somewhat higher cost if enabled. > > Hmm. That'll let me put a test version together that'll work with 9.0 as > well, so that seems like a no-brainer really, it's just a better way to > do it. Time for a Pg-on-windows build, yay. If it's not a lot of code, it's better to have it built-in always. Loading extra libraries in debug-mode could lead to heisenbugs. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 4/10/2010 8:27 PM, Heikki Linnakangas wrote: > On 04.10.2010 15:21, Craig Ringer wrote: >> Thinking about it, a possibly better alternative is to ship it as a >> library as is done with the pl/pgsql debugger, and use (I think) >> shared_preload_libraries to load it when desired. That way there's zero >> cost if disabled, though a somewhat higher cost if enabled. >> >> Hmm. That'll let me put a test version together that'll work with 9.0 as >> well, so that seems like a no-brainer really, it's just a better way to >> do it. Time for a Pg-on-windows build, yay. > > If it's not a lot of code, it's better to have it built-in always. > Loading extra libraries in debug-mode could lead to heisenbugs. Good point. The built-in approach would also permit it to be turned on in an already-running server, which would be nice as for those fun only-shows-up-in-production bugs where you may not want to restart Pg. I'll chuck together a library version first, though, so it can be tested easily on existing installs of Pg 9.0. Compiling Pg on Windows isn't as quick and easy as on *nix so that should help for testing/consideration. If people are happy with the results I'll put together a patch to integrate it directly instead of using a library. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 4/10/2010 8:27 PM, Heikki Linnakangas wrote: > On 04.10.2010 15:21, Craig Ringer wrote: >> Thinking about it, a possibly better alternative is to ship it as a >> library as is done with the pl/pgsql debugger, and use (I think) >> shared_preload_libraries to load it when desired. That way there's zero >> cost if disabled, though a somewhat higher cost if enabled. >> >> Hmm. That'll let me put a test version together that'll work with 9.0 as >> well, so that seems like a no-brainer really, it's just a better way to >> do it. Time for a Pg-on-windows build, yay. > > If it's not a lot of code, it's better to have it built-in always. > Loading extra libraries in debug-mode could lead to heisenbugs. It turns out that the basic minidumps are small (21kb here) and very fast to generate. It may well be worth leaving it enabled all the time after all. I just need to time how long the handler registration takes - though as LOAD of the current handler implemented as a contrib module takes 2.1ms, and LOAD of an module with an empty _PG_init() also takes 2.1ms, I'm guessing "not long". I've attached an early version for people to play with if anyone's interested. It currently contains a bunch of elog() messages to report on its progress - though I'm not at all sure elog() should be used in the final version given that Pg is crashing at the time the handler is called and there's no guarantee elog() is safe to call. It also doesn't offer any way to set the dump type yet, it's always the minimal dump with exception info, registers and stack only. Finally, a "crashme" function is included to trigger a reliable unhandled exception on demand, for initial testing. Usage with Pg built from source on Windows: - Drop crashdump.c and the Makefile in contrib/crashdump - Run build.bat and install.bat - Create a "crashdumps" directory inside your datadir, and make sure the server has read/write/create permission on it. - add 'crashdump' to shared_preload_libraries in postgresql.conf - Start the server as normal - Start psql and issue: -- CREATE OR REPLACE FUNCTION crashdump_crashme() RETURNS void AS '$libdir/crashdump','crashdump_crashme' LANGUAGE C; -- SELECT crashdump_crashme(); Your backend should crash. In the error log (and, depending on how the buffering works out, maybe in psql) you should see: WARNING: loading dll WARNING: loading dll try 1, 00000000 WARNING: loading dll try 2, 73880000 WARNING: pdump: 738C70D8 WARNING: Generating dumppath WARNING: Generated dumppath WARNING: dumping to path: crashdumps\backend.dmp WARNING: outfile: 000000B0 WARNING: about to dump NOTICE: crashdump: wrote crash dump to crashdumps\postgres-4341.mdmp Once you have the dump file, you can fire up windbg from the Debugging Tools for Windows and use File->Open Crash Dump to open it. The .excr command will report what the exception that caused the crash was, producing output like this: > 0:000> .ecxr > eax=00000000 ebx=00000000 ecx=02a1e290 edx=73831014 esi=02a1e188 edi=00c3f798 > eip=738313b2 esp=00c3f724 ebp=02a1e280 iopl=0 nv up ei pl zr na pe nc > cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 > crashdump!crashdump_crashme+0x2: > 738313b2 c70001000000 mov dword ptr [eax],1 ds:0023:00000000=???????? and if possible opening the postgresql source file the crash happened in to the appropriate line: Datum crashdump_crashme(PG_FUNCTION_ARGS) { int * ptr = NULL; *ptr = 1; <--- highlighted return NULL; } If you're using Visual Studio Professional (not the free Express edition, unfortunately) you should also be able to debug the crash dump that way. I don't have it to test with. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Вложения
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
OK, it's pretty much ready for proper testing now. If a few people are happy with the results and think it's a good idea I'll chuck it in the commitfest app. As built, the crash dump handler works with a stock PostgreSQL 9.0 (release build) as shipped in EDB's installer. Just drop crashdump.dll in lib\, optionally pop the dbghelp.dll redist in bin\, add 'crashdump' to shared_preload_libraries, and crash some backends however you feel like doing so. The current build of crashdump.dll for release versions of PostgreSQL 9.0 on 32-bit Windows is here: http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll If folks are happy with how this works, all it needs is: - Consideration of whether elog should be used or not. I'm inclined to suggest using elog to tell the user about the dump, but only after the crash dump has been written successfully. - Comments on whether this should be left as a loadable module, or integerated into core so it loads early in backend startup. The latter will permit crash dumping of early backend startup problems, and will have tiny overhead because there's no DLL to find and load. OTOH, it's harder to pull out if somehow something breaks. I'd want to start with loadable module in shared_preload_libraries and if that proves useful, only then consider integrating in core. I'm way too bad a programmer to want my code anywhere near Pg's core without plenty of real world testing. - (Maybe) a method to configure the crash dump type. I'm not too sure it's necessary given the set of dump flags I've landed up with, so I'd leave this be unless it proves to be necessary in real-world testing. Remember that with these crash dumps the user only has to email the (~4MB in my tests) crash dump. They don't need debugging tools or the ability to use them, only the recipient does. I'm using a tweaked set of minidump flags now, to generate considerably bigger dumps (around 4MB for the configuration I'm testing) that contain pretty much everything except shared memory contents, the contents of memory mapped files, and the loaded read-only code segments of the executables and DLLs. You can see the results of using it to debug that autovac crash at the end of this mail. When testing the script provided by Andrea Peri, when the autovacuum worker crashes, the message: > 2010-10-05 22:23:44 WST 2040 WARNING: crashdump: wrote crash dump to crashdumps\postgres-2040.mdmp is emitted before the process resumes crashing out as it would've normally. Opening and displaying that dump in windbg shows a useful stack trace from the crashing process, and it's possible to examine the state of local/global variables etc. > 0:000> .ecxr > eax=90fffffe ebx=040af140 ecx=68f08610 edx=040af248 esi=011f64d4 edi=040b001c > eip=015d5267 esp=0055f1cc ebp=011f5bc8 iopl=0 nv up ei pl zr na pe nc > cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 > postgres!pfree+0x7: > 015d5267 8b5004 mov edx,dword ptr [eax+4] ds:0023:91000002=???????? > 0:000> ~*k > > . 0 Id: 7f8.7b8 Suspend: 0 Teb: 7ffdf000 Unfrozen > ChildEBP RetAddr > 0055e210 75b4c1ee ntdll!KiFastSystemCallRet > 0055e250 76e7100e KERNELBASE!FindClose+0x93 > 0055e514 679160c3 kernel32!_SEH_epilog4_GS+0xa > 0055e6cc 779734e0 dbghelp!Win32LiveSystemProvider::OpenMapping+0x2c3 > 0055e7a8 7797353a ntdll!RtlpAllocateHeap+0xab2 > 0055e828 77965edc ntdll!RtlAllocateHeap+0x23a > 0055e840 76e7123a ntdll!ZwWriteFile+0xc > 0055e874 67916861 kernel32!WriteFileImplementation+0x76 > 0055e8ac 67916910 dbghelp!Win32LiveSystemProvider::ReadVirtual+0x71 > 0055e8d8 67908f09 dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x30 > 0055e910 679094b4 dbghelp!WriteMemoryFromProcess+0xa9 > 0055e9a8 6790d522 dbghelp!WriteThreadList+0x184 > 0055e9c0 6790e65a dbghelp!WriteDumpData+0x102 > 0055eb58 6790e9cb dbghelp!MiniDumpProvideDump+0x3fa > 0055ebd0 73231353 dbghelp!MiniDumpWriteDump+0x1cb > 0055ed14 76e82c4a crashdump!crashDumpHandler+0x183 [c:\users\craig\developer\postgresql-9.0.0\contrib\crashdump\crashdump.c@ 159] > 0055ed9c 77995af4 kernel32!UnhandledExceptionFilter+0x127 > 0055eda4 7793d964 ntdll!__RtlUserThreadStart+0x62 > 0055edb8 7793d7fc ntdll!_EH4_CallFilterFunc+0x12 > 0055ede0 779665f9 ntdll!_except_handler4+0x8e > 0055ee04 779665cb ntdll!ExecuteHandler2+0x26 > 0055eeb4 77966457 ntdll!ExecuteHandler+0x24 > 0055eeb4 015d5267 ntdll!KiUserExceptionDispatcher+0xf > 0055f1c8 0139eee7 postgres!pfree+0x7 [c:\pginstaller-repo\postgres.windows\src\backend\utils\mmgr\mcxt.c @ 591] > 0055f1e0 0139f14a postgres!examine_attribute+0x207 [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c@ 877] > 0055f284 0139f94c postgres!do_analyze_rel+0x24a [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @357] > 0055f2ac 013eb74a postgres!analyze_rel+0x1bc [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 232] > 0055f330 014b30c6 postgres!vacuum+0x1ea [c:\pginstaller-repo\postgres.windows\src\backend\commands\vacuum.c @ 248] > 0055f368 014b3991 postgres!autovacuum_do_vac_analyze+0x86 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c@ 2651] > 0055f4f4 014b41c5 postgres!do_autovacuum+0x811 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c@ 2231] > 0055f588 014bed02 postgres!AutoVacWorkerMain+0x265 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c@ 1611] > 0055f7d8 01423ce8 postgres!SubPostmasterMain+0x442 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\postmaster.c@ 4093] > 0055f7f0 015ee1ad postgres!main+0x1f8 [c:\pginstaller-repo\postgres.windows\src\backend\main\main.c @ 165] > 0055f834 76e71194 postgres!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586] > 0055f840 7797b495 kernel32!BaseThreadInitThunk+0xe > 0055f880 7797b468 ntdll!__RtlUserThreadStart+0x70 > 0055f898 00000000 ntdll!_RtlUserThreadStart+0x1b
Вложения
Craig Ringer wrote: > On 4/10/2010 8:06 PM, Andrew Dunstan wrote: > > > > > > On 10/04/2010 07:50 AM, Craig Ringer wrote: > >> > >> - If the crash dump handler is enabled by setting the GUC, > >> all backends register the handler during startup or (if it > >> proves practical) when the GUC is changed. > >> > >> - When the handler is triggered by the OS trapping an unhandled > >> exception, it loads dbghelp.dll, writes the appropriate dump > >> format to the hardcoded path, and terminates the process. > >> > >> > > > > What is the performance impact of doing that? Specifically, how does it > > affect backend startup time? > > Without testing I can't say for sure. > > My expection based on how the handler works would be: near-zero, about > as expensive as registering a signal handler, plus the cost of reading > the GUC and doing one string compare to test the value. When disabled, > it's just the GUC test. > > Is there a better mechanism to use for features that're going to be > unused the great majority of the time? Perhaps something that does > require a server restart, but doesn't have any cost at all when disabled? We definately had trouble producing crash dumps caused by the 128 return code problem, so I can see great value in this, if it can be done simply. I wonder if the 128-exit would have produced a crash file. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Tue, Oct 5, 2010 at 17:21, Craig Ringer <craig@postnewspapers.com.au> wrote: > OK, it's pretty much ready for proper testing now. If a few people are happy > with the results and think it's a good idea I'll chuck it in the commitfest > app. > > As built, the crash dump handler works with a stock PostgreSQL 9.0 (release > build) as shipped in EDB's installer. Just drop crashdump.dll in lib\, > optionally pop the dbghelp.dll redist in bin\, add 'crashdump' to > shared_preload_libraries, and crash some backends however you feel like > doing so. > > The current build of crashdump.dll for release versions of PostgreSQL 9.0 on > 32-bit Windows is here: > > http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll > > If folks are happy with how this works, all it needs is: > > - Consideration of whether elog should be used or not. I'm inclined to > suggest using elog to tell the user about the dump, but only after > the crash dump has been written successfully. > > - Comments on whether this should be left as a loadable module, or > integerated into core so it loads early in backend startup. The latter > will permit crash dumping of early backend startup problems, and will > have tiny overhead because there's no DLL to find and load. OTOH, it's > harder to pull out if somehow something breaks. > > I'd want to start with loadable module in shared_preload_libraries > and if that proves useful, only then consider integrating in core. > I'm way too bad a programmer to want my code anywhere near Pg's core > without plenty of real world testing. > > - (Maybe) a method to configure the crash dump type. I'm not too sure > it's necessary given the set of dump flags I've landed up with, > so I'd leave this be unless it proves to be necessary in real-world > testing. > Finally getting to looking at this one - sorry about the very long delay. I agree with Heikki's earlier comment that it's better to have this included in the backend - but that's obviously not going to happen for already-released versions. I'd therefor advocate a contrib module for existing versions, and then in core for 9.1+. We should then have an option to turn it off (on by default). But we don't need to pay the overhead on every backend startup - we could just map the value into the parameter shared memory block, and require a full postmaster restart in order to change it. Do we want to backpatch it into contrib/? Adding a new module there seems kind of wrong - probably better to keep the source separate and just publish the DLL files for people who do debugging? Looking at the code: * Why do we need to look at differnt versions of dbghelp.dll? Can't we always use the one with Windows? And in that case, can't we just use compile-time linking, do we need to bother with DLL loading at all? * Per your comment about using elog() - a good idea in that case is to use write_stderr(). That will send the output to stderr if there is one, and otherwise the eventlog (when running as service). * And yes, in the crash handler, we should *definitely* not use elog(). * On Unix, the core file is dropped in the database directory, we don't have a separate directory for crashdumps. If we want to be consistent, we should do that here too. I do think that storing them in a directory like "crashdumps" is better, but I just wanted to raise the comment. * However, when storing it in crashdumps, I think the code would need to create that directory if it does not exist, doesn't it? * Right now, we overwrite old crashdumps. It is well known that PIDs are recycled pretty quickly on Windows - should we perhaps dump as postgres-<pid>-<sequence>.mdmp when there is a filename conflict? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander <magnus@hagander.net> wrote: > Do we want to backpatch it into contrib/? Adding a new module there > seems kind of wrong - probably better to keep the source separate and > just publish the DLL files for people who do debugging? If this works without changes to core, I see little reason not to back-patch it into contrib. Our primary concern with back-patching is to avoid doing things that might break existing installations, but if there aren't any core changes, I don't really see how that can be an issue here. It seems to me that it's probably simpler for us and our users to keep the debugging tools together with our main tree. However, I am not clear what benefit we get from moving this into core in 9.1. If it's still going to require a full postmaster restart, the GUC you have to change may as well be shared_preload_libraries as a new one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > However, I am not clear what benefit we get from moving this into core > in 9.1. If it's still going to require a full postmaster restart, the > GUC you have to change may as well be shared_preload_libraries as a > new one. +1. I am not in favor of randomly repackaging functionality, unless there's some clear benefit gained by doing so. In this case it seems like something that could and should remain at arm's length forever, so a contrib module is the ideal packaging. regards, tom lane
On Mon, Nov 22, 2010 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> However, I am not clear what benefit we get from moving this into core >> in 9.1. If it's still going to require a full postmaster restart, the >> GUC you have to change may as well be shared_preload_libraries as a >> new one. > > +1. I am not in favor of randomly repackaging functionality, unless > there's some clear benefit gained by doing so. In this case it seems > like something that could and should remain at arm's length forever, > so a contrib module is the ideal packaging. Now, if we could make this so low-overhead that it doesn't need a switch, that would be a good argument for moving it into core, at least to my way of thinking. But if it's something that needs to be enabled with a postmaster restart anyway, meh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Peter Eisentraut
Дата:
On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: > Do we want to backpatch it into contrib/? It's not a bug fix or an upgrading aid, so no.
Peter Eisentraut <peter_e@gmx.net> writes: > On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: >> Do we want to backpatch it into contrib/? > It's not a bug fix or an upgrading aid, so no. I'm less than thrilled about back-patching this, too. It seems to fly in the face of all our historical practice. If you drop the bit about back-patching, then I don't particularly care whether it goes into core or contrib for 9.1 --- whichever packaging makes the most sense is fine. regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Mon, Nov 22, 2010 at 16:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: >>> Do we want to backpatch it into contrib/? > >> It's not a bug fix or an upgrading aid, so no. > > I'm less than thrilled about back-patching this, too. It seems to fly > in the face of all our historical practice. > > If you drop the bit about back-patching, then I don't particularly care > whether it goes into core or contrib for 9.1 --- whichever packaging > makes the most sense is fine. My suggestion in the first place was not to backpatch it - I just wanted to get peoples opinions. I'm perfectly happy if we keep it somewhere else for the time being - as long as we make the binaries easily available. But that can go on the wiki for example. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Mon, Nov 22, 2010 at 15:15, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander <magnus@hagander.net> wrote: >> Do we want to backpatch it into contrib/? Adding a new module there >> seems kind of wrong - probably better to keep the source separate and >> just publish the DLL files for people who do debugging? > > If this works without changes to core, I see little reason not to > back-patch it into contrib. Our primary concern with back-patching is > to avoid doing things that might break existing installations, but if > there aren't any core changes, I don't really see how that can be an > issue here. It seems to me that it's probably simpler for us and our > users to keep the debugging tools together with our main tree. > > However, I am not clear what benefit we get from moving this into core > in 9.1. If it's still going to require a full postmaster restart, the > GUC you have to change may as well be shared_preload_libraries as a > new one. on-by-default is what we gain. I think that's fairly big... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > on-by-default is what we gain. I think that's fairly big... Only if that's actually what we want, which is far from clear in this corner. There are good reasons why most Linux distros configure daemons not to dump core by default. It's annoying when we are trying to debug a Postgres crash, but that doesn't mean the reasons aren't good. regards, tom lane
On Mon, Nov 22, 2010 at 10:17 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote: >> Do we want to backpatch it into contrib/? > > It's not a bug fix or an upgrading aid, so no. I don't see why an upgrading aid would be worthy of back-patching, but not a debugging aid. I'd certainly prioritize those in the other order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I don't see why an upgrading aid would be worthy of back-patching, but > not a debugging aid. I'd certainly prioritize those in the other > order. I think the sort of upgrading aid Peter has in mind is the kind where it's entirely useless if it's not back-patched, because it has to run in the pre-upgraded server. We've discussed such things before in the context of in-place upgrade, though I believe there have been no actual instances as yet. I'm not really sure why we're even considering the notion of back-patching this item. Doing so would not fit with any past practice or agreed-on project management practices, not even under our lax standards for contrib (and I keep hearing people claim that contrib is or should be as trustworthy as core, anyway). Since when do we back-patch significant features that have not been through a beta test cycle? regards, tom lane
On Mon, Nov 22, 2010 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I don't see why an upgrading aid would be worthy of back-patching, but >> not a debugging aid. I'd certainly prioritize those in the other >> order. > > I think the sort of upgrading aid Peter has in mind is the kind where > it's entirely useless if it's not back-patched, because it has to run in > the pre-upgraded server. We've discussed such things before in the > context of in-place upgrade, though I believe there have been no actual > instances as yet. > > I'm not really sure why we're even considering the notion of > back-patching this item. Doing so would not fit with any past practice > or agreed-on project management practices, not even under our lax > standards for contrib (and I keep hearing people claim that contrib > is or should be as trustworthy as core, anyway). Since when do we > back-patch significant features that have not been through a beta test > cycle? I am as conservative about back-patching as anybody here, but debugging on Windows is not an easy thing to do, and I strongly suspect we are going to point people experiencing crashes on Windows to this code whether it's part of our official distribution or not. I don't see what we get out of insisting that people install it separately. This is a tool that is only intended to be used when PostgreSQL is CRASHING, so arguing that we shouldn't include the code because it might not be stable doesn't carry much water AFAICS. As far as I understand it, we don't back-patch new features because of the risk of breaking things, but in this case refusing to back-patch the code seems more likely to prevent adequate diagnosis of what is already broken. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Magnus Hagander <magnus@hagander.net> writes: > * On Unix, the core file is dropped in the database directory, we > don't have a separate directory for crashdumps. If we want to be > consistent, we should do that here too. I do think that storing them > in a directory like "crashdumps" is better, but I just wanted to raise > the comment. Just a note on that - it's by no means universal that Unix systems will put the core files in $PGDATA. OS X likes to put them in /cores, which I think is a convention shared with some other BSDish systems. On Linux I believe it's possible to configure where the core goes via environment settings. > * However, when storing it in crashdumps, I think the code would need > to create that directory if it does not exist, doesn't it? If it didn't do so, then manual creation/removal of the directory could be used as an on/off switch for the feature. Which would have a number of advantages, not least that you don't need to have the crash dumper dependent on GUC working. I haven't looked at the patch but this discussion makes it sound like the dumper is dependent on an uncomfortably large amount of backend code being functional. You need to minimize the number of assumptions of that sort. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > I am as conservative about back-patching as anybody here, but > debugging on Windows is not an easy thing to do, and I strongly > suspect we are going to point people experiencing crashes on Windows > to this code whether it's part of our official distribution or not. I > don't see what we get out of insisting that people install it > separately. This is a tool that is only intended to be used when > PostgreSQL is CRASHING, so arguing that we shouldn't include the code > because it might not be stable doesn't carry much water AFAICS. As > far as I understand it, we don't back-patch new features because of > the risk of breaking things, but in this case refusing to back-patch > the code seems more likely to prevent adequate diagnosis of what is > already broken. Well, if we're going to hand out prebuilt DLLs to people, we can do that without having back-patched the code officially. But more to the point is that it's not clear that we're going to end up with a contrib module at all going forward (a core feature would clearly be a lot more reliable), and I really do not wish to get involved with maintaining two independent versions of this code. This argument seems to boil down to "we have to have this yesterday", which I don't buy for a minute. If it's as critical as that, why did it take this long for someone to write it? I do NOT agree that this feature is important enough to justify a free pass around our normal management and quality assurance processes. regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Heikki Linnakangas
Дата:
On 22.11.2010 19:47, Robert Haas wrote: > I am as conservative about back-patching as anybody here, but > debugging on Windows is not an easy thing to do, and I strongly > suspect we are going to point people experiencing crashes on Windows > to this code whether it's part of our official distribution or not. This whole thing makes me wonder: is there truly no reliable, simple method to tell Windows to create a core dump on crash, without writing custom code for it. I haven't seen one, but I find it amazing if there isn't. We can't be alone with this. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> * However, when storing it in crashdumps, I think the code would need >> to create that directory if it does not exist, doesn't it? > > If it didn't do so, then manual creation/removal of the directory could > be used as an on/off switch for the feature. Which would have a number > of advantages, not least that you don't need to have the crash dumper > dependent on GUC working. I haven't looked at the patch but this > discussion makes it sound like the dumper is dependent on an > uncomfortably large amount of backend code being functional. You need > to minimize the number of assumptions of that sort. No, it's dependent on close to zero backend functionality. Particularly if we take out the dependency on elog() (write_stderr is much simpler). In fact, the calls to elog() are the *only* places where it calls into the backend as it stands today. And yes, ISTM it could work reasonably well to use the creation/deletion of the directory as an on/off switch for it. Which is the default is of course up to the packager then as well ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Mon, Nov 22, 2010 at 18:54, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I am as conservative about back-patching as anybody here, but >> debugging on Windows is not an easy thing to do, and I strongly >> suspect we are going to point people experiencing crashes on Windows >> to this code whether it's part of our official distribution or not. I >> don't see what we get out of insisting that people install it >> separately. This is a tool that is only intended to be used when >> PostgreSQL is CRASHING, so arguing that we shouldn't include the code >> because it might not be stable doesn't carry much water AFAICS. As >> far as I understand it, we don't back-patch new features because of >> the risk of breaking things, but in this case refusing to back-patch >> the code seems more likely to prevent adequate diagnosis of what is >> already broken. > > Well, if we're going to hand out prebuilt DLLs to people, we can do that > without having back-patched the code officially. But more to the point > is that it's not clear that we're going to end up with a contrib module > at all going forward (a core feature would clearly be a lot more > reliable), and I really do not wish to get involved with maintaining two > independent versions of this code. I think the reasonable options are either "don't backpatch at all" or "backpatch the same way as we put it in HEAD, which is probably included in backend". I agree that sticking it in contrib is a half-measure that we shouldn't do. *IF* we go with a contrib module for 9.1 as well, we could backpatch as contrib module, but I think that's the only case. > This argument seems to boil down to "we have to have this yesterday", > which I don't buy for a minute. If it's as critical as that, why did > it take this long for someone to write it? I do NOT agree that this > feature is important enough to justify a free pass around our normal > management and quality assurance processes. Agreed. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Mon, Nov 22, 2010 at 18:56, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 22.11.2010 19:47, Robert Haas wrote: >> >> I am as conservative about back-patching as anybody here, but >> debugging on Windows is not an easy thing to do, and I strongly >> suspect we are going to point people experiencing crashes on Windows >> to this code whether it's part of our official distribution or not. > > This whole thing makes me wonder: is there truly no reliable, simple method > to tell Windows to create a core dump on crash, without writing custom code > for it. I haven't seen one, but I find it amazing if there isn't. We can't > be alone with this. You can do it without custom code but it's quite hard. And AFAIK you need to install the debugger tools package from microsoft (separate download, and not something you'll normally find on a system). There is DrWatson and the Error Reporting service you can use. DrWatson is made for interactive programs (or was the last time I looked at it). The error reporting service requires setting up a local error reporting service and configure it for reports, if you want them sent anywhere other than to Microsoft. Neither one of them is a particularly good choie. The minidumps Craig's patch does are a lot more useful. We intentionally *disable* drwatson, because it's part of what pops up an error message you have to click Ok on before your server will continue running... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... I haven't looked at the patch but this >> discussion makes it sound like the dumper is dependent on an >> uncomfortably large amount of backend code being functional. > No, it's dependent on close to zero backend functionality. > Particularly if we take out the dependency on elog() (write_stderr is > much simpler). In fact, the calls to elog() are the *only* places > where it calls into the backend as it stands today. Well, in the contrib-module guise, it's dependent on shared_preload_libraries or local_preload_libraries, which at least involves guc and dfmgr working pretty well (and not only in the postmaster, but during backend startup). regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Mon, Nov 22, 2010 at 19:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> ... I haven't looked at the patch but this >>> discussion makes it sound like the dumper is dependent on an >>> uncomfortably large amount of backend code being functional. > >> No, it's dependent on close to zero backend functionality. >> Particularly if we take out the dependency on elog() (write_stderr is >> much simpler). In fact, the calls to elog() are the *only* places >> where it calls into the backend as it stands today. > > Well, in the contrib-module guise, it's dependent on > shared_preload_libraries or local_preload_libraries, which at least > involves guc and dfmgr working pretty well (and not only in the > postmaster, but during backend startup). Yes, sorry. My mindset was in "the version that'll go into 9.1". -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Peter Eisentraut
Дата:
On mån, 2010-11-22 at 19:56 +0200, Heikki Linnakangas wrote: > This whole thing makes me wonder: is there truly no reliable, simple > method to tell Windows to create a core dump on crash, without writing > custom code for it. I haven't seen one, but I find it amazing if there > isn't. We can't be alone with this. Well, there is no reliable and simple method to rename a file on Windows, so what can you expect ... ;-)
On Mon, Nov 22, 2010 at 1:28 PM, Magnus Hagander <magnus@hagander.net> wrote: > I think the reasonable options are either "don't backpatch at all" or > "backpatch the same way as we put it in HEAD, which is probably > included in backend". I agree that sticking it in contrib is a > half-measure that we shouldn't do. > > *IF* we go with a contrib module for 9.1 as well, we could backpatch > as contrib module, but I think that's the only case. I agree with this, certainly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
>> However, I am not clear what benefit we get from moving this into core >> in 9.1. If it's still going to require a full postmaster restart, the >> GUC you have to change may as well be shared_preload_libraries as a >> new one. There's no reason it should require a postmaster restart. New backends spawned after the handler is turned on would enable it, and existing backends could be signalled to enable it as well. > on-by-default is what we gain. I think that's fairly big... More than that. If a crash handler is in core, then: - It can be inited much earlier, catching crashes that happen during loading of libraries and during later backend init; and - It's basically free when the cost of shared library loading is removed, so it can be left on in production or even be on by default. I need to do some testing on this, but given the apparently near-zero cost of initing the handler I won't be surprised if testing a GUC to see if the handler should be on or not costs more than loading it does. I still wouldn't support on-by-default because you'd need an automatic process to weed out old crash dumps or limit the number stored. That's a bigger job. So I think the admin should have to turn it on, but it'd be good to make it easy to turn on in production without a restart; I don't see why that'd be hard. I'll try to put together a patch that does just that, time permitting. Things are kind of hectic ATM. -- Craig Ringer
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 11/23/2010 01:30 AM, Tom Lane wrote: > I'm not really sure why we're even considering the notion of > back-patching this item. Doing so would not fit with any past practice > or agreed-on project management practices, not even under our lax > standards for contrib (and I keep hearing people claim that contrib > is or should be as trustworthy as core, anyway). Since when do we > back-patch significant features that have not been through a beta test > cycle? I see no advantage to back-patching. It's easy to provide a drop-in binary DLL for older versions of Pg on Windows, who're the only people this will work for. If the EDB folks saw value in it, they could bundle the DLL with updated point releases of the installer for Windows. No back-patching is necessary. -- Craig Ringer
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 11/23/2010 01:46 AM, Tom Lane wrote: >> * However, when storing it in crashdumps, I think the code would need >> to create that directory if it does not exist, doesn't it? > > If it didn't do so, then manual creation/removal of the directory could > be used as an on/off switch for the feature. Yep. That's how I'd want to do it in 9.1 - test for the directory and use that to decide whether to set the handler during early backend startup. That way you don't need a GUC, and should be able to load it *very* early in backend startup. > I haven't looked at the patch but this > discussion makes it sound like the dumper is dependent on an > uncomfortably large amount of backend code being functional. You need > to minimize the number of assumptions of that sort. It doesn't need to have any backend code working, really; all it needs is a usable stack and the ability to allocate off the heap. It won't save you in total OOM situations, stack exhaustion, or severe stack corruption, but should work pretty much any other time. The crash dump facility in dbghelp.dll offers *optional* features where apps can stream in additional app-specific data like recent log excerpts, etc. I didn't try to implement anything like that in the initial module partly because I want to minimize the amount of the backend that must be working for a crash dump to succeed. -- Craig Ringer
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 11/23/2010 01:56 AM, Heikki Linnakangas wrote: > On 22.11.2010 19:47, Robert Haas wrote: >> I am as conservative about back-patching as anybody here, but >> debugging on Windows is not an easy thing to do, and I strongly >> suspect we are going to point people experiencing crashes on Windows >> to this code whether it's part of our official distribution or not. > > This whole thing makes me wonder: is there truly no reliable, simple > method to tell Windows to create a core dump on crash, without writing > custom code for it. I haven't seen one, but I find it amazing if there > isn't. We can't be alone with this. Search for 'dbghelp.dll' on your average Windows system and you'll be surprised how many apps use it. Steam (the software distribution system) does, as does the Adobe Creative Suite on my machine. If you're running in interactive mode with access to the user's display you can use Windows error reporting. Services running in isolated user accounts don't seem to be able to use that. In any case, windows error reporting only collects *tiny* dumps with barely anything beyond the stack contents; they're a nightmare to use, and really require psychic powers and deep knowledge of scary Windows APIs for effective debugging. -- Craig Ringer
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Tue, Nov 23, 2010 at 15:02, Craig Ringer <craig@postnewspapers.com.au> wrote: >>> However, I am not clear what benefit we get from moving this into core >>> in 9.1. If it's still going to require a full postmaster restart, the >>> GUC you have to change may as well be shared_preload_libraries as a >>> new one. > > There's no reason it should require a postmaster restart. New backends > spawned after the handler is turned on would enable it, and existing > backends could be signalled to enable it as well. I think that came off my comment that we could store the on/off in the startup shared memory block. It'd then be the only way to get it into any existing backends. >> on-by-default is what we gain. I think that's fairly big... > > More than that. If a crash handler is in core, then: > > - It can be inited much earlier, catching crashes that happen during loading > of libraries and during later backend init; and Yeah. > - It's basically free when the cost of shared library loading is removed, so > it can be left on in production or even be on by default. I need to do some > testing on this, but given the apparently near-zero cost of initing the > handler I won't be surprised if testing a GUC to see if the handler should > be on or not costs more than loading it does. I'm fairly certain it does. The GUC would be there to be able to turn the whole thing off because you don't want the dumps, not for performance reasons. > I still wouldn't support on-by-default because you'd need an automatic > process to weed out old crash dumps or limit the number stored. That's a > bigger job. So I think the admin should have to turn it on, but it'd be good > to make it easy to turn on in production without a restart; I don't see why > that'd be hard. I think the admin should deal with that - just like the admin has to clear out the old logs. > I'll try to put together a patch that does just that, time permitting. > Things are kind of hectic ATM. Let me know if you want me to look at adapting the patch for that - i can do that if you prefer. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Tue, Nov 23, 2010 at 15:09, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 11/23/2010 01:46 AM, Tom Lane wrote: > >>> * However, when storing it in crashdumps, I think the code would need >>> to create that directory if it does not exist, doesn't it? >> >> If it didn't do so, then manual creation/removal of the directory could >> be used as an on/off switch for the feature. > > Yep. That's how I'd want to do it in 9.1 - test for the directory and use > that to decide whether to set the handler during early backend startup. That > way you don't need a GUC, and should be able to load it *very* early in > backend startup. Or you set the handler always, and have the handler only actually create the dump if the directory exists. That way you can add the directory and still get a dump from both existing backends and the postmaster itself without a restart. > The crash dump facility in dbghelp.dll offers *optional* features where apps > can stream in additional app-specific data like recent log excerpts, etc. I > didn't try to implement anything like that in the initial module partly > because I want to minimize the amount of the backend that must be working > for a crash dump to succeed. Yeah, we already have the logs in the logfile etc. Let's keep this uncomplicated so that it stays working :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Tue, Nov 23, 2010 at 15:09, Craig Ringer <craig@postnewspapers.com.au> wrote: >> Yep. That's how I'd want to do it in 9.1 - test for the directory and use >> that to decide whether to set the handler during early backend startup. That >> way you don't need a GUC, and should be able to load it *very* early in >> backend startup. > Or you set the handler always, and have the handler only actually > create the dump if the directory exists. +1 for that way. regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 11/24/2010 05:18 AM, Magnus Hagander wrote: > Or you set the handler always, and have the handler only actually > create the dump if the directory exists. That way you can add the > directory and still get a dump from both existing backends and the > postmaster itself without a restart. That's way smarter. No extra filesystem access during startup, even if it is cheap. -- Craig Ringer
Craig Ringer wrote: > On 11/24/2010 05:18 AM, Magnus Hagander wrote: > >> Or you set the handler always, and have the handler only actually >> create the dump if the directory exists. That way you can add the >> directory and still get a dump from both existing backends and the >> postmaster itself without a restart. > > That's way smarter. No extra filesystem access during startup, even if > it is cheap. I added a commenting referencing this bit to the CF entry so it doesn't get forgotten. Magnus raised a few other issues in his earlier review too. Discussion of this patch seems to have jumped the gun a bit into talking about backpatching before the code for HEAD was completely done, then stalled there. Are we going to see an updated patch that addresses submitted feedback in this cycle? -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 22/11/2010 7:37 PM, Magnus Hagander wrote: > Finally getting to looking at this one - sorry about the very long delay. Ditto, I'm afraid. > I agree with Heikki's earlier comment that it's better to have this > included in the backend - but that's obviously not going to happen for > already-released versions. I'd therefor advocate a contrib module for > existing versions, and then in core for 9.1+. I think that was the conclusion the following discussion came to, as well. It's easy to distribute a binary .dll for older versions and if the EDB folks found this facility really useful they could even bundle it in the installer. There's no point backpatching it given how few win32 users will build from source. > We should then have an option to turn it off (on by default). But we > don't need to pay the overhead on every backend startup - we could > just map the value into the parameter shared memory block, and require > a full postmaster restart in order to change it. As discussed later, implemented by testing for a 'crashdumps' directory. If the directory exists, a dump is written. If the directory is missing a log message is emitted to explain why no dump was written; as this'll only happen when a backend dies, log noise isn't an issue and it'll help save sysadmin head-scratching given the "magic" behaviour of turning on/off based on directory existence. > Do we want to backpatch it into contrib/? Adding a new module there > seems kind of wrong - probably better to keep the source separate and > just publish the DLL files for people who do debugging? +1 to just publishing the DLLs > Looking at the code: > * Why do we need to look at differnt versions of dbghelp.dll? Can't we > always use the one with Windows? And in that case, can't we just use > compile-time linking, do we need to bother with DLL loading at all? Newer versions than those shipped with an OS release contain updated and improved functionality as well as bug fixes. We *can* use the version shipped with the OS, as even in XP dbghelp.dll contains the required functionality, but it'd be better to bundle it. That's microsoft's recommendation when you use dbghelp.dll . > * Per your comment about using elog() - a good idea in that case is to > use write_stderr(). That will send the output to stderr if there is > one, and otherwise the eventlog (when running as service). Hm, ok. The attached patch doesn't do that yet. Given the following comments, do you think the change still necessary? > * And yes, in the crash handler, we should *definitely* not use elog(). Currently I'm actually using it, but only after the dump has been written or we've decided not to write a dump. At this point, if elog() fails we don't really care, we'd just be returning control to the OS's fatal exception cleanup anyway. It's IMO desirable for output to go to the same log as everything else, so admins can find out what's going on and can associate the crash dump files with events that happened around a certain time. > * On Unix, the core file is dropped in the database directory, we > don't have a separate directory for crashdumps. If we want to be > consistent, we should do that here too. I do think that storing them > in a directory like "crashdumps" is better, but I just wanted to raise > the comment. Given the recent discussion about using the dir as an on/off switch, that'll be necessary. > * However, when storing it in crashdumps, I think the code would need > to create that directory if it does not exist, doesn't it? Not now that we're using it an on/off switch, but of course it would've made sense otherwise. > * Right now, we overwrite old crashdumps. It is well known that PIDs > are recycled pretty quickly on Windows - should we perhaps dump as > postgres-<pid>-<sequence>.mdmp when there is a filename conflict? Good idea. Rather than try to test for existing files I've just taken the simple route (always good in code run during process crash) and appended the system ticks to the dump name. System ticks are a 32-bit quantity that wraps about every 40 days, so they should be easily unique enough in combination with the pid. The logs will report the crashdump filenames unless the backend is too confused to be able to elog(), and all the crash dump files have file system timestamps so there isn't much point trying to format a crash date/time into their names. I could use wall clock seconds (or ms) instead, but favour system ticks because they can be read with a simple int-returning call that doesn't require any more stack allocation. On Windows, getting wall clock time in seconds seems to involve a call to GetSystemTimeAsFileTime (http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to populate a struct containing a fake-64-bit int as two 32-bit ints. It's not clear that it's free of timezone calculations and generally looks like something better avoided in a crash handler if it's not necessary. As GetSystemTicks seems quite sufficient, I thought I'd stick with it. -- Craig Ringer
Вложения
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 6/12/2010 12:57 PM, Craig Ringer wrote: > On 22/11/2010 7:37 PM, Magnus Hagander wrote: > >> Finally getting to looking at this one - sorry about the very long delay. > > Ditto, I'm afraid. Oh, I forgot to mention in the patch email: I'm not sure I've taken the right approach in terms of how I've hooked the crash dump code into the build system. I'm fairly happy with when it's actually inited, and with the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure about is how I've handled the conditional compilation by platform. Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It exports a generic win32-garbage-free function: void installCrashDumpHandler(); I was just going to create a no-op version of the same function in a (new) file src/backend/ports/crashdump.c , but I was having problems figuring out how to sensibly express "compile this file for everything except these ports". In addition, doing it this way means there's still a pointless no-op function call in each backend start and some useless symbols - maybe not a big worry, but not ideal. What I ended up doing was putting a conditionally compiled switch in port.h that produces an inline no-op version of installCrashDumpHandler for unsupported platforms, and an extern for supported platforms (currently win32). That'll optimize out completely on unsupported platforms, which is nice if unimportant. I'm just not sure if port.h is really the right place and if it's only used by the backend. I considered a new header included by postgres.h (since it's really backend-only stuff) but as it seems Pg generally prefers bigger headers over more little headers, I popped it in port.h . Comments? -- Craig Ringer
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
I've attached an updated patch that fixes a failure when compiling on gcc/linux. The no-op inline installCrashDumpHandler() for unsupported platforms was not declared static, so it was not being optimized out of objects it wasn't used in and was causing symbol collisions during linkage. "make check" now passes correctly on Linux and no warnings are generated during compilation. ======================= All 124 tests passed. ======================= Sorry about this mess-up; I don't recall having to use "inline static" when using inlines in headers with c++, but as that was a while ago (and C++) my recollection can't be trusted anyway. Your thoughts on the following questions from before would be much appreciated: > I'm not sure I've taken the > right approach in terms of how I've hooked the crash dump code into the > build system. I'm fairly happy with when it's actually inited, and with > the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure > about is how I've handled the conditional compilation by platform. > > Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It > exports a generic win32-garbage-free function: > > void > installCrashDumpHandler(); > > I was just going to create a no-op version of the same function in a > (new) file src/backend/ports/crashdump.c , but I was having problems > figuring out how to sensibly express "compile this file for everything > except these ports". In addition, doing it this way means there's still > a pointless no-op function call in each backend start and some useless > symbols - maybe not a big worry, but not ideal. > > What I ended up doing was putting a conditionally compiled switch in > port.h that produces an inline no-op version of installCrashDumpHandler > for unsupported platforms, and an extern for supported platforms > (currently win32). That'll optimize out completely on unsupported > platforms, which is nice if unimportant. > > I'm just not sure if port.h is really the right place and if it's only > used by the backend. I considered a new header included by postgres.h > (since it's really backend-only stuff) but as it seems Pg generally > prefers bigger headers over more little headers, I popped it in port.h .
Вложения
Craig Ringer <craig@postnewspapers.com.au> writes: > I've attached an updated patch that fixes a failure when compiling on > gcc/linux. The no-op inline installCrashDumpHandler() for unsupported > platforms was not declared static, so it was not being optimized out of > objects it wasn't used in and was causing symbol collisions during linkage. Why in the world would you get involved in that portability mess for a function that is called only once? There's no possible performance justification for making it inline. I'm also wondering why you have got conflicting declarations in postgres.h and port.h, and why none of these declarations follow ANSI C (write "(void)" not "()"). regards, tom lane
I've updated this entry in the CommitFest app to note that Craig had some implementation questions attached to his patch submission that I haven't seen anyone address yet, and to include a reference to Tom's latest question--which may make those questions moot, not sure. This pretty clearly need to sit on the stove a little bit longer before it's done regardless. I'm marking this one Returned With Feedback, and hopefully Craig will continue hammering on this to clean up the remaining details and resubmit in the next month. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 12/15/2010 01:01 AM, Tom Lane wrote: > Craig Ringer<craig@postnewspapers.com.au> writes: >> I've attached an updated patch that fixes a failure when compiling on >> gcc/linux. The no-op inline installCrashDumpHandler() for unsupported >> platforms was not declared static, so it was not being optimized out of >> objects it wasn't used in and was causing symbol collisions during linkage. > > Why in the world would you get involved in that portability mess for a > function that is called only once? There's no possible performance > justification for making it inline. The main concern I heard voiced when first suggesting this was about performance. Given that concern, if I could make it a no-op on unix/linux I thought that worth doing. I'm _much_ happier with a simple, non-ifdef'd extern function declaration and compilation of an empty function body on unsupported platforms. Given how concerned everyone was about *any* effect on backend startup, though, I was concerned that'd be turned down as unnecessary bloat. I've done it a nicer way now, and will post the updated patch once I've had a chance to re-test it on my Windows dev box. > I'm also wondering why you have got conflicting declarations in > postgres.h and port.h, and why none of these declarations follow > ANSI C (write "(void)" not "()"). For postgres.h : that's a good question, as I thought I removed that. I suspect it was reintroduced when reapplying the patch to my working tree to revise it. Whoops. As for the ansi C style - too much time with C++, though long ago now. I think I got the PostgreSQL rules for code formatting right, but missed the void param rule. -- Craig Ringer
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
Hi all Updated patch attached. This one avoids the inline function stuff and instead just decides whether to compile unix_crashdump.c or win32_crashdump.c based on build system tests. It passes "make check" on nix and tests on win32, both with and without the "crashdumps" directory existing. The inadvertent duplication of the functon prototype is fixed. I haven't added any SGML documentation yet, as I'm not sure (a) to what level it should be documented, and (b) whether it should be in the main docs or just the developer docs plus the crash debugging guide on the wiki. Thoughts? You can also find this patch at: https://github.com/ringerc/postgres/tree/crashdump ie git://github.com/ringerc/postgres.git, branch "crashdump" . -- System & Network Administrator POST Newspapers
Вложения
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
BTW, related to the crash dump work: I wanted to see if I could get any integration into Windows Error Reporting for Pg, because it'd be nice to be able to use Microsoft's servers to aggregate crash reports and collect data on problems. WER (on Windows 7 and Vista) also provides automatic management of retained local copies of crash dumps, automatic bundling of related files, automatic out-of-process debugging, etc. WER has been significantly improved in Windows Visa / 7 and may be worth considering enabling its use when Pg is installed on those platforms. Do you think there'd be any interest in re-enabling WER when installation on win7 / vista is detected? I've seen mention that the win32 installers "go to some lengths" to disable WER because it tends to hang the server while waiting for a user prompt. I was wondering if/how the installer disables WER for Pg? There aren't any calls to AddERExcludedApplication() or WerAddExcludedApplication() in the Pg sources. Pg doesn't appear under: HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\WindowsError Reporting within an ExcludedApplications key either. WHY WINDOWS ERROR REPORTING? ============================ WER on Windows 7 permits things like an out-of-process debugger to be invoked. Once the postmaster knows how to manage generic children (as has been discussed to permit better integration of PgAgent among other things) it could also become responsible for doing out-of-process crash dumping and error reporting on crashed children rather than having to do potentially unreliable self-debugging as is presently the case. It might not even be necessary to do that much, as WER on win7/vista offers so much more configurability. On Win7/vista you can also set WER up not to block execution while it waits for user input, and to prompt the user even when the user is running under a different security level. It can automatically bundle postgresql.conf, the most recent log file, etc when an error report is sent. By contrast to app-generated minidumps, most importantly it doesn't require the user to do anything more than click "yes" for an error report to be collected and become available for analysys of most frequent failures. The downside is that it requires a *business* to register for access to WER data, and that business must obtain a VeriSign code signing cert (US$99 if obtained for WER purposes). This might be something the EDB folks would be interested in, especially as it should also permit shipping of signed Pg binaries, which is great for protecting against AV software, making it easier to deploy in business environments, etc. Note that Windows Error Reporting doens't eliminate the utility of the crashdumps feature, as self-crashdumps are easily accessible to and configurable by the local user/admin and work fine on Windows XP. The current self-generated crash dumps also contain more information than you'll usually get from a WER dump, though it may be possible to customise WER so it includes the same information. I'd really like to know more about past experience with Windows Error Reporting, because I'd like to investigate whether it's worth using in future with a few tweaks for win7 use. -- System & Network Administrator POST Newspapers
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Thu, Dec 16, 2010 at 09:21, Craig Ringer <craig@postnewspapers.com.au> wrote: > Hi all > > Updated patch attached. This one avoids the inline function stuff and > instead just decides whether to compile unix_crashdump.c or > win32_crashdump.c based on build system tests. It passes "make check" on > nix and tests on win32, both with and without the "crashdumps" directory > existing. The inadvertent duplication of the functon prototype is fixed. I've updated this patch in quite a few ways, and you can find the latest on my github (will attach when I'm done with cleanups). Things I've changed so far: * I moved it all into backend/port/win32, and thus removed all the autoconf things, since they are not necessary. This is only for win32 at this point, and AFAIK we don't expect it to be for other platforms. If that's needed, we can move it back later, but for now let's keep it simple. * using elog() really isn't safe. We set the crash handler *long* before we have loaded the elog subsystem. It's better to get a log to eventlog than to not get it at all. * Plus a number of smaller changes One thing that I did notice, and I'm not entirely sure what to do with - which is why I wanted to post this while I'm still working on the cleanup: We use the existance of the "crashdumps" directory as an indication we want crashdumps. That's fine when the system is up. But what if we crash *in the postmaster before we have done chdir()*? Should we perhaps instead define a subdirectory of *where the .EXE file is*, and dump the file there? > I haven't added any SGML documentation yet, as I'm not sure (a) to what > level it should be documented, and (b) whether it should be in the main > docs or just the developer docs plus the crash debugging guide on the > wiki. Thoughts? It needs to be documented somewhere.Perhaps in "15.8 Platform Specific Notes"? That's really about building it, but it might be reasonable there anyway? It does hold a number of things today that aren't related to building, for other platforms. (And yes, it should go in the wiki as well, of course) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander <magnus@hagander.net> wrote: > We use the existance of the "crashdumps" directory as an indication we > want crashdumps. That's fine when the system is up. But what if we > crash *in the postmaster before we have done chdir()*? > > Should we perhaps instead define a subdirectory of *where the .EXE > file is*, and dump the file there? That seems pretty ugly, for a pretty marginal case. The number of crashes that are going to happen in the postmaster before we've done chdir() should be extremely small, especially if we're talking about crashes in the field rather than during development. How about adding a command-line option to force a dump to be written in $CWD whether a crashdumps directory exists or not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Thu, Dec 16, 2010 at 14:01, Magnus Hagander <magnus@hagander.net> wrote: > On Thu, Dec 16, 2010 at 09:21, Craig Ringer <craig@postnewspapers.com.au> wrote: >> Hi all >> >> Updated patch attached. This one avoids the inline function stuff and >> instead just decides whether to compile unix_crashdump.c or >> win32_crashdump.c based on build system tests. It passes "make check" on >> nix and tests on win32, both with and without the "crashdumps" directory >> existing. The inadvertent duplication of the functon prototype is fixed. > > I've updated this patch in quite a few ways, and you can find the > latest on my github (will attach when I'm done with cleanups). Things > I've changed so far: Found another problem in it: when running with an older version of dbghelp.dll (which I was), it simply didn't work. We need to grab the version of dbghelp.dll at runtime and pick which things we're going to dump based on that. The attached version of the patch does that. Can you please test that it still generates the full dump on your system, which supposedly has a much more modern version of dbghelp.dll than mine? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Вложения
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Thu, Dec 16, 2010 at 15:07, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander <magnus@hagander.net> wrote: >> We use the existance of the "crashdumps" directory as an indication we >> want crashdumps. That's fine when the system is up. But what if we >> crash *in the postmaster before we have done chdir()*? >> >> Should we perhaps instead define a subdirectory of *where the .EXE >> file is*, and dump the file there? > > That seems pretty ugly, for a pretty marginal case. The number of > crashes that are going to happen in the postmaster before we've done > chdir() should be extremely small, especially if we're talking about > crashes in the field rather than during development. How about adding > a command-line option to force a dump to be written in $CWD whether a > crashdumps directory exists or not? Is that less ugly? ;) But yes, we are talking about in the field, so it's fairly small. But any crash during guc loading for example would go there, I think? We can do such a commandline. We don't have any platform-specific commandline options today. Is that something we've intentionally avoided, or just not needed before? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus@hagander.net> wrote: > Is that less ugly? ;) Well, I thought it was or I would have suggested it, but it's obviously open to interpretation. > But yes, we are talking about in the field, so it's fairly small. But > any crash during guc loading for example would go there, I think? Probably. But how likely is that to happen only on Windows and only in the field? > We can do such a commandline. We don't have any platform-specific > commandline options today. Is that something we've intentionally > avoided, or just not needed before? Beats me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus@hagander.net> wrote: >> We can do such a commandline. We don't have any platform-specific >> commandline options today. Is that something we've intentionally >> avoided, or just not needed before? > Beats me. Yes, it's something we intentionally avoid: there is no convenient way to have conditional entries in getopt()'s option-letters string. I think the proposal for such a switch is unnecessary lily-gilding, and is far more likely to create problems than solve any. Please remember that a debugging facility has got to be minimal impact, or it creates its own Heisenbugs. (In the unlikely event that someone needed to debug such an early crash, surely they could just create a crashdump subdirectory in whatever directory they want to launch the postmaster from.) regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Thu, Dec 16, 2010 at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus@hagander.net> wrote: >>> We can do such a commandline. We don't have any platform-specific >>> commandline options today. Is that something we've intentionally >>> avoided, or just not needed before? > >> Beats me. > > Yes, it's something we intentionally avoid: there is no convenient way > to have conditional entries in getopt()'s option-letters string. > > I think the proposal for such a switch is unnecessary lily-gilding, > and is far more likely to create problems than solve any. Please > remember that a debugging facility has got to be minimal impact, > or it creates its own Heisenbugs. > > (In the unlikely event that someone needed to debug such an early crash, > surely they could just create a crashdump subdirectory in whatever > directory they want to launch the postmaster from.) Or actually - run the postmaster from a debugger... I think the thing we lose is the ability for the DBA to say pre-emptively "if this crashes tomorrow on restart i want a crashdump". Hmm. What we could do is have pg_ctl chdir() into the data directory on start. We don't set it to anything there now - so we rely on the "If this parameter is NULL, the new process will have the same current drive and directory as the calling process." Is that likely to cause problems in other areas? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Thu, Dec 16, 2010 at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think the proposal for such a switch is unnecessary lily-gilding, > Hmm. What we could do is have pg_ctl chdir() into the data directory > on start. See above. You're solving a problem that probably doesn't exist, and introducing a pile of new ones in the process --- for example, this will break commands that use relative paths for either the postmaster executable or the data directory itself. regards, tom lane
Magnus Hagander <magnus@hagander.net> writes: > Found another problem in it: when running with an older version of > dbghelp.dll (which I was), it simply didn't work. We need to grab the > version of dbghelp.dll at runtime and pick which things we're going to > dump based on that. > The attached version of the patch does that. Can you please test that > it still generates the full dump on your system, which supposedly has > a much more modern version of dbghelp.dll than mine? This version of the patch looks fairly sane in terms of how it connects to the rest of the code; but I'm unqualified to comment on the Windows-specific code. One thought is maybe the call point should be in startup_hacks() rather than the main line of main()? I'm not terribly set on it if you don't like that. Also, I notice that the comment for startup_hacks() claims that it isn't executed in subprocesses, which is evidently an obsolete statement --- that should get cleaned up independently of this. regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 16/12/10 21:01, Magnus Hagander wrote: > Found another problem in it: when running with an older version of > dbghelp.dll (which I was), it simply didn't work. We need to grab the > version of dbghelp.dll at runtime and pick which things we're going to > dump based on that. I was about to suggest dropping the fallback loading of the system dbghelp.dll, and just bundle a suitable version with Pg, then I saw how big the DLL is. It's 800kb (!!) of code that'll hopefully never get used on any given system. With a footprint like that, bundling it seems rather less attractive. I think Magnus is right: test the dbghelp.dll version and fall back to supported features - as far back as XP, anyway; who cares about anything earlier than that. An updated version can always be put in place by the user if the built-in one can't produce enough detail. > * I moved it all into backend/port/win32, and thus removed all the > autoconf things, since they are not necessary. This is only for win32 > at this point, and AFAIK we don't expect it to be for other platforms. > If that's needed, we can move it back later, but for now let's keep it > simple. Yeah, fair call - especially as most other OSes have native crash dump facilities (kernel-generated core dumps, etc) that render explicit crash handling much less useful. > * using elog() really isn't safe. We set the crash handler *long* > before we have loaded the elog subsystem. It's better to get a log to > eventlog than to not get it at all. Good point. It made some sense when running as a loadable module, as the crash dumper was only loaded quite late, and so long as it elog()'d only after writing the dump that was fine. Now, not so much. > We use the existance of the "crashdumps" directory as an indication we > want crashdumps. That's fine when the system is up. But what if we > crash *in the postmaster before we have done chdir()*? > > Should we perhaps instead define a subdirectory of *where the .EXE > file is*, and dump the file there? Well, on Windows it's pretty easy to find out where your executable is, and the crash dumper already does that to determine where to load dbghelp.dll from. It'd be no stretch to use a subdir of the postgresql install dir for crash dumps. OTOH, there are all sorts of exciting permissions issues to deal with then, as write permission for "postgres" won't be inherited from the datadir. It's also generally a big ugly and not very good practice to write to the application directory. If not writing to the datadir, for non-services you'd use something like: %LOCALAPPDATA%\postgres\crashdumps ... but for a service running under its own account that's not a good option either. Consequently, IMO the datadir remains the best place for crashdumps. Is this going to be a real-world issue? One simple answer might be not to load the crash dump handler until after the postmaster chdir()s, but honestly I'm not sure it matters. If it can't find somewhere to write dumps, it doesn't write a dump. The only possible issue I can imagine would be if the postmaster started with a cwd that a malicious user could write to, so they could create a 'crashdumps' dir, somehow cause an early start postmaster crash, therefore somehow leak info from the protected "postgres" account files that way. A quick look at the code suggests that the postmaster doesn't do anything interesting in the datadir before chdir()ing in there, so I don't see any danger in that. In any case, if a malicious user has write access to the postmaster's startup cwd, with the design of Windows PATH and library loading that means they can replace critical DLLs with their own malicious versions, so there's a whole lot more to worry about than leaking information via crashdumps. So: I say just use the cwd, whatever it is, and don't worry about it. An admin can manually create a "crashdumps" dir in the service start cwd and set appropriate permissions in the unlikely event they're trying to collect crash dumps for an early-crashing postmaster and they can't just use a debugger directly. > It needs to be documented somewhere.Perhaps in "15.8 Platform Specific > Notes"? That's really about building it, but it might be reasonable > there anyway? It does hold a number of things today that aren't > related to building, for other platforms. Seems reasonable. -- System & Network Administrator POST Newspapers
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
<p><br /> On Dec 17, 2010 8:02 AM, "Craig Ringer" <<a href="mailto:craig@postnewspapers.com.au">craig@postnewspapers.com.au</a>>wrote:<br /> ><br /> > On 16/12/10 21:01,Magnus Hagander wrote:<br /> ><br /> > > Found another problem in it: when running with an older version of<br/> > > dbghelp.dll (which I was), it simply didn't work. We need to grab the<br /> > > version of dbghelp.dllat runtime and pick which things we're going to<br /> > > dump based on that.<br /> ><br /> > I wasabout to suggest dropping the fallback loading of the system<br /> > dbghelp.dll, and just bundle a suitable versionwith Pg, then I saw how<br /> > big the DLL is. It's 800kb (!!) of code that'll hopefully never get used<br />> on any given system. With a footprint like that, bundling it seems<br /> > rather less attractive.<br /> ><br/> > I think Magnus is right: test the dbghelp.dll version and fall back to<br /> > supported features - asfar back as XP, anyway; who cares about anything<br /> > earlier than that. An updated version can always be put inplace by the<br /> > user if the built-in one can't produce enough detail.<p>Did you get a chance to test that it stillproduced a full dump on your system? <p>> > We use the existance of the "crashdumps" directory as an indicationwe<br /> > > want crashdumps. That's fine when the system is up. But what if we<br /> > > crash *inthe postmaster before we have done chdir()*?<br /> > ><br /> > > Should we perhaps instead define a subdirectoryof *where the .EXE<br /> > > file is*, and dump the file there?<p>... <p>> Is this going to be a real-worldissue?<p>Nah, I'm with tom on the fact that this is probably not. <br /><p>> > It needs to be documentedsomewhere.Perhaps in "15.8 Platform Specific<br /> > > Notes"? That's really about building it, but it mightbe reasonable<br /> > > there anyway? It does hold a number of things today that aren't<br /> > > relatedto building, for other platforms.<br /> ><br /> > Seems reasonable.<p>I'll put something together there then.<p>/Magnus
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 17/12/2010 3:46 PM, Magnus Hagander wrote: > > On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig@postnewspapers.com.au > <mailto:craig@postnewspapers.com.au>> wrote: > > > > On 16/12/10 21:01, Magnus Hagander wrote: > > > > > Found another problem in it: when running with an older version of > > > dbghelp.dll (which I was), it simply didn't work. We need to grab the > > > version of dbghelp.dll at runtime and pick which things we're going to > > > dump based on that. > > > > I was about to suggest dropping the fallback loading of the system > > dbghelp.dll, and just bundle a suitable version with Pg, then I saw how > > big the DLL is. It's 800kb (!!) of code that'll hopefully never get used > > on any given system. With a footprint like that, bundling it seems > > rather less attractive. > > > > I think Magnus is right: test the dbghelp.dll version and fall back to > > supported features - as far back as XP, anyway; who cares about anything > > earlier than that. An updated version can always be put in place by the > > user if the built-in one can't produce enough detail. > > Did you get a chance to test that it still produced a full dump on your > system? I've just tested that and it looks ok so far. The dumps produced are smaller - 1.3MB instead of ~4MB for a dump produced by calling a simple "crashme()" function from SQL, the source for which follows at the end of this post. However, I'm getting reasonable stack traces, as shown by the windb.exe output below. That said, the dump appears to exclude the backend's private memory. I can't resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct values of arguments are available as they're on the stack, as are function local variables, but the process heap isn't dumped so you can't see the contents of any pointers-to-struct. That'll make it harder to track down many kinds of error - but, on the other hand, means that dumps of processes with huge work_mem etc won't be massively bloated. That might be a safer starting point than including the private process memory, really. I'd like to offer a flag to turn on dumping of private process memory but I'm not sure how best to go about it, given that we'd want to avoid accessing GUCs etc if possible. "later", I think; this is better for now. I'm happy with the patch as it stands, with the one exception that consideration of mingw is required before it can be committed. > Symbol search path is: C:\Users\Craig\Developer\postgres\Release\postgres;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols;C:\Program Files\PostgreSQL\9.0\symbols > Executable search path is: > Windows 7 Version 7600 MP (2 procs) Free x86 compatible > Product: WinNt, suite: SingleUserTS > Machine Name: > Debug session time: Fri Dec 17 17:37:19.000 2010 (UTC + 8:00) > System Uptime: not available > Process Uptime: 0 days 0:01:09.000 > ................................ > This dump file has an exception of interest stored in it. > The stored exception information can be accessed via .ecxr. > (16b4.cd8): Access violation - code c0000005 (first/second chance not available) > eax=000002e8 ebx=03210aa8 ecx=0055e700 edx=03210a68 esi=03210a68 edi=0055ea14 > eip=771464f4 esp=0055e6d4 ebp=0055e6e4 iopl=0 nv up ei pl zr na pe nc > cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 > ntdll!KiFastSystemCallRet: > 771464f4 c3 ret > 0:000> .ecxr > eax=00000000 ebx=00000000 ecx=0103fa88 edx=7001100f esi=0103fa78 edi=0103fab4 > eip=70011052 esp=0055f62c ebp=0103fa78 iopl=0 nv up ei pl zr na pe nc > cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 > crashme!crashdump_crashme+0x2: > 70011052 c70001000000 mov dword ptr [eax],1 ds:0023:00000000=???????? > 0:000> ~ > . 0 Id: 16b4.cd8 Suspend: 0 Teb: 7ffdf000 Unfrozen > 1 Id: 16b4.1798 Suspend: 0 Teb: 7ffde000 Unfrozen > 2 Id: 16b4.6f4 Suspend: 0 Teb: 7ffdd000 Unfrozen > 0:000> k > *** Stack trace for last set context - .thread/.cxr resets it > ChildEBP RetAddr > 0055f628 0142c797 crashme!crashdump_crashme+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14] > 0055f68c 0142c804 postgres!ExecMakeFunctionResult+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c@ 1824] > 0055f6b4 0142b760 postgres!ExecEvalFunc+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 2260] > 0055f6e0 0142ba83 postgres!ExecTargetList+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095] > 0055f720 0143f074 postgres!ExecProject+0x173 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312] > 0055f734 01427e07 postgres!ExecResult+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c @ 157] > 0055f744 01425ccd postgres!ExecProcNode+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @ 361] > 0055f758 01426ace postgres!ExecutePlan+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236] > 0055f788 0152ec7d postgres!standard_ExecutorRun+0x8e [c:\users\craig\developer\postgres\src\backend\executor\execmain.c@ 288] > 0055f7ac 0152f290 postgres!PortalRunSelect+0x6d [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 953] > 0055f834 0152c2b2 postgres!PortalRun+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803] > 0055f8e8 0152cbe5 postgres!exec_simple_query+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 1067] > 0055f96c 014f2bfc postgres!PostgresMain+0x575 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935] > 0055f98c 014f58c9 postgres!BackendRun+0x19c [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 3562] > 0055fb30 014575bc postgres!SubPostmasterMain+0x2f9 [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c@ 4058] > 0055fb48 0162847d postgres!main+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c @ 173] > 0055fb8c 76ab1194 postgres!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586] > 0055fb98 7715b495 kernel32!BaseThreadInitThunk+0xe > 0055fbd8 7715b468 ntdll!__RtlUserThreadStart+0x70 > 0055fbf0 00000000 ntdll!_RtlUserThreadStart+0x1b /* Condensed source for crash-test contrib module. */ #include "postgres.h" #include "fmgr.h" PG_MODULE_MAGIC; extern Datum crashdump_crashme(PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(crashdump_crashme); Datum crashdump_crashme(PG_FUNCTION_ARGS) {int * ptr = NULL;*ptr = 1;return NULL; } -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Fri, Dec 17, 2010 at 12:08, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 17/12/2010 3:46 PM, Magnus Hagander wrote: >> >> On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig@postnewspapers.com.au >> <mailto:craig@postnewspapers.com.au>> wrote: >> > >> > On 16/12/10 21:01, Magnus Hagander wrote: >> > >> > > Found another problem in it: when running with an older version of >> > > dbghelp.dll (which I was), it simply didn't work. We need to grab the >> > > version of dbghelp.dll at runtime and pick which things we're going >> to >> > > dump based on that. >> > >> > I was about to suggest dropping the fallback loading of the system >> > dbghelp.dll, and just bundle a suitable version with Pg, then I saw how >> > big the DLL is. It's 800kb (!!) of code that'll hopefully never get >> used >> > on any given system. With a footprint like that, bundling it seems >> > rather less attractive. >> > >> > I think Magnus is right: test the dbghelp.dll version and fall back to >> > supported features - as far back as XP, anyway; who cares about >> anything >> > earlier than that. An updated version can always be put in place by the >> > user if the built-in one can't produce enough detail. >> >> Did you get a chance to test that it still produced a full dump on your >> system? > > I've just tested that and it looks ok so far. The dumps produced are smaller > - 1.3MB instead of ~4MB for a dump produced by calling a simple "crashme()" > function from SQL, the source for which follows at the end of this post. > However, I'm getting reasonable stack traces, as shown by the windb.exe > output below. > > That said, the dump appears to exclude the backend's private memory. I can't > resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct values of > arguments are available as they're on the stack, as are function local > variables, but the process heap isn't dumped so you can't see the contents > of any pointers-to-struct. That'll make it harder to track down many kinds > of error - but, on the other hand, means that dumps of processes with huge > work_mem etc won't be massively bloated. What version of dbghelp do you have? And what does the API report for it? The code is supposed to add the private memory if it finds version 6 or higher, perhaps that check is incorrect? > That might be a safer starting point than including the private process > memory, really. I'd like to offer a flag to turn on dumping of private > process memory but I'm not sure how best to go about it, given that we'd > want to avoid accessing GUCs etc if possible. "later", I think; this is > better for now. I'm not too happy with having a bunch of switches for it. People likely to tweak those can just install the debugger tools and use those, I think. So I'd be happy to have it enabled - given that we want a default. However, what I'm most interested in right now is finding out why it's not being included anymore in the new patch, because it's supposed to be.. > I'm happy with the patch as it stands, with the one exception that > consideration of mingw is required before it can be committed. What do you mean? That it has to be tested on mingw specifically, or something else? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Fri, Dec 17, 2010 at 12:08, Craig Ringer <craig@postnewspapers.com.au> wrote: >> That might be a safer starting point than including the private process >> memory, really. > I'm not too happy with having a bunch of switches for it. People > likely to tweak those can just install the debugger tools and use > those, I think. Yeah. > So I'd be happy to have it enabled - given that we want a default. There is absolutely nothing more frustrating than having a crash dump that hasn't got the information you need. Please turn it on by default. regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 17/12/2010 7:17 PM, Magnus Hagander wrote: > What version of dbghelp do you have? 6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I was testing with. 6.12.0002.633 is what came with my copy of Debugging Tools for windows, from the windows SDK. The same version comes with Visual Studio 10 Express. 6.9.0003.113 is what came with Visual Studio 9 Express. I've now re-tested with the latest, 6.12.0002.633, and have the same result. > And what does the API report for it? This: version = (*pApiVersion)(); write_stderr("version: %hu.%hu.%hu\n",version->MajorVersion,version->MinorVersion,version->Revision); outputs "version: 4.0.5" when loading dbghelp.dll 6.12.0002.633 from c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just before LoadLibrary and a test verifying the LoadLibrary worked. Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm surprised we can't just pass flags a particular version doesn't know about and have it deal with them. Still, I guess it's not wise to assume such things. > The code is supposed to add the private memory if it finds version > 6 or higher, perhaps that check is incorrect? It begins to look like the version check might be lying. I'll dig further in a bit; it's 12:20am now and I need to sleep. I'm going on holiday in about 48 hours, and will be away from Windows (phew!) unless I get a VM set up during that time. I'll see what I can do. >> I'm happy with the patch as it stands, with the one exception that >> consideration of mingw is required before it can be committed. > > What do you mean? That it has to be tested on mingw specifically, or > something else? I'm not even sure it'll compile or link with MinGW, and if it does, I doubt it'll be useful. It should only be compiled and called for native VC++ builds. I'm not sure if backend\port\win32 is built for all win32, or only for VC++. Testing for defined(_MSC_VER) would do the trick. I'm not sure testing defined(WIN32_ONLY_COMPILER) is quite right, since I doubt dbghelp.dll would be much use for a Pg compiled with Borland or the like either if that were ever supported. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Fri, Dec 17, 2010 at 17:24, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 17/12/2010 7:17 PM, Magnus Hagander wrote: > >> What version of dbghelp do you have? > > 6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I was > testing with. > > 6.12.0002.633 is what came with my copy of Debugging Tools for windows, from > the windows SDK. The same version comes with Visual Studio 10 Express. > > 6.9.0003.113 is what came with Visual Studio 9 Express. > > I've now re-tested with the latest, 6.12.0002.633, and have the same result. > >> And what does the API report for it? > > This: > > version = (*pApiVersion)(); > write_stderr("version: %hu.%hu.%hu\n", > version->MajorVersion, > version->MinorVersion, > version->Revision); > > outputs "version: 4.0.5" when loading dbghelp.dll 6.12.0002.633 from > c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just before > LoadLibrary and a test verifying the LoadLibrary worked. Now, that's annoying. So clearly we can't use that function to determine which version we're on. Seems it only works for "image help api", and not the general thing. According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, we could look for: SysEnumLines - if present, we have at least 6.1. However, I don't see any function that appeared in 6.0 only.. We're probably better off looking at the VERSIONINFO structure. Which has a not-so-nice API, but at least it's documented :) > Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm > surprised we can't just pass flags a particular version doesn't know about > and have it deal with them. Still, I guess it's not wise to assume such > things. You'd think so, but if I include all the flags and run it with the default one on Windows 2003, it comes back with the infamous "Parameter Is Incorrect" error. >> The code is supposed to add the private memory if it finds version >> 6 or higher, perhaps that check is incorrect? > > It begins to look like the version check might be lying. I'll dig further in > a bit; it's 12:20am now and I need to sleep. > > I'm going on holiday in about 48 hours, and will be away from Windows > (phew!) unless I get a VM set up during that time. I'll see what I can do. I'll try to put together a version that uses the VERSIONINFO to determine it, for you to test. >>> I'm happy with the patch as it stands, with the one exception that >>> consideration of mingw is required before it can be committed. >> >> What do you mean? That it has to be tested on mingw specifically, or >> something else? > > I'm not even sure it'll compile or link with MinGW, and if it does, I doubt > it'll be useful. It should only be compiled and called for native VC++ > builds. I'm pretty sure it could be useful with mingw as well. not *as* useful, but still useful. You'd get proper stack traces and such for anything in msvcrt, and the actual reason for the exception etc. > I'm not sure if backend\port\win32 is built for all win32, or only for VC++. It's for all win32. > Testing for defined(_MSC_VER) would do the trick. I'm not sure testing > defined(WIN32_ONLY_COMPILER) is quite right, since I doubt dbghelp.dll would > be much use for a Pg compiled with Borland or the like either if that were > ever supported. We don't support building the backend with borland - only libpq and psql. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Fri, Dec 17, 2010 at 17:42, Magnus Hagander <magnus@hagander.net> wrote: > On Fri, Dec 17, 2010 at 17:24, Craig Ringer <craig@postnewspapers.com.au> wrote: >> On 17/12/2010 7:17 PM, Magnus Hagander wrote: > Now, that's annoying. So clearly we can't use that function to > determine which version we're on. Seems it only works for "image help > api", and not the general thing. > > According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, > we could look for: > > SysEnumLines - if present, we have at least 6.1. > > However, I don't see any function that appeared in 6.0 only.. Actually, I'm wrong - there are functions enough to determine the version. So here's a patch that tries that. (BTW, the document is actually incorrect - partially because the version that's shipped with Windows 2003 which is 5.2, is more or less undocumented) Let me know how that one works for you, and if it doesn't, whether it does actually detect any of those higher versions by searching for the functions. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Вложения
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 17/12/2010 11:17 PM, Tom Lane wrote: >> So I'd be happy to have it enabled - given that we want a default. > > There is absolutely nothing more frustrating than having a crash dump > that hasn't got the information you need. Please turn it on by default. There have to be limits to that, though. dbghelp.dll can dump shared memory, too - but it's not going to be practical to write or work with many-gigabyte dumps most of the time, and neither my initial patch nor Magnus's recent reworking of it ask windbg.dll to include shared memory in the dump. It's possible to tell dbghelp.dll to include specific additional memory ranges, so it may well be worth including specific critical areas within the shared memory block while omitting the bulk of the buffers. Doing that safely in an unsafe we're-crashing environment might not be simple, though, especially if you're not even sure whether you got far into startup before crashing. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 18/12/2010 1:13 AM, Magnus Hagander wrote: > On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net> wrote: >> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au> wrote: >>> On 17/12/2010 7:17 PM, Magnus Hagander wrote: >> Now, that's annoying. So clearly we can't use that function to >> determine which version we're on. Seems it only works for "image help >> api", and not the general thing. >> >> According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, >> we could look for: >> >> SysEnumLines - if present, we have at least 6.1. >> >> However, I don't see any function that appeared in 6.0 only.. > > Actually, I'm wrong - there are functions enough to determine the > version. So here's a patch that tries that. Great. I pulled the latest from your git tree, tested that, and got much better results. Crashdump size is back to what I expected. In my test code, fcinfo->args and fcinfo->argnull can be examined without problems. Backtraces look good; see below. It seems to be including backend private memory again now. Thanks _very_ much for your work on this. fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory, as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context. This has me wondering - is it going to be necessary to dump shared memory to make many backtraces useful? I just responded to Tom mentioning that the patch doesn't currently dump shared memory, but I hadn't realized the extent to which it's used for _lots_ more than just disk buffers. I'm not sure how to handle dumping shared_buffers when someone might be using multi-gigabyte shared_buffers, though. Dumping the whole lot would risk sudden out-of-disk-space issues, slowdowns as dumps are written, and the backend being "frozen" as it's being dumped could delay the system coming back up again. Trying to selectively dump critical parts could cause dumps to fail if the system is in early startup or a bad state. The same concern applies to writing backend private memory; it's fine most of the time, but if you're doing data warehousing queries with 2GB of work_mem, it's going to be nasty having all that extra disk I/O and disk space use, not to mention the hold-up while the dump is written. If this is something we want to have people running in production "just in case" or to track down rare / hard to reproduce faults, that'll be a problem. OTOH, we can't really go poking around in palloc contexts to decide what to dump. I guess we could always do a small, minimalist minidump, then write _another_ dump that attempts to include select parts of shm and backend private memory. I just thought of two other things, too: - Is it possible for this handler to be called recursively if it fails during the handler call? If so, do we need to uninstall the handler before attempting a dump to avoid such recursion? I need to do some testing and dig around MSDN to find out more about this. - Can asynchronous events like signals (or their win32 emulation) interrupt an executing crash handler, or are they blocked before the crash handler is called? If they're not blocked, do we need to try to block them before attempting a dump? Again, I need to do some reading on this. Anyway, here's an example of the backtraces I'm currently getting. They're clearly missing some parameters (in shm? Unsure) but provide source file+line, argument values where resolvable, and the call stack its self. Locals are accessible at all levels of the stack when you go poking around in windbg. > This dump file has an exception of interest stored in it. > The stored exception information can be accessed via .ecxr. > (930.12e8): Access violation - code c0000005 (first/second chance not available) > eax=00bce2c0 ebx=72d0e800 ecx=000002e4 edx=72cb81c8 esi=000000f0 edi=00000930 > eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0 nv up ei pl zr na pe nc > cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 > ntdll!KiFastSystemCallRet: > 771464f4 c3 ret > 0:000> .ecxr > eax=00000000 ebx=00000000 ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804 > eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0 nv up ei pl zr na pe nc > cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246 > crashme!crashdump_crashme+0x2: > 73621052 c70001000000 mov dword ptr [eax],1 ds:0023:00000000=???????? > 0:000> kp > *** Stack trace for last set context - .thread/.cxr resets it > ChildEBP RetAddr > 00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * fcinfo = 0x015e3318)+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c@ 14] > 00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 0x015e3318, struct ExprContext * econtext= 0x00319410, char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x7362100f)+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c@ 1824] > 00bcf30c 0031b760 postgres!ExecEvalFunc(struct FuncExprState * fcache = 0x00000000, struct ExprContext * econtext = 0x00000000,char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x00000000)+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c@ 2260] > 00bcf338 0031ba83 postgres!ExecTargetList(struct List * targetlist = 0x00000000, struct ExprContext * econtext = 0x00000000,unsigned int * values = 0x00000000, char * isnull = 0x00000000 "", ExprDoneCond * itemIsDone = 0x00000000, ExprDoneCond* isDone = 0x00000000)+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095] > 00bcf378 0032f074 postgres!ExecProject(struct ProjectionInfo * projInfo = 0x00000000, ExprDoneCond * isDone = 0x00000000)+0x173[c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312] > 00bcf38c 00317e07 postgres!ExecResult(struct ResultState * node = <Memory access error>)+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c@ 157] > 00bcf39c 00315ccd postgres!ExecProcNode(struct PlanState * node = <Memory access error>)+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c@ 361] > 00bcf3b0 00316ace postgres!ExecutePlan(struct EState * estate = 0x015fd7c8, struct PlanState * planstate = <Memory accesserror>, CmdType operation = <Memory access error>, char sendTuples = <Memory access error>, long numberTuples = <Memoryaccess error>, ScanDirection direction = NoMovementScanDirection (0n0), struct _DestReceiver * dest = <Memory accesserror>)+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236] > 00bcf3e0 0041ec5d postgres!standard_ExecutorRun(struct QueryDesc * queryDesc = <Memory access error>, ScanDirection direction= <Memory access error>, long count = <Memory access error>)+0x8e [c:\users\craig\developer\postgres\src\backend\executor\execmain.c@ 288] > 00bcf404 0041f270 postgres!PortalRunSelect(struct PortalData * portal = 0x00000000, char forward = <Memory access error>,long count = <Memory access error>, struct _DestReceiver * dest = <Memory access error>)+0x6d [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c@ 953] > 00bcf48c 0041c292 postgres!PortalRun(struct PortalData * portal = 0x015fb5b8, long count = 0n2147483647, char isTopLevel= 0n1 '', struct _DestReceiver * dest = 0x015e3418, struct _DestReceiver * altdest = 0x015e3418, char * completionTag= 0x00bcf500 "")+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803] > 00bcf540 0041cbc5 postgres!exec_simple_query(char * query_string = 0x015fd7d8 "???")+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c@ 1067] > 00bcf5c4 003e2bdc postgres!PostgresMain(int argc = 0n2, char ** argv = 0x01555138, char * username = 0x00d484a0 "Craig")+0x575[c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935] > 00bcf5e4 003e58a9 postgres!BackendRun(struct Port * port = 0x00000000)+0x19c [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c@ 3562] > 00bcf788 003475bc postgres!SubPostmasterMain(int argc = 0n13900471, char ** argv = 0x00d41ac5)+0x2f9 [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c@ 4058] > 00bcf7a0 0051845d postgres!main(int argc = 0n1990922644, char ** argv = 0x7ffdf000)+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c@ 173] > 00bcf7e4 76ab1194 postgres!__tmainCRTStartup(void)+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586] > 00bcf7f0 7715b495 kernel32!BaseThreadInitThunk+0xe > 00bcf830 7715b468 ntdll!__RtlUserThreadStart+0x70 > 00bcf848 00000000 ntdll!_RtlUserThreadStart+0x1b -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 18/12/2010 1:13 AM, Magnus Hagander wrote: >> >> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net> >> wrote: >>> >>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au> >>> wrote: >>>> >>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote: >>> >>> Now, that's annoying. So clearly we can't use that function to >>> determine which version we're on. Seems it only works for "image help >>> api", and not the general thing. >>> >>> According to >>> http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, >>> we could look for: >>> >>> SysEnumLines - if present, we have at least 6.1. >>> >>> However, I don't see any function that appeared in 6.0 only.. >> >> Actually, I'm wrong - there are functions enough to determine the >> version. So here's a patch that tries that. > > Great. I pulled the latest from your git tree, tested that, and got much > better results. Crashdump size is back to what I expected. In my test code, > fcinfo->args and fcinfo->argnull can be examined without problems. > Backtraces look good; see below. It seems to be including backend private > memory again now. Thanks _very_ much for your work on this. Ok, great. I think that leaves us at least complete enough to commit - we can always improve things further as we get some more real world testing. > fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory, > as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context. Hmm. Not sure why those would be in shared memory, that seems strange. > This has me wondering - is it going to be necessary to dump shared memory to > make many backtraces useful? I just responded to Tom mentioning that the > patch doesn't currently dump shared memory, but I hadn't realized the extent > to which it's used for _lots_ more than just disk buffers. I'm not sure how > to handle dumping shared_buffers when someone might be using multi-gigabyte > shared_buffers, though. Dumping the whole lot would risk sudden > out-of-disk-space issues, slowdowns as dumps are written, and the backend > being "frozen" as it's being dumped could delay the system coming back up > again. Trying to selectively dump critical parts could cause dumps to fail > if the system is in early startup or a bad state. I don'tt hink a slowdown issue during dump being written is really a problem - it's not like this is something that happens normally. In fact, we're going to restart all the other backends after the crash anyway, and that's going to be a lot more disruptive. Filling up the disk, however, is a much bigger issue.. As is having a huge dump to mail around (for error reports), if it's not actually necessary... > The same concern applies to writing backend private memory; it's fine most > of the time, but if you're doing data warehousing queries with 2GB of > work_mem, it's going to be nasty having all that extra disk I/O and disk > space use, not to mention the hold-up while the dump is written. If this is > something we want to have people running in production "just in case" or to > track down rare / hard to reproduce faults, that'll be a problem. Potential problem, yes, but not a very big one I believe. Yes, if that very big backend crashes there will be a big dump - but how else are you going to find out what went wrong? In the shared memory case, *all* dumps will be huge, not just those that happen to use a lot of local memory. > OTOH, we can't really go poking around in palloc contexts to decide what to > dump. No, that's way more complex stuff than we dare execute in a crash handler. > I guess we could always do a small, minimalist minidump, then write > _another_ dump that attempts to include select parts of shm and backend > private memory. > > I just thought of two other things, too: > > - Is it possible for this handler to be called recursively if it fails > during the handler call? If so, do we need to uninstall the handler before > attempting a dump to avoid such recursion? I need to do some testing and dig > around MSDN to find out more about this. I'm pretty sure I read somewhere that it can't happen and we don't need to do that, but I can't find any reference to it atm :-( > - Can asynchronous events like signals (or their win32 emulation) interrupt > an executing crash handler, or are they blocked before the crash handler is > called? If they're not blocked, do we need to try to block them before > attempting a dump? Again, I need to do some reading on this. The exception handler itself is global and will run on the crashing thread. I'm fairly certain it stops the other threads while running, but again I can't find a reference to that right now. But ISTM it would have to. > Anyway, here's an example of the backtraces I'm currently getting. They're > clearly missing some parameters (in shm? Unsure) but provide source > file+line, argument values where resolvable, and the call stack its self. > Locals are accessible at all levels of the stack when you go poking around > in windbg. Yeah, they're still very useful. Is that a release or debug build? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Sun, Dec 19, 2010 at 12:51, Magnus Hagander <magnus@hagander.net> wrote: > On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig@postnewspapers.com.au> wrote: >> On 18/12/2010 1:13 AM, Magnus Hagander wrote: >>> >>> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net> >>> wrote: >>>> >>>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au> >>>> wrote: >>>>> >>>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote: >>>> >>>> Now, that's annoying. So clearly we can't use that function to >>>> determine which version we're on. Seems it only works for "image help >>>> api", and not the general thing. >>>> >>>> According to >>>> http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx, >>>> we could look for: >>>> >>>> SysEnumLines - if present, we have at least 6.1. >>>> >>>> However, I don't see any function that appeared in 6.0 only.. >>> >>> Actually, I'm wrong - there are functions enough to determine the >>> version. So here's a patch that tries that. >> >> Great. I pulled the latest from your git tree, tested that, and got much >> better results. Crashdump size is back to what I expected. In my test code, >> fcinfo->args and fcinfo->argnull can be examined without problems. >> Backtraces look good; see below. It seems to be including backend private >> memory again now. Thanks _very_ much for your work on this. > > Ok, great. I think that leaves us at least complete enough to commit - > we can always improve things further as we get some more real world > testing. Actually, looking through it again just before commit, I can't help but think the whole code about loading the DLL from different places is unnecessary. The windows DLL search order *always* has the directory of the EXE first, followed by either system dirs or CWD depending on version. Any reason not to just call LoadLibrary once? That makes the patch look like attached. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Вложения
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 19/12/2010 7:51 PM, Magnus Hagander wrote: >> Great. I pulled the latest from your git tree, tested that, and got much >> better results. Crashdump size is back to what I expected. In my test code, >> fcinfo->args and fcinfo->argnull can be examined without problems. >> Backtraces look good; see below. It seems to be including backend private >> memory again now. Thanks _very_ much for your work on this. > > Ok, great. I think that leaves us at least complete enough to commit - > we can always improve things further as we get some more real world > testing. > > >> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory, >> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context. > > Hmm. Not sure why those would be in shared memory, that seems strange. OK, I'll have to do some more digging on that, then. I'm getting on a plane in about 2 hours, but will be bringing Visual Studio snapshots, a postgres git tree, an XP vm, etc with me, so time permitting I should be able to keep on with this. >> Anyway, here's an example of the backtraces I'm currently getting. They're >> clearly missing some parameters (in shm? Unsure) but provide source >> file+line, argument values where resolvable, and the call stack its self. >> Locals are accessible at all levels of the stack when you go poking around >> in windbg. > > Yeah, they're still very useful. Is that a release or debug build? That's a release build. I'm intentionally testing with release builds because I want something that's useful on real-world end-user systems, and I'm aware that few Windows users will build Pg for themselves. (That said, the Perl-based build scripts are some of the least painful build tools I've ever worked with on Windows. Only CMake can rival them for low-pain Windows compilation.) -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 19/12/2010 8:05 PM, Magnus Hagander wrote: > Actually, looking through it again just before commit, I can't help > but think the whole code about loading the DLL from different places > is unnecessary. The windows DLL search order *always* has the > directory of the EXE first, followed by either system dirs or CWD > depending on version. > > Any reason not to just call LoadLibrary once? Good point. The program directory was added to the DLL search path with Windows XP. On any Windows version that Pg targets you can trust Windows to load a DLL from the app dir before system32. We don't really care about win2k/nt4 or 9x, where this is necessary. It's not a security concern for the reasons already outlined in the comments in the patch - in brief, because Pg keeps the cwd inside the datadir, and if someone can write malicious files in there you have big problems already. If it was a security concern our fallback would be an attempt to load %WINDIR%\system32\dbghelp.dll by full path, rather than using an unqualified name to search the path. So: I agree. We should just let Windows find dbghelp.dll on the normal search path; building an explicit path isn't necessary. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Sun, Dec 19, 2010 at 13:57, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 19/12/2010 7:51 PM, Magnus Hagander wrote: > >>> Great. I pulled the latest from your git tree, tested that, and got much >>> better results. Crashdump size is back to what I expected. In my test >>> code, >>> fcinfo->args and fcinfo->argnull can be examined without problems. >>> Backtraces look good; see below. It seems to be including backend private >>> memory again now. Thanks _very_ much for your work on this. >> >> Ok, great. I think that leaves us at least complete enough to commit - >> we can always improve things further as we get some more real world >> testing. >> >> >>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared >>> memory, >>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context. >> >> Hmm. Not sure why those would be in shared memory, that seems strange. > > OK, I'll have to do some more digging on that, then. I'm getting on a plane > in about 2 hours, but will be bringing Visual Studio snapshots, a postgres > git tree, an XP vm, etc with me, so time permitting I should be able to keep > on with this. Ok. I still think what we have now is a great improvement over nothing. But improvements on top of it is obviously always good :-) >>> Anyway, here's an example of the backtraces I'm currently getting. >>> They're >>> clearly missing some parameters (in shm? Unsure) but provide source >>> file+line, argument values where resolvable, and the call stack its self. >>> Locals are accessible at all levels of the stack when you go poking >>> around >>> in windbg. >> >> Yeah, they're still very useful. Is that a release or debug build? > > That's a release build. I'm intentionally testing with release builds > because I want something that's useful on real-world end-user systems, and > I'm aware that few Windows users will build Pg for themselves. Good - I just wanted to be sure that's what you were testing as well. > (That said, the Perl-based build scripts are some of the least painful build > tools I've ever worked with on Windows. Only CMake can rival them for > low-pain Windows compilation.) :-) Thanks! They're not quite that painless to maintain, but it's not *too* bad. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Sun, Dec 19, 2010 at 14:06, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 19/12/2010 8:05 PM, Magnus Hagander wrote: > >> Actually, looking through it again just before commit, I can't help >> but think the whole code about loading the DLL from different places >> is unnecessary. The windows DLL search order *always* has the >> directory of the EXE first, followed by either system dirs or CWD >> depending on version. >> >> Any reason not to just call LoadLibrary once? > > Good point. The program directory was added to the DLL search path with > Windows XP. On any Windows version that Pg targets you can trust Windows to > load a DLL from the app dir before system32. We don't really care about > win2k/nt4 or 9x, where this is necessary. > > It's not a security concern for the reasons already outlined in the comments > in the patch - in brief, because Pg keeps the cwd inside the datadir, and if > someone can write malicious files in there you have big problems already. If > it was a security concern our fallback would be an attempt to load > %WINDIR%\system32\dbghelp.dll by full path, rather than using an unqualified > name to search the path. > > So: I agree. We should just let Windows find dbghelp.dll on the normal > search path; building an explicit path isn't necessary. I've committed the version that does this. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig@postnewspapers.com.au> wrote: >> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory, >> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context. > Hmm. Not sure why those would be in shared memory, that seems strange. FWIW, I don't believe that theory either. Offhand I cannot think of any case where an FmgrInfo would be in shared memory. It seems more likely from the above that you've misidentified where the fcinfo record is. There are some code paths where we don't bother to set up those fields, but I think we're always careful to set them to NULL if so. regards, tom lane
Craig Ringer <craig@postnewspapers.com.au> writes: > On 17/12/2010 11:17 PM, Tom Lane wrote: >> There is absolutely nothing more frustrating than having a crash dump >> that hasn't got the information you need. Please turn it on by default. > There have to be limits to that, though. dbghelp.dll can dump shared > memory, too - but it's not going to be practical to write or work with > many-gigabyte dumps most of the time, and neither my initial patch nor > Magnus's recent reworking of it ask windbg.dll to include shared memory > in the dump. Well, actually my comment above is based directly on pain in working with core dumps on Unix-oid systems that don't include shared memory. Linux *does* include shared memory, and I have not noticed that that was a problem. Of course, in development installations I don't try to raise shared_buffers to the moon ... > It's possible to tell dbghelp.dll to include specific additional memory > ranges, so it may well be worth including specific critical areas within > the shared memory block while omitting the bulk of the buffers. Doing > that safely in an unsafe we're-crashing environment might not be simple, > though, especially if you're not even sure whether you got far into > startup before crashing. I agree that having the crash dump code know anything specific about the contents of shared memory is a nonstarter --- far too fragile. But perhaps we could have some simple rule based only on what the kernel knows about the shmem block, like "dump shmem if it's no more than 1GB". regards, tom lane
On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I agree that having the crash dump code know anything specific about the > contents of shared memory is a nonstarter --- far too fragile. But > perhaps we could have some simple rule based only on what the kernel > knows about the shmem block, like "dump shmem if it's no more than 1GB". Not sure what knobs we have available, but would there be any value in trying to dump the FIRST 1GB? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Magnus Hagander
Дата:
On Sun, Dec 19, 2010 at 17:28, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I agree that having the crash dump code know anything specific about the >> contents of shared memory is a nonstarter --- far too fragile. But >> perhaps we could have some simple rule based only on what the kernel >> knows about the shmem block, like "dump shmem if it's no more than 1GB". > > Not sure what knobs we have available, but would there be any value in > trying to dump the FIRST 1GB? So such knob that I can find. Basically, we pick a combination of the flags at http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx We could send it as a "user stream", i guess, but it won't be available in the debugger at the same address space then - just as a blob of data to process manually. I doubt it's worth it in that case... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I agree that having the crash dump code know anything specific about the >> contents of shared memory is a nonstarter --- far too fragile. �But >> perhaps we could have some simple rule based only on what the kernel >> knows about the shmem block, like "dump shmem if it's no more than 1GB". > Not sure what knobs we have available, but would there be any value in > trying to dump the FIRST 1GB? Probably not. regards, tom lane
Magnus Hagander <magnus@hagander.net> writes: > On Sun, Dec 19, 2010 at 17:28, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> perhaps we could have some simple rule based only on what the kernel >>> knows about the shmem block, like "dump shmem if it's no more than 1GB". >> Not sure what knobs we have available, but would there be any value in >> trying to dump the FIRST 1GB? > So such knob that I can find. Basically, we pick a combination of the flags at > http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx Yeah, I thought I was probably asking for something that couldn't practically be supported. Let's try it as-is for a few releases and see how it goes. regards, tom lane
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Peter Eisentraut
Дата:
On tor, 2010-12-16 at 15:16 +0100, Magnus Hagander wrote: > Found another problem in it: when running with an older version of > dbghelp.dll (which I was), it simply didn't work. We need to grab the > version of dbghelp.dll at runtime and pick which things we're going to > dump based on that. > > The attached version of the patch does that. Can you please test that > it still generates the full dump on your system, which supposedly has > a much more modern version of dbghelp.dll than mine? I was going through the GetLastError() calls to unify the printf formats, as discussed, and I stumbled across this: + write_stderr("could not write crash dump to %s: error code %08x\n", + dumpPath, GetLastError()); Is there a reason why this uses that particular format, unlike all other call sites?
Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
От
Craig Ringer
Дата:
On 11/07/2011 4:23 AM, Peter Eisentraut wrote: > I was going through the GetLastError() calls to unify the printf > formats, as discussed, and I stumbled across this: > > + write_stderr("could not write crash dump to %s: error code %08x\n", > + dumpPath, GetLastError()); > > Is there a reason why this uses that particular format, unlike all other > call sites? > Other call site usages include: errmsg_internal("could not create signal event: %d", (int) GetLastError()) write_stderr("could not create signal listenerpipe: error code %d; retrying\n", (int) GetLastError()) etc, so I presume you're referring to the use of %08x as a format specifier rather than %d ? The only reason is that ntstatus errors are typically reported this way - but, really, not all possible errors arising there would be ntstatus errors. As it's not consistent with the rest of Pg I don't see any reason not to change it to %d. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/