Обсуждение: Proposed Windows-specific change: Enable crash dumps (like core files)

Поиск

Список

Период

Сортировка

Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

04 октября 2010 г., 08:50:42

Hi all

After this recent "fun" trying to get a usable crash dump or stack trace
from a crashing autovacuum worker on Windows, I'd like to make a quick
proposal - one that'll be followed by a patch if there aren't any loud
NACKs.

Windows has a couple of built-in facilities to automatically generate
crash dumps, much like UNIX core files. None of them (JIT debugger
included) work well with PostgreSQL's process-based archiectecture and
use of a private service account. The autovacuum daemon's
launcher/worker split is especially tricky, because a problem worker
process may not live long before crashing, as in the case I'm currently
bashing my head against.

Some other mechanism is needed. Dbghelp.dll, a redistributible DLL from
the platform SDK, provides that mechanism. You can set a handler that's
invoked if the process encounters an unhandled exception (which in
Windows-land includes anything that'd be a fatal signal in UNIX, as well
as "real" C++ exceptions). This handler uses LoadLibrary to load
dbghelp.dll and generate a crash dump (core file) that can be debugged
with Visual Studio, windbg, etc on the generating machine or any other
with the right PostgreSQL binaries and debug symbols.

It's kind of like loading gdb in-process and writing out a core, with
the same obvious problem that it won't help you with problems caused by
running totally out of memory or certain types of memory corruption.
It's protected against recursive calls by the OS, though, so if your
handler fails too there's no real harm in it.

The big advantage is that, like when dealing with a core file on *nix,
people can email/ftp/whatever crash dump files for analsys anywhere else
that someone has the same binaries and debug symbols. That'd make it it
awfully useful even in cases where it *is* easy to predict the crash and
attach a debugger.

Because of the potential disk space use of crash dumps I think it'd be
undesirable to have it always enabled, so here's what I propose:

- Always compile the crash dump feature in builds, so every build that goes to a user is capable of crash dumps.
Becausea suitable version of dbghelp.dll has been included in Windows XP and above, it shouldn't be necessary to ship
theDLL with Pg, though the license permits that.

- Use a GUC option to control whether dumps are generated on a crash. It'd be a string value, with permitted values
"off","normal" or "withdata". (These match the "MiniDumpNormal" and "MiniDumpWithDataSegs" calls in dbghelp.dll). The
defaultis "off".

If "off" is set then no unhandled exception handler will be registered and no crash dump will be generated. The
crashdump code has no effect beyond checking the GUC and seeing that it's set to "off". With either of the other
valuesa handler is registered; which is chosen controls whether a minimal or full crash dump will be generated.

It may even turn out to be easy to make this togglable at runtime (superuser only) at least with a conf edit and
reload,or possibly even per-session. I'm unsure of that as yet, though.

- Hard-code the dump file output location, so there's no need to try to read a GUC from within the crash handler. If
people want to change the dump file location, they can symlink/reparse/ whatever $DATADIR\crashdumps to their
preferredlocation.

- If the crash dump handler is enabled by setting the GUC, all backends register the handler during startup or (if it
provespractical) when the GUC is changed.

- When the handler is triggered by the OS trapping an unhandled exception, it loads dbghelp.dll, writes the
appropriatedump format to the hardcoded path, and terminates the process.

Comments? Thoughts?

Would a patch along these lines have a chance of acceptance?

(BCC'd Dave Page because of his involvement in Windows & Windows Pg
support).

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Andrew Dunstan

Дата:

04 октября 2010 г., 09:06:34


On 10/04/2010 07:50 AM, Craig Ringer wrote:
>
> - If the crash dump handler is enabled by setting the GUC,
>   all backends register the handler during startup or (if it
>   proves practical) when the GUC is changed.
>
> - When the handler is triggered by the OS trapping an unhandled
>   exception, it loads dbghelp.dll, writes the appropriate dump
>   format to the hardcoded path, and terminates the process.
>
>

What is the performance impact of doing that? Specifically, how does it 
affect backend startup time?

cheers

andrew

Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Dave Page

Дата:

04 октября 2010 г., 09:09:28

On Mon, Oct 4, 2010 at 12:50 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:

> - Always compile the crash dump feature in builds, so every build that
>  goes to a user is capable of crash dumps. Because a suitable version
>  of dbghelp.dll has been included in Windows XP and above, it shouldn't
>  be necessary to ship the DLL with Pg, though the license permits that.

We probably would want to - the version that comes with Windows is
somewhat cut down, and MSDN recommends using the more recent versions
from the debugging tools.

Any idea of the performance overhead?

> (BCC'd Dave Page because of his involvement in Windows & Windows Pg
> support).

That explains why it bypassed my filters :-)

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

04 октября 2010 г., 09:10:07

On 4/10/2010 8:06 PM, Andrew Dunstan wrote:
>
>
> On 10/04/2010 07:50 AM, Craig Ringer wrote:
>>
>> - If the crash dump handler is enabled by setting the GUC,
>> all backends register the handler during startup or (if it
>> proves practical) when the GUC is changed.
>>
>> - When the handler is triggered by the OS trapping an unhandled
>> exception, it loads dbghelp.dll, writes the appropriate dump
>> format to the hardcoded path, and terminates the process.
>>
>>
>
> What is the performance impact of doing that? Specifically, how does it
> affect backend startup time?

Without testing I can't say for sure.

My expection based on how the handler works would be: near-zero, about 
as expensive as registering a signal handler, plus the cost of reading 
the GUC and doing one string compare to test the value. When disabled, 
it's just the GUC test.

Is there a better mechanism to use for features that're going to be 
unused the great majority of the time? Perhaps something that does 
require a server restart, but doesn't have any cost at all when disabled?

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

04 октября 2010 г., 09:21:33

On 4/10/2010 8:09 PM, Dave Page wrote:

> Any idea of the performance overhead?

A GUC read and string compare when off, and (untested as yet) I expect 
little more when on.

Thinking about it, a possibly better alternative is to ship it as a 
library as is done with the pl/pgsql debugger, and use (I think) 
shared_preload_libraries to load it when desired. That way there's zero 
cost if disabled, though a somewhat higher cost if enabled.

Hmm. That'll let me put a test version together that'll work with 9.0 as 
well, so that seems like a no-brainer really, it's just a better way to 
do it. Time for a Pg-on-windows build, yay.

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Heikki Linnakangas

Дата:

04 октября 2010 г., 09:27:50

On 04.10.2010 15:21, Craig Ringer wrote:
> Thinking about it, a possibly better alternative is to ship it as a
> library as is done with the pl/pgsql debugger, and use (I think)
> shared_preload_libraries to load it when desired. That way there's zero
> cost if disabled, though a somewhat higher cost if enabled.
>
> Hmm. That'll let me put a test version together that'll work with 9.0 as
> well, so that seems like a no-brainer really, it's just a better way to
> do it. Time for a Pg-on-windows build, yay.

If it's not a lot of code, it's better to have it built-in always. 
Loading extra libraries in debug-mode could lead to heisenbugs.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

04 октября 2010 г., 09:32:35

On 4/10/2010 8:27 PM, Heikki Linnakangas wrote:
> On 04.10.2010 15:21, Craig Ringer wrote:
>> Thinking about it, a possibly better alternative is to ship it as a
>> library as is done with the pl/pgsql debugger, and use (I think)
>> shared_preload_libraries to load it when desired. That way there's zero
>> cost if disabled, though a somewhat higher cost if enabled.
>>
>> Hmm. That'll let me put a test version together that'll work with 9.0 as
>> well, so that seems like a no-brainer really, it's just a better way to
>> do it. Time for a Pg-on-windows build, yay.
>
> If it's not a lot of code, it's better to have it built-in always.
> Loading extra libraries in debug-mode could lead to heisenbugs.

Good point. The built-in approach would also permit it to be turned on 
in an already-running server, which would be nice as for those fun 
only-shows-up-in-production bugs where you may not want to restart Pg.

I'll chuck together a library version first, though, so it can be tested 
easily on existing installs of Pg 9.0. Compiling Pg on Windows isn't as 
quick and easy as on *nix so that should help for testing/consideration. 
If people are happy with the results I'll put together a patch to 
integrate it directly instead of using a library.

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

05 октября 2010 г., 10:40:02

On 4/10/2010 8:27 PM, Heikki Linnakangas wrote:
> On 04.10.2010 15:21, Craig Ringer wrote:
>> Thinking about it, a possibly better alternative is to ship it as a
>> library as is done with the pl/pgsql debugger, and use (I think)
>> shared_preload_libraries to load it when desired. That way there's zero
>> cost if disabled, though a somewhat higher cost if enabled.
>>
>> Hmm. That'll let me put a test version together that'll work with 9.0 as
>> well, so that seems like a no-brainer really, it's just a better way to
>> do it. Time for a Pg-on-windows build, yay.
>
> If it's not a lot of code, it's better to have it built-in always.
> Loading extra libraries in debug-mode could lead to heisenbugs.

It turns out that the basic minidumps are small (21kb here) and very
fast to generate. It may well be worth leaving it enabled all the time
after all. I just need to time how long the handler registration takes -
though as LOAD of the current handler implemented as a contrib module
takes 2.1ms, and LOAD of an module with an empty _PG_init() also takes
2.1ms, I'm guessing "not long".

I've attached an early version for people to play with if anyone's
interested. It currently contains a bunch of elog() messages to report
on its progress - though I'm not at all sure elog() should be used in
the final version given that Pg is crashing at the time the handler is
called and there's no guarantee elog() is safe to call. It also doesn't
offer any way to set the dump type yet, it's always the minimal dump
with exception info, registers and stack only. Finally, a "crashme"
function is included to trigger a reliable unhandled exception on
demand, for initial testing.

Usage with Pg built from source on Windows:

- Drop crashdump.c and the Makefile in contrib/crashdump
- Run build.bat and install.bat
- Create a "crashdumps" directory inside your datadir, and make
   sure the server has read/write/create permission on it.
- add 'crashdump' to shared_preload_libraries in postgresql.conf
- Start the server as normal
- Start psql and issue:
-- CREATE OR REPLACE FUNCTION crashdump_crashme() RETURNS void
    AS '$libdir/crashdump','crashdump_crashme' LANGUAGE C;
-- SELECT crashdump_crashme();

Your backend should crash. In the error log (and, depending on how the
buffering works out, maybe in psql) you should see:

WARNING:  loading dll
WARNING:  loading dll try 1, 00000000
WARNING:  loading dll try 2, 73880000
WARNING:  pdump: 738C70D8
WARNING:  Generating dumppath
WARNING:  Generated dumppath
WARNING:  dumping to path: crashdumps\backend.dmp
WARNING:  outfile: 000000B0
WARNING:  about to dump
NOTICE:  crashdump: wrote crash dump to crashdumps\postgres-4341.mdmp

Once you have the dump file, you can fire up windbg from the Debugging
Tools for Windows and use File->Open Crash Dump to open it. The .excr
command will report what the exception that caused the crash was,
producing output like this:

> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=02a1e290 edx=73831014 esi=02a1e188 edi=00c3f798
> eip=738313b2 esp=00c3f724 ebp=02a1e280 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
> crashdump!crashdump_crashme+0x2:
> 738313b2 c70001000000    mov     dword ptr [eax],1    ds:0023:00000000=????????

and if possible opening the postgresql source file the crash happened in
to the appropriate line:

Datum
crashdump_crashme(PG_FUNCTION_ARGS)
{
    int * ptr = NULL;
    *ptr = 1;                    <--- highlighted
    return NULL;
}

If you're using Visual Studio Professional (not the free Express
edition, unfortunately) you should also be able to debug the crash dump
that way. I don't have it to test with.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Вложения

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

05 октября 2010 г., 12:22:21

OK, it's pretty much ready for proper testing now. If a few people are
happy with the results and think it's a good idea I'll chuck it in the
commitfest app.

As built, the crash dump handler works with a stock PostgreSQL 9.0
(release build) as shipped in EDB's installer. Just drop crashdump.dll
in lib\, optionally pop the dbghelp.dll redist in bin\, add 'crashdump'
to shared_preload_libraries, and crash some backends however you feel
like doing so.

The current build of crashdump.dll for release versions of PostgreSQL
9.0 on 32-bit Windows is here:

   http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll

If folks are happy with how this works, all it needs is:

- Consideration of whether elog should be used or not. I'm inclined to
   suggest using elog to tell the user about the dump, but only after
   the crash dump has been written successfully.

- Comments on whether this should be left as a loadable module, or
   integerated into core so it loads early in backend startup. The latter
   will permit crash dumping of early backend startup problems, and will
   have tiny overhead because there's no DLL to find and load. OTOH, it's
   harder to pull out if somehow something breaks.

   I'd want to start with loadable module in shared_preload_libraries
   and if that proves useful, only then consider integrating in core.
   I'm way too bad a programmer to want my code anywhere near Pg's core
   without plenty of real world testing.

- (Maybe) a method to configure the crash dump type. I'm not too sure
   it's necessary given the set of dump flags I've landed up with,
   so I'd leave this be unless it proves to be necessary in real-world
   testing.


Remember that with these crash dumps the user only has to email the
(~4MB in my tests) crash dump. They don't need debugging tools or the
ability to use them, only the recipient does.

I'm using a tweaked set of minidump flags now, to generate considerably
bigger dumps (around 4MB for the configuration I'm testing) that contain
pretty much everything except shared memory contents, the contents of
memory mapped files, and the loaded read-only code segments of the
executables and DLLs. You can see the results of using it to debug that
autovac crash at the end of this mail.

When testing the script provided by Andrea Peri, when the autovacuum
worker crashes, the message:

> 2010-10-05 22:23:44 WST 2040 WARNING:  crashdump: wrote crash dump to crashdumps\postgres-2040.mdmp

is emitted before the process resumes crashing out as it would've normally.

Opening and displaying that dump in windbg shows a useful stack trace
from the crashing process, and it's possible to examine the state of
local/global variables etc.


> 0:000> .ecxr
> eax=90fffffe ebx=040af140 ecx=68f08610 edx=040af248 esi=011f64d4 edi=040b001c
> eip=015d5267 esp=0055f1cc ebp=011f5bc8 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
> postgres!pfree+0x7:
> 015d5267 8b5004          mov     edx,dword ptr [eax+4] ds:0023:91000002=????????
> 0:000> ~*k
>
> .  0  Id: 7f8.7b8 Suspend: 0 Teb: 7ffdf000 Unfrozen
> ChildEBP RetAddr
> 0055e210 75b4c1ee ntdll!KiFastSystemCallRet
> 0055e250 76e7100e KERNELBASE!FindClose+0x93
> 0055e514 679160c3 kernel32!_SEH_epilog4_GS+0xa
> 0055e6cc 779734e0 dbghelp!Win32LiveSystemProvider::OpenMapping+0x2c3
> 0055e7a8 7797353a ntdll!RtlpAllocateHeap+0xab2
> 0055e828 77965edc ntdll!RtlAllocateHeap+0x23a
> 0055e840 76e7123a ntdll!ZwWriteFile+0xc
> 0055e874 67916861 kernel32!WriteFileImplementation+0x76
> 0055e8ac 67916910 dbghelp!Win32LiveSystemProvider::ReadVirtual+0x71
> 0055e8d8 67908f09 dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x30
> 0055e910 679094b4 dbghelp!WriteMemoryFromProcess+0xa9
> 0055e9a8 6790d522 dbghelp!WriteThreadList+0x184
> 0055e9c0 6790e65a dbghelp!WriteDumpData+0x102
> 0055eb58 6790e9cb dbghelp!MiniDumpProvideDump+0x3fa
> 0055ebd0 73231353 dbghelp!MiniDumpWriteDump+0x1cb
> 0055ed14 76e82c4a crashdump!crashDumpHandler+0x183
[c:\users\craig\developer\postgresql-9.0.0\contrib\crashdump\crashdump.c@ 159] 
> 0055ed9c 77995af4 kernel32!UnhandledExceptionFilter+0x127
> 0055eda4 7793d964 ntdll!__RtlUserThreadStart+0x62
> 0055edb8 7793d7fc ntdll!_EH4_CallFilterFunc+0x12
> 0055ede0 779665f9 ntdll!_except_handler4+0x8e
> 0055ee04 779665cb ntdll!ExecuteHandler2+0x26
> 0055eeb4 77966457 ntdll!ExecuteHandler+0x24
> 0055eeb4 015d5267 ntdll!KiUserExceptionDispatcher+0xf
> 0055f1c8 0139eee7 postgres!pfree+0x7 [c:\pginstaller-repo\postgres.windows\src\backend\utils\mmgr\mcxt.c @ 591]
> 0055f1e0 0139f14a postgres!examine_attribute+0x207
[c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c@ 877] 
> 0055f284 0139f94c postgres!do_analyze_rel+0x24a [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c
@357] 
> 0055f2ac 013eb74a postgres!analyze_rel+0x1bc [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @
232]
> 0055f330 014b30c6 postgres!vacuum+0x1ea [c:\pginstaller-repo\postgres.windows\src\backend\commands\vacuum.c @ 248]
> 0055f368 014b3991 postgres!autovacuum_do_vac_analyze+0x86
[c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c@ 2651] 
> 0055f4f4 014b41c5 postgres!do_autovacuum+0x811
[c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c@ 2231] 
> 0055f588 014bed02 postgres!AutoVacWorkerMain+0x265
[c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c@ 1611] 
> 0055f7d8 01423ce8 postgres!SubPostmasterMain+0x442
[c:\pginstaller-repo\postgres.windows\src\backend\postmaster\postmaster.c@ 4093] 
> 0055f7f0 015ee1ad postgres!main+0x1f8 [c:\pginstaller-repo\postgres.windows\src\backend\main\main.c @ 165]
> 0055f834 76e71194 postgres!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 0055f840 7797b495 kernel32!BaseThreadInitThunk+0xe
> 0055f880 7797b468 ntdll!__RtlUserThreadStart+0x70
> 0055f898 00000000 ntdll!_RtlUserThreadStart+0x1b

Вложения

Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Bruce Momjian

Дата:

20 октября 2010 г., 19:15:21

Craig Ringer wrote:
> On 4/10/2010 8:06 PM, Andrew Dunstan wrote:
> >
> >
> > On 10/04/2010 07:50 AM, Craig Ringer wrote:
> >>
> >> - If the crash dump handler is enabled by setting the GUC,
> >> all backends register the handler during startup or (if it
> >> proves practical) when the GUC is changed.
> >>
> >> - When the handler is triggered by the OS trapping an unhandled
> >> exception, it loads dbghelp.dll, writes the appropriate dump
> >> format to the hardcoded path, and terminates the process.
> >>
> >>
> >
> > What is the performance impact of doing that? Specifically, how does it
> > affect backend startup time?
> 
> Without testing I can't say for sure.
> 
> My expection based on how the handler works would be: near-zero, about 
> as expensive as registering a signal handler, plus the cost of reading 
> the GUC and doing one string compare to test the value. When disabled, 
> it's just the GUC test.
> 
> Is there a better mechanism to use for features that're going to be 
> unused the great majority of the time? Perhaps something that does 
> require a server restart, but doesn't have any cost at all when disabled?

We definately had trouble producing crash dumps caused by the 128 return
code problem, so I can see great value in this, if it can be done
simply.  I wonder if the 128-exit would have produced a crash file.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 07:37:53

On Tue, Oct 5, 2010 at 17:21, Craig Ringer <craig@postnewspapers.com.au> wrote:
> OK, it's pretty much ready for proper testing now. If a few people are happy
> with the results and think it's a good idea I'll chuck it in the commitfest
> app.
>
> As built, the crash dump handler works with a stock PostgreSQL 9.0 (release
> build) as shipped in EDB's installer. Just drop crashdump.dll in lib\,
> optionally pop the dbghelp.dll redist in bin\, add 'crashdump' to
> shared_preload_libraries, and crash some backends however you feel like
> doing so.
>
> The current build of crashdump.dll for release versions of PostgreSQL 9.0 on
> 32-bit Windows is here:
>
>  http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll
>
> If folks are happy with how this works, all it needs is:
>
> - Consideration of whether elog should be used or not. I'm inclined to
>  suggest using elog to tell the user about the dump, but only after
>  the crash dump has been written successfully.
>
> - Comments on whether this should be left as a loadable module, or
>  integerated into core so it loads early in backend startup. The latter
>  will permit crash dumping of early backend startup problems, and will
>  have tiny overhead because there's no DLL to find and load. OTOH, it's
>  harder to pull out if somehow something breaks.
>
>  I'd want to start with loadable module in shared_preload_libraries
>  and if that proves useful, only then consider integrating in core.
>  I'm way too bad a programmer to want my code anywhere near Pg's core
>  without plenty of real world testing.
>
> - (Maybe) a method to configure the crash dump type. I'm not too sure
>  it's necessary given the set of dump flags I've landed up with,
>  so I'd leave this be unless it proves to be necessary in real-world
>  testing.
>
Finally getting to looking at this one - sorry about the very long delay.

I agree with Heikki's earlier comment that it's better to have this
included in the backend - but that's obviously not going to happen for
already-released versions. I'd therefor advocate a contrib module for
existing versions, and then in core for 9.1+.

We should then have an option to turn it off (on by default). But we
don't need to pay the overhead on every backend startup - we could
just map the value into the parameter shared memory block, and require
a full postmaster restart in order to change it.

Do we want to backpatch it into contrib/? Adding a new module there
seems kind of wrong - probably better to keep the source separate and
just publish the DLL files for people who do debugging?

Looking at the code:
* Why do we need to look at differnt versions of dbghelp.dll? Can't we
always use the one with Windows? And in that case, can't we just use
compile-time linking, do we need to bother with DLL loading at all?

* Per your comment about using elog() - a good idea in that case is to
use write_stderr(). That will send the output to stderr if there is
one, and otherwise the eventlog (when running as service).

* And yes, in the crash handler, we should *definitely* not use elog().

* On Unix, the core file is dropped in the database directory, we
don't have a separate directory for crashdumps. If we want to be
consistent, we should do that here too. I do think that storing them
in a directory like "crashdumps" is better, but I just wanted to raise
the comment.

* However, when storing it in crashdumps, I think the code would need
to create that directory if it does not exist, doesn't it?

* Right now, we overwrite old crashdumps. It is well known that PIDs
are recycled pretty quickly on Windows - should we perhaps dump as
postgres-<pid>-<sequence>.mdmp when there is a filename conflict?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

22 ноября 2010 г., 10:15:25

On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Do we want to backpatch it into contrib/? Adding a new module there
> seems kind of wrong - probably better to keep the source separate and
> just publish the DLL files for people who do debugging?

If this works without changes to core, I see little reason not to
back-patch it into contrib.  Our primary concern with back-patching is
to avoid doing things that might break existing installations, but if
there aren't any core changes, I don't really see how that can be an
issue here.  It seems to me that it's probably simpler for us and our
users to keep the debugging tools together with our main tree.

However, I am not clear what benefit we get from moving this into core
in 9.1.  If it's still going to require a full postmaster restart, the
GUC you have to change may as well be shared_preload_libraries as a
new one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 10:51:00

Robert Haas <robertmhaas@gmail.com> writes:
> However, I am not clear what benefit we get from moving this into core
> in 9.1.  If it's still going to require a full postmaster restart, the
> GUC you have to change may as well be shared_preload_libraries as a
> new one.

+1.  I am not in favor of randomly repackaging functionality, unless
there's some clear benefit gained by doing so.  In this case it seems
like something that could and should remain at arm's length forever,
so a contrib module is the ideal packaging.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

22 ноября 2010 г., 10:53:47

On Mon, Nov 22, 2010 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> However, I am not clear what benefit we get from moving this into core
>> in 9.1.  If it's still going to require a full postmaster restart, the
>> GUC you have to change may as well be shared_preload_libraries as a
>> new one.
>
> +1.  I am not in favor of randomly repackaging functionality, unless
> there's some clear benefit gained by doing so.  In this case it seems
> like something that could and should remain at arm's length forever,
> so a contrib module is the ideal packaging.

Now, if we could make this so low-overhead that it doesn't need a
switch, that would be a good argument for moving it into core, at
least to my way of thinking.  But if it's something that needs to be
enabled with a postmaster restart anyway, meh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Peter Eisentraut

Дата:

22 ноября 2010 г., 11:17:27

On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
> Do we want to backpatch it into contrib/?

It's not a bug fix or an upgrading aid, so no.

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 11:34:30

Peter Eisentraut <peter_e@gmx.net> writes:
> On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
>> Do we want to backpatch it into contrib/?

> It's not a bug fix or an upgrading aid, so no.

I'm less than thrilled about back-patching this, too.  It seems to fly
in the face of all our historical practice.

If you drop the bit about back-patching, then I don't particularly care
whether it goes into core or contrib for 9.1 --- whichever packaging
makes the most sense is fine.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 11:44:02

On Mon, Nov 22, 2010 at 16:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
>>> Do we want to backpatch it into contrib/?
>
>> It's not a bug fix or an upgrading aid, so no.
>
> I'm less than thrilled about back-patching this, too.  It seems to fly
> in the face of all our historical practice.
>
> If you drop the bit about back-patching, then I don't particularly care
> whether it goes into core or contrib for 9.1 --- whichever packaging
> makes the most sense is fine.

My suggestion in the first place was not to backpatch it - I just
wanted to get peoples opinions. I'm perfectly happy if we keep it
somewhere else for the time being - as long as we make the binaries
easily available. But that can go on the wiki for example.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 11:48:07

On Mon, Nov 22, 2010 at 15:15, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> Do we want to backpatch it into contrib/? Adding a new module there
>> seems kind of wrong - probably better to keep the source separate and
>> just publish the DLL files for people who do debugging?
>
> If this works without changes to core, I see little reason not to
> back-patch it into contrib.  Our primary concern with back-patching is
> to avoid doing things that might break existing installations, but if
> there aren't any core changes, I don't really see how that can be an
> issue here.  It seems to me that it's probably simpler for us and our
> users to keep the debugging tools together with our main tree.
>
> However, I am not clear what benefit we get from moving this into core
> in 9.1.  If it's still going to require a full postmaster restart, the
> GUC you have to change may as well be shared_preload_libraries as a
> new one.

on-by-default is what we gain. I think that's fairly big...


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 12:39:35

Magnus Hagander <magnus@hagander.net> writes:
> on-by-default is what we gain. I think that's fairly big...

Only if that's actually what we want, which is far from clear in this
corner.  There are good reasons why most Linux distros configure daemons
not to dump core by default.  It's annoying when we are trying to debug
a Postgres crash, but that doesn't mean the reasons aren't good.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

22 ноября 2010 г., 13:10:23

On Mon, Nov 22, 2010 at 10:17 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
>> Do we want to backpatch it into contrib/?
>
> It's not a bug fix or an upgrading aid, so no.

I don't see why an upgrading aid would be worthy of back-patching, but
not a debugging aid.  I'd certainly prioritize those in the other
order.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 13:32:14

Robert Haas <robertmhaas@gmail.com> writes:
> I don't see why an upgrading aid would be worthy of back-patching, but
> not a debugging aid.  I'd certainly prioritize those in the other
> order.

I think the sort of upgrading aid Peter has in mind is the kind where
it's entirely useless if it's not back-patched, because it has to run in
the pre-upgraded server.  We've discussed such things before in the
context of in-place upgrade, though I believe there have been no actual
instances as yet.

I'm not really sure why we're even considering the notion of
back-patching this item.  Doing so would not fit with any past practice
or agreed-on project management practices, not even under our lax
standards for contrib (and I keep hearing people claim that contrib
is or should be as trustworthy as core, anyway).  Since when do we
back-patch significant features that have not been through a beta test
cycle?
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

22 ноября 2010 г., 13:48:06

On Mon, Nov 22, 2010 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't see why an upgrading aid would be worthy of back-patching, but
>> not a debugging aid.  I'd certainly prioritize those in the other
>> order.
>
> I think the sort of upgrading aid Peter has in mind is the kind where
> it's entirely useless if it's not back-patched, because it has to run in
> the pre-upgraded server.  We've discussed such things before in the
> context of in-place upgrade, though I believe there have been no actual
> instances as yet.
>
> I'm not really sure why we're even considering the notion of
> back-patching this item.  Doing so would not fit with any past practice
> or agreed-on project management practices, not even under our lax
> standards for contrib (and I keep hearing people claim that contrib
> is or should be as trustworthy as core, anyway).  Since when do we
> back-patch significant features that have not been through a beta test
> cycle?

I am as conservative about back-patching as anybody here, but
debugging on Windows is not an easy thing to do, and I strongly
suspect we are going to point people experiencing crashes on Windows
to this code whether it's part of our official distribution or not.  I
don't see what we get out of insisting that people install it
separately.  This is a tool that is only intended to be used when
PostgreSQL is CRASHING, so arguing that we shouldn't include the code
because it might not be stable doesn't carry much water AFAICS.  As
far as I understand it, we don't back-patch new features because of
the risk of breaking things, but in this case refusing to back-patch
the code seems more likely to prevent adequate diagnosis of what is
already broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 13:48:12

Magnus Hagander <magnus@hagander.net> writes:
> * On Unix, the core file is dropped in the database directory, we
> don't have a separate directory for crashdumps. If we want to be
> consistent, we should do that here too. I do think that storing them
> in a directory like "crashdumps" is better, but I just wanted to raise
> the comment.

Just a note on that - it's by no means universal that Unix systems will
put the core files in $PGDATA.  OS X likes to put them in /cores, which
I think is a convention shared with some other BSDish systems.  On Linux
I believe it's possible to configure where the core goes via environment
settings.

> * However, when storing it in crashdumps, I think the code would need
> to create that directory if it does not exist, doesn't it?

If it didn't do so, then manual creation/removal of the directory could
be used as an on/off switch for the feature.  Which would have a number
of advantages, not least that you don't need to have the crash dumper
dependent on GUC working.  I haven't looked at the patch but this
discussion makes it sound like the dumper is dependent on an
uncomfortably large amount of backend code being functional.  You need
to minimize the number of assumptions of that sort.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 13:56:15

Robert Haas <robertmhaas@gmail.com> writes:
> I am as conservative about back-patching as anybody here, but
> debugging on Windows is not an easy thing to do, and I strongly
> suspect we are going to point people experiencing crashes on Windows
> to this code whether it's part of our official distribution or not.  I
> don't see what we get out of insisting that people install it
> separately.  This is a tool that is only intended to be used when
> PostgreSQL is CRASHING, so arguing that we shouldn't include the code
> because it might not be stable doesn't carry much water AFAICS.  As
> far as I understand it, we don't back-patch new features because of
> the risk of breaking things, but in this case refusing to back-patch
> the code seems more likely to prevent adequate diagnosis of what is
> already broken.

Well, if we're going to hand out prebuilt DLLs to people, we can do that
without having back-patched the code officially.  But more to the point
is that it's not clear that we're going to end up with a contrib module
at all going forward (a core feature would clearly be a lot more
reliable), and I really do not wish to get involved with maintaining two
independent versions of this code.

This argument seems to boil down to "we have to have this yesterday",
which I don't buy for a minute.  If it's as critical as that, why did
it take this long for someone to write it?  I do NOT agree that this
feature is important enough to justify a free pass around our normal
management and quality assurance processes.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Heikki Linnakangas

Дата:

22 ноября 2010 г., 13:56:18

On 22.11.2010 19:47, Robert Haas wrote:
> I am as conservative about back-patching as anybody here, but
> debugging on Windows is not an easy thing to do, and I strongly
> suspect we are going to point people experiencing crashes on Windows
> to this code whether it's part of our official distribution or not.

This whole thing makes me wonder: is there truly no reliable, simple 
method to tell Windows to create a core dump on crash, without writing 
custom code for it. I haven't seen one, but I find it amazing if there 
isn't. We can't be alone with this.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 14:26:46

On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> * However, when storing it in crashdumps, I think the code would need
>> to create that directory if it does not exist, doesn't it?
>
> If it didn't do so, then manual creation/removal of the directory could
> be used as an on/off switch for the feature.  Which would have a number
> of advantages, not least that you don't need to have the crash dumper
> dependent on GUC working.  I haven't looked at the patch but this
> discussion makes it sound like the dumper is dependent on an
> uncomfortably large amount of backend code being functional.  You need
> to minimize the number of assumptions of that sort.

No, it's dependent on close to zero backend functionality.
Particularly if we take out the dependency on elog() (write_stderr is
much simpler). In fact, the calls to elog() are the *only* places
where it calls into the backend as it stands today.

And yes, ISTM it could work reasonably well to use the
creation/deletion of the directory as an on/off switch for it. Which
is the default is of course up to the packager then as well ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 14:28:50

On Mon, Nov 22, 2010 at 18:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I am as conservative about back-patching as anybody here, but
>> debugging on Windows is not an easy thing to do, and I strongly
>> suspect we are going to point people experiencing crashes on Windows
>> to this code whether it's part of our official distribution or not.  I
>> don't see what we get out of insisting that people install it
>> separately.  This is a tool that is only intended to be used when
>> PostgreSQL is CRASHING, so arguing that we shouldn't include the code
>> because it might not be stable doesn't carry much water AFAICS.  As
>> far as I understand it, we don't back-patch new features because of
>> the risk of breaking things, but in this case refusing to back-patch
>> the code seems more likely to prevent adequate diagnosis of what is
>> already broken.
>
> Well, if we're going to hand out prebuilt DLLs to people, we can do that
> without having back-patched the code officially.  But more to the point
> is that it's not clear that we're going to end up with a contrib module
> at all going forward (a core feature would clearly be a lot more
> reliable), and I really do not wish to get involved with maintaining two
> independent versions of this code.

I think the reasonable options are either "don't backpatch at all" or
"backpatch the same way as we put it in HEAD, which is probably
included in backend". I agree that sticking it in contrib is a
half-measure that we shouldn't do.

*IF* we go with a contrib module for 9.1 as well, we could backpatch
as contrib module, but I think that's the only case.

> This argument seems to boil down to "we have to have this yesterday",
> which I don't buy for a minute.  If it's as critical as that, why did
> it take this long for someone to write it?  I do NOT agree that this
> feature is important enough to justify a free pass around our normal
> management and quality assurance processes.

Agreed.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 14:33:43

On Mon, Nov 22, 2010 at 18:56, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 22.11.2010 19:47, Robert Haas wrote:
>>
>> I am as conservative about back-patching as anybody here, but
>> debugging on Windows is not an easy thing to do, and I strongly
>> suspect we are going to point people experiencing crashes on Windows
>> to this code whether it's part of our official distribution or not.
>
> This whole thing makes me wonder: is there truly no reliable, simple method
> to tell Windows to create a core dump on crash, without writing custom code
> for it. I haven't seen one, but I find it amazing if there isn't. We can't
> be alone with this.

You can do it without custom code but it's quite hard. And AFAIK you
need to install the debugger tools package from microsoft (separate
download, and  not something you'll normally find on a system).

There is DrWatson and the Error Reporting service you can use.
DrWatson is made for interactive programs (or was the last time I
looked at it). The error reporting service requires setting up a local
error reporting service and configure it for reports, if you want them
sent anywhere other than to Microsoft.  Neither one of them is a
particularly good choie. The minidumps Craig's patch does are a lot
more useful.

We intentionally *disable* drwatson, because it's part of what pops up
an error message you have to click Ok on before your server will
continue running...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

22 ноября 2010 г., 14:41:16

Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... I haven't looked at the patch but this
>> discussion makes it sound like the dumper is dependent on an
>> uncomfortably large amount of backend code being functional.

> No, it's dependent on close to zero backend functionality.
> Particularly if we take out the dependency on elog() (write_stderr is
> much simpler). In fact, the calls to elog() are the *only* places
> where it calls into the backend as it stands today.

Well, in the contrib-module guise, it's dependent on
shared_preload_libraries or local_preload_libraries, which at least
involves guc and dfmgr working pretty well (and not only in the
postmaster, but during backend startup).
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

22 ноября 2010 г., 14:50:11

On Mon, Nov 22, 2010 at 19:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... I haven't looked at the patch but this
>>> discussion makes it sound like the dumper is dependent on an
>>> uncomfortably large amount of backend code being functional.
>
>> No, it's dependent on close to zero backend functionality.
>> Particularly if we take out the dependency on elog() (write_stderr is
>> much simpler). In fact, the calls to elog() are the *only* places
>> where it calls into the backend as it stands today.
>
> Well, in the contrib-module guise, it's dependent on
> shared_preload_libraries or local_preload_libraries, which at least
> involves guc and dfmgr working pretty well (and not only in the
> postmaster, but during backend startup).

Yes, sorry. My mindset was in "the version that'll go into 9.1".


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Peter Eisentraut

Дата:

22 ноября 2010 г., 15:24:17

On mån, 2010-11-22 at 19:56 +0200, Heikki Linnakangas wrote:
> This whole thing makes me wonder: is there truly no reliable, simple 
> method to tell Windows to create a core dump on crash, without writing
> custom code for it. I haven't seen one, but I find it amazing if there
> isn't. We can't be alone with this.

Well, there is no reliable and simple method to rename a file on
Windows, so what can you expect ... ;-)

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

22 ноября 2010 г., 15:27:51

On Mon, Nov 22, 2010 at 1:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I think the reasonable options are either "don't backpatch at all" or
> "backpatch the same way as we put it in HEAD, which is probably
> included in backend". I agree that sticking it in contrib is a
> half-measure that we shouldn't do.
>
> *IF* we go with a contrib module for 9.1 as well, we could backpatch
> as contrib module, but I think that's the only case.

I agree with this, certainly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

23 ноября 2010 г., 10:02:23

>> However, I am not clear what benefit we get from moving this into core
>> in 9.1.  If it's still going to require a full postmaster restart, the
>> GUC you have to change may as well be shared_preload_libraries as a
>> new one.

There's no reason it should require a postmaster restart. New backends 
spawned after the handler is turned on would enable it, and existing 
backends could be signalled to enable it as well.

> on-by-default is what we gain. I think that's fairly big...

More than that. If a crash handler is in core, then:

- It can be inited much earlier, catching crashes that happen during 
loading of libraries and during later backend init; and

- It's basically free when the cost of shared library loading is 
removed, so it can be left on in production or even be on by default. I 
need to do some testing on this, but given the apparently near-zero cost 
of initing the handler I won't be surprised if testing a GUC to see if 
the handler should be on or not costs more than loading it does.

I still wouldn't support on-by-default because you'd need an automatic 
process to weed out old crash dumps or limit the number stored. That's a 
bigger job. So I think the admin should have to turn it on, but it'd be 
good to make it easy to turn on in production without a restart; I don't 
see why that'd be hard.

I'll try to put together a patch that does just that, time permitting. 
Things are kind of hectic ATM.

--
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

23 ноября 2010 г., 10:05:19

On 11/23/2010 01:30 AM, Tom Lane wrote:

> I'm not really sure why we're even considering the notion of
> back-patching this item.  Doing so would not fit with any past practice
> or agreed-on project management practices, not even under our lax
> standards for contrib (and I keep hearing people claim that contrib
> is or should be as trustworthy as core, anyway).  Since when do we
> back-patch significant features that have not been through a beta test
> cycle?

I see no advantage to back-patching. It's easy to provide a drop-in 
binary DLL for older versions of Pg on Windows, who're the only people 
this will work for.

If the EDB folks saw value in it, they could bundle the DLL with updated 
point releases of the installer for Windows. No back-patching is necessary.

--
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

23 ноября 2010 г., 10:09:54

On 11/23/2010 01:46 AM, Tom Lane wrote:

>> * However, when storing it in crashdumps, I think the code would need
>> to create that directory if it does not exist, doesn't it?
>
> If it didn't do so, then manual creation/removal of the directory could
> be used as an on/off switch for the feature.

Yep. That's how I'd want to do it in 9.1 - test for the directory and 
use that to decide whether to set the handler during early backend 
startup. That way you don't need a GUC, and should be able to load it 
*very* early in backend startup.

> I haven't looked at the patch but this
> discussion makes it sound like the dumper is dependent on an
> uncomfortably large amount of backend code being functional.  You need
> to minimize the number of assumptions of that sort.

It doesn't need to have any backend code working, really; all it needs 
is a usable stack and the ability to allocate off the heap. It won't 
save you in total OOM situations, stack exhaustion, or severe stack 
corruption, but should work pretty much any other time.

The crash dump facility in dbghelp.dll offers *optional* features where 
apps can stream in additional app-specific data like recent log 
excerpts, etc. I didn't try to implement anything like that in the 
initial module partly because I want to minimize the amount of the 
backend that must be working for a crash dump to succeed.

--
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

23 ноября 2010 г., 10:12:23

On 11/23/2010 01:56 AM, Heikki Linnakangas wrote:
> On 22.11.2010 19:47, Robert Haas wrote:
>> I am as conservative about back-patching as anybody here, but
>> debugging on Windows is not an easy thing to do, and I strongly
>> suspect we are going to point people experiencing crashes on Windows
>> to this code whether it's part of our official distribution or not.
>
> This whole thing makes me wonder: is there truly no reliable, simple
> method to tell Windows to create a core dump on crash, without writing
> custom code for it. I haven't seen one, but I find it amazing if there
> isn't. We can't be alone with this.

Search for 'dbghelp.dll' on your average Windows system and you'll be 
surprised how many apps use it. Steam (the software distribution system) 
does, as does the Adobe Creative Suite on my machine.

If you're running in interactive mode with access to the user's display 
you can use Windows error reporting. Services running in isolated user 
accounts don't seem to be able to use that. In any case, windows error 
reporting only collects *tiny* dumps with barely anything beyond the 
stack contents; they're a nightmare to use, and really require psychic 
powers and deep knowledge of scary Windows APIs for effective debugging.

--
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

23 ноября 2010 г., 17:17:58

On Tue, Nov 23, 2010 at 15:02, Craig Ringer <craig@postnewspapers.com.au> wrote:
>>> However, I am not clear what benefit we get from moving this into core
>>> in 9.1.  If it's still going to require a full postmaster restart, the
>>> GUC you have to change may as well be shared_preload_libraries as a
>>> new one.
>
> There's no reason it should require a postmaster restart. New backends
> spawned after the handler is turned on would enable it, and existing
> backends could be signalled to enable it as well.

I think that came off my comment that we could store the on/off in the
startup shared memory block. It'd then be the only way to get it into
any existing backends.


>> on-by-default is what we gain. I think that's fairly big...
>
> More than that. If a crash handler is in core, then:
>
> - It can be inited much earlier, catching crashes that happen during loading
> of libraries and during later backend init; and

Yeah.


> - It's basically free when the cost of shared library loading is removed, so
> it can be left on in production or even be on by default. I need to do some
> testing on this, but given the apparently near-zero cost of initing the
> handler I won't be surprised if testing a GUC to see if the handler should
> be on or not costs more than loading it does.

I'm fairly certain it does. The GUC would be there to be able to turn
the whole thing off because you don't want the dumps, not for
performance reasons.

> I still wouldn't support on-by-default because you'd need an automatic
> process to weed out old crash dumps or limit the number stored. That's a
> bigger job. So I think the admin should have to turn it on, but it'd be good
> to make it easy to turn on in production without a restart; I don't see why
> that'd be hard.

I think the admin should deal with that - just like the admin has to
clear out the old logs.


> I'll try to put together a patch that does just that, time permitting.
> Things are kind of hectic ATM.

Let me know if you want me to look at adapting the patch for that - i
can do that if you prefer.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

23 ноября 2010 г., 17:18:33

On Tue, Nov 23, 2010 at 15:09, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 11/23/2010 01:46 AM, Tom Lane wrote:
>
>>> * However, when storing it in crashdumps, I think the code would need
>>> to create that directory if it does not exist, doesn't it?
>>
>> If it didn't do so, then manual creation/removal of the directory could
>> be used as an on/off switch for the feature.
>
> Yep. That's how I'd want to do it in 9.1 - test for the directory and use
> that to decide whether to set the handler during early backend startup. That
> way you don't need a GUC, and should be able to load it *very* early in
> backend startup.

Or you set the handler always, and have the handler only actually
create the dump if the directory exists. That way you can add the
directory and still get  a dump from both existing backends and the
postmaster itself without a restart.


> The crash dump facility in dbghelp.dll offers *optional* features where apps
> can stream in additional app-specific data like recent log excerpts, etc. I
> didn't try to implement anything like that in the initial module partly
> because I want to minimize the amount of the backend that must be working
> for a crash dump to succeed.

Yeah, we already have the logs in the logfile etc. Let's keep this
uncomplicated so that it stays working :-)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

23 ноября 2010 г., 17:43:31

Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Nov 23, 2010 at 15:09, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> Yep. That's how I'd want to do it in 9.1 - test for the directory and use
>> that to decide whether to set the handler during early backend startup. That
>> way you don't need a GUC, and should be able to load it *very* early in
>> backend startup.

> Or you set the handler always, and have the handler only actually
> create the dump if the directory exists.

+1 for that way.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

23 ноября 2010 г., 19:29:11

On 11/24/2010 05:18 AM, Magnus Hagander wrote:

> Or you set the handler always, and have the handler only actually
> create the dump if the directory exists. That way you can add the
> directory and still get  a dump from both existing backends and the
> postmaster itself without a restart.

That's way smarter. No extra filesystem access during startup, even if 
it is cheap.

--
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Greg Smith

Дата:

05 декабря 2010 г., 01:12:44

Craig Ringer wrote:
> On 11/24/2010 05:18 AM, Magnus Hagander wrote:
>
>> Or you set the handler always, and have the handler only actually
>> create the dump if the directory exists. That way you can add the
>> directory and still get  a dump from both existing backends and the
>> postmaster itself without a restart.
>
> That's way smarter. No extra filesystem access during startup, even if 
> it is cheap.

I added a commenting referencing this bit to the CF entry so it doesn't 
get forgotten.  Magnus raised a few other issues in his earlier review 
too.  Discussion of this patch seems to have jumped the gun a bit into 
talking about backpatching before the code for HEAD was completely done, 
then stalled there.  Are we going to see an updated patch that addresses 
submitted feedback in this cycle?

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

06 декабря 2010 г., 00:59:32

On 22/11/2010 7:37 PM, Magnus Hagander wrote:

> Finally getting to looking at this one - sorry about the very long delay.

Ditto, I'm afraid.

> I agree with Heikki's earlier comment that it's better to have this
> included in the backend - but that's obviously not going to happen for
> already-released versions. I'd therefor advocate a contrib module for
> existing versions, and then in core for 9.1+.

I think that was the conclusion the following discussion came to, as
well. It's easy to distribute a binary .dll for older versions and if
the EDB folks found this facility really useful they could even bundle
it in the installer. There's no point backpatching it given how few
win32 users will build from source.

> We should then have an option to turn it off (on by default). But we
> don't need to pay the overhead on every backend startup - we could
> just map the value into the parameter shared memory block, and require
> a full postmaster restart in order to change it.

As discussed later, implemented by testing for a 'crashdumps' directory.
If the directory exists, a dump is written. If the directory is missing
a log message is emitted to explain why no dump was written; as this'll
only happen when a backend dies, log noise isn't an issue and it'll help
save sysadmin head-scratching given the "magic" behaviour of turning
on/off based on directory existence.

> Do we want to backpatch it into contrib/? Adding a new module there
> seems kind of wrong - probably better to keep the source separate and
> just publish the DLL files for people who do debugging?

+1 to just publishing the DLLs

> Looking at the code:
> * Why do we need to look at differnt versions of dbghelp.dll? Can't we
> always use the one with Windows? And in that case, can't we just use
> compile-time linking, do we need to bother with DLL loading at all?

Newer versions than those shipped with an OS release contain updated and
improved functionality as well as bug fixes. We *can* use the version
shipped with the OS, as even in XP dbghelp.dll contains the required
functionality, but it'd be better to bundle it. That's microsoft's
recommendation when you use dbghelp.dll .

> * Per your comment about using elog() - a good idea in that case is to
> use write_stderr(). That will send the output to stderr if there is
> one, and otherwise the eventlog (when running as service).

Hm, ok. The attached patch doesn't do that yet. Given the following
comments, do you think the change still necessary?

> * And yes, in the crash handler, we should *definitely* not use elog().

Currently I'm actually using it, but only after the dump has been
written or we've decided not to write a dump. At this point, if elog()
fails we don't really care, we'd just be returning control to the OS's
fatal exception cleanup anyway. It's IMO desirable for output to go to
the same log as everything else, so admins can find out what's going on
and can associate the crash dump files with events that happened around
a certain time.

> * On Unix, the core file is dropped in the database directory, we
> don't have a separate directory for crashdumps. If we want to be
> consistent, we should do that here too. I do think that storing them
> in a directory like "crashdumps" is better, but I just wanted to raise
> the comment.

Given the recent discussion about using the dir as an on/off switch,
that'll be necessary.

> * However, when storing it in crashdumps, I think the code would need
> to create that directory if it does not exist, doesn't it?

Not now that we're using it an on/off switch, but of course it would've
made sense otherwise.

> * Right now, we overwrite old crashdumps. It is well known that PIDs
> are recycled pretty quickly on Windows - should we perhaps dump as
> postgres-<pid>-<sequence>.mdmp when there is a filename conflict?

Good idea. Rather than try to test for existing files I've just taken
the simple route (always good in code run during process crash) and
appended the system ticks to the dump name. System ticks are a 32-bit
quantity that wraps about every 40 days, so they should be easily unique
enough in combination with the pid. The logs will report the crashdump
filenames unless the backend is too confused to be able to elog(), and
all the crash dump files have file system timestamps so there isn't much
point trying to format a crash date/time into their names.

I could use wall clock seconds (or ms) instead, but favour system ticks
because they can be read with a simple int-returning call that doesn't
require any more stack allocation. On Windows, getting wall clock time
in seconds seems to involve a call to GetSystemTimeAsFileTime
(http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to
populate a struct containing a fake-64-bit int as two 32-bit ints. It's
not clear that it's free of timezone calculations and generally looks
like something better avoided in a crash handler if it's not necessary.
As GetSystemTicks seems quite sufficient, I thought I'd stick with it.

--
Craig Ringer

Вложения

0001-Crash-dump-support-for-win32.patch

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

06 декабря 2010 г., 01:57:22

On 6/12/2010 12:57 PM, Craig Ringer wrote:
> On 22/11/2010 7:37 PM, Magnus Hagander wrote:
>
>> Finally getting to looking at this one - sorry about the very long delay.
>
> Ditto, I'm afraid.

Oh, I forgot to mention in the patch email: I'm not sure I've taken the 
right approach in terms of how I've hooked the crash dump code into the 
build system. I'm fairly happy with when it's actually inited, and with 
the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure 
about is how I've handled the conditional compilation by platform.

Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It 
exports a generic win32-garbage-free function:
 void installCrashDumpHandler();

I was just going to create a no-op version of the same function in a 
(new) file src/backend/ports/crashdump.c , but I was having problems 
figuring out how to sensibly express "compile this file for everything 
except these ports". In addition, doing it this way means there's still 
a pointless no-op function call in each backend start and some useless 
symbols - maybe not a big worry, but not ideal.

What I ended up doing was putting a conditionally compiled switch in 
port.h that produces an inline no-op version of installCrashDumpHandler 
for unsupported platforms, and an extern for supported platforms 
(currently win32). That'll optimize out completely on unsupported 
platforms, which is nice if unimportant.

I'm just not sure if port.h is really the right place and if it's only 
used by the backend. I considered a new header included by postgres.h 
(since it's really backend-only stuff) but as it seems Pg generally 
prefers bigger headers over more little headers, I popped it in port.h .

Comments?

-- 
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

14 декабря 2010 г., 02:02:52

I've attached an updated patch that fixes a failure when compiling on
gcc/linux. The no-op inline installCrashDumpHandler() for unsupported
platforms was not declared static, so it was not being optimized out of
objects it wasn't used in and was causing symbol collisions during linkage.

"make check" now passes correctly on Linux and no warnings are generated
during compilation.
=======================
  All 124 tests passed.
=======================

Sorry about this mess-up; I don't recall having to use "inline static"
when using inlines in headers with c++, but as that was a while ago (and
C++) my recollection can't be trusted anyway.

Your thoughts on the following questions from before would be much
appreciated:

 > I'm not sure I've taken the
> right approach in terms of how I've hooked the crash dump code into the
> build system. I'm fairly happy with when it's actually inited, and with
> the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure
> about is how I've handled the conditional compilation by platform.
>
> Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It
> exports a generic win32-garbage-free function:
>
> void
> installCrashDumpHandler();
>
> I was just going to create a no-op version of the same function in a
> (new) file src/backend/ports/crashdump.c , but I was having problems
> figuring out how to sensibly express "compile this file for everything
> except these ports". In addition, doing it this way means there's still
> a pointless no-op function call in each backend start and some useless
> symbols - maybe not a big worry, but not ideal.
>
> What I ended up doing was putting a conditionally compiled switch in
> port.h that produces an inline no-op version of installCrashDumpHandler
> for unsupported platforms, and an extern for supported platforms
> (currently win32). That'll optimize out completely on unsupported
> platforms, which is nice if unimportant.
>
> I'm just not sure if port.h is really the right place and if it's only
> used by the backend. I considered a new header included by postgres.h
> (since it's really backend-only stuff) but as it seems Pg generally
> prefers bigger headers over more little headers, I popped it in port.h .

Вложения

0001-Add-support-for-generating-a-crash-dump-on-backend-p.patch

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

14 декабря 2010 г., 13:03:02

Craig Ringer <craig@postnewspapers.com.au> writes:
> I've attached an updated patch that fixes a failure when compiling on 
> gcc/linux. The no-op inline installCrashDumpHandler() for unsupported 
> platforms was not declared static, so it was not being optimized out of 
> objects it wasn't used in and was causing symbol collisions during linkage.

Why in the world would you get involved in that portability mess for a
function that is called only once?  There's no possible performance
justification for making it inline.

I'm also wondering why you have got conflicting declarations in
postgres.h and port.h, and why none of these declarations follow
ANSI C (write "(void)" not "()").
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Greg Smith

Дата:

15 декабря 2010 г., 09:19:02

I've updated this entry in the CommitFest app to note that Craig had 
some implementation questions attached to his patch submission that I 
haven't seen anyone address yet, and to include a reference to Tom's 
latest question--which may make those questions moot, not sure.  This 
pretty clearly need to sit on the stove a little bit longer before it's 
done regardless.  I'm marking this one Returned With Feedback, and 
hopefully Craig will continue hammering on this to clean up the 
remaining details and resubmit in the next month.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

15 декабря 2010 г., 10:20:55

On 12/15/2010 01:01 AM, Tom Lane wrote:
> Craig Ringer<craig@postnewspapers.com.au>  writes:
>> I've attached an updated patch that fixes a failure when compiling on
>> gcc/linux. The no-op inline installCrashDumpHandler() for unsupported
>> platforms was not declared static, so it was not being optimized out of
>> objects it wasn't used in and was causing symbol collisions during linkage.
>
> Why in the world would you get involved in that portability mess for a
> function that is called only once?  There's no possible performance
> justification for making it inline.

The main concern I heard voiced when first suggesting this was about 
performance. Given that concern, if I could make it a no-op on 
unix/linux I thought that worth doing.

I'm _much_ happier with a simple, non-ifdef'd extern function 
declaration and compilation of an empty function body on unsupported 
platforms. Given how concerned everyone was about *any* effect on 
backend startup, though, I was concerned that'd be turned down as 
unnecessary bloat.

I've done it a nicer way now, and will post the updated patch once I've 
had a chance to re-test it on my Windows dev box.

> I'm also wondering why you have got conflicting declarations in
> postgres.h and port.h, and why none of these declarations follow
> ANSI C (write "(void)" not "()").

For postgres.h : that's a good question, as I thought I removed that. I 
suspect it was reintroduced when reapplying the patch to my working tree 
to revise it. Whoops.

As for the ansi C style - too much time with C++, though long ago now. I 
think I got the PostgreSQL rules for code formatting right, but missed 
the void param rule.

--
Craig Ringer

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

16 декабря 2010 г., 04:21:19

Hi all

Updated patch attached. This one avoids the inline function stuff and
instead just decides whether to compile unix_crashdump.c or
win32_crashdump.c based on build system tests. It passes "make check" on
nix and tests on win32, both with and without the "crashdumps" directory
existing. The inadvertent duplication of the functon prototype is fixed.

I haven't added any SGML documentation yet, as I'm not sure (a) to what
level it should be documented, and (b) whether it should be in the main
docs or just the developer docs plus the crash debugging guide on the
wiki. Thoughts?

You can also find this patch at:

  https://github.com/ringerc/postgres/tree/crashdump

ie git://github.com/ringerc/postgres.git, branch "crashdump" .

--
System & Network Administrator
POST Newspapers

Вложения

0001-Add-support-for-generating-a-crash-dump-on-backend-p.patch

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

16 декабря 2010 г., 04:28:37

BTW, related to the crash dump work:

I wanted to see if I could get any integration into Windows Error
Reporting for Pg, because it'd be nice to be able to use Microsoft's
servers to aggregate crash reports and collect data on problems. WER (on
Windows 7 and Vista) also provides automatic management of retained
local copies of crash dumps, automatic bundling of related files,
automatic out-of-process debugging, etc.

WER has been significantly improved in Windows Visa / 7 and may be worth
considering enabling its use when Pg is installed on those platforms. Do
you think there'd be any interest in re-enabling WER when installation
on win7 / vista is detected?


I've seen mention that the win32 installers "go to some lengths" to
disable WER because it tends to hang the server while waiting for a user
prompt. I was wondering if/how the installer disables WER for Pg? There
aren't any calls to AddERExcludedApplication() or
WerAddExcludedApplication() in the Pg sources. Pg doesn't appear under:
 HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\WindowsError Reporting
 

within an ExcludedApplications key either.


WHY WINDOWS ERROR REPORTING?
============================

WER on Windows 7 permits things like an out-of-process debugger to be
invoked. Once the postmaster knows how to manage generic children (as
has been discussed to permit better integration of PgAgent among other
things) it could also become responsible for doing out-of-process crash
dumping and error reporting on crashed children rather than having to do
potentially unreliable self-debugging as is presently the case. It might
not even be necessary to do that much, as WER on win7/vista offers so
much more configurability.

On Win7/vista you can also set WER up not to block execution while it
waits for user input, and to prompt the user even when the user is
running under a different security level. It can automatically bundle
postgresql.conf, the most recent log file, etc when an error report is
sent. By contrast to app-generated minidumps, most importantly it
doesn't require the user to do anything more than click "yes" for an
error report to be collected and become available for analysys of most
frequent failures.

The downside is that it requires a *business* to register for access to
WER data, and that business must obtain a VeriSign code signing cert
(US$99 if obtained for WER purposes). This might be something the EDB
folks would be interested in, especially as it should also permit
shipping of signed Pg binaries, which is great for protecting against AV
software, making it easier to deploy in business environments, etc.

Note that Windows Error Reporting doens't eliminate the utility of the
crashdumps feature, as self-crashdumps are easily accessible to and
configurable by the local user/admin and work fine on Windows XP. The
current self-generated crash dumps also contain more information than
you'll usually get from a WER dump, though it may be possible to
customise WER so it includes the same information.

I'd really like to know more about past experience with Windows Error
Reporting, because I'd like to investigate whether it's worth using in
future with a few tweaks for win7 use.

-- 
System & Network Administrator
POST Newspapers

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

16 декабря 2010 г., 09:01:31

On Thu, Dec 16, 2010 at 09:21, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Hi all
>
> Updated patch attached. This one avoids the inline function stuff and
> instead just decides whether to compile unix_crashdump.c or
> win32_crashdump.c based on build system tests. It passes "make check" on
> nix and tests on win32, both with and without the "crashdumps" directory
> existing. The inadvertent duplication of the functon prototype is fixed.

I've updated this patch in quite a few ways, and you can find the
latest on my github (will attach when I'm done with cleanups). Things
I've changed so far:

* I moved it all into backend/port/win32, and thus removed all the
autoconf things, since they are not necessary. This is only for win32
at this point, and AFAIK we don't expect it to be for other platforms.
If that's needed, we can move it back later, but for now let's keep it
simple.
* using elog() really isn't safe. We set the crash handler *long*
before we have loaded the elog subsystem. It's better to get a log to
eventlog than to not get it at all.
* Plus a number of smaller changes

One thing that I did notice, and I'm not entirely sure what to do with
- which is why I wanted to post this while I'm still working on the
cleanup:

We use the existance of the "crashdumps" directory as an indication we
want crashdumps. That's fine when the system is up. But what if we
crash *in the postmaster before we have done chdir()*?

Should we perhaps instead define a subdirectory of *where the .EXE
file is*, and dump the file there?

> I haven't added any SGML documentation yet, as I'm not sure (a) to what
> level it should be documented, and (b) whether it should be in the main
> docs or just the developer docs plus the crash debugging guide on the
> wiki. Thoughts?

It needs to be documented somewhere.Perhaps in "15.8 Platform Specific
Notes"? That's really about building it, but it might be reasonable
there anyway? It does hold a number of things today that aren't
related to building, for other platforms.

(And yes, it should go in the wiki as well, of course)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

16 декабря 2010 г., 10:07:40

On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
> We use the existance of the "crashdumps" directory as an indication we
> want crashdumps. That's fine when the system is up. But what if we
> crash *in the postmaster before we have done chdir()*?
>
> Should we perhaps instead define a subdirectory of *where the .EXE
> file is*, and dump the file there?

That seems pretty ugly, for a pretty marginal case.  The number of
crashes that are going to happen in the postmaster before we've done
chdir() should be extremely small, especially if we're talking about
crashes in the field rather than during development.  How about adding
a command-line option to force a dump to be written in $CWD whether a
crashdumps directory exists or not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

16 декабря 2010 г., 10:16:36

On Thu, Dec 16, 2010 at 14:01, Magnus Hagander <magnus@hagander.net> wrote:
> On Thu, Dec 16, 2010 at 09:21, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> Hi all
>>
>> Updated patch attached. This one avoids the inline function stuff and
>> instead just decides whether to compile unix_crashdump.c or
>> win32_crashdump.c based on build system tests. It passes "make check" on
>> nix and tests on win32, both with and without the "crashdumps" directory
>> existing. The inadvertent duplication of the functon prototype is fixed.
>
> I've updated this patch in quite a few ways, and you can find the
> latest on my github (will attach when I'm done with cleanups). Things
> I've changed so far:

Found another problem in it: when running with an older version of
dbghelp.dll (which I was), it simply didn't work. We need to grab the
version of dbghelp.dll at runtime and pick which things we're going to
dump based on that.

The attached version of the patch does that. Can you please test that
it still generates the full dump on your system, which supposedly has
a much more modern version of dbghelp.dll than mine?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Вложения

win32_crash.patch

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

16 декабря 2010 г., 10:24:31

On Thu, Dec 16, 2010 at 15:07, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> We use the existance of the "crashdumps" directory as an indication we
>> want crashdumps. That's fine when the system is up. But what if we
>> crash *in the postmaster before we have done chdir()*?
>>
>> Should we perhaps instead define a subdirectory of *where the .EXE
>> file is*, and dump the file there?
>
> That seems pretty ugly, for a pretty marginal case.  The number of
> crashes that are going to happen in the postmaster before we've done
> chdir() should be extremely small, especially if we're talking about
> crashes in the field rather than during development.  How about adding
> a command-line option to force a dump to be written in $CWD whether a
> crashdumps directory exists or not?

Is that less ugly? ;)

But yes, we are talking about in the field, so it's fairly small. But
any crash during guc loading for example would go there, I think?

We can do such a commandline. We don't have any platform-specific
commandline options today. Is that something we've intentionally
avoided, or just not needed before?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

16 декабря 2010 г., 10:45:31

On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Is that less ugly? ;)

Well, I thought it was or I would have suggested it, but it's
obviously open to interpretation.

> But yes, we are talking about in the field, so it's fairly small. But
> any crash during guc loading for example would go there, I think?

Probably.  But how likely is that to happen only on Windows and only
in the field?

> We can do such a commandline. We don't have any platform-specific
> commandline options today. Is that something we've intentionally
> avoided, or just not needed before?

Beats me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

16 декабря 2010 г., 10:57:43

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> We can do such a commandline. We don't have any platform-specific
>> commandline options today. Is that something we've intentionally
>> avoided, or just not needed before?

> Beats me.

Yes, it's something we intentionally avoid: there is no convenient way
to have conditional entries in getopt()'s option-letters string.

I think the proposal for such a switch is unnecessary lily-gilding,
and is far more likely to create problems than solve any.  Please
remember that a debugging facility has got to be minimal impact,
or it creates its own Heisenbugs.

(In the unlikely event that someone needed to debug such an early crash,
surely they could just create a crashdump subdirectory in whatever
directory they want to launch the postmaster from.)
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

16 декабря 2010 г., 11:16:24

On Thu, Dec 16, 2010 at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> We can do such a commandline. We don't have any platform-specific
>>> commandline options today. Is that something we've intentionally
>>> avoided, or just not needed before?
>
>> Beats me.
>
> Yes, it's something we intentionally avoid: there is no convenient way
> to have conditional entries in getopt()'s option-letters string.
>
> I think the proposal for such a switch is unnecessary lily-gilding,
> and is far more likely to create problems than solve any.  Please
> remember that a debugging facility has got to be minimal impact,
> or it creates its own Heisenbugs.
>
> (In the unlikely event that someone needed to debug such an early crash,
> surely they could just create a crashdump subdirectory in whatever
> directory they want to launch the postmaster from.)

Or actually - run the postmaster from a debugger...

I think the thing we lose is the ability for the DBA to say
pre-emptively "if this crashes tomorrow on restart i want a
crashdump".

Hmm. What we could do is have pg_ctl chdir() into the data directory
on start. We don't set it to anything there now - so we rely on the
"If this parameter is NULL, the new process will have the same current
drive and directory as the calling process." Is that likely to cause
problems in other areas?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

16 декабря 2010 г., 11:56:13

Magnus Hagander <magnus@hagander.net> writes:
> On Thu, Dec 16, 2010 at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the proposal for such a switch is unnecessary lily-gilding,

> Hmm. What we could do is have pg_ctl chdir() into the data directory
> on start.

See above.  You're solving a problem that probably doesn't exist,
and introducing a pile of new ones in the process --- for example,
this will break commands that use relative paths for either the
postmaster executable or the data directory itself.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

16 декабря 2010 г., 18:42:06

Magnus Hagander <magnus@hagander.net> writes:
> Found another problem in it: when running with an older version of
> dbghelp.dll (which I was), it simply didn't work. We need to grab the
> version of dbghelp.dll at runtime and pick which things we're going to
> dump based on that.

> The attached version of the patch does that. Can you please test that
> it still generates the full dump on your system, which supposedly has
> a much more modern version of dbghelp.dll than mine?

This version of the patch looks fairly sane in terms of how it connects
to the rest of the code; but I'm unqualified to comment on the
Windows-specific code.

One thought is maybe the call point should be in startup_hacks() rather
than the main line of main()?  I'm not terribly set on it if you don't
like that.

Also, I notice that the comment for startup_hacks() claims that it isn't
executed in subprocesses, which is evidently an obsolete statement ---
that should get cleaned up independently of this.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

17 декабря 2010 г., 03:02:22

On 16/12/10 21:01, Magnus Hagander wrote:

> Found another problem in it: when running with an older version of
> dbghelp.dll (which I was), it simply didn't work. We need to grab the
> version of dbghelp.dll at runtime and pick which things we're going to
> dump based on that.

I was about to suggest dropping the fallback loading of the system
dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
big the DLL is. It's 800kb (!!) of code that'll hopefully never get used
on any given system. With a footprint like that, bundling it seems
rather less attractive.

I think Magnus is right: test the dbghelp.dll version and fall back to
supported features - as far back as XP, anyway; who cares about anything
earlier than that. An updated version can always be put in place by the
user if the built-in one can't produce enough detail.

> * I moved it all into backend/port/win32, and thus removed all the
> autoconf things, since they are not necessary. This is only for win32
> at this point, and AFAIK we don't expect it to be for other platforms.
> If that's needed, we can move it back later, but for now let's keep it
> simple.

Yeah, fair call - especially as most other OSes have native crash dump
facilities (kernel-generated core dumps, etc) that render explicit crash
handling much less useful.

> * using elog() really isn't safe. We set the crash handler *long*
> before we have loaded the elog subsystem. It's better to get a log to
> eventlog than to not get it at all.

Good point. It made some sense when running as a loadable module, as the
crash dumper was only loaded quite late, and so long as it elog()'d only
after writing the dump that was fine. Now, not so much.

> We use the existance of the "crashdumps" directory as an indication we
> want crashdumps. That's fine when the system is up. But what if we
> crash *in the postmaster before we have done chdir()*?
> 
> Should we perhaps instead define a subdirectory of *where the .EXE
> file is*, and dump the file there?

Well, on Windows it's pretty easy to find out where your executable is,
and the crash dumper already does that to determine where to load
dbghelp.dll from. It'd be no stretch to use a subdir of the postgresql
install dir for crash dumps. OTOH, there are all sorts of exciting
permissions issues to deal with then, as write permission for "postgres"
won't be inherited from the datadir. It's also generally a big ugly and
not very good practice to write to the application directory.

If not writing to the datadir, for non-services you'd use something like:
   %LOCALAPPDATA%\postgres\crashdumps

... but for a service running under its own account that's not a good
option either. Consequently, IMO the datadir remains the best place for
crashdumps.

Is this going to be a real-world issue? One simple answer might be not
to load the crash dump handler until after the postmaster chdir()s, but
honestly I'm not sure it matters. If it can't find somewhere to write
dumps, it doesn't write a dump.

The only possible issue I can imagine would be if the postmaster started
with a cwd that a malicious user could write to, so they could create a
'crashdumps' dir, somehow cause an early start postmaster crash,
therefore somehow leak info from the protected "postgres" account files
that way. A quick look at the code suggests that the postmaster doesn't
do anything interesting in the datadir before chdir()ing in there, so I
don't see any danger in that. In any case, if a malicious user has write
access to the postmaster's startup cwd, with the design of Windows PATH
and library loading that means they can replace critical DLLs with their
own malicious versions, so there's a whole lot more to worry about than
leaking information via crashdumps.

So: I say just use the cwd, whatever it is, and don't worry about it. An
admin can manually create a "crashdumps" dir in the service start cwd
and set appropriate permissions in the unlikely event they're trying to
collect crash dumps for an early-crashing postmaster and they can't just
use a debugger directly.

> It needs to be documented somewhere.Perhaps in "15.8 Platform Specific
> Notes"? That's really about building it, but it might be reasonable
> there anyway? It does hold a number of things today that aren't
> related to building, for other platforms.

Seems reasonable.

-- 
System & Network Administrator
POST Newspapers

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

17 декабря 2010 г., 03:47:07

<p><br /> On Dec 17, 2010 8:02 AM, "Craig Ringer" <<a
href="mailto:craig@postnewspapers.com.au">craig@postnewspapers.com.au</a>>wrote:<br /> ><br /> > On 16/12/10
21:01,Magnus Hagander wrote:<br /> ><br /> > > Found another problem in it: when running with an older version
of<br/> > > dbghelp.dll (which I was), it simply didn't work. We need to grab the<br /> > > version of
dbghelp.dllat runtime and pick which things we're going to<br /> > > dump based on that.<br /> ><br /> > I
wasabout to suggest dropping the fallback loading of the system<br /> > dbghelp.dll, and just bundle a suitable
versionwith Pg, then I saw how<br /> > big the DLL is. It's 800kb (!!) of code that'll hopefully never get used<br
/>> on any given system. With a footprint like that, bundling it seems<br /> > rather less attractive.<br />
><br/> > I think Magnus is right: test the dbghelp.dll version and fall back to<br /> > supported features -
asfar back as XP, anyway; who cares about anything<br /> > earlier than that. An updated version can always be put
inplace by the<br /> > user if the built-in one can't produce enough detail.<p>Did you get a chance to test that it
stillproduced a full dump on your system? <p>> > We use the existance of the "crashdumps" directory as an
indicationwe<br /> > > want crashdumps. That's fine when the system is up. But what if we<br /> > > crash
*inthe postmaster before we have done chdir()*?<br /> > ><br /> > > Should we perhaps instead define a
subdirectoryof *where the .EXE<br /> > > file is*, and dump the file there?<p>... <p>> Is this going to be a
real-worldissue?<p>Nah, I'm with tom on the fact that this is probably not. <br /><p>> > It needs to be
documentedsomewhere.Perhaps in "15.8 Platform Specific<br /> > > Notes"? That's really about building it, but it
mightbe reasonable<br /> > > there anyway? It does hold a number of things today that aren't<br /> > >
relatedto building, for other platforms.<br /> ><br /> > Seems reasonable.<p>I'll put something together there
then.<p>/Magnus

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

17 декабря 2010 г., 07:09:18

On 17/12/2010 3:46 PM, Magnus Hagander wrote:
>
> On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig@postnewspapers.com.au
> <mailto:craig@postnewspapers.com.au>> wrote:
>  >
>  > On 16/12/10 21:01, Magnus Hagander wrote:
>  >
>  > > Found another problem in it: when running with an older version of
>  > > dbghelp.dll (which I was), it simply didn't work. We need to grab the
>  > > version of dbghelp.dll at runtime and pick which things we're going to
>  > > dump based on that.
>  >
>  > I was about to suggest dropping the fallback loading of the system
>  > dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
>  > big the DLL is. It's 800kb (!!) of code that'll hopefully never get used
>  > on any given system. With a footprint like that, bundling it seems
>  > rather less attractive.
>  >
>  > I think Magnus is right: test the dbghelp.dll version and fall back to
>  > supported features - as far back as XP, anyway; who cares about anything
>  > earlier than that. An updated version can always be put in place by the
>  > user if the built-in one can't produce enough detail.
>
> Did you get a chance to test that it still produced a full dump on your
> system?

I've just tested that and it looks ok so far. The dumps produced are 
smaller - 1.3MB instead of ~4MB for a dump produced by calling a simple 
"crashme()" function from SQL, the source for which follows at the end 
of this post. However, I'm getting reasonable stack traces, as shown by 
the windb.exe output below.

That said, the dump appears to exclude the backend's private memory. I 
can't resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct 
values of arguments are available as they're on the stack, as are 
function local variables, but the process heap isn't dumped so you can't 
see the contents of any pointers-to-struct. That'll make it harder to 
track down many kinds of error - but, on the other hand, means that 
dumps of processes with huge work_mem etc won't be massively bloated.

That might be a safer starting point than including the private process 
memory, really. I'd like to offer a flag to turn on dumping of private 
process memory but I'm not sure how best to go about it, given that we'd 
want to avoid accessing GUCs etc if possible. "later", I think; this is 
better for now.

I'm happy with the patch as it stands, with the one exception that 
consideration of mingw is required before it can be committed.

> Symbol search path is:
C:\Users\Craig\Developer\postgres\Release\postgres;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols;C:\Program
Files\PostgreSQL\9.0\symbols
> Executable search path is:
> Windows 7 Version 7600 MP (2 procs) Free x86 compatible
> Product: WinNt, suite: SingleUserTS
> Machine Name:
> Debug session time: Fri Dec 17 17:37:19.000 2010 (UTC + 8:00)
> System Uptime: not available
> Process Uptime: 0 days 0:01:09.000
> ................................
> This dump file has an exception of interest stored in it.
> The stored exception information can be accessed via .ecxr.
> (16b4.cd8): Access violation - code c0000005 (first/second chance not available)
> eax=000002e8 ebx=03210aa8 ecx=0055e700 edx=03210a68 esi=03210a68 edi=0055ea14
> eip=771464f4 esp=0055e6d4 ebp=0055e6e4 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
> ntdll!KiFastSystemCallRet:
> 771464f4 c3              ret
> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=0103fa88 edx=7001100f esi=0103fa78 edi=0103fab4
> eip=70011052 esp=0055f62c ebp=0103fa78 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
> crashme!crashdump_crashme+0x2:
> 70011052 c70001000000    mov     dword ptr [eax],1    ds:0023:00000000=????????
> 0:000> ~
> .  0  Id: 16b4.cd8 Suspend: 0 Teb: 7ffdf000 Unfrozen
>    1  Id: 16b4.1798 Suspend: 0 Teb: 7ffde000 Unfrozen
>    2  Id: 16b4.6f4 Suspend: 0 Teb: 7ffdd000 Unfrozen
> 0:000> k
>   *** Stack trace for last set context - .thread/.cxr resets it
> ChildEBP RetAddr
> 0055f628 0142c797 crashme!crashdump_crashme+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14]
> 0055f68c 0142c804 postgres!ExecMakeFunctionResult+0x427
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c@ 1824]
 
> 0055f6b4 0142b760 postgres!ExecEvalFunc+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @
2260]
> 0055f6e0 0142ba83 postgres!ExecTargetList+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @
5095]
> 0055f720 0143f074 postgres!ExecProject+0x173 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @
5312]
> 0055f734 01427e07 postgres!ExecResult+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c @
157]
> 0055f744 01425ccd postgres!ExecProcNode+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @
361]
> 0055f758 01426ace postgres!ExecutePlan+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @
1236]
> 0055f788 0152ec7d postgres!standard_ExecutorRun+0x8e
[c:\users\craig\developer\postgres\src\backend\executor\execmain.c@ 288]
 
> 0055f7ac 0152f290 postgres!PortalRunSelect+0x6d [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 953]
> 0055f834 0152c2b2 postgres!PortalRun+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803]
> 0055f8e8 0152cbe5 postgres!exec_simple_query+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @
1067]
> 0055f96c 014f2bfc postgres!PostgresMain+0x575 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935]
> 0055f98c 014f58c9 postgres!BackendRun+0x19c [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @
3562]
> 0055fb30 014575bc postgres!SubPostmasterMain+0x2f9
[c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c@ 4058]
 
> 0055fb48 0162847d postgres!main+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c @ 173]
> 0055fb8c 76ab1194 postgres!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 0055fb98 7715b495 kernel32!BaseThreadInitThunk+0xe
> 0055fbd8 7715b468 ntdll!__RtlUserThreadStart+0x70
> 0055fbf0 00000000 ntdll!_RtlUserThreadStart+0x1b





/* Condensed source for crash-test contrib module. */
#include "postgres.h"
#include "fmgr.h"
PG_MODULE_MAGIC;
extern Datum crashdump_crashme(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(crashdump_crashme);
Datum
crashdump_crashme(PG_FUNCTION_ARGS)
{int * ptr = NULL;*ptr = 1;return NULL;
}





-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

17 декабря 2010 г., 07:17:46

On Fri, Dec 17, 2010 at 12:08, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 17/12/2010 3:46 PM, Magnus Hagander wrote:
>>
>> On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig@postnewspapers.com.au
>> <mailto:craig@postnewspapers.com.au>> wrote:
>>  >
>>  > On 16/12/10 21:01, Magnus Hagander wrote:
>>  >
>>  > > Found another problem in it: when running with an older version of
>>  > > dbghelp.dll (which I was), it simply didn't work. We need to grab the
>>  > > version of dbghelp.dll at runtime and pick which things we're going
>> to
>>  > > dump based on that.
>>  >
>>  > I was about to suggest dropping the fallback loading of the system
>>  > dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
>>  > big the DLL is. It's 800kb (!!) of code that'll hopefully never get
>> used
>>  > on any given system. With a footprint like that, bundling it seems
>>  > rather less attractive.
>>  >
>>  > I think Magnus is right: test the dbghelp.dll version and fall back to
>>  > supported features - as far back as XP, anyway; who cares about
>> anything
>>  > earlier than that. An updated version can always be put in place by the
>>  > user if the built-in one can't produce enough detail.
>>
>> Did you get a chance to test that it still produced a full dump on your
>> system?
>
> I've just tested that and it looks ok so far. The dumps produced are smaller
> - 1.3MB instead of ~4MB for a dump produced by calling a simple "crashme()"
> function from SQL, the source for which follows at the end of this post.
> However, I'm getting reasonable stack traces, as shown by the windb.exe
> output below.
>
> That said, the dump appears to exclude the backend's private memory. I can't
> resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct values of
> arguments are available as they're on the stack, as are function local
> variables, but the process heap isn't dumped so you can't see the contents
> of any pointers-to-struct. That'll make it harder to track down many kinds
> of error - but, on the other hand, means that dumps of processes with huge
> work_mem etc won't be massively bloated.

What version of dbghelp do you have? And what does the API report for
it? The code is supposed to add the private memory if it finds version
6 or higher, perhaps that check is incorrect?



> That might be a safer starting point than including the private process
> memory, really. I'd like to offer a flag to turn on dumping of private
> process memory but I'm not sure how best to go about it, given that we'd
> want to avoid accessing GUCs etc if possible. "later", I think; this is
> better for now.

I'm not too happy with having a bunch of switches for it. People
likely to tweak those can just install the debugger tools and use
those, I think.

So I'd be happy to have it enabled - given that we want a default.
However, what I'm most interested in right now is finding out why it's
not being included anymore in the new patch, because it's supposed to
be..



> I'm happy with the patch as it stands, with the one exception that
> consideration of mingw is required before it can be committed.

What do you mean? That it has to be tested on mingw specifically, or
something else?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

17 декабря 2010 г., 11:18:37

Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Dec 17, 2010 at 12:08, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> That might be a safer starting point than including the private process
>> memory, really.

> I'm not too happy with having a bunch of switches for it. People
> likely to tweak those can just install the debugger tools and use
> those, I think.

Yeah.

> So I'd be happy to have it enabled - given that we want a default.

There is absolutely nothing more frustrating than having a crash dump
that hasn't got the information you need.  Please turn it on by default.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

17 декабря 2010 г., 12:25:19

On 17/12/2010 7:17 PM, Magnus Hagander wrote:

> What version of dbghelp do you have?

6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I 
was testing with.

6.12.0002.633 is what came with my copy of Debugging Tools for windows, 
from the windows SDK. The same version comes with Visual Studio 10 Express.

6.9.0003.113 is what came with Visual Studio 9 Express.

I've now re-tested with the latest, 6.12.0002.633, and have the same result.
> And what does the API report for it?

This:

version = (*pApiVersion)();
write_stderr("version: %hu.%hu.%hu\n",version->MajorVersion,version->MinorVersion,version->Revision);

outputs "version: 4.0.5" when loading dbghelp.dll 6.12.0002.633 from 
c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just 
before LoadLibrary and a test verifying the LoadLibrary worked.

Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm 
surprised we can't just pass flags a particular version doesn't know 
about and have it deal with them. Still, I guess it's not wise to assume 
such things.

> The code is supposed to add the private memory if it finds version
> 6 or higher, perhaps that check is incorrect?

It begins to look like the version check might be lying. I'll dig 
further in a bit; it's 12:20am now and I need to sleep.

I'm going on holiday in about 48 hours, and will be away from Windows 
(phew!) unless I get a VM set up during that time. I'll see what I can do.

>> I'm happy with the patch as it stands, with the one exception that
>> consideration of mingw is required before it can be committed.
>
> What do you mean? That it has to be tested on mingw specifically, or
> something else?

I'm not even sure it'll compile or link with MinGW, and if it does, I 
doubt it'll be useful. It should only be compiled and called for native 
VC++ builds.

I'm not sure if backend\port\win32 is built for all win32, or only for 
VC++.  Testing for defined(_MSC_VER) would do the trick. I'm not sure 
testing defined(WIN32_ONLY_COMPILER) is quite right, since I doubt 
dbghelp.dll would be much use for a Pg compiled with Borland or the like 
either if that were ever supported.

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

17 декабря 2010 г., 12:46:00

On Fri, Dec 17, 2010 at 17:24, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>
>> What version of dbghelp do you have?
>
> 6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I was
> testing with.
>
> 6.12.0002.633 is what came with my copy of Debugging Tools for windows, from
> the windows SDK. The same version comes with Visual Studio 10 Express.
>
> 6.9.0003.113 is what came with Visual Studio 9 Express.
>
> I've now re-tested with the latest, 6.12.0002.633, and have the same result.
>
>> And what does the API report for it?
>
> This:
>
> version = (*pApiVersion)();
> write_stderr("version: %hu.%hu.%hu\n",
>        version->MajorVersion,
>        version->MinorVersion,
>        version->Revision);
>
> outputs "version: 4.0.5" when loading dbghelp.dll 6.12.0002.633 from
> c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just before
> LoadLibrary and a test verifying the LoadLibrary worked.

Now, that's annoying. So clearly we can't use that function to
determine which version we're on. Seems it only works for "image help
api", and not the general thing.

According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
we could look for:

SysEnumLines - if present, we have at least 6.1.

However, I don't see any function that appeared in 6.0 only..

We're probably better off looking at the VERSIONINFO structure. Which
has a not-so-nice API, but at least it's documented :)


> Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm
> surprised we can't just pass flags a particular version doesn't know about
> and have it deal with them. Still, I guess it's not wise to assume such
> things.

You'd think so, but if I include all the flags and run it with the
default one on Windows 2003, it comes back with the infamous
"Parameter Is Incorrect" error.



>> The code is supposed to add the private memory if it finds version
>> 6 or higher, perhaps that check is incorrect?
>
> It begins to look like the version check might be lying. I'll dig further in
> a bit; it's 12:20am now and I need to sleep.
>
> I'm going on holiday in about 48 hours, and will be away from Windows
> (phew!) unless I get a VM set up during that time. I'll see what I can do.

I'll try to put together a version that uses the VERSIONINFO to
determine it, for you to test.


>>> I'm happy with the patch as it stands, with the one exception that
>>> consideration of mingw is required before it can be committed.
>>
>> What do you mean? That it has to be tested on mingw specifically, or
>> something else?
>
> I'm not even sure it'll compile or link with MinGW, and if it does, I doubt
> it'll be useful. It should only be compiled and called for native VC++
> builds.

I'm pretty sure it could be useful with mingw as well. not *as*
useful, but still useful. You'd get proper stack traces and such for
anything in msvcrt, and the actual reason for the exception etc.


> I'm not sure if backend\port\win32 is built for all win32, or only for VC++.

It's for all win32.

>  Testing for defined(_MSC_VER) would do the trick. I'm not sure testing
> defined(WIN32_ONLY_COMPILER) is quite right, since I doubt dbghelp.dll would
> be much use for a Pg compiled with Borland or the like either if that were
> ever supported.

We don't support building the backend with borland - only libpq and psql.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

17 декабря 2010 г., 13:13:53

On Fri, Dec 17, 2010 at 17:42, Magnus Hagander <magnus@hagander.net> wrote:
> On Fri, Dec 17, 2010 at 17:24, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
> Now, that's annoying. So clearly we can't use that function to
> determine which version we're on. Seems it only works for "image help
> api", and not the general thing.
>
> According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
> we could look for:
>
> SysEnumLines - if present, we have at least 6.1.
>
> However, I don't see any function that appeared in 6.0 only..

Actually, I'm wrong - there are functions enough to determine the
version. So here's a patch that tries that.

(BTW, the document is actually incorrect - partially because the
version that's shipped with Windows 2003 which is 5.2, is more or less
undocumented)


Let me know how that one works for you, and if it doesn't, whether it
does actually detect any of those higher versions by searching for the
functions.



--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Вложения

win32_crash.patch

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

19 декабря 2010 г., 02:06:56

On 17/12/2010 11:17 PM, Tom Lane wrote:

>> So I'd be happy to have it enabled - given that we want a default.
>
> There is absolutely nothing more frustrating than having a crash dump
> that hasn't got the information you need.  Please turn it on by default.

There have to be limits to that, though. dbghelp.dll can dump shared 
memory, too - but it's not going to be practical to write or work with 
many-gigabyte dumps most of the time, and neither my initial patch nor 
Magnus's recent reworking of it ask windbg.dll to include shared memory 
in the dump.

It's possible to tell dbghelp.dll to include specific additional memory 
ranges, so it may well be worth including specific critical areas within 
the shared memory block while omitting the bulk of the buffers. Doing 
that safely in an unsafe we're-crashing environment might not be simple, 
though, especially if you're not even sure whether you got far into 
startup before crashing.

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

19 декабря 2010 г., 02:26:30

On 18/12/2010 1:13 AM, Magnus Hagander wrote:
> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net>  wrote:
>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au>  wrote:
>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>> Now, that's annoying. So clearly we can't use that function to
>> determine which version we're on. Seems it only works for "image help
>> api", and not the general thing.
>>
>> According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>> we could look for:
>>
>> SysEnumLines - if present, we have at least 6.1.
>>
>> However, I don't see any function that appeared in 6.0 only..
>
> Actually, I'm wrong - there are functions enough to determine the
> version. So here's a patch that tries that.

Great. I pulled the latest from your git tree, tested that, and got much 
better results. Crashdump size is back to what I expected. In my test 
code, fcinfo->args and fcinfo->argnull can be examined without problems. 
Backtraces look good; see below. It seems to be including backend 
private memory again now. Thanks _very_ much for your work on this.

fcinfo->flinfo is still inaccessible, but I suspect it's in shared 
memory, as it's at 0x00000135 . Ditto fcinfo->resultinfo and 
fcinfo->context.

This has me wondering - is it going to be necessary to dump shared 
memory to make many backtraces useful? I just responded to Tom 
mentioning that the patch doesn't currently dump shared memory, but I 
hadn't realized the extent to which it's used for _lots_ more than just 
disk buffers. I'm not sure how to handle dumping shared_buffers when 
someone might be using multi-gigabyte shared_buffers, though. Dumping 
the whole lot would risk sudden out-of-disk-space issues, slowdowns as 
dumps are written, and the backend being "frozen" as it's being dumped 
could delay the system coming back up again. Trying to selectively dump 
critical parts could cause dumps to fail if the system is in early 
startup or a bad state.

The same concern applies to writing backend private memory; it's fine 
most of the time, but if you're doing data warehousing queries with 2GB 
of work_mem, it's going to be nasty having all that extra disk I/O and 
disk space use, not to mention the hold-up while the dump is written. If 
this is something we want to have people running in production "just in 
case" or to track down rare / hard to reproduce faults, that'll be a 
problem.

OTOH, we can't really go poking around in palloc contexts to decide what 
to dump.

I guess we could always do a small, minimalist minidump, then write 
_another_ dump that attempts to include select parts of shm and backend 
private memory.

I just thought of two other things, too:

- Is it possible for this handler to be called recursively if it fails 
during the handler call? If so, do we need to uninstall the handler 
before attempting a dump to avoid such recursion? I need to do some 
testing and dig around MSDN to find out more about this.

- Can asynchronous events like signals (or their win32 emulation) 
interrupt an executing crash handler, or are they blocked before the 
crash handler is called? If they're not blocked, do we need to try to 
block them before attempting a dump? Again, I need to do some reading on 
this.


Anyway, here's an example of the backtraces I'm currently getting. 
They're clearly missing some parameters (in shm? Unsure) but provide 
source file+line, argument values where resolvable, and the call stack 
its self. Locals are accessible at all levels of the stack when you go 
poking around in windbg.

> This dump file has an exception of interest stored in it.
> The stored exception information can be accessed via .ecxr.
> (930.12e8): Access violation - code c0000005 (first/second chance not available)
> eax=00bce2c0 ebx=72d0e800 ecx=000002e4 edx=72cb81c8 esi=000000f0 edi=00000930
> eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
> ntdll!KiFastSystemCallRet:
> 771464f4 c3              ret
> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804
> eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
> crashme!crashdump_crashme+0x2:
> 73621052 c70001000000    mov     dword ptr [eax],1    ds:0023:00000000=????????
> 0:000> kp
>   *** Stack trace for last set context - .thread/.cxr resets it
> ChildEBP RetAddr
> 00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * fcinfo = 0x015e3318)+0x2
[c:\users\craig\developer\postgres\contrib\crashme\crashme.c@ 14]
 
> 00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 0x015e3318, struct ExprContext *
econtext= 0x00319410, char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x7362100f)+0x427
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c@ 1824]
 
> 00bcf30c 0031b760 postgres!ExecEvalFunc(struct FuncExprState * fcache = 0x00000000, struct ExprContext * econtext =
0x00000000,char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x00000000)+0x34
[c:\users\craig\developer\postgres\src\backend\executor\execqual.c@ 2260]
 
> 00bcf338 0031ba83 postgres!ExecTargetList(struct List * targetlist = 0x00000000, struct ExprContext * econtext =
0x00000000,unsigned int * values = 0x00000000, char * isnull = 0x00000000 "", ExprDoneCond * itemIsDone = 0x00000000,
ExprDoneCond* isDone = 0x00000000)+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095]
 
> 00bcf378 0032f074 postgres!ExecProject(struct ProjectionInfo * projInfo = 0x00000000, ExprDoneCond * isDone =
0x00000000)+0x173[c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312]
 
> 00bcf38c 00317e07 postgres!ExecResult(struct ResultState * node = <Memory access error>)+0x94
[c:\users\craig\developer\postgres\src\backend\executor\noderesult.c@ 157]
 
> 00bcf39c 00315ccd postgres!ExecProcNode(struct PlanState * node = <Memory access error>)+0x67
[c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c@ 361]
 
> 00bcf3b0 00316ace postgres!ExecutePlan(struct EState * estate = 0x015fd7c8, struct PlanState * planstate = <Memory
accesserror>, CmdType operation = <Memory access error>, char sendTuples = <Memory access error>, long numberTuples =
<Memoryaccess error>, ScanDirection direction = NoMovementScanDirection (0n0), struct _DestReceiver * dest = <Memory
accesserror>)+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236]
 
> 00bcf3e0 0041ec5d postgres!standard_ExecutorRun(struct QueryDesc * queryDesc = <Memory access error>, ScanDirection
direction= <Memory access error>, long count = <Memory access error>)+0x8e
[c:\users\craig\developer\postgres\src\backend\executor\execmain.c@ 288]
 
> 00bcf404 0041f270 postgres!PortalRunSelect(struct PortalData * portal = 0x00000000, char forward = <Memory access
error>,long count = <Memory access error>, struct _DestReceiver * dest = <Memory access error>)+0x6d
[c:\users\craig\developer\postgres\src\backend\tcop\pquery.c@ 953]
 
> 00bcf48c 0041c292 postgres!PortalRun(struct PortalData * portal = 0x015fb5b8, long count = 0n2147483647, char
isTopLevel= 0n1 '', struct _DestReceiver * dest = 0x015e3418, struct _DestReceiver * altdest = 0x015e3418, char *
completionTag= 0x00bcf500 "")+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803]
 
> 00bcf540 0041cbc5 postgres!exec_simple_query(char * query_string = 0x015fd7d8 "???")+0x3a2
[c:\users\craig\developer\postgres\src\backend\tcop\postgres.c@ 1067]
 
> 00bcf5c4 003e2bdc postgres!PostgresMain(int argc = 0n2, char ** argv = 0x01555138, char * username = 0x00d484a0
"Craig")+0x575[c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935]
 
> 00bcf5e4 003e58a9 postgres!BackendRun(struct Port * port = 0x00000000)+0x19c
[c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c@ 3562]
 
> 00bcf788 003475bc postgres!SubPostmasterMain(int argc = 0n13900471, char ** argv = 0x00d41ac5)+0x2f9
[c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c@ 4058] 
> 00bcf7a0 0051845d postgres!main(int argc = 0n1990922644, char ** argv = 0x7ffdf000)+0x1ec
[c:\users\craig\developer\postgres\src\backend\main\main.c@ 173]
 
> 00bcf7e4 76ab1194 postgres!__tmainCRTStartup(void)+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 00bcf7f0 7715b495 kernel32!BaseThreadInitThunk+0xe
> 00bcf830 7715b468 ntdll!__RtlUserThreadStart+0x70
> 00bcf848 00000000 ntdll!_RtlUserThreadStart+0x1b






-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

19 декабря 2010 г., 07:51:28

On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 18/12/2010 1:13 AM, Magnus Hagander wrote:
>>
>> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net>
>>  wrote:
>>>
>>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au>
>>>  wrote:
>>>>
>>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>>>
>>> Now, that's annoying. So clearly we can't use that function to
>>> determine which version we're on. Seems it only works for "image help
>>> api", and not the general thing.
>>>
>>> According to
>>> http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>>> we could look for:
>>>
>>> SysEnumLines - if present, we have at least 6.1.
>>>
>>> However, I don't see any function that appeared in 6.0 only..
>>
>> Actually, I'm wrong - there are functions enough to determine the
>> version. So here's a patch that tries that.
>
> Great. I pulled the latest from your git tree, tested that, and got much
> better results. Crashdump size is back to what I expected. In my test code,
> fcinfo->args and fcinfo->argnull can be examined without problems.
> Backtraces look good; see below. It seems to be including backend private
> memory again now. Thanks _very_ much for your work on this.

Ok, great. I think that leaves us at least complete enough to commit -
we can always improve things further as we get some more real world
testing.


> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory,
> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.

Hmm. Not sure why those would be in shared memory, that seems strange.


> This has me wondering - is it going to be necessary to dump shared memory to
> make many backtraces useful? I just responded to Tom mentioning that the
> patch doesn't currently dump shared memory, but I hadn't realized the extent
> to which it's used for _lots_ more than just disk buffers. I'm not sure how
> to handle dumping shared_buffers when someone might be using multi-gigabyte
> shared_buffers, though. Dumping the whole lot would risk sudden
> out-of-disk-space issues, slowdowns as dumps are written, and the backend
> being "frozen" as it's being dumped could delay the system coming back up
> again. Trying to selectively dump critical parts could cause dumps to fail
> if the system is in early startup or a bad state.

I don'tt hink a slowdown issue during dump being written is really a
problem - it's not like this is something that happens normally. In
fact, we're going to restart all the other backends after the crash
anyway, and that's going to be a lot more disruptive.

Filling up the disk, however, is a much bigger issue.. As is having a
huge dump to mail around (for error reports), if it's not actually
necessary...


> The same concern applies to writing backend private memory; it's fine most
> of the time, but if you're doing data warehousing queries with 2GB of
> work_mem, it's going to be nasty having all that extra disk I/O and disk
> space use, not to mention the hold-up while the dump is written. If this is
> something we want to have people running in production "just in case" or to
> track down rare / hard to reproduce faults, that'll be a problem.

Potential problem, yes, but not a very big one I believe. Yes, if that
very big backend crashes there will be a big dump -  but how else are
you going to find out what went wrong?

In the shared memory case, *all* dumps will be huge, not just those
that happen to use a lot of local memory.


> OTOH, we can't really go poking around in palloc contexts to decide what to
> dump.

No, that's way more complex stuff than we dare execute in a crash handler.


> I guess we could always do a small, minimalist minidump, then write
> _another_ dump that attempts to include select parts of shm and backend
> private memory.
>
> I just thought of two other things, too:
>
> - Is it possible for this handler to be called recursively if it fails
> during the handler call? If so, do we need to uninstall the handler before
> attempting a dump to avoid such recursion? I need to do some testing and dig
> around MSDN to find out more about this.

I'm pretty sure I read somewhere that it can't happen and we don't
need to do that, but I can't find any reference to it atm :-(


> - Can asynchronous events like signals (or their win32 emulation) interrupt
> an executing crash handler, or are they blocked before the crash handler is
> called? If they're not blocked, do we need to try to block them before
> attempting a dump? Again, I need to do some reading on this.

The exception handler itself is global and will run on the crashing
thread. I'm fairly certain it stops the other threads while running,
but again I can't find a reference to that right now. But ISTM it
would have to.


> Anyway, here's an example of the backtraces I'm currently getting. They're
> clearly missing some parameters (in shm? Unsure) but provide source
> file+line, argument values where resolvable, and the call stack its self.
> Locals are accessible at all levels of the stack when you go poking around
> in windbg.

Yeah, they're still very useful. Is that a release or debug build?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

19 декабря 2010 г., 08:05:57

On Sun, Dec 19, 2010 at 12:51, Magnus Hagander <magnus@hagander.net> wrote:
> On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> On 18/12/2010 1:13 AM, Magnus Hagander wrote:
>>>
>>> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net>
>>>  wrote:
>>>>
>>>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au>
>>>>  wrote:
>>>>>
>>>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>>>>
>>>> Now, that's annoying. So clearly we can't use that function to
>>>> determine which version we're on. Seems it only works for "image help
>>>> api", and not the general thing.
>>>>
>>>> According to
>>>> http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>>>> we could look for:
>>>>
>>>> SysEnumLines - if present, we have at least 6.1.
>>>>
>>>> However, I don't see any function that appeared in 6.0 only..
>>>
>>> Actually, I'm wrong - there are functions enough to determine the
>>> version. So here's a patch that tries that.
>>
>> Great. I pulled the latest from your git tree, tested that, and got much
>> better results. Crashdump size is back to what I expected. In my test code,
>> fcinfo->args and fcinfo->argnull can be examined without problems.
>> Backtraces look good; see below. It seems to be including backend private
>> memory again now. Thanks _very_ much for your work on this.
>
> Ok, great. I think that leaves us at least complete enough to commit -
> we can always improve things further as we get some more real world
> testing.

Actually, looking through it again just before commit, I can't help
but think the whole code about loading the DLL from different places
is unnecessary. The windows DLL search order *always* has the
directory of the EXE first, followed by either system dirs or CWD
depending on version.

Any reason not to just call LoadLibrary once?

That makes the patch look like attached.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Вложения

win32_crash.patch

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

19 декабря 2010 г., 08:57:21

On 19/12/2010 7:51 PM, Magnus Hagander wrote:

>> Great. I pulled the latest from your git tree, tested that, and got much
>> better results. Crashdump size is back to what I expected. In my test code,
>> fcinfo->args and fcinfo->argnull can be examined without problems.
>> Backtraces look good; see below. It seems to be including backend private
>> memory again now. Thanks _very_ much for your work on this.
>
> Ok, great. I think that leaves us at least complete enough to commit -
> we can always improve things further as we get some more real world
> testing.
>
>
>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory,
>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.
>
> Hmm. Not sure why those would be in shared memory, that seems strange.

OK, I'll have to do some more digging on that, then. I'm getting on a 
plane in about 2 hours, but will be bringing Visual Studio snapshots, a 
postgres git tree, an XP vm, etc with me, so time permitting I should be 
able to keep on with this.

>> Anyway, here's an example of the backtraces I'm currently getting. They're
>> clearly missing some parameters (in shm? Unsure) but provide source
>> file+line, argument values where resolvable, and the call stack its self.
>> Locals are accessible at all levels of the stack when you go poking around
>> in windbg.
>
> Yeah, they're still very useful. Is that a release or debug build?

That's a release build. I'm intentionally testing with release builds 
because I want something that's useful on real-world end-user systems, 
and I'm aware that few Windows users will build Pg for themselves.

(That said, the Perl-based build scripts are some of the least painful 
build tools I've ever worked with on Windows. Only CMake can rival them 
for low-pain Windows compilation.)

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

19 декабря 2010 г., 09:06:28

On 19/12/2010 8:05 PM, Magnus Hagander wrote:

> Actually, looking through it again just before commit, I can't help
> but think the whole code about loading the DLL from different places
> is unnecessary. The windows DLL search order *always* has the
> directory of the EXE first, followed by either system dirs or CWD
> depending on version.
>
> Any reason not to just call LoadLibrary once?

Good point. The program directory was added to the DLL search path with 
Windows XP. On any Windows version that Pg targets you can trust Windows 
to load a DLL from the app dir before system32. We don't really care 
about win2k/nt4 or 9x, where this is necessary.

It's not a security concern for the reasons already outlined in the 
comments in the patch - in brief, because Pg keeps the cwd inside the 
datadir, and if someone can write malicious files in there you have big 
problems already. If it was a security concern our fallback would be an 
attempt to load %WINDIR%\system32\dbghelp.dll by full path, rather than 
using an unqualified name to search the path.

So: I agree. We should just let Windows find dbghelp.dll on the normal 
search path; building an explicit path isn't necessary.

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

19 декабря 2010 г., 09:11:07

On Sun, Dec 19, 2010 at 13:57, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 19/12/2010 7:51 PM, Magnus Hagander wrote:
>
>>> Great. I pulled the latest from your git tree, tested that, and got much
>>> better results. Crashdump size is back to what I expected. In my test
>>> code,
>>> fcinfo->args and fcinfo->argnull can be examined without problems.
>>> Backtraces look good; see below. It seems to be including backend private
>>> memory again now. Thanks _very_ much for your work on this.
>>
>> Ok, great. I think that leaves us at least complete enough to commit -
>> we can always improve things further as we get some more real world
>> testing.
>>
>>
>>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared
>>> memory,
>>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.
>>
>> Hmm. Not sure why those would be in shared memory, that seems strange.
>
> OK, I'll have to do some more digging on that, then. I'm getting on a plane
> in about 2 hours, but will be bringing Visual Studio snapshots, a postgres
> git tree, an XP vm, etc with me, so time permitting I should be able to keep
> on with this.

Ok. I still think what we have now is a great improvement over
nothing. But improvements on top of it is obviously always good :-)


>>> Anyway, here's an example of the backtraces I'm currently getting.
>>> They're
>>> clearly missing some parameters (in shm? Unsure) but provide source
>>> file+line, argument values where resolvable, and the call stack its self.
>>> Locals are accessible at all levels of the stack when you go poking
>>> around
>>> in windbg.
>>
>> Yeah, they're still very useful. Is that a release or debug build?
>
> That's a release build. I'm intentionally testing with release builds
> because I want something that's useful on real-world end-user systems, and
> I'm aware that few Windows users will build Pg for themselves.

Good - I just wanted to be sure that's what you were testing as well.


> (That said, the Perl-based build scripts are some of the least painful build
> tools I've ever worked with on Windows. Only CMake can rival them for
> low-pain Windows compilation.)

:-) Thanks! They're not quite that painless to maintain, but it's not *too* bad.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

19 декабря 2010 г., 11:49:28

On Sun, Dec 19, 2010 at 14:06, Craig Ringer <craig@postnewspapers.com.au> wrote:
> On 19/12/2010 8:05 PM, Magnus Hagander wrote:
>
>> Actually, looking through it again just before commit, I can't help
>> but think the whole code about loading the DLL from different places
>> is unnecessary. The windows DLL search order *always* has the
>> directory of the EXE first, followed by either system dirs or CWD
>> depending on version.
>>
>> Any reason not to just call LoadLibrary once?
>
> Good point. The program directory was added to the DLL search path with
> Windows XP. On any Windows version that Pg targets you can trust Windows to
> load a DLL from the app dir before system32. We don't really care about
> win2k/nt4 or 9x, where this is necessary.
>
> It's not a security concern for the reasons already outlined in the comments
> in the patch - in brief, because Pg keeps the cwd inside the datadir, and if
> someone can write malicious files in there you have big problems already. If
> it was a security concern our fallback would be an attempt to load
> %WINDIR%\system32\dbghelp.dll by full path, rather than using an unqualified
> name to search the path.
>
> So: I agree. We should just let Windows find dbghelp.dll on the normal
> search path; building an explicit path isn't necessary.

I've committed the version that does this.

Thanks!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

19 декабря 2010 г., 12:02:58

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig@postnewspapers.com.au> wrote:
>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory,
>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.

> Hmm. Not sure why those would be in shared memory, that seems strange.

FWIW, I don't believe that theory either.  Offhand I cannot think of any
case where an FmgrInfo would be in shared memory.  It seems more likely
from the above that you've misidentified where the fcinfo record is.

There are some code paths where we don't bother to set up those fields,
but I think we're always careful to set them to NULL if so.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

19 декабря 2010 г., 12:08:55

Craig Ringer <craig@postnewspapers.com.au> writes:
> On 17/12/2010 11:17 PM, Tom Lane wrote:
>> There is absolutely nothing more frustrating than having a crash dump
>> that hasn't got the information you need.  Please turn it on by default.

> There have to be limits to that, though. dbghelp.dll can dump shared 
> memory, too - but it's not going to be practical to write or work with 
> many-gigabyte dumps most of the time, and neither my initial patch nor 
> Magnus's recent reworking of it ask windbg.dll to include shared memory 
> in the dump.

Well, actually my comment above is based directly on pain in working
with core dumps on Unix-oid systems that don't include shared memory.
Linux *does* include shared memory, and I have not noticed that that
was a problem.  Of course, in development installations I don't try to
raise shared_buffers to the moon ...

> It's possible to tell dbghelp.dll to include specific additional memory 
> ranges, so it may well be worth including specific critical areas within 
> the shared memory block while omitting the bulk of the buffers. Doing 
> that safely in an unsafe we're-crashing environment might not be simple, 
> though, especially if you're not even sure whether you got far into 
> startup before crashing.

I agree that having the crash dump code know anything specific about the
contents of shared memory is a nonstarter --- far too fragile.  But
perhaps we could have some simple rule based only on what the kernel
knows about the shmem block, like "dump shmem if it's no more than 1GB".
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Robert Haas

Дата:

19 декабря 2010 г., 12:28:12

On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that having the crash dump code know anything specific about the
> contents of shared memory is a nonstarter --- far too fragile.  But
> perhaps we could have some simple rule based only on what the kernel
> knows about the shmem block, like "dump shmem if it's no more than 1GB".

Not sure what knobs we have available, but would there be any value in
trying to dump the FIRST 1GB?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Magnus Hagander

Дата:

19 декабря 2010 г., 12:39:41

On Sun, Dec 19, 2010 at 17:28, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I agree that having the crash dump code know anything specific about the
>> contents of shared memory is a nonstarter --- far too fragile.  But
>> perhaps we could have some simple rule based only on what the kernel
>> knows about the shmem block, like "dump shmem if it's no more than 1GB".
>
> Not sure what knobs we have available, but would there be any value in
> trying to dump the FIRST 1GB?

So such knob that I can find. Basically, we pick a combination of the flags at

http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx

We could send it as a "user stream", i guess, but it won't be
available in the debugger at the same address space then - just as a
blob of data to process manually. I doubt it's worth it in that
case...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

19 декабря 2010 г., 12:42:16

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I agree that having the crash dump code know anything specific about the
>> contents of shared memory is a nonstarter --- far too fragile. �But
>> perhaps we could have some simple rule based only on what the kernel
>> knows about the shmem block, like "dump shmem if it's no more than 1GB".

> Not sure what knobs we have available, but would there be any value in
> trying to dump the FIRST 1GB?

Probably not.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Tom Lane

Дата:

19 декабря 2010 г., 12:47:32

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Dec 19, 2010 at 17:28, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> perhaps we could have some simple rule based only on what the kernel
>>> knows about the shmem block, like "dump shmem if it's no more than 1GB".

>> Not sure what knobs we have available, but would there be any value in
>> trying to dump the FIRST 1GB?

> So such knob that I can find. Basically, we pick a combination of the flags at
> http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx

Yeah, I thought I was probably asking for something that couldn't
practically be supported.  Let's try it as-is for a few releases
and see how it goes.
        regards, tom lane

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Peter Eisentraut

Дата:

10 июля 2011 г., 17:24:07

On tor, 2010-12-16 at 15:16 +0100, Magnus Hagander wrote:
> Found another problem in it: when running with an older version of
> dbghelp.dll (which I was), it simply didn't work. We need to grab the
> version of dbghelp.dll at runtime and pick which things we're going to
> dump based on that.
> 
> The attached version of the patch does that. Can you please test that
> it still generates the full dump on your system, which supposedly has
> a much more modern version of dbghelp.dll than mine?

I was going through the GetLastError() calls to unify the printf
formats, as discussed, and I stumbled across this:

+               write_stderr("could not write crash dump to %s: error code %08x\n",
+                            dumpPath, GetLastError());

Is there a reason why this uses that particular format, unlike all other
call sites?

Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

От

Craig Ringer

Дата:

10 июля 2011 г., 19:36:35

On 11/07/2011 4:23 AM, Peter Eisentraut wrote:
> I was going through the GetLastError() calls to unify the printf
> formats, as discussed, and I stumbled across this:
>
> +               write_stderr("could not write crash dump to %s: error code %08x\n",
> +                            dumpPath, GetLastError());
>
> Is there a reason why this uses that particular format, unlike all other
> call sites?
>

Other call site usages include:
 errmsg_internal("could not create signal event: %d", (int) GetLastError()) write_stderr("could not create signal
listenerpipe: error code %d; 

retrying\n", (int) GetLastError())

etc, so I presume you're referring to the use of %08x as a format 
specifier rather than %d ? The only reason is that ntstatus errors are 
typically reported this way - but, really, not all possible errors 
arising there would be ntstatus errors. As it's not consistent with the 
rest of Pg I don't see any reason not to change it to %d.

--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088     Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Proposed Windows-specific change: Enable crash dumps (like core files)

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения