Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

From: Craig Ringer
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Msg-id: 4D0DA580.1000009@postnewspapers.com.au
In reply to: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)  (Magnus Hagander <magnus@hagander.net>)
Responses: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)  (Magnus Hagander <magnus@hagander.net>)
List: pgsql-hackers

On 18/12/2010 1:13 AM, Magnus Hagander wrote:
> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus@hagander.net>  wrote:
>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig@postnewspapers.com.au>  wrote:
>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>> Now, that's annoying. So clearly we can't use that function to
>> determine which version we're on. Seems it only works for "image help
>> api", and not the general thing.
>>
>> According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>> we could look for:
>>
>> SysEnumLines - if present, we have at least 6.1.
>>
>> However, I don't see any function that appeared in 6.0 only..
>
> Actually, I'm wrong - there are functions enough to determine the
> version. So here's a patch that tries that.
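
For readers following along, the "probe for exported functions" idea quoted above can be sketched roughly as follows. This is not the code from Magnus's patch, only a minimal illustration. The `VersionProbe` table, `detect_min_version` and `lookup_in_export_list` are hypothetical names invented here, and the only marker symbol taken from the thread is `SysEnumLines`, spelled as quoted and not verified against MSDN. The lookup is a callback, so the logic can be exercised without a real DLL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns non-NULL if `lib` exports `name`. */
typedef void *(*symbol_lookup_fn)(void *lib, const char *name);

/* One marker symbol and the library version its presence implies. */
typedef struct
{
    const char *symbol;
    int         major;
    int         minor;
} VersionProbe;

/*
 * Report the highest version implied by any marker symbol that is present.
 * Returns 1 and fills *major / *minor on success; returns 0 and leaves them
 * untouched if no marker was found.
 */
static int
detect_min_version(symbol_lookup_fn lookup, void *lib,
                   const VersionProbe *probes, size_t nprobes,
                   int *major, int *minor)
{
    int found = 0;

    for (size_t i = 0; i < nprobes; i++)
    {
        if (lookup(lib, probes[i].symbol) == NULL)
            continue;

        if (!found ||
            probes[i].major > *major ||
            (probes[i].major == *major && probes[i].minor > *minor))
        {
            *major = probes[i].major;
            *minor = probes[i].minor;
            found = 1;
        }
    }
    return found;
}

/*
 * Stand-in lookup for illustration only: `lib` is a NULL-terminated array
 * of exported names rather than a real module handle.
 */
static void *
lookup_in_export_list(void *lib, const char *name)
{
    const char **exports = (const char **) lib;

    for (size_t i = 0; exports[i] != NULL; i++)
    {
        if (strcmp(exports[i], name) == 0)
            return (void *) exports[i];
    }
    return NULL;
}
```

On Windows the callback would presumably be a thin wrapper around `GetProcAddress` on the handle returned by `LoadLibrary` for dbghelp.dll. The table contents would have to come from the MSDN page linked above.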

Great. I pulled the latest from your git tree, tested that, and got much 
better results. Crashdump size is back to what I expected. In my test 
code, fcinfo->args and fcinfo->argnull can be examined without problems. 
Backtraces look good; see below. It seems to be including backend 
private memory again now. Thanks _very_ much for your work on this.

fcinfo->flinfo is still inaccessible, but I suspect it's in shared
memory, as it's at 0x00000135. The same goes for fcinfo->resultinfo and
fcinfo->context.

This has me wondering - is it going to be necessary to dump shared 
memory to make many backtraces useful? I just responded to Tom 
mentioning that the patch doesn't currently dump shared memory, but I 
hadn't realized the extent to which it's used for _lots_ more than just 
disk buffers. I'm not sure how to handle dumping shared_buffers when 
someone might be using multi-gigabyte shared_buffers, though. Dumping 
the whole lot would risk sudden out-of-disk-space issues, slowdowns as 
dumps are written, and the backend being "frozen" as it's being dumped 
could delay the system coming back up again. Trying to selectively dump 
critical parts could cause dumps to fail if the system is in early 
startup or a bad state.

The same concern applies to writing backend private memory; it's fine 
most of the time, but if you're doing data warehousing queries with 2GB 
of work_mem, it's going to be nasty having all that extra disk I/O and 
disk space use, not to mention the hold-up while the dump is written. If 
this is something we want to have people running in production "just in 
case" or to track down rare / hard to reproduce faults, that'll be a 
problem.

OTOH, we can't really go poking around in palloc contexts to decide what 
to dump.

I guess we could always do a small, minimalist minidump, then write 
_another_ dump that attempts to include select parts of shm and backend 
private memory.
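
To make the two-stage idea a bit more concrete, here is a rough sketch of how the decision to attempt the second, larger dump might be made. Everything in it is an assumption for illustration: the `DumpPlan` enum, the `plan_crash_dumps` function, the size cap, and the "use at most half of the free disk space" rule are hypothetical and not part of the patch. It only covers the size and disk-space concerns raised above. It does not decide which parts of memory to include.

```c
#include <stdint.h>

typedef enum
{
    DUMP_PLAN_MINIMAL_ONLY,      /* write only the small minidump */
    DUMP_PLAN_MINIMAL_THEN_FULL  /* write the small dump, then try the big one */
} DumpPlan;

/*
 * Decide whether a second, larger dump is worth attempting.
 *
 * est_private_bytes / est_shared_bytes: rough size of the memory that the
 *   larger dump would include (hypothetical inputs, however obtained).
 * free_disk_bytes: free space on the volume holding the dump directory.
 * max_dump_bytes: administrator-chosen cap on the size of a single dump.
 */
static DumpPlan
plan_crash_dumps(uint64_t est_private_bytes, uint64_t est_shared_bytes,
                 uint64_t free_disk_bytes, uint64_t max_dump_bytes)
{
    uint64_t est_total;

    /* Treat an overflowing estimate as "too big". */
    if (est_private_bytes > UINT64_MAX - est_shared_bytes)
        return DUMP_PLAN_MINIMAL_ONLY;
    est_total = est_private_bytes + est_shared_bytes;

    /* Respect the configured cap. */
    if (est_total > max_dump_bytes)
        return DUMP_PLAN_MINIMAL_ONLY;

    /* Leave headroom: never use more than half of the free space. */
    if (est_total > free_disk_bytes / 2)
        return DUMP_PLAN_MINIMAL_ONLY;

    return DUMP_PLAN_MINIMAL_THEN_FULL;
}
```

The small dump is always written first, so a usable backtrace survives even if the larger dump is skipped or fails part-way through.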

I just thought of two other things, too:

- Is it possible for this handler to be called recursively if it fails 
during the handler call? If so, do we need to uninstall the handler 
before attempting a dump to avoid such recursion? I need to do some 
testing and dig around MSDN to find out more about this.

- Can asynchronous events like signals (or their win32 emulation) 
interrupt an executing crash handler, or are they blocked before the 
crash handler is called? If they're not blocked, do we need to try to 
block them before attempting a dump? Again, I need to do some reading on 
this.
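
Whatever the answer to the first question turns out to be, a cheap re-entrancy guard would sidestep it. The sketch below is portable C rather than Win32 code, and all the names in it (`guarded_crash_handler`, `dump_writer_fn`, `faulty_dump_writer`) are made up for illustration. In a real handler, the guard would wrap whatever function ends up calling the dump writer. It says nothing about the second question (asynchronous events), and a plain flag like this is not a substitute for proper synchronization if more than one thread can fault at once.

```c
#include <signal.h>

/* Hypothetical type for "whatever actually writes the dump". */
typedef void (*dump_writer_fn)(void);

/* Set once the handler has been entered; never cleared, since the
 * process is expected to terminate after the handler returns. */
static volatile sig_atomic_t in_crash_handler = 0;

/*
 * Run write_dump() at most once per process.
 * Returns 1 if this call attempted the dump, 0 if it was skipped because
 * the handler was already running (i.e. we faulted inside the handler).
 */
static int
guarded_crash_handler(dump_writer_fn write_dump)
{
    if (in_crash_handler)
        return 0;

    in_crash_handler = 1;
    write_dump();
    return 1;
}

/* Demonstration only: a dump writer that "crashes" by re-entering
 * the handler, as a fault inside the real writer might. */
static int dump_attempts = 0;
static int nested_result = -1;

static void
faulty_dump_writer(void)
{
    dump_attempts++;
    nested_result = guarded_crash_handler(faulty_dump_writer);
}
```

Calling `guarded_crash_handler(faulty_dump_writer)` runs the writer once. The nested call returns 0 instead of recursing.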


Anyway, here's an example of the backtraces I'm currently getting.
They're clearly missing some parameters (possibly because they live in
shm; I'm unsure), but they provide source file+line, argument values
where resolvable, and the call stack itself. Locals are accessible at
all levels of the stack when you go poking around in windbg.

> This dump file has an exception of interest stored in it.
> The stored exception information can be accessed via .ecxr.
> (930.12e8): Access violation - code c0000005 (first/second chance not available)
> eax=00bce2c0 ebx=72d0e800 ecx=000002e4 edx=72cb81c8 esi=000000f0 edi=00000930
> eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
> ntdll!KiFastSystemCallRet:
> 771464f4 c3              ret
> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804
> eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0         nv up ei pl zr na pe nc
> cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010246
> crashme!crashdump_crashme+0x2:
> 73621052 c70001000000    mov     dword ptr [eax],1    ds:0023:00000000=????????
> 0:000> kp
>   *** Stack trace for last set context - .thread/.cxr resets it
> ChildEBP RetAddr
> 00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * fcinfo = 0x015e3318)+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14]
> 00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 0x015e3318, struct ExprContext * econtext = 0x00319410, char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x7362100f)+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824]
> 00bcf30c 0031b760 postgres!ExecEvalFunc(struct FuncExprState * fcache = 0x00000000, struct ExprContext * econtext = 0x00000000, char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x00000000)+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 2260]
> 00bcf338 0031ba83 postgres!ExecTargetList(struct List * targetlist = 0x00000000, struct ExprContext * econtext = 0x00000000, unsigned int * values = 0x00000000, char * isnull = 0x00000000 "", ExprDoneCond * itemIsDone = 0x00000000, ExprDoneCond * isDone = 0x00000000)+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095]
> 00bcf378 0032f074 postgres!ExecProject(struct ProjectionInfo * projInfo = 0x00000000, ExprDoneCond * isDone = 0x00000000)+0x173 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312]
> 00bcf38c 00317e07 postgres!ExecResult(struct ResultState * node = <Memory access error>)+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c @ 157]
> 00bcf39c 00315ccd postgres!ExecProcNode(struct PlanState * node = <Memory access error>)+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @ 361]
> 00bcf3b0 00316ace postgres!ExecutePlan(struct EState * estate = 0x015fd7c8, struct PlanState * planstate = <Memory access error>, CmdType operation = <Memory access error>, char sendTuples = <Memory access error>, long numberTuples = <Memory access error>, ScanDirection direction = NoMovementScanDirection (0n0), struct _DestReceiver * dest = <Memory access error>)+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236]
> 00bcf3e0 0041ec5d postgres!standard_ExecutorRun(struct QueryDesc * queryDesc = <Memory access error>, ScanDirection direction = <Memory access error>, long count = <Memory access error>)+0x8e [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 288]
> 00bcf404 0041f270 postgres!PortalRunSelect(struct PortalData * portal = 0x00000000, char forward = <Memory access error>, long count = <Memory access error>, struct _DestReceiver * dest = <Memory access error>)+0x6d [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 953]
> 00bcf48c 0041c292 postgres!PortalRun(struct PortalData * portal = 0x015fb5b8, long count = 0n2147483647, char isTopLevel = 0n1 '', struct _DestReceiver * dest = 0x015e3418, struct _DestReceiver * altdest = 0x015e3418, char * completionTag = 0x00bcf500 "")+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803]
> 00bcf540 0041cbc5 postgres!exec_simple_query(char * query_string = 0x015fd7d8 "???")+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 1067]
> 00bcf5c4 003e2bdc postgres!PostgresMain(int argc = 0n2, char ** argv = 0x01555138, char * username = 0x00d484a0 "Craig")+0x575 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935]
> 00bcf5e4 003e58a9 postgres!BackendRun(struct Port * port = 0x00000000)+0x19c [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 3562]
> 00bcf788 003475bc postgres!SubPostmasterMain(int argc = 0n13900471, char ** argv = 0x00d41ac5)+0x2f9 [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 4058]
> 00bcf7a0 0051845d postgres!main(int argc = 0n1990922644, char ** argv = 0x7ffdf000)+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c @ 173]
> 00bcf7e4 76ab1194 postgres!__tmainCRTStartup(void)+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 00bcf7f0 7715b495 kernel32!BaseThreadInitThunk+0xe
> 00bcf830 7715b468 ntdll!__RtlUserThreadStart+0x70
> 00bcf848 00000000 ntdll!_RtlUserThreadStart+0x1b






-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

