Re: PATCH: backtraces for error messages

Поиск
Список
Период
Сортировка
От Korry Douglas
Тема Re: PATCH: backtraces for error messages
Дата
Msg-id CAPqDkTOiDQ+9qMVEgba-eZJrygKTQNjzGbuS_Q1axivg0H7RUw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: backtraces for error messages  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers


On Sat, Jun 23, 2018 at 8:22 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 22 June 2018 at 23:26, Korry Douglas <korry.douglas@enterprisedb.com> wrote:
A few misc comments:

In general, +1.

It might be nice to move the stack trace code into a separate function (not static to elog.c) - I have often found it useful to attach backtraces to objects so I can debug complex allocation/deallocation problems.  

Good suggestion.
 

The major expense in capturing a stack trace is in the symbolization of the stack addresses, not the stack walk itself.  You typically want to symbolize the addresses at the time you generate the trace, but that's not always the case.  For example, if you want to record the stack trace of every call to AllocSetAlloc() (and attach the trace to the AllocChunk) performance gets unbearable if you symbolize every trace.  So a flag that specifies whether to symbolize would be nice too. 

libunwind has some nifty caching that makes that a _lot_ cheaper; that's part of why I went with it despite the extra lib requirement.

I've not used libunwind before - looking through the docs now.  It does seem much more flexible than backtrace(), thanks.
 
 
If you don't symbolize, you need a way to capture the address range of each dynamically loaded shared object (so that you can later symbolize using something like addr2line).

Yeah. libunwind doesn't expose any interface to get that information, so you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or capture /proc/self/maps. You have to make sure that info makes it to the log, and is re-output often enough not to be lost to log rotation, and is invalidated and re-output if mappings change due to new lib loads etc. But you don't want to print it all the time either.

Then you have to walk the stack and print the instruction pointers and stack pointers and spit out raw traces for the user to reassemble.

I'd argue that if you're doing the sort of thing where you want a stack of every AllocSetAlloc, you shouldn't be trying to do that in-process. That's where tools like perf, systemtap, etc really come into their own. I'm focused on making additional diagnostic info for errors and key log messages collectable for systems that aren't easily accessed, like users who have 12-hour read/response latencies and security setups as strict as they are insane and nonsensical.

Understood - I realized after I replied that instrumenting AllocSetAlloc() is a bit more complex than I had suggested.  When I need to capture the call stack of each allocation, I store the stack trace in malloc'ed buffers, otherwise I get recursive (in AllocSetAlloc()).  I suspect that's unreasonable for a general-purpose solution.

I'd have no objection to the option to disable symbolic back traces and print the raw call stack. It's trivial to do; in fact, I only removed the ip/sp info to keep the output more concise.

I'm not interested in writing anything to handle the library load address mapping collection etc though. I don't see a really sensible way to do that in a log-based system.

Agreed - I tend to use (saved, not logged) stack traces to find memory leaks that valgrind can't see (or more precisely, excessive resource consumption that is not actually leaked).

Thanks for your consideration.

     -- Korry

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kuntal Ghosh
Дата:
Сообщение: Re: Incorrect fsync handling in pg_basebackup's tar_finish
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: Auto-partitioning in PostgreSQL 10