Discussion: Printing backtrace of postgres processes


Printing backtrace of postgres processes

From:
vignesh C
Date:
Hi,

I would like to propose a way to get the call stack of a postgres process
by connecting to the server. This helps us diagnose problems in a customer
environment when a process is hung or has been running for a long time.
The idea is to implement and expose a pg_print_callstack function.
Internally, the connected backend sends SIGUSR1 to the postmaster after
setting PMSIGNAL_BACKTRACE_EMIT. The postmaster then sends SIGUSR1 to the
target process after setting PROCSIG_BACKTRACE_PRINT, if that process has
access to ProcSignal. As the syslogger and stats processes don't have
access to ProcSignal, multiplexing with SIGUSR1 is not possible for them,
so SIGUSR2 is sent to those processes instead. Once the process receives
the signal, it logs its backtrace.
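(To make the routing concrete, a rough sketch in C of the intended flow.
PMSIGNAL_BACKTRACE_EMIT and PROCSIG_BACKTRACE_PRINT are new enum values
proposed by the attached patch, target_pid stands for the PID being
traced; this is an illustration, not the patch code itself:)

/* In the backend running pg_print_callstack(): ask the postmaster to relay. */
SendPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT);

/* In the postmaster's SIGUSR1 handler: forward the request to the target.
 * Processes without a ProcSignal slot (syslogger, stats collector) would
 * get a plain SIGUSR2 instead. */
if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
    SendProcSignal(target_pid, PROCSIG_BACKTRACE_PRINT, InvalidBackendId);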
Attached is a WIP patch for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
vignesh C <vignesh21@gmail.com> writes:
> The idea here is to implement & expose pg_print_callstack function,
> internally what this function does is, the connected backend will send
> SIGUSR1 signal by setting PMSIGNAL_BACKTRACE_EMIT to the postmaster
> process. Postmaster process will send a SIGUSR1 signal to the process
> by setting PROCSIG_BACKTRACE_PRINT if the process has access to
> ProcSignal. As syslogger process & Stats process don't have access to
> ProcSignal, multiplexing with SIGUSR1 is not possible for these
> processes, hence SIGUSR2 signal will be sent for these processes. Once
> the process receives this signal it will log the backtrace of the
> process.

Surely this is *utterly* unsafe.  You can't do that sort of stuff in
a signal handler.

It might be all right to set a flag that would cause the next
CHECK_FOR_INTERRUPTS to print a backtrace, but I'm not sure
how useful that really is.
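(A minimal sketch of that flag-based approach, purely for illustration --
none of these names exist today:)

/* set from the SIGUSR1/procsignal handler -- nothing else happens there */
static volatile sig_atomic_t PrintBacktracePending = false;

static void
HandlePrintBacktraceInterrupt(void)
{
    PrintBacktracePending = true;
    SetLatch(MyLatch);
}

/* in ProcessInterrupts(), i.e. at the next CHECK_FOR_INTERRUPTS() */
if (PrintBacktracePending)
{
    PrintBacktracePending = false;
    LogBackTrace();     /* e.g. backtrace()/backtrace_symbols() + elog */
}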

The proposed postmaster.c addition seems quite useless, as there
is exactly one stack trace it could ever log.

I would like to see some discussion of the security implications
of such a feature, as well.  ("There aren't any" is the wrong
answer.)

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Sun, Nov 22, 2020 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> vignesh C <vignesh21@gmail.com> writes:
> > The idea here is to implement & expose pg_print_callstack function,
> > internally what this function does is, the connected backend will send
> > SIGUSR1 signal by setting PMSIGNAL_BACKTRACE_EMIT to the postmaster
> > process. Postmaster process will send a SIGUSR1 signal to the process
> > by setting PROCSIG_BACKTRACE_PRINT if the process has access to
> > ProcSignal. As syslogger process & Stats process don't have access to
> > ProcSignal, multiplexing with SIGUSR1 is not possible for these
> > processes, hence SIGUSR2 signal will be sent for these processes. Once
> > the process receives this signal it will log the backtrace of the
> > process.
>
> Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> a signal handler.
>
> It might be all right to set a flag that would cause the next
> CHECK_FOR_INTERRUPTS to print a backtrace, but I'm not sure
> how useful that really is.
>
> The proposed postmaster.c addition seems quite useless, as there
> is exactly one stack trace it could ever log.
>
> I would like to see some discussion of the security implications
> of such a feature, as well.  ("There aren't any" is the wrong
> answer.)

Hi Hackers,

Any thoughts on the security implications of this feature?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:
> Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> a signal handler.

Not safely, anyway. The signal handler could be called in the middle
of a malloc(), a pfree(), or all sorts of other exciting
circumstances. It'd have to be extremely careful to use only local
resources on the stack and I don't see how that's feasible here.

It'll work - most of the time. But it could explode messily and
excitingly just when you actually need it to work properly, which is
rarely what you want from features clearly intended for production
debugging.

(It'd be interesting to add some test infrastructure that helps us
fire signal handlers at awkward times in a controlled manner, so we
could predictably test signal handler re-entrancy and concurrency
behaviour.)

> It might be all right to set a flag that would cause the next
> CHECK_FOR_INTERRUPTS to print a backtrace, but I'm not sure
> how useful that really is.

I find that when I most often want a backtrace of a running, live
backend, it's because the backend is doing something that isn't
passing a CHECK_FOR_INTERRUPTS() so it's not responding to signals. So
it wouldn't help if a backend is waiting on an LWLock, busy in a
blocking call to some loaded library, a blocking syscall, etc. But
there are enough other times I want live backtraces, and I'm not the
only one whose needs matter.

It'd be kinda handy when collecting samples of some backend's activity
when it's taking an excessively long time doing something
indeterminate. I generally use a gdb script for that because
unfortunately the Linux trace tool situation is so hopeless that I
can't rely on perf or systemtap being present, working, and
understanding the same command line arguments across various distros
and versions, so something built-in would be convenient. That
convenience would probably be counterbalanced by the hassle of
extracting the results from the log files unless it's possible to use
a function in one backend to query the stack in another instead of
just writing it to the log.

So a weak +1 from me, assuming printing stacks from
CHECK_FOR_INTERRUPTS() to the log. We already have most of the
infrastructure for that so the change is trivial, and we already trust
anyone who can read the log, so it seems like a pretty low-cost,
low-risk change.

I'd be somewhat more in favour if the results could be obtained from a
function or view in another backend, but then the security
implications get a bit more exciting if we let non-superusers do it.

You may recall that I wanted to do something similar a while ago in
order to request MemoryContextStats() without needing to attach gdb
and invoke a function manually using ptrace(). Also changes to support
reading TCP socket state for a process. So I find this sort of thing
useful in general.

If we're querying one backend from another we could read its stack
with ptrace() and unwind it with libunwind within the requesting
backend, which would be a whole lot safer to execute and would work
fine even when blocked in syscalls or synchronous library calls. See
the eu-stack command from elfutils. If the target backend had shared
libraries loaded that the requesting backend didn't, libunwind could
load the debuginfo for us if available. The downsides would be that
many system lockdowns disable ptrace() for non-root even for
same-user-owned processes, and that we'd need to depend on libunwind
(or other platform equivalents) so it'd probably be a contrib. In
which case you have to wonder if it's that much better than running
`system("eu-stack $pid")` in plperl or a trivial custom C extension
function.
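For reference, a rough self-contained sketch of that remote-unwind
approach using libunwind-ptrace (more or less what eu-stack does
internally; error handling omitted, and it assumes ptrace() on the target
is permitted):

/* Sketch only: print the stack of another process via ptrace + libunwind.
 * Build with: cc stack.c -lunwind-ptrace -lunwind-generic */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <libunwind.h>
#include <libunwind-ptrace.h>

int
main(int argc, char **argv)
{
    pid_t       pid = (pid_t) atoi(argv[1]);
    unw_addr_space_t as = unw_create_addr_space(&_UPT_accessors, 0);
    void       *upt = _UPT_create(pid);
    unw_cursor_t cursor;

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);
    waitpid(pid, NULL, 0);

    if (unw_init_remote(&cursor, as, upt) == 0)
    {
        do
        {
            unw_word_t  ip, off;
            char        name[256] = "??";

            unw_get_reg(&cursor, UNW_REG_IP, &ip);
            unw_get_proc_name(&cursor, name, sizeof(name), &off);
            printf("%#lx %s+%#lx\n", (unsigned long) ip, name, (unsigned long) off);
        } while (unw_step(&cursor) > 0);
    }

    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    _UPT_destroy(upt);
    unw_destroy_addr_space(as);
    return 0;
}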

> I would like to see some discussion of the security implications
> of such a feature, as well.  ("There aren't any" is the wrong
> answer.)

If the stack only goes to the log, I actually don't think there are
significant security implications beyond those we already have with
our existing backtrace printing features. We already trust anyone who
can read the log almost completely, and we can already emit stacks to
the log. But I'd still want it to be gated superuser-only, or a role
that's GRANTable by superuser only by default, since it exposes
arbitrary internals of the server.

"same user id" matching is not sufficient. A less-privileged session
user might be calling into SECURITY DEFINER code, code from a
privileged view, or sensitive C library code. Checking for the
effective user is no better because the effective user might be less
privileged at the moment the bt is requested, but the state up-stack
might reveal sensitive information from a more privileged user.



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Craig Ringer <craig.ringer@enterprisedb.com> writes:
>> I would like to see some discussion of the security implications
>> of such a feature, as well.  ("There aren't any" is the wrong
>> answer.)

> If the stack only goes to the log, I actually don't think there are
> significant security implications beyond those we already have with
> our existing backtrace printing features. We already trust anyone who
> can read the log almost completely, and we can already emit stacks to
> the log. But I'd still want it to be gated superuser-only, or a role
> that's GRANTable by superuser only by default, since it exposes
> arbitrary internals of the server.

The concerns that I had were that the patch as submitted provides a
mechanism that causes ALL processes in the system to dump backtraces,
not a targeted request; and that it allows any user to issue such
requests at an unbounded rate.  That seems like a really easy route
to denial of service.  There's also a question of whether you'd even
get intelligible results from dozens of processes simultaneously
dumping many-line messages to the same place.  (This might work out
all right if you're using our syslogger, but it probably would not
with any other logging technology.)

I'd feel better about it if the mechanism had you specify exactly
one target process, and were restricted to a superuser requestor.

I'm not excited about adding on frammishes like letting one process
extract another's stack trace.  I think that just adds more points
of failure, which is a bad thing in a feature that you're only going
to care about when things are a bit pear-shaped already.

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:
On Tue, 1 Dec 2020 at 07:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I'd feel better about it if the mechanism had you specify exactly
> one target process, and were restricted to a superuser requestor.

Er, rather. I actually assumed the former was the case already, not
having looked closely yet.



Re: Printing backtrace of postgres processes

From:
Andres Freund
Date:
Hi,

On 2020-11-22 01:25:08 -0500, Tom Lane wrote:
> Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> a signal handler.

That's of course true for the current implementation - but I don't think
it's a fundamental constraint. With a bit of care backtrace() and
backtrace_symbols() itself can be signal safe:

> backtrace()  and  backtrace_symbols_fd()  don't  call  malloc() explicitly, but they are part of libgcc, which gets
loadeddynamically when first
 
> used.  Dynamic loading usually triggers a call to malloc(3).  If you need certain calls to these two functions to not
allocatememory (in  signal
 
> handlers, for example), you need to make sure libgcc is loaded beforehand.

It should be quite doable to emit such backtraces directly to stderr,
instead of using appendStringInfoString()/elog(). Or even use a static
buffer.
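(A standalone illustration of that, following the man page caveat quoted
above: load libgcc once up front, then the handler only touches a static
buffer and backtrace()/backtrace_symbols_fd(), which writes straight to a
file descriptor:)

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void
dump_stack(int signo)
{
    static void *frames[100];   /* static buffer: no allocation in the handler */
    int         nframes = backtrace(frames, 100);

    /* backtrace_symbols_fd() writes to the fd directly, without malloc() */
    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
}

int
main(void)
{
    void       *dummy[1];

    backtrace(dummy, 1);        /* force libgcc to load before any signal arrives */
    signal(SIGUSR1, dump_stack);
    for (;;)
        pause();                /* kill -USR1 <pid> to see the trace */
}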

It does have quite some appeal to be able to debug production workloads
where queries can't be cancelled etc. And knowing that backtraces
reliably work in case of SIGQUIT etc is also nice...


> I would like to see some discussion of the security implications
> of such a feature, as well.  ("There aren't any" is the wrong
> answer.)

+1

Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

From:
Andres Freund
Date:
Hi,

On 2020-11-30 13:35:46 +0800, Craig Ringer wrote:
> I find that when I most often want a backtrace of a running, live
> backend, it's because the backend is doing something that isn't
> passing a CHECK_FOR_INTERRUPTS() so it's not responding to signals. So
> it wouldn't help if a backend is waiting on an LWLock, busy in a
> blocking call to some loaded library, a blocking syscall, etc. But
> there are enough other times I want live backtraces, and I'm not the
> only one whose needs matter.

Random thought: Wonder if it could be worth adding a conditionally
compiled mode where we track what the longest time between two
CHECK_FOR_INTERRUPTS() calls is (with some extra logic for client
IO).

Obviously the regression tests don't tend to hit the worst cases of
CHECK_FOR_INTERRUPTS()-less code, but even if they did, we currently
wouldn't know it from running the regression tests.
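(Something along these lines, conditionally compiled, would be enough to
capture that; all names here are invented for illustration:)

#ifdef TRACK_CFI_GAPS
/* remember the longest gap between two CHECK_FOR_INTERRUPTS() calls */
static instr_time  last_cfi;
static double      max_cfi_gap_ms = 0;

#define CHECK_FOR_INTERRUPTS_TIMED() \
    do { \
        instr_time  cfi_now; \
        INSTR_TIME_SET_CURRENT(cfi_now); \
        if (!INSTR_TIME_IS_ZERO(last_cfi)) \
        { \
            instr_time  cfi_gap = cfi_now; \
            INSTR_TIME_SUBTRACT(cfi_gap, last_cfi); \
            if (INSTR_TIME_GET_MILLISEC(cfi_gap) > max_cfi_gap_ms) \
                max_cfi_gap_ms = INSTR_TIME_GET_MILLISEC(cfi_gap); \
        } \
        last_cfi = cfi_now; \
        CHECK_FOR_INTERRUPTS(); \
    } while (0)
#endif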


Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:
On Tue, 1 Dec 2020 at 11:31, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-11-30 13:35:46 +0800, Craig Ringer wrote:
> > I find that when I most often want a backtrace of a running, live
> > backend, it's because the backend is doing something that isn't
> > passing a CHECK_FOR_INTERRUPTS() so it's not responding to signals. So
> > it wouldn't help if a backend is waiting on an LWLock, busy in a
> > blocking call to some loaded library, a blocking syscall, etc. But
> > there are enough other times I want live backtraces, and I'm not the
> > only one whose needs matter.
>
> Random thought: Wonder if it could be worth adding a conditionally
> compiled mode where we track what the longest time between two
> CHECK_FOR_INTERRUPTS() calls is (with some extra logic for client
> IO).
>
> Obviously the regression tests don't tend to hit the worst cases of
> CHECK_FOR_INTERRUPTS()-less code, but even if they did, we currently
> wouldn't know it from running the regression tests.

We can probably determine that just as well with a perf or systemtap
run on an --enable-dtrace build. Just tag CHECK_FOR_INTERRUPTS() with
a SDT marker then record the timings.

It might be convenient to have it built-in I guess, but if we tag the
site and do the timing/tracing externally we don't have to bother
about conditional compilation and special builds.



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> It should be quite doable to emit such backtraces directly to stderr,
> instead of using appendStringInfoString()/elog().

No, please no.

(1) On lots of logging setups (think syslog), anything that goes to
stderr is just going to wind up in the bit bucket.  I realize that
we have that issue already for memory context dumps on OOM errors,
but that doesn't make it a good thing.

(2) You couldn't really write "to stderr", only to fileno(stderr),
creating issues about interleaving of the output with regular stderr
output.  For instance it's quite likely that the backtrace would
appear before stderr output that had actually been emitted earlier,
which'd be tremendously confusing.

(3) This isn't going to do anything good for my concerns about interleaved
output from different processes, either.

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Tue, Dec 1, 2020 at 9:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > It should be quite doable to emit such backtraces directly to stderr,
> > instead of using appendStringInfoString()/elog().
>
> No, please no.
>
> (1) On lots of logging setups (think syslog), anything that goes to
> stderr is just going to wind up in the bit bucket.  I realize that
> we have that issue already for memory context dumps on OOM errors,
> but that doesn't make it a good thing.
>
> (2) You couldn't really write "to stderr", only to fileno(stderr),
> creating issues about interleaving of the output with regular stderr
> output.  For instance it's quite likely that the backtrace would
> appear before stderr output that had actually been emitted earlier,
> which'd be tremendously confusing.
>
> (3) This isn't going to do anything good for my concerns about interleaved
> output from different processes, either.
>

I felt that if we don't agree on logging to stderr, then even with a
static buffer we might not be able to log, since
send_message_to_server_log calls appendStringInfo. So doing
it from CHECK_FOR_INTERRUPTS may be better.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Tue, Dec 1, 2020 at 2:15 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Dec 1, 2020 at 9:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Andres Freund <andres@anarazel.de> writes:
> > > It should be quite doable to emit such backtraces directly to stderr,
> > > instead of using appendStringInfoString()/elog().
> >
> > No, please no.
> >
> > (1) On lots of logging setups (think syslog), anything that goes to
> > stderr is just going to wind up in the bit bucket.  I realize that
> > we have that issue already for memory context dumps on OOM errors,
> > but that doesn't make it a good thing.
> >
> > (2) You couldn't really write "to stderr", only to fileno(stderr),
> > creating issues about interleaving of the output with regular stderr
> > output.  For instance it's quite likely that the backtrace would
> > appear before stderr output that had actually been emitted earlier,
> > which'd be tremendously confusing.
> >
> > (3) This isn't going to do anything good for my concerns about interleaved
> > output from different processes, either.
> >
>
> I felt if we are not agreeing on logging on the stderr, even using
> static buffer we might not be able to log as
> send_message_to_server_log calls appendStringInfo. I felt that doing
> it from CHECK_FOR_INTERRUPTS may be better.
>

I have implemented printing of backtrace based on handling it in
CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
getting backtrace of any particular process based on the suggestions.
Attached patch has the implementation for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Printing backtrace of postgres processes

From:
Peter Eisentraut
Date:
On 2020-12-08 10:38, vignesh C wrote:
> I have implemented printing of backtrace based on handling it in
> CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> getting backtrace of any particular process based on the suggestions.
> Attached patch has the implementation for the same.
> Thoughts?

Are we willing to use up a signal for this?



Re: Printing backtrace of postgres processes

From:
Andres Freund
Date:
On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> On 2020-12-08 10:38, vignesh C wrote:
> > I have implemented printing of backtrace based on handling it in
> > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > getting backtrace of any particular process based on the suggestions.
> > Attached patch has the implementation for the same.
> > Thoughts?
> 
> Are we willing to use up a signal for this?

Why is a full signal needed? Seems the procsignal infrastructure should
suffice?



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Sat, Jan 16, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> > On 2020-12-08 10:38, vignesh C wrote:
> > > I have implemented printing of backtrace based on handling it in
> > > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > > getting backtrace of any particular process based on the suggestions.
> > > Attached patch has the implementation for the same.
> > > Thoughts?
> >
> > Are we willing to use up a signal for this?
>
> Why is a full signal needed? Seems the procsignal infrastructure should
> suffice?

Most of the processes have access to ProcSignal, and for those the
print-callstack signal is handled through ProcSignal. The pgstat and
syslogger processes do not have access to ProcSignal, so multiplexing
with SIGUSR1 is not possible for them; I therefore handled printing the
callstack for pgstat and syslogger using the SIGUSR2 signal.
This is because shared memory is detached before the pgstat and syslogger
processes are started, using the below:
/* Drop our connection to postmaster's shared memory, as well */
dsm_detach_all();
PGSharedMemoryDetach();

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
"Andres Freund"
Date:
Hi,

On Sat, Jan 16, 2021, at 09:34, vignesh C wrote:
> On Sat, Jan 16, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> > > On 2020-12-08 10:38, vignesh C wrote:
> > > > I have implemented printing of backtrace based on handling it in
> > > > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > > > getting backtrace of any particular process based on the suggestions.
> > > > Attached patch has the implementation for the same.
> > > > Thoughts?
> > >
> > > Are we willing to use up a signal for this?
> >
> > Why is a full signal needed? Seems the procsignal infrastructure should
> > suffice?
> 
> Most of the processes have access to ProcSignal, for these processes
> printing of callstack signal was handled by using ProcSignal. Pgstat
> process & syslogger process do not have access to ProcSignal,
> multiplexing with SIGUSR1 is not possible for these processes. So I
> handled the printing of callstack for pgstat process & syslogger using
> the SIGUSR2 signal.
> This is because shared memory is detached before pgstat & syslogger
> process is started by using the below:
> /* Drop our connection to postmaster's shared memory, as well */
> dsm_detach_all();
> PGSharedMemoryDetach();

Sure. But why is it important enough to support those that we are willing to dedicate a signal to the task? Their
backtraces aren't often interesting, so I think we should just ignore them here.

Andres



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
vignesh C <vignesh21@gmail.com> writes:
> On Sat, Jan 16, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
>> Why is a full signal needed? Seems the procsignal infrastructure should
>> suffice?

> Most of the processes have access to ProcSignal, for these processes
> printing of callstack signal was handled by using ProcSignal. Pgstat
> process & syslogger process do not have access to ProcSignal,
> multiplexing with SIGUSR1 is not possible for these processes. So I
> handled the printing of callstack for pgstat process & syslogger using
> the SIGUSR2 signal.

I'd argue that backtraces for those processes aren't really essential,
and indeed that trying to make the syslogger report its own backtrace
is damn dangerous.

(Personally, I think this whole patch fails the safety-vs-usefulness
tradeoff, but I expect I'll get shouted down.)

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Sat, Jan 16, 2021 at 11:10 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On Sat, Jan 16, 2021, at 09:34, vignesh C wrote:
> > On Sat, Jan 16, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> > > > On 2020-12-08 10:38, vignesh C wrote:
> > > > > I have implemented printing of backtrace based on handling it in
> > > > > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > > > > getting backtrace of any particular process based on the suggestions.
> > > > > Attached patch has the implementation for the same.
> > > > > Thoughts?
> > > >
> > > > Are we willing to use up a signal for this?
> > >
> > > Why is a full signal needed? Seems the procsignal infrastructure should
> > > suffice?
> >
> > Most of the processes have access to ProcSignal, for these processes
> > printing of callstack signal was handled by using ProcSignal. Pgstat
> > process & syslogger process do not have access to ProcSignal,
> > multiplexing with SIGUSR1 is not possible for these processes. So I
> > handled the printing of callstack for pgstat process & syslogger using
> > the SIGUSR2 signal.
> > This is because shared memory is detached before pgstat & syslogger
> > process is started by using the below:
> > /* Drop our connection to postmaster's shared memory, as well */
> > dsm_detach_all();
> > PGSharedMemoryDetach();
>
> Sure. But why is it important enough to support those that we are willing to dedicate a signal to the task? Their
> backtraces aren't often interesting, so I think we should just ignore them here.

Thanks for your comments, Andres. I will ignore it for the processes
which do not have access to ProcSignal. I will make the changes and
post a patch for this soon.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:


On Mon, 18 Jan 2021 at 00:56, vignesh C <vignesh21@gmail.com> wrote:

Thanks for your comments Andres, I will ignore it for the processes
which do not have access to ProcSignal. I will make the changes and
post a patch for this soon.

I think that's sensible.

I've had variable results with glibc's backtrace(), especially on older platforms and/or with external debuginfo, but it's much better than nothing. It's often not feasible to get someone to install gdb and run commands on their production systems - they can be isolated and firewalled or hobbled by painful change policies. Something built into postgres, even if basic, is likely to come in very handy.

Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Sun, Jan 17, 2021 at 10:26 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Sat, Jan 16, 2021 at 11:10 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On Sat, Jan 16, 2021, at 09:34, vignesh C wrote:
> > > On Sat, Jan 16, 2021 at 1:40 AM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> > > > > On 2020-12-08 10:38, vignesh C wrote:
> > > > > > I have implemented printing of backtrace based on handling it in
> > > > > > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > > > > > getting backtrace of any particular process based on the suggestions.
> > > > > > Attached patch has the implementation for the same.
> > > > > > Thoughts?
> > > > >
> > > > > Are we willing to use up a signal for this?
> > > >
> > > > Why is a full signal needed? Seems the procsignal infrastructure should
> > > > suffice?
> > >
> > > Most of the processes have access to ProcSignal, for these processes
> > > printing of callstack signal was handled by using ProcSignal. Pgstat
> > > process & syslogger process do not have access to ProcSignal,
> > > multiplexing with SIGUSR1 is not possible for these processes. So I
> > > handled the printing of callstack for pgstat process & syslogger using
> > > the SIGUSR2 signal.
> > > This is because shared memory is detached before pgstat & syslogger
> > > process is started by using the below:
> > > /* Drop our connection to postmaster's shared memory, as well */
> > > dsm_detach_all();
> > > PGSharedMemoryDetach();
> >
> > Sure. But why is it important enough to support those that we are willing to dedicate a signal to the task? Their
> > backtraces aren't often interesting, so I think we should just ignore them here.
>
> Thanks for your comments Andres, I will ignore it for the processes
> which do not have access to ProcSignal. I will make the changes and
> post a patch for this soon.
>

The attached patch has the fix for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Printing backtrace of postgres processes

From:
Robert Haas
Date:
On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.

I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.

> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)

You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't. EDB support today frequently asks customers to
attach gdb and take a backtrace that way, and that has risks which
this implementation does not: for example, suppose you were unlucky
enough to attach during a spinlock protected critical section, and
suppose you didn't continue the stopped process before the 60 second
timeout expired and some other process caused a PANIC. Even if this
implementation were to end up emitting a backtrace with a spinlock
held, it would remove the risk of leaving the process stopped while
holding a critical lock, and would in that sense be safer. However, as
soon as you make something like this accessible via an SQL callable
function, some people are going to start spamming it. And, as soon as
they do that, any risks inherent in the implementation are multiplied.
If it carries an 0.01% chance of crashing the system, we'll have
people taking production systems down with this all the time. At that
point I wouldn't want the feature, even if the gdb approach had the
same risk (which I don't think it does).

What do you see as the main safety risks here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (Personally, I think this whole patch fails the safety-vs-usefulness
>> tradeoff, but I expect I'll get shouted down.)

> What do you see as the main safety risks here?

The thing that is scaring me the most is the "broadcast" aspect.
For starters, I think that that is going to be useless in the
field because of the likelihood that different backends' stack
traces will get interleaved in whatever log file the traces are
going to.  Also, having hundreds of processes spitting dozens of
lines to the same place at the same time is going to expose any
little weaknesses in your logging arrangements, such as rsyslog's
tendency to drop messages under load.

I think it's got security hazards as well.  If we restricted the
feature to cause a trace of only one process at a time, and required
that process to be logged in as the same database user as the
requestor (or at least someone with the privs of that user, such
as a superuser), that'd be less of an issue.

Beyond that, well, maybe it's all right.  In theory anyplace that
there's a CHECK_FOR_INTERRUPTS should be okay to call elog from;
but it's completely untested whether we can do that and then
continue, as opposed to aborting the transaction or whole session.
I share your estimate that there'll be small-fraction-of-a-percent
hazards that could still add up to dangerous instability if people
go wild with this.

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
Robert Haas
Date:
On Tue, Jan 19, 2021 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The thing that is scaring me the most is the "broadcast" aspect.
> For starters, I think that that is going to be useless in the
> field because of the likelihood that different backends' stack
> traces will get interleaved in whatever log file the traces are
> going to.  Also, having hundreds of processes spitting dozens of
> lines to the same place at the same time is going to expose any
> little weaknesses in your logging arrangements, such as rsyslog's
> tendency to drop messages under load.

+1. I don't think broadcast is a good idea.

> I think it's got security hazards as well.  If we restricted the
> feature to cause a trace of only one process at a time, and required
> that process to be logged in as the same database user as the
> requestor (or at least someone with the privs of that user, such
> as a superuser), that'd be less of an issue.

I am not sure I see a security hazard here. I think the checks for
this should have the same structure as pg_terminate_backend() and
pg_cancel_backend(); whatever is required there should be required
here, too, but not more, unless we have a real clear reason for such
an inconsistency.

> Beyond that, well, maybe it's all right.  In theory anyplace that
> there's a CHECK_FOR_INTERRUPTS should be okay to call elog from;
> but it's completely untested whether we can do that and then
> continue, as opposed to aborting the transaction or whole session.

I guess that's a theoretical risk but it doesn't seem very likely.
And, if we do have such a problem, I think that'd probably be a case
of bad code that we would want to fix either way.

> I share your estimate that there'll be small-fraction-of-a-percent
> hazards that could still add up to dangerous instability if people
> go wild with this.

Right. I was more concerned about whether we could, for example, crash
while inside the function that generates the backtrace, on some
platforms or in some scenarios. That would be super-sad. I assume that
the people who wrote the code tried to make sure that didn't happen
but I don't really know what's reasonable to expect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 19, 2021 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it's got security hazards as well.  If we restricted the
>> feature to cause a trace of only one process at a time, and required
>> that process to be logged in as the same database user as the
>> requestor (or at least someone with the privs of that user, such
>> as a superuser), that'd be less of an issue.

> I am not sure I see a security hazard here. I think the checks for
> this should have the same structure as pg_terminate_backend() and
> pg_cancel_backend(); whatever is required there should be required
> here, too, but not more, unless we have a real clear reason for such
> an inconsistency.

Yeah, agreed.  The "broadcast" option seems inconsistent with doing
things that way, but I don't have a problem with being able to send
a trace signal to the same processes you could terminate.

>> I share your estimate that there'll be small-fraction-of-a-percent
>> hazards that could still add up to dangerous instability if people
>> go wild with this.

> Right. I was more concerned about whether we could, for example, crash
> while inside the function that generates the backtrace, on some
> platforms or in some scenarios. That would be super-sad. I assume that
> the people who wrote the code tried to make sure that didn't happen
> but I don't really know what's reasonable to expect.

Recursion is scary, but it should (I think) not be possible if this
is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
of those in libbacktrace.

One point here is that it might be a good idea to suppress elog.c's
calls to functions in the error context stack.  As we saw in another
connection recently, allowing that to happen makes for a *very*
large increase in the footprint of code that you are expecting to
work at any random CHECK_FOR_INTERRUPTS call site.

BTW, it also looks like the patch is doing nothing to prevent the
backtrace from being sent to the connected client.  I'm not sure
what I think about whether it'd be okay from a security standpoint
to do that on the connection that requested the trace, but I sure
as heck don't want it to happen on connections that didn't.  Also,
whatever you think about security concerns, it seems like there'd be
pretty substantial reentrancy hazards if the interrupt occurs
anywhere in code dealing with normal client communication.

Maybe, given all of these things, we should forget using elog at
all and just emit the trace with fprintf(stderr).  That seems like
it would decrease the odds of trouble by about an order of magnitude.
It would make it more painful to collect the trace in some logging
setups, of course.

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Wed, Jan 20, 2021 at 2:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, Jan 19, 2021 at 12:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think it's got security hazards as well.  If we restricted the
> >> feature to cause a trace of only one process at a time, and required
> >> that process to be logged in as the same database user as the
> >> requestor (or at least someone with the privs of that user, such
> >> as a superuser), that'd be less of an issue.
>
> > I am not sure I see a security hazard here. I think the checks for
> > this should have the same structure as pg_terminate_backend() and
> > pg_cancel_backend(); whatever is required there should be required
> > here, too, but not more, unless we have a real clear reason for such
> > an inconsistency.
>
> Yeah, agreed.  The "broadcast" option seems inconsistent with doing
> things that way, but I don't have a problem with being able to send
> a trace signal to the same processes you could terminate.
>

The current patch supports both getting the trace for all the
processes of the instance and getting the trace for a particular
process. I understand the concern here with broadcasting to all
the connected backends, so I will remove the functionality to get the
trace for all the processes. I will also change the single-process
trace in such a way that a user can get the trace of only the
processes which that user could terminate.

> >> I share your estimate that there'll be small-fraction-of-a-percent
> >> hazards that could still add up to dangerous instability if people
> >> go wild with this.
>
> > Right. I was more concerned about whether we could, for example, crash
> > while inside the function that generates the backtrace, on some
> > platforms or in some scenarios. That would be super-sad. I assume that
> > the people who wrote the code tried to make sure that didn't happen
> > but I don't really know what's reasonable to expect.
>
> Recursion is scary, but it should (I think) not be possible if this
> is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
> of those in libbacktrace.
>
> One point here is that it might be a good idea to suppress elog.c's
> calls to functions in the error context stack.  As we saw in another
> connection recently, allowing that to happen makes for a *very*
> large increase in the footprint of code that you are expecting to
> work at any random CHECK_FOR_INTERRUPTS call site.
>
> BTW, it also looks like the patch is doing nothing to prevent the
> backtrace from being sent to the connected client.  I'm not sure
> what I think about whether it'd be okay from a security standpoint
> to do that on the connection that requested the trace, but I sure
> as heck don't want it to happen on connections that didn't.  Also,
> whatever you think about security concerns, it seems like there'd be
> pretty substantial reentrancy hazards if the interrupt occurs
> anywhere in code dealing with normal client communication.
>
> Maybe, given all of these things, we should forget using elog at
> all and just emit the trace with fprintf(stderr).  That seems like
> it would decrease the odds of trouble by about an order of magnitude.
> It would make it more painful to collect the trace in some logging
> setups, of course.

I would prefer the trace to appear in the log file rather than on the
console; as you rightly pointed out, it will be difficult to collect
the trace from fprintf(stderr). Let me know if I misunderstood. I think
you are concerned about the problem where elog sends the trace to the
client as well. Can we use the LOG_SERVER_ONLY log level, which would
prevent it from being sent to the client? That way we can keep the elog
implementation itself.

Thoughts?


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:
On Wed, 20 Jan 2021 at 01:31, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.

I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.

> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)

You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't.

Agreed.

In theory we should be able to do this sort of thing using external trace and diagnostic tools like perf, systemtap, etc. In practice, these tools tend to be quite version-sensitive and hard to get right without multiple rounds of back-and-forth to deal with specifics of the site's setup, installed debuginfo or lack thereof, specific tool versions, etc.

It's quite common to have to fall back on attaching gdb with a breakpoint on a function in the export symbol table (so it works w/o debuginfo), saving a core, and then analysing the core on a separate system on which debuginfo is available for all the loaded modules. It's a major pain.

The ability to get a basic bt from within Pg is strongly desirable. IIRC gdb's basic unwinder works without external debuginfo, if not especially well. libunwind produces much better results, but that didn't pass the extra-dependency bar when backtracing support was introduced to core postgres.

On a side note, to help get better diagnostics I've also been meaning to look into building --enable-debug with -ggdb3 so we can embed macro info, and using dwz to deduplicate+compress the debuginfo so we can encourage people to install it by default on production. I also want to start exporting pointers to all the important data symbols for diagnostic use, even if we do so in a separate ELF section just for debug use.

Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:
On Wed, 20 Jan 2021 at 05:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Recursion is scary, but it should (I think) not be possible if this
is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
of those in libbacktrace.

We can also hold interrupts for the call, and it might be wise to do so.

One point here is that it might be a good idea to suppress elog.c's
calls to functions in the error context stack.  As we saw in another
connection recently, allowing that to happen makes for a *very*
large increase in the footprint of code that you are expecting to
work at any random CHECK_FOR_INTERRUPTS call site.

I strongly agree. Treat it as errhidecontext().

BTW, it also looks like the patch is doing nothing to prevent the
backtrace from being sent to the connected client.  I'm not sure
what I think about whether it'd be okay from a security standpoint
to do that on the connection that requested the trace, but I sure
as heck don't want it to happen on connections that didn't.

I don't see a good reason to send a bt to a client. Even though these backtraces won't be analysing debuginfo and populating args, locals, etc, it should still just go to the server log.
 
Maybe, given all of these things, we should forget using elog at
all and just emit the trace with fprintf(stderr).

That causes quite a lot of pain with MemoryContextStats() already as it's frequently difficult to actually locate the output given the variations that exist in customer logging configurations. Sometimes stderr goes to a separate file or to journald. It's also much harder to locate the desired output since there's no log_line_prefix. I have a WIP patch floating around somewhere that tries to teach MemoryContextStats to write to the ereport channel when not called during an actual out-of-memory situation for that reason; an early version is somewhere in the archives.

This is one of those "ok in development, painful in production" situations.

So I'm not a big fan of pushing it to stderr, though I'd rather have that than not have the ability at all.

Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Craig Ringer <craig.ringer@enterprisedb.com> writes:
> On Wed, 20 Jan 2021 at 05:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, it also looks like the patch is doing nothing to prevent the
>> backtrace from being sent to the connected client.

> I don't see a good reason to send a bt to a client. Even though these
> backtraces won't be analysing debuginfo and populating args, locals, etc,
> it should still just go to the server log.

Yeah.  That's easier than I was thinking, we just need to
s/LOG/LOG_SERVER_ONLY/.
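(i.e. roughly the following where the trace is emitted, combined with the
earlier point about suppressing error context callbacks; errtrace here
stands for whatever buffer the backtrace text was collected into:)

ereport(LOG_SERVER_ONLY,
        (errhidestmt(true),
         errhidecontext(true),
         errmsg("backtrace of PID %d:\n%s", MyProcPid, errtrace.data)));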

>> Maybe, given all of these things, we should forget using elog at
>> all and just emit the trace with fprintf(stderr).

> That causes quite a lot of pain with MemoryContextStats() already

True.  Given the changes discussed in the last couple messages, I don't
see any really killer reasons why we can't ship the trace through elog.
We can always try that first, and back off to fprintf if we do find
reasons why it's too unstable.

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
Craig Ringer
Date:
On Thu, 21 Jan 2021 at 09:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig.ringer@enterprisedb.com> writes:
> On Wed, 20 Jan 2021 at 05:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, it also looks like the patch is doing nothing to prevent the
>> backtrace from being sent to the connected client.

> I don't see a good reason to send a bt to a client. Even though these
> backtraces won't be analysing debuginfo and populating args, locals, etc,
> it should still just go to the server log.

Yeah.  That's easier than I was thinking, we just need to
s/LOG/LOG_SERVER_ONLY/.

>> Maybe, given all of these things, we should forget using elog at
>> all and just emit the trace with fprintf(stderr).

> That causes quite a lot of pain with MemoryContextStats() already

True.  Given the changes discussed in the last couple messages, I don't
see any really killer reasons why we can't ship the trace through elog.
We can always try that first, and back off to fprintf if we do find
reasons why it's too unstable.

Yep, works for me.

Thanks for being open to considering this.

I know lots of this stuff can seem like a pointless sidetrack because the utility of it is not obvious on dev systems or when you're doing your own hands-on expert support on systems you own and operate yourself. These sorts of things really only start to make sense when you're touching many different postgres systems "in the wild" - such as commercial support services, helping people on -general, -bugs or stackoverflow, etc.

I really appreciate your help with it.

Re: Printing backtrace of postgres processes

From:
Robert Haas
Date:
On Wed, Jan 20, 2021 at 9:24 PM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
> I know lots of this stuff can seem like a pointless sidetrack because the utility of it is not obvious on dev systems
> or when you're doing your own hands-on expert support on systems you own and operate yourself. These sorts of things
> really only start to make sense when you're touching many different postgres systems "in the wild" - such as commercial
> support services, helping people on -general, -bugs or stackoverflow, etc.
>
> I really appreciate your help with it.

Big +1 for all that from me, too.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Thu, Jan 21, 2021 at 7:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Craig Ringer <craig.ringer@enterprisedb.com> writes:
> > On Wed, 20 Jan 2021 at 05:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> BTW, it also looks like the patch is doing nothing to prevent the
> >> backtrace from being sent to the connected client.
>
> > I don't see a good reason to send a bt to a client. Even though these
> > backtraces won't be analysing debuginfo and populating args, locals, etc,
> > it should still just go to the server log.
>
> Yeah.  That's easier than I was thinking, we just need to
> s/LOG/LOG_SERVER_ONLY/.
>
> >> Maybe, given all of these things, we should forget using elog at
> >> all and just emit the trace with fprintf(stderr).
>
> > That causes quite a lot of pain with MemoryContextStats() already
>
> True.  Given the changes discussed in the last couple messages, I don't
> see any really killer reasons why we can't ship the trace through elog.
> We can always try that first, and back off to fprintf if we do find
> reasons why it's too unstable.
>

Thanks to all of you for the suggestions. Attached is a v3 patch with
fixes based on them. It includes the following changes: 1) removed
support for getting the callstack of all postgres processes; a user can
get only one process's callstack, 2) updated the documentation, 3) added
the necessary checks for pg_print_callstack, similar to
pg_terminate_backend, 4) changed LOG to LOG_SERVER_ONLY.
Thoughts?

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
Andres Freund
Date:
Hi,

On 2021-01-27 19:05:16 +0530, vignesh C wrote:

>  /*
> + * LogBackTrace
> + *
> + * Get the backtrace and log the backtrace to log file.
> + */
> +void
> +LogBackTrace(void)
> +{
> +    int            save_errno = errno;
> +
> +    void       *buf[100];
> +    int            nframes;
> +    char      **strfrms;
> +    StringInfoData errtrace;
> +
> +    /* OK to process messages.  Reset the flag saying there are more to do. */
> +    PrintBacktracePending = false;

ISTM that it'd be better to do this in the caller, allowing this
function to be used outside of signal triggered backtraces.

>  
> +extern void LogBackTrace(void); /* Called from EmitProcSignalPrintCallStack */

I don't think this comment is correct anymore?

Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Wed, Jan 27, 2021 at 10:40 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-01-27 19:05:16 +0530, vignesh C wrote:
>
> >  /*
> > + * LogBackTrace
> > + *
> > + * Get the backtrace and log the backtrace to log file.
> > + */
> > +void
> > +LogBackTrace(void)
> > +{
> > +     int                     save_errno = errno;
> > +
> > +     void       *buf[100];
> > +     int                     nframes;
> > +     char      **strfrms;
> > +     StringInfoData errtrace;
> > +
> > +     /* OK to process messages.  Reset the flag saying there are more to do. */
> > +     PrintBacktracePending = false;
>
> ISTM that it'd be better to do this in the caller, allowing this
> function to be used outside of signal triggered backtraces.
>
> >
> > +extern void LogBackTrace(void); /* Called from EmitProcSignalPrintCallStack */
>
> I don't think this comment is correct anymore?

Thanks for the comments. I have fixed these and attached an updated patch.
Thoughts?

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Thu, Jan 28, 2021 at 5:22 PM vignesh C <vignesh21@gmail.com> wrote:
> Thanks for the comments, I have fixed and attached an updated patch
> with the fixes for the same.
> Thoughts?

Thanks for the patch. Here are few comments:

1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
check_valid_pid?

2) How about following in pg_signal_backend for more readability
+    if (ret != SIGNAL_BACKEND_SUCCESS)
+        return ret;
instead of
+    if (ret)
+        return ret;

3) How about validate_backend_pid or some better name instead of
check_valid_pid?

4) How about following
+                     errmsg("must be a superuser to print backtrace
of backend process")));
instead of
+                     errmsg("must be a superuser to print backtrace
of superuser query process")));

5) How about following
                     errmsg("must be a member of the role whose backed
process's backtrace is being printed or member of
pg_signal_backend")));
instead of
+                     errmsg("must be a member of the role whose
backtrace is being logged or member of pg_signal_backend")));

6) I'm not sure whether "backtrace" or "call stack" is a generic term
from the user/developer perspective. In the patch, the function name
and documentation says callstack (I think it is "call stack" actually),
but the error/warning messages says backtrace. IMHO, having
"backtrace" everywhere in the patch, even the function name changed to
pg_print_backtrace, looks better and consistent. Thoughts?

7) How about following in pg_print_callstack?
{
    int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
    bool        result = false;

        if (r == SIGNAL_BACKEND_SUCCESS)
        {
            if (EmitProcSignalPrintCallStack(bt_pid))
                result = true;
            else
                ereport(WARNING,
                        (errmsg("failed to send signal to postmaster: %m")));
        }

        PG_RETURN_BOOL(result);
}

8) How about following
+                (errmsg("backtrace generation is not supported by
this PostgreSQL installation")));
instead of
+                (errmsg("backtrace generation is not supported by
this installation")));

9) Typo - it's "example" +2) Using "addr2line -e postgres address", For exmple:

10) How about
+ * Handle print backtrace signal
instead of
+ * Handle receipt of an print backtrace.

11) Isn't the below documentation specific to the Linux platform? What
happens if GDB is not there on the platform?
+<programlisting>
+1)  "info line *address" from gdb on postgres executable. For example:
+gdb ./postgres
+GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7

12) +The callstack will be logged in the log file. What happens if the
server is started without a log file, e.g. ./pg_ctl -D data start? Where
will the backtrace go?

13) Not sure, if it's an overkill, but how about pg_print_callstack
returning a warning/notice along with true, which just says, "See
<<<full log file name along with log directory>>>". Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
Thanks Bharath for your review comments. Please find my comments inline below.

On Thu, Jan 28, 2021 at 7:40 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 5:22 PM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for the comments, I have fixed and attached an updated patch
> > with the fixes for the same.
> > Thoughts?
>
> Thanks for the patch. Here are few comments:
>
> 1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
> check_valid_pid?
>

I did not want to use SIGNAL_BACKEND_SUCCESS as we have not yet
signalled the backend process at this time. I have added
BACKEND_VALIDATION_SUCCESS macro and used it here for better
readability.

> 2) How about following in pg_signal_backend for more readability
> +    if (ret != SIGNAL_BACKEND_SUCCESS)
> +        return ret;
> instead of
> +    if (ret)
> +        return ret;
>

Modified it to ret != BACKEND_VALIDATION_SUCCESS

> 3) How about validate_backend_pid or some better name instead of
> check_valid_pid?
>

Modified it to validate_backend_pid

> 4) How about following
> +                     errmsg("must be a superuser to print backtrace
> of backend process")));
> instead of
> +                     errmsg("must be a superuser to print backtrace
> of superuser query process")));
>

Here the message should include "superuser"; we cannot remove it. A
non-superuser can log a non-superuser's backtrace, provided the user has
permission for it.

> 5) How about following
>                      errmsg("must be a member of the role whose backed
> process's backtrace is being printed or member of
> pg_signal_backend")));
> instead of
> +                     errmsg("must be a member of the role whose
> backtrace is being logged or member of pg_signal_backend")));
>

Modified it.

> 6) I'm not sure whether "backtrace" or "call stack" is a generic term
> from the user/developer perspective. In the patch, the function name
> and documentation says callstack(I think it is "call stack" actually),
> but the error/warning messages says backtrace. IMHO, having
> "backtrace" everywhere in the patch, even the function name changed to
> pg_print_backtrace, looks better and consistent. Thoughts?
>

Modified it to pg_print_backtrace.

> 7) How about following in pg_print_callstack?
> {
>     int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
>     bool        result = false;
>
>         if (r == SIGNAL_BACKEND_SUCCESS)
>         {
>             if (EmitProcSignalPrintCallStack(bt_pid))
>                 result = true;
>             else
>                 ereport(WARNING,
>                         (errmsg("failed to send signal to postmaster: %m")));
>         }
>
>         PG_RETURN_BOOL(result);
> }
>

Modified similarly with slight change.

> 8) How about following
> +                (errmsg("backtrace generation is not supported by
> this PostgresSQL installation")));
> instead of
> +                (errmsg("backtrace generation is not supported by
> this installation")));
>

I used the existing message to maintain consistency with
set_backtrace. I feel we can keep it the same.

> 9) Typo - it's "example" +2) Using "addr2line -e postgres address", For exmple:
>

Modified it.

> 10) How about
> + * Handle print backtrace signal
> instead of
> + * Handle receipt of an print backtrace.
>

I used the existing message to maintain consistency similar to
HandleProcSignalBarrierInterrupt. I feel we can keep it the same.

> 11) Isn't below in documentation specific to Linux platform. What
> happens if GDB is not there on the platform?
> +<programlisting>
> +1)  "info line *address" from gdb on postgres executable. For example:
> +gdb ./postgres
> +GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>

I have changed it to "You can get the file name and line number by
using gdb/addr2line in linux platforms, as a prerequisite users must
ensure gdb/addr2line is already installed".

Users will get a warning like this on Windows:
select pg_print_backtrace(pg_backend_pid());
WARNING:  backtrace generation is not supported by this installation
 pg_print_callstack
--------------------
 f
(1 row)

The backtrace will not be logged on Windows; it will throw the
warning "backtrace generation is not supported by this installation".
Thoughts?

> 12) +The callstack will be logged in the log file. What happens if the
> server is started without a log file , ./pg_ctl -D data start? Where
> will the backtrace go?
>

Updated to: The backtrace will be logged to the log file if logging is
enabled; if logging is disabled, the backtrace will be logged to the
console where the postmaster was started.

> 13) Not sure, if it's an overkill, but how about pg_print_callstack
> returning a warning/notice along with true, which just says, "See
> <<<full log file name along with log directory>>>". Thoughts?

As you rightly pointed out, it would be overkill; I feel the existing
behavior is easily understandable.

Attached v5 patch has the fixes for the same.
Thoughts?

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
> > 4) How about following
> > +                     errmsg("must be a superuser to print backtrace
> > of backend process")));
> > instead of
> > +                     errmsg("must be a superuser to print backtrace
> > of superuser query process")));
> >
>
> Here the message should include superuser, we cannot remove it. Non
> super user can log non super user provided if user has permissions for
> it.
>
> > 5) How about following
> >                      errmsg("must be a member of the role whose backed
> > process's backtrace is being printed or member of
> > pg_signal_backend")));
> > instead of
> > +                     errmsg("must be a member of the role whose
> > backtrace is being logged or member of pg_signal_backend")));
> >
>
> Modified it.

Maybe I'm confused about the difference between the
SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
their corresponding error messages. Some clarification or a use case
showing in which scenarios we hit those error messages might help me.
Did we try to add test cases that show up these error messages for
pg_print_backtrace? If not, can we add them?

> Attached v5 patch has the fixes for the same.
> Thoughts?

Thanks. Here are some comments on v5 patch:

1) typo - it's "process" not "porcess" +    a backend porcess. For example:

2) select * from pg_print_backtrace(NULL);
postgres=# select proname, proisstrict from pg_proc where proname =
'pg_print_backtrace';
      proname       | proisstrict
--------------------+-------------
 pg_print_backtrace | t

 See the documentation:
 "proisstrict bool

Function returns null if any call argument is null. In that case the
function won't actually be called at all. Functions that are not
“strict” must be prepared to handle null inputs."
So the below PG_ARGISNULL check is not needed; you can remove it, because
pg_print_backtrace will not get called with null input.
int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);

3) Can we just set result = true instead of PG_RETURN_BOOL(true); so
that it will be returned from PG_RETURN_BOOL(result); just for
consistency?
            if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT,
InvalidBackendId))
                PG_RETURN_BOOL(true);
            else
                ereport(WARNING,
                        (errmsg("failed to send signal to postmaster: %m")));
        }

        PG_RETURN_BOOL(result);

4) Below is what happens if I request a backtrace of the
postmaster process. 1388210 is the pid of the postmaster.
postgres=# select * from pg_print_backtrace(1388210);
WARNING:  PID 1388210 is not a PostgreSQL server process
 pg_print_backtrace
--------------------
 f

Does it make sense to have a postmaster's backtrace? If yes, can we
have something like below in sigusr1_handler()?
    if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
    {
        LogBackTrace();
    }

5) Can we have PROCSIG_PRINT_BACKTRACE instead of
PROCSIG_BACKTRACE_PRINT, just for readability and consistency with
function name?

6) I think it's not the postmaster that prints backtrace of the
requested backend and we don't send SIGUSR1 with
PMSIGNAL_BACKTRACE_EMIT to postmaster, I think it's the backends, upon
receiving SIGUSR1 with PMSIGNAL_BACKTRACE_EMIT signal, log their own
backtrace. Am I missing anything here? If I'm correct, we need to
change the below description and other places wherever we refer to
this description.

The idea here is to implement & expose pg_print_backtrace function, internally
what this function does is, the connected backend will send SIGUSR1 signal by
setting PMSIGNAL_BACKTRACE_EMIT to the postmaster process. Once the process
receives this signal it will log the backtrace of the process.

7) Can we get the parallel worker's backtrace? IIUC it's possible.

8) What happens if a user runs pg_print_backtrace() on Windows or
MacOS or some other platform? Do you want to say something about other
platforms where gdb/addr2line doesn't exist?
+</programlisting>
+    You can get the file name and line number by using gdb/addr2line in
+    linux platforms, as a prerequisite users must ensure gdb/addr2line is
+    already installed:
+<programlisting>

9) Can't we reuse set_backtrace with just adding a flag to
set_backtrace(ErrorData *edata, int num_skip, bool
is_print_backtrace_function), making it a non-static function and call
set_backtrace(NULL, 0, true)?

void
set_backtrace(ErrorData *edata, int num_skip, bool is_print_backtrace_function)
{
    StringInfoData errtrace;
-------
-------
    if (is_print_backtrace_function)
        elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
    else
        edata->backtrace = errtrace.data;
}

I think it will be good if we do this, because we can avoid duplicate
code in set_backtrace and LogBackTrace.

10) I think it's "pg_signal_backend" instead of "pg_print_backtrace"?
+        backtrace is being printed or the calling role has been granted
+        <literal>pg_print_backtrace</literal>, however only superusers can

11) In the documentation added, isn't it good if we talk a bit about
in which scenarios users can use this function? For instance,
something like "Use pg_print_backtrace to  know exactly where it's
currently waiting when a backend process hangs."?


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
> > > 4) How about following
> > > +                     errmsg("must be a superuser to print backtrace
> > > of backend process")));
> > > instead of
> > > +                     errmsg("must be a superuser to print backtrace
> > > of superuser query process")));
> > >
> >
> > Here the message should include superuser, we cannot remove it. Non
> > super user can log non super user provided if user has permissions for
> > it.
> >
> > > 5) How about following
> > >                      errmsg("must be a member of the role whose backed
> > > process's backtrace is being printed or member of
> > > pg_signal_backend")));
> > > instead of
> > > +                     errmsg("must be a member of the role whose
> > > backtrace is being logged or member of pg_signal_backend")));
> > >
> >
> > Modified it.
>
> Maybe I'm confused here to understand the difference between
> SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> corresponding error messages. Some clarification/use case to know in
> which scenarios we hit those error messages might help me. Did we try
> to add test cases that show up these error messages for
> pg_print_backtrace? If not, can we add?

Are these superuser and permission checks enough from a security
standpoint that we don't expose some sensitive information to the
user? Although I'm not sure, say from the backtrace printed and
attached to GDB, can users see the passwords or other sensitive
information from the system that they aren't supposed to see?

I'm sure this point would have been discussed upthread.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Kyotaro Horiguchi
Date:
At Fri, 29 Jan 2021 19:10:24 +0530, vignesh C <vignesh21@gmail.com> wrote in 
> Attached v5 patch has the fixes for the same.

PMSIGNAL_BACKTRACE_EMIT is not used anywhere?
(the commit message seems stale.)

+++ b/src/bin/pg_ctl/t/005_backtrace.pl

pg_ctl doesn't (or no longer?) seem relevant to this patch.


+    eval {
+        $current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
+    };
+    last unless $@;

Is there any reason not just to do "while (! -e <filename>) { usleep(); }" ?


+logging_collector = on

I don't see a reason for turning on logging collector here.


+gdb ./postgres
+GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
+Copyright (C) 2013 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later
<literal><</literal>http://gnu.org/licenses/gpl.html<literal>></literal>
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
+and "show warranty" for details.
+This GDB was configured as "x86_64-redhat-linux-gnu".
+For bug reporting instructions, please see:
+<literal><</literal>http://www.gnu.org/software/gdb/bugs/<literal>></literal>...
+Reading symbols from /home/postgresdba/inst/bin/postgres...done.

Almost all of the banner lines seem to be useless here.


 #define SIGNAL_BACKEND_SUCCESS 0
+#define BACKEND_VALIDATION_SUCCESS 0
 #define SIGNAL_BACKEND_ERROR 1
 #define SIGNAL_BACKEND_NOPERMISSION 2
 #define SIGNAL_BACKEND_NOSUPERUSER 3

Even though I can share the feeling that SIGNAL_BACKEND_SUCCESS is a
bit odd to represent "sending signal is allowed", I feel that it's
better using the existing symbol than using the new symbol.


+validate_backend_pid(int pid)

The function needs a comment. The name is somewhat
confusing. check_privilege_to_send_signal()?

Maybe the return value of the function should be changed to an enum,
and its callers should use switch-case to handle the value.


+            if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT, InvalidBackendId))
+                PG_RETURN_BOOL(true);
+            else
+                ereport(WARNING,
+                        (errmsg("failed to send signal to postmaster: %m")));
+        }
+
+        PG_RETURN_BOOL(result);

The variable "result" seems useless.


+    elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
+
+    errno = save_errno;
+}

You need to release the resources held by the errtrace. And the name
errtrace is a bit pointless. Why isn't it "backtrace"?
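
A minimal sketch of the fix (keeping errtrace as the variable name; note that
in elog.c a local literally named "backtrace" would shadow the libc
backtrace() function called in the same routine, which may be why the existing
code avoids that name):

    elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
    pfree(errtrace.data);    /* the StringInfo buffer is palloc'd; release it once logged */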

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Printing backtrace of postgres processes

From:
Dilip Kumar
Date:
On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Thanks Bharath for your review comments. Please find my comments inline below.
>
> On Thu, Jan 28, 2021 at 7:40 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 5:22 PM vignesh C <vignesh21@gmail.com> wrote:
> > > Thanks for the comments, I have fixed and attached an updated patch
> > > with the fixes for the same.
> > > Thoughts?
> >
> > Thanks for the patch. Here are few comments:
> >
> > 1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
> > check_valid_pid?
> >
>
> I did not want to use SIGNAL_BACKEND_SUCCESS as we have not yet
> signalled the backend process at this time. I have added
> BACKEND_VALIDATION_SUCCESS macro and used it here for better
> readability.
>
> > 2) How about following in pg_signal_backend for more readability
> > +    if (ret != SIGNAL_BACKEND_SUCCESS)
> > +        return ret;
> > instead of
> > +    if (ret)
> > +        return ret;
> >
>
> Modified it to ret != BACKEND_VALIDATION_SUCCESS
>
> > 3) How about validate_backend_pid or some better name instead of
> > check_valid_pid?
> >
>
> Modified it to validate_backend_pid
>
> > 4) How about following
> > +                     errmsg("must be a superuser to print backtrace
> > of backend process")));
> > instead of
> > +                     errmsg("must be a superuser to print backtrace
> > of superuser query process")));
> >
>
> Here the message should include superuser, we cannot remove it. Non
> super user can log non super user provided if user has permissions for
> it.
>
> > 5) How about following
> >                      errmsg("must be a member of the role whose backed
> > process's backtrace is being printed or member of
> > pg_signal_backend")));
> > instead of
> > +                     errmsg("must be a member of the role whose
> > backtrace is being logged or member of pg_signal_backend")));
> >
>
> Modified it.
>
> > 6) I'm not sure whether "backtrace" or "call stack" is a generic term
> > from the user/developer perspective. In the patch, the function name
> > and documentation says callstack(I think it is "call stack" actually),
> > but the error/warning messages says backtrace. IMHO, having
> > "backtrace" everywhere in the patch, even the function name changed to
> > pg_print_backtrace, looks better and consistent. Thoughts?
> >
>
> Modified it to pg_print_backtrace.
>
> > 7) How about following in pg_print_callstack?
> > {
> >     int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
> >     bool        result = false;
> >
> >         if (r == SIGNAL_BACKEND_SUCCESS)
> >         {
> >             if (EmitProcSignalPrintCallStack(bt_pid))
> >                 result = true;
> >             else
> >                 ereport(WARNING,
> >                         (errmsg("failed to send signal to postmaster: %m")));
> >         }
> >
> >         PG_RETURN_BOOL(result);
> > }
> >
>
> Modified similarly with slight change.
>
> > 8) How about following
> > +                (errmsg("backtrace generation is not supported by
> > this PostgresSQL installation")));
> > instead of
> > +                (errmsg("backtrace generation is not supported by
> > this installation")));
> >
>
> I used the existing message to maintain consistency with
> set_backtrace. I feel we can keep it the same.
>
> > 9) Typo - it's "example" +2) Using "addr2line -e postgres address", For exmple:
> >
>
> Modified it.
>
> > 10) How about
> > + * Handle print backtrace signal
> > instead of
> > + * Handle receipt of an print backtrace.
> >
>
> I used the existing message to maintain consistency similar to
> HandleProcSignalBarrierInterrupt. I feel we can keep it the same.
>
> > 11) Isn't below in documentation specific to Linux platform. What
> > happens if GDB is not there on the platform?
> > +<programlisting>
> > +1)  "info line *address" from gdb on postgres executable. For example:
> > +gdb ./postgres
> > +GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> >
>
> I have made changes "You can get the file name and line number by
> using gdb/addr2line in linux platforms, as a prerequisite users must
> ensure gdb/addr2line is already installed".
>
> User will get an error like this in windows:
> select pg_print_backtrace(pg_backend_pid());
> WARNING:  backtrace generation is not supported by this installation
>  pg_print_callstack
> --------------------
>  f
> (1 row)
>
> The backtrace will not be logged in case of windows, it will throw a
> warning "backtrace generation is not supported by this installation"
> Thoughts?
>
> > 12) +The callstack will be logged in the log file. What happens if the
> > server is started without a log file , ./pg_ctl -D data start? Where
> > will the backtrace go?
> >
>
> Updated to: The backtrace will be logged to the log file if logging is
> enabled, if logging is disabled backtrace will be logged to the
> console where the postmaster was started.
>
> > 13) Not sure, if it's an overkill, but how about pg_print_callstack
> > returning a warning/notice along with true, which just says, "See
> > <<<full log file name along with log directory>>>". Thoughts?
>
> As you rightly pointed out it will be an overkill, I feel the existing
> is easily understandable.
>
> Attached v5 patch has the fixes for the same.
> Thoughts?

1.
+ elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
+
+ errno = save_errno;

Can you add a comment explaining why we have chosen LOG_SERVER_ONLY?

2.
+pg_print_backtrace(PG_FUNCTION_ARGS)
+{
+ int bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+

The variable name bt_pid is a bit odd, can we just use pid?

3.
+Datum
+pg_print_backtrace(PG_FUNCTION_ARGS)
+{
+ int bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
+
+#ifdef HAVE_BACKTRACE_SYMBOLS
+ {
+ bool result = false;
...
+
+ PG_RETURN_BOOL(result);
+ }
+#else
+ {
+ ereport(WARNING,
+ (errmsg("backtrace generation is not supported by this installation")));
+ PG_RETURN_BOOL(false);
+ }
+#endif

The result is just initialized to false and it is never updated?
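
For illustration, a minimal sketch of a body without the never-updated
variable (using the v5 names such as PROCSIG_BACKTRACE_PRINT; the pid
validation and the exact warning wording are elided/illustrative):

Datum
pg_print_backtrace(PG_FUNCTION_ARGS)
{
#ifdef HAVE_BACKTRACE_SYMBOLS
    int         pid = PG_GETARG_INT32(0);

    /* validate_backend_pid()/permission checks would go here */
    if (SendProcSignal((pid_t) pid, PROCSIG_BACKTRACE_PRINT, InvalidBackendId) == 0)
        PG_RETURN_BOOL(true);

    ereport(WARNING,
            (errmsg("could not send signal to process %d: %m", pid)));
    PG_RETURN_BOOL(false);
#else
    ereport(WARNING,
            (errmsg("backtrace generation is not supported by this installation")));
    PG_RETURN_BOOL(false);
#endif
}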

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
Thanks Bharath for your comments.

On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
> > > 4) How about following
> > > +                     errmsg("must be a superuser to print backtrace
> > > of backend process")));
> > > instead of
> > > +                     errmsg("must be a superuser to print backtrace
> > > of superuser query process")));
> > >
> >
> > Here the message should include superuser, we cannot remove it. Non
> > super user can log non super user provided if user has permissions for
> > it.
> >
> > > 5) How about following
> > >                      errmsg("must be a member of the role whose backed
> > > process's backtrace is being printed or member of
> > > pg_signal_backend")));
> > > instead of
> > > +                     errmsg("must be a member of the role whose
> > > backtrace is being logged or member of pg_signal_backend")));
> > >
> >
> > Modified it.
>
> Maybe I'm confused here to understand the difference between
> SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> corresponding error messages. Some clarification/use case to know in
> which scenarios we hit those error messages might help me. Did we try
> to add test cases that show up these error messages for
> pg_print_backtrace? If not, can we add?

I have tested this manually. Here is the test I did:
Create 2 users:
create user test password 'test@123';
create user test1 password 'test@123';

Test 1: Print the backtrace of a different user's session:
./psql -d postgres -U test
psql (14devel)
Type "help" for help.
postgres=> select pg_backend_pid();
 pg_backend_pid
----------------
          30142
(1 row)
------------------------------------------
./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30142);
ERROR:  must be a member of the role whose backtrace is being logged
or member of pg_signal_backend

The above will be successful after:
grant pg_signal_backend to test1;

Test 2: A non-superuser trying to print the backtrace of a superuser's
session:
./psql -d postgres
psql (14devel)
Type "help" for help.
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          30211
(1 row)

./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30211);
ERROR:  must be a superuser to print backtrace of superuser process
I have not added any automated tests for this, as it requires 2 active
sessions and I did not see any existing framework for that.
The manual tests above should help illustrate the behavior.
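
For reference, a rough sketch of the check that produces those two errors,
modeled on pg_signal_backend() in src/backend/utils/adt/signalfuncs.c (macro
and function names follow the patch/PG 14; treat this as illustrative, not
the patch text):

/* requires miscadmin.h, storage/procarray.h, utils/acl.h, catalog/pg_authid.h */
static int
validate_backend_pid(int pid)
{
    PGPROC     *proc = BackendPidGetProc(pid);

    if (proc == NULL)
    {
        ereport(WARNING,
                (errmsg("PID %d is not a PostgreSQL server process", pid)));
        return SIGNAL_BACKEND_ERROR;
    }

    /* Only superusers may target a superuser's backend. */
    if (!superuser() && superuser_arg(proc->roleId))
        return SIGNAL_BACKEND_NOSUPERUSER;

    /* Otherwise, require membership in the target role or in pg_signal_backend. */
    if (!has_privs_of_role(GetUserId(), proc->roleId) &&
        !has_privs_of_role(GetUserId(), ROLE_PG_SIGNAL_BACKEND))
        return SIGNAL_BACKEND_NOPERMISSION;

    return BACKEND_VALIDATION_SUCCESS;
}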

> > Attached v5 patch has the fixes for the same.
> > Thoughts?
>
> Thanks. Here are some comments on v5 patch:
>
> 1) typo - it's "process" not "porcess" +    a backend porcess. For example:
>

Modified.

> 2) select * from pg_print_backtrace(NULL);
> postgres=# select proname, proisstrict from pg_proc where proname =
> 'pg_print_backtrace';
>       proname       | proisstrict
> --------------------+-------------
>  pg_print_backtrace | t
>
>  See the documentation:
>  "proisstrict bool
>
> Function returns null if any call argument is null. In that case the
> function won't actually be called at all. Functions that are not
> “strict” must be prepared to handle null inputs."
> So below PG_ARGISNUL check is not needed, you can remove that, because
> pg_print_backtrace will not get called with null input.
> int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
>

Modified.

> 3) Can we just set results = true instead of PG_RETURN_BOOL(true); so
> that it will be returned from PG_RETURN_BOOL(result); just for
> consistency?
>             if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT,
> InvalidBackendId))
>                 PG_RETURN_BOOL(true);
>             else
>                 ereport(WARNING,
>                         (errmsg("failed to send signal to postmaster: %m")));
>         }
>
>         PG_RETURN_BOOL(result);
>

I felt the existing code is better, as it avoids setting the variable
first and then returning. There are only 2 places from which we
return, so I felt we could directly return true or false.

> 4) Below is what happens if I request for a backtrace of the
> postmaster process? 1388210 is pid of postmaster.
> postgres=# select * from pg_print_backtrace(1388210);
> WARNING:  PID 1388210 is not a PostgreSQL server process
>  pg_print_backtrace
> --------------------
>  f
>
> Does it make sense to have a postmaster's backtrace? If yes, can we
> have something like below in sigusr1_handler()?
>     if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
>     {
>         LogBackTrace();
>     }
>

We had a discussion about this earlier in [1] and felt including this
is not very useful.

> 5) Can we have PROCSIG_PRINT_BACKTRACE instead of
> PROCSIG_BACKTRACE_PRINT, just for readability and consistency with
> function name?
>

PROCSIG_PRINT_BACKTRACE is better, I have modified it.

> 6) I think it's not the postmaster that prints backtrace of the
> requested backend and we don't send SIGUSR1 with
> PMSIGNAL_BACKTRACE_EMIT to postmaster, I think it's the backends, upon
> receiving SIGUSR1 with PMSIGNAL_BACKTRACE_EMIT signal, log their own
> backtrace. Am I missing anything here? If I'm correct, we need to
> change the below description and other places wherever we refer to
> this description.
>
> The idea here is to implement & expose pg_print_backtrace function, internally
> what this function does is, the connected backend will send SIGUSR1 signal by
> setting PMSIGNAL_BACKTRACE_EMIT to the postmaster process. Once the process
> receives this signal it will log the backtrace of the process.
>

Modified.

> 7) Can we get the parallel worker's backtrace? IIUC it's possible.
>

Yes, we can get the parallel worker's backtrace. As this design is
based on CHECK_FOR_INTERRUPTS, the backtrace will be printed when
CHECK_FOR_INTERRUPTS is called.
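
To spell that out: the signal handler only records the request, and the
actual logging happens in ProcessInterrupts(), which regular backends and
parallel workers alike reach through CHECK_FOR_INTERRUPTS(). A rough sketch
of that flow (flag and function names loosely follow the patch and are
illustrative):

static volatile sig_atomic_t PrintBacktracePending = false;

/* procsignal handler side: nothing unsafe here, just remember the request */
static void
HandlePrintBacktraceInterrupt(void)
{
    InterruptPending = true;
    PrintBacktracePending = true;
    SetLatch(MyLatch);
}

/* fragment of ProcessInterrupts(), reached via CHECK_FOR_INTERRUPTS() */
    if (PrintBacktracePending)
    {
        PrintBacktracePending = false;
        LogBackTrace();         /* emits the backtrace at LOG_SERVER_ONLY */
    }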

> 8) What happens if a user runs pg_print_backtrace() on Windows or
> MacOS or some other platform? Do you want to say something about other
> platforms where gdb/addr2line doesn't exist?
> +</programlisting>
> +    You can get the file name and line number by using gdb/addr2line in
> +    linux platforms, as a prerequisite users must ensure gdb/addr2line is
> +    already installed:
> +<programlisting>
>

I have added the following:
This feature will be available if the installation was built on a
platform with backtrace capturing capability. If it is not available, the
function will return false and emit the following warning: "WARNING:
backtrace generation is not supported by this installation".

> 9) Can't we reuse set_backtrace with just adding a flag to
> set_backtrace(ErrorData *edata, int num_skip, bool
> is_print_backtrace_function), making it a non-static function and call
> set_backtrace(NULL, 0, true)?
>
> void
> set_backtrace(ErrorData *edata, int num_skip, bool is_print_backtrace_function)
> {
>     StringInfoData errtrace;
> -------
> -------
>     if (is_print_backtrace_function)
>         elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
>     else
>         edata->backtrace = errtrace.data;
> }
>
> I think it will be good if we do this, because we can avoid duplicate
> code in set_backtrace and LogBackTrace.
>

Yes it is better, I have modified it to use the same function.

> 10) I think it's "pg_signal_backend" instead of "pg_print_backtrace"?
> +        backtrace is being printed or the calling role has been granted
> +        <literal>pg_print_backtrace</literal>, however only superusers can
>

Modified.

> 11) In the documentation added, isn't it good if we talk a bit about
> in which scenarios users can use this function? For instance,
> something like "Use pg_print_backtrace to  know exactly where it's
> currently waiting when a backend process hangs."?
>

Modified to include more details.

[1] - https://www.postgresql.org/message-id/1778088.1606026308%40sss.pgh.pa.us
Attached v6 patch with the fixes.
Thoughts?

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Mon, Feb 1, 2021 at 11:04 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > 4) How about following
> > > > +                     errmsg("must be a superuser to print backtrace
> > > > of backend process")));
> > > > instead of
> > > > +                     errmsg("must be a superuser to print backtrace
> > > > of superuser query process")));
> > > >
> > >
> > > Here the message should include superuser, we cannot remove it. Non
> > > super user can log non super user provided if user has permissions for
> > > it.
> > >
> > > > 5) How about following
> > > >                      errmsg("must be a member of the role whose backed
> > > > process's backtrace is being printed or member of
> > > > pg_signal_backend")));
> > > > instead of
> > > > +                     errmsg("must be a member of the role whose
> > > > backtrace is being logged or member of pg_signal_backend")));
> > > >
> > >
> > > Modified it.
> >
> > Maybe I'm confused here to understand the difference between
> > SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> > corresponding error messages. Some clarification/use case to know in
> > which scenarios we hit those error messages might help me. Did we try
> > to add test cases that show up these error messages for
> > pg_print_backtrace? If not, can we add?
>
> Are these superuser and permission checks enough from a security
> standpoint that we don't expose some sensitive information to the
> user? Although I'm not sure, say from the backtrace printed and
> attached to GDB, can users see the passwords or other sensitive
> information from the system that they aren't supposed to see?
>
> I'm sure this point would have been discussed upthread.

This will just print the backtrace of the current backend. Users
cannot get password information from this. The backtrace will be sent
from the customer side to customer support, and developers will use gdb
or addr2line to map the logged addresses to file and line numbers. Users
can already attach gdb to a running process even without this feature;
that behavior is unchanged and is not impacted by this feature. It was
agreed to keep the checks similar to pg_terminate_backend, as discussed
in [1].
[1] https://www.postgresql.org/message-id/CA%2BTgmoZ8XeQVCCqfq826kAr804a1ZnYy46FnQB9r2n_iOofh8Q%40mail.gmail.com

Regards,
Vignesh



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
vignesh C <vignesh21@gmail.com> writes:
> On Mon, Feb 1, 2021 at 11:04 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> Are these superuser and permission checks enough from a security
>> standpoint that we don't expose some sensitive information to the
>> user?

> This will just print the backtrace of the current backend. Users
> cannot get password information from this.

Really?

A backtrace normally exposes the text of the current query, for
instance, which could contain very sensitive data (passwords in ALTER
USER, customer credit card numbers in ordinary data, etc etc).  We
don't allow the postmaster log to be seen by any but very privileged
users; it's not sane to think that this data is any less
security-critical than the postmaster log.

This point is entirely separate from the question of whether
triggering stack traces at inopportune moments could cause system
malfunctions, but that question is also not to be ignored.

TBH, I'm leaning to the position that this should be superuser
only.  I do NOT agree with the idea that ordinary users should
be able to trigger it, even against backends theoretically
belonging to their own userid.  (Do I need to point out that
some levels of the call stack might be from security-definer
functions with more privilege than the session's nominal user?)

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Wed, Feb 3, 2021 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> vignesh C <vignesh21@gmail.com> writes:
> > On Mon, Feb 1, 2021 at 11:04 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >> Are these superuser and permission checks enough from a security
> >> standpoint that we don't expose some sensitive information to the
> >> user?
>
> > This will just print the backtrace of the current backend. Users
> > cannot get password information from this.
>
> Really?
>
> A backtrace normally exposes the text of the current query, for
> instance, which could contain very sensitive data (passwords in ALTER
> USER, customer credit card numbers in ordinary data, etc etc).  We
> don't allow the postmaster log to be seen by any but very privileged
> users; it's not sane to think that this data is any less
> security-critical than the postmaster log.
>
> This point is entirely separate from the question of whether
> triggering stack traces at inopportune moments could cause system
> malfunctions, but that question is also not to be ignored.
>
> TBH, I'm leaning to the position that this should be superuser
> only.  I do NOT agree with the idea that ordinary users should
> be able to trigger it, even against backends theoretically
> belonging to their own userid.  (Do I need to point out that
> some levels of the call stack might be from security-definer
> functions with more privilege than the session's nominal user?)
>

I had seen that the log that will be logged will be something like:
        postgres: test postgres [local]
idle(ProcessClientReadInterrupt+0x3a) [0x9500ec]
        postgres: test postgres [local] idle(secure_read+0x183) [0x787f43]
        postgres: test postgres [local] idle() [0x7919de]
        postgres: test postgres [local] idle(pq_getbyte+0x32) [0x791a8e]
        postgres: test postgres [local] idle() [0x94fc16]
        postgres: test postgres [local] idle() [0x950099]
        postgres: test postgres [local] idle(PostgresMain+0x6d5) [0x954bd5]
        postgres: test postgres [local] idle() [0x898a09]
        postgres: test postgres [local] idle() [0x89838f]
        postgres: test postgres [local] idle() [0x894953]
        postgres: test postgres [local] idle(PostmasterMain+0x116b) [0x89422a]
        postgres: test postgres [local] idle() [0x79725b]
        /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f6e68d75555]
        postgres: test postgres [local] idle() [0x484249]
I was not sure if we would be able to get any secure information from
this. I did not notice the function arguments being printed. I felt
the function name, offset  and the return address will be logged. I
might be missing something here.
Thoughts?
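
Those lines are what glibc's backtrace_symbols() emits: one string per frame
of the form "binary(symbol+offset) [return address]", with no argument values
or local variables. A standalone illustration (plain C on a glibc system,
unrelated to the patch code):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void
show_backtrace(void)
{
    void       *frames[32];
    int         depth = backtrace(frames, 32);
    char      **symbols = backtrace_symbols(frames, depth);

    if (symbols == NULL)
        return;

    /* each entry looks like "./a.out(show_backtrace+0x1a) [0x55f0...]" */
    for (int i = 0; i < depth; i++)
        printf("\t%s\n", symbols[i]);

    free(symbols);              /* backtrace_symbols() malloc()s the whole array */
}

int
main(void)
{
    show_backtrace();
    return 0;
}

(Link with -rdynamic, e.g. "cc -rdynamic demo.c", or the symbol names in the
output may be missing and only raw addresses shown.)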

Regards,
Vignesh



Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Wed, Feb 3, 2021 at 1:49 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Feb 3, 2021 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > vignesh C <vignesh21@gmail.com> writes:
> > > On Mon, Feb 1, 2021 at 11:04 AM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >> Are these superuser and permission checks enough from a security
> > >> standpoint that we don't expose some sensitive information to the
> > >> user?
> >
> > > This will just print the backtrace of the current backend. Users
> > > cannot get password information from this.
> >
> > Really?
> >
> > A backtrace normally exposes the text of the current query, for
> > instance, which could contain very sensitive data (passwords in ALTER
> > USER, customer credit card numbers in ordinary data, etc etc).  We
> > don't allow the postmaster log to be seen by any but very privileged
> > users; it's not sane to think that this data is any less
> > security-critical than the postmaster log.
> >
> > This point is entirely separate from the question of whether
> > triggering stack traces at inopportune moments could cause system
> > malfunctions, but that question is also not to be ignored.
> >
> > TBH, I'm leaning to the position that this should be superuser
> > only.  I do NOT agree with the idea that ordinary users should
> > be able to trigger it, even against backends theoretically
> > belonging to their own userid.  (Do I need to point out that
> > some levels of the call stack might be from security-definer
> > functions with more privilege than the session's nominal user?)
> >
>
> I had seen that the log that will be logged will be something like:
>         postgres: test postgres [local]
> idle(ProcessClientReadInterrupt+0x3a) [0x9500ec]
>         postgres: test postgres [local] idle(secure_read+0x183) [0x787f43]
>         postgres: test postgres [local] idle() [0x7919de]
>         postgres: test postgres [local] idle(pq_getbyte+0x32) [0x791a8e]
>         postgres: test postgres [local] idle() [0x94fc16]
>         postgres: test postgres [local] idle() [0x950099]
>         postgres: test postgres [local] idle(PostgresMain+0x6d5) [0x954bd5]
>         postgres: test postgres [local] idle() [0x898a09]
>         postgres: test postgres [local] idle() [0x89838f]
>         postgres: test postgres [local] idle() [0x894953]
>         postgres: test postgres [local] idle(PostmasterMain+0x116b) [0x89422a]
>         postgres: test postgres [local] idle() [0x79725b]
>         /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f6e68d75555]
>         postgres: test postgres [local] idle() [0x484249]
> I was not sure if we would be able to get any secure information from
> this. I did not notice the function arguments being printed. I felt
> the function name, offset  and the return address will be logged. I
> might be missing something here.
> Thoughts?

First of all, we need to see if the output of pg_print_backtrace shows
up function parameter addresses or only function start addresses along
with line and file information when attached to gdb. In either case,
IMO, it will be easy for experienced hackers(I'm not one though) to
calculate and fetch the query string or other function parameters or
the variables inside the functions from the stack by looking at the
code (which is available openly, of course).

Say, if a backend is in a long running scan or insert operation, then
pg_print_backtrace is issued from another session, the
exec_simple_query function shows up query_string. Below is captured
from attached gdb though, I'm not sure whether the logged backtrace
will have function address or the function parameters addresses, I
think we can check that by having a long running query which
frequently checks interrupts and issue pg_print_backtrace from another
session to that backend. Now, attach gdb to the backend in which the
query is running, then take bt, see if the logged backtrace and the
gdb bt have the same or closer addresses.

#13 0x00005644f4320729 in exec_simple_query (
    query_string=0x5644f6771bf0 "select pg_backend_pid();") at postgres.c:1240
#14 0x00005644f4324ff4 in PostgresMain (argc=1, argv=0x7ffd819bd5e0,
    dbname=0x5644f679d2b8 "postgres", username=0x5644f679d298 "bharath")
    at postgres.c:4394
#15 0x00005644f4256f9d in BackendRun (port=0x5644f67935c0) at postmaster.c:4484
#16 0x00005644f4256856 in BackendStartup (port=0x5644f67935c0) at
postmaster.c:4206
#17 0x00005644f4252a11 in ServerLoop () at postmaster.c:1730
#18 0x00005644f42521aa in PostmasterMain (argc=3, argv=0x5644f676b1f0)
    at postmaster.c:1402
#19 0x00005644f4148789 in main (argc=3, argv=0x5644f676b1f0) at main.c:209

As suggested by Tom, I'm okay if this function is callable only by the
superusers. In that case, the superusers can fetch the backtrace and
send it for further analysis in case of any hangs or issues.

Others may have better thoughts.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Wed, Feb 3, 2021 at 3:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Feb 3, 2021 at 1:49 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, Feb 3, 2021 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > vignesh C <vignesh21@gmail.com> writes:
> > > > On Mon, Feb 1, 2021 at 11:04 AM Bharath Rupireddy
> > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >> Are these superuser and permission checks enough from a security
> > > >> standpoint that we don't expose some sensitive information to the
> > > >> user?
> > >
> > > > This will just print the backtrace of the current backend. Users
> > > > cannot get password information from this.
> > >
> > > Really?
> > >
> > > A backtrace normally exposes the text of the current query, for
> > > instance, which could contain very sensitive data (passwords in ALTER
> > > USER, customer credit card numbers in ordinary data, etc etc).  We
> > > don't allow the postmaster log to be seen by any but very privileged
> > > users; it's not sane to think that this data is any less
> > > security-critical than the postmaster log.
> > >
> > > This point is entirely separate from the question of whether
> > > triggering stack traces at inopportune moments could cause system
> > > malfunctions, but that question is also not to be ignored.
> > >
> > > TBH, I'm leaning to the position that this should be superuser
> > > only.  I do NOT agree with the idea that ordinary users should
> > > be able to trigger it, even against backends theoretically
> > > belonging to their own userid.  (Do I need to point out that
> > > some levels of the call stack might be from security-definer
> > > functions with more privilege than the session's nominal user?)
> > >
> >
> > I had seen that the log that will be logged will be something like:
> >         postgres: test postgres [local]
> > idle(ProcessClientReadInterrupt+0x3a) [0x9500ec]
> >         postgres: test postgres [local] idle(secure_read+0x183) [0x787f43]
> >         postgres: test postgres [local] idle() [0x7919de]
> >         postgres: test postgres [local] idle(pq_getbyte+0x32) [0x791a8e]
> >         postgres: test postgres [local] idle() [0x94fc16]
> >         postgres: test postgres [local] idle() [0x950099]
> >         postgres: test postgres [local] idle(PostgresMain+0x6d5) [0x954bd5]
> >         postgres: test postgres [local] idle() [0x898a09]
> >         postgres: test postgres [local] idle() [0x89838f]
> >         postgres: test postgres [local] idle() [0x894953]
> >         postgres: test postgres [local] idle(PostmasterMain+0x116b) [0x89422a]
> >         postgres: test postgres [local] idle() [0x79725b]
> >         /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f6e68d75555]
> >         postgres: test postgres [local] idle() [0x484249]
> > I was not sure if we would be able to get any secure information from
> > this. I did not notice the function arguments being printed. I felt
> > the function name, offset  and the return address will be logged. I
> > might be missing something here.
> > Thoughts?
>
> First of all, we need to see if the output of pg_print_backtrace shows
> up function parameter addresses or only function start addresses along
> with line and file information when attached to gdb. In either case,
> IMO, it will be easy for experienced hackers(I'm not one though) to
> calculate and fetch the query string or other function parameters or
> the variables inside the functions from the stack by looking at the
> code (which is available openly, of course).
>
> Say, if a backend is in a long running scan or insert operation, then
> pg_print_backtrace is issued from another session, the
> exec_simple_query function shows up query_string. Below is captured
> from attached gdb though, I'm not sure whether the logged backtrace
> will have function address or the function parameters addresses, I
> think we can check that by having a long running query which
> frequently checks interrupts and issue pg_print_backtrace from another
> session to that backend. Now, attach gdb to the backend in which the
> query is running, then take bt, see if the logged backtrace and the
> gdb bt have the same or closer addresses.
>
> #13 0x00005644f4320729 in exec_simple_query (
>     query_string=0x5644f6771bf0 "select pg_backend_pid();") at postgres.c:1240
> #14 0x00005644f4324ff4 in PostgresMain (argc=1, argv=0x7ffd819bd5e0,
>     dbname=0x5644f679d2b8 "postgres", username=0x5644f679d298 "bharath")
>     at postgres.c:4394
> #15 0x00005644f4256f9d in BackendRun (port=0x5644f67935c0) at postmaster.c:4484
> #16 0x00005644f4256856 in BackendStartup (port=0x5644f67935c0) at
> postmaster.c:4206
> #17 0x00005644f4252a11 in ServerLoop () at postmaster.c:1730
> #18 0x00005644f42521aa in PostmasterMain (argc=3, argv=0x5644f676b1f0)
>     at postmaster.c:1402
> #19 0x00005644f4148789 in main (argc=3, argv=0x5644f676b1f0) at main.c:209
>
> As suggested by Tom, I'm okay if this function is callable only by the
> superusers. In that case, the superusers can fetch the backtrace and
> send it for further analysis in case of any hangs or issues.
>
> Others may have better thoughts.

I would like to clarify a bit to avoid confusion here: Currently, when
there is a long running query or a hang in the server, one of our
customer support members will do a screen share with the customer.
If gdb is not installed, we tell the customer to install gdb. Then we
tell the customer to attach gdb to the backend process and run the
backtrace command. We tell the customer to share the output with the
customer support team, and later the development team checks whether
this is an issue or just a long running query and provides a workaround
or explains what needs to be done next.
This feature reduces a lot of this back and forth. Whenever there is an
issue and the user/customer is not sure whether it is a hang or a long
running query, the user can execute pg_print_backtrace; after this is
executed, the backtrace will be logged to the log file, something like
below:
        postgres: test postgres [local]
idle(ProcessClientReadInterrupt+0x3a) [0x9500ec]
        postgres: test postgres [local] idle(secure_read+0x183) [0x787f43]
        postgres: test postgres [local] idle() [0x7919de]
        postgres: test postgres [local] idle(pq_getbyte+0x32) [0x791a8e]
        postgres: test postgres [local] idle() [0x94fc16]
        postgres: test postgres [local] idle() [0x950099]
        postgres: test postgres [local] idle(PostgresMain+0x6d5) [0x954bd5]
        postgres: test postgres [local] idle() [0x898a09]
        postgres: test postgres [local] idle() [0x89838f]
        postgres: test postgres [local] idle() [0x894953]
        postgres: test postgres [local] idle(PostmasterMain+0x116b) [0x89422a]
        postgres: test postgres [local] idle() [0x79725b]
        /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f6e68d75555]
        postgres: test postgres [local] idle() [0x484249]

The above log contents (not the complete log file) will be shared by
the customer/user with the support team, along with the
query/configuration/statistics, etc. I have mentioned a few steps in the
documentation on how to get the file/line from the logged backtrace using
gdb/addr2line. This will be done in the developer environment,
not in the actual customer environment that has the sensitive data.
We don't attach gdb to the running process. Developers will use the
same binary that was released to the customer (outside the customer
environment) to get the file/line numbers, reconstructing a backtrace
that includes file and line information. I feel users cannot get
sensitive information from this. This information will help the
developer to analyze the situation and suggest the next action the
customer needs to take.
I felt there was a slight misunderstanding about where gdb is
executed: it is not in the customer environment but in the developer
environment. From the customer environment we will only get the stack
trace in the logs, as shown above.
I have changed it so that this feature is supported only for superusers.
Thoughts?

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Thu, May 6, 2021 at 7:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Here's a cleaned-up copy of the doc text.
>
> Send a request to the backend with the specified process ID to log its backtrace.
> The backtrace will be logged at message level <literal>LOG</literal>.
> It will appear in the server log based on the log configuration set
> (See <xref linkend="runtime-config-logging"/> for more information),
> but will not be sent to the client regardless of
> <xref linkend="guc-client-min-messages"/>.
> A backtrace will identify where exactly the backend process is currently
> executing. This may be useful to developers to diagnose stuck
> processes and other problems. This feature is
> not supported for the postmaster, logger, or statistics collector process. This
> feature will be available if PostgreSQL was built
> with the ability to capture backtraces.  If not available, the function will
> return false and show a WARNING.
> Only superusers can request backends to log their backtrace.

Thanks for rephrasing. I have modified it to also cover the
checkpointer, walwriter and background writer processes.

> > - * this and related functions are not inlined.
> > + * this and related functions are not inlined. If edata pointer is valid
> > + * backtrace information will set in edata.
>
> will *be* set

Modified.

Thanks for the comments, Attached v8 patch has the fixes for the same.

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
Robert Haas
Date:
On Wed, Feb 3, 2021 at 2:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A backtrace normally exposes the text of the current query, for
> instance, which could contain very sensitive data (passwords in ALTER
> USER, customer credit card numbers in ordinary data, etc etc).  We
> don't allow the postmaster log to be seen by any but very privileged
> users; it's not sane to think that this data is any less
> security-critical than the postmaster log.

I agree. Vignesh points out downthread that it's just printing out
memory addresses and not necessarily anything too directly usable, but
the memory addresses themselves are potentially security-sensitive. If
that were not a thing, there would not be such a thing as ASLR.

> This point is entirely separate from the question of whether
> triggering stack traces at inopportune moments could cause system
> malfunctions, but that question is also not to be ignored.

That worries me too, although I have a hard time saying exactly why.
If we call an OS-provided function called backtrace() and it does
something other than generate a backtrace - e.g. makes the process seg
fault, or mucks about with the values of global variables - isn't that
just a bug in the OS? Do we have particular reasons to believe that
such bugs are common? My own skepticism here is mostly based on how
inconsistent debuggers are about being able to tell you anything
useful, which makes me think that in a binary compiled with any
optimization, the ability of backtrace() to do something consistently
useful is also questionable. But that's a separate question from
whether it's likely to cause any active harm.

> TBH, I'm leaning to the position that this should be superuser
> only.  I do NOT agree with the idea that ordinary users should
> be able to trigger it, even against backends theoretically
> belonging to their own userid.  (Do I need to point out that
> some levels of the call stack might be from security-definer
> functions with more privilege than the session's nominal user?)

I agree that ordinary users shouldn't be able to trigger it, but I
think it should be restricted to some predefined role, new or
existing, rather than to superuser. I see no reason not to let
individual users decide what risks they want to take.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Feb 3, 2021 at 2:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> TBH, I'm leaning to the position that this should be superuser
>> only.

> I agree that ordinary users shouldn't be able to trigger it, but I
> think it should be restricted to some predefined role, new or
> existing, rather than to superuser. I see no reason not to let
> individual users decide what risks they want to take.

If we think it's worth having a predefined role for, OK.  However,
I don't like the future I see us heading towards where there are
hundreds of random predefined roles.  Is there an existing role
that it'd be reasonable to attach this ability to?

            regards, tom lane



Re: Printing backtrace of postgres processes

From:
Andres Freund
Date:
Hi,

On 2021-05-06 14:38:51 -0400, Robert Haas wrote:
> On Wed, Feb 3, 2021 at 2:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > This point is entirely separate from the question of whether
> > triggering stack traces at inopportune moments could cause system
> > malfunctions, but that question is also not to be ignored.
> 
> That worries me too, although I have a hard time saying exactly why.
> If we call an OS-provided function called backtrace() and it does
> something other than generate a backtrace - e.g. makes the process seg
> fault, or mucks about with the values of global variables - isn't that
> just a bug in the OS? Do we have particular reasons to believe that
> such bugs are common? My own skepticism here is mostly based on how
> inconsistent debuggers are about being able to tell you anything
> useful, which makes me think that in a binary compiled with any
> optimization, the ability of backtrace() to do something consistently
> useful is also questionable. But that's a separate question from
> whether it's likely to cause any active harm.

I think that ship kind of has sailed with

commit 71a8a4f6e36547bb060dbcc961ea9b57420f7190
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   2019-11-08 15:44:20 -0300

    Add backtrace support for error reporting

we allow generating backtraces in all kind of places, including
e.g. some inside critical sections via backtrace_functions. I don't
think also doing so during interrupt processing is a meaningful increase
in exposed surface area?

Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

From:
Andres Freund
Date:
Hi,

On 2021-05-06 14:56:09 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Feb 3, 2021 at 2:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> TBH, I'm leaning to the position that this should be superuser
> >> only.
> 
> > I agree that ordinary users shouldn't be able to trigger it, but I
> > think it should be restricted to some predefined role, new or
> > existing, rather than to superuser. I see no reason not to let
> > individual users decide what risks they want to take.
> 
> If we think it's worth having a predefined role for, OK.  However,
> I don't like the future I see us heading towards where there are
> hundreds of random predefined roles.  Is there an existing role
> that it'd be reasonable to attach this ability to?

It does seem like it'd be good to group it in with something
else. There's nothing fitting 100% though.

postgres[1475723][1]=# SELECT rolname FROM pg_roles WHERE rolname LIKE 'pg_%' ORDER BY rolname;
┌───────────────────────────┐
│          rolname          │
├───────────────────────────┤
│ pg_database_owner         │
│ pg_execute_server_program │
│ pg_monitor                │
│ pg_read_all_data          │
│ pg_read_all_settings      │
│ pg_read_all_stats         │
│ pg_read_server_files      │
│ pg_signal_backend         │
│ pg_stat_scan_tables       │
│ pg_write_all_data         │
│ pg_write_server_files     │
└───────────────────────────┘
(11 rows)

We could fit it into pg_monitor, but it's probably a bit more impactful
than most things in there? But I'd go for it anyway, I think.
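
If it were grouped under pg_monitor, the gate in pg_print_backtrace would
presumably be little more than the following sketch (ROLE_PG_MONITOR is the
PG 14 spelling of the built-in role's OID macro, DEFAULT_ROLE_MONITOR in older
branches; the message text is illustrative):

    if (!has_privs_of_role(GetUserId(), ROLE_PG_MONITOR))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be superuser or a member of pg_monitor to print backtraces")));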

Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

From:
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-06 14:38:51 -0400, Robert Haas wrote:
>> On Wed, Feb 3, 2021 at 2:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This point is entirely separate from the question of whether
>>> triggering stack traces at inopportune moments could cause system
>>> malfunctions, but that question is also not to be ignored.

> I think that ship kind of has sailed with
>
> commit 71a8a4f6e36547bb060dbcc961ea9b57420f7190
> Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date:   2019-11-08 15:44:20 -0300
>     Add backtrace support for error reporting

The fact that we have a scarily large surface area for that to
cause problems is not a great argument for making the surface
area even larger.  Also, I don't think v13 has been out long
enough for us to have full confidence that the backtrace behavior
doesn't cause any problems already.

> we allow generating backtraces in all kinds of places, including
> e.g. some inside critical sections via backtrace_functions.

If there's an elog call inside a critical section, that seems
like a problem already.  Are you sure that there are any such?

            regards, tom lane



Re: Printing backtrace of postgres processes

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2021-05-06 14:56:09 -0400, Tom Lane wrote:
>> If we think it's worth having a predefined role for, OK.  However,
>> I don't like the future I see us heading towards where there are
>> hundreds of random predefined roles.  Is there an existing role
>> that it'd be reasonable to attach this ability to?

> It does seem like it'd be good to group it in with something
> else. There's nothing fitting 100% though.

I'd probably vote for pg_read_all_data, considering that much of
the concern about this has to do with the possibility of exposure
of sensitive data.  I'm not quite sure what the security expectations
are for pg_monitor.

            regards, tom lane



Re: Printing backtrace of postgres processes

От
Andres Freund
Дата:
Hi,

On 2021-05-06 15:31:02 -0400, Tom Lane wrote:
> I'd probably vote for pg_read_all_data, considering that much of
> the concern about this has to do with the possibility of exposure
> of sensitive data.  I'm not quite sure what the security expectations
> are for pg_monitor.

I was wondering the same, but looking at the docs of pg_read_all_data
that doesn't seem like a great fit:

      <row>
       <entry>pg_read_all_data</entry>
       <entry>Read all data (tables, views, sequences), as if having SELECT
       rights on those objects, and USAGE rights on all schemas, even without
       having it explicitly.  This role does not have the role attribute
       <literal>BYPASSRLS</literal> set.  If RLS is being used, an administrator
       may wish to set <literal>BYPASSRLS</literal> on roles which this role is
       GRANTed to.</entry>
      </row>

It's mostly useful to be able to run pg_dump without superuser
permissions.

Contrast that to pg_monitor:

      <row>
       <entry>pg_monitor</entry>
       <entry>Read/execute various monitoring views and functions.
       This role is a member of <literal>pg_read_all_settings</literal>,
       <literal>pg_read_all_stats</literal> and
       <literal>pg_stat_scan_tables</literal>.</entry>
      </row>
...
  <para>
  The <literal>pg_monitor</literal>, <literal>pg_read_all_settings</literal>,
  <literal>pg_read_all_stats</literal> and <literal>pg_stat_scan_tables</literal>
  roles are intended to allow administrators to easily configure a role for the
  purpose of monitoring the database server. They grant a set of common privileges
  allowing the role to read various useful configuration settings, statistics and
  other system information normally restricted to superusers.
  </para>

"normally restricted to superusers" seems to fit pretty well?

I think if we follow your argument, pg_read_server_files would be a bit
better fit than pg_read_all_data? But it seems "too powerful" to me.
      <row>
       <entry>pg_read_server_files</entry>
       <entry>Allow reading files from any location the database can access on the server with COPY and
       other file-access functions.</entry>
      </row>

Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

От
Andres Freund
Дата:
Hi,

On 2021-05-06 15:22:02 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > we allow generating backtraces in all kinds of places, including
> > e.g. some inside critical sections via backtrace_functions.
> 
> If there's an elog call inside a critical section, that seems
> like a problem already.  Are you sure that there are any such?

There's several, yes. In xlog.c there's quite a few that are gated by
wal_debug being enabled. But also a few without that,
e.g. XLogFileInit() logging
    elog(DEBUG1, "creating and filling new WAL file");
and XLogFileInit() can be called within a critical section.
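
To illustrate the shape of the problem, here is a minimal sketch (not
the actual xlog.c code) of an elog() sitting inside a critical section,
which backtrace_functions can turn into a backtrace() call:

    START_CRIT_SECTION();

    /*
     * If backtrace_functions matches the enclosing function, emitting
     * this message also captures and logs a backtrace while the
     * critical section is still open.
     */
    elog(DEBUG1, "creating and filling new WAL file");

    END_CRIT_SECTION();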

Greetings,

Andres Freund



Re: Printing backtrace of postgres processes

От
Robert Haas
Дата:
On Thu, May 6, 2021 at 3:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-05-06 14:56:09 -0400, Tom Lane wrote:
> >> If we think it's worth having a predefined role for, OK.  However,
> >> I don't like the future I see us heading towards where there are
> >> hundreds of random predefined roles.  Is there an existing role
> >> that it'd be reasonable to attach this ability to?
>
> > It does seem like it'd be good to group it in with something
> > else. There's nothing fitting 100% though.
>
> I'd probably vote for pg_read_all_data, considering that much of
> the concern about this has to do with the possibility of exposure
> of sensitive data.  I'm not quite sure what the security expectations
> are for pg_monitor.

Maybe we should have a role that is specifically for server debugging
type things. This kind of overlaps with Mark Dilger's proposal to try
to allow SET for security-sensitive GUCs to be delegated via
predefined roles. The exact way to divide that up is open to question,
but it wouldn't seem crazy to me if the same role controlled the
ability to do this plus the ability to set the GUCs
backtrace_functions, debug_invalidate_system_caches_always,
wal_consistency_checking, and maybe a few other things.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, May 12, 2021 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, May 6, 2021 at 3:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2021-05-06 14:56:09 -0400, Tom Lane wrote:
> > >> If we think it's worth having a predefined role for, OK.  However,
> > >> I don't like the future I see us heading towards where there are
> > >> hundreds of random predefined roles.  Is there an existing role
> > >> that it'd be reasonable to attach this ability to?
> >
> > > It does seem like it'd be good to group it in with something
> > > else. There's nothing fitting 100% though.
> >
> > I'd probably vote for pg_read_all_data, considering that much of
> > the concern about this has to do with the possibility of exposure
> > of sensitive data.  I'm not quite sure what the security expectations
> > are for pg_monitor.
>
> Maybe we should have a role that is specifically for server debugging
> type things. This kind of overlaps with Mark Dilger's proposal to try
> to allow SET for security-sensitive GUCs to be delegated via
> predefined roles. The exact way to divide that up is open to question,
> but it wouldn't seem crazy to me if the same role controlled the
> ability to do this plus the ability to set the GUCs
> backtrace_functions, debug_invalidate_system_caches_always,
> wal_consistency_checking, and maybe a few other things.

+1 for the idea of having a new role for this. Currently I have
implemented this feature to be supported only for the superuser. If we
are ok with having a new role to handle debugging features, I will
make a 002 patch to handle this.

Regards,
Vignesh



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Tue, Jul 13, 2021 at 9:03 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, May 12, 2021 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > Maybe we should have a role that is specifically for server debugging
> > type things. This kind of overlaps with Mark Dilger's proposal to try
> > to allow SET for security-sensitive GUCs to be delegated via
> > predefined roles. The exact way to divide that up is open to question,
> > but it wouldn't seem crazy to me if the same role controlled the
> > ability to do this plus the ability to set the GUCs
> > backtrace_functions, debug_invalidate_system_caches_always,
> > wal_consistency_checking, and maybe a few other things.
>
> +1 for the idea of having a new role for this. Currently I have
> implemented this feature to be supported only for the superuser. If we
> are ok with having a new role to handle debugging features, I will
> make a 002 patch to handle this.

I see that there are a good number of user functions that are
accessible only by superuser (I searched for "if (!superuser())" in
the code base). I agree with the intention to not overload the
superuser anymore and have a few other special roles to delegate the
existing superuser-only functions to them. In that case, are we going
to revisit and reassign all the existing superuser-only functions?

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, Jul 21, 2021 at 3:52 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Jul 13, 2021 at 9:03 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, May 12, 2021 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > Maybe we should have a role that is specifically for server debugging
> > > type things. This kind of overlaps with Mark Dilger's proposal to try
> > > to allow SET for security-sensitive GUCs to be delegated via
> > > predefined roles. The exact way to divide that up is open to question,
> > > but it wouldn't seem crazy to me if the same role controlled the
> > > ability to do this plus the ability to set the GUCs
> > > backtrace_functions, debug_invalidate_system_caches_always,
> > > wal_consistency_checking, and maybe a few other things.
> >
> > +1 for the idea of having a new role for this. Currently I have
> > implemented this feature to be supported only for the superuser. If we
> > are ok with having a new role to handle debugging features, I will
> > make a 002 patch to handle this.
>
> I see that there are a good number of user functions that are
> accessible only by superuser (I searched for "if (!superuser())" in
> the code base). I agree with the intention to not overload the
> superuser anymore and have a few other special roles to delegate the
> existing superuser-only functions to them. In that case, are we going
> to revisit and reassign all the existing superuser-only functions?

As Robert pointed out, this idea is based on Mark Dilger's proposal. Mark Dilger is already handling many of them at [1]. I'm proposing this patch only for server debugging functionalities based on Robert's suggestions at [2].
[1] - https://commitfest.postgresql.org/33/3223/
[2] - https://www.postgresql.org/message-id/CA%2BTgmoZz%3DK1bQRp0Ug%3D6uMGFWg-6kaxdHe6VSWaxq0U-YkppYQ%40mail.gmail.com

Regards,
Vignesh

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, Jul 21, 2021 at 10:56 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Jul 21, 2021 at 3:52 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Jul 13, 2021 at 9:03 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Wed, May 12, 2021 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > >
> > > > Maybe we should have a role that is specifically for server debugging
> > > > type things. This kind of overlaps with Mark Dilger's proposal to try
> > > > to allow SET for security-sensitive GUCs to be delegated via
> > > > predefined roles. The exact way to divide that up is open to question,
> > > > but it wouldn't seem crazy to me if the same role controlled the
> > > > ability to do this plus the ability to set the GUCs
> > > > backtrace_functions, debug_invalidate_system_caches_always,
> > > > wal_consistency_checking, and maybe a few other things.
> > >
> > > +1 for the idea of having a new role for this. Currently I have
> > > implemented this feature to be supported only for the superuser. If we
> > > are ok with having a new role to handle debugging features, I will
> > > make a 002 patch to handle this.
> >
> > I see that there are a good number of user functions that are
> > accessible only by superuser (I searched for "if (!superuser())" in
> > the code base). I agree with the intention to not overload the
> > superuser anymore and have a few other special roles to delegate the
> > existing superuser-only functions to them. In that case, are we going
> > to revisit and reassign all the existing superuser-only functions?
>
> As Robert pointed out, this idea is based on Mark Dilger's proposal. Mark Dilger is already handling many of them at
> [1]. I'm proposing this patch only for server debugging functionalities based on Robert's suggestions at [2].
> [1] - https://commitfest.postgresql.org/33/3223/
> [2] - https://www.postgresql.org/message-id/CA%2BTgmoZz%3DK1bQRp0Ug%3D6uMGFWg-6kaxdHe6VSWaxq0U-YkppYQ%40mail.gmail.com

The previous patch was failing because of the recent test changes made
by commit 201a76183e2, which unified new and get_new_node; the attached
patch has been updated to handle those changes.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
bt21tanigaway
Дата:
Hi,

> The previous patch was failing because of the recent test changes made
> by commit 201a76183e2 which unified new and get_new_node, attached
> patch has the changes to handle the changes accordingly.

Thanks for your update!
I have three comments.

1. Do we need “set_backtrace(NULL, 0);” in “HandleMainLoopInterrupts()”?
I observed that it works correctly without it. The same call appears in
“HandleAutoVacLauncherInterrupts” as well, but there I think it is needed
to prevent delays, as in [1].

2. The patch seems to forget to emit
“ereport(LOG,(errmsg("logging backtrace of PID %d", MyProcPid)));” in
“HandleAutoVacLauncherInterrupts” and “HandleMainLoopInterrupts()”.
I think it should do the same as “ProcessInterrupts()” does.

3. How about creating a new function?
Since the same processing appears in three functions (“ProcessInterrupts()”,
“HandleAutoVacLauncherInterrupts”, “HandleMainLoopInterrupts()”), I
think it would be good to factor it out into a new function.

[1] https://commitfest.postgresql.org/35/3342/

Regards,
Koyu Tanigawa



Re: Printing backtrace of postgres processes

От
Daniel Gustafsson
Дата:
> On 26 Aug 2021, at 16:56, vignesh C <vignesh21@gmail.com> wrote:

> The previous patch was failing because of the recent test changes made
> by commit 201a76183e2 which unified new and get_new_node, attached
> patch has the changes to handle the changes accordingly.

This patch now fails because of the test changes made by commit b3b4d8e68a,
please submit a rebase.

--
Daniel Gustafsson        https://vmware.com/




Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Thu, Nov 4, 2021 at 4:06 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 26 Aug 2021, at 16:56, vignesh C <vignesh21@gmail.com> wrote:
>
> > The previous patch was failing because of the recent test changes made
> > by commit 201a76183e2 which unified new and get_new_node, attached
> > patch has the changes to handle the changes accordingly.
>
> This patch now fails because of the test changes made by commit b3b4d8e68a,
> please submit a rebase.

Thanks for reporting this, the attached v9 patch has the changes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Tue, Oct 12, 2021 at 10:47 AM bt21tanigaway
<bt21tanigaway@oss.nttdata.com> wrote:
>
> Hi,
>
> > The previous patch was failing because of the recent test changes made
> > by commit 201a76183e2 which unified new and get_new_node, attached
> > patch has the changes to handle the changes accordingly.
> Thanks for your update!
> I have three comments.
>
> 1.Do we need “set_backtrace(NULL, 0);” on “HandleMainLoopInterrupts()”?
> I could observe that it works correctly without this. It is written on
> “HandleAutoVacLauncherInterrupts” as well, but I think it is necessary
> to prevent delays as well as [1].

I have removed this from HandleMainLoopInterrupts

> 2.The patch seems to forget to handle
> “ereport(LOG,(errmsg("logging backtrace of PID %d", MyProcPid)));” on
> “HandleAutoVacLauncherInterrupts” and “HandleMainLoopInterrupts()”.
> I think it should be the same as the process on “ProcessInterrupts()”.

I have created ProcessPrintBacktraceInterrupt, which has the
implementation and is called wherever required. It is handled now.

> 3.How about creating a new function.
> Since the same process is on three functions( “ProcessInterrupts()”,
> “HandleAutoVacLauncherInterrupts”, “HandleMainLoopInterrupts()” ), I
> think it’s good to create a new function.

I have created ProcessPrintBacktraceInterrupt to handle it.
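
For reference, the shared routine is roughly shaped like this (a
simplified sketch based on the snippets discussed in this thread, not
the exact patch text):

    void
    ProcessPrintBacktraceInterrupt(void)
    {
        PrintBacktracePending = false;

        /* Log who we are, then emit the backtrace via set_backtrace(). */
        ereport(LOG,
                (errmsg("logging backtrace of PID %d", MyProcPid)));

        set_backtrace(NULL, 0);
    }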

Thanks for the comments, v9 patch attached at [1] has the changes for the same.
[1] - https://www.postgresql.org/message-id/CALDaNm3MGVP_WK1Uuf%3DBiAJ9PeVOfciwLy0mrFA1JNbRp99VOQ%40mail.gmail.com

Regards,
Vignesh



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Tue, Nov 9, 2021 at 4:45 PM vignesh C <vignesh21@gmail.com> wrote:
> Thanks for reporting this, the attached v9 patch has the changes for the same.

Thanks for the v9 patch. I have some comments:

1) I think we are moving away from if (!superuser()) checks, see the
commit [1]. The goal is to let the GRANT-REVOKE system deal with who
is supposed to run these system functions. Since pg_print_backtrace
also writes the info to server logs, I think the same approach can be
taken here.

2) I think we need to have LOG_SERVER_ONLY instead of LOG to avoid the
backtrace being sent to the connected client. This will be good from a
security perspective as well, since we don't send the backtrace over
the wire to the client.
+ PrintBacktracePending = false;
+ ereport(LOG,
+ (errmsg("logging backtrace of PID %d", MyProcPid)));

for pg_log_backend_memory_contexts:
+               /*
+                * Use LOG_SERVER_ONLY to prevent the memory contexts
from being sent
+                * to the connected client.
+                *
+                * We don't buffer the information about all memory
contexts in a
+                * backend into StringInfo and log it as one message.
Otherwise which
+                * may require the buffer to be enlarged very much and
lead to OOM
+                * error since there can be a large number of memory
contexts in a
+                * backend. Instead, we log one message per memory context.
+                */
+               ereport(LOG_SERVER_ONLY,

3) I think we need to extend this function to the auxiliary processes
too, because users might be interested in seeing what these processes
are doing and where they are currently stuck via their backtraces; see
the proposal for pg_log_backend_memory_contexts at [2]. I think you
need to add the below code in a couple of other places, such as
HandleCheckpointerInterrupts, HandleMainLoopInterrupts,
HandlePgArchInterrupts, HandleStartupProcInterrupts, and
HandleWalWriterInterrupts.

+ /* Process printing backtrace */
+ if (PrintBacktracePending)
+ ProcessPrintBacktraceInterrupt();

[1] commit f0b051e322d530a340e62f2ae16d99acdbcb3d05
Author: Jeff Davis <jdavis@postgresql.org>
Date:   Tue Oct 26 13:13:52 2021 -0700

    Allow GRANT on pg_log_backend_memory_contexts().

    Remove superuser check, allowing any user granted permissions on
    pg_log_backend_memory_contexts() to log the memory contexts of any
    backend.

    Note that this could allow a privileged non-superuser to log the
    memory contexts of a superuser backend, but as discussed, that does
    not seem to be a problem.

    Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier,
Kyotaro Horiguchi, Andres Freund
    Discussion:
https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com

[2] https://www.postgresql.org/message-id/CALj2ACU1nBzpacOK2q%3Da65S_4%2BOaz_rLTsU1Ri0gf7YUmnmhfQ%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, Nov 10, 2021 at 12:17 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Nov 9, 2021 at 4:45 PM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for reporting this, the attached v9 patch has the changes for the same.
>
> Thanks for the v9 patch. I have some comments:
>
> 1) I think we are moving away from if (!superuser()) checks, see the
> commit [1]. The goal is to let the GRANT-REVOKE system deal with who
> is supposed to run these system functions. Since pg_print_backtrace
> also writes the info to server logs,

Modified

> 2) I think we need to have LOG_SERVER_ONLY instead of LOG to avoid
> bakctrace being sent to the connected client. This will be good from
> security perspective as well since we don't send backtrace over the
> wire to the client.
> + PrintBacktracePending = false;
> + ereport(LOG,
> + (errmsg("logging backtrace of PID %d", MyProcPid)));
>
> for pg_log_backend_memory_contexts:
> +               /*
> +                * Use LOG_SERVER_ONLY to prevent the memory contexts
> from being sent
> +                * to the connected client.
> +                *
> +                * We don't buffer the information about all memory
> contexts in a
> +                * backend into StringInfo and log it as one message.
> Otherwise which
> +                * may require the buffer to be enlarged very much and
> lead to OOM
> +                * error since there can be a large number of memory
> contexts in a
> +                * backend. Instead, we log one message per memory context.
> +                */
> +               ereport(LOG_SERVER_ONLY,

Modified
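
(Concretely, the emit path now looks roughly like this; a sketch, not
the exact diff:)

    PrintBacktracePending = false;

    ereport(LOG_SERVER_ONLY,
            (errmsg("logging backtrace of PID %d", MyProcPid)));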

> 3) I think we need to extend this function to the auxiliary processes
> too, because users might be interested to see what these processes are
> doing and where they are currently stuck via their backtraces, see the
> proposal for pg_log_backend_memory_contexts at [2]. I think you need
> to add below code in couple of other places such as
> HandleCheckpointerInterrupts, HandleMainLoopInterrupts,
> HandlePgArchInterrupts, HandleStartupProcInterrupts,
> HandleWalWriterInterrupts.
>
> + /* Process printing backtrace */
> + if (PrintBacktracePending)
> + ProcessPrintBacktraceInterrupt();

Created 0002 patch to handle this.
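
For context, the pattern 0002 adds to the auxiliary processes is
roughly the following (a sketch with the existing handler body
abbreviated, not the exact diff):

    static void
    HandleCheckpointerInterrupts(void)
    {
        /* ... existing handling of reload and shutdown requests ... */

        /* Process printing backtrace */
        if (PrintBacktracePending)
            ProcessPrintBacktraceInterrupt();
    }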

Thanks for the comments, the attached v10 patch has the fixes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Thu, Nov 11, 2021 at 12:14 PM vignesh C <vignesh21@gmail.com> wrote:
> Thanks for the comments, the attached v10 patch has the fixes for the same.

Thanks for the patches. Here are some comments:

1) In the docs, let's have a description similar to that of
pg_log_backend_memory_contexts for pg_print_backtrace, just for
continuity and ease of reading for users.

2) I don't know how the <screen> part looks in the Server
Signaling Functions table. I think here you can just say that it will
emit a warning and return false if not supported by the installation.
And you can give the <screen> part after the description where you are
showing a sample backtrace.

+        capture backtrace. If not available, the function will return false
+        and a warning is issued, for example:
+<screen>
+WARNING:  backtrace generation is not supported by this installation
+ pg_print_backtrace
+--------------------
+ f
+</screen>
+       </para></entry>
+      </row>

3) Replace '!' with '.'.
+ * Note: this is called within a signal handler!  All we can do is set

4) It is not only the next CFI but also the process specific interrupt
handlers (in your 0002 patch) right?
+ * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke

5) I think you need to update CATALOG_VERSION_NO, mostly the committer
will take care of it but just in case.

6) Be consistent with casing "Verify" and "Might"
+# Verify that log output gets to the file
+# might need to retry if logging collector process is slow...


Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Fri, Nov 12, 2021 at 5:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Nov 11, 2021 at 12:14 PM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for the comments, the attached v10 patch has the fixes for the same.
>
> Thanks for the patches. Here are some comments:
>
> 1) In the docs, let's have the similar description of
> pg_log_backend_memory_contexts for pg_print_backtrace, just for the
> continuity in the users readability.
>
> 2) I don't know how the <screen> part looks like in the Server
> Signaling Functions table. I think here you can just say, it will emit
> a warning and return false if not supported by the installation. And
> you can give the <screen> part after the description where you are
> showing a sample backtrace.
>
> +        capture backtrace. If not available, the function will return false
> +        and a warning is issued, for example:
> +<screen>
> +WARNING:  backtrace generation is not supported by this installation
> + pg_print_backtrace
> +--------------------
> + f
> +</screen>
> +       </para></entry>
> +      </row>
>
> 3) Replace '!' with '.'.
> + * Note: this is called within a signal handler!  All we can do is set
>
> 4) It is not only the next CFI but also the process specific interrupt
> handlers (in your 0002 patch) right?
> + * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke
>
> 5) I think you need to update CATALOG_VERSION_NO, mostly the committer
> will take care of it but just in case.
>
> 6) Be consistent with casing "Verify" and "Might"
> +# Verify that log output gets to the file
> +# might need to retry if logging collector process is slow...

Few more comments:

7) Do we need TAP tests for this function? I think it is sufficient to
test the function in misc_functions.sql, so please remove
002_print_backtrace_validation.pl. Note that we don't do similar TAP
testing for pg_log_backend_memory_contexts either.

8) I hope you have manually tested the pg_print_backtrace for the
startup process and other auxiliary processes.

9) I think we can have a similar description (to the patch [1]) with
links to each process definition in the glossary, so that it will be
easier for the users to follow the links and know what each process
is. In particular, I didn't like that 0002 mentioned BackendPidGetProc
or AuxiliaryPidGetProc, as these are internal to the server and the
users don't care about them.

- * end on its own first and its backtrace are not logged is not a problem.
+ * BackendPidGetProc or AuxiliaryPidGetProc returns NULL if the pid isn't
+ * valid; but by the time we reach kill(), a process for which we get a
+ * valid proc here might have terminated on its own.  There's no way to
+ * acquire a lock on an arbitrary process to prevent that. But since this
+ * mechanism is usually used to debug a backend running and consuming lots
+ * of CPU cycles, that it might end on its own first and its backtrace are
+ * not logged is not a problem.
  */

Here's what I have written in the other patch [1], if okay please use this:

+        Requests to log memory contexts of the backend or the
+        <glossterm linkend="glossary-wal-sender">WAL sender</glossterm> or
+        the <glossterm linkend="glossary-auxiliary-proc">auxiliary
process</glossterm>
+        with the specified process ID. All of the
+        <glossterm linkend="glossary-auxiliary-proc">auxiliary
processes</glossterm>
+        are supported except the <glossterm
linkend="glossary-logger">logger</glossterm>
+        and the <glossterm
linkend="glossary-stats-collector">statistics collector</glossterm>
+        as they are not connected to shared memory the function can
not make requests.
+        These memory contexts will be logged at
<literal>LOG</literal> message level.
+        They will appear in the server log based on the log configuration set
         (See <xref linkend="runtime-config-logging"/> for more information),

I have no further comments on v10.

[1] - https://www.postgresql.org/message-id/CALj2ACU1nBzpacOK2q%3Da65S_4%2BOaz_rLTsU1Ri0gf7YUmnmhfQ%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Fri, Nov 12, 2021 at 5:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Nov 11, 2021 at 12:14 PM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for the comments, the attached v10 patch has the fixes for the same.
>
> Thanks for the patches. Here are some comments:
>
> 1) In the docs, let's have the similar description of
> pg_log_backend_memory_contexts for pg_print_backtrace, just for the
> continuity in the users readability.

I have kept some contents of the description similar. There is some
additional information to explain more about the functionality. I felt
that will help the user to understand more about the feature.

> 2) I don't know how the <screen> part looks like in the Server
> Signaling Functions table. I think here you can just say, it will emit
> a warning and return false if not supported by the installation. And
> you can give the <screen> part after the description where you are
> showing a sample backtrace.
>
> +        capture backtrace. If not available, the function will return false
> +        and a warning is issued, for example:
> +<screen>
> +WARNING:  backtrace generation is not supported by this installation
> + pg_print_backtrace
> +--------------------
> + f
> +</screen>
> +       </para></entry>
> +      </row>

Modified

> 3) Replace '!' with '.'.
> + * Note: this is called within a signal handler!  All we can do is set

I have changed it similar to HandleLogMemoryContextInterrupt
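
For readers following along, the handler being mirrored is tiny; a
sketch (the handler name here is assumed, PrintBacktracePending is the
flag from the patch):

    /* Called from the SIGUSR1 procsignal handler: only set flags here. */
    void
    HandlePrintBacktraceInterrupt(void)
    {
        InterruptPending = true;
        PrintBacktracePending = true;
        /* latch will be set by procsignal_sigusr1_handler */
    }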

> 4) It is not only the next CFI but also the process specific interrupt
> handlers (in your 0002 patch) right?
> + * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke

Modified

> 5) I think you need to update CATALOG_VERSION_NO, mostly the committer
> will take care of it but just in case.

Modified

> 6) Be consistent with casing "Verify" and "Might"
> +# Verify that log output gets to the file
> +# might need to retry if logging collector process is slow...

Modified

The attached v11 patch has the changes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Fri, Nov 12, 2021 at 6:11 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Nov 12, 2021 at 5:15 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Nov 11, 2021 at 12:14 PM vignesh C <vignesh21@gmail.com> wrote:
> > > Thanks for the comments, the attached v10 patch has the fixes for the same.
> >
> > Thanks for the patches. Here are some comments:
> >
> > 1) In the docs, let's have the similar description of
> > pg_log_backend_memory_contexts for pg_print_backtrace, just for the
> > continuity in the users readability.
> >
> > 2) I don't know how the <screen> part looks like in the Server
> > Signaling Functions table. I think here you can just say, it will emit
> > a warning and return false if not supported by the installation. And
> > you can give the <screen> part after the description where you are
> > showing a sample backtrace.
> >
> > +        capture backtrace. If not available, the function will return false
> > +        and a warning is issued, for example:
> > +<screen>
> > +WARNING:  backtrace generation is not supported by this installation
> > + pg_print_backtrace
> > +--------------------
> > + f
> > +</screen>
> > +       </para></entry>
> > +      </row>
> >
> > 3) Replace '!' with '.'.
> > + * Note: this is called within a signal handler!  All we can do is set
> >
> > 4) It is not only the next CFI but also the process specific interrupt
> > handlers (in your 0002 patch) right?
> > + * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke
> >
> > 5) I think you need to update CATALOG_VERSION_NO, mostly the committer
> > will take care of it but just in case.
> >
> > 6) Be consistent with casing "Verify" and "Might"
> > +# Verify that log output gets to the file
> > +# might need to retry if logging collector process is slow...
>
> Few more comments:
>
> 7) Do we need TAP tests for this function? I think it is sufficient to
> test the function in misc_functions.sql, please remove
> 002_print_backtrace_validation.pl. Note that we don't do similar TAP
> testing for pg_log_backend_memory_contexts as well.

I felt we should keep this test case; all the other tests just check
whether it returns true or false, and do not check the contents of the
logfile. This is the only test which checks the logfile.

> 8) I hope you have manually tested the pg_print_backtrace for the
> startup process and other auxiliary processes.

Yes, I have checked them manually.

> 9) I think we can have a similar description (to the patch [1]). with
> links to each process definition in the glossary so that it will be
> easier for the users to follow the links and know what each process
> is. Especially, I didn't like the 0002 mentioned about the
> BackendPidGetProc or AuxiliaryPidGetProc as these are internal to the
> server and the users don't care about them.
>
> - * end on its own first and its backtrace are not logged is not a problem.
> + * BackendPidGetProc or AuxiliaryPidGetProc returns NULL if the pid isn't
> + * valid; but by the time we reach kill(), a process for which we get a
> + * valid proc here might have terminated on its own.  There's no way to
> + * acquire a lock on an arbitrary process to prevent that. But since this
> + * mechanism is usually used to debug a backend running and consuming lots
> + * of CPU cycles, that it might end on its own first and its backtrace are
> + * not logged is not a problem.
>   */
>
> Here's what I have written in the other patch [1], if okay please use this:
>
> +        Requests to log memory contexts of the backend or the
> +        <glossterm linkend="glossary-wal-sender">WAL sender</glossterm> or
> +        the <glossterm linkend="glossary-auxiliary-proc">auxiliary
> process</glossterm>
> +        with the specified process ID. All of the
> +        <glossterm linkend="glossary-auxiliary-proc">auxiliary
> processes</glossterm>
> +        are supported except the <glossterm
> linkend="glossary-logger">logger</glossterm>
> +        and the <glossterm
> linkend="glossary-stats-collector">statistics collector</glossterm>
> +        as they are not connected to shared memory the function can
> not make requests.
> +        These memory contexts will be logged at
> <literal>LOG</literal> message level.
> +        They will appear in the server log based on the log configuration set
>          (See <xref linkend="runtime-config-logging"/> for more information),

I had mentioned BackendPidGetProc and AuxiliaryPidGetProc in comments
in the function, and I have not changed that. I have changed the
documentation to be similar to your patch.

Thanks for the comments, v11 patch attached at [1] has the changes for the same.
[1] - https://www.postgresql.org/message-id/CALDaNm3BYGOG3-PQvYbWkB%3DG3h1KYJ8CO8UYbzfECH4DYGMGqA%40mail.gmail.com

Regards,
Vignesh



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Sun, Nov 14, 2021 at 8:49 PM vignesh C <vignesh21@gmail.com> wrote:
> > 7) Do we need TAP tests for this function? I think it is sufficient to
> > test the function in misc_functions.sql, please remove
> > 002_print_backtrace_validation.pl. Note that we don't do similar TAP
> > testing for pg_log_backend_memory_contexts as well.
>
> I felt let's keep this test case, all the other tests just check if it
> returns true or false, it does not checks for the contents in the
> logfile. This is the only test which checks the logfile.

I still don't agree with having test cases in a new file
002_print_backtrace_validation.pl. I feel this test case doesn't add
value, because the code coverage is done by the .sql test cases and the
.pl just ensures the backtrace appears in the server logs. I don't think
we need a new file for this purpose. If we do this, then other functions
like pg_log_backend_memory_contexts or pg_log_query_plan (in-progress
thread) might add the same kind of test files for the same reasons, which
would make the TAP tests, i.e. "make check-world", take longer. Moreover,
pg_log_backend_memory_contexts has been committed without a TAP test case.

I think we can remove it.

Few more comments on v11:
1) I think we can improve here by adding a link to "backend" as well,
I will modify it in the other thread.
+        Requests to log the backtrace of the backend or the
+        <glossterm linkend="glossary-wal-sender">WAL sender</glossterm> or
Something like:
+        Requests to log the backtrace of the <glossterm
linkend="glossary-backend">backend</glossterm> or the
+        <glossterm linkend="glossary-wal-sender">WAL sender</glossterm> or

2) I think "which is enough because the target process for logging of
backtrace is a backend" isn't valid anymore with 0002, right? Please
remove it.
+ * to call this function if we see PrintBacktracePending set. It is called from
+ * CHECK_FOR_INTERRUPTS() or from process specific interrupt handlers, which is
+ * enough because the target process for logging of backtrace is a backend.

> Thanks for the comments, v11 patch attached at [1] has the changes for the same.

The v11 patches mostly look good to me except the above comments.

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Mon, Nov 15, 2021 at 7:37 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sun, Nov 14, 2021 at 8:49 PM vignesh C <vignesh21@gmail.com> wrote:
> > > 7) Do we need TAP tests for this function? I think it is sufficient to
> > > test the function in misc_functions.sql, please remove
> > > 002_print_backtrace_validation.pl. Note that we don't do similar TAP
> > > testing for pg_log_backend_memory_contexts as well.
> >
> > I felt let's keep this test case, all the other tests just check if it
> > returns true or false, it does not checks for the contents in the
> > logfile. This is the only test which checks the logfile.
>
>  I still don't agree to have test cases within a new file
> 002_print_backtrace_validation.pl. I feel this test case doesn't add
> value because the code coverage is done by .sql test cases and .pl
> just ensures the backtrace appears in the server logs. I don't think
> we ever need a new file for this purpose. If this is the case, then
> there are other functions like pg_log_backend_memory_contexts  or
> pg_log_query_plan (in progress thread) might add the same test files
> for the same reasons which make the TAP tests i.e. "make check-world"
> to take longer times. Moreover, pg_log_backend_memory_contexts  has
> been committed without having a TAP test case.
>
> I think we can remove it.

Removed

> Few more comments on v11:
> 1) I think we can improve here by adding a link to "backend" as well,
> I will modify it in the other thread.
> +        Requests to log the backtrace of the backend or the
> +        <glossterm linkend="glossary-wal-sender">WAL sender</glossterm> or
> Something like:
> +        Requests to log the backtrace of the <glossterm
> linkend="glossary-backend">backend</glossterm> or the
> +        <glossterm linkend="glossary-wal-sender">WAL sender</glossterm> or

Modified

> 2) I think "which is enough because the target process for logging of
> backtrace is a backend" isn't valid anymore with 0002, righit? Please
> remove it.
> + * to call this function if we see PrintBacktracePending set. It is called from
> + * CHECK_FOR_INTERRUPTS() or from process specific interrupt handlers, which is
> + * enough because the target process for logging of backtrace is a backend.
>
> > Thanks for the comments, v11 patch attached at [1] has the changes for the same.

Modified

Thanks for the comments, the attached v12 patch has the changes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:
> > 2) I think "which is enough because the target process for logging of
> > backtrace is a backend" isn't valid anymore with 0002, righit? Please
> > remove it.
> > + * to call this function if we see PrintBacktracePending set. It is called from
> > + * CHECK_FOR_INTERRUPTS() or from process specific interrupt handlers, which is
> > + * enough because the target process for logging of backtrace is a backend.
> >
> > > Thanks for the comments, v11 patch attached at [1] has the changes for the same.
>
> Modified

I don't see the above change in v12. Am I missing something? I still
see the comment "It is called from CHECK_FOR_INTERRUPTS(), which is
enough because the target process for logging of backtrace is a
backend.".

+ * ProcessPrintBacktraceInterrupt - Perform logging of backtrace of this
+ * backend process.
+ *
+ * Any backend that participates in ProcSignal signaling must arrange
+ * to call this function if we see PrintBacktracePending set.
+ * It is called from CHECK_FOR_INTERRUPTS(), which is enough because
+ * the target process for logging of backtrace is a backend.
+ */
+void
+ProcessPrintBacktraceInterrupt(void)

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
Dilip Kumar
Дата:
On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:

>
> Thanks for the comments, the attached v12 patch has the changes for the same.

I have reviewed this patch and have some comments on v12-0001,

1.
+        This feature is not supported for the postmaster, logger, checkpointer,
+        walwriter, background writer or statistics collector process. This


The comment says it is not supported for the postmaster, logger,
checkpointer, etc., but I just tried and it works for the checkpointer
and walwriter processes. Can you explain in the comments why we do not
want to support these processes? Or is the comment old, and we now
support some of these processes?


2.
postgres[64154]=# select pg_print_backtrace(64136);
WARNING:  01000: PID 64136 is not a PostgreSQL server process
LOCATION:  pg_print_backtrace, signalfuncs.c:335
 pg_print_backtrace
--------------------
 f


For the postmaster I am getting this WARNING "PID 64136 is not a
PostgreSQL server process". Even if we don't want to support this
process, I don't think this message is good.



3.
I was looking into the patch. I tried to print the backtrace using GDB
as well as using this function (compiled in debug mode), and I can see
that the backtrace printed by GDB is more detailed than the one printed
by this API. I understand we can find out the function name from the
address, but can't we print the detailed call stack with all function
names, at least when debug symbols are available?

Using GDB

#0  0x00007fa26c527e93 in __epoll_wait_nocancel () from
#1  0x0000000000947a61 in WaitEventSetWaitBlock (set=0x2
#2  0x00000000009478f9 in WaitEventSetWait (set=0x2f0111
#3  0x00000000007a6cef in secure_read (port=0x2f26800, p
#4  0x00000000007b0bd6 in pq_recvbuf () at pqcomm.c:957
#5  0x00000000007b0c86 in pq_getbyte () at pqcomm.c:1000
#6  0x0000000000978c13 in SocketBackend (inBuf=0x7ffea99
#7  0x0000000000978e37 in ReadCommand (inBuf=0x7ffea9937
#8  0x000000000097dca5 in PostgresMain (dbname=0x2f2ef40
....

Using pg_print_backtrace
        postgres: dilipkumar postgres [local]
SELECT(set_backtrace+0x38) [0xb118ff]
        postgres: dilipkumar postgres [local]
SELECT(ProcessPrintBacktraceInterrupt+0x5b) [0x94fe42]
        postgres: dilipkumar postgres [local]
SELECT(ProcessInterrupts+0x7d9) [0x97cb2a]
        postgres: dilipkumar postgres [local] SELECT() [0x78143c]
        postgres: dilipkumar postgres [local] SELECT() [0x736731]
        postgres: dilipkumar postgres [local] SELECT() [0x738f5f]
        postgres: dilipkumar postgres [local]
SELECT(standard_ExecutorRun+0x1d6) [0x736d94]
        postgres: dilipkumar postgres [local] SELECT(ExecutorRun+0x55)
[0x736bbc]
        postgres: dilipkumar postgres [local] SELECT() [0x97ff0c]
        postgres: dilipkumar postgres [local] SELECT(PortalRun+0x268) [0x97fbbf]
        postgres: dilipkumar postgres [local] SELECT() [0x9798dc]
        postgres: dilipkumar postgres [local]
SELECT(PostgresMain+0x6ca) [0x97dda7]

4.
+WARNING:  backtrace generation is not supported by this installation
+ pg_print_backtrace
+--------------------
+ f

Should we give some hints on how to enable that?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Mon, Nov 15, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:
>
> >
> > Thanks for the comments, the attached v12 patch has the changes for the same.
>
> I have reviewed this patch and have some comments on v12-0001,
>
> 1.
> +        This feature is not supported for the postmaster, logger, checkpointer,
> +        walwriter, background writer or statistics collector process. This
>
>
> Comment says it is not supported for postmaster, logger, checkpointer
> etc, but I just tried and it is working for checkpointer and walwriter
> processes, can you explain in comments why do we not want to support
> for these processes?  or the comment is old and now we are supporting
> for some of these processes.

Please see the v12-0002 which will have the description modified.

> 2.
> postgres[64154]=# select pg_print_backtrace(64136);
> WARNING:  01000: PID 64136 is not a PostgreSQL server process
> LOCATION:  pg_print_backtrace, signalfuncs.c:335
>  pg_print_backtrace
> --------------------
>  f
>
>
> For postmaster I am getting this WARNING "PID 64136 is not a
> PostgreSQL server process", even if we don't want to support this
> process I don't think this message is good.

This is a generic message that is coming from pg_signal_backend, not
related to Vignesh's patch. I agree with you that emitting a "not
postgres server process" for the postmaster process which is the main
"postgres process" doesn't sound sensible. Please see there's already
a thread [1] and see the v1 patch [2] for changing this message.
Please let me know if you want me to revive that stalled thread?

[1] https://www.postgresql.org/message-id/CALj2ACW7Rr-R7mBcBQiXWPp%3DJV5chajjTdudLiF5YcpW-BmHhg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALj2ACUGxedgYk-5nO8D2EJV2YHXnoycp_oqYAxDXTODhWkmkg%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
Dilip Kumar
Дата:
On Mon, Nov 15, 2021 at 11:53 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > >
> > > Thanks for the comments, the attached v12 patch has the changes for the same.
> >
> > I have reviewed this patch and have some comments on v12-0001,
> >
> > 1.
> > +        This feature is not supported for the postmaster, logger, checkpointer,
> > +        walwriter, background writer or statistics collector process. This
> >
> >
> > Comment says it is not supported for postmaster, logger, checkpointer
> > etc, but I just tried and it is working for checkpointer and walwriter
> > processes, can you explain in comments why do we not want to support
> > for these processes?  or the comment is old and now we are supporting
> > for some of these processes.
>
> Please see the v12-0002 which will have the description modified.

Okay, now I see that.

> > 2.
> > postgres[64154]=# select pg_print_backtrace(64136);
> > WARNING:  01000: PID 64136 is not a PostgreSQL server process
> > LOCATION:  pg_print_backtrace, signalfuncs.c:335
> >  pg_print_backtrace
> > --------------------
> >  f
> >
> >
> > For postmaster I am getting this WARNING "PID 64136 is not a
> > PostgreSQL server process", even if we don't want to support this
> > process I don't think this message is good.
>
> This is a generic message that is coming from pg_signal_backend, not
> related to Vignesh's patch. I agree with you that emitting a "not
> postgres server process" for the postmaster process which is the main
> "postgres process" doesn't sound sensible. Please see there's already
> a thread [1] and see the v1 patch [2] for changing this message.
> Please let me know if you want me to revive that stalled thread?

>[1] https://www.postgresql.org/message-id/CALj2ACW7Rr-R7mBcBQiXWPp%3DJV5chajjTdudLiF5YcpW-BmHhg%40mail.gmail.com
>[2] https://www.postgresql.org/message-id/CALj2ACUGxedgYk-5nO8D2EJV2YHXnoycp_oqYAxDXTODhWkmkg%40mail.gmail.com

Hmm, yeah I think I like the idea posted in [1], however, I could not
open the link [2]

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Mon, Nov 15, 2021 at 11:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:
> > > 2) I think "which is enough because the target process for logging of
> > > backtrace is a backend" isn't valid anymore with 0002, righit? Please
> > > remove it.
> > > + * to call this function if we see PrintBacktracePending set. It is called from
> > > + * CHECK_FOR_INTERRUPTS() or from process specific interrupt handlers, which is
> > > + * enough because the target process for logging of backtrace is a backend.
> > >
> > > > Thanks for the comments, v11 patch attached at [1] has the changes for the same.
> >
> > Modified
>
> I don't see the above change in v12. Am I missing something? I still
> see the comment "It is called from CHECK_FOR_INTERRUPTS(), which is
> enough because the target process for logging of backtrace is a
> backend.".

This change is present in the 0002
(v12-0002-pg_print_backtrace-support-for-printing-backtrac.patch)
patch.

Regards,
Vignesh



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Mon, Nov 15, 2021 at 12:13 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 11:00 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:
> > > > 2) I think "which is enough because the target process for logging of
> > > > backtrace is a backend" isn't valid anymore with 0002, righit? Please
> > > > remove it.
> > > > + * to call this function if we see PrintBacktracePending set. It is called from
> > > > + * CHECK_FOR_INTERRUPTS() or from process specific interrupt handlers, which is
> > > > + * enough because the target process for logging of backtrace is a backend.
> > > >
> > > > > Thanks for the comments, v11 patch attached at [1] has the changes for the same.
> > >
> > > Modified
> >
> > I don't see the above change in v12. Am I missing something? I still
> > see the comment "It is called from CHECK_FOR_INTERRUPTS(), which is
> > enough because the target process for logging of backtrace is a
> > backend.".
>
> This change is present in the 0002
> (v12-0002-pg_print_backtrace-support-for-printing-backtrac.patch)
> patch.

Thanks. Yes, it was removed in 0002. I missed it.

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Mon, Nov 15, 2021 at 12:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > 2.
> > > postgres[64154]=# select pg_print_backtrace(64136);
> > > WARNING:  01000: PID 64136 is not a PostgreSQL server process
> > > LOCATION:  pg_print_backtrace, signalfuncs.c:335
> > >  pg_print_backtrace
> > > --------------------
> > >  f
> > >
> > >
> > > For postmaster I am getting this WARNING "PID 64136 is not a
> > > PostgreSQL server process", even if we don't want to support this
> > > process I don't think this message is good.
> >
> > This is a generic message that is coming from pg_signal_backend, not
> > related to Vignesh's patch. I agree with you that emitting a "not
> > postgres server process" for the postmaster process which is the main
> > "postgres process" doesn't sound sensible. Please see there's already
> > a thread [1] and see the v1 patch [2] for changing this message.
> > Please let me know if you want me to revive that stalled thread?
>
> >[1] https://www.postgresql.org/message-id/CALj2ACW7Rr-R7mBcBQiXWPp%3DJV5chajjTdudLiF5YcpW-BmHhg%40mail.gmail.com
> >[2] https://www.postgresql.org/message-id/CALj2ACUGxedgYk-5nO8D2EJV2YHXnoycp_oqYAxDXTODhWkmkg%40mail.gmail.com
>
> Hmm, yeah I think I like the idea posted in [1], however, I could not
> open the link [2]

Thanks, I posted an updated v3 patch at [1]. Please review it there.

[1] - https://www.postgresql.org/message-id/CALj2ACWS2bJRW-bSvcoL4FvS%3DkbQ8SSWXi%3D9RFUt7uqZvTQWWw%40mail.gmail.com

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Mon, Nov 15, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 10:34 AM vignesh C <vignesh21@gmail.com> wrote:
>
> >
> > Thanks for the comments, the attached v12 patch has the changes for the same.
>
> I have reviewed this patch and have some comments on v12-0001,
>
> 1.
> +        This feature is not supported for the postmaster, logger, checkpointer,
> +        walwriter, background writer or statistics collector process. This
>
>
> Comment says it is not supported for postmaster, logger, checkpointer
> etc, but I just tried and it is working for checkpointer and walwriter
> processes, can you explain in comments why do we not want to support
> for these processes?  or the comment is old and now we are supporting
> for some of these processes.
>
>
> 2.
> postgres[64154]=# select pg_print_backtrace(64136);
> WARNING:  01000: PID 64136 is not a PostgreSQL server process
> LOCATION:  pg_print_backtrace, signalfuncs.c:335
>  pg_print_backtrace
> --------------------
>  f
>
>
> For postmaster I am getting this WARNING "PID 64136 is not a
> PostgreSQL server process", even if we don't want to support this
> process I don't think this message is good.
>
>
>
> 3.
> I was looking into the patch, I tried to print the backtrace using GDB
> as well as using this function, I have complied in debug mode, I can
> see the backtrace printed
> by GDB is more detailed than printed by this API, I understand we can
> find out the function name from address, but can't we print the
> detailed call stack with all function names at least when debug
> symbols are available?
>
> Using GDB
>
> #0  0x00007fa26c527e93 in __epoll_wait_nocancel () from
> #1  0x0000000000947a61 in WaitEventSetWaitBlock (set=0x2
> #2  0x00000000009478f9 in WaitEventSetWait (set=0x2f0111
> #3  0x00000000007a6cef in secure_read (port=0x2f26800, p
> #4  0x00000000007b0bd6 in pq_recvbuf () at pqcomm.c:957
> #5  0x00000000007b0c86 in pq_getbyte () at pqcomm.c:1000
> #6  0x0000000000978c13 in SocketBackend (inBuf=0x7ffea99
> #7  0x0000000000978e37 in ReadCommand (inBuf=0x7ffea9937
> #8  0x000000000097dca5 in PostgresMain (dbname=0x2f2ef40
> ....
>
> Using pg_print_backtrace
>         postgres: dilipkumar postgres [local]
> SELECT(set_backtrace+0x38) [0xb118ff]
>         postgres: dilipkumar postgres [local]
> SELECT(ProcessPrintBacktraceInterrupt+0x5b) [0x94fe42]
>         postgres: dilipkumar postgres [local]
> SELECT(ProcessInterrupts+0x7d9) [0x97cb2a]
>         postgres: dilipkumar postgres [local] SELECT() [0x78143c]
>         postgres: dilipkumar postgres [local] SELECT() [0x736731]
>         postgres: dilipkumar postgres [local] SELECT() [0x738f5f]
>         postgres: dilipkumar postgres [local]
> SELECT(standard_ExecutorRun+0x1d6) [0x736d94]
>         postgres: dilipkumar postgres [local] SELECT(ExecutorRun+0x55)
> [0x736bbc]
>         postgres: dilipkumar postgres [local] SELECT() [0x97ff0c]
>         postgres: dilipkumar postgres [local] SELECT(PortalRun+0x268) [0x97fbbf]
>         postgres: dilipkumar postgres [local] SELECT() [0x9798dc]
>         postgres: dilipkumar postgres [local]
> SELECT(PostgresMain+0x6ca) [0x97dda7]

I did not find any APIs with such an implementation. We have used
backtrace and backtrace_symbols to print the addresses. The same
approach is used to add error backtraces and to print a backtrace on
assert failure; see errbacktrace and ExceptionalCondition. I felt this
is ok.
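
For anyone unfamiliar with the mechanism, it boils down to roughly the
following (a generic sketch of the execinfo approach, not the exact
elog.c code; assumes HAVE_BACKTRACE_SYMBOLS, i.e. <execinfo.h> is
available):

    #include <execinfo.h>

    static void
    emit_backtrace(void)
    {
        void   *buf[100];
        int     nframes;
        char  **frames;

        nframes = backtrace(buf, lengthof(buf));
        frames = backtrace_symbols(buf, nframes);   /* "symbol+offset" strings */
        if (frames == NULL)
            return;

        for (int i = 0; i < nframes; i++)
            elog(LOG_SERVER_ONLY, "%s", frames[i]);

        free(frames);   /* backtrace_symbols() result is malloc'd */
    }

Function names show up only for symbols visible to the dynamic linker,
which is why static functions typically appear as bare addresses; a
debugger can still resolve offsets such as PostgresMain+0x6ca back to
source lines when debug symbols are available.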

> 4.
> +WARNING:  backtrace generation is not supported by this installation
> + pg_print_backtrace
> +--------------------
> + f
>
> Should we give some hints on how to enable that?

Modified to include a hint message.
The attached v13 patch has the fix for it.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Tue, Nov 16, 2021 at 1:12 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Nov 15, 2021 at 09:12:49PM +0530, vignesh C wrote:
> > The idea here is to implement & expose pg_print_backtrace function, internally
>
> This patch is closely related to this one
> https://commitfest.postgresql.org/35/3142/
> | Logging plan of the currently running query
>
> I suggest to review that patch and make sure there's nothing you could borrow.
>
> My only comment for now is that maybe the function name should be
> pg_log_backtrace() rather than pg_print_backtrace(), since it doesn't actually
> "print" the backtrace, but rather request the other backend to log its
> backtrace.

+1 for pg_log_backtrace().

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Tue, Nov 16, 2021 at 1:12 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Nov 15, 2021 at 09:12:49PM +0530, vignesh C wrote:
> > The idea here is to implement & expose pg_print_backtrace function, internally
>
> This patch is closely related to this one
> https://commitfest.postgresql.org/35/3142/
> | Logging plan of the currently running query
>
> I suggest to review that patch and make sure there's nothing you could borrow.

I had a look and felt the attached patch is in similar lines

> My only comment for now is that maybe the function name should be
> pg_log_backtrace() rather than pg_print_backtrace(), since it doesn't actually
> "print" the backtrace, but rather request the other backend to log its
> backtrace.

Modified

> Did you see that the windows build failed ?
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.153557
> I think you'll need to create an "alternate" output like
> src/test/regress/expected/misc_functions_1.out
>
> It's possible that's best done by creating a totally new .sql and .out files
> added to src/test/regress/parallel_schedule - because otherwise, you'd have to
> duplicate the existing 300 lines of misc_fuctions.out.

Modified

Attached v14 patch has the fixes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Thu, Nov 18, 2021 at 9:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Nov 17, 2021 at 08:12:44PM +0530, vignesh C wrote:
> > Attached v14 patch has the fixes for the same.
>
> Thanks for updating the patch.
>
> I cleaned up the docs and comments.  I think this could be nearly "Ready".
>
> If you like the changes in my "fixup" patch (0002 and 0004), you should be able
> to apply my 0002 on top of your 0001.  I'm sure it'll cause some conflicts with
> your 2nd patch, though...

I have taken the changes with slight modifications. I have not taken a
few of the changes, to keep it similar to pg_log_backend_memory_contexts.

> This doesn't bump the catversion, since that would cause the patch to fail in
> cfbot every time another commit updates catversion.

I have removed it; since it is mentioned in the commit message, it is ok.

> Your 0001 patch allows printing backtraces of autovacuum, but the doc says it's
> only for "backends".  Should the change to autovacuum.c be moved to the 2nd
> patch ?  Or, why is autovacuum so special that it should be handled in the
> first patch ?

I had separated the patches to make review easier; since a few rounds
of review have been done, I have merged the changes. Now that the
patches are merged, this change is handled.
The Attached v15 patch has the fixes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Fri, Nov 19, 2021 at 4:07 PM vignesh C <vignesh21@gmail.com> wrote:
> The Attached v15 patch has the fixes for the same.

Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as RfC.

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Sat, Nov 20, 2021 at 11:50 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Nov 19, 2021 at 4:07 PM vignesh C <vignesh21@gmail.com> wrote:
> > The Attached v15 patch has the fixes for the same.
>
> Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as RfC.

The patch was not applying because of the recent commit [1]. I took
this opportunity and tried a bunch of things without modifying the
core logic of the pg_log_backtrace feature that Vignesh has worked on.

I did following -  moved the duplicate code to a new function
CheckPostgresProcessId which can be used by pg_log_backtrace,
pg_log_memory_contexts, pg_signal_backend and pg_log_query_plan ([2]),
modified the code comments, docs and tests to be more in sync with the
commit [1], moved two of ProcessLogBacktraceInterrupt calls (archiver
and wal writer) to their respective interrupt handlers. Here's the v16
version that I've come up with.

[1]
commit 790fbda902093c71ae47bff1414799cd716abb80
Author: Fujii Masao <fujii@postgresql.org>
Date:   Tue Jan 11 23:19:59 2022 +0900

    Enhance pg_log_backend_memory_contexts() for auxiliary processes.

[2] https://www.postgresql.org/message-id/flat/20220114063846.7jygvulyu6g23kdv%40jrouhaud

Regards,
Bharath Rupireddy.

Вложения

Re: Printing backtrace of postgres processes

От
torikoshia
Дата:
On 2022-01-14 19:48, Bharath Rupireddy wrote:
> On Sat, Nov 20, 2021 at 11:50 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> 
>> On Fri, Nov 19, 2021 at 4:07 PM vignesh C <vignesh21@gmail.com> wrote:
>> > The Attached v15 patch has the fixes for the same.
>> 
>> Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as 
>> RfC.
> 
> The patch was not applying because of the recent commit [1]. I took
> this opportunity and tried a bunch of things without modifying the
> core logic of the pg_log_backtrace feature that Vignesh has worked on.
> 
> I did following -  moved the duplicate code to a new function
> CheckPostgresProcessId which can be used by pg_log_backtrace,
> pg_log_memory_contexts, pg_signal_backend and pg_log_query_plan ([2]),

Thanks for refactoring!
I'm going to use it for pg_log_query_plan after this patch is merged.

> modified the code comments, docs and tests to be more in sync with the
> commit [1], moved two of ProcessLogBacktraceInterrupt calls (archiver
> and wal writer) to their respective interrupt handlers. Here's the v16
> version that I've come up with.

I have some minor comments.

> +</screen>
> +    You can get the file name and line number from the logged details 
> by using
> +    gdb/addr2line in linux platforms (users must ensure gdb/addr2line 
> is
> +    already installed).
> +<programlisting>
> +1)  "info line *address" from gdb on postgres executable. For example:
> +gdb ./postgres
> +(gdb) info line *0x71c25d
> +Line 378 of "execMain.c" starts at address 0x71c25d 
> <literal><</literal>standard_ExecutorRun+470<literal>></literal> 
> and ends at 0x71c263     
> <literal><</literal>standard_ExecutorRun+476<literal>></literal>.
> +OR
> +2) Using "addr2line -e postgres address", For example:
> +addr2line -e ./postgres 0x71c25d
> +/home/postgresdba/src/backend/executor/execMain.c:378
> +</programlisting>
> +   </para>
> +

Isn't it better to remove lines 1) and 2) from <programlisting>?
I just glanced at the existing sgml, but <programlisting> seems to
contain only code.


> + * CheckPostgresProcessId -- check if the process with given pid is a 
> backend
> + * or an auxiliary process.
> + *
> +
> + */

Isn't the 4th line needless?

BTW, when I saw the name of this function, I thought it just checks
whether the specified pid is a PostgreSQL process or not.
Since it returns a pointer to the PGPROC or the BackendId of the PID, it
might be kind to add a comment about that.


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Re: Printing backtrace of postgres processes

От
Fujii Masao
Дата:

On 2022/01/24 16:35, torikoshia wrote:
> On 2022-01-14 19:48, Bharath Rupireddy wrote:
>> On Sat, Nov 20, 2021 at 11:50 AM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>
>>> On Fri, Nov 19, 2021 at 4:07 PM vignesh C <vignesh21@gmail.com> wrote:
>>> > The Attached v15 patch has the fixes for the same.
>>>
>>> Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as RfC.
>>
>> The patch was not applying because of the recent commit [1]. I took
>> this opportunity and tried a bunch of things without modifying the
>> core logic of the pg_log_backtrace feature that Vignesh has worked on.

I have one question about this feature. When the PID of an auxiliary process like the archiver is specified, the
function probably always reports the same result, doesn't it? Because, for example, the archiver always logs its
backtrace in HandlePgArchInterrupts().

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Mon, Jan 24, 2022 at 10:06 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2022/01/24 16:35, torikoshia wrote:
> > On 2022-01-14 19:48, Bharath Rupireddy wrote:
> >> On Sat, Nov 20, 2021 at 11:50 AM Bharath Rupireddy
> >> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>>
> >>> On Fri, Nov 19, 2021 at 4:07 PM vignesh C <vignesh21@gmail.com> wrote:
> >>> > The Attached v15 patch has the fixes for the same.
> >>>
> >>> Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as RfC.
> >>
> >> The patch was not applying because of the recent commit [1]. I took
> >> this opportunity and tried a bunch of things without modifying the
> >> core logic of the pg_log_backtrace feature that Vignesh has worked on.
>
> I have one question about this feature. When the PID of an auxiliary process like the archiver is specified, the
> function probably always reports the same result, doesn't it? Because, for example, the archiver always logs its
> backtrace in HandlePgArchInterrupts().

It can be either from HandlePgArchInterrupts in pgarch_MainLoop or
from HandlePgArchInterrupts in pgarch_ArchiverCopyLoop, since the
archiver functionality is mainly in these functions.

Regards,
Vignesh



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Mon, Jan 24, 2022 at 1:05 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> On 2022-01-14 19:48, Bharath Rupireddy wrote:
> > On Sat, Nov 20, 2021 at 11:50 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>
> >> On Fri, Nov 19, 2021 at 4:07 PM vignesh C <vignesh21@gmail.com> wrote:
> >> > The Attached v15 patch has the fixes for the same.
> >>
> >> Thanks. The v15 patch LGTM and the cf bot is happy hence marking it as
> >> RfC.
> >
> > The patch was not applying because of the recent commit [1]. I took
> > this opportunity and tried a bunch of things without modifying the
> > core logic of the pg_log_backtrace feature that Vignesh has worked on.
> >
> > I did following -  moved the duplicate code to a new function
> > CheckPostgresProcessId which can be used by pg_log_backtrace,
> > pg_log_memory_contexts, pg_signal_backend and pg_log_query_plan ([2]),
>
> Thanks for refactoring!
> I'm going to use it for pg_log_query_plan after this patch is merged.
>
> > modified the code comments, docs and tests to be more in sync with the
> > commit [1], moved two of ProcessLogBacktraceInterrupt calls (archiver
> > and wal writer) to their respective interrupt handlers. Here's the v16
> > version that I've come up with.
>
> I have some minor comments.
>
> > +</screen>
> > +    You can get the file name and line number from the logged details
> > by using
> > +    gdb/addr2line in linux platforms (users must ensure gdb/addr2line
> > is
> > +    already installed).
> > +<programlisting>
> > +1)  "info line *address" from gdb on postgres executable. For example:
> > +gdb ./postgres
> > +(gdb) info line *0x71c25d
> > +Line 378 of "execMain.c" starts at address 0x71c25d
> > <literal><</literal>standard_ExecutorRun+470<literal>></literal>
> > and ends at 0x71c263
> > <literal><</literal>standard_ExecutorRun+476<literal>></literal>.
> > +OR
> > +2) Using "addr2line -e postgres address", For example:
> > +addr2line -e ./postgres 0x71c25d
> > +/home/postgresdba/src/backend/executor/execMain.c:378
> > +</programlisting>
> > +   </para>
> > +
>
> Isn't it better to remove lines 1) and 2) from <programlisting>?
> I just glanced at the existing sgml, but <programlisting> seems to
> contain only code.

Modified

> > + * CheckPostgresProcessId -- check if the process with given pid is a
> > backend
> > + * or an auxiliary process.
> > + *
> > +
> > + */
>
> Isn't the 4th line needless?

Modified

> BTW, when I saw the name of this function, I thought it just checks
> whether the specified pid is a PostgreSQL process or not.
> Since it returns a pointer to the PGPROC or the BackendId of the PID, it
> might be kind to add a comment about that.

Modified

Thanks for the comments, attached v17 patch has the fix for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Tue, Jan 25, 2022 at 12:00 PM vignesh C <vignesh21@gmail.com> wrote:
> Thanks for the comments, attached v17 patch has the fix for the same.

Have a minor comment on v17:

can we modify the elog(LOG, to new style ereport(LOG, ?
+ elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);

/*----------
 * New-style error reporting API: to be used in this way:
 *      ereport(ERROR,
 *              errcode(ERRCODE_UNDEFINED_CURSOR),
 *              errmsg("portal \"%s\" not found", stmt->portalname),
 *              ... other errxxx() fields as needed ...);
 *

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, Jan 26, 2022 at 11:07 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Jan 25, 2022 at 12:00 PM vignesh C <vignesh21@gmail.com> wrote:
> > Thanks for the comments, attached v17 patch has the fix for the same.
>
> Have a minor comment on v17:
>
> can we modify the elog(LOG, to new style ereport(LOG, ?
> + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
>
> /*----------
>  * New-style error reporting API: to be used in this way:
>  *      ereport(ERROR,
>  *              errcode(ERRCODE_UNDEFINED_CURSOR),
>  *              errmsg("portal \"%s\" not found", stmt->portalname),
>  *              ... other errxxx() fields as needed ...);
>  *

Attached v18 patch has the changes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Thu, Jan 27, 2022 at 10:45 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 11:07 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 12:00 PM vignesh C <vignesh21@gmail.com> wrote:
> > > Thanks for the comments, attached v17 patch has the fix for the same.
> >
> > Have a minor comment on v17:
> >
> > can we modify the elog(LOG, to new style ereport(LOG, ?
> > + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> >
> > /*----------
> >  * New-style error reporting API: to be used in this way:
> >  *      ereport(ERROR,
> >  *              errcode(ERRCODE_UNDEFINED_CURSOR),
> >  *              errmsg("portal \"%s\" not found", stmt->portalname),
> >  *              ... other errxxx() fields as needed ...);
> >  *
>
> Attached v18 patch has the changes for the same.

Thanks. The v18 patch LGTM. I'm not sure if the CF bot failure is
related to the patch -  https://cirrus-ci.com/task/5633364051886080

Regards,
Bharath Rupireddy.



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Fri, Jan 28, 2022 at 1:54 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jan 27, 2022 at 10:45 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, Jan 26, 2022 at 11:07 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Tue, Jan 25, 2022 at 12:00 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > Thanks for the comments, attached v17 patch has the fix for the same.
> > >
> > > Have a minor comment on v17:
> > >
> > > can we modify the elog(LOG, to new style ereport(LOG, ?
> > > + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> > >
> > > /*----------
> > >  * New-style error reporting API: to be used in this way:
> > >  *      ereport(ERROR,
> > >  *              errcode(ERRCODE_UNDEFINED_CURSOR),
> > >  *              errmsg("portal \"%s\" not found", stmt->portalname),
> > >  *              ... other errxxx() fields as needed ...);
> > >  *
> >
> > Attached v18 patch has the changes for the same.
>
> Thanks. The v18 patch LGTM. I'm not sure if the CF bot failure is
> related to the patch -  https://cirrus-ci.com/task/5633364051886080

I felt this test failure is not related to this change. Let's wait and
see the results of the next run. Also I noticed that this test seems
to have failed many times in the buildfarm too recently:

https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=60&branch=HEAD&member=&stage=recoveryCheck&filter=Submit

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Sat, Jan 29, 2022 at 8:06 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Fri, Jan 28, 2022 at 1:54 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Jan 27, 2022 at 10:45 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Wed, Jan 26, 2022 at 11:07 AM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 25, 2022 at 12:00 PM vignesh C <vignesh21@gmail.com> wrote:
> > > > > Thanks for the comments, attached v17 patch has the fix for the same.
> > > >
> > > > Have a minor comment on v17:
> > > >
> > > > can we modify the elog(LOG, to new style ereport(LOG, ?
> > > > + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> > > >
> > > > /*----------
> > > >  * New-style error reporting API: to be used in this way:
> > > >  *      ereport(ERROR,
> > > >  *              errcode(ERRCODE_UNDEFINED_CURSOR),
> > > >  *              errmsg("portal \"%s\" not found", stmt->portalname),
> > > >  *              ... other errxxx() fields as needed ...);
> > > >  *
> > >
> > > Attached v18 patch has the changes for the same.
> >
> > Thanks. The v18 patch LGTM. I'm not sure if the CF bot failure is
> > related to the patch -  https://cirrus-ci.com/task/5633364051886080
>
> I felt this test failure is not related to this change. Let's wait and
> see the results of the next run. Also I noticed that this test seems
> to have failed many times in the buildfarm too recently:
>
https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=60&branch=HEAD&member=&stage=recoveryCheck&filter=Submit

CFBot has passed in the last run.

Regards,
Vignesh



Re: Printing backtrace of postgres processes

От
Justin Pryzby
Дата:
rebased to appease cfbot.

+ couple of little fixes as 0002.

-- 
Justin

Вложения

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, Mar 9, 2022 at 9:26 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> rebased to appease cfbot.
>
> + couple of little fixes as 0002.

Thanks for rebasing and fixing a few issues. I have taken all your
changes except the mcxtfuncs changes, as those were not done as part
of this patch. Attached is the v19 patch with the changes for the
same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Greg Stark
Дата:
Sadly the cfbot is showing a patch conflict again. It's just a trivial
conflict in the regression test schedule so I'm not going to update
the status but it would be good to rebase it so we continue to get
cfbot testing.



Re: Printing backtrace of postgres processes

От
Justin Pryzby
Дата:
On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> Sadly the cfbot is showing a patch conflict again. It's just a trivial
> conflict in the regression test schedule so I'm not going to update
> the status but it would be good to rebase it so we continue to get
> cfbot testing.

Done.  No changes.

Вложения

Re: Printing backtrace of postgres processes

От
Robert Haas
Дата:
On Wed, Mar 30, 2022 at 12:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> > Sadly the cfbot is showing a patch conflict again. It's just a trivial
> > conflict in the regression test schedule so I'm not going to update
> > the status but it would be good to rebase it so we continue to get
> > cfbot testing.
>
> Done.  No changes.

+ if (chk_auxiliary_proc)
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL server process", pid));
+ else
+ ereport(WARNING,
+ errmsg("PID %d is not a PostgreSQL backend process", pid));

This doesn't look right to me. I don't think that PostgreSQL server
processes are one kind of thing and PostgreSQL backend processes are
another kind of thing. I think they're two terms that are pretty
nearly interchangeable, or maybe at best you want to argue that
backend processes are some subset of server processes. I don't see
this sort of thing adding any clarity.

-static void
+void
 set_backtrace(ErrorData *edata, int num_skip)
 {
  StringInfoData errtrace;
@@ -978,7 +978,18 @@ set_backtrace(ErrorData *edata, int num_skip)
     "backtrace generation is not supported by this installation");
 #endif

- edata->backtrace = errtrace.data;
+ if (edata)
+ edata->backtrace = errtrace.data;
+ else
+ {
+ /*
+ * LOG_SERVER_ONLY is used to make sure the backtrace is never
+ * sent to client. We want to avoid messing up the other client
+ * session.
+ */
+ elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
+ pfree(errtrace.data);
+ }
 }

This looks like a grotty hack.

- PGPROC    *proc = BackendPidGetProc(pid);
-
- /*
- * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
- * we reach kill(), a process for which we get a valid proc here might
- * have terminated on its own.  There's no way to acquire a lock on an
- * arbitrary process to prevent that. But since so far all the callers of
- * this mechanism involve some request for ending the process anyway, that
- * it might end on its own first is not a problem.
- *
- * Note that proc will also be NULL if the pid refers to an auxiliary
- * process or the postmaster (neither of which can be signaled via
- * pg_signal_backend()).
- */
- if (proc == NULL)
- {
- /*
- * This is just a warning so a loop-through-resultset will not abort
- * if one backend terminated on its own during the run.
- */
- ereport(WARNING,
- (errmsg("PID %d is not a PostgreSQL backend process", pid)));
+ PGPROC  *proc;

+ /* Users can only signal valid backend or an auxiliary process. */
+ proc = CheckPostgresProcessId(pid, false, NULL);
+ if (!proc)
  return SIGNAL_BACKEND_ERROR;
- }

Incidentally changing the behavior of pg_signal_backend() doesn't seem
like a great idea. We can do that as a separate commit, after
considering whether documentation changes are needed. But it's not
something that should get folded into a commit on another topic.

+ /*
+ * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
+ * isn't valid; but by the time we reach kill(), a process for which we
+ * get a valid proc here might have terminated on its own.  There's no way
+ * to acquire a lock on an arbitrary process to prevent that. But since
+ * this mechanism is usually used to debug a backend or an auxiliary
+ * process running and consuming lots of memory, that it might end on its
+ * own first without logging the requested info is not a problem.
+ */

This comment made a lot more sense where it used to be than it does
where it is now. I think more work is required here than just cutting
and pasting.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Tue, Apr 5, 2022 at 9:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 30, 2022 at 12:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> > > Sadly the cfbot is showing a patch conflict again. It's just a trivial
> > > conflict in the regression test schedule so I'm not going to update
> > > the status but it would be good to rebase it so we continue to get
> > > cfbot testing.
> >
> > Done.  No changes.
>
> + if (chk_auxiliary_proc)
> + ereport(WARNING,
> + errmsg("PID %d is not a PostgreSQL server process", pid));
> + else
> + ereport(WARNING,
> + errmsg("PID %d is not a PostgreSQL backend process", pid));
>
> This doesn't look right to me. I don't think that PostgreSQL server
> processes are one kind of thing and PostgreSQL backend processes are
> another kind of thing. I think they're two terms that are pretty
> nearly interchangeable, or maybe at best you want to argue that
> backend processes are some subset of server processes. I don't see
> this sort of thing adding any clarity.

I have changed it to "PID %d is not a PostgreSQL server process" which
is similar to the existing warning message in
pg_log_backend_memory_contexts.

> -static void
> +void
>  set_backtrace(ErrorData *edata, int num_skip)
>  {
>   StringInfoData errtrace;
> @@ -978,7 +978,18 @@ set_backtrace(ErrorData *edata, int num_skip)
>      "backtrace generation is not supported by this installation");
>  #endif
>
> - edata->backtrace = errtrace.data;
> + if (edata)
> + edata->backtrace = errtrace.data;
> + else
> + {
> + /*
> + * LOG_SERVER_ONLY is used to make sure the backtrace is never
> + * sent to client. We want to avoid messing up the other client
> + * session.
> + */
> + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> + pfree(errtrace.data);
> + }
>  }
>
> This looks like a grotty hack.

I have changed it so that the backtrace is set and returned to the
caller. The caller will take care of logging or setting it to the
error data context. Thoughts?

> - PGPROC    *proc = BackendPidGetProc(pid);
> -
> - /*
> - * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> - * we reach kill(), a process for which we get a valid proc here might
> - * have terminated on its own.  There's no way to acquire a lock on an
> - * arbitrary process to prevent that. But since so far all the callers of
> - * this mechanism involve some request for ending the process anyway, that
> - * it might end on its own first is not a problem.
> - *
> - * Note that proc will also be NULL if the pid refers to an auxiliary
> - * process or the postmaster (neither of which can be signaled via
> - * pg_signal_backend()).
> - */
> - if (proc == NULL)
> - {
> - /*
> - * This is just a warning so a loop-through-resultset will not abort
> - * if one backend terminated on its own during the run.
> - */
> - ereport(WARNING,
> - (errmsg("PID %d is not a PostgreSQL backend process", pid)));
> + PGPROC  *proc;
>
> + /* Users can only signal valid backend or an auxiliary process. */
> + proc = CheckPostgresProcessId(pid, false, NULL);
> + if (!proc)
>   return SIGNAL_BACKEND_ERROR;
> - }
>
> Incidentally changing the behavior of pg_signal_backend() doesn't seem
> like a great idea. We can do that as a separate commit, after
> considering whether documentation changes are needed. But it's not
> something that should get folded into a commit on another topic.

Agreed. I have kept the logic of pg_signal_backend as it is.
pg_log_backtrace and pg_log_backend_memory_contexts use the same
functionality to check and send signal. I have added a new function
"CheckProcSendDebugSignal" which has the common code implementation
and will be called by both pg_log_backtrace and
pg_log_backend_memory_contexts.
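
To make the refactoring concrete, such a helper would look roughly like
this (a rough sketch only; the function name follows this discussion,
but the body is illustrative and the actual patch may differ):

#include "postgres.h"
#include "storage/backendid.h"
#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/procsignal.h"

/*
 * Sketch: validate the PID and send the given ProcSignal to the target
 * backend or auxiliary process, emitting only a WARNING on failure so a
 * loop over many PIDs does not abort.
 */
static bool
CheckProcSendDebugSignal(int pid, ProcSignalReason reason)
{
	PGPROC	   *proc = BackendPidGetProc(pid);
	BackendId	backendId = InvalidBackendId;

	if (proc != NULL)
		backendId = proc->backendId;		/* regular backend */
	else
		proc = AuxiliaryPidGetProc(pid);	/* maybe an auxiliary process */

	if (proc == NULL)
	{
		ereport(WARNING,
				(errmsg("PID %d is not a PostgreSQL server process", pid)));
		return false;
	}

	if (SendProcSignal(pid, reason, backendId) < 0)
	{
		ereport(WARNING,
				(errmsg("could not send signal to process %d: %m", pid)));
		return false;
	}

	return true;
}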

> + /*
> + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
> + * isn't valid; but by the time we reach kill(), a process for which we
> + * get a valid proc here might have terminated on its own.  There's no way
> + * to acquire a lock on an arbitrary process to prevent that. But since
> + * this mechanism is usually used to debug a backend or an auxiliary
> + * process running and consuming lots of memory, that it might end on its
> + * own first without logging the requested info is not a problem.
> + */
>
> This comment made a lot more sense where it used to be than it does
> where it is now. I think more work is required here than just cutting
> and pasting.

This function was called from pg_signal_backend earlier, this logic is
now moved to CheckProcSendDebugSignal which will be called only by
pg_log_backtrace and pg_log_backend_memory_contexts. This looks
appropriate in CheckProcSendDebugSignal with slight change to specify
it can consume lots of memory or a long running process.
Thoughts?

Attached v20 patch has the changes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Wed, Apr 6, 2022 at 12:29 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, Apr 5, 2022 at 9:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Wed, Mar 30, 2022 at 12:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> > > > Sadly the cfbot is showing a patch conflict again. It's just a trivial
> > > > conflict in the regression test schedule so I'm not going to update
> > > > the status but it would be good to rebase it so we continue to get
> > > > cfbot testing.
> > >
> > > Done.  No changes.
> >
> > + if (chk_auxiliary_proc)
> > + ereport(WARNING,
> > + errmsg("PID %d is not a PostgreSQL server process", pid));
> > + else
> > + ereport(WARNING,
> > + errmsg("PID %d is not a PostgreSQL backend process", pid));
> >
> > This doesn't look right to me. I don't think that PostgreSQL server
> > processes are one kind of thing and PostgreSQL backend processes are
> > another kind of thing. I think they're two terms that are pretty
> > nearly interchangeable, or maybe at best you want to argue that
> > backend processes are some subset of server processes. I don't see
> > this sort of thing adding any clarity.
>
> I have changed it to "PID %d is not a PostgreSQL server process" which
> is similar to the existing warning message in
> pg_log_backend_memory_contexts.
>
> > -static void
> > +void
> >  set_backtrace(ErrorData *edata, int num_skip)
> >  {
> >   StringInfoData errtrace;
> > @@ -978,7 +978,18 @@ set_backtrace(ErrorData *edata, int num_skip)
> >      "backtrace generation is not supported by this installation");
> >  #endif
> >
> > - edata->backtrace = errtrace.data;
> > + if (edata)
> > + edata->backtrace = errtrace.data;
> > + else
> > + {
> > + /*
> > + * LOG_SERVER_ONLY is used to make sure the backtrace is never
> > + * sent to client. We want to avoid messing up the other client
> > + * session.
> > + */
> > + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> > + pfree(errtrace.data);
> > + }
> >  }
> >
> > This looks like a grotty hack.
>
> I have changed it so that the backtrace is set and returned to the
> caller. The caller will take care of logging or setting it to the
> error data context. Thoughts?
>
> > - PGPROC    *proc = BackendPidGetProc(pid);
> > -
> > - /*
> > - * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> > - * we reach kill(), a process for which we get a valid proc here might
> > - * have terminated on its own.  There's no way to acquire a lock on an
> > - * arbitrary process to prevent that. But since so far all the callers of
> > - * this mechanism involve some request for ending the process anyway, that
> > - * it might end on its own first is not a problem.
> > - *
> > - * Note that proc will also be NULL if the pid refers to an auxiliary
> > - * process or the postmaster (neither of which can be signaled via
> > - * pg_signal_backend()).
> > - */
> > - if (proc == NULL)
> > - {
> > - /*
> > - * This is just a warning so a loop-through-resultset will not abort
> > - * if one backend terminated on its own during the run.
> > - */
> > - ereport(WARNING,
> > - (errmsg("PID %d is not a PostgreSQL backend process", pid)));
> > + PGPROC  *proc;
> >
> > + /* Users can only signal valid backend or an auxiliary process. */
> > + proc = CheckPostgresProcessId(pid, false, NULL);
> > + if (!proc)
> >   return SIGNAL_BACKEND_ERROR;
> > - }
> >
> > Incidentally changing the behavior of pg_signal_backend() doesn't seem
> > like a great idea. We can do that as a separate commit, after
> > considering whether documentation changes are needed. But it's not
> > something that should get folded into a commit on another topic.
>
> Agreed. I have kept the logic of pg_signal_backend as it is.
> pg_log_backtrace and pg_log_backend_memory_contexts use the same
> functionality to check and send signal. I have added a new function
> "CheckProcSendDebugSignal" which has the common code implementation
> and will be called by both pg_log_backtrace and
> pg_log_backend_memory_contexts.
>
> > + /*
> > + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
> > + * isn't valid; but by the time we reach kill(), a process for which we
> > + * get a valid proc here might have terminated on its own.  There's no way
> > + * to acquire a lock on an arbitrary process to prevent that. But since
> > + * this mechanism is usually used to debug a backend or an auxiliary
> > + * process running and consuming lots of memory, that it might end on its
> > + * own first without logging the requested info is not a problem.
> > + */
> >
> > This comment made a lot more sense where it used to be than it does
> > where it is now. I think more work is required here than just cutting
> > and pasting.
>
> This function was called from pg_signal_backend earlier, this logic is
> now moved to CheckProcSendDebugSignal which will be called only by
> pg_log_backtrace and pg_log_backend_memory_contexts. This looks
> appropriate in CheckProcSendDebugSignal with slight change to specify
> it can consume lots of memory or a long running process.
> Thoughts?
>
> Attached v20 patch has the changes for the same.

The patch was not applying on top of HEAD. The attached patch is a
rebased version on top of HEAD.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Kyotaro Horiguchi
Дата:
At Thu, 14 Apr 2022 10:33:50 +0530, vignesh C <vignesh21@gmail.com> wrote in 
> On Wed, Apr 6, 2022 at 12:29 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, Apr 5, 2022 at 9:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > This looks like a grotty hack.
> >
> > I have changed it so that the backtrace is set and returned to the
> > caller. The caller will take care of logging or setting it to the
> > error data context. Thoughts?

The point here, I think, is that the StringInfo is useless here in the
first place.  In addition, hammering a string pointer into it is
outside the intended use of StringInfo.

By the way, as Andres suggested very early in this thread, backtrace()
can be called from signal handlers with some preparation.  So we can
run backtrace() in HandleLogBacktraceInterrupt() and backtrace_symbols()
in ProcessLogBacktraceInterrupt().  This makes things a bit more
complex, but the outcome should be more informative.
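
Roughly, that split would look like the following (an illustrative
sketch only: the function names follow the patch, but the bodies here
are simplified):

#include "postgres.h"

#include <execinfo.h>

#define BT_MAX_FRAMES 100

/* Raw frames captured in the handler, symbolized later. */
static void *bt_frames[BT_MAX_FRAMES];
static volatile int bt_nframes = 0;

/* Signal-handler side: backtrace() is safe here once libgcc is preloaded. */
static void
HandleLogBacktraceInterrupt(void)
{
	bt_nframes = backtrace(bt_frames, BT_MAX_FRAMES);
}

/* Normal-execution side: backtrace_symbols() may malloc(), so do it here. */
static void
ProcessLogBacktraceInterrupt(void)
{
	char  **symbols = backtrace_symbols(bt_frames, bt_nframes);

	if (symbols == NULL)
		return;
	for (int i = 0; i < bt_nframes; i++)
		elog(LOG_SERVER_ONLY, "%s", symbols[i]);
	free(symbols);
}

The important property is that nothing in the handler allocates memory;
the symbolization, which can call malloc(), is deferred to normal
execution.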


> > > Incidentally changing the behavior of pg_signal_backend() doesn't seem
> > > like a great idea. We can do that as a separate commit, after
> > > considering whether documentation changes are needed. But it's not
> > > something that should get folded into a commit on another topic.
> >
> > Agreed. I have kept the logic of pg_signal_backend as it is.
> > pg_log_backtrace and pg_log_backend_memory_contexts use the same
> > functionality to check and send signal. I have added a new function
> > "CheckProcSendDebugSignal" which has the common code implementation
> > and will be called by both pg_log_backtrace and
> > pg_log_backend_memory_contexts.

The name looks more specific than what the function actually does, and
doesn't really convey what it does.
SendProcSignalBackendOrAuxproc() might be more descriptive.

Or we can provide BackendOrAuxiliaryPidGetProc(int pid, BackendId &backendid).

> > > + /*
> > > + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
> > > + * isn't valid; but by the time we reach kill(), a process for which we
> > > + * get a valid proc here might have terminated on its own.  There's no way
> > > + * to acquire a lock on an arbitrary process to prevent that. But since
> > > + * this mechanism is usually used to debug a backend or an auxiliary
> > > + * process running and consuming lots of memory, that it might end on its
> > > + * own first without logging the requested info is not a problem.
> > > + */
> > >
> > > This comment made a lot more sense where it used to be than it does
> > > where it is now. I think more work is required here than just cutting
> > > and pasting.
> >
> > This function was called from pg_signal_backend earlier, this logic is
> > now moved to CheckProcSendDebugSignal which will be called only by
> > pg_log_backtrace and pg_log_backend_memory_contexts. This looks
> > appropriate in CheckProcSendDebugSignal with slight change to specify
> > it can consume lots of memory or a long running process.
> > Thoughts?

For example, do you see the following part as correct for
pg_log_backtrace() as well?

> + * this mechanism is usually used to debug a backend or an auxiliary
> + * process running and consuming lots of memory, that it might end on its

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Printing backtrace of postgres processes

От
vignesh C
Дата:
On Fri, Apr 15, 2022 at 11:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 14 Apr 2022 10:33:50 +0530, vignesh C <vignesh21@gmail.com> wrote in
> > On Wed, Apr 6, 2022 at 12:29 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Tue, Apr 5, 2022 at 9:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > > This looks like a grotty hack.
> > >
> > > I have changed it so that the backtrace is set and returned to the
> > > caller. The caller will take care of logging or setting it to the
> > > error data context. Thoughts?
>
> The point here, I think, is that the StringInfo is useless here in the
> first place.  In addition, hammering a string pointer into it is
> outside the intended use of StringInfo.

Modified

> By the way, as Andres suggested very early in this thread, backtrace()
> can be called from signal handlers with some preparation.  So we can
> run backtrace() in HandleLogBacktraceInterrupt() and backtrace_symbols()
> in ProcessLogBacktraceInterrupt().  This makes things a bit more
> complex, but the outcome should be more informative.

Using that approach we could only send the trace to stderr; when we
try to write it to the server log, it might result in a deadlock
because the signal handler can be invoked while the malloc lock is
held. We dropped that approach to avoid the issues pointed out in [1].

> > > > Incidentally changing the behavior of pg_signal_backend() doesn't seem
> > > > like a great idea. We can do that as a separate commit, after
> > > > considering whether documentation changes are needed. But it's not
> > > > something that should get folded into a commit on another topic.
> > >
> > > Agreed. I have kept the logic of pg_signal_backend as it is.
> > > pg_log_backtrace and pg_log_backend_memory_contexts use the same
> > > functionality to check and send signal. I have added a new function
> > > "CheckProcSendDebugSignal" which has the common code implementation
> > > and will be called by both pg_log_backtrace and
> > > pg_log_backend_memory_contexts.
>
> The name looks more specific than what the function actually does, and
> doesn't really convey what it does.
> SendProcSignalBackendOrAuxproc() might be more descriptive.
>
> Or we can provide BackendOrAuxiliaryPidGetProc(int pid, BackendId &backendid).

Modified it to SendProcSignalBackendOrAuxproc

> > > > + /*
> > > > + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
> > > > + * isn't valid; but by the time we reach kill(), a process for which we
> > > > + * get a valid proc here might have terminated on its own.  There's no way
> > > > + * to acquire a lock on an arbitrary process to prevent that. But since
> > > > + * this mechanism is usually used to debug a backend or an auxiliary
> > > > + * process running and consuming lots of memory, that it might end on its
> > > > + * own first without logging the requested info is not a problem.
> > > > + */
> > > >
> > > > This comment made a lot more sense where it used to be than it does
> > > > where it is now. I think more work is required here than just cutting
> > > > and pasting.
> > >
> > > This function was called from pg_signal_backend earlier, this logic is
> > > now moved to CheckProcSendDebugSignal which will be called only by
> > > pg_log_backtrace and pg_log_backend_memory_contexts. This looks
> > > appropriate in CheckProcSendDebugSignal with slight change to specify
> > > it can consume lots of memory or a long running process.
> > > Thoughts?
>
> For example, do you see the following part as correct for
> pg_log_backtrace() as well?
>
> > + * this mechanism is usually used to debug a backend or an auxiliary
> > + * process running and consuming lots of memory, that it might end on its

I felt it was ok since I mentioned it as "consuming lots of memory or
a long running process". Let me know if you want to change it to
something else, and I will change it.

[1] - https://www.postgresql.org/message-id/1361750.1606795285%40sss.pgh.pa.us

The attached v21 patch has the changes for the same.

Regards,
Vignesh

Вложения

Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Mon, Apr 18, 2022 at 9:10 AM vignesh C <vignesh21@gmail.com> wrote:
>
> The attached v21 patch has the changes for the same.

Thanks for the patch. Here are some comments:

1. I think there's a fundamental problem with this patch, that is, it
prints the backtrace when the interrupt is processed but not when
interrupt is received. This way, the backends/aux processes will
always have backtraces in their main loops or some processing loops
wherever CHECK_FOR_INTERRUPTS() is called [1]. What we need is
something more useful. What I think is the better way is to capture
the backtrace, call set_backtrace(), when the interrupt arrives and
then if we don't want to print it in the interrupt handler, perhaps
save it and print it in the next CFI() cycle. See some realistic and
useful backtrace [2] when I emitted the backtrace in the interrupt
handler.

2.
+        specified process ID.  This function can send the request to
+        backends and auxiliary processes except the logger and statistics
+        collector.  The backtraces will be logged at <literal>LOG</literal>
There's no statistics collector process any more. Please remove it.
Also remove 'the' before 'logger' to be in sync with
pg_log_backend_memory_contexts docs.

3.
+        configuration set (See <xref linkend="runtime-config-logging"/> for
Lowercase 'see' to be in sync with pg_log_backend_memory_contexts docs.

4.
+    You can obtain the file name and line number from the logged
details by using
How about 'One can obtain'?

5.
+postgres=# select pg_log_backtrace(pg_backend_pid());
+For example:
+<screen>
+2021-01-27 11:33:50.247 IST [111735] LOG:  current backtrace:
A more realistic example would be better here: say, the backtrace of
the walwriter, the checkpointer, or some other process, instead of a
backend logging the call stack of its own pg_log_backtrace() call?

6.
Don't we need the backtrace of walreceiver? I think it'll be good to
have in ProcessWalRcvInterrupts().

7.
+            errmsg_internal("logging backtrace of PID %d", MyProcPid));
+    elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace);
Can we get rid of "current backtrace:%s" and have something similar to
ProcessLogMemoryContextInterrupt() like below?

errmsg("logging backtrace of PID %d", MyProcPid)));
errmsg("%s", errtrace);

[1]
2022-11-10 09:55:44.691 UTC [1346443] LOG:  logging backtrace of PID 1346443
2022-11-10 09:55:44.691 UTC [1346443] LOG:  current backtrace:
        postgres: checkpointer (set_backtrace+0x46) [0x5640df9849c6]
        postgres: checkpointer (ProcessLogBacktraceInterrupt+0x16)
[0x5640df7fd326]
        postgres: checkpointer (CheckpointerMain+0x1a3) [0x5640df77f553]
        postgres: checkpointer (AuxiliaryProcessMain+0xc9) [0x5640df77d5e9]
        postgres: checkpointer (+0x436a9a) [0x5640df783a9a]
        postgres: checkpointer (PostmasterMain+0xd57) [0x5640df7877e7]
        postgres: checkpointer (main+0x20f) [0x5640df4a6f1f]
        /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f4b9fe9dd90]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f4b9fe9de40]
        postgres: checkpointer (_start+0x25) [0x5640df4a7265]
2022-11-10 09:56:05.032 UTC [1346448] LOG:  logging backtrace of PID 1346448
2022-11-10 09:56:05.032 UTC [1346448] LOG:  current backtrace:
        postgres: logical replication launcher (set_backtrace+0x46)
[0x5640df9849c6]
        postgres: logical replication launcher
(ProcessLogBacktraceInterrupt+0x16) [0x5640df7fd326]
        postgres: logical replication launcher
(ApplyLauncherMain+0x11b) [0x5640df7a253b]
        postgres: logical replication launcher
(StartBackgroundWorker+0x220) [0x5640df77e270]
        postgres: logical replication launcher (+0x4369f4) [0x5640df7839f4]
        postgres: logical replication launcher (+0x43771d) [0x5640df78471d]
        /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f4b9feb6520]
        /lib/x86_64-linux-gnu/libc.so.6(__select+0xbd) [0x7f4b9ff8f74d]
        postgres: logical replication launcher (+0x43894d) [0x5640df78594d]
        postgres: logical replication launcher (PostmasterMain+0xcb5)
[0x5640df787745]
        postgres: logical replication launcher (main+0x20f) [0x5640df4a6f1f]
        /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f4b9fe9dd90]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f4b9fe9de40]
        postgres: logical replication launcher (_start+0x25) [0x5640df4a7265]

[2]
2022-11-10 10:25:20.021 UTC [1351953] LOG:  current backtrace:
        postgres: ubuntu postgres [local] INSERT(set_backtrace+0x46)
[0x55c60ae069b6]
        postgres: ubuntu postgres [local]
INSERT(procsignal_sigusr1_handler+0x1d8) [0x55c60ac7f4f8]
        /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f75a395c520]
        /lib/x86_64-linux-gnu/libc.so.6(+0x91104) [0x7f75a39ab104]
        /lib/x86_64-linux-gnu/libc.so.6(+0x9ccf8) [0x7f75a39b6cf8]
        postgres: ubuntu postgres [local] INSERT(PGSemaphoreLock+0x22)
[0x55c60abfb1f2]
        postgres: ubuntu postgres [local] INSERT(LWLockAcquire+0x174)
[0x55c60ac8f384]
        postgres: ubuntu postgres [local] INSERT(+0x1fb778) [0x55c60a9ca778]
        postgres: ubuntu postgres [local] INSERT(+0x1fb8d3) [0x55c60a9ca8d3]
        postgres: ubuntu postgres [local]
INSERT(XLogInsertRecord+0x4e4) [0x55c60a9cb0c4]
        postgres: ubuntu postgres [local] INSERT(XLogInsert+0xcf)
[0x55c60a9d167f]
        postgres: ubuntu postgres [local] INSERT(heap_insert+0x2ca)
[0x55c60a97168a]
        postgres: ubuntu postgres [local] INSERT(+0x1ab14a) [0x55c60a97a14a]
        postgres: ubuntu postgres [local] INSERT(+0x33dcab) [0x55c60ab0ccab]
        postgres: ubuntu postgres [local] INSERT(+0x33f10d) [0x55c60ab0e10d]
        postgres: ubuntu postgres [local]
INSERT(standard_ExecutorRun+0x102) [0x55c60aadfdb2]
        postgres: ubuntu postgres [local] INSERT(+0x4d48d4) [0x55c60aca38d4]
        postgres: ubuntu postgres [local] INSERT(PortalRun+0x27d)
[0x55c60aca47ed]
        postgres: ubuntu postgres [local] INSERT(+0x4d180f) [0x55c60aca080f]
        postgres: ubuntu postgres [local] INSERT(PostgresMain+0x1eb4)
[0x55c60aca2b94]
        postgres: ubuntu postgres [local] INSERT(+0x4396f8) [0x55c60ac086f8]
        postgres: ubuntu postgres [local] INSERT(PostmasterMain+0xcb5)
[0x55c60ac09745]
        postgres: ubuntu postgres [local] INSERT(main+0x20f) [0x55c60a928f1f]
        /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f75a3943d90]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f75a3943e40]
        postgres: ubuntu postgres [local] INSERT(_start+0x25) [0x55c60a929265]
2022-11-10 10:25:20.021 UTC [1351953] STATEMENT:  insert into foo
select * from generate_series(1, 1000000000);
2022-11-10 10:25:20.521 UTC [1351953] LOG:  logging backtrace of PID 1351953
2022-11-10 10:25:20.521 UTC [1351953] LOG:  current backtrace:
        postgres: ubuntu postgres [local] INSERT(set_backtrace+0x46)
[0x55c60ae069b6]
        postgres: ubuntu postgres [local]
INSERT(procsignal_sigusr1_handler+0x1d8) [0x55c60ac7f4f8]
        /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f75a395c520]
        postgres: ubuntu postgres [local]
INSERT(XLogInsertRecord+0x976) [0x55c60a9cb556]
        postgres: ubuntu postgres [local] INSERT(XLogInsert+0xcf)
[0x55c60a9d167f]
        postgres: ubuntu postgres [local] INSERT(heap_insert+0x2ca)
[0x55c60a97168a]
        postgres: ubuntu postgres [local] INSERT(+0x1ab14a) [0x55c60a97a14a]
        postgres: ubuntu postgres [local] INSERT(+0x33dcab) [0x55c60ab0ccab]
        postgres: ubuntu postgres [local] INSERT(+0x33f10d) [0x55c60ab0e10d]
        postgres: ubuntu postgres [local]
INSERT(standard_ExecutorRun+0x102) [0x55c60aadfdb2]
        postgres: ubuntu postgres [local] INSERT(+0x4d48d4) [0x55c60aca38d4]
        postgres: ubuntu postgres [local] INSERT(PortalRun+0x27d)
[0x55c60aca47ed]
        postgres: ubuntu postgres [local] INSERT(+0x4d180f) [0x55c60aca080f]
        postgres: ubuntu postgres [local] INSERT(PostgresMain+0x1eb4)
[0x55c60aca2b94]
        postgres: ubuntu postgres [local] INSERT(+0x4396f8) [0x55c60ac086f8]
        postgres: ubuntu postgres [local] INSERT(PostmasterMain+0xcb5)
[0x55c60ac09745]
        postgres: ubuntu postgres [local] INSERT(main+0x20f) [0x55c60a928f1f]
        /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f75a3943d90]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f75a3943e40]
        postgres: ubuntu postgres [local] INSERT(_start+0x25) [0x55c60a929265]
2022-11-10 10:25:20.521 UTC [1351953] STATEMENT:  insert into foo
select * from generate_series(1, 1000000000);
2022-11-10 10:25:21.050 UTC [1351953] LOG:  logging backtrace of PID 1351953
2022-11-10 10:25:21.050 UTC [1351953] LOG:  current backtrace:
        postgres: ubuntu postgres [local] INSERT(set_backtrace+0x46)
[0x55c60ae069b6]
        postgres: ubuntu postgres [local]
INSERT(procsignal_sigusr1_handler+0x1d8) [0x55c60ac7f4f8]
        /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f75a395c520]
        /lib/x86_64-linux-gnu/libc.so.6(fdatasync+0x17) [0x7f75a3a35af7]
        postgres: ubuntu postgres [local] INSERT(+0x1fb384) [0x55c60a9ca384]
        postgres: ubuntu postgres [local] INSERT(+0x1fb7ef) [0x55c60a9ca7ef]
        postgres: ubuntu postgres [local] INSERT(+0x1fb8d3) [0x55c60a9ca8d3]
        postgres: ubuntu postgres [local]
INSERT(XLogInsertRecord+0x4e4) [0x55c60a9cb0c4]
        postgres: ubuntu postgres [local] INSERT(XLogInsert+0xcf)
[0x55c60a9d167f]
        postgres: ubuntu postgres [local] INSERT(heap_insert+0x2ca)
[0x55c60a97168a]
        postgres: ubuntu postgres [local] INSERT(+0x1ab14a) [0x55c60a97a14a]
        postgres: ubuntu postgres [local] INSERT(+0x33dcab) [0x55c60ab0ccab]
        postgres: ubuntu postgres [local] INSERT(+0x33f10d) [0x55c60ab0e10d]
        postgres: ubuntu postgres [local]
INSERT(standard_ExecutorRun+0x102) [0x55c60aadfdb2]
        postgres: ubuntu postgres [local] INSERT(+0x4d48d4) [0x55c60aca38d4]
        postgres: ubuntu postgres [local] INSERT(PortalRun+0x27d)
[0x55c60aca47ed]
        postgres: ubuntu postgres [local] INSERT(+0x4d180f) [0x55c60aca080f]
        postgres: ubuntu postgres [local] INSERT(PostgresMain+0x1eb4)
[0x55c60aca2b94]
        postgres: ubuntu postgres [local] INSERT(+0x4396f8) [0x55c60ac086f8]
        postgres: ubuntu postgres [local] INSERT(PostmasterMain+0xcb5)
[0x55c60ac09745]
        postgres: ubuntu postgres [local] INSERT(main+0x20f) [0x55c60a928f1f]
        /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f75a3943d90]
        /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f75a3943e40]
        postgres: ubuntu postgres [local] INSERT(_start+0x25) [0x55c60a929265]

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Printing backtrace of postgres processes

От
Kyotaro Horiguchi
Дата:
At Thu, 10 Nov 2022 15:56:35 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Mon, Apr 18, 2022 at 9:10 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > The attached v21 patch has the changes for the same.
> 
> Thanks for the patch. Here are some comments:
> 
> 1. I think there's a fundamental problem with this patch, that is, it
> prints the backtrace when the interrupt is processed but not when
> interrupt is received. This way, the backends/aux processes will

Yeah, but the obstacle was backtrace(3) itself. Andres pointed [1]
that that may be doable with some care (and I agree to the opinion).
AFAIS no discussions followed and things have been to the current
shape since then.


[1] https://www.postgresql.org/message-id/20201201032649.aekv5b5dicvmovf4%40alap3.anarazel.de
| > Surely this is *utterly* unsafe.  You can't do that sort of stuff in
| > a signal handler.
| 
| That's of course true for the current implementation - but I don't think
| it's a fundamental constraint. With a bit of care backtrace() and
| backtrace_symbols() itself can be signal safe:

man 3 backtrace
>  *  backtrace()  and  backtrace_symbols_fd() don't call malloc() explic‐
>     itly, but they are part of libgcc,  which  gets  loaded  dynamically
>     when  first  used.   Dynamic loading usually triggers a call to mal‐
>     loc(3).  If you need certain calls to these  two  functions  to  not
>     allocate  memory (in signal handlers, for example), you need to make
>     sure libgcc is loaded beforehand.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Printing backtrace of postgres processes

От
Bharath Rupireddy
Дата:
On Fri, Nov 11, 2022 at 7:59 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 10 Nov 2022 15:56:35 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > On Mon, Apr 18, 2022 at 9:10 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > The attached v21 patch has the changes for the same.
> >
> > Thanks for the patch. Here are some comments:
> >
> > 1. I think there's a fundamental problem with this patch, that is, it
> > prints the backtrace when the interrupt is processed but not when
> > interrupt is received. This way, the backends/aux processes will
>
> Yeah, but the obstacle was backtrace(3) itself. Andres pointed [1]
> that that may be doable with some care (and I agree to the opinion).
> AFAIS no discussions followed and things have been to the current
> shape since then.
>
>
> [1] https://www.postgresql.org/message-id/20201201032649.aekv5b5dicvmovf4%40alap3.anarazel.de
> | > Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> | > a signal handler.
> |
> | That's of course true for the current implementation - but I don't think
> | it's a fundamental constraint. With a bit of care backtrace() and
> | backtrace_symbols() itself can be signal safe:
>
> man 3 backtrace
> >  *  backtrace()  and  backtrace_symbols_fd() don't call malloc() explic‐
> >     itly, but they are part of libgcc,  which  gets  loaded  dynamically
> >     when  first  used.   Dynamic loading usually triggers a call to mal‐
> >     loc(3).  If you need certain calls to these  two  functions  to  not
> >     allocate  memory (in signal handlers, for example), you need to make
> >     sure libgcc is loaded beforehand.

I missed that part. Thanks for pointing it out. backtrace_symbols()
seems to return a malloc'ed array [1], meaning it can't be used in
signal handlers (if used, it can cause deadlocks as per [2]), and the
existing set_backtrace() is using it. Therefore, we need to either
change set_backtrace() to use backtrace_symbols_fd() instead of
backtrace_symbols(), or introduce another function for the purpose of
this feature. If we do that, then we can think of preloading libgcc,
which makes backtrace() and backtrace_symbols_fd() safe to use in
signal handlers.

Looks like we're not loading libgcc explicitly now into any of
postgres processes, please correct me if I'm wrong here. If we're not
loading it right now, is it acceptable to load libgcc into every
postgres process for the sake of this feature?
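
A minimal sketch of both pieces, assuming we go the
backtrace_symbols_fd() route (the helper names below are made up for
illustration):

#include "postgres.h"

#include <execinfo.h>
#include <unistd.h>

/*
 * Call once during process startup, after the signal handlers are set up,
 * so that libgcc is already loaded by the time a backtrace is requested
 * from a signal handler.
 */
static void
preload_backtrace_support(void)
{
	void	   *dummy[1];

	(void) backtrace(dummy, 1);
}

/*
 * Async-signal-safe emission: no malloc(); the symbolized frames are
 * written directly to stderr.
 */
static void
emit_backtrace_to_stderr(void)
{
	void	   *frames[100];
	int			nframes = backtrace(frames, 100);

	backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
}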

[1] https://linux.die.net/man/3/backtrace_symbols
[2] https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock
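
Just to illustrate the distinction, here's a minimal sketch (not taken
from the patch; the helper name is made up) of emitting a backtrace
without any allocation by using backtrace_symbols_fd():

```
#include <execinfo.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the current stack to stderr without
 * calling malloc(), by using backtrace_symbols_fd() instead of
 * backtrace_symbols().  Assumes libgcc has already been loaded.
 */
static void
emit_backtrace_to_stderr(void)
{
    void       *frames[64];
    int         nframes;

    nframes = backtrace(frames, 64);
    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
}
```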

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Fri, Nov 11, 2022 at 6:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Nov 11, 2022 at 7:59 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Thu, 10 Nov 2022 15:56:35 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > > On Mon, Apr 18, 2022 at 9:10 AM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > The attached v21 patch has the changes for the same.
> > >
> > > Thanks for the patch. Here are some comments:
> > >
> > > 1. I think there's a fundamental problem with this patch, that is, it
> > > prints the backtrace when the interrupt is processed but not when
> > > interrupt is received. This way, the backends/aux processes will
> >
> > Yeah, but the obstacle was backtrace(3) itself. Andres pointed [1]
> > that that may be doable with some care (and I agree to the opinion).
> > AFAIS no discussions followed and things have been to the current
> > shape since then.
> >
> >
> > [1] https://www.postgresql.org/message-id/20201201032649.aekv5b5dicvmovf4%40alap3.anarazel.de
> > | > Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> > | > a signal handler.
> > |
> > | That's of course true for the current implementation - but I don't think
> > | it's a fundamental constraint. With a bit of care backtrace() and
> > | backtrace_symbols() itself can be signal safe:
> >
> > man 3 backtrace
> > >  *  backtrace()  and  backtrace_symbols_fd() don't call malloc() explic‐
> > >     itly, but they are part of libgcc,  which  gets  loaded  dynamically
> > >     when  first  used.   Dynamic loading usually triggers a call to mal‐
> > >     loc(3).  If you need certain calls to these  two  functions  to  not
> > >     allocate  memory (in signal handlers, for example), you need to make
> > >     sure libgcc is loaded beforehand.
>
> I missed that part. Thanks for pointing it out. The
> backtrace_symbols() seems to be returning a malloc'ed array [1],
> meaning it can't be used in signal handlers (if used, it can cause
> deadlocks as per [2]) and existing set_backtrace() is using it.
> Therefore, we need to either change set_backtrace() to use
> backtrace_symbols_fd() instead of backtrace_symobls() or introduce
> another function for the purpose of this feature. If done that, then
> we can think of preloading of libgcc which makes backtrace(),
> backtrace_symobols_fd() safe to use in signal handlers.
>
> Looks like we're not loading libgcc explicitly now into any of
> postgres processes, please correct me if I'm wrong here. If we're not
> loading it right now, is it acceptable to load libgcc into every
> postgres process for the sake of this feature?
>
> [1] https://linux.die.net/man/3/backtrace_symbols
> [2] https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock

I took a stab at it after discussing with Vignesh off-list. Here's
what I've done:

1. Make the backtrace functions signal safe, i.e. ensure libgcc is
loaded (just in case it isn't) by calling backtrace() during the early
life of a process, after the SIGUSR1 and other signal handlers are
established.
2. Emit the backtrace to stderr within the signal handler itself to
keep things simple, so that we don't need to allocate dynamic memory
inside the signal handler.
3. Split the patch into 3 for ease of review: 0001 does preparatory
stuff, 0002 adds the function, 0003 adds the docs and tests.

I'm attaching the v23 patch set for further review.
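
To illustrate point 1, a simplified sketch of the idea (the function
name and details here are illustrative, not necessarily what the patch
does):

```
#include <execinfo.h>

/*
 * Hypothetical initialization step: call backtrace() once, outside of
 * any signal handler, so that libgcc gets loaded now rather than
 * lazily from inside a handler, where the implied malloc() would be
 * unsafe.
 */
static void
load_backtrace_functions(void)
{
    void       *dummy[1];

    (void) backtrace(dummy, 1);
}
```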

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I'm attaching the v22 patch set for further review.

Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2.
Attaching v23 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Daniel Gustafsson
Date:
> On 11 Jan 2023, at 15:44, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> I'm attaching the v22 patch set for further review.
>
> Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2.
> Attaching v23 patch set for further review.

This thread has stalled for well over 6 months with the patch going from CF to
CF.  From skimming the thread it seems that a lot of the details have been
ironed out with most (all?) objections addressed.  Is any committer interested
in picking this up?  If not we should probably mark it returned with feedback.

--
Daniel Gustafsson




Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Thu, Jul 20, 2023 at 8:22 PM Daniel Gustafsson <daniel@yesql.se> wrote:
>
> > On 11 Jan 2023, at 15:44, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> >>
> >> I'm attaching the v22 patch set for further review.
> >
> > Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2.
> > Attaching v23 patch set for further review.
>
> This thread has stalled for well over 6 months with the patch going from CF to
> CF.  From skimming the thread it seems that a lot of the details have been
> ironed out with most (all?) objections addressed.  Is any committer interested
> in picking this up?  If not we should probably mark it returned with feedback.

Rebase needed due to function oid clash. Picked the new OID with the
help of src/include/catalog/unused_oids. PSA v24 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Maciek Sakrejda
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            tested, passed

I'm not sure if this actually still needs review, but it's marked as
such in the CF app, so I'm reviewing it in the hopes of moving it
along.

The feature works as documented. The docs say "This function is
supported only if PostgreSQL was built with the ability to capture
backtraces, otherwise it will emit a warning." I'm not sure what
building with the ability to capture backtraces is, but it worked with
no special config on my machine. I don't have much C experience, so I
don't know if this is something that should have more context in a
README somewhere, or if it's likely someone who's interested in this
will already know what to do. The code looks fine to me.

I tried running make installcheck-world, but it failed on 17 tests.
However, master also fails here on 17 tests. A normal make check-world
passes on both branches. I assume I'm doing something wrong and would
appreciate any pointers [0].

Based on my review and Daniel's comment above, I'm marking this as Ready for Committer.

Thanks,
Maciek

[0] My regression.diffs has errors like

```
diff -U3 /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out
/home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/copyselect.out     2023-01-02 12:21:10.792646101 -0800
+++ /home/maciek/code/aux/postgres/src/test/regress/results/copyselect.out      2024-01-14 15:04:07.513887866 -0800
@@ -131,11 +131,6 @@
 2
  ?column? 
 ----------
-        3
-(1 row)
-
- ?column? 
-----------
         4
 (1 row)
```

and

```
diff -U3 /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out
/home/maciek/code/aux/postgres/src/test/regress/results/create_table.out
--- /home/maciek/code/aux/postgres/src/test/regress/expected/create_table.out   2023-10-02 22:14:02.583377845 -0700
+++ /home/maciek/code/aux/postgres/src/test/regress/results/create_table.out    2024-01-14 15:04:09.037890710 -0800
@@ -854,8 +854,6 @@
  b      | integer |           | not null | 1       | plain    |              | 
 Partition of: parted FOR VALUES IN ('b')
 Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
-Not-null constraints:
-    "part_b_b_not_null" NOT NULL "b" (local, inherited)
 
 -- Both partition bound and partition key in describe output
 \d+ part_c
``` 

I'm on Ubuntu 22.04 with Postgres 11, 12, 13, and 16 installed from PGDG.

The new status of this patch is: Ready for Committer

Re: Printing backtrace of postgres processes

From:
vignesh C
Date:
On Sun, 5 Nov 2023 at 01:49, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jul 20, 2023 at 8:22 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> >
> > > On 11 Jan 2023, at 15:44, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 11:43 AM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >>
> > >> I'm attaching the v22 patch set for further review.
> > >
> > > Needed a rebase due to 216a784829c2c5f03ab0c43e009126cbb819e9b2.
> > > Attaching v23 patch set for further review.
> >
> > This thread has stalled for well over 6 months with the patch going from CF to
> > CF.  From skimming the thread it seems that a lot of the details have been
> > ironed out with most (all?) objections addressed.  Is any committer interested
> > in picking this up?  If not we should probably mark it returned with feedback.
>
> Rebase needed due to function oid clash. Picked the new OID with the
> help of src/include/catalog/unused_oids. PSA v24 patch set.

Rebase needed due to changes in parallel_schedule. PSA v25 patch set.

Regards,
Vignesh

Attachments

Re: Printing backtrace of postgres processes

From:
Alvaro Herrera
Date:
On 2022-Jan-27, vignesh C wrote:

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index 0ee6974f1c..855ccc8902 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml

> +    You can get the file name and line number from the logged details by using
> +    gdb/addr2line in linux platforms (users must ensure gdb/addr2line is
> +    already installed).

This doesn't read great.  I mean, what is gdb/addr2line?  I think you
mean either GDB or addr2line; this could be clearer.

> diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
> index 7402696986..522a525741 100644
> --- a/src/backend/utils/error/elog.c
> +++ b/src/backend/utils/error/elog.c
> @@ -944,9 +943,10 @@ errbacktrace(void)
>   * Compute backtrace data and add it to the supplied ErrorData.  num_skip
>   * specifies how many inner frames to skip.  Use this to avoid showing the
>   * internal backtrace support functions in the backtrace.  This requires that
> - * this and related functions are not inlined.
> + * this and related functions are not inlined. If the edata pointer is valid,
> + * backtrace information will be set in edata.
>   */
> -static void
> +void
>  set_backtrace(ErrorData *edata, int num_skip)
>  {
>      StringInfoData errtrace;

This seems like a terrible API choice, and the comment change is no
good.  I suggest you need to create a function that deals only with a
StringInfo, maybe
  append_backtrace(StringInfo buf, int num_skip)
which is used by set_backtrace to print the backtrace in
edata->backtrace, and a new function log_backtrace() that does the
ereport(LOG_SERVER_ONLY) thing.
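
To make that concrete, a rough sketch of what such a split could look
like (assuming HAVE_BACKTRACE_SYMBOLS; only the two proposed names come
from the suggestion above, the rest is illustrative):

```
#include "postgres.h"

#include <execinfo.h>

#include "lib/stringinfo.h"

/* Append num_skip-trimmed frames of the current stack to buf. */
static void
append_backtrace(StringInfo buf, int num_skip)
{
    void       *frames[100];
    int         nframes;
    char      **symbols;

    nframes = backtrace(frames, lengthof(frames));
    symbols = backtrace_symbols(frames, nframes);
    if (symbols == NULL)
        return;

    for (int i = num_skip; i < nframes; i++)
        appendStringInfo(buf, "%s\n", symbols[i]);

    free(symbols);
}

/* Emit the current backtrace to the server log only. */
void
log_backtrace(int num_skip)
{
    StringInfoData buf;

    initStringInfo(&buf);
    append_backtrace(&buf, num_skip);
    ereport(LOG_SERVER_ONLY,
            (errmsg("current backtrace:\n%s", buf.data),
             errhidestmt(true)));
    pfree(buf.data);
}
```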

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers



Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Fri, Jan 26, 2024 at 4:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>

Thanks for reviewing.

> > +    You can get the file name and line number from the logged details by using
> > +    gdb/addr2line in linux platforms (users must ensure gdb/addr2line is
> > +    already installed).
>
> This doesn't read great.  I mean, what is gdb/addr2line?  I think you
> mean either GDB or addr2line; this could be clearer.

Wrapped them in <productname> tag and reworded the comment a bit.

> >   * internal backtrace support functions in the backtrace.  This requires that
> > - * this and related functions are not inlined.
> > + * this and related functions are not inlined. If the edata pointer is valid,
> > + * backtrace information will be set in edata.
> >   */
> > -static void
> > +void
> >  set_backtrace(ErrorData *edata, int num_skip)
> >  {
> >       StringInfoData errtrace;
>
> This seems like a terrible API choice, and the comment change is no
> good.  I suggest you need to create a function that deals only with a
> StringInfo, maybe
>   append_backtrace(StringInfo buf, int num_skip)
> which is used by set_backtrace to print the backtrace in
> edata->backtrace, and a new function log_backtrace() that does the
> ereport(LOG_SERVER_ONLY) thing.

You were probably looking at v21; the above change isn't there in
versions after that. Can you please review the latest version, v26,
attached here?

We might want this patch extended to the newly added walsummarizer
process, which I'll do in the next version.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Fri, Jan 26, 2024 at 11:58:00PM +0530, Bharath Rupireddy wrote:
> You probably were looking at v21, the above change isn't there in
> versions after that. Can you please review the latest version v26
> attached here?
>
> We might want this patch extended to the newly added walsummarizer
> process which I'll do so in the next version.

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
+bool
+SendProcSignalBackendOrAuxproc(int pid, ProcSignalReason reason)
+{

Looking at 0001.  This API is a thin wrapper around SendProcSignal()
that just checks that we actually have a process before using it.
Shouldn't it be in procsignal.c?

Now looking at 0002, this new routine is used in one place.  Seeing
that we have something similar in pgstatfuncs.c, wouldn't it be
better, instead of wrapping SendProcSignal(), to have one routine
that's able to return a PID for a PostgreSQL process?

All the backtrace-related handling lives in procsignal.c; could it
be cleaner to move the internals into a separate, new file, like
procbacktrace.c or something similar?

LoadBacktraceFunctions() is one more thing we need to set up in all
auxiliary processes.  It's a bit sad to require that in all these
places, and we may forget to add it.  Could we put more effort into
centralizing these initializations?  The auxiliary processes are one
portion of the problem, and getting stack traces for backend processes
would be useful on its own.  Another suggestion I'd propose to
simplify the patch would be to focus solely on backends for now, and
do something for auxiliary processes later on.  If you do that, the
strange layer with BackendOrAuxproc() is not a problem anymore, as it
would be left out for now.

+<screen>
+logging current backtrace of process with PID 3499242:
+postgres: ubuntu postgres [local] INSERT(+0x61a355)[0x5559b94de355]
+postgres: ubuntu postgres [local] INSERT(procsignal_sigusr1_handler+0x9e)[0x5559b94de4ef]

This is IMO too much detail for the documentation, and could be
confusing depending on how the code evolves in the future.  I'd
suggest keeping it minimal, cutting it down to a few lines.  I don't
see a need to mention ubuntu, either.
--
Michael

Attachments

Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Tue, Feb 6, 2024 at 4:21 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> --- a/src/backend/storage/ipc/procarray.c
> +++ b/src/backend/storage/ipc/procarray.c
> +bool
> +SendProcSignalBackendOrAuxproc(int pid, ProcSignalReason reason)
> +{
>
> Looking at 0001.  This API is a thin wrapper of SendProcSignal(), that
> just checks that we have actually a process before using it.
> Shouldn't it be in procsignal.c.
>
> Now looking at 0002, this new routine is used in one place.  Seeing
> that we have something similar in pgstatfuncs.c, wouldn't it be
> better, instead of englobing SendProcSignal(), to have one routine
> that's able to return a PID for a PostgreSQL process?

I liked the idea of going ahead with logging backtraces for only
backends for now, so a separate wrapper like this isn't needed.

> All the backtrace-related handling is stored in procsignal.c, could it
> be cleaner to move the internals into a separate, new file, like
> procbacktrace.c or something similar?

+1. Moved all the code to a new file.

> LoadBacktraceFunctions() is one more thing we need to set up in all
> auxiliary processes.  That's a bit sad to require that in all these
> places, and we may forget to add it.  Could we put more efforts in
> centralizing these initializations?

If we were to do it for only backends (including bg workers)
InitProcess() is the better place. If we were to do it for both
backends and auxiliary processes, BaseInit() is best.

> The auxiliary processes are one
> portion of the problem, and getting stack traces for backend processes
> would be useful on its own.  Another suggestion that I'd propose to
> simplify the patch would be to focus solely on backends for now, and
> do something for auxiliary process later on.  If you do that, the
> strange layer with BackendOrAuxproc() is not a problem anymore, as it
> would be left out for now.

+1 to keep it simple for now; that is, log backtraces of only backends
leaving auxiliary processes aside.

> +<screen>
> +logging current backtrace of process with PID 3499242:
> +postgres: ubuntu postgres [local] INSERT(+0x61a355)[0x5559b94de355]
> +postgres: ubuntu postgres [local] INSERT(procsignal_sigusr1_handler+0x9e)[0x5559b94de4ef]
>
> This is IMO too much details for the documentation, and could be
> confusing depending on how the code evolves in the future.  I'd
> suggest to keep it minimal, cutting that to a few lines.  I don't see
> a need to mention ubuntu, either.

Well, that 'ubuntu' is the default username there; I've changed it now
and kept the output short.

I've simplified the tests; now we don't need two separate output files
for them. Please see the attached v27 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Wed, Feb 07, 2024 at 02:04:39PM +0530, Bharath Rupireddy wrote:
> Well, that 'ubuntu' is the default username there, I've changed it now
> and kept the output short.

I would keep it just at two or three lines, with a "For example, with
lines like":

> I've simplified the tests, now we don't need two separate output files
> for tests. Please see the attached v27 patch.

+  proname => 'pg_log_backtrace', provolatile => 'v', prorettype => 'bool',

Hmm.  Would it be better to be in line with memory contexts logging
and use pg_log_backend_backtrace()?  One thing I was wondering is
whether there may be a good point in splitting the backtrace support
into two functions (one for backends, one for auxiliary processes)
that could be given separate execution ACLs across different roles.

+   PROCSIG_LOG_BACKTRACE,      /* ask backend to log the current backtrace */

Incorrect order.

+-- Backtrace is not logged for auxiliary processes

Not sure there's a point in keeping that in the tests for now.

+    * XXX: It might be worth implementing it for auxiliary processes.

Same, I would remove that.

+static volatile sig_atomic_t backtrace_functions_loaded = false;

Hmm, so you need that because it would be called in a signal handler,
as backtrace(3) says:
"If you need certain calls to these two functions to not allocate
memory (in signal handlers, for example), you need to make sure libgcc
is loaded beforehand".

True that it is not interesting to only log something when hitting a
CFI; this needs to be dynamic to get a precise state of things.
--
Michael

Attachments

Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Wed, Feb 7, 2024 at 2:57 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Feb 07, 2024 at 02:04:39PM +0530, Bharath Rupireddy wrote:
> > Well, that 'ubuntu' is the default username there, I've changed it now
> > and kept the output short.
>
> I would keep it just at two or three lines, with a "For example, with
> lines like":

Done.

> > I've simplified the tests, now we don't need two separate output files
> > for tests. Please see the attached v27 patch.
>
> +  proname => 'pg_log_backtrace', provolatile => 'v', prorettype => 'bool',
>
> Hmm.  Would it be better to be in line with memory contexts logging
> and use pg_log_backend_backtrace()?

+1.

> One thing I was wondering is that
> there may be a good point in splitting the backtrace support into two
> functions (backends and auxiliary processes) that could be split with
> two execution ACLs across different roles.

-1 for that unless we have any requests. I mean, a backtrace is a
common thing for all postgres processes, so why would different
controls be needed? I'd go with what pg_log_backend_memory_contexts
does - it supports both backends and auxiliary processes.

> +   PROCSIG_LOG_BACKTRACE,      /* ask backend to log the current backtrace */
>
> Incorrect order.

PROCSIG_XXX aren't alphabetically ordered, no?

> +-- Backtrace is not logged for auxiliary processes
>
> Not sure there's a point in keeping that in the tests for now.
>
> +    * XXX: It might be worth implementing it for auxiliary processes.
>
> Same, I would remove that.

Done.

> +static volatile sig_atomic_t backtrace_functions_loaded = false;
>
> Hmm, so you need that because of the fact that it would be called in a
> signal as backtrace(3) says:
> "If you need certain calls to these two functions to not allocate
> memory (in signal handlers, for example), you need to make sure libgcc
> is loaded beforehand".
>
> True that it is not interesting to only log something when having a
> CFI, this needs to be dynamic to get a precise state of things.

Right.

I've also fixed some test failures. Please see the attached v28 patch
set. 0002 extends pg_log_backend_backtrace to auxiliary processes,
just like pg_log_backend_memory_contexts (not focused on PID
de-duplication code yet).

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Bharath Rupireddy
Date:
On Wed, Feb 7, 2024 at 9:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Feb 7, 2024 at 2:57 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Wed, Feb 07, 2024 at 02:04:39PM +0530, Bharath Rupireddy wrote:
> > > Well, that 'ubuntu' is the default username there, I've changed it now
> > > and kept the output short.
> >
> > I would keep it just at two or three lines, with a "For example, with
> > lines like":
>
> Done.
>
> > > I've simplified the tests, now we don't need two separate output files
> > > for tests. Please see the attached v27 patch.
> >
> > +  proname => 'pg_log_backtrace', provolatile => 'v', prorettype => 'bool',
> >
> > Hmm.  Would it be better to be in line with memory contexts logging
> > and use pg_log_backend_backtrace()?
>
> +1.
>
> > One thing I was wondering is that
> > there may be a good point in splitting the backtrace support into two
> > functions (backends and auxiliary processes) that could be split with
> > two execution ACLs across different roles.
>
> -1 for that unless we have any requests. I mean, backtrace is a common
> thing for all postgres processes, why different controls are needed?
> I'd go with what pg_log_backend_memory_contexts does - it supports
> both backends and auxiliary processes.
>
> > +   PROCSIG_LOG_BACKTRACE,      /* ask backend to log the current backtrace */
> >
> > Incorrect order.
>
> PROCSIG_XXX aren't alphabetically ordered, no?
>
> > +-- Backtrace is not logged for auxiliary processes
> >
> > Not sure there's a point in keeping that in the tests for now.
> >
> > +    * XXX: It might be worth implementing it for auxiliary processes.
> >
> > Same, I would remove that.
>
> Done.
>
> > +static volatile sig_atomic_t backtrace_functions_loaded = false;
> >
> > Hmm, so you need that because of the fact that it would be called in a
> > signal as backtrace(3) says:
> > "If you need certain calls to these two functions to not allocate
> > memory (in signal handlers, for example), you need to make sure libgcc
> > is loaded beforehand".
> >
> > True that it is not interesting to only log something when having a
> > CFI, this needs to be dynamic to get a precise state of things.
>
> Right.
>
> I've also fixed some test failures. Please see the attached v28 patch
> set. 0002 extends pg_log_backend_backtrace to auxiliary processes,
> just like pg_log_backend_memory_contexts (not focused on PID
> de-duplication code yet).

I missed adding LoadBacktraceFunctions() in InitAuxiliaryProcess in
the 0002 patch. Please find the attached v29 patch set. Sorry for the
noise.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Thu, Feb 08, 2024 at 12:30:00AM +0530, Bharath Rupireddy wrote:
> I've missed adding LoadBacktraceFunctions() in InitAuxiliaryProcess
> for 0002 patch. Please find the attached v29 patch set. Sorry for the
> noise.

I've been torturing the patch with \watch and loops calling the
function while doing a sequential scan of pg_stat_activity, and that
was stable while doing a pgbench and an installcheck-world in
parallel, with some infinite loops and some spinlocks I should not
have taken.

+   if (backtrace_functions_loaded)
+       return;

I was really wondering about this point, particularly the fact that
this would load libgcc for all the backends when they start,
unconditionally.  One option could be to hide that behind a postmaster
GUC disabled by default, but then we'd come back to using gdb on a
live server, which is no fun in a customer environment because of the
extra dependencies, which may not, or simply cannot, be installed.  So
yeah, I think I'm OK with that.

+ * procsignal.h
+ *   Backtrace-related routines

This one is incorrect.

In HandleLogBacktraceInterrupt(), we don't use backtrace_symbols() and
rely on backtrace_symbols_fd() instead, to avoid doing malloc() in the
signal handler as mentioned in [1] back in 2022.  Perhaps the fact
that we don't use backtrace_symbols() should be mentioned explicitly
in a comment rather than silently implied?  That's a very important
point.

Echoing upthread: we've been more lax with superuser checks and the
assignment of custom roles in recent years, and I agree with the
patch's approach of making that superuser-only by default.  Then users
can enforce their own policy as they want with an EXECUTE ACL on the
SQL function.

As a whole, I'm pretty excited about being able to rely on that
without the need to use gdb to get a live trace.  Does anybody have
objections and/or comments, particularly about the superuser and the
load part at backend startup?  This thread has been going on for so
long that it would be good to let 1 week for folks to react before
doing anything.  See v29 for references.

[1]: https://www.postgresql.org/message-id/CALj2ACUNZVB0cQovvKBd53-upsMur8j-5_K=-fg86uAa+WYEWg@mail.gmail.com
--
Michael

Attachments

Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Thu, Feb 08, 2024 at 12:25:18PM +0900, Michael Paquier wrote:
> In HandleLogBacktraceInterrupt(), we don't use backtrace_symbols() and
> rely on backtrace_symbols_fd() to avoid doing malloc() in the signal
> handler as mentioned in [1] back in 2022.  Perhaps the part about the
> fact that we don't use backtrace_symbols() should be mentioned
> explicitely in a comment rather than silently implied?  That's
> a very important point.

This has been itching me, so I have spent more time reading about
that, and while browsing signal(7) and signal-safety(7), I've first
noticed that this is not safe in the patch:
+   write_stderr("logging current backtrace of process with PID %d:\n",
+                MyProcPid);

Note that there's a write_stderr_signal_safe().

Anyway, I've been digging around the signal-safety of backtrace(3)
(even looking a bit at some GCC code, brrr), and I am under the
impression that backtrace() is just by nature not safe and also
dangerous in signal handlers.  One example of issue I've found:
https://github.com/gperftools/gperftools/issues/838

This looks like enough ground to me to reject the patch.
--
Michael

Attachments

Re: Printing backtrace of postgres processes

From:
Alvaro Herrera
Date:
On 2024-Feb-09, Michael Paquier wrote:

> Anyway, I've been digging around the signal-safety of backtrace(3)
> (even looking a bit at some GCC code, brrr), and I am under the
> impression that backtrace() is just by nature not safe and also
> dangerous in signal handlers.  One example of issue I've found:
> https://github.com/gperftools/gperftools/issues/838
> 
> This looks like enough ground to me to reject the patch.

Hmm, but the backtrace() manpage says

       •  backtrace() and backtrace_symbols_fd() don't call malloc()  explic‐
          itly,  but  they  are part of libgcc, which gets loaded dynamically
          when first used.  Dynamic loading usually triggers a call  to  mal‐
          loc(3).   If  you  need certain calls to these two functions to not
          allocate memory (in signal handlers, for example), you need to make
          sure libgcc is loaded beforehand.

and the patch ensures that libgcc is loaded by calling a dummy
backtrace() at the start of the process.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
                                 (George Orwell's The Lord of the Rings)



Re: Printing backtrace of postgres processes

From:
Ashutosh Bapat
Date:
On Fri, Feb 9, 2024 at 2:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Feb-09, Michael Paquier wrote:
>
> > Anyway, I've been digging around the signal-safety of backtrace(3)
> > (even looking a bit at some GCC code, brrr), and I am under the
> > impression that backtrace() is just by nature not safe and also
> > dangerous in signal handlers.  One example of issue I've found:
> > https://github.com/gperftools/gperftools/issues/838
> >
> > This looks like enough ground to me to reject the patch.
>
> Hmm, but the backtrace() manpage says
>
>        •  backtrace() and backtrace_symbols_fd() don't call malloc()  explic‐
>           itly,  but  they  are part of libgcc, which gets loaded dynamically
>           when first used.  Dynamic loading usually triggers a call  to  mal‐
>           loc(3).   If  you  need certain calls to these two functions to not
>           allocate memory (in signal handlers, for example), you need to make
>           sure libgcc is loaded beforehand.
>
> and the patch ensures that libgcc is loaded by calling a dummy
> backtrace() at the start of the process.
>

We defer the actual action triggered by a signal till
CHECK_FOR_INTERRUPTS is called. I understand that we can't do that
here since we want to capture the backtrace at that moment and can't
wait till the next CFI. But printing the backtrace can surely wait
till the next CFI, right?


--
Best Wishes,
Ashutosh Bapat



Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Fri, Feb 09, 2024 at 02:27:26PM +0530, Ashutosh Bapat wrote:
> On Fri, Feb 9, 2024 at 2:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Hmm, but the backtrace() manpage says
>>
>>        •  backtrace() and backtrace_symbols_fd() don't call malloc()  explic‐
>>           itly,  but  they  are part of libgcc, which gets loaded dynamically
>>           when first used.  Dynamic loading usually triggers a call  to  mal‐
>>           loc(3).   If  you  need certain calls to these two functions to not
>>           allocate memory (in signal handlers, for example), you need to make
>>           sure libgcc is loaded beforehand.
>>
>> and the patch ensures that libgcc is loaded by calling a dummy
>> backtrace() at the start of the process.

FWIW, anything I am reading about the matter freaks me out, including
the dlopen() part in all the backends:
https://www.gnu.org/software/libc/manual/html_node/Backtraces.html

So I really question whether it is a good idea to assume that this
will always be safe regardless of the version of libgcc involved,
while increasing the impact area.  Perhaps that's worrying too much,
but it looks like one of those things where we'd better be really
careful.

> We defer actual action triggered by a signal till CHECK_FOR_INTERRUPTS
> is called. I understand that we can't do that here since we want to
> capture the backtrace at that moment and can't wait till next CFI. But
> printing the backend can surely wait till next CFI right?

Delaying the call of backtrace() to happen during a CFI() would be
safe, yes, and writing data to stderr would not really be an issue as
at least the data would be sent somewhere.  That's less useful, but
we do that for memory contexts.
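
For what it's worth, a minimal sketch of that deferred pattern (names
are illustrative; a real implementation would also need to integrate
with ProcSignal and InterruptPending):

```
#include "postgres.h"

#include <execinfo.h>
#include <signal.h>

#include "miscadmin.h"
#include "storage/latch.h"

static volatile sig_atomic_t LogBacktracePending = false;

/* Called from the SIGUSR1 handler: only sets a flag, fully signal safe. */
void
HandleLogBacktraceSignal(void)
{
    LogBacktracePending = true;
    SetLatch(MyLatch);
}

/* Called at the next CHECK_FOR_INTERRUPTS(): does the actual work. */
void
ProcessLogBacktraceInterrupt(void)
{
    void       *frames[64];
    int         nframes;

    LogBacktracePending = false;

    nframes = backtrace(frames, lengthof(frames));
    backtrace_symbols_fd(frames, nframes, fileno(stderr));
}
```
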
--
Michael

Attachments

Re: Printing backtrace of postgres processes

From:
Ashutosh Bapat
Date:
On Sat, Feb 10, 2024 at 5:36 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Feb 09, 2024 at 02:27:26PM +0530, Ashutosh Bapat wrote:
> > On Fri, Feb 9, 2024 at 2:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >> Hmm, but the backtrace() manpage says
> >>
> >>        •  backtrace() and backtrace_symbols_fd() don't call malloc()  explic‐
> >>           itly,  but  they  are part of libgcc, which gets loaded dynamically
> >>           when first used.  Dynamic loading usually triggers a call  to  mal‐
> >>           loc(3).   If  you  need certain calls to these two functions to not
> >>           allocate memory (in signal handlers, for example), you need to make
> >>           sure libgcc is loaded beforehand.
> >>
> >> and the patch ensures that libgcc is loaded by calling a dummy
> >> backtrace() at the start of the process.
>
> FWIW, anything I am reading about the matter freaks me out, including
> the dlopen() part in all the backends:
> https://www.gnu.org/software/libc/manual/html_node/Backtraces.html
>
> So I really question whether it is a good idea to assume if this will
> always be safe depending on the version of libgcc dealt with,
> increasing the impact area.  Perhaps that's worrying too much, but it
> looks like one of these things where we'd better be really careful.

I agree. We don't want a call to backtrace printing mechanism to make
things worse.

>
> > We defer actual action triggered by a signal till CHECK_FOR_INTERRUPTS
> > is called. I understand that we can't do that here since we want to
> > capture the backtrace at that moment and can't wait till next CFI. But
> > printing the backend can surely wait till next CFI right?
>
> Delaying the call of backtrace() to happen during a CFI() would be
> safe, yes, and writing data to stderr would not really be an issue as
> at least the data would be sent somewhere.  That's less useful, but
> we do that for memory contexts.

Memory contexts do not change much before the next CFI, but stack
traces do. So I am not sure whether it is desirable to wait until the
next CFI to capture the backtrace. That said, since the user cannot
time a call to pg_log_backend() exactly, whether it captures the
backtrace exactly when the interrupt happens or at the next CFI may
not matter much in practice.

--
Best Wishes,
Ashutosh Bapat



Re: Printing backtrace of postgres processes

From:
Ashutosh Bapat
Date:
On Mon, Feb 12, 2024 at 6:52 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> >
> > > We defer actual action triggered by a signal till CHECK_FOR_INTERRUPTS
> > > is called. I understand that we can't do that here since we want to
> > > capture the backtrace at that moment and can't wait till next CFI. But
> > > printing the backend can surely wait till next CFI right?
> >
> > Delaying the call of backtrace() to happen during a CFI() would be
> > safe, yes, and writing data to stderr would not really be an issue as
> > at least the data would be sent somewhere.  That's less useful, but
> > we do that for memory contexts.
>
> Memory contexts do not change more or less till next CFI, but stack
> traces do. So I am not sure whether it is desirable to wait to capture
> backtrace till next CFI. Given that the user can not time a call to
> pg_log_backend() exactly, so whether it captures the backtrace exactly
> at when interrupt happens or at the next CFI may not matter much in
> practice.
>

Thinking more about this, I have the following thoughts/questions:

1. Is getting a backtrace at a CFI good enough?
A backtrace is needed to know what a process is doing when it's stuck,
its behaviour is unexpected, etc. PostgreSQL code has CFIs sprinkled in
almost all the tight loops. Are those places enough to cover most of
the cases that a user of this feature would care about?

2. Tools like gdb and strace can be used to get the stack trace of any
process, so do we really need this facility?
Most OSes provide such tools, but maybe it is useful in Kubernetes-like
environments; I am not sure.

3. Tools like gdb and strace are able to capture a stack trace at any
point during execution. Can we use the same mechanism instead of
relying on CFI?

4. Tools like gdb and strace can capture more than just a stack trace,
e.g. variable values, register values, etc. Are we planning to add
those facilities as well? Or will this feature be useful without them?

The feature may be more useful if it can provide PostgreSQL-specific
details that an external tool cannot.

--
Best Wishes,
Ashutosh Bapat



Re: Printing backtrace of postgres processes

From:
Christoph Berg
Date:
Re: Michael Paquier
> >>        •  backtrace() and backtrace_symbols_fd() don't call malloc()  explic‐
> >>           itly,  but  they  are part of libgcc, which gets loaded dynamically
> >>           when first used.  Dynamic loading usually triggers a call  to  mal‐
> >>           loc(3).   If  you  need certain calls to these two functions to not
> >>           allocate memory (in signal handlers, for example), you need to make
> >>           sure libgcc is loaded beforehand.
> >>
> >> and the patch ensures that libgcc is loaded by calling a dummy
> >> backtrace() at the start of the process.
> 
> FWIW, anything I am reading about the matter freaks me out, including
> the dlopen() part in all the backends:
> https://www.gnu.org/software/libc/manual/html_node/Backtraces.html

I'd be concerned about the cost of doing that as part of the startup
of every single backend process. Shouldn't this rather be done within
the postmaster so it's automatically inherited by forked backends?
(EXEC_BACKEND systems probably don't have libgcc I guess.)

Christoph



Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Fri, Feb 23, 2024 at 04:39:47PM +0100, Christoph Berg wrote:
> I'd be concerned about the cost of doing that as part of the startup
> of every single backend process. Shouldn't this rather be done within
> the postmaster so it's automatically inherited by forked backends?
> (EXEC_BACKEND systems probably don't have libgcc I guess.)

Something like this can be measured with a bunch of concurrent clients
attempting connections at a very high rate, like pgbench with an empty
script and -C, for local connections.
--
Michael

Attachments

Re: Printing backtrace of postgres processes

From:
Christoph Berg
Date:
Re: Michael Paquier
> Something like this can be measured with a bunch of concurrent
> connections attempting connections and a very high rate, like pgbench
> with an empty script and -C, for local connections.

I tried that now. Mind that I'm not a benchmarking expert, and there's
been quite some jitter in the results, but I think there's a clear
trend.

Current head without and with the v28 patchset.
Command line:
pgbench -n -C -c 20 -j 20 -f empty.sql -T 30 --progress=2
empty.sql just contains a ";"
model name: 13th Gen Intel(R) Core(TM) i7-13700H

head:
tps = 2211.289863 (including reconnection times)
tps = 2113.907588 (including reconnection times)
tps = 2200.406877 (including reconnection times)
average: 2175

v28:
tps = 1873.472459 (including reconnection times)
tps = 2068.094383 (including reconnection times)
tps = 2196.890897 (including reconnection times)
average: 2046

2046 / 2175 = 0.941

Even if we regard the 1873 as an outlier, I've seen many vanilla runs
with 22xx tps, and not a single v28 run with 22xx tps. Other numbers I
collected suggested a cost of at least 3% for the feature.

Christoph



Re: Printing backtrace of postgres processes

From:
Michael Paquier
Date:
On Mon, Feb 26, 2024 at 04:05:05PM +0100, Christoph Berg wrote:
> I tried that now. Mind that I'm not a benchmarking expert, and there's
> been quite some jitter in the results, but I think there's a clear
> trend.
>
> Even if we regard the 1873 as an outlier, I've seen many vanilla runs
> with 22xx tps, and not a single v28 run with 22xx tps. Other numbers I
> collected suggested a cost of at least 3% for the feature.

Thanks for the numbers.  Yes, that's annoying, and I suspect it could
be noticeable for a lot of users.
--
Michael

Вложения