Обсуждение: Replacing abort() with __builtin_trap()?

Поиск
Список
Период
Сортировка

Replacing abort() with __builtin_trap()?

От
Andres Freund
Дата:
Hi,

When looking at Assert() failures and at PANICs, the number of "pointless"
stack entries at the top seems to have grown over the years.  Here's an
example of a stacktrace (that I obviously intentionally triggered):

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
44    ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at
./nptl/pthread_kill.c:44
#1  0x00007f31920a815f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f319205a472 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f31920444b2 in __GI_abort () at ./stdlib/abort.c:79
#4  0x000055b3340c5140 in ExceptionalCondition (conditionName=0x55b3338c7ea0 "\"I kid you not\" == NULL",
    fileName=0x55b3338c6958 "../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=4126)
    at ../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
#5  0x000055b333ef46c4 in PostgresMain (dbname=0x55b336271608 "postgres", username=0x55b3361fa888 "andres")
    at ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
#6  0x000055b333e1fadd in BackendRun (port=0x55b336267ec0) at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
#7  0x000055b333e1f369 in BackendStartup (port=0x55b336267ec0) at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
#8  0x000055b333e1b406 in ServerLoop () at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
#9  0x000055b333e1ad17 in PostmasterMain (argc=73, argv=0x55b3361f83f0) at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1463
#10 0x000055b333d052e2 in main (argc=73, argv=0x55b3361f83f0) at
../../../../home/andres/src/postgresql/src/backend/main/main.c:198

That's due to glibc having a very complicated abort(). Which might be nice as
a backstop, but for the default Assert it's imo just noise.

I'd like to propose that we do a configure test for __builtin_trap() and use
it, if available, before the abort() in ExceptionalCondition(). Perhaps also
for PANIC, but it's not as clear to me whether we should.

Here's a backtrace when using __builtin_trap():
#0  ExceptionalCondition (conditionName=0x55e7e7c90ea0 "\"I kid you not\" == NULL",
    fileName=0x55e7e7c8f958 "../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=4126)
    at ../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
66        __builtin_trap();
(gdb) bt
#0  ExceptionalCondition (conditionName=0x55e7e7c90ea0 "\"I kid you not\" == NULL",
    fileName=0x55e7e7c8f958 "../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=4126)
    at ../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
#1  0x000055e7e82bd6c4 in PostgresMain (dbname=0x55e7e9ea8608 "postgres", username=0x55e7e9e31888 "andres")
    at ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
#2  0x000055e7e81e8add in BackendRun (port=0x55e7e9e9eec0) at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
#3  0x000055e7e81e8369 in BackendStartup (port=0x55e7e9e9eec0) at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
#4  0x000055e7e81e4406 in ServerLoop () at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
#5  0x000055e7e81e3d17 in PostmasterMain (argc=73, argv=0x55e7e9e2f3f0) at
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1463
#6  0x000055e7e80ce2e2 in main (argc=73, argv=0x55e7e9e2f3f0) at
../../../../home/andres/src/postgresql/src/backend/main/main.c:198


Maybe I crash things too often, but I like to not have to deal with 4
pointless frames at the top...

Greetings,

Andres Freund



Re: Replacing abort() with __builtin_trap()?

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> I'd like to propose that we do a configure test for __builtin_trap() and use
> it, if available, before the abort() in ExceptionalCondition(). Perhaps also
> for PANIC, but it's not as clear to me whether we should.

Does that still result in the same process exit signal being reported to
the postmaster?  The GCC manual makes it sound like the reported signal
could be platform-dependent, which'd be kind of confusing.

            regards, tom lane



Re: Replacing abort() with __builtin_trap()?

От
Andres Freund
Дата:
Hi,

On 2023-07-02 13:55:53 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'd like to propose that we do a configure test for __builtin_trap() and use
> > it, if available, before the abort() in ExceptionalCondition(). Perhaps also
> > for PANIC, but it's not as clear to me whether we should.
>
> Does that still result in the same process exit signal being reported to
> the postmaster?

It does not on linux / x86-64.

2023-07-02 10:52:55.103 PDT [1398197][postmaster][:0][] LOG:  server process (PID 1398207) was terminated by signal 4:
Illegalinstruction
 
vs today's
2023-07-02 11:08:22.674 PDT [1401801][postmaster][:0][] LOG:  server process (PID 1401809) was terminated by signal 6:
Aborted

It wouldn't be bad for postmaster to be able to distinguish between PANIC and
Assert(), but I agree that the non-determinism is a bit annoying.

Greetings,

Andres Freund



Re: Replacing abort() with __builtin_trap()?

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2023-07-02 13:55:53 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> I'd like to propose that we do a configure test for __builtin_trap() and use
>>> it, if available, before the abort() in ExceptionalCondition(). Perhaps also
>>> for PANIC, but it's not as clear to me whether we should.

>> Does that still result in the same process exit signal being reported to
>> the postmaster?

> It does not on linux / x86-64.

> 2023-07-02 10:52:55.103 PDT [1398197][postmaster][:0][] LOG:  server process (PID 1398207) was terminated by signal
4:Illegal instruction 
> vs today's
> 2023-07-02 11:08:22.674 PDT [1401801][postmaster][:0][] LOG:  server process (PID 1401809) was terminated by signal
6:Aborted 

Hm, I do *not* like "Illegal instruction" in place of SIGABRT;
that looks too much like we vectored off into never-never land.
I'd rather live with the admittedly-ugly stack traces.

            regards, tom lane