Re: Add errdetail() with PID and UID about source of termination signal
| От | Chao Li |
|---|---|
| Тема | Re: Add errdetail() with PID and UID about source of termination signal |
| Дата | |
| Msg-id | CFD6C011-7D95-418B-B7BA-5F864F682C83@gmail.com обсуждение исходный текст |
| Ответ на | Re: Add errdetail() with PID and UID about source of termination signal (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
| Ответы |
Re: Add errdetail() with PID and UID about source of termination signal
|
| Список | pgsql-hackers |
> On Feb 25, 2026, at 18:45, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
>
> On Wed, Feb 25, 2026 at 10:15 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi Chao, thanks for review.
>
>> A few comments for v3:
>>
>> 1 - syncrep.c
> [..]
>> + proc_die_sender_pid == 0 ? 0 :
>> + errdetail("Signal sent by PID %lld, UID %lld.",
>> + (long long)proc_die_sender_pid, (long
long)proc_die_sender_uid)
>> + ));
>> ```
>>
>> Here errdetail is used twice. I guess the second conditional one should be errhint.
>>
>> 2 - syncrep.c
> [..]
>> Same as comment 1.
>
> You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
> way too fast... fixed.
>
>> 3
>> ```
>> +volatile uint32 proc_die_sender_pid = 0;
>> +volatile uint32 proc_die_sender_uid = 0;
>> ```
>>
>> These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic
writes.
>
> Well the problem that sig_atomic_t is int and we need at least uint32 and
> I couldn't find better way. I think that 4 bytes writes will be mostly
> always atomic (for 64-bits it would depend on
> PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)
>
> Yet I moved those little below, so it's more aligned to the other uses.
>
>> 4
>> ```
>> - errmsg("terminating walreceiver process due to administrator command")));
>> + errmsg("terminating walreceiver process due to administrator command"),
>> + proc_die_sender_pid == 0 ? 0 :
>> + errdetail("Signal sent by PID %lld, UID %lld.",
>> + (long long)proc_die_sender_pid, (long
long)proc_die_sender_uid)
>> + ));
>> ```
>>
>> Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current
postmaster.cjust formats pid as int (%d):
>> ```
>> /* in parent, successful fork */
>> ereport(DEBUG2,
>> (errmsg_internal("forked new %s, pid=%d socket=%d",
>> GetBackendTypeDesc(bn->bkend_type),
>> (int) pid, (int) client_sock->sock)));
>> ```
>
> Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
> later had pid_t, I read somewhere about 64-bit pids, and so on) vs
> platform-agnostic hell of putting that into printf). Possible I was
> overthinking it
> and I have reverted it to just using %d with that uint32. BTW I've also found:
> elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
>
> v4 attached. Thanks again for the review.
>
> -J.
> <v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>
I just reviewed v4 again and got a few more comments:
1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving
SIGTERM,and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a
processreceives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and
UID.So, I think we may need to reset proc_die_sender_pid/uid somewhere. For example, in ProcessInterrupts(), copy them
tolocal variables and reset them to 0, then use the local variables for ereport().
2.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by PID %d, UID %d.",
+ proc_die_sender_pid, proc_die_sender_uid)
+ ));
```
syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use
errhint.
3.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
QueryCancelPending = false;
ereport(WARNING,
(errmsg("canceling wait for synchronous replication due to user request"),
- errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby."),
+ proc_die_sender_pid == 0 ? 0 :
+ errhint("Signal sent by PID %d, UID %d.",
+ proc_die_sender_pid, proc_die_sender_uid)
+ ));
SyncRepCancelWait();
break;
}
```
I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
В списке pgsql-hackers по дате отправления: