Re: Add errdetail() with PID and UID about source of termination signal

Поиск
Список
Период
Сортировка
От Chao Li
Тема Re: Add errdetail() with PID and UID about source of termination signal
Дата
Msg-id CFD6C011-7D95-418B-B7BA-5F864F682C83@gmail.com
обсуждение исходный текст
Ответ на Re: Add errdetail() with PID and UID about source of termination signal  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Ответы Re: Add errdetail() with PID and UID about source of termination signal
Список pgsql-hackers

> On Feb 25, 2026, at 18:45, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
>
> On Wed, Feb 25, 2026 at 10:15 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi Chao, thanks for review.
>
>> A few comments for v3:
>>
>> 1 - syncrep.c
> [..]
>> +                                               proc_die_sender_pid == 0 ? 0 :
>> +                                                       errdetail("Signal sent by PID %lld, UID %lld.",
>> +                                                               (long long)proc_die_sender_pid, (long
long)proc_die_sender_uid)
>> +                                               ));
>> ```
>>
>> Here errdetail is used twice. I guess the second conditional one should be errhint.
>>
>> 2 - syncrep.c
> [..]
>> Same as comment 1.
>
> You are right, apparently I copy/pasted code from src/backend/tcop/postgres.c
> way too fast... fixed.
>
>> 3
>> ```
>> +volatile uint32 proc_die_sender_pid = 0;
>> +volatile uint32 proc_die_sender_uid = 0;
>> ```
>>
>> These two globals are only written in the signal handler, I think they should be sig_atomic_t to ensure atomic
writes.
>
> Well the problem that sig_atomic_t is int and we need at least uint32 and
> I couldn't find better way. I think that 4 bytes writes will be mostly
> always atomic (for 64-bits it would depend on
> PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY)
>
> Yet I moved those little below, so it's more aligned to the other uses.
>
>> 4
>> ```
>> -                                        errmsg("terminating walreceiver process due to administrator command")));
>> +                                        errmsg("terminating walreceiver process due to administrator command"),
>> +                                        proc_die_sender_pid == 0 ? 0 :
>> +                                               errdetail("Signal sent by PID %lld, UID %lld.",
>> +                                                       (long long)proc_die_sender_pid, (long
long)proc_die_sender_uid)
>> +                                       ));
>> ```
>>
>> Why do we need to format pid and uid in “long long” format? I searched over the source tree, the current
postmaster.cjust formats pid as int (%d): 
>> ```
>>        /* in parent, successful fork */
>>        ereport(DEBUG2,
>>                        (errmsg_internal("forked new %s, pid=%d socket=%d",
>>                                                         GetBackendTypeDesc(bn->bkend_type),
>>                                                         (int) pid, (int) client_sock->sock)));
>> ```
>
> Yes, I think I was kind of lost when thinking about it (v1 had sig_atomic_t,
> later had pid_t, I read somewhere about 64-bit pids, and so on) vs
> platform-agnostic hell of putting that into printf). Possible I was
> overthinking it
> and I have reverted it to just using %d with that uint32. BTW I've also found:
>   elog(DEBUG3, "kill(%ld,%d) failed: %m", (long) pid, signal);
>
> v4 attached. Thanks again for the review.
>
> -J.
> <v4-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch>

I just reviewed v4 again and got a few more comments:

1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving
SIGTERM,and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a
processreceives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and
UID.So, I think we may need to reset proc_die_sender_pid/uid somewhere. For example, in ProcessInterrupts(), copy them
tolocal variables and reset them to 0, then use the local variables for ereport(). 

2.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
             ereport(WARNING,
                     (errmsg("canceling wait for synchronous replication due to user request"),
-                     errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby."))); 
+                     errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby."), 
+                        proc_die_sender_pid == 0 ? 0 :
+                            errhint("Signal sent by PID %d, UID %d.",
+                                proc_die_sender_pid, proc_die_sender_uid)
+                        ));
```

syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use
errhint.

3.
```
@@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
             QueryCancelPending = false;
             ereport(WARNING,
                     (errmsg("canceling wait for synchronous replication due to user request"),
-                     errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby."))); 
+                     errdetail("The transaction has already committed locally, but might not have been replicated to
thestandby."), 
+                        proc_die_sender_pid == 0 ? 0 :
+                            errhint("Signal sent by PID %d, UID %d.",
+                                proc_die_sender_pid, proc_die_sender_uid)
+                        ));
             SyncRepCancelWait();
             break;
         }
```

I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







В списке pgsql-hackers по дате отправления: