Re: BUG #16827: macOS interrupted syscall leads to a crash
From | Andres Freund |
---|---|
Subject | Re: BUG #16827: macOS interrupted syscall leads to a crash |
Date | |
Msg-id | 20210122173535.5zzud4b6ffk4mym6@alap3.anarazel.de |
In response to | Re: BUG #16827: macOS interrupted syscall leads to a crash (Ricardo Ungureanu <ricardoungureanu@gmail.com>) |
Responses | Re: BUG #16827: macOS interrupted syscall leads to a crash (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
Hi,

Tom, all, this seems like a serious problem likely to become more
widespread. I don't really know how we can reasonably address this
short-term: adding EINTR handling to all the places that don't yet have
it (and ensuring that it stays that way) seems like it's a lot to
backport.

There's just been another report of this at
https://postgr.es/m/16832-943d33fd58eb26e4%40postgresql.org

On 2021-01-16 19:30:52 +0200, Ricardo Ungureanu wrote:
> On Fri, 15 Jan 2021 at 23:05, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-01-15 14:00:03 +0000, PG Bug reporting form wrote:
> > > I am using macOS 11.0 and trying to import a large dump into postgresql.
> > > Under some circumstances, it crashes while importing.
> > > I inspected the logs and found out a system call is interrupted ("LOG:
> > > could not open file "pg_wal": Interrupted system call"). Apple has added a
> > > new feature in macOS 11.0 to audit security events. I noticed that the
> > > kernel, while waiting on a condition variable, if it receives an interrupt,
> > > will just pass EINTR (error code 4) back to the usermode program. Your
> > > function XLogFileInit does not treat such cases (just ENOENT is checked) and
> > > decides to exit with an abort(). I have attached below the crash file
> > > generated.
> >
> > Hm. It's fairly nasty to return EINTR from open() (except if open()ing a
> > FIFO or such) - it should normally only happen when blocked. But I'm not
> > sure it's *actually* violating any standards / promises made.
>
> There are two kinds of security events which Apple supports: AUTH and
> NOTIFY. AUTH means that the system call is blocked (on that condition
> variable I mentioned), and the user mode daemon is asked about
> the generated event: "postgres, pid 999, open() on file /path/file
> with flags 0x400003" - something like that. The usermode can either
> allow or deny the event by replying. If it decides to block this
> system call, the return code seen by the target process (in this case,
> postgres) is -1 (operation not permitted).
> On the other hand, NOTIFY events will only log the event, without
> requesting a verdict (allow or deny).
> In my scenario, the usermode daemon responsible for auditing these events
> is set to ALLOW everything. If I denied the system call I would see in
> the postgres log "operation not permitted (err 1)" instead of "Interrupted
> system call (err 4)".

Does this happen in the default configuration of OSX now? Or is this
something you have manually set up? If manual, why? Isn't forcing a
back-and-forth context switch to that auth daemon, for a syscall as
common as open(2), *terrible* for performance?

> To sum up, the open() is blocked, waiting for a verdict from the
> usermode, meanwhile an interrupt is triggered, msleep on the condition
> variable returns EINTR and this is passed back to postgres as the
> return code of open().
> [1] https://developer.apple.com/documentation/endpointsecurity/es_event_type_t

I don't understand how Apple is expecting this to not cause breakage
left and right? Just about no software has EINTR handling for every
syscall (as SA_RESTART is enough), so this seems like it'll lead to a
long tail of bugs.

> > > Apple has added a new feature in macOS 11.0 to audit security
> > > events. I noticed that the kernel, while waiting on a condition
> > > variable, if it receives an interrupt, will just pass EINTR (error
> > > code 4) back to the usermode program.
> >
> > Does that also happen for close()? Because that can't reasonably be
> > handled by userspace (userspace cannot retry because the fd could now
> > point to something else in a threaded environment).
>
> Good point, however the close event is not supported as AUTH, only as
> NOTIFY. Thus, this cannot happen on close().
> Open() and other file system calls are both AUTH and NOTIFY (you can
> choose which one to enable).
> You can read more about this here[1]

Well, that's at least something.

Greetings,

Andres Freund
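
For readers unfamiliar with what "adding EINTR handling" around open() would look like at a single call site, here is a minimal sketch. It is only an illustration of the general retry pattern discussed in the message, not PostgreSQL code and not a proposed patch; the helper name open_retry_eintr is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * open_retry_eintr: hypothetical helper (not a PostgreSQL function).
 *
 * Calls open(2) and retries for as long as the call fails with EINTR.
 * Any other failure is returned to the caller unchanged, with errno set
 * by open(2).
 */
int
open_retry_eintr(const char *path, int flags, mode_t mode)
{
    int fd;

    do
    {
        fd = open(path, flags, mode);
    } while (fd < 0 && errno == EINTR);

    return fd;
}
```

A caller would use it exactly like open(), e.g. open_retry_eintr(path, O_RDWR, 0600), and keep its existing ENOENT check. As noted in the thread, the same loop should not be applied to close(), because the descriptor may already refer to something else by the time of the retry.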