Re: BUG #16827: macOS interrupted syscall leads to a crash

From: Andres Freund
Subject: Re: BUG #16827: macOS interrupted syscall leads to a crash
Msg-id: 20210122173535.5zzud4b6ffk4mym6@alap3.anarazel.de
In reply to: Re: BUG #16827: macOS interrupted syscall leads to a crash  (Ricardo Ungureanu <ricardoungureanu@gmail.com>)
Responses: Re: BUG #16827: macOS interrupted syscall leads to a crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs

Hi,

Tom, all, this seems like a serious problem that is likely to become more
widespread. I don't really know how we can reasonably address this
short-term: adding EINTR handling to all the places that don't yet have
it (and ensuring that it stays that way) seems like a lot to backport.

There's just been another report of this at
https://postgr.es/m/16832-943d33fd58eb26e4%40postgresql.org

On 2021-01-16 19:30:52 +0200, Ricardo Ungureanu wrote:
> On Fri, 15 Jan 2021 at 23:05, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-01-15 14:00:03 +0000, PG Bug reporting form wrote:
> > > I am using macOS 11.0 and trying to import a large dump into postgresql.
> > > Under some circumstances, it crashes while importing.
> > > I inspected the logs and found out a system call is interrupted (" LOG:
> > > could not open file "pg_wal": Interrupted system call"). Apple has added a
> > > new feature in macOS 11.0 to audit security events. I noticed that the
> > > kernel, while waiting on a condition variable, if it receives an interrupt,
> > > will just pass EINTR (error code 4) back to the usermode program. Your
> > > function XLogFileInit does not treat such cases (just ENOENT is checked) and
> > > decides to exit with an abort(). I have attached below the crash file
> > > generated.
> >
> > Hm. It's fairly nasty to return EINTR from open() (except if open()ing a
> > FIFO or such) - it should normally only happen when blocked. But I'm not
> > sure it's *actually* violating any standards / promises made.
> 
> There are two kinds of security events which Apple supports: AUTH and
> NOTIFY. AUTH means that the system call is blocked (on the condition
> variable I mentioned), and the user mode daemon is asked about
> the generated event: "postgres, pid 999, open() on file /path/file
> with flags 0x400003" - something like that. The usermode can either
> allow or deny the event by replying. If it decides to block this
> system call, the return code seen by the target process (in this case,
> postgres) is -1 (operation not permitted).
> On the other hand, NOTIFY events will only log the event, without
> requesting a verdict (allow or deny).
> In my scenario, the usermode daemon responsible for auditing these events
> is set to ALLOW everything. If I denied the system call I would see in
> the postgres log "operation not permitted (err 1)" instead of "Interrupted
> system call (err 4)".

Does this happen in the default configuration of OSX now? Or is this
something you have manually set up? If manual, why?

Isn't forcing a back-and-forth context switch to that auth daemon, for a
syscall as common as open(2), *terrible* for performance?


> To sum up: open() is blocked, waiting for a verdict from the
> usermode. Meanwhile an interrupt is triggered, msleep on the condition
> variable returns EINTR, and this is passed back to postgres as the
> return code of open().

> [1] https://developer.apple.com/documentation/endpointsecurity/es_event_type_t

I don't understand how Apple is expecting this to not cause breakage
left and right? Just about no software has EINTR handling for every
syscall (as SA_RESTART is enough), so this seems like it'll lead to a
long tail of bugs.
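
For reference, the SA_RESTART reliance mentioned above usually amounts to
installing handlers along the lines of the sketch below. The function names
are hypothetical and chosen for this example. According to the report, the
EINTR seen here is returned by open() anyway, so this only shows the
assumption most software makes, not a fix.

```c
/* Request POSIX declarations (struct sigaction) under strict -std modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical no-op handler, for illustration only. */
void
noop_handler(int signo)
{
    (void) signo;
}

/*
 * Hypothetical helper: install handler for signo with SA_RESTART, asking
 * the kernel to restart interruptible syscalls after the handler returns
 * rather than failing them with EINTR.
 * Returns 0 on success, -1 on error (errno set by sigaction()).
 */
int
install_restarting_handler(int signo, void (*handler) (int))
{
    struct sigaction sa;

    assert(handler != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);

    return sigaction(signo, &sa, NULL);
}
```

A call such as install_restarting_handler(SIGUSR1, noop_handler) is all that
most programs do about EINTR for the bulk of their syscalls.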


> > > Apple has added a new feature in macOS 11.0 to audit security
> > > events. I noticed that the kernel, while waiting on a condition
> > > variable, if it receives an interrupt, will just pass EINTR (error
> > > code 4) back to the usermode program.
> >
> > Does that also happen for close()? Because that can't reasonably be
> > handled by userspace (userspace cannot retry because the fd could now
> > point to something else in a threaded environment).
> 
> Good point, however the close event is not supported as AUTH, only as
> NOTIFY. Thus, this cannot happen on close().
> Open() and other file system calls are both AUTH and NOTIFY (you can
> choose which one to enable).
> You can read more about this here[1]

Well, that's at least something.
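
To make the close() concern quoted above concrete: POSIX leaves the state of
the descriptor unspecified when close() fails with EINTR, so a retry loop
like the one used for open() is not safe there. The sketch below simply never
retries. The name close_no_retry is hypothetical, and treating EINTR as "the
descriptor is gone" is an assumption that matches Linux behaviour but is not
guaranteed on every platform.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: close fd exactly once. If close() reports EINTR we
 * do NOT retry, because in a threaded program the descriptor number may
 * already have been reused by another thread's open().
 * Assumption: after EINTR the descriptor has been released.
 * Returns 0 on success or EINTR, -1 on any other error.
 */
int
close_no_retry(int fd)
{
    int rc = close(fd);

    if (rc < 0 && errno == EINTR)
        return 0;               /* assumed closed; retrying could hit a reused fd */

    return rc;
}
```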

Greetings,

Andres Freund


