Re: kqueue

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: kqueue
Дата
Msg-id CAEepm=3B5P_-aSAGUUHqtN06UezKW-zXxD+jF3KoGPmg_O6wPQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: kqueue  (Marko Tiikkaja <marko@joh.to>)
Ответы Re: kqueue  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On Wed, Sep 7, 2016 at 12:32 AM, Marko Tiikkaja <marko@joh.to> wrote:
> I've tested and reviewed this, and it looks good to me, other than this
> part:
>
> +   /*
> +    * kevent guarantees that the change list has been processed in the
> EINTR
> +    * case.  Here we are only applying a change list so EINTR counts as
> +    * success.
> +    */
>
> this doesn't seem to be guaranteed on old versions of FreeBSD or any other
> BSD flavors, so I don't think it's a good idea to bake the assumption into
> this code.  Or what do you think?

Thanks for the testing and review!

Hmm.  Well spotted.  I wrote that because the man page from FreeBSD 10.3 says:

  When kevent() call fails with EINTR error, all changes in the changelist
  have been applied.

This sentence is indeed missing from the OpenBSD, NetBSD and OSX man
pages.  It was introduced by FreeBSD commit r280818[1] which made
kevent a Pthread cancellation point.  I investigated whether it is
also true in older FreeBSD and the rest of the BSD family.  I believe
the answer is yes.

1.  That commit doesn't do anything that would change the situation:
it just adds thread cancellation wrapper code to libc and libthr which
exits under certain conditions but otherwise lets EINTR through to the
caller.  So I think the new sentence is documentation of the existing
behaviour of the syscall.

2.  I looked at the code in FreeBSD 4.1[2] (the original kqueue
implementation from which all others derive) and the four modern
OSes[3][4][5][6].  They vary a bit but in all cases, the first place
that can produce EINTR appears to be in kqueue_scan when the
(variously named) kernel sleep routine is invoked, which can return
EINTR or ERESTART  (later translated to EINTR because kevent doesn't
support restarting).  That comes after all changes have been applied.
In fact it's unreachable if nevents is 0: OSX doesn't call kqueue_scan
in that case, and the others return early from kqueue_scan in that
case.

3.  An old email[7] from Jonathan Lemon (creator of kqueue) seems to
support that at least in respect of ancient FreeBSD.  He wrote:
"Technically, an EINTR is returned when a signal interrupts the
process after it goes to sleep (that is, after it calls tsleep).  So
if (as an example) you call kevent() with a zero valued timespec,
you'll never get EINTR, since there's no possibility of it sleeping."

So if I've understood correctly, what I wrote in the v4 patch is
universally true, but it's also moot in this case: kevent cannot fail
with errno == EINTR because nevents == 0.  On that basis, here is a
new version with the comment and special case for EINTR removed.

[1] https://svnweb.freebsd.org/base?view=revision&revision=280818
[2] https://github.com/freebsd/freebsd/blob/release/4.1.0/sys/kern/kern_event.c
[3] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c
[4] https://github.com/IIJ-NetBSD/netbsd-src/blob/master/sys/kern/kern_event.c
[5] https://github.com/openbsd/src/blob/master/sys/kern/kern_event.c
[6] https://github.com/opensource-apple/xnu/blob/master/bsd/kern/kern_event.c
[7] http://marc.info/?l=freebsd-arch&m=98147346707952&w=2

--
Thomas Munro
http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Claudio Freire
Дата:
Сообщение: Re: Tuplesort merge pre-reading
Следующее
От: Peter Geoghegan
Дата:
Сообщение: cost_sort() may need to be updated