Re: Postgres, fsync, and OSs (specifically linux)

From: Thomas Munro
Subject: Re: Postgres, fsync, and OSs (specifically linux)
Msg-id: CAEepm=05_NJXxaC59bTd7vq8w9aCim2_61Au5dWUW39Z6+bYPg@mail.gmail.com
In reply to: Re: Postgres, fsync, and OSs (specifically linux)  (Andres Freund <andres@anarazel.de>)
Responses: Re: Postgres, fsync, and OSs (specifically linux)  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Sat, May 19, 2018 at 9:03 AM, Andres Freund <andres@anarazel.de> wrote:
> I've written a patch series for this. Took me quite a bit longer than I
> had hoped.

Great.

> I plan to switch to working on something else for a day or two next
> week, and then polish this further. I'd greatly appreciate comments till
> then.

Took it for a spin on macOS and FreeBSD.  First problem:

+       if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, fsync_fds) < 0)

SOCK_CLOEXEC isn't portable (FreeBSD has it since 10, macOS doesn't,
others who knows).  Setting FD_CLOEXEC in your later fcntl() calls is
probably the way to do it?  I understand from the Linux man pages that
this leaves a race with other threads that fork and exec in between,
but that doesn't apply here.
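Concretely, that means creating the pair without SOCK_CLOEXEC and setting FD_CLOEXEC on each end afterwards. Here is a minimal sketch, assuming a POSIX system. The name socketpair_cloexec() is hypothetical and does not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: behaves like
 *     socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, fds)
 * but avoids SOCK_CLOEXEC (missing on macOS) by setting FD_CLOEXEC
 * with fcntl() afterwards.  Returns 0 on success, -1 with errno set.
 */
static int
socketpair_cloexec(int fds[2])
{
    assert(fds != NULL);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    for (int i = 0; i < 2; i++)
    {
        /*
         * FD_CLOEXEC is a descriptor flag: it is read and written with
         * F_GETFD/F_SETFD, not with F_GETFL/F_SETFL (used for O_NONBLOCK).
         * Between socketpair() and this call another thread could fork
         * and exec and leak the fds; a single-threaded process has no
         * such window.
         */
        int flags = fcntl(fds[i], F_GETFD);

        if (flags < 0 || fcntl(fds[i], F_SETFD, flags | FD_CLOEXEC) < 0)
        {
            int save_errno = errno;     /* close() may clobber errno */

            close(fds[0]);
            close(fds[1]);
            errno = save_errno;
            return -1;
        }
    }
    return 0;
}
```

With a helper like this, the quoted line would become `if (socketpair_cloexec(fsync_fds) < 0)`. Any F_SETFL flags set by the later fcntl() calls are unaffected.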

Next, make check hangs in initdb on both of my pet OSes when md.c
raises an error (a seek fails): we then raise an error while raising an
error and deadlock against ourselves.  Backtrace here:
https://paste.debian.net/1025336/

Apparently the initial error was that mdextend() called _mdnblocks(),
which called FileSeek() on vfd 43 == fd 30, pathname "base/1/2704".
But when I check the process's open file descriptor table in my
operating system, I find that there is no fd 30: there is a 29 and a
31, so fd 30 has already been closed unexpectedly.
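The same check can also be made from inside the process, without an external tool such as lsof or procstat. Probing the descriptor with fcntl(F_GETFD) reveals whether it is closed, because that call fails with EBADF on a closed descriptor and has no side effects. Here is a minimal sketch. fd_is_open() is a hypothetical debugging helper, not part of fd.c. It could be called on the vfd's kernel fd just before FileSeek() to catch the early close.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical debugging helper: true if fd is an open descriptor in
 * this process.  F_GETFD fails with EBADF for a closed descriptor;
 * any other failure is conservatively treated as "open".
 */
static bool
fd_is_open(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```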

I could dig further and/or provide a shell on a system with dev tools.

> I didn't want to do this now, but I think we should also consider
> removing all awareness of segments from the fsync request queue. Instead
> it should deal with individual files, and the segmentation should be
> handled by md.c.  That'll allow us to move all the necessary code to
> smgr.c (or checkpointer?); Thomas said that'd be helpful for further
> work.  I personally think it'd be a lot simpler, because having to have
> long bitmaps with only the last bit set for large append only relations
> isn't a particularly sensible approach imo.  The only thing that that'd
> make more complicated is that the file/database unlink requests get more
> expensive (as they'd likely need to search the whole table), but that
> seems like a sensible tradeoff. Alternatively, a tree structure could
> obviously be used.  Personally I was thinking that we
> should just make the hashtable be over a pathname, that seems most
> generic.

+1

I'll be posting a patch shortly that also needs similar machinery, but
can't easily share it with md.c for technical reasons.  I'd love there
to be just one of those, and for it to be simpler and more general.
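Andres's pathname-keyed idea can be pictured as a small open-addressing hash table. This is a minimal single-process sketch under stated assumptions: fixed capacity, no locking or shared memory, and hypothetical names (remember_fsync(), forget_fsync_prefix(), count_pending()) that do not come from md.c or either patch. Each segment file is its own key, for example "base/1/2704" and "base/1/2704.1", so no per-relation segment bitmap is needed. Dropping a database, however, has to scan every slot, which is the extra unlink cost mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PENDING_SLOTS 64        /* hypothetical capacity, power of two */
#define MAX_PATH_LEN  64

typedef enum { SLOT_EMPTY = 0, SLOT_USED, SLOT_DELETED } SlotState;

typedef struct
{
    SlotState state;
    char      path[MAX_PATH_LEN];
} PendingSlot;

static PendingSlot pending[PENDING_SLOTS];  /* zeroed: all SLOT_EMPTY */

/* 32-bit FNV-1a hash of a NUL-terminated path. */
static uint32_t
hash_path(const char *s)
{
    uint32_t h = 2166136261u;

    for (; *s; s++)
    {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* Queue path for fsync at the next checkpoint; duplicates collapse. */
static bool
remember_fsync(const char *path)
{
    int      free_slot = -1;
    uint32_t start;

    assert(path != NULL);
    if (strlen(path) >= MAX_PATH_LEN)
        return false;

    start = hash_path(path) & (PENDING_SLOTS - 1);
    for (uint32_t i = 0; i < PENDING_SLOTS; i++)
    {
        int idx = (int) ((start + i) & (PENDING_SLOTS - 1));

        if (pending[idx].state == SLOT_USED)
        {
            if (strcmp(pending[idx].path, path) == 0)
                return true;            /* already queued */
        }
        else
        {
            if (free_slot < 0)
                free_slot = idx;        /* first reusable slot seen */
            if (pending[idx].state == SLOT_EMPTY)
                break;                  /* end of probe chain */
        }
    }
    if (free_slot < 0)
        return false;                   /* table full */

    pending[free_slot].state = SLOT_USED;
    strcpy(pending[free_slot].path, path);
    return true;
}

/* Forget every queued path under prefix (e.g. a dropped database's
 * directory).  This is a full-table scan; returns the number removed. */
static int
forget_fsync_prefix(const char *prefix)
{
    size_t n = strlen(prefix);
    int    removed = 0;

    for (int i = 0; i < PENDING_SLOTS; i++)
    {
        if (pending[i].state == SLOT_USED &&
            strncmp(pending[i].path, prefix, n) == 0)
        {
            pending[i].state = SLOT_DELETED;    /* tombstone */
            removed++;
        }
    }
    return removed;
}

static int
count_pending(void)
{
    int n = 0;

    for (int i = 0; i < PENDING_SLOTS; i++)
        if (pending[i].state == SLOT_USED)
            n++;
    return n;
}
```

Tombstones (SLOT_DELETED) keep later entries on the same probe chain reachable after a removal, and remember_fsync() reuses them on insert.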

-- 
Thomas Munro
http://www.enterprisedb.com

