Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
From | Andres Freund |
---|---|
Subject | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date | |
Msg-id | 20140113224453.GE9762@awork2.anarazel.de |
In response to | Re: Linux kernel impact on PostgreSQL performance (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (Jim Nasby <jim@nasby.net>) |
List | pgsql-hackers |
On 2014-01-13 14:19:56 -0800, James Bottomley wrote:
> > Frequently mmap()/madvise()/munmap()ing 8kb chunks has
> > horrible consequences for performance/scalability - very quickly you
> > contend on locks in the kernel.
>
> Is this because of problems in the mmap_sem?

It's been a while since I looked at it, but yes, mmap_sem was part of
it. I also seem to recall the number of IPIs increasing far too much for
it to be practical, but I am not sure anymore.

> > Also, that will mark that page dirty, which isn't what we want in this
> > case.
>
> You mean madvise (page_addr)? It shouldn't ... the state of the dirty
> bit should only be updated by actual writes. Which MADV_ primitive is
> causing the dirty marking, because we might be able to fix it (unless
> there's some weird corner case I don't know about).

Not the madvise() itself, but transplanting the buffer from postgres'
buffers to the mmap() area of the underlying file would, right?

> We also do have a way of transplanting pages: it's called splice. How
> do the semantics of splice differ from what you need?

Hm. I don't really see how splice would allow us to seed the kernel's
pagecache with content *without* marking the page as dirty in the
kernel.
We don't need zero-copy IO here, the important thing is just to fill the
pagecache with content without a) rereading the page from disk b)
marking the page as dirty.

> > One major use case is transplanting a page coming from postgres'
> > buffers into the kernel's buffercache because the latter has a much
> > better chance of properly allocating system resources across independent
> > applications running.
>
> If you want to share pages between the application and the page cache,
> the only known interface is mmap ... perhaps we can discuss how better
> to improve mmap for you?

I think purely using mmap() is pretty unlikely to work out - there are
just too many constraints about when a page is allowed to be written out
(e.g. it's interlocked with postgres' write ahead log). I also think
that for many practical purposes using mmap() would result in an absurd
number of mappings or mapping way too huge areas; e.g. large btree
indexes are usually accessed in a quite fragmented manner.

> > Oh, and the kernel's page-cache management while far from perfect,
> > actually scales much better than postgres'.
>
> Well, then, it sounds like the best way forward would be to get
> postgres to use the kernel page cache more efficiently.

No arguments there, although working on postgres scalability is a good
idea as well ;)

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
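
For readers who want to see the access pattern under discussion, here is a minimal C sketch of mapping, advising and unmapping a file one 8 kB block at a time. It is only an illustration under stated assumptions, not PostgreSQL code: the names `make_test_file`, `touch_blocks_one_by_one` and `BLOCK_SIZE` are hypothetical, the 8 kB size is assumed to match the chunks mentioned in the message, and `MADV_WILLNEED` is just one possible advice value. Every loop iteration changes the process's address space twice (once in mmap(), once in munmap()), which is where the mmap_sem contention mentioned above would be expected to come from.

```c
#define _DEFAULT_SOURCE         /* exposes madvise() and mkstemp() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* hypothetical name; the 8 kB chunk size from the thread */

/* Create an unlinked scratch file of nblocks * BLOCK_SIZE bytes; returns fd or -1. */
int make_test_file(long nblocks)
{
    char path[] = "/tmp/blocks-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* the file disappears once fd is closed */
    if (ftruncate(fd, (off_t) nblocks * BLOCK_SIZE) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Map, advise, read from and unmap each block of fd, one block at a time.
 * Returns the number of blocks visited, or -1 on error.
 */
long touch_blocks_one_by_one(int fd, long nblocks)
{
    long page_size = sysconf(_SC_PAGESIZE);

    assert(nblocks >= 0);
    /* mmap() offsets must be page aligned, so this only works if pages divide 8 kB */
    if (page_size <= 0 || BLOCK_SIZE % page_size != 0)
        return -1;

    for (long i = 0; i < nblocks; i++) {
        off_t offset = (off_t) i * BLOCK_SIZE;
        void *p = mmap(NULL, BLOCK_SIZE, PROT_READ, MAP_SHARED, fd, offset);

        if (p == MAP_FAILED)
            return -1;
        if (madvise(p, BLOCK_SIZE, MADV_WILLNEED) != 0) {
            munmap(p, BLOCK_SIZE);
            return -1;
        }
        /* force a real read of the first byte so the page is actually faulted in */
        (void) *(volatile const unsigned char *) p;
        if (munmap(p, BLOCK_SIZE) != 0)
            return -1;
    }
    return nblocks;
}
```

Calling `touch_blocks_one_by_one(make_test_file(16), 16)` from a small `main()` walks 16 blocks. The mapping is read-only, so nothing here dirties a page; copying a buffer into a writable `MAP_SHARED` mapping instead would be an actual write, which is the dirtying case raised in the reply above.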