Re: Why we are going to have to go DirectIO

From Jeff Janes
Subject Re: Why we are going to have to go DirectIO
Date
Msg-id CAMkU=1wwhJ9aYxwj53bGFsMeC0HnGtWZgA5UAaJoA7_jAdsYqg@mail.gmail.com
In reply to Re: Why we are going to have to go DirectIO  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Why we are going to have to go DirectIO
Re: Why we are going to have to go DirectIO
List pgsql-hackers
On Tue, Dec 3, 2013 at 11:39 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Wed, Dec 4, 2013 at 4:28 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> Can we avoid the Linux kernel problem by simply increasing our shared
>>> buffer size, say up to 80% of memory?
>> It will be swap more easier.
>
> Is that the case? If the system has not enough memory, the kernel
> buffer will be used for other purpose, and the kernel cache will not
> work very well anyway. In my understanding, the problem is, even if
> there's enough memory, the kernel's cache does not work as expected.


Problem is, Postgres relies on a working kernel cache for checkpoints.
Checkpoint logic would have to be heavily reworked to account for an
impaired kernel cache.

I don't think it would need anything more than a sorted checkpoint.  There are patches around for doing those.  I can dig one up again and rebase it to HEAD if anyone cares.  What else would be needed checkpoint-wise?
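For readers unfamiliar with the idea, a sorted checkpoint can be sketched roughly as below. This is a minimal illustration in Python, not code from any of the patches mentioned above; the `DirtyBuffer` type, its fields, and the `write_block` callback are hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, NamedTuple


class DirtyBuffer(NamedTuple):
    """Hypothetical description of one dirty page waiting to be written."""
    tablespace: int
    relation: int
    block: int


def sorted_checkpoint(dirty: Iterable[DirtyBuffer],
                      write_block: Callable[[DirtyBuffer], None]) -> List[DirtyBuffer]:
    """Write dirty buffers in file order instead of buffer-pool order.

    Sorting by (tablespace, relation, block) makes the writes to each file
    arrive in ascending block order, so the checkpoint depends less on the
    kernel cache to reorder them.
    """
    ordered = sorted(dirty)  # NamedTuples compare field by field
    for buf in ordered:
        write_block(buf)
    return ordered


# Usage: collect the write order instead of touching any real file.
written: List[DirtyBuffer] = []
sorted_checkpoint(
    [DirtyBuffer(1, 20, 7), DirtyBuffer(1, 10, 3),
     DirtyBuffer(1, 20, 2), DirtyBuffer(1, 10, 1)],
    written.append,
)
```

The sketch leaves out everything that makes the real problem hard (locking, fsync scheduling, spreading the writes over the checkpoint interval); it only shows the ordering step.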

As far as I can tell, the main problem with large shared_buffers is some poorly characterized locking issues related to either the buffer mapping or the freelist.  And those locking issues seem to trigger even more poorly characterized scheduling issues in the kernel, at least in some kernels.

But note that if we did this, and just cranked up shared_buffers so that it takes up 95% of RAM, our own ring buffer access strategy would be even worse for the case which started this thread than the kernel policy being complained about.  That strategy is only acceptable because it normally sits on top of a substantial cache at the kernel level.
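A toy model can make the ring buffer point concrete. The sketch below assumes a deliberately simplified setup (a FIFO ring sitting on top of an LRU "kernel cache"); it is not how PostgreSQL or Linux actually manage buffers, and all sizes are illustrative.

```python
from collections import OrderedDict, deque


def count_disk_reads(table_blocks: int, passes: int, ring_size: int,
                     kernel_cache_size: int) -> int:
    """Count reads that reach the disk when a table is scanned repeatedly.

    The scan may only use `ring_size` buffers, recycled in FIFO order.
    Ring misses fall through to an LRU cache of `kernel_cache_size` blocks.
    """
    ring = deque(maxlen=ring_size)   # blocks currently held in the ring
    kernel = OrderedDict()           # LRU order: oldest entry first
    disk_reads = 0
    for _ in range(passes):
        for block in range(table_blocks):
            if block in ring:
                continue                     # hit in our own ring
            if block in kernel:
                kernel.move_to_end(block)    # hit in the kernel cache
            else:
                disk_reads += 1              # miss at both levels
                if kernel_cache_size > 0:
                    kernel[block] = True
                    if len(kernel) > kernel_cache_size:
                        kernel.popitem(last=False)  # evict the oldest
            ring.append(block)               # deque drops the oldest block
    return disk_reads


# Two scans of a 1000-block table through a 32-block ring.
with_kernel_cache = count_disk_reads(1000, 2, 32, kernel_cache_size=2000)
without_kernel_cache = count_disk_reads(1000, 2, 32, kernel_cache_size=0)
```

In this model the second scan is served entirely from the lower cache when one exists, and goes back to disk for every block when it does not, because the ring never holds more than the last few blocks of the table.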
 

Really, there's no difference between fixing the I/O problems in the
kernel(s) vs in postgres. The only difference is, in the kernel(s),
everyone profits, and you've got a huge head start.

That assumes the problems the kernel faces are the same as the ones a database faces, which I kind of doubt.  Even if the changes were absolute improvements with no trade-offs, we would need to convince a much larger community of that fact.
 

Communicating more with the kernel (through posix_fadvise, fallocate,
aio, iovec, etc...) would probably be good, but it does expose more
kernel issues. posix_fadvise, for instance, is a double-edged sword
ATM. I do believe, however, that exposing those issues and prompting a
fix is far preferable than silently working around them.

Getting the kernel to improve those things so PostgreSQL can be changed to use them more aggressively seems almost hopeless to me.  PostgreSQL would have to be coded to take advantage of the improved versions, while defending itself from the pre-improved versions.  And my understanding is that different Linux distributions cherry-pick kernel changes back and forth into their code, so looking at the kernel version number without also looking at the distribution doesn't tell us much about whether we have the improved feature or not.  Or am I misinformed about that?
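One alternative to reading version numbers is to probe for a feature at run time, although that only goes so far. The sketch below is an illustration in Python using the standard `os.posix_fadvise` wrapper; the function name `fadvise_willneed_works` and the 8192-byte probe size are choices made for this example.

```python
import os
import tempfile


def fadvise_willneed_works() -> bool:
    """Probe whether posix_fadvise(POSIX_FADV_WILLNEED) is accepted.

    A True result only shows that the call exists and did not fail. It says
    nothing about whether the kernel behind it has the improved behaviour.
    """
    if not hasattr(os, "posix_fadvise"):
        return False                 # platform does not expose the call
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * 8192)        # one page of data to advise on
        f.flush()
        try:
            os.posix_fadvise(f.fileno(), 0, 8192, os.POSIX_FADV_WILLNEED)
        except OSError:
            return False             # call exists but was rejected
    return True


probe_result = fadvise_willneed_works()
```

The limitation is the same one described above: a probe like this can tell us the interface is present, but not which version of the behaviour sits behind it.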

If we can point out to the kernel hackers things that would be absolute improvements, where PostgreSQL and everything else just magically start working better if that improvement makes it in, that is great. But if both systems have to be changed in sync to derive any benefit, how do we coordinate that?

Cheers,

Jeff
