Re: WALWriteLock contention

From	Jeff Janes
Subject	Re: WALWriteLock contention
Date	18 May 2015 17:57:33
Msg-id	CAMkU=1w7nwz89FQWhbetDgOctjxOSBvRo0hDg+6mCSmCA4B1iA@mail.gmail.com
In reply to	Re: WALWriteLock contention (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

>> My goal there was to further improve group commit. When running pgbench
>> -j10 -c10, it was common to see fsyncs that alternated between flushing 1
>> transaction and 9 transactions, because the first one to the gate would go
>> through it and slam it on all the others, and it would take one fsync cycle
>> for it to reopen.
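
The alternation described above can be reproduced with a toy model. The sketch below is a hypothetical Python simulation, not PostgreSQL's XLogFlush code: it assumes one gate that flushes every commit record already written at the moment it opens, a fixed fsync duration, and a short client "think time" before each client's next commit record is ready. With ten clients where one arrives slightly ahead of the rest, the batch sizes settle into the 1/9 pattern.

```python
def simulate_group_commit(ready_times, flush_time, think_time, n_flushes):
    """Toy model of WAL group commit behind a single flush "gate".

    ready_times[i] is when client i's first commit record is in WAL buffers.
    Whenever the gate is free, the first waiter takes it and flushes every
    commit record already written at that instant; records that arrive
    during the flush must wait for the next fsync cycle.  After its flush
    completes, a client needs think_time before its next commit record is
    ready.  Returns the number of commits covered by each fsync.
    """
    ready = list(ready_times)
    now = 0.0
    batch_sizes = []
    for _ in range(n_flushes):
        # If nobody is waiting, the gate stays idle until the next record arrives.
        now = max(now, min(ready))
        batch = [c for c, t in enumerate(ready) if t <= now]
        done = now + flush_time
        for c in batch:
            # Acknowledged clients come back with a new commit record later.
            ready[c] = done + think_time
        batch_sizes.append(len(batch))
        now = done
    return batch_sizes


# Ten clients (as with pgbench -c10); client 0 reaches the gate slightly first.
print(simulate_group_commit([0.0] + [0.5] * 9, flush_time=10.0,
                            think_time=1.0, n_flushes=6))
# -> [1, 9, 1, 9, 1, 9]
```

If all ten clients reach the gate at the same instant, every fsync in this model covers all ten commits; the alternation comes entirely from the early client being flushed alone while the others queue behind it.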

> Hmm, yeah. I remember someone (Peter Geoghegan, I think) mentioning
> behavior like that before, but I had not made the connection to this
> issue at that time. This blog post is pretty depressing:
>
> http://oldblog.antirez.com/post/fsync-different-thread-useless.html
>
> It suggests that an fsync in progress blocks out not only other
> fsyncs, but other writes to the same file, which for our purposes is
> just awful. More Googling around reveals that this is apparently
> well-known to Linux kernel developers and that they don't seem excited
> about fixing it. :-(

I think they already did. I don't see the effect with ext4, even on a kernel as old as 2.6.32, using the code from the link above.
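
To check the claim on a given filesystem, one can time plain appends while another thread fsyncs the same file. The following is a minimal Python sketch in the spirit of the test from the linked post, not a copy of it; the function name and parameters are hypothetical. Point `directory` at the filesystem under test (a tmpfs /tmp makes fsync nearly free), and treat only stalls well above a few milliseconds as meaningful, since the interpreter's thread switching adds its own noise.

```python
import os
import tempfile
import threading
import time


def max_write_latency(duration=1.0, chunk=b"x" * 4096,
                      background_fsync=True, directory=None):
    """Return the worst latency, in seconds, of an os.write() append to a
    file while (optionally) another thread keeps fsync-ing the same file."""
    fd, path = tempfile.mkstemp(dir=directory)
    stop = threading.Event()

    def fsync_loop():
        # Flush the file over and over until the writer is done.
        while not stop.is_set():
            os.fsync(fd)

    syncer = threading.Thread(target=fsync_loop, daemon=True)
    if background_fsync:
        syncer.start()
    worst = 0.0
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            start = time.perf_counter()
            os.write(fd, chunk)
            worst = max(worst, time.perf_counter() - start)
            time.sleep(0.001)  # pace the writer so the file stays small
    finally:
        stop.set()
        if background_fsync:
            syncer.join()
        os.close(fd)
        os.unlink(path)
    return worst


if __name__ == "__main__":
    base = max_write_latency(background_fsync=False)
    busy = max_write_latency(background_fsync=True)
    print(f"worst write latency without fsync thread: {base:.6f} s")
    print(f"worst write latency with fsync thread:    {busy:.6f} s")
```

If an fsync in progress really locked out concurrent writes, the worst-case latency with the background fsync thread would approach the duration of a full fsync rather than staying near the no-fsync baseline.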

> <crazy-idea>I wonder if we could write WAL to two different files in
> alternation, so that we could be writing to one file while fsync-ing
> the other.</crazy-idea>
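
One way to picture the idea is a log with two files: appends go to the active file, and a flush flips to the other file before fsync-ing the one it just retired, so appends never wait behind the fsync. The Python sketch below is hypothetical (`AlternatingLog` and `replay` are illustrative names); it assumes a single global sequence number per record so recovery can merge both files, and it serializes flushes so that each fsync covers everything appended since the previous one. A real WAL would also need per-record CRCs to detect torn writes.

```python
import os
import struct
import threading

_HEADER = struct.Struct(">QI")  # (sequence number, payload length)


class AlternatingLog:
    """Hypothetical log that appends to one of two files while the other
    is being fsync'ed; every record carries a global sequence number."""

    def __init__(self, path_a, path_b):
        flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
        self._fds = [os.open(p, flags, 0o644) for p in (path_a, path_b)]
        self._active = 0                      # file currently receiving appends
        self._seq = 0                         # last sequence number handed out
        self._append_lock = threading.Lock()  # guards _active and _seq
        self._sync_lock = threading.Lock()    # at most one flush at a time

    def append(self, payload):
        with self._append_lock:
            self._seq += 1
            os.write(self._fds[self._active],
                     _HEADER.pack(self._seq, len(payload)) + payload)
            return self._seq

    def flush(self):
        """Make every record appended so far durable and return the highest
        sequence number covered. Appends continue into the other file
        while the fsync below runs."""
        with self._sync_lock:
            with self._append_lock:
                retired = self._active
                self._active ^= 1
                covered = self._seq
            # Everything newer than the previous flush went to `retired`;
            # older records were already fsync'ed by that previous flush.
            os.fsync(self._fds[retired])
            return covered

    def close(self):
        for fd in self._fds:
            os.close(fd)


def replay(*paths):
    """Read all log files and return (seq, payload) pairs in global order."""
    records = []
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        pos = 0
        while pos + _HEADER.size <= len(data):
            seq, length = _HEADER.unpack_from(data, pos)
            pos += _HEADER.size
            if pos + length > len(data):
                break  # truncated final record: stop reading this file
            records.append((seq, data[pos:pos + length]))
            pos += length
    return sorted(records)
```

The cost of this design moves to recovery, which has to read both files and merge them by sequence number before replaying anything.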

I thought the most promising thing, once timers and sleeps had resolution much better than a centisecond, was to record the time at which each fsync finished and then sleep until "then + commit_delay". That way you do no harm to the sleeper, since the write head is not positioned to process the next fsync until then anyway, and you give other workers the chance to get their commit records in.
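
The scheduling rule above reduces to computing how long to wait before the next fsync. Below is a hypothetical Python sketch; `flush_delay` and `DelayedFlusher` are illustrative names, and the clock and sleep functions are injectable so the behavior can be checked without real I/O. PostgreSQL's actual `commit_delay` setting is given in microseconds, while the sketch uses seconds.

```python
import time


def flush_delay(now, last_fsync_end, commit_delay):
    """Seconds to wait so that the next fsync starts no earlier than
    commit_delay after the previous fsync finished."""
    if last_fsync_end is None:
        return 0.0
    return max(0.0, last_fsync_end + commit_delay - now)


class DelayedFlusher:
    """Hypothetical wrapper that applies flush_delay() before each fsync."""

    def __init__(self, fsync, commit_delay, clock=time.monotonic, sleep=time.sleep):
        self._fsync = fsync
        self._commit_delay = commit_delay
        self._clock = clock
        self._sleep = sleep
        self._last_end = None  # when the previous fsync returned

    def flush(self):
        delay = flush_delay(self._clock(), self._last_end, self._commit_delay)
        if delay > 0:
            # Other workers get a chance to add commit records while we
            # wait, so one fsync can cover more of them.
            self._sleep(delay)
        self._fsync()
        self._last_end = self._clock()
        return delay
```

Because the wait is measured from when the previous fsync finished rather than from when this backend asked to flush, a backend that arrives late in the cycle sleeps little or not at all.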

But then I kind of lost interest, because anyone who cares very much about commit performance will probably get a nonvolatile write cache, and anything we did would be too hardware/platform-dependent.

Of course a BBU (battery backup unit) isn't magic: the kernel still has to spend time scrubbing the buffer pool and sending the dirty buffers to the disk/controller when it gets an fsync, even if the confirmation does come back quickly. But it still seems too hardware/platform-dependent to yield a general-purpose optimization.

Cheers,

Jeff
