Re: Group Commits Vs WAL Writes

From: Jeff Janes
Subject: Re: Group Commits Vs WAL Writes
Msg-id: CAMkU=1zSyAuCP6cQ=9NjX-3nwW00qPcAz93JUnEDWBRacVWtmg@mail.gmail.com
In response to: Re: Group Commits Vs WAL Writes  (Atri Sharma <atri.jiit@gmail.com>)
Responses: Re: Group Commits Vs WAL Writes  (Atri Sharma <atri.jiit@gmail.com>)
List: pgsql-hackers

On Thu, Jun 27, 2013 at 9:51 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
>
> commit_delay exists to artificially increase the window in which the
> leader backend waits for more group commit followers. At higher client
> counts, that isn't terribly useful because you'll naturally have
> enough clients anyway, but at lower client counts particularly where
> fsyncs have high latency, it can help quite a bit. I mention this
> because clearly commit_delay is intended to trade off latency for
> throughput. Although having said that, when I worked on commit_delay,
> the average and worst-case latencies actually *improved* for the
> workload in question, which consisted of lots of small write
> transactions. Though I wouldn't be surprised if you could produce a
> reasonable case where latency was hurt a bit, but throughput improved.

Throughput and average latency are strictly reciprocal, aren't they?  I think when people talk about improving latency, they must mean something like "improve 95th-percentile latency", not average latency.  Otherwise it doesn't make much sense to me, since they are the same thing.
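
A small sketch of that reciprocity, assuming a closed loop in which a fixed number of clients each submit the next commit as soon as the previous one returns. That assumption, the function name, and the sample numbers are illustrative and not from the thread. Under it, Little's law gives throughput = clients / average latency, so the average cannot move independently of throughput, while the tail can:

```python
def throughput_tps(clients: int, avg_latency_ms: float) -> float:
    """Little's law for a closed loop: commits/second = clients / average latency (s)."""
    return clients / (avg_latency_ms / 1000.0)


# Two hypothetical latency samples (ms) with the same mean but different tails.
steady = [10.0] * 10
spiky = [5.0] * 9 + [55.0]

for name, sample in (("steady", steady), ("spiky", spiky)):
    avg = sum(sample) / len(sample)
    # Same average means same throughput, even though the worst case differs.
    print(name, "avg ms:", avg, "worst ms:", max(sample),
          "tps at 8 clients:", throughput_tps(8, avg))
```

Both samples give the same throughput because their means match; only a percentile or worst-case figure tells them apart.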
 

> Thanks for your reply.
>
> The logic says that latency will be hit when commit_delay is applied,
> but I am really interested in why we get an improvement instead.

There is a spot on the disk to which the current WAL is destined to go.  That spot on the disk is not going to be under the write-head for (say) another 6 milliseconds.

Without commit_delay, I try to commit my record, but find that someone else is already on the lock (and on the fsync as well).  I have to wait for 6 milliseconds before that person gets their commit done and releases the lock, then I can start mine, and have to wait another 8 milliseconds (7500 rpm disk) for the spot to come around again, for a total of 14 milliseconds of latency.

With commit_delay, I get my record in under the nose of the person who is already doing the delay, and they wake up and flush it for me in time to make the 6 millisecond cutoff.  Total 6 milliseconds latency for me. 
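
The arithmetic in the two cases above can be written out as a rough sketch. The function names are hypothetical, and the model is deliberately simplified to the single-disk picture described here (one write spot, one full rotation per missed flush):

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def latency_without_delay(wait_for_holder_ms: float, rpm: float) -> float:
    # Wait for the current lock holder's fsync to finish, then wait a full
    # rotation for the write spot to come around again for our own fsync.
    return wait_for_holder_ms + rotation_ms(rpm)


def latency_with_delay(wait_for_holder_ms: float) -> float:
    # Our record is already in place when the delaying leader wakes up,
    # so it is flushed together with the leader's record.
    return wait_for_holder_ms


print(latency_without_delay(6.0, 7500))  # 14.0
print(latency_with_delay(6.0))           # 6.0
```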

One thing I tried a while ago (before the recent group-commit changes were made) was to record in shared memory when the last fsync finished.  The next time someone needed to fsync, they would sleep until just before the write spot was predicted to be under the write head again (previous_finish + rotation_time - safety_margin, where rotation_time - safety_margin was represented by a single GUC).  It worked pretty well on the system on which I wrote it, but seemed too brittle to be a general solution.
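
A minimal sketch of that sleep calculation, not the original patch. The names `sleep_before_fsync` and `wake_offset_ms` are hypothetical, and returning zero once the predicted moment has passed is an assumption of this sketch rather than something stated above:

```python
def sleep_before_fsync(now_ms: float, previous_finish_ms: float,
                       wake_offset_ms: float) -> float:
    """Milliseconds to sleep so the fsync is issued just before the predicted write spot.

    wake_offset_ms stands for (rotation_time - safety_margin), i.e. the single
    setting described in the text.
    """
    target_ms = previous_finish_ms + wake_offset_ms
    # Assumption: if the predicted moment has already passed, do not sleep at all.
    return max(0.0, target_ms - now_ms)


# Last fsync finished at t=1000 ms; 8 ms rotation minus a 1 ms safety margin.
print(sleep_before_fsync(now_ms=1002.0, previous_finish_ms=1000.0, wake_offset_ms=7.0))  # 5.0
print(sleep_before_fsync(now_ms=1010.0, previous_finish_ms=1000.0, wake_offset_ms=7.0))  # 0.0
```

The brittleness mentioned above shows up in the sketch as well: the prediction is only as good as the assumed rotation time and the assumption that the next write lands on the same spot.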

Another thing I tried was to drop the WALWriteLock after the WAL write finished but before calling fsync.  The theory was that process 1 could write its WAL and then block on the fsync, and then process 2 could also write its WAL and also block directly on the fsync, and the kernel/disk controller would be smart enough to realize that it could merge the two pending fsync requests into one.  This did not work at all, possibly because my disk controller was very cheap and not very smart.

Cheers,

Jeff
