Re: some longer, larger pgbench tests with various performance-related patches
From | Greg Smith
Subject | Re: some longer, larger pgbench tests with various performance-related patches
Date |
Msg-id | 4F3161C7.4090006@2ndQuadrant.com
In reply to | some longer, larger pgbench tests with various performance-related patches (Robert Haas <robertmhaas@gmail.com>)
List | pgsql-hackers
On 01/24/2012 03:53 PM, Robert Haas wrote:
> There are two graphs for each branch. The first is a scatter plot of
> latency vs. transaction time. I found that graph hard to understand,
> though; I couldn't really tell what I was looking at. So I made a
> second set of graphs which graph number of completed transactions in a
> given second of the test against time.

Note that you're now reinventing parts of pgbench-tools; the main two graphs it gives are the latency scatter plot and TPS per second. The things you're likely to find interesting next are maximum latency, 90th percentile latency, and a delta for what changed in pg_stat_bgwriter during the test; those are the other things I track in that program. I'm working toward publishing my own tests of the performance patches still considered useful by the end of the week. Murphy's Law has been active on that project since it started, though--the server crashed the day I left on a week-long trip, and I've been sick ever since getting back.

> First, some of these transactions had really long latency. Second,
> there are a remarkable number of seconds all over the test during
> which no transactions at all manage to complete, sometimes several
> seconds in a row.

These periods have in my tests always been associated with Linux becoming aggressive about cleaning out its write cache, either due to an fsync request or simply crossing one of its thresholds for doing so. My current record is an 80 second pause with no transactions completing. One of the things I expect to add to pgbench-tools within the next week is tracking how much dirty memory is accumulating during each test. Seeing that graph overlaid on top of the rest makes a lot of what's happening at any time more obvious. Noting when the checkpoints happen is a bit less interesting, because once the first one happens, they happen almost continuously. You really need to track when the write and sync phases are happening for that to be useful.
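The dirty-memory tracking described above could be sketched roughly like this, assuming Linux's /proc/meminfo format; the function and variable names here are hypothetical illustrations, not actual pgbench-tools code:

```python
import re
import time

def parse_dirty_kb(meminfo_text):
    """Extract the 'Dirty:' value, in kB, from /proc/meminfo contents."""
    match = re.search(r"^Dirty:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Dirty: line found")
    return int(match.group(1))

def sample_dirty_memory(seconds=5, interval=1.0):
    """Sample dirty memory once per interval during a benchmark run;
    returns a list of (elapsed_seconds, dirty_kb) pairs suitable for
    overlaying on a TPS-per-second graph."""
    samples = []
    for t in range(seconds):
        with open("/proc/meminfo") as f:
            samples.append((t, parse_dirty_kb(f.read())))
        time.sleep(interval)
    return samples
```

Graphing those samples alongside the TPS-per-second data would show dirty memory climbing during the fast early phase and the stalls lining up with the kernel flushing it out.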
This circles back to why I proposed exposing those timing bits in pg_stat_bgwriter. pgbench-tools already grabs data from it, which avoids all the mess around log file parsing. If I could do that more often and extract checkpoint timing from that data, it would make labelling graphs like these much easier to do--even from the client that's running the benchmark.

> Third, all of the tests initially start off processing transactions
> very quickly, and get slammed down very hard, probably because the
> very high rate of transaction processing early on causes a checkpoint
> to occur around 200 s.

At the beginning of a write-heavy pgbench run, the rate is high until one of these two things happens:

1) A checkpoint begins
2) Linux accumulates its write cache threshold (typically /proc/sys/vm/dirty_background_ratio) worth of dirty memory

Note that (1) on its own isn't necessarily the problem; it's sometimes the case that it just makes (2) happen much faster. Basically, the first 30 to 150 seconds of any write-heavy test always show an inflated speed. You're writing into the OS cache at maximum speed, and none of those writes are making it to physical disk--except perhaps for the WAL, which is all fast and sequential.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
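To make threshold (2) concrete: the kernel starts background writeback once dirty pages exceed dirty_background_ratio percent of (roughly) available memory. A sketch of that arithmetic, assuming the simple percentage form of the tunable; the function name is illustrative, and the real kernel computes against "dirtyable" memory rather than strict total RAM:

```python
def dirty_background_threshold_bytes(total_mem_bytes, dirty_background_ratio):
    """Approximate dirty-memory level at which Linux background
    writeback kicks in, given the vm.dirty_background_ratio percentage."""
    return total_mem_bytes * dirty_background_ratio // 100

# On a 16 GB machine with a ratio of 10, background writeback starts
# once roughly 1.6 GB of dirty pages have accumulated -- which a fast
# pgbench run writing only to cache can reach in well under a minute.
threshold = dirty_background_threshold_bytes(16 * 1024**3, 10)
```

That is why the inflated early phase lasts only 30 to 150 seconds: the time to fill that buffer at cache speed, after which writes start hitting physical disk.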