Re: some longer, larger pgbench tests with various performance-related patches

From: Greg Smith
Subject: Re: some longer, larger pgbench tests with various performance-related patches
Msg-id: 4F3161C7.4090006@2ndQuadrant.com
In reply to: some longer, larger pgbench tests with various performance-related patches  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 01/24/2012 03:53 PM, Robert Haas wrote:
> There are two graphs for each branch.  The first is a scatter plot of
> latency vs. transaction time.  I found that graph hard to understand,
> though; I couldn't really tell what I was looking at.  So I made a
> second set of graphs which graph number of completed transactions in a
> given second of the test against time.

Note that you're now reinventing parts of pgbench-tools; the main two 
graphs it gives are the latency scatter plot and TPS per second.  The 
things you're likely to find interesting next are maximum latency, 90th 
percentile latency, and a delta for what changed in pg_stat_bgwriter 
during the test; those are the other things I track in that program.
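For example, the 90th percentile can be computed straight from the per-transaction log pgbench writes with its -l option. A minimal sketch, assuming the latency in microseconds is the third whitespace-separated field, as it was in the log format of that era (the field position is an assumption; check your pgbench version):

```python
def latency_percentile(log_lines, pct=90):
    """Nearest-rank percentile of transaction latency from pgbench -l output.

    Assumes each line looks like:
      client_id transaction_no latency_us file_no time_epoch time_us
    with the latency in microseconds as the third field.
    """
    latencies = sorted(int(line.split()[2]) for line in log_lines if line.strip())
    # Nearest-rank method: the value at the pct% position of the sorted list
    idx = max(0, (len(latencies) * pct + 99) // 100 - 1)
    return latencies[idx]

# Usage: latency_percentile(open("pgbench_log.12345"), pct=90)
```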

I'm working toward publishing my own tests of the performance patches 
still considered useful by the end of the week.  Murphy's Law has been 
active on that project since it started though--the server crashed the 
day I left on a week-long trip, and I've been sick ever since getting back.

> First, some of
> these transactions had really long latency.  Second, there are a
> remarkable number of seconds throughout the test during which no
> transactions at all manage to complete, sometimes several seconds in a
> row.

These periods have in my tests always been associated with Linux turning 
aggressive about cleaning out its write cache, either due to an fsync 
request or simply crossing one of its thresholds for doing so.  My 
current record is an 80 second pause with no transactions completing.

One of the things I expect to add to pgbench-tools within the next week is 
tracking how much dirty memory is accumulating during each test.  Seeing 
that graph overlaid on top of the rest makes a lot of what's happening 
at any time more obvious.  Noting when the checkpoints happen is a bit 
less interesting, because once the first one happens, they happen almost 
continuously.
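The dirty-memory number itself is easy to sample on Linux; a hedged sketch (the "Dirty:" line in /proc/meminfo is the source, but the once-per-second polling loop here is just one way to collect it):

```python
import re

def dirty_kb(meminfo_text):
    """Parse the 'Dirty:' line (dirty page cache, in kB) out of /proc/meminfo."""
    m = re.search(r"^Dirty:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) if m else None

# Sampling loop, one (elapsed_seconds, dirty_kB) pair per second:
#
#   import time
#   start = time.time()
#   while True:
#       with open("/proc/meminfo") as f:
#           print(int(time.time() - start), dirty_kb(f.read()))
#       time.sleep(1)
```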

You need to track when the write and sync phases are happening 
for that to be really useful.  This circles back to why I proposed 
exposing those timing bits in pg_stat_bgwriter.  pgbench-tools already 
grabs data from it, which avoids all the mess around log file parsing.  
If I could do that more often and extract checkpoint timing from that 
data, it would make labelling graphs like these much easier to do, even 
from the client that's running the benchmark.
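The delta computation itself is trivial once you have two snapshots; a sketch assuming each snapshot is a dict of pg_stat_bgwriter counter columns (for example buffers_checkpoint and checkpoints_req), fetched by whatever means, e.g. psql:

```python
def bgwriter_delta(before, after):
    """What changed in pg_stat_bgwriter across a test run.

    before/after: dicts mapping counter column name -> value, sampled
    just before the benchmark starts and just after it finishes.
    """
    return {col: after[col] - before[col] for col in before}

# Usage, with counters fetched by any means (psql, a driver, etc.):
#   delta = bgwriter_delta({"buffers_checkpoint": 1000, "checkpoints_req": 3},
#                          {"buffers_checkpoint": 9500, "checkpoints_req": 7})
```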

> Third, all of the tests initially start off
> processing transactions very quickly, and get slammed down very hard,
> probably because the very high rate of transaction processing early on
> causes a checkpoint to occur around 200 s.

At the beginning of a write-heavy pgbench run, rate is high until one of 
these two things happens:

1) A checkpoint begins
2) Linux's write cache threshold (typically 
/proc/sys/vm/dirty_background_ratio) worth of dirty memory accumulates.

Note that (1) on its own isn't necessarily the problem; it's sometimes 
the case that it just makes (2) happen much faster.
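To put a rough number on (2): background writeback kicks in at about dirty_background_ratio percent of total memory. A small sketch of that arithmetic (the kernel actually accounts against dirtyable memory, so treat this as an approximation):

```python
def dirty_background_threshold_bytes(mem_total_kb, dirty_background_ratio):
    """Approximate dirty-memory level, in bytes, at which Linux starts
    background writeback: dirty_background_ratio percent of total RAM."""
    return mem_total_kb * 1024 * dirty_background_ratio // 100

# e.g. a 16 GB box with a ratio of 10:
#   dirty_background_threshold_bytes(16 * 1024 * 1024, 10)
```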

Basically, the first 30 to 150 seconds of any write-heavy test always 
have an inflated speed.  You're writing into the OS cache at maximum 
speed, and none of those writes are making it to physical disk--except 
perhaps for the WAL, which is all fast and sequential.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


