Обсуждение: 8.3 beta testing suggestions welcome

Поиск

Список

Период

Сортировка

8.3 beta testing suggestions welcome

От

"Kevin Grittner"

Дата:

20 августа 2007 г., 18:07:03

I've been lobbying management here for us to allocate some resources to testing 8.3 once it hits beta. If it is
approved,it might happen on a time frame too short to get much feedback before the tests, so I'm throwing the question
outhere now: what would people like us to bang on?
The box most likely to be used for the testing is a bit old, but still, it is SMP and we would be throwing real-world
trafficat it, so it should be of some value. It has 4 2 GHz Xeon MP CPUs, 6 GB RAM, and a RAID controller with 256 MB
battery-backedRAM cache. The 230 GB database would be sitting on a 407 GB RAID 5 array. In addition to the PostgreSQL
instancethere would be two Java middle tiers running on the box.
One middle tier is for modifying data based on transactions received from 72 source databases; this load is about 1
milliondatabase transactions on a typical work day, with an average of maybe 20 INSERT, UPDATE, and DELETE statements
pertransaction. (We don't typically have many deletes.) The other middle tier uses a login which only has SELECT
rightsto support our web site. We have about 2 million web hits per day generating about 10 million database
transactions. We can play the actual HTTP requests from our log through a bank of renderers to get a real mix of
queriesfrom production.
We're particularly interested in seeing what configuration changes we may have to make to achieve optimal performance
withthe checkpoints and background writer in the new release. When we first went to PostgreSQL our biggest problem was
thatdirty buffers would accumulate in shared memory until a checkpoint, and then overrun the controllers cache. This
wouldcause disk reads to queue up behind the writes, and queries which normally ran in a millisecond or two were timing
outat our renderers' 20 second limit. The problem went away completely when we used a very aggressive background
writerconfiguration, to put the dirty pages in front of the OS file system right away, so that its algorithms and the
controllercache could deal with things before they got out of hand.
We could run some tests with just the read-only web load, if that is useful, or push the update load alone. We could
paceinput. My guess is that the most useful tests would involve letting both run as fast as the machine can handle it
withvarious configurations and see what throughput and timeout counts we get.
Any thoughts or suggestions welcome, particularly about what configurations to try.
-Kevin

Re: 8.3 beta testing suggestions welcome

От

Heikki Linnakangas

Дата:

20 августа 2007 г., 18:18:55

Kevin Grittner wrote:
> I've been lobbying management here for us to allocate some resources to testing 8.3 once it hits beta.  If it is
approved,it might happen on a time frame too short to get much feedback before the tests, so I'm throwing the question
outhere now: what would people like us to bang on?
 

That would be great!

> We're particularly interested in seeing what configuration changes we may have to make to achieve optimal performance
withthe checkpoints and background writer in the new release.  When we first went to PostgreSQL our biggest problem was
thatdirty buffers would accumulate in shared memory until a checkpoint, and then overrun the controllers cache.  This
wouldcause disk reads to queue up behind the writes, and queries which normally ran in a millisecond or two were timing
outat our renderers' 20 second limit.  The problem went away completely when we used a very aggressive background
writerconfiguration, to put the dirty pages in front of the OS file system right away, so that its algorithms and the
controllercache could deal with things before they got out of hand.
 

Yes, the load distributed checkpoints definitely should help with that.
I'd like to see how well it works for you with the default bgwriter
settings.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com

Re: 8.3 beta testing suggestions welcome

От

Gregory Stark

Дата:

20 августа 2007 г., 18:47:35

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

> When we first went to PostgreSQL our biggest problem was that dirty buffers
> would accumulate in shared memory until a checkpoint, and then overrun the
> controllers cache. This would cause disk reads to queue up behind the
> writes, and queries which normally ran in a millisecond or two were timing
> out at our renderers' 20 second limit. The problem went away completely when
> we used a very aggressive background writer configuration, to put the dirty
> pages in front of the OS file system right away, so that its algorithms and
> the controller cache could deal with things before they got out of hand.

Sounds like a tailor-mode use case for precisely what Heikki was complaining
about. He couldn't find a case in 8.3 where tuning the bgwriter to be more
aggressive helped at all.

With the load distributed checkpoints I think the symptoms would be different
but the disease may still be there. Since checkpoints will try not to swamp
your i/o bandwidth any longer you shouldn't get these terrible spikes. 

However the theory with bgwriter is that setting it to be very aggressive will
reduce the response time even outside the checkpoints by avoiding the need for
individual backends to evict dirty pages. So it would be interesting to know
with 8.3 whether the average response time even outside of checkpoints is
reduced by having a more aggressive bgwriter policy.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: 8.3 beta testing suggestions welcome

От

Greg Smith

Дата:

21 августа 2007 г., 04:37:00

On Mon, 20 Aug 2007, Kevin Grittner wrote:

> The problem went away completely when we used a very aggressive 
> background writer configuration, to put the dirty pages in front of the 
> OS file system right away, so that its algorithms and the controller 
> cache could deal with things before they got out of hand.

Can you share what the aggressive BGW settings you settled on were? 
Knowing that information would allow giving better suggestions on how to 
translate what you're currently doing into the very different settings 8.3 
uses in this area.  Also be helpful to know about how often you have a 
checkpoint with your current configuration, and how big your 
shared_buffers is to get a general context for the size/frequency of 
potential checkpoint problems.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: 8.3 beta testing suggestions welcome

От

"Kevin Grittner"

Дата:

21 августа 2007 г., 11:34:44

>>> On Tue, Aug 21, 2007 at  2:36 AM, in message
<Pine.GSO.4.64.0708210326370.2093@westnet.com>, Greg Smith
<gsmith@gregsmith.com> wrote:
> On Mon, 20 Aug 2007, Kevin Grittner wrote:
>
>> The problem went away completely when we used a very aggressive
>> background writer configuration, to put the dirty pages in front of the
>> OS file system right away, so that its algorithms and the controller
>> cache could deal with things before they got out of hand.
>
> Can you share what the aggressive BGW settings you settled on were?
Under 8.2.4, the postgresql.conf file, minus blank lines and comments:
listen_addresses = '*'
max_connections = 200
shared_buffers = 160MB
temp_buffers = 50MB
work_mem = 10MB
maintenance_work_mem = 160MB
max_fsm_pages = 800000
bgwriter_lru_percent = 20.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 10.0
bgwriter_all_maxpages = 600
wal_buffers = 160kB
checkpoint_segments = 10
random_page_cost = 2.0
effective_cache_size = 5GB
redirect_stderr = on
log_line_prefix = '[%m] %p %q<%u %d %r> '
stats_block_level = on
stats_row_level = on
autovacuum = on
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
escape_string_warning = off
standard_conforming_strings = on
sql_inheritance = off
> Also be helpful to know about how often you have a
> checkpoint with your current configuration
Since our problems associated with checkpoints have been eliminated I
haven't been tracking them very closely.  A scan of our log files on this
machine show that there were two episodes in the last two months where we
had 20 to 40 checkpoints in a day which were less than 30 seconds after the
prior one.  These were associated with times when this central copy had
fallen significantly behind the source databases and replication was running
full out, catching up with the sources.
Outside of those events we seem to have on to four episodes on scattered
days.  A quick check shows that all but a few coincide with a weekly run
which passes massive amounts of data to rebuild a table of cases which
should not show on the public web site according to the Supreme Court's
records retention rules.
Is there anything you would like me to do to capture more information
about the checkpoints?
-Kevin

Re: 8.3 beta testing suggestions welcome

От

Greg Smith

Дата:

21 августа 2007 г., 15:14:03

On Tue, 21 Aug 2007, Kevin Grittner wrote:

> shared_buffers = 160MB
> effective_cache_size = 5GB
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 200
> bgwriter_all_percent = 10.0
> bgwriter_all_maxpages = 600
> checkpoint_segments = 10

The other thing I realized you haven't mentioned yet is what operating 
system you're using.

What you've done here is make your shared_buffers smaller than even the 
write cache on your disk controller (256MB), so that anytime a checkpoint 
hits it should be able to cache writing even the entire buffer cache out 
if necessary.  You're letting the OS handle caching everything else, which 
is memory not involved in the checkpoint process and therefore doesn't 
impact your problem situation.

With checkpoint_segments at 10, you can't do much activity (relative to 
how fast this powerful of a server can dirty pages) before a checkpoint 
happens, which also limits how much any one checkpoint has to clean up. 
And your background writer settings are aggressive enough that the entire 
pool is getting scanned constantly, which is wasting lots of CPU resources 
but you have them to spare.

This is reasonable solution for working around checkpoint issues in 8.2.4, 
but the fact that you're still having checkpoints spaced 30 seconds apart 
sometimes is one weak spot you already have seen.  PostgreSQL operating 
with a tiny shared_buffers cache is another, and you could expect overall 
performance to improve if it were possible to increase that without 
getting killed by checkpoints.

In 8.3, it should be possible for you to increase both shared_buffers and 
checkpoint_segments significantly and still have checkpoints happen in an 
orderly fashion.  There is no all-scan anymore, instead there's a 
parameter called checkpoint_completion_target; see
http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html

My suggestion for a starting 8.3 configuration for you would be adjusting 
these settings as follows:

shared_buffers=1GB
checkpoint_segments = 50

And then try replaying your data with checkpoint_completion_target at 0.5 
(default), 0.7, and 0.9 and see how each of those works out for you (0.7 
is the least useful of those if you only did two).  Hopefully the data 
replay you have can be setup to invoke the same tightly spaced checkpoint 
behavior you commented about.  Based on the results of those tests, it may 
be possible to further increase shared_buffers, and 
checkpoint_segments/checkpoint_timeout may need some adjustment one way or 
another.

Note that some/all of the bgwriter_lru parameters may be going away before 
8.3 is done as well, that's a loose end I'm working on right now.

If your project gets approved, that's what I think would be a useful test 
to run.  That should get some good results for the community as large as 
well as research how upgrading to the new version might positively impact 
your application.  You're actually in one of the situations I'm a little 
concerned about.  All the tests that have been done by people here have 
suggested using checkpoint_completion_target and removing the all scan are 
always net positive compared to the 8.2.4 behavior, but your situation 
(where you're heavily using bgwriter_all_percent = 10.0 and 
bgwriter_all_maxpages = 600) is one where it's possible 8.3 may be a step 
backwards.  Not likely, just possible, and it would be great to get 
another data point on this during the beta.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: 8.3 beta testing suggestions welcome

От

"Kevin Grittner"

Дата:

21 августа 2007 г., 15:27:25

>>> On Tue, Aug 21, 2007 at  1:12 PM, in message
<Pine.GSO.4.64.0708211316210.16511@westnet.com>, Greg Smith
<gsmith@gregsmith.com> wrote:
>
> The other thing I realized you haven't mentioned yet is what operating
> system you're using.
Linux version 2.6.5-7.286-bigsmp (geeko@buildhost) (gcc version 3.3.3 (SuSE Linux)) #1 SMP Thu May 31 10:12:58 UTC 2007
SUSE LINUX Enterprise Server 9 (i586)
VERSION = 9
PATCHLEVEL = 3
I know this is old, but I was happy just to get them off of Windows.  :-/
> My suggestion for a starting 8.3 configuration for you would be adjusting
> these settings as follows:
>
> shared_buffers=1GB
> checkpoint_segments = 50
>
> And then try replaying your data with checkpoint_completion_target at 0.5
> (default), 0.7, and 0.9 and see how each of those works out for you (0.7
> is the least useful of those if you only did two).  Hopefully the data
> replay you have can be setup to invoke the same tightly spaced checkpoint
> behavior you commented about.  Based on the results of those tests, it may
> be possible to further increase shared_buffers, and
> checkpoint_segments/checkpoint_timeout may need some adjustment one way or
> another.
>
> Note that some/all of the bgwriter_lru parameters may be going away before
> 8.3 is done as well, that's a loose end I'm working on right now.
Thanks much.  If I get the approval, I'll be sure to run these three tests.
It probably makes sense to get a baseline with the current config first.
I assume you want me to do these with both the update and query streams
running full out?
Anyone else have a configuration to suggest?
-Kevin

Re: 8.3 beta testing suggestions welcome

От

"Kevin Grittner"

Дата:

30 ноября 2007 г., 19:17:19

>>> On Tue, Aug 21, 2007 at  1:12 PM, in message
<Pine.GSO.4.64.0708211316210.16511@westnet.com>, Greg Smith
<gsmith@gregsmith.com> wrote:
> My suggestion for a starting 8.3 configuration for you would be adjusting
> these settings as follows:
>
> shared_buffers=1GB
> checkpoint_segments = 50

I'm not into the most meaningful tests yet, but I'm trying to
garner what useful information I can from the load of the 8.2.5
database into a server running the cvs tip copy taken a few hours
ago.  After letting the cache settle in, while in the middle of a
COPY of a large table, I see the attached through a couple
checkpoint cycles.  The pattern is pretty consistent with this
over other the other checkpoint cycles I've examined.

Note that the system can sustain over 20000 writes per second
without the disk being a bottleneck.  The current tip behavior is
smoother than 8.2.x without background writer, for sure.  It still
isn't as smooth as 8.2.x with an aggressive backgroud writer, at
least for loading a dump.  Note how it lounges around at as low as
5500 writes per second for prolonged periods with occassional
spikes into the 55000 range, causing I/O wait time, although these
only last a few seconds.  Without more rigorous tests for
comparison, I'd say that this is probably a net loss for this
operation -- I seem to recall seeing a pretty steady 20000 bo under
8.2.x.

More info to follow as testing progresses.

-Kevin

Вложения

vmstat.sample1.out

Re: 8.3 beta testing suggestions welcome

От

Greg Smith

Дата:

30 ноября 2007 г., 20:04:34

On Fri, 30 Nov 2007, Kevin Grittner wrote:

>> checkpoint_segments = 50

Here's how I'm reading your data:

16:43:11 : Checkpoint start
16:44:23 : Checkpoint ends [ 1:22 long]
1:01 passes
16:45:24 : Checkpoint start
16:46:36 : checkpoint ends [1:22 long]

If you're getting a checkpoint every minute or so, the number of 
checkpoint segments you're using is still far too low to really take 
advantage of the new checkpoint spreading features.  You want to make that 
high enough that it's several minutes between them.  It's not unusual to 
set a much higher checkpoint_segments during bulk loading than you'd 
necessarily want for actual production use.

One thing that would make your reports more useful:  in the spots in your 
log where you're labeling [checkpoint starting] etc., if you could include 
the new message spit out by turning on checkpoint_log there it would help. 
That's the one that says how much data was written.

Did you ever write something to save snapshots of pg_stat_bgwriter? 
Those would be interesting to see on the same time scale as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: testing result overview (was: 8.3 beta testing suggestions welcome)

От

"Kevin Grittner"

Дата:

18 января 2008 г., 15:09:06

>>> On Tue, Aug 21, 2007 at  1:12 PM, in message
<Pine.GSO.4.64.0708211316210.16511@westnet.com>, Greg Smith
<gsmith@gregsmith.com> wrote:
> My suggestion for a starting 8.3 configuration for you would be adjusting
> these settings as follows:
>
> shared_buffers=1GB
> checkpoint_segments = 50
>
> And then try replaying your data with checkpoint_completion_target at 0.5
> (default), 0.7, and 0.9 and see how each of those works out for you (0.7
> is the least useful of those if you only did two).  Hopefully the data
> replay you have can be setup to invoke the same tightly spaced checkpoint
> behavior you commented about.
Under 8.2.4, 8.2.5, 8.3beta4 and a recent compile of the CVS tip,
the test machine (4 2 GHz Xeons with 6 GB RAM) was unable to create
the clusters of timeouts which we saw on the production box (8 3GHz
Xeons with 12 GB RAM), in spite of concerted efforts to recreate the
conditions.  It's possible that updates to PostgreSQL or the OS have
made the difference, although I think it's more likely that the
larger OS cache and/or the higher CPU power of the production box
are needed to create the problem on this type of disk array.  We're
working on getting a couple new boxes on line, which will allow us
to test using the current production box, but that's weeks away.
Performance under 8.3 is better in general, and there is a very
noticeable smoothing of the output spikes at the OS level.  I didn't
notice much difference in vmstat output with the different
checkpoint_completion_target settings.
Our testing methoodology involved playing transaction streams from
both our replication engine and our web site, at the same time,
against the test server, through the normal middle tier software we
use in production.  We have timestamps on all of it, which allowed
us to play the streams at various percentages of the production
load.  Based on the timeouts from the web renderers (which cancel
a request if the middle tier hasn't responded within 20 seconds),
the middle tier metrics of actual run time once the request is
pulled from the request queue, and the vmstat output, 8.3 shows
significant performance improvement as long as the fix for the
OUTER JOIN optimizer regression is used (that is, with a recent
checkout from the CVS tip, or beta/RC with Tom's patch applied).
-Kevin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: 8.3 beta testing suggestions welcome

Вложения