Discussion: Sanity check requested

From:
"Nick Fankhauser"
Date:

Hi folks-

For some time, we've been running Postgres with the default configuration &
getting adequate performance, but the time has come to tune a bit, so I've
been lurking on this list & gathering notes. Now I'm about ready to make a
change & would appreciate it if a few more experienced folks could comment
on whether I appear to be heading in the right direction-

Here's what I'm planning:

Increase SHMMAX and SHMALL in my kernel to 134217728 (128MB)

Increase shared_buffers to 8192 (64MB)

Increase sort_mem to 16384 (16MB)

Increase effective_cache_size to 65536 (1/2 GB)
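
For reference, that plan maps onto the kernel and postgresql.conf roughly as
follows (a sketch only; paths and units are worth double-checking against the
7.3 docs before applying):

```shell
# Kernel shared-memory limits (Linux 2.4), set at boot or via sysctl:
echo 134217728 > /proc/sys/kernel/shmmax   # 128MB, in bytes
echo 134217728 > /proc/sys/kernel/shmall   # note: SHMALL is counted in
                                           # pages on Linux, so this byte
                                           # value is more than ample

# postgresql.conf (7.3 units: 8KB pages for the first and last, KB for sort_mem):
# shared_buffers = 8192            # 8192 x 8KB = 64MB
# sort_mem = 16384                 # 16MB per sort
# effective_cache_size = 65536     # 65536 x 8KB = 512MB
```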


Here's the environment:

The hardware is a dual-processor Athlon 1.2 GHz box with 1 GB of RAM and the
DB on SCSI RAID drives.

The server runs only PostgreSQL

The database size is about 8GB, with the largest table 2.5 GB, and the two
most commonly queried tables at 1 GB each.

The two most commonly queried tables are usually queried based on a
non-unique indexed varchar field typically 20 chars long. The query is a
"like" on people's names with trailing %, so this often gets pushed to seq
scan or returns several thousand records (as when someone searches on
'Jones%').

Records from the largest table are always accessed via unique index in
groups of 20 or less.

The OS is Debian Linux kernel 2.4.x (recompiled custom kernel for dual
processor support)
The PostgreSQL version is 7.3.2

We typically have about 30 interactive users on the DB, but they're using a
shared connection pool of 16. Our main problem appears to be when one of the
users fires up a large query and creates a log-jam with resources.


My reasoning is that I'll increase shared_buffers based on anecdotal
recommendations I've seen on this list to 64MB. I'll boost the OS SHMMAX to
twice that value to allow adequate room for other shared memory needs, thus
reserving 128MB. Of the remaining memory, 256MB goes to 16 connections *
16MB sort space; if I leave about 128MB for headroom, then 1/2 GB should be
left available for the effective cache size.
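
The arithmetic behind that budget can be checked directly (a sketch of the
reasoning above, all figures in MB):

```shell
TOTAL=1024          # MB of RAM in the box
SHM=128             # SHMMAX reservation (2x shared_buffers)
SORT=$((16 * 16))   # 16 pooled connections x 16MB sort_mem
HEADROOM=128        # slack for the OS and other processes
echo $((TOTAL - SHM - SORT - HEADROOM))   # 512MB left -> effective_cache_size
```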

Any thoughts? Is this a sane plan? Are there other parameters I should
consider changing first?


Thanks!
     -Nick

---------------------------------------------------------------------
Nick Fankhauser

      Phone 1.765.965.7363  Fax 1.765.962.9788
doxpop - Court records at your fingertips - http://www.doxpop.com/




From:
"Shridhar Daithankar"
Date:

On 14 Jul 2003 at 12:51, Nick Fankhauser wrote:
> Any thoughts? Is this a sane plan? Are there other parameters I should
> consider changing first?

Well, everything seems to be in order and there's not much to suggest, I
guess. But still:

1. 30 users does not seem to be much of an overhead. If possible, try doing
away with connection pooling. But a benchmark test run is highly recommended.

2. While increasing sort memory, try 4/8/16 in that order. That way you will
get a better picture of load behaviour. Though whatever you put appears
reasonable, having more data always helps.

3. I don't know how this plays out on SCSI drives, but what file system are
you using? Can you try different ones, like reiserfs/ext3 and XFS? See what
fits the bill.

4. OK, this may be too much, but Linux kernel 2.6 is in testing and has a
vastly improved IO scheduler. Maybe you should look at it if you are up for
experimentation.

HTH

Bye
 Shridhar

--
You're dead, Jim.        -- McCoy, "The Tholian Web", stardate unknown


From:
"Nick Fankhauser"
Date:

Shridhar-

I appreciate your thoughts- I'll be running some before & after tests on
this using one of our development/hot-swap boxes, so I'll report the results
back to the list.

A few more thoughts/questions:

> 1. 30 users does not seem to be much of an overhead. If possible
> try doing away with connection pooling.

The application needs to scale up gracefully. We actually have about 200
users that could decide to log on at the same time- 30 is just a typical
load. We'd also prefer to have 20,000 subscribers so we can start making a
living with this business <g>.

> 2. While increasing sort memory, try 4/8/16 in that order. That
> way you will get a better picture of load behaviour. Though whatever
> you put appears reasonable, having more data always helps.

I'll try that approach while testing. Is it the case that the sort memory is
allocated for each connection and becomes unavailable to other processes
while the connection exists? If so, since I'm using a connection pool, I
should be able to control total usage precisely. Without a connection pool,
I could start starving the rest of the system for resources if the number of
users spiked unexpectedly. Correct?



> 3. I don't know how this plays out on SCSI drives, but what file
> system are you using? Can you try different ones?

> 4. OK, this is too much but linux kernel 2.6 is in test and has
> vastly improved IO...

I'm using ext2. For now, I'll leave this and the OS version alone. If I
change too many variables, I won't be able to discern which one is causing a
change. Although I understand that there's an element of art to tuning, I'm
enough of a neophyte that I don't have a "feeling" for the tuning parameters
yet and hence I have to take a scientific approach of just tweaking a few
variables in an otherwise controlled and unchanged environment. If I can't
reach my goals with the simple approach, I'll consider some of the more
radical ideas.

Again, thanks for the ideas- I'll feed the results back after I've done some
tests

-Nick






From:
Josh Berkus
Date:

Nick,

> I'll try that approach while testing. Is it the case that the sort memory
> is allocated for each connection and becomes unavailable to other processes
> while the connection exists? If so, since I'm using a connection pool, I
> should be able to control total usage precisely. Without a connection pool,
> I could start starving the rest of the system for resources if the number
> of users spiked unexpectedly. Correct?

Wrong, actually.   Sort memory is allocated *per sort*, not per connection or
per query.   So a single complex query could easily use 4xsort_mem if it has
several merge joins ... and a pooled connection could use many times sort_mem
depending on activity.  Thus connection pooling does not help you with
sort_mem usage at all, unless your pooling mechanism can control the rate at
which queries are fed to the planner.
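
A rough worst case implied by Josh's point, with the figures from earlier in
the thread (the "4 sorts per query" is an assumption, per his merge-join
example):

```shell
CONNECTIONS=16   # size of the shared connection pool
SORTS=4          # e.g. several merge joins in one complex query
SORT_MEM=16      # sort_mem in MB
echo $((CONNECTIONS * SORTS * SORT_MEM))   # 1024MB - the whole 1GB of RAM
```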

--
Josh Berkus
Aglio Database Solutions
San Francisco

From:
"Nick Fankhauser"
Date:

> Wrong, actually.   Sort memory is allocated *per sort*, not per
> connection or per query.   So a single complex query could easily use
> 4xsort_mem if it has several merge joins ...

Thanks for the correction- it sounds like this is one where usage can't be
precisely controlled in a dynamic user environment & I just need to get a
feel for what works under a load that approximates my production system.

-Nick


From:
Joe Conway
Date:

Nick Fankhauser wrote:
> Thanks for the correction- it sounds like this is one where usage
> can't be precisely controlled in a dynamic user environment & I just
> need to get a feel for what works under a load that approximates my
> production system.
>

I think the most important point here is that if you set sort_mem too
high, and you have a lot of simultaneous sorts, you can drive the server
into swapping, which obviously is a very bad thing. You want it set as
high as possible, but not so high given your usage patterns that you
wind up swapping.

Joe



From:
"Shridhar Daithankar"
Date:

On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
> I'm using ext2. For now, I'll leave this and the OS version alone. If I

I appreciate your approach, but it's almost proven that ext2 is not the best
or fastest out there.

IMO, you can safely change that to reiserfs or XFS. Of course, testing is
always recommended.

HTH

Bye
 Shridhar

--
Newton's Little-Known Seventh Law:    A bird in the hand is safer than one
overhead.


From:
Martin Foster
Date:

I have two tables in the database which are used almost every time
someone makes use of the webpages themselves.   The first is a sort
of database-side parameter list which stores parameters from session to
session.  The other is a table that handles the posting activity
of all the rooms and chatters.

The first is required in all authentication with the system, and when
entries are missing you are challenged by the system to prove your
identity.   This table is based on a randomized order, in that the unique
number changes pseudo-randomly, and this table sees a reduction in
entries every hour on the hour so as to keep its information fresh and
manageable.

The other table follows a sequential order and carries more columns of
information.   However, this table clears its entries nightly, and with
current settings will delete roughly a day's traffic sitting at 50K rows
of information.

The difference is as follows:   Without the use of vacuum every
hour, the parameter table performs very well, showing no loss in service
or degradation.    Since people authenticate more than they post, it is safe
to assume that it removes more rows daily than the posting table.

The posting table often drags the system down in performance when a day
has been skipped, which includes the use of VACUUM ANALYZE EXPLAIN.
This seems to be an indication that the process of a daily delete is
actually a very wise step to take, even if the information itself is not
needed for very long.

A VACUUM FULL will correct the issue, but put the site out of commission
for roughly 20 minutes as the drive crunches the information.

My question is, should the purging of rows be done more often than once
a day for both tables?   Is this why performance seems to take a hit
specifically?  Were there too many rows purged for vacuum to
accurately keep track of?

    Martin Foster
    Creator/Designer Ethereal Realms
    



From:
Ang Chin Han
Date:

Shridhar Daithankar wrote:
> On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
>
>>I'm using ext2. For now, I'll leave this and the OS version alone. If I
>
>
> I appreciate your approach, but it's almost proven that ext2 is not the best
> or fastest out there.

Agreed.

> IMO, you can safely change that to reiserfs or XFS. Of course, testing is
> always recommended.

We've been using ext3fs for our production systems. (Red Hat Advanced
Server 2.1)

And since your (Nick) system is based on Debian, I have done some rough
testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs,
reiserfs and jfs. Can't get XFS going easily on Debian, though.

I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after
the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.

jfs has been underperforming for me. Somehow the CPU usage is higher
than the other two. As for ext3fs and reiserfs, I can't detect any
significant difference. So if you're in a hurry, it'll be easier to
convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
be nice if you could do your own testing, and post it to the list.

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
GNU/Linux
   2:30pm  up 204 days,  5:35,  5 users,  load average: 5.50, 5.18, 5.13

From:
Paul Thomas
Date:

On 18/07/2003 07:55 Martin Foster wrote:
> [snip]
> A VACUUM FULL will correct the issue, but put the site out of commission
> for roughly 20 minutes as the drive crunches the information.
>
> My question is, should the purging of rows be done more often then once
> a day for both tables.   Is this why performance seems to take a hit
> specifically?  As there were too many rows purged for vacuum to
> accurately keep track of?

ISTR that there are settings in postgresql.conf which affect how many
tables/rows vacuum can reclaim. The docs say that the default setting of
max_fsm_pages is 10000. Maybe this should be increased for your situation?
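
For illustration, raising the FSM settings in postgresql.conf might look like
this (the numbers are placeholders, not a recommendation; size max_fsm_pages
from how many pages actually turn over between vacuums):

```shell
# postgresql.conf - illustrative values only
# max_fsm_relations = 1000    # tables and indexes tracked
# max_fsm_pages = 50000       # default is 10000; should comfortably
#                             # exceed the pages dirtied between vacuums
```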

HTH

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller Business |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

From:
Andrew Sullivan
Date:

On Fri, Jul 18, 2003 at 12:55:12AM -0600, Martin Foster wrote:
> The other table follows a sequential order and carries more columns of
> information.   However, this table clears its entries nightly, and with
> current settings will delete roughly a day's traffic sitting at 50K rows
> of information.

> has been skipped, which includes the use of VACUUM ANALYZE EXPLAIN.
> This seems to be an indication that the process of a daily delete is
> actually a very wise step to take, even if the information itself is not
> needed for very long.
>
> A VACUUM FULL will correct the issue, but put the site out of commission
> for roughly 20 minutes as the drive crunches the information.

During your "clearing period", why not do the deletes in batches and
VACUUM the table periodically?  That will allow you to reclaim the
space gradually, and ensure that you don't end up with a big "bald
spot".  But you probably want to increase your FSM settings.  See the
docs.
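
A minimal sketch of that batching approach, assuming a hypothetical posts
table with a posted timestamp column and a 10K batch size; the script only
prints the statements, which you would pipe into psql:

```shell
#!/bin/sh
# Dry-run sketch: batched deletes with a plain VACUUM between batches,
# so free space is reclaimed gradually instead of in one big hit.
BATCH=10000
TOTAL=50000     # roughly a day's traffic, per the post above
DELETED=0
while [ "$DELETED" -lt "$TOTAL" ]; do
  echo "DELETE FROM posts WHERE id IN (SELECT id FROM posts WHERE posted < now() - interval '1 day' LIMIT $BATCH);"
  echo "VACUUM posts;"
  DELETED=$((DELETED + BATCH))
done
```

Run as, say, `sh purge.sh | psql -d chatdb` from the nightly cron job.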

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<>                              M2P 2A8
                                         +1 416 646 3304 x110


From:
Tom Lane
Date:

Martin Foster <> writes:
> My question is, should the purging of rows be done more often than once
> a day for both tables?   Is this why performance seems to take a hit
> specifically?

Given that the hourly purge seems to work well for you, I'd suggest
trying it on both tables.

Non-FULL vacuum is intended to be run *frequently*, say as often as
you've updated or deleted 10% to 50% of the rows in a table.  Delaying
it until you've had multiple complete turnovers of the table contents
will cost you.

> Were there too many rows purged for vacuum to
> accurately keep track of?

Only possible if you don't have the FSM parameters set high enough.
Infrequent vacuuming means you need more FSM space, btw.

            regards, tom lane

From:
"Nick Fankhauser"
Date:

Thanks for the suggestions in the FS types- especially the Debian oriented
info. I'll start by playing with the memory allocation parameters that I
originally listed (seems like they should provide results in a way that is
unaffected by the disk IO). Then once I have them at optimal values, move on
to trying different file systems.

I assume that as I make changes that affect the disk IO performance, I'll
then need to do some testing to find new values for the IO cost for the
planner- Do you folks have some ballpark numbers to start with for this
based on your experience? I'm departing in three ways from the simple IDE
model that (I presume) the default random page cost of 4 is based on- The
disks are SCSI & RAID and the FS would be different.

At this point, I can't think of any better way to test this than simply
running my local test suite with various values and recording the wall-clock
results. Is there a different approach that might make more sense? (This
means that my results will be skewed to my environment, but I'll post them
anyway.)
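
One way to structure that sweep is to set random_page_cost per session and
time a representative query at each value. A dry-run sketch (the query and
table name are placeholders, not from the thread):

```shell
#!/bin/sh
# Print a timing run for several candidate random_page_cost values;
# pipe the output into psql and compare the EXPLAIN ANALYZE timings.
for RPC in 1.5 2 3 4; do
  echo "SET random_page_cost = $RPC;"
  echo "EXPLAIN ANALYZE SELECT * FROM people WHERE name LIKE 'Jones%';"
done
```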

I'll post results back to the list as I get to it- it might be a slow
process. Since I spend about 18 hours of each day keeping the business
running, I'll have to cut back on sleep & do this in the other 10 hours. <g>

-NF


> Shridhar Daithankar wrote:
> I appreciate your approach, but it's almost proven that ext2 is
> not the best or fastest out there.
>
> Agreed.

> Ang Chin Han wrote:
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough
> testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs,
> reiserfs and jfs. Can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after
> the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.
>
> jfs's has been underperforming for me. Somehow the CPU usage is higher
> than the other two. As for ext3fs and reiserfs, I can't detect any
> significant difference. So if you're in a hurry, it'll be easier to
> convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
> be nice if you could do your own testing, and post it to the list.
>
> --
> Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386
> GNU/Linux
>    2:30pm  up 204 days,  5:35,  5 users,  load average: 5.50, 5.18, 5.13
>


From:
"scott.marlowe"
Date:

On Fri, 18 Jul 2003, Ang Chin Han wrote:

> Shridhar Daithankar wrote:
> > On 17 Jul 2003 at 10:41, Nick Fankhauser wrote:
> >
> >>I'm using ext2. For now, I'll leave this and the OS version alone. If I
> >
> >
> > I appreciate your approach, but it's almost proven that ext2 is not the best
> > or fastest out there.
>
> Agreed.

Huh?  How can journaled file systems hope to outrun a simple unjournaled
file system?  There's just less overhead for ext2, so it's quicker; it's
just not as reliable.

I point you to this link from IBM:

http://www-124.ibm.com/developerworks/opensource/linuxperf/iozone/iozone.php

While ext3 is a clear loser to jfs and rfs, ext2 wins most of the contests
against both reiser and jfs.  Note that xfs wasn't tested here.  But in
general, ext2 is quite fast nowadays.

>
> > IMO, you can safely change that to reiserfs or XFS. Of course, testing is
> > always recommended.
>
> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)
>
> And since your (Nick) system is based on Debian, I have done some rough
> testing on Debian sarge (testing) (with custom 2.4.20) with ext3fs,
> reiserfs and jfs. Can't get XFS going easily on Debian, though.
>
> I used a single partition mkfs'd with ext3fs, reiserfs and jfs one after
> the other on an IDE disk. Ran pgbench and osdb-x0.15-0 on it.
>
> jfs has been underperforming for me. Somehow the CPU usage is higher
> than the other two. As for ext3fs and reiserfs, I can't detect any
> significant difference. So if you're in a hurry, it'll be easier to
> convert your ext2 to ext3 (using tune2fs) and use that. Otherwise, it'd
> be nice if you could do your own testing, and post it to the list.

I would like to see some tests on how they behave on top of large fast
RAID arrays, like a 10 disk RAID5 or something.  It's likely that on a
single IDE drive the most limiting factor is the bandwidth of the drive,
whereas on a large array, the limiting factor would likely be the file
system code.


From:
Vincent van Leeuwen
Date:

On 2003-07-17 10:41:35 -0500, Nick Fankhauser wrote:
> I'm using ext2. For now, I'll leave this and the OS version alone. If I
>

I'd upgrade to a journaling filesystem as soon as possible for reliability.
Testing in our own environment has shown that PostgreSQL performs best on ext3
(yes, better than XFS, JFS or ReiserFS) with a linux 2.4.21 kernel. Be sure to
mount noatime and to create the ext3 partition with the correct stripe size of
your RAID array using the '-R stride=foo' option (see man mke2fs).
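
As a sketch of those two suggestions (the device name and a 64KB RAID chunk
size are assumptions here; stride is the chunk size divided by the
filesystem block size):

```shell
# Illustrative only: 64KB RAID chunks / 4KB blocks -> stride = 16
mke2fs -j -b 4096 -R stride=16 /dev/sda1

# and in /etc/fstab, mount the data partition noatime:
# /dev/sda1   /var/lib/postgres   ext3   noatime   0   2
```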

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

From:
"Oliver Scheit"
Date:

> Be sure to mount noatime

I did "chattr -R +A /var/lib/pgsql/data"
that should do the trick as well, or am I wrong?

regards,
Oli

From:
Vincent van Leeuwen
Date:

On 2003-07-18 18:20:55 +0200, Oliver Scheit wrote:
> > Be sure to mount noatime
>
> I did "chattr -R +A /var/lib/pgsql/data"
> that should do the trick as well, or am I wrong?
>

According to the man page it gives the same effect. There are a few things you
should consider though:
- new files won't be created with the same options (I think), so you'll have
to run this command as a daily cronjob or something to that effect
- chattr is probably more filesystem-specific than a noatime mount, although
this isn't a problem on ext[23] of course

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

From:
"Oliver Scheit"
Date:

>> > Be sure to mount noatime
>>
>> I did "chattr -R +A /var/lib/pgsql/data"
>> that should do the trick as well, or am I wrong?
>>
>
> According to the man page it gives the same effect.
> There are a few things you should consider though:
> - new files won't be created with the same options (I think),
> so you'll have to run this command as a daily cronjob or
> something to that effect

This would be a really interesting point to know.
I will look into this.

I think the advantage of "chattr" is that the last access time
is still available for the rest of the filesystem.
(Of course you could have your own filesystem just for the
database stuff, in which case the advantage would be moot.)

regards,
Oli

From:
"Nick Fankhauser"
Date:

I'm confused:

Ang Chin Han wrote:

> We've been using ext3fs for our production systems. (Red Hat Advanced
> Server 2.1)

Vincent van Leeuwen wrote:

> I'd upgrade to a journaling filesystem as soon as possible for
> reliability.

...About one year ago I considered moving to a journaling file system, but
opted not to because it seems like that's what WAL does for us already. How
does putting a journaling file system under it add more reliability?

I also guessed that a journaling file system would add overhead because now
a write to the WAL file could itself be deferred and logged elsewhere.

...So now I'm really puzzled because folks are weighing in with solid
anecdotal evidence saying that I'll get both better reliability and
performance. Can someone explain what I'm missing about the concept?

-A puzzled Nick


From:
Josh Berkus
Date:

Nick,

> ...About one year ago I considered moving to a journaling file system, but
> opted not to because it seems like that's what WAL does for us already. How
> does putting a journaling file system under it add more reliability?

It lets you restart your server quickly after an unexpected power outage.  Ext2
is notoriously bad about this.

Also, WAL cannot necessarily recover properly if the underlying filesystem is
corrupted.

> I also guessed that a journaling file system would add overhead because now
> a write to the WAL file could itself be deferred and logged elsewhere.

You are correct.


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


From:
Rod Taylor
Date:

> ...About one year ago I considered moving to a journaling file system, but
> opted not to because it seems like that's what WAL does for us already. How
> does putting a journaling file system under it add more reliability?

WAL only works if the WAL files are actually written to disk and can be
read off it again.  Ext2 has a number of deficiencies which can cause
problems with this basic operation (inode corruptions, etc).  Journaling
does not directly help.

From:
Tom Lane
Date:

"Nick Fankhauser" <> writes:
> I'm departing in three ways from the simple IDE
> model that (I presume) the default random page cost of 4 is based on- The
> disks are SCSI & RAID and the FS would be different.

Actually, the default 4 is based on experiments I did quite a while back
on HPUX (with a SCSI disk) and Linux (with an IDE disk, and a different
filesystem).  I didn't see too much difference between 'em.  RAID might
alter the equation, or not.

            regards, tom lane