Discussion: Re: [GENERAL] PostgreSQL Performance Tuning

From:
Steve Crawford
Date:

Shohab Abdullah wrote:
>
> Dear,
> We are facing a performance tuning problem while using a PostgreSQL
> database over the network on a Linux OS.
> Our database consists of more than 500 tables, with an average of 10K
> records per table, and an average of 20 users accessing the database
> simultaneously over the network. Each table has indexes, and we are
> querying the database using Hibernate.
> The biggest problem is with insertion, updating and fetching of
> records, i.e. the database performance is very slow. It takes a long
> time to respond in the above scenario.
> Please advise us on tuning the database. I am attaching my
> *postgresql.conf* file for reference, showing our current configuration.

Have you changed _anything_ from the defaults? The defaults are set so
PG will run on as many installations as practical. They are not set for
performance - that is specific to your equipment, your data, and how you
need to handle the data. Assuming the record sizes aren't huge, that's
not a very large data set nor number of users.

Look at these for starters:
http://www.varlena.com/GeneralBits/Tidbits/perf.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

You might try setting the logging parameters to log queries longer than
"x" (where x is in milliseconds - you will have to decide the
appropriate value for "too long") and start looking into those first.
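For instance, the relevant postgresql.conf fragment might look like this (the 500 ms threshold is only an illustrative starting point -- substitute your own definition of "too long"):

```
# postgresql.conf: log every statement that runs longer than 500 ms
# (500 is a placeholder value, not a recommendation)
log_min_duration_statement = 500
```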

Make sure that you are running "analyze" if it is not being run by
autovacuum.
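In psql that's simply the following (the table name below is invented for illustration):

```sql
ANALYZE;          -- refresh planner statistics for every table in the database
ANALYZE mytable;  -- or target a single table ("mytable" is a placeholder name)
```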

Use "EXPLAIN <your query>" to see how the query is being planned - as a
first-pass assume that on any reasonably sized table the words
"sequential scan" means "fix this". Note that you may have to cast
variables in a query to match the variable in an index in order for the
planner to figure out that it can use the index.
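A sketch of what that looks like in practice (table, column, and index names here are invented; pre-8.0 planners were especially strict about datatype matches):

```sql
-- Given: CREATE INDEX orders_customer_idx ON orders (customer_id);
-- where customer_id is an int4 column.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If the literal is resolved as a wider type (e.g. int8), an old planner
-- may fall back to a sequential scan; casting to the column's type
-- lets it match the index:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42::int4;
```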

Read the guidelines then take an educated stab at some settings and see
how they work - other than turning off fsync, there's not much in
postgresql.conf that will put your data at risk.

Cheers,
Steve

From:
Carlos Moreno
Date:

Steve Crawford wrote:
> Have you changed _anything_ from the defaults? The defaults are set so
> PG will run on as many installations as practical. They are not set for
> performance - that is specific to your equipment, your data, and how you
> need to handle the data.
Is this really the sensible thing to do?   I know we should not encourage
the world we're living in even more in the ways of "have the computer
do everything for us so that we don't need to have even a clue about what
we're doing" ...  But wouldn't it make sense for the configure script to
determine the amount of physical memory, and perhaps even run a disk-speed
estimate, to set up defaults that are closer to a performance-optimized
configuration?

Then, perhaps command switches so that you could specify the type of
access you estimate for your system.  Perhaps something like:

./configure --db-size=100GB --write-percentage=20  .... etc.

(the write-percentage switch above indicates that we estimate that 20% of
the DB activity would be writing to disk --- there may be other
switches to indicate the percentage of queries that are transactions,
the percentage of queries that are complex, the percentage that require
index usage, etc. etc. etc.)...  And then, based on that, a better set of
defaults could be set by the configuration script.

Does this make sense?  Or perhaps I'm watching too much science
fiction?

Carlos
--


From:
Tom Lane
Date:

Carlos Moreno <> writes:
> ...  But, wouldn't it make sense that the configure script
> determines the amount of physical memory and perhaps even do a HD
> speed estimate to set up defaults that are closer to a
> performance-optimized
> configuration?

No.  Most copies of Postgres these days are executed on machines very
far away from where the code was built.  It's a little bit safer to
try to tune things at initdb time ... as indeed we already do.  But
the fundamental problem remains that we don't know that much about
how the installation will be used.  For example, the planner
configuration parameters turn out to have not that much to do with the
absolute speed of your drive, and a whole lot to do with the ratio
of the size of your database to the amount of RAM you've got; and the
ultimate size of the DB is one thing initdb certainly can't guess.

Also, there is an extremely good reason why Postgres will never be set
up to try to take over the whole machine by default: most of the
developers run multiple postmasters on their machines.

            regards, tom lane

From:
Carlos Moreno
Date:

Tom Lane wrote:
> Carlos Moreno <> writes:
>
>> ...  But, wouldn't it make sense that the configure script
>> determines the amount of physical memory and perhaps even do a HD
>> speed estimate to set up defaults that are closer to a
>> performance-optimized
>> configuration?
>>
>
> No.  Most copies of Postgres these days are executed on machines very
> far away from where the code was built.  It's a little bit safer to
> try to tune things at initdb time ... as indeed we already do.

D'oh!  Yes, that makes more sense, of course.

>  But
> the fundamental problem remains that we don't know that much about
> how the installation will be used.

Notice that the second part of my suggestion covers this --- have
additional switches to initdb so that the user can tell it about
estimates of how the DB will be used:  estimated size of the DB,
estimated percentage of activity that will involve writing, estimated
percentage of activity that will be transactions, percentage that will
use indexes, percentage of queries that will be complex, etc. etc.

Wouldn't initdb be able to do a better job of coming up with sensible
defaults if it could count on this information?   Of course, all these
parameters would have their own defaults --- the user won't necessarily
know, or have an accurate estimate for, each and every one of them.

> Also, there is an extremely good reason why Postgres will never be set
> up to try to take over the whole machine by default: most of the
> developers run multiple postmasters on their machines.
>
Wouldn't this be covered by the above suggestion??  One of the switches
for the initdb command could allow the user to specify how many instances
will be run  (I assume you're talking about having different instances
listening on different ports for increased concurrency-related benefits?)

Does my suggestion make more sense now?  Or is it still too unrealistic to
make it work properly/safely?

Carlos
--


From:
Michael Stone
Date:

On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
>Notice that the second part of my suggestion covers this --- have
>additional switches to initdb so that the user can tell it about
>estimates of how the DB will be used:  estimated size of the DB,
>estimated percentage of activity that will involve writing, estimated
>percentage of activity that will be transactions, percentage that will
>use indexes, percentage of queries that will be complex, etc. etc.

If the person knows all that, why wouldn't they know to just change the
config parameters?

Mike Stone

From:
Tom Lane
Date:

Carlos Moreno <> writes:
> Tom Lane wrote:
>> But
>> the fundamental problem remains that we don't know that much about
>> how the installation will be used.

> Notice that the second part of my suggestion covers this --- have
> additional switches to initdb

That's been proposed and rejected before, too; the main problem being
that initdb is frequently a layer or two down from the user (eg,
executed by initscripts that can't pass extra arguments through, even
assuming they're being invoked by hand in the first place).

            regards, tom lane

From:
Mark Lewis
Date:

Maybe he's looking for a switch for initdb that would make it
interactive and quiz you about your expected usage -- sort of a magic
auto-configurator wizard doohickey?  I could see that sort of thing being
nice for the casual user or newbie who otherwise would have a horribly
mis-tuned database.  They could instead have only a marginally mis-tuned
database :)

On Fri, 2007-04-27 at 10:30 -0400, Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
> >Notice that the second part of my suggestion covers this --- have
> >additional switches to initdb so that the user can tell it about
> >estimates of how the DB will be used:  estimated size of the DB,
> >estimated percentage of activity that will involve writing, estimated
> >percentage of activity that will be transactions, percentage that will
> >use indexes, percentage of queries that will be complex, etc. etc.
>
> If the person knows all that, why wouldn't they know to just change the
> config parameters?
>
> Mike Stone
>

From:
"H.J. Sanders"
Date:

Hello.


Just my 2 cents, without going into the technical aspects:

setting up PSQL is its weakest point, as we have experienced ourselves;
once it is running, it is great.

I can imagine that a lot of people stop after their first trials, once
they have experienced the trouble and performance of a standard setup.

Each of those is of course a lost user forever.

So anything that could be done to make the setup easier and BETTER would
strongly enhance PSQL.

My 2 cents.

Henk Sanders



-----Original Message-----
From: 
[mailto:]On behalf of Tom Lane
Sent: Friday, April 27, 2007 4:37 PM
To: Carlos Moreno
CC: PostgreSQL Performance
Subject: Re: [PERFORM] Feature Request --- was: PostgreSQL Performance
Tuning


Carlos Moreno <> writes:
> Tom Lane wrote:
>> But
>> the fundamental problem remains that we don't know that much about
>> how the installation will be used.

> Notice that the second part of my suggestion covers this --- have
> additional switches to initdb

That's been proposed and rejected before, too; the main problem being
that initdb is frequently a layer or two down from the user (eg,
executed by initscripts that can't pass extra arguments through, even
assuming they're being invoked by hand in the first place).

            regards, tom lane



From:
Michael Stone
Date:

On Fri, Apr 27, 2007 at 07:36:52AM -0700, Mark Lewis wrote:
>Maybe he's looking for a switch for initdb that would make it
>interactive and quiz you about your expected usage-- sort of a magic
>auto-configurator wizard doohicky?  I could see that sort of thing being
>nice for the casual user or newbie who otherwise would have a horribly
>mis-tuned database.  They could instead have only a marginally mis-tuned
>database :)

However you implement it, anyone who can answer all of those questions
is probably capable of reading and understanding the performance section
in the manual.

It's probably more practical to have a separate script that looks at the
running system (RAM, disks, pg config, db size, indices, stats, etc.)
and makes suggestions -- if someone wants to write such a thing.

Mike Stone

From:
Jim Nasby
Date:

On Apr 27, 2007, at 3:30 PM, Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
>> Notice that the second part of my suggestion covers this --- have
>> additional switches to initdb so that the user can tell it about
>> estimates of how the DB will be used:  estimated size of the DB,
>> estimated percentage of activity that will involve writing, estimated
>> percentage of activity that will be transactions, percentage that will
>> use indexes, percentage of queries that will be complex, etc. etc.
>
> If the person knows all that, why wouldn't they know to just change
> the config parameters?

Because knowing your expected workload is a lot easier for many
people than knowing what every GUC does.

Personally, I think it would be a tremendous start if we just
provided a few sample configs like MySQL does. Or if someone wanted
to get fancy they could stick a web page somewhere that would produce
a postgresql.conf based simply on how much available RAM you had,
since that's one of the biggest performance-hampering issues we run
into (ie: shared_buffers left at the default of 32MB).
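As a sketch of what one such sample config could contain for, say, a dedicated box with 1 GB of RAM (every number below is an illustrative starting point, not a recommendation):

```
# sample-1gb.conf: starting points for a dedicated 1 GB server
shared_buffers = 256MB          # ~25% of RAM instead of the 32MB default
effective_cache_size = 768MB    # rough size of the OS cache PG can count on
work_mem = 8MB                  # per sort/hash; multiply by concurrent users
```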
--
Jim Nasby                                            
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



From:
Dan Harris
Date:

Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
>> Notice that the second part of my suggestion covers this --- have
>> additional
>> switches to initdb
<snip>
> If the person knows all that, why wouldn't they know to just change the
> config parameters?
>

Exactly..  What I think would be much more productive is to use the great amount
of information that PG tracks internally and auto-tune the parameters based on
it.  For instance:

Why does the user need to manually track max_fsm_pages and max_fsm_relations?  I
bet there are many users who have never taken the time to understand what this
means and wonder why performance still stinks after vacuuming their database
(spoken from my own experience).

How about work_mem?  shared_buffers?  column statistics sizes? random_page_cost?

Couldn't some fairly simple regression tests akin to a VACUUM process spot
potential problems?  "Hey, it looks like you need more fsm_relations.. I bumped
that up automatically for you".  Or "These indexes look bloated, shall I
automatically reindex them for you?"

I'm sure there are many more examples that, with some creative thinking, could
be auto-adjusted to match the usage patterns of the database. PG does an
excellent job of exposing the variables to the users, but mostly avoids telling
the user what to do or doing it for them.  Instead, it is up to the user to know
where to look, what to look for, and how to react to things to improve
performance.  This is not all bad, but it assumes that all users are hackers
(which used to be true), and it certainly doesn't help when the average SQL Server
admin tries out Postgres and is then surprised at the things they are now
responsible for managing.  PG is certainly *not* the only database to suffer
from this syndrome, I know..

I like to think of my systems as good employees.  I don't want to have to
micromanage everything they do.  I want to tell them "here's what I want done",
and assuming I made a good hiring choice, they will do it and take some liberty
to adjust parameters where needed to achieve the spirit of the goal, rather than
blindly do something inefficiently because I failed to explain to them the
absolute most efficient way to accomplish the task.

Granted, there are some people who don't like the developers making any
assumptions about their workload.  But this doesn't have to be an either/or
proposition.  I don't think any control needs to be abandoned.  But
self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
the patch" ).  I just don't know if this feeling has resonated well between new
users and long-term developers.  I know it must be grating to have to answer the
same questions over and over and over "have you analyzed?  Did you leave
postgresql.conf at the defaults??".  Seems like a win-win for both sides, IMHO.

In closing, I am not bashing PG!  I love it and swear by it.  These comments are
purely from an advocacy perspective.  I'd love to see the PG user base continue
to grow.

My .02

-Dan



From:
Josh Berkus
Date:

Dan,

> Exactly..  What I think would be much more productive is to use the
> great amount of information that PG tracks internally and auto-tune the
> parameters based on it.  For instance:

*Everyone* wants this.  The problem is that it's very hard code to write
given the number of variables.  I'm working on it but progress is slow,
due to my travel schedule.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

From:
Bill Moran
Date:

In response to Dan Harris <>:

> Michael Stone wrote:
> > On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
> >> Notice that the second part of my suggestion covers this --- have
> >> additional
> >> switches to initdb
> <snip>
> > If the person knows all that, why wouldn't they know to just change the
> > config parameters?
>
> Exactly..  What I think would be much more productive is to use the great amount
> of information that PG tracks internally and auto-tune the parameters based on
> it.  For instance:
>
> Why does the user need to manually track max_fsm_pages and max_fsm_relations?  I
> bet there are many users who have never taken the time to understand what this
> means and wondering why performance still stinks after vacuuming their database
> ( spoken from my own experience )

But there are two distinct routes that can be taken if there's not enough
fsm space: add fsm space or vacuum more frequently.  I don't want the system
to eat up a bunch of memory for fsm entries if my workload indicates that
I can easily vacuum more frequently.

> How about work_mem?  shared_buffers?  column statistics sizes? random_page_cost?

The only one that seems practical (to me) is random_page_cost.  The others are
all configuration options that I (as a DBA) want to be able to decide for
myself.  For example, I have some dedicated PG servers on which I pretty much
max those values out, to let PG know that it can use everything on the
system -- but I also have some shared-use machines with PG, where I carefully
constrain those values so that PG doesn't muscle other daemons out of their
share of the RAM (work_mem is probably the best example).

It would be nice to have some kind of utility that could tell me what
random_page_cost should be, as I've never felt comfortable tweaking it.
Like some utility to run that would say "based on the seek tests I just
ran, you should set random_page_cost to x".  Of course, if such a thing
existed, it could just fill in the value for you.  But I haven't figured
out how to pick a good value for that setting, so I have no idea how to
suggest to have it automatically set.

> Couldn't some fairly simple regression tests akin to a VACUUM process spot
> potential problems?  "Hey, it looks like you need more fsm_relations.. I bumped
> that up automatically for you".  Or "These indexes look bloated, shall I
> automatically reindex them for you?"

A lot of that stuff does happen.  A vacuum verbose will tell you what it
thinks you should do, but I don't _want_ it to do it automatically.  What
if I create huge temporary tables once a week for some sort of analysis that
overload the fsm space?  And if I'm dropping those tables when the analysis
is done, do I want the fsm space constantly adjusting?
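For reference, the tail of a vacuum verbose run already reports the free-space-map numbers such a decision would be based on; the output looks roughly like this (figures invented, exact format varies by version):

```sql
VACUUM VERBOSE;
-- INFO:  free space map contains 33086 pages in 52 relations
-- DETAIL:  A total of 20000 page slots are in use (including overhead).
--          89000 page slots are required to track all free space.
--          Current limits are:  20000 page slots, 1000 relations, ...
```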

Plus, some of it is just impossible.  shared_buffers requires a restart.  Do you
want your DB server spontaneously restarting because it thought more
buffers might be nice?

> I'm sure there are many more examples, that with some creative thinking, could
> be auto-adjusted to match the usage patterns of the database. PG does an
> excellent job of exposing the variables to the users, but mostly avoids telling
> the user what to do or doing it for them.  Instead, it is up to the user to know
> where to look, what to look for, and how to react to things to improve
> performance.  This is not all bad, but it is assuming that all users are hackers
> ( which used to be true ), but certainly doesn't help when the average SQLServer
> admin tries out Postgres and then is surprised at the things they are now
> responsible for managing.  PG is certainly *not* the only database to suffer
> from this syndrome, I know..

I expect the suffering is a result of the fact that databases are non-trivial
pieces of software, and there's no universally simple way to set them up
and make them run well.

> I like to think of my systems as good employees.  I don't want to have to
> micromanage everything they do.  I want to tell them "here's what I want done",
> and assuming I made a good hiring choice, they will do it and take some liberty
> to adjust parameters where needed to achieve the spirit of the goal, rather than
>   blindly do something inefficiently because I failed to explain to them the
> absolute most efficient way to accomplish the task.

That's silly.  No software does that.  You're asking software to behave like
humans.  If that were the case, this would be Isaac Asimov's world, not the
real one.

> Granted, there are some people who don't like the developers making any
> assumptions about their workload.  But this doesn't have to be an either/or
> proposition.  I don't think any control needs to be abandoned.  But
> self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
> the patch" ).  I just don't know if this feeling has resonated well between new
> users and long-term developers.  I know it must be grating to have to answer the
> same questions over and over and over "have you analyzed?  Did you leave
> postgresql.conf at the defaults??".  Seems like a win-win for both sides, IMHO.

Well, it seems like this is happening where it's practical -- autovacuum is
a good example.

Personally, I wouldn't be opposed to more automagic stuff, just as long as
I have the option to disable it.  There are some cases where I still
disable autovac.

> In closing, I am not bashing PG!  I love it and swear by it.  These comments are
> purely from an advocacy perspective.  I'd love to see PG user base continue to grow.

I expect that part of the problem is "who's going to do it?"

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/


Phone: 412-422-3463x4023

From:
Kevin Hunter
Date:

At 10:36a -0400 on 27 Apr 2007, Tom Lane wrote:
> That's been proposed and rejected before, too; the main problem being
> that initdb is frequently a layer or two down from the user (eg,
> executed by initscripts that can't pass extra arguments through, even
> assuming they're being invoked by hand in the first place).

And following after Dan Harris' response . . .

So what's the problem with having some sort of cronjob contrib module
that utilizes the actual and current statistics to make
recommendations?  I don't think it'd be right to simply change the
configuration options as it sees fit (especially as it was pointed
out that many run multiple postmasters or have other uses for the
machines in question), but perhaps it could send a message (email?)
along the lines of "Hey, I'm currently doing this many of X
transactions, against this much of Y data, and working under these
constraints.  You might get better performance (in this area ... ) if
you altered the configuration options like so: ..."
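Much of the raw material for such a report is already in the statistics views; a cron job could start from queries along these lines (the limit and column choices are arbitrary):

```sql
-- Tables sequentially scanned most often: candidates for new indexes
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;
```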

Certainly not for the masters, but perhaps for standard-installation
sorts of deals -- sort of like bringing up the rear . . . just a thought.

Kevin

From:
Ray Stell
Date:

On Fri, Apr 27, 2007 at 02:40:07PM -0400, Kevin Hunter wrote:
> out that many run multiple postmasters or have other uses for the
> machines in question), but perhaps it could send a message (email?)
> along the lines of "Hey, I'm currently doing this many of X
> transactions, against this much of Y data, and working under these
> constraints.  You might get better performance (in this area ... ) if
> you altered the configuration options like so: ..."


Or store the values in the DB for later trend analysis -- witness
Oracle's statspack.

From:
Josh Berkus
Date:

Bill,

> The only one that seems practical (to me) is random_page_cost.  The
> others are all configuration options that I (as a DBA) want to be able
> to decide for myself.

Actually, random_page_cost *should* be a constant "4.0" or "3.5", which
represents the approximate ratio of seek-to-scan speed that has been
relatively constant across 6 years of HDD technology.  The only reason we
make it a configuration variable is that there are defects in our cost model
which cause users to want to tinker with it.

Mind you, that's gotten better in recent versions as well.  Lately I mostly
tinker with effective_cache_size and the various cpu_* cost settings rather
than modifying random_page_cost.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

From:
Dan Harris
Date:

Bill Moran wrote:
> In response to Dan Harris <>:
<snip>
>> Why does the user need to manually track max_fsm_pages and max_fsm_relations?  I
>> bet there are many users who have never taken the time to understand what this
>> means and wondering why performance still stinks after vacuuming their database
>> ( spoken from my own experience )
>
> But there are two distinct routes that can be taken if there's not enough
> fsm space: add fsm space or vacuum more frequently.  I don't want the system
> to eat up a bunch of memory for fsm entries if my workload indicates that
> I can easily vacuum more frequently.

There's no magic bullet here, but heuristics should be able to tell us whether
you can "easily vacuum more frequently".  And again, I said these things would
be *optional*.  Like an item in postgresql.conf:
"i_have_read_the_manual_and_know_what_this_all_means = false  # default false".
If you change it to true, you have all the control you're used to and nothing
will get in your way.

>
>> How about work_mem?  shared_buffers?  column statistics sizes? random_page_cost?
>
> The only one that seems practical (to me) is random_page_cost.  The others are
> all configuration options that I (as a DBA) want to be able to decide for
> myself.  For example, I have some dedicated PG servers that I pretty much
> max those values out at, to let PG know that it can use everything on the
> system -- but I also have some shared use machines with PG, where I carefully
> constrain those values so that PG doesn't muscle other daemons out of their
> share of the RAM (work_mem is probably the best example)
>

Just because you carefully constrain it does not preclude the ability for
program logic to maintain statistics to do what I suggested.

> It would be nice to have some kind of utility that could tell me what
> random_page_cost should be, as I've never felt comfortable tweaking it.
> Like some utility to run that would say "based on the seek tests I just
> ran, you should set random_page_cost to x".  Of course, if such a thing
> existed, it could just fill in the value for you.  But I haven't figured
> out how to pick a good value for that setting, so I have no idea how to
> suggest to have it automatically set.

Me neither, but I thought that if there's a reason it's user-settable, there
must be some demonstrable method for deciding what is best.

>
>> Couldn't some fairly simple regression tests akin to a VACUUM process spot
>> potential problems?  "Hey, it looks like you need more fsm_relations.. I bumped
>> that up automatically for you".  Or "These indexes look bloated, shall I
>> automatically reindex them for you?"
>
> A lot of that stuff does happen.  A vacuum verbose will tell you what it
> thinks you should do, but I don't _want_ it to do it automatically.  What
> if I create huge temporary tables once a week for some sort of analysis that
> overload the fsm space?  And if I'm dropping those tables when the analysis
> is done, do I want the fsm space constantly adjusting?

I understand *you* don't want it done automatically.  But my suspicion is that
there are a lot more newbie pg admins who would rather let the system do
something sensible as a default.  Again, you sound defensive that somehow my
ideas would take power away from you.  I'm not sure why that is, but certainly
I'm not suggesting that.  An auto-pilot mode is not a bad idea just because a
few pilots don't want to use it.

>
> Plus, some is just impossible.  shared_buffers requires a restart.  Do you
> want your DB server spontaneously restarting because it thought more
> buffers might be nice?

Well, maybe look at the bigger picture and see if it can be fixed to *not*
require a server restart?  Or have it take effect on the next backend that
gets created?  This is a current limitation, but it doesn't need to be one
for eternity, does it?

>
>> I'm sure there are many more examples, that with some creative thinking, could
>> be auto-adjusted to match the usage patterns of the database. PG does an
>> excellent job of exposing the variables to the users, but mostly avoids telling
>> the user what to do or doing it for them.  Instead, it is up to the user to know
>> where to look, what to look for, and how to react to things to improve
>> performance.  This is not all bad, but it is assuming that all users are hackers
>> ( which used to be true ), but certainly doesn't help when the average SQLServer
>> admin tries out Postgres and then is surprised at the things they are now
>> responsible for managing.  PG is certainly *not* the only database to suffer
>> from this syndrome, I know..
>
> I expect the suffering is a result of the fact that databases are non-trivial
> pieces of software, and there's no universally simple way to set them up
> and make them run well.

Speaking as a former SQL Server admin (from day 1 of the Sybase fork up to
version 2000), I can say there *is* a way to make them simple.  It's certainly
not a perfect piece of software, but the learning curve speaks for itself.  It
can auto-shrink your databases (without locking problems).  Actually, it pretty
much runs itself.  It auto-allocates RAM for you (up to the ceiling *you*
control).  It automatically re-analyzes itself.  I was able to successfully
manage several servers with not-insignificant amounts of data in them for many
years without being a trained DBA.  After switching to PG, I found myself having
to twiddle with all sorts of settings that it seemed like it should just know
about without me having to tell it.

I'm not saying it was simple to make it do that.  MS has invested LOTS of money
and effort into making it that way.  I don't expect PG to have features like
that tomorrow or even next release.  But, I feel it's important to make sure
that those who *can* realistically take steps in that direction understand this
point of view ( and with Josh's other reply to this, I think many do ).

>
>> I like to think of my systems as good employees.  I don't want to have to
>> micromanage everything they do.  I want to tell them "here's what I want done",
>> and assuming I made a good hiring choice, they will do it and take some liberty
>> to adjust parameters where needed to achieve the spirit of the goal, rather than
>>   blindly do something inefficiently because I failed to explain to them the
>> absolute most efficient way to accomplish the task.
>
> That's silly.  No software does that.  You're asking software to behave like
> humans.  If that were the case, this would be Isaac Asimov's world, not the
> real one.

It's not silly.  There are plenty of systems that do that.  Maybe you just
haven't used them.  Again, SQL Server did a lot of those things for me.  I
didn't have to fiddle with checkboxes or multi-select tuning options.  It
learned what its load was and reacted appropriately.  I never had to stare at
planner outputs and try to figure out why the heck it chose that plan.
Although, I certainly could have if I wanted to.  It has a tool called the SQL
Profiler which will "watch" your workload on the database, do regression testing
and suggest (and optionally implement with a single click) indexes on your
tables.  I've been wanting to do this for years with PG, and actually had a
small start on a project to do just that.

>
>> Granted, there are some people who don't like the developers making any
>> assumptions about their workload.  But this doesn't have to be an either/or
>> proposition.  I don't think any control needs to be abandoned.  But
>> self-adjusting defaults seem like an achievable goal ( I know, I know, "show us
>> the patch" ).  I just don't know if this feeling has resonated well between new
>> users and long-term developers.  I know it must be grating to have to answer the
>> same questions over and over and over "have you analyzed?  Did you leave
>> postgresql.conf at the defaults??".  Seems like a win-win for both sides, IMHO.
>
> Well, it seems like this is happening where it's practical -- autovacuum is
> a good example.

Agreed, this is a huge step forward.  And again, I'm not taking an offensive
posture on this.  Just that I think it's worth giving my .02 since I have had
strong feelings about this for a while.

>
> Personally, I wouldn't be opposed to more automagic stuff, just as long as
> I have the option to disable it.  There are some cases where I still
> disable autovac.
>
>> In closing, I am not bashing PG!  I love it and swear by it.  These comments are
>> purely from an advocacy perspective.  I'd love to see PG user base continue to grow.
>
> I expect that part of the problem is "who's going to do it?"
>

Yes, this is the classic problem.  I'm not demanding anyone pick up the ball and
jump on this today, tomorrow, etc.. I just think it would be good for those who
*could* make a difference to keep those goals in mind when they continue.  If
you have the right mindset, this problem will fix itself over time.

-Dan

From:
Josh Berkus
Date:

Dan,

> Yes, this is the classic problem.  I'm not demanding anyone pick up the
> ball and jump on this today, tomorrow, etc.. I just think it would be
> good for those who *could* make a difference to keep those goals in mind
> when they continue.  If you have the right mindset, this problem will
> fix itself over time.

Don't I wish.  Autotuning is *hard*.  It took Oracle 6 years.  It took
Microsoft 3-4 years, and theirs still has major issues last I checked. And
both of those DBs support fewer OSes than we do.  I think it's going to
take more than the *right mindset* and my spare time.

> I appreciate your efforts in this regard.  Do you have a formal project
> plan for this?  If you can share it with me, I'll take a look and see if
> there is anything I can do to help out.

Nope, just some noodling around on the configurator:
www.pgfoundry.org/projects/configurator

> I am on the verge of starting a Java UI that will query a bunch of the
> pg_* tables and give the user information about wasted table space,
> index usage, table scans, slow-running queries and spoon-feed it in a
> nice attractive interface that can be a real-time system monitor tool.
> This could be a cooperative project or might have some redundancy with
> what you're up to.

I'd be *very* interested in collaborating with you on this.  Further, we
could feed DTrace (& systemtap?) into the interface to get data that
PostgreSQL doesn't currently produce.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

From:
david@lang.hm
Date:

On Fri, 27 Apr 2007, Josh Berkus wrote:

> Dan,
>
>> Yes, this is the classic problem.  I'm not demanding anyone pick up the
>> ball and jump on this today, tomorrow, etc.. I just think it would be
>> good for those who *could* make a difference to keep those goals in mind
>> when they continue.  If you have the right mindset, this problem will
>> fix itself over time.
>
> Don't I wish.  Autotuning is *hard*.  It took Oracle 6 years.  It took
> Microsoft 3-4 years, and theirs still has major issues last I checked. And
> both of those DBs support fewer OSes than we do.  I think it's going to
> take more than the *right mindset* and my spare time.

I think there are a couple different things here.

1. full autotuning

   as you say, this is very hard and needs a lot of info about your
particular database usage.

2. getting defaults that are closer to right than the current ones.

   this is much easier. for this nobody is expecting that the values are
right, we're just begging for some tool to get us within a couple orders
of magnitude of what's correct.

the current defaults are appropriate for a single CPU with 10's of MB of
RAM and a single drive

nowadays you have people trying to run quick-and-dirty tests on some spare
hardware they have lying around (waiting for another project) that's got
4-8 CPUs with 10's of GB of RAM and a couple dozen drives

these people don't know about database tuning; they can learn, but they
want to see if postgres is even in the ballpark. if the results are close
to acceptable they will ask questions and research the tuning, but if the
results are orders of magnitude lower than they need to be they'll just
say that postgres is too slow and try another database.

an autodefault script that was written assuming that postgres has the box
to itself would be a wonderful start.

I think the next step would be to be able to tell the script 'only plan on
using 1/2 of this box'

and beyond that would be the steps that you are thinking of where the
usage pattern is considered.

but when every performance question is answered with "did you change the
defaults? they are way too low for modern hardware, raise them by 2 orders
of magnitude and then we'll start investigating", it's clear the defaults
themselves are the problem.
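As a concrete (and purely hypothetical) sketch of the "autodefault script" idea: read total RAM, apply a few rule-of-thumb ratios, and print candidate settings. The ratios below are illustrative placeholders of mine, not official recommendations, and the `fraction` knob is the "only plan on using 1/2 of this box" idea.

```python
"""Sketch of an "autodefault" script: assume PostgreSQL owns the whole box
(or a given fraction of it) and emit ballpark-level settings.  The ratios
are invented rules of thumb, for illustration only."""

MB = 1024 * 1024

def total_ram_bytes():
    # Linux-specific for brevity; /proc/meminfo reports sizes in kB.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in /proc/meminfo")

def suggest(ram_bytes, fraction=1.0):
    # fraction implements "only plan on using 1/2 of this box"
    ram = int(ram_bytes * fraction)
    return {
        "shared_buffers": "%dMB" % (ram // 4 // MB),            # ~25% of RAM
        "effective_cache_size": "%dMB" % (ram * 3 // 4 // MB),  # ~75% of RAM
        "work_mem": "%dMB" % max(1, ram // 100 // MB),
    }
```

On an 8 GB box told to use half of it, this suggests shared_buffers = 1024MB and so on: still a guess, but within the couple orders of magnitude David is begging for.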

David Lang

From:
"Harald Armin Massa"
Date:

Carlos,

about your feature proposal: as I learned, nearly all performance configuration can be done by editing the .INI file and making the postmaster re-read it.

So, WHY at all should those parameters be guessed at the installation of the database? Wouldn't it be a safer point in time to have some postgresql-tune
utility, which gets run after the installation and maybe every once in a while? That tool can check vital information like the "database size to memory relation" and suggest a new postgresql.ini.

That tool needs NO INTEGRATION whatsoever - it can be developed and deployed totally independently, and only later be bundled.
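To illustrate how little integration such a postgresql-tune pass would need, here is a hedged sketch: everything below is plain client-side logic, and only the commented query would ever touch the server. The thresholds and advice strings are invented for illustration.

```python
"""Standalone sketch of a tune-after-installation check: compare database
size to available RAM and suggest a direction.  Thresholds and wording are
invented placeholders."""

def advise(db_size_bytes, ram_bytes):
    ratio = db_size_bytes / float(ram_bytes)
    if ratio < 0.5:
        return "fits in RAM: raise effective_cache_size toward total RAM"
    if ratio < 2.0:
        return "working set may fit: size shared_buffers to ~25% of RAM"
    return "much larger than RAM: I/O tuning will matter more than cache"

# In the real tool, db_size_bytes would come from the live server, e.g.
#   SELECT pg_database_size(current_database());
# and the output would be a suggested postgresql.conf, not one sentence.
```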

Does my suggestion make more sense now?  Or is it still too unrealistic to
make it work properly/safely?

And as this tool can be tested separately and does not need a new initdb every time ... it can be developed more easily.

Maybe there is even a pointy flashy version possible (perhaps even for money :) which gives nice graphics and "optimized", like those Windows Optimizers. :)  I am sure, some DBAs in BIGCOMPs would be thrilled :)

May that be a possible way?

Harald


--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
From:
"Jonah H. Harris"
Date:

On 4/28/07, Harald Armin Massa <> wrote:
> about your feature proposal: as I learned, nearly all performance
> configuration can be done by editing the .INI file and making the
> postmaster re-read it.

Um, shared_buffers is one of the most important initial parameters to
set and it most certainly cannot be set after startup.

> So, WHY at all should those parameters be guessed at the installation of the
> database?

Because a lot of good assumptions can be made on the initial install.
Likewise, some of the most important parameters cannot be tuned after
startup.

> Maybe there is even a pointy flashy version possible (perhaps even for money
> :) which gives nice graphics and "optimized", like those Windows Optimizers.
> :)  I am sure, some DBAs in BIGCOMPs would be thrilled :)

I'd suggest that you not make snide remarks about someone else's
design when your own analysis is somewhat flawed.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation            | fax: 732.331.1301
33 Wood Ave S, 3rd Floor            | 
Iselin, New Jersey 08830            | http://www.enterprisedb.com/

From:
"Harald Armin Massa"
Date:

Jonah,

> Um, shared_buffers is one of the most important initial parameters to
> set and it most certainly cannot be set after startup.

Not after startup, correct. But after installation. It is possible to change postgresql.conf (not .ini, too much Windows on my side, sorry) and restart the postmaster.

> Because a lot of good assumptions can be made on the initial install.
> Likewise, some of the most important parameters cannot be tuned after
> startup.

Yes. These assumptions can be made - but then they are assumptions. When the database is filled and working, there are measurable facts. And yes, that needs a restart of the postmaster, which does not work for 24/7 operation. But there are many databases which can be restarted for tuning in regular maintenance sessions.

> :) which gives nice graphics and "optimized", like those Windows Optimizers.
> :)  I am sure, some DBAs in BIGCOMPs would be thrilled :)

> I'd suggest that you not make snide remarks about someone else's
> design when your own analysis is somewhat flawed.

Sorry, Jonah, if my words sounded "snide". I had feedback from some DBAs in BIGCOMPs who said very positive things about the beauty of pgadmin. I saw some DBAs quite happy about the graphical displays of TOAD. I worked for an MVS hoster who paid BIG SUMS to Candle Software for a product called Omegamon, which made it possible to have charts about performance figures.
So I deduced that people would even be willing to pay money for a GUI which presents the optimizing process.

That idea of "tune PostgreSQL database after installation" also came from the various requests on pgsql-performance. Some ask before they install; but there are also MANY questions of the form "our PostgreSQL database was running fast until xxxx", with xxxx usually being a table grown bigger than n records.

And I really did not want to discredit the idea of properly configuring from the start. I just wanted to open another option to do that tuning.

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
From:
Carlos Moreno
Date:

Harald Armin Massa wrote:
> Carlos,
>
> about your feature proposal: as I learned, nearly all performance
> configuration can be done by editing the .INI file and making the
> postmaster re-read it.
>
> So, WHY at all should those parameters be guessed at the installation
> of the database? Wouldn't it be a safer point in time to have some
> postgresql-tune utility, which gets run after the installation and
> maybe every once in a while? That tool can check vital information
> like the "database size to memory relation" and suggest a new
> postgresql.ini.

I would soooo volunteer to do that and donate it to the PG project!!

Problem is, you see, I'm not sure what an appropriate algorithm would
be --- for instance, you mention "DB size to memory relation" as if it is
an extremely important parameter and/or a parameter with extremely
obvious consequences in the configuration file --- I'm not even sure
about what I would do given that.  I always get the feeling that figuring
out stuff from the performance-tuning-related documentation requires
technical knowledge about the internals of PG  (which, granted, is
out there, given that we get 100% of the source code ...  But still, that's
beside the point, I'm sure we agree).

That's why I formulated my previous post as a Feature Request, rather
than a "would this be a good feature / should I get started working on
that?"  :-)

Carlos
--


From:
Greg Smith
Date:

On Fri, 27 Apr 2007, Josh Berkus wrote:

> *Everyone* wants this.  The problem is that it's very hard code to write
> given the number of variables

There's lots of variables, and there are at least three major ways to work
on improving someone's system:

1) Collect up data about their system (memory, disk layout), find out a
bit about their apps/workload, and generate a config file based on that.

2) Connect to the database and look around.  Study the tables and some of
their stats, make some estimates based on what you find, and produce a new
config file.

3) Monitor the database while it's doing its thing.  See which parts go
well and which go badly by viewing database statistics like pg_statio.
From that, figure out where the bottlenecks are likely to be and push more
resources toward them.  What I've been working on lately is exposing more
readouts of performance-related database internals to make this more
practical.
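As a toy illustration of approach (3), the statistics views Greg mentions already expose enough to rank tables by buffer cache hit ratio. The SQL below is a standard pg_statio query; the 90% threshold is an arbitrary example, not a recommendation.

```python
"""Minimal example of monitoring-driven tuning: read buffer statistics the
server already exposes (pg_statio_user_tables) and flag tables with a poor
cache hit ratio as candidates for attention.  The threshold is arbitrary."""

STATIO_QUERY = """
SELECT relname, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables;
"""

def cache_hit_ratio(blks_read, blks_hit):
    total = blks_read + blks_hit
    return blks_hit / float(total) if total else 1.0

def flag_cold_tables(rows, threshold=0.90):
    # rows: iterable of (relname, heap_blks_read, heap_blks_hit),
    # e.g. the result of running STATIO_QUERY through any client library
    return [name for name, r, h in rows if cache_hit_ratio(r, h) < threshold]
```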

When first exposed to this problem, most people assume that (1) is good
enough--ask some questions, look at the machine, and magically a
reasonable starting configuration can be produced.  It's already been
pointed out that anyone with enough knowledge to do all that can probably
spit out a reasonable guess for the config file without help.  If you're
going to the trouble of building a tool for offering configuration advice,
it can be wildly more effective if you look inside the database after it's
got data in it, and preferably after it's been running under load for a
while, and make your recommendations based on all that information.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
"Craig A. James"
Date:

Greg Smith wrote:
> If you're going to the trouble of building a tool for offering
> configuration advice, it can be wildly more effective if you look inside
> the database after it's got data in it, and preferably after it's been
> running under load for a while, and make your recommendations based on
> all that information.

There are two completely different problems that are getting mixed together
in this discussion.  Several people have tried to distinguish them, but
let's be explicit:

1. Generating a reasonable starting configuration for neophyte users who have installed Postgres for the first time.

2. Generating an optimal configuration for a complex, running system that's loaded with data.

The first problem is easy: Any improvement would be welcome and would give
most users a better initial experience.  The second problem is nearly
impossible.  Forget the second problem (or put it on the "let's find someone
doing a PhD project" list), and focus on the first.

From my limited experience, a simple questionnaire could be used to create a
pretty good starting configuration file.  Furthermore, many of the answers
can be discovered automatically:

1. How much memory do you have?
2. How many disks do you have?
   a. Which disk contains the OS?
   b. Which disk(s) have swap space?
   c. Which disks are "off limits" (not to be used by Postgres)
3. What is the general nature of your database?
   a. Mostly static (few updates, lots of access)
   b. Mostly archival (lots of writes, few reads)
   c. Very dynamic (data are added, updated, and deleted a lot)
4. Do you have a lot of small, fast transactions or a few big, long transactions?
5. How big do you expect your database to be?
6. How many simultaneous users do you expect?
7. What are the users you want configured initially?
8. Do you want local access only, or network access?

With these few questions (and perhaps a couple more), a decent set of
startup files could be created that would give good, 'tho not optimal,
performance for most people just getting started.

I agree with an opinion posted a couple days ago: The startup configuration is one of the weakest features of Postgres.
It's not rocket science, but there are several files, and it's not obvious to the newcomer that the files even exist. 

Here's just one example: A coworker installed Postgres and couldn't get it
to work at all.  He struggled for hours.  When he contacted me, I tried his
installation and it worked fine.  He tried it, and he couldn't connect.  I
asked him, "Are you using localhost?"  He said yes, but what he meant was
that he was using the local *network*, 192.168.0.5, whereas I was using
"localhost".  He didn't have network access enabled.  So, four hours wasted.

This is the sort of thing that makes experienced users say, "Well, duh!"
But there are any number of these little traps and obscure configuration
parameters that make the initial Postgres experience a poor one.  It
wouldn't take much to make a big difference to new users.

Craig




From:
Kevin Hunter
Date:

At 12:18p -0400 on 30 Apr 2007, Craig A. James wrote:
> 1. Generating a reasonable starting configuration for neophyte users
> who have installed Postgres for the first time.

I recognize that PostgreSQL and MySQL try to address different
problem-areas, but is this one reason why a lot of people with whom I
talk prefer MySQL?  Because PostgreSQL is so "slooow" out of the box?*

Thanks,

Kevin

* Not trolling; I use PostgreSQL almost exclusively.

From:
Josh Berkus
Date:

Greg,

> 1) Collect up data about their system (memory, disk layout), find out a
> bit about their apps/workload, and generate a config file based on that.

We could start with this.  Where I bogged down is that collecting system
information about several different operating systems ... and in some cases
generating scripts for boosting things like shmmax ... is actually quite a
large problem from a slog perspective; there is no standard way even within
Linux to describe CPUs, for example. Collecting available disk space
information is even worse.   So I'd like some help on this portion.

I actually have algorithms which are "good enough to start with" for most of
the important GUCs worked out, and others could be set through an interactive
script ("Does your application require large batch loads involving thousands
or millions of updates in the same transaction?"  "How large (GB) do you
expect your database to be?")

> 2) Connect to the database and look around.  Study the tables and some of
> their stats, make some estimates based on what you find, produce a new
> config file.

I'm not sure that much more for (2) can be done than for (1).  Tables-on-disk
don't tell us much.

> 3) Monitor the database while it's doing its thing.  See which parts go
> well and which go badly by viewing database statistics like pg_statio.
> From that, figure out where the bottlenecks are likely to be and push more
> resources toward them.  What I've been working on lately is exposing more
> readouts of performance-related database internals to make this more
> practical.

We really should collaborate on that.

> When first exposed to this problem, most people assume that (1) is good
> enough--ask some questions, look at the machine, and magically a
> reasonable starting configuration can be produced.  It's already been
> pointed out that anyone with enough knowledge to do all that can probably
> spit out a reasonable guess for the config file without help.

But that's actually more than most people already do.  Further, if you don't
start with a "reasonable" configuration, then it's difficult-to-impossible to
analyze where your settings are out-of-whack; behavior introduced by some
way-too-low settings will mask any other tuning that needs to be done.  It's
also hard/impossible to devise tuning algorithms that work for both gross
tuning (increase shared_buffers by 100x) and fine tuning (decrease
bgwriter_interval to 45ms).

So whether or not we do (3), we need to do (1) first.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

From:
Carlos Moreno
Date:

> large problem from a slog perspective; there is no standard way even within
> Linux to describe CPUs, for example. Collecting available disk space
> information is even worse.   So I'd like some help on this portion.
>

Quite likely, naiveness follows...  But, aren't things like /proc/cpuinfo ,
/proc/meminfo, /proc/partitions / /proc/diskstats standard, at the very
least across Linux distros?  I'm not familiar with BSD or other Unix
flavours, but I would expect these (or their equivalent) to exist in those,
no?

Am I just being naive?
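For what it's worth, the Linux side of the probing Carlos describes really is that simple. A small sketch (the parsing functions take text so they can be pointed at either the real files or a test fixture); this is exactly the part that is Linux-specific, and BSD/Solaris/Windows would each need their own equivalent.

```python
"""Minimal /proc probing on Linux: logical CPU count from /proc/cpuinfo and
total RAM from /proc/meminfo."""

def cpu_count(cpuinfo_text):
    # one "processor : N" stanza per logical CPU
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.startswith("processor"))

def mem_total_kb(meminfo_text):
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])   # /proc/meminfo reports kB
    return None

def probe():
    with open("/proc/cpuinfo") as f:
        cpus = cpu_count(f.read())
    with open("/proc/meminfo") as f:
        ram_kb = mem_total_kb(f.read())
    return cpus, ram_kb
```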

Carlos
--


From:
david@lang.hm
Date:

On Tue, 1 May 2007, Carlos Moreno wrote:

>>  large problem from a slog perspective; there is no standard way even
>>  within Linux to describe CPUs, for example. Collecting available disk
>>  space information is even worse.   So I'd like some help on this portion.
>>
>
> Quite likely, naiveness follows...  But, aren't things like /proc/cpuinfo ,
> /proc/meminfo, /proc/partitions / /proc/diskstats standard, at the very
> least across Linux distros?  I'm not familiar with BSD or other Unix
> flavours, but I would expect these (or their equivalent) to exist in those,
> no?
>
> Am I just being naive?

unfortunately, yes.

across different Linux distros they are fairly standard (however, different
kernel versions will change them)

however, different kernels need drastically different tools to get the info
from them.

David Lang

From:
Greg Smith
Date:

On Mon, 30 Apr 2007, Kevin Hunter wrote:

> I recognize that PostgreSQL and MySQL try to address different
> problem-areas, but is this one reason why a lot of people with whom I
> talk prefer MySQL? Because PostgreSQL is so "slooow" out of the box?

It doesn't help, but there are many other differences that are as big or
bigger.  Here are a few samples off the top of my head:

1) Performance issues due to MVCC (MySQL fans love to point out how
fast they can do select count(*) from x)
2) Not knowing you have to run vacuum analyze and therefore never seeing a
good query result
3) Unfair comparison of PostgreSQL with robust WAL vs. MySQL+MyISAM on
write-heavy workloads

These are real issues, which of course stack on top of things like
outdated opinions from older PG releases with performance issues resolved
in the last few years.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Greg Smith
Date:

On Tue, 1 May 2007, Josh Berkus wrote:

> there is no standard way even within Linux to describe CPUs, for
> example. Collecting available disk space information is even worse.  So
> I'd like some help on this portion.

I'm not fooled--secretly you and your co-workers laugh at how easy this is
on Solaris and are perfectly happy with how difficult it is on Linux,
right?

I joke because I've been re-solving some variant on this problem every few
years for a decade now and it just won't go away.  Last time I checked the
right answer was to find someone else who's already done it, packaged that
into a library, and appears committed to keeping it up to date; just pull
a new rev of that when you need it.  For example, for the CPU/memory part,
top solves this problem and is always kept current, so on open-source
platforms there's the potential to re-use that code.  Now that I know
that's one thing you're (understandably) fighting with I'll dig up my
references on that (again).

> It's also hard/impossible to devise tuning algorithms that work for both
> gross tuning (increase shared_buffers by 100x) and fine tuning (decrease
> bgwriter_interval to 45ms).

I would advocate focusing on iterative improvements to an existing
configuration rather than even bothering with generating a one-off config
for exactly this reason.  It *is* hard/impossible to get it right in a
single shot, because of how many parameters interact and the way
bottlenecks clear, so why not assume from the start you're going to do it
several times--then you've only got one piece of software to write.

The idea I have in my head is a tool that gathers system info, connects to
the database, and then spits out recommendations in order of expected
effectiveness--with the specific caveat that changing too many things at
one time isn't recommended, and some notion of parameter dependencies.
The first time you run it, you'd be told that shared_buffers was wildly
low, effective_cache_size isn't even in the right ballpark, and your
work_mem looks small relative to the size of your tables; fix those before
you bother doing anything else because any data collected with those at
very wrong values is bogus.  Take two, those parameters pass their sanity
tests, but since you're actually running at a reasonable speed now the
fact that your tables are no longer being vacuumed frequently enough might
bubble to the top.
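The iterative advisor described here might boil down to an ordered rule list. A toy skeleton (all thresholds invented for illustration):

```python
"""Toy skeleton of the iterative advisor: sanity checks ordered by expected
effectiveness, reporting only a couple of changes per pass so that later
measurements aren't taken with wildly wrong settings still in place."""

RULES = [  # (name, passes?, advice) in order of expected impact
    ("shared_buffers",
     lambda s: s["shared_buffers_mb"] >= s["ram_mb"] // 20,
     "shared_buffers is wildly low; fix it before anything else"),
    ("effective_cache_size",
     lambda s: s["effective_cache_size_mb"] >= s["ram_mb"] // 2,
     "effective_cache_size isn't in the right ballpark"),
    ("vacuum",
     lambda s: s["dead_tuple_pct"] < 20,
     "tables aren't being vacuumed frequently enough"),
]

def recommend(stats, max_changes=2):
    advice = [msg for _, ok, msg in RULES if not ok(stats)]
    return advice[:max_changes]   # don't change too many things at once
```

On the first pass the memory settings dominate the output; once they pass their sanity tests, the vacuum rule can "bubble to the top" exactly as described.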

It would take a few passes through to nail down everything, but as long as
it's put together such that you'd be in a similar position to the
single-shot tool after running it once it would remove that as something
separate that needed to be built.

To argue against myself for a second, it may very well be the case that
writing the simpler tool is the only way to get a useful prototype for
building the more complicated one; very easy to get bogged down in feature
creep on a grand design otherwise.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
david@lang.hm
Date:

On Tue, 1 May 2007, Greg Smith wrote:

> On Tue, 1 May 2007, Josh Berkus wrote:
>
>>  there is no standard way even within Linux to describe CPUs, for example.
>>  Collecting available disk space information is even worse.  So I'd like
>>  some help on this portion.

what type of description of the CPU's are you looking for?

>>  It's also hard/impossible to devise tuning algorithms that work for both
>>  gross tuning (increase shared_buffers by 100x) and fine tuning (decrease
>>  bgwriter_interval to 45ms).
>
> I would advocate focusing on iterative improvements to an existing
> configuration rather than even bothering with generating a one-off config for
> exactly this reason.  It *is* hard/impossible to get it right in a single
> shot, because of how many parameters interact and the way bottlenecks clear,
> so why not assume from the start you're going to do it several times--then
> you've only got one piece of software to write.

nobody is asking for things to be right the first time.

> The idea I have in my head is a tool that gathers system info, connects to
> the database, and then spits out recommendations in order of expected
> effectiveness--with the specific caveat that changing too many things at one
> time isn't recommended, and some notion of parameter dependencies. The first
> time you run it, you'd be told that shared_buffers was wildly low,
> effective_cache_size isn't even in the right ballpark, and your work_mem
> looks small relative to the size of your tables; fix those before you bother
> doing anything else because any data collected with those at very wrong
> values is bogus.

why not have a much simpler script that gets these values up into the
right ballpark first? then after that the process and analysis that you
are suggesting would be useful. the problem is that the defaults are _so_
far off that no sane incremental program is going to be able to converge
on the right answer rapidly.

David Lang

>  Take two, those parameters pass their sanity tests, but
> since you're actually running at a reasonable speed now the fact that your
> tables are no longer being vacuumed frequently enough might bubble to the
> top.
>
> It would take a few passes through to nail down everything, but as long as
> it's put together such that you'd be in a similar position to the single-shot
> tool after running it once it would remove that as something separate that
> needed to be built.
>
> To argue against myself for a second, it may very well be the case that
> writing the simpler tool is the only way to get a useful prototype for
> building the more complicated one; very easy to get bogged down in feature
> creep on a grand design otherwise.

From:
Ron
Date:

The more I think about this thread, the more I'm convinced of 2 things:

1= Suggesting initial config values is a fundamentally different
exercise than tuning a running DBMS.
This can be handled reasonably well by HW and OS snooping.  OTOH,
detailed fine tuning of a running DBMS does not appear to be amenable
to this approach.

So...
2= We need to implement the kind of timer support that Oracle 10g has.
Oracle performance tuning was revolutionized by there being
microsecond-accurate timers available for all Oracle operations.
IMHO, we should learn from that.

Only the combination of the above looks like it will really be
successful in addressing the issues brought up in this thread.

Cheers,
Ron Peacetree


At 01:59 PM 4/27/2007, Josh Berkus wrote:
>Dan,
>
> > Exactly..  What I think would be much more productive is to use the
> > great amount of information that PG tracks internally and auto-tune the
> > parameters based on it.  For instance:
>
>*Everyone* wants this.  The problem is that it's very hard code to write
>given the number of variables.  I'm working on it but progress is slow,
>due to my travel schedule.
>
>--
>--Josh
>
>Josh Berkus
>PostgreSQL @ Sun
>San Francisco
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match


From:
Josh Berkus
Date:

Greg,

> I'm not fooled--secretly you and your co-workers laugh at how easy this
> is on Solaris and are perfectly happy with how difficult it is on Linux,
> right?

Don't I wish.  There's issues with getting CPU info on Solaris, too, if you
get off of Sun Hardware to generic white boxes.  The base issue is that
there's no standardization on how manufacturers report the names of their
CPUs, 32/64bit, or clock speeds.   So any attempt to determine "how fast"
a CPU is, even on a 1-5 scale, requires matching against a database of
regexes which would have to be kept updated.

And let's not even get started on Windows.

> I joke because I've been re-solving some variant on this problem every
> few years for a decade now and it just won't go away.  Last time I
> checked the right answer was to find someone else who's already done it,
> packaged that into a library, and appears committed to keeping it up to
> date; just pull a new rev of that when you need it.  For example, for
> the CPU/memory part, top solves this problem and is always kept current,
> so on open-source platforms there's the potential to re-use that code.
> Now that I know that's one thing you're (understandably) fighting with
> I'll dig up my references on that (again).

Actually, total memory is not an issue, that's fairly straightforward.
Nor is # of CPUs.  Memory *used* is a PITA, which is why I'd ignore that
part and make some assumptions.  It would have to be implemented in a
per-OS manner, which is what bogged me down.

> I would advocate focusing on iterative improvements to an existing
> configuration rather than even bothering with generating a one-off
> config for exactly this reason.  It *is* hard/impossible to get it right
> in a single shot, because of how many parameters interact and the way
> bottlenecks clear, so why not assume from the start you're going to do
> it several times--then you've only got one piece of software to write.

Sounds fine to me.

> To argue against myself for a second, it may very well be the case that
> writing the simpler tool is the only way to get a useful prototype for
> building the more complicated one; very easy to get bogged down in
> feature creep on a grand design otherwise.

It's certainly easy for me.  ;-)

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

From:
david@lang.hm
Date:

On Thu, 3 May 2007, Josh Berkus wrote:

> Greg,
>
>> I'm not fooled--secretly you and your co-workers laugh at how easy this
>> is on Solaris and are perfectly happy with how difficult it is on Linux,
>> right?
>
> Don't I wish.  There's issues with getting CPU info on Solaris, too, if you
> get off of Sun Hardware to generic white boxes.  The base issue is that
> there's no standardization on how manufacturers report the names of their
> CPUs, 32/64bit, or clock speeds.   So any attempt to determine "how fast"
> a CPU is, even on a 1-5 scale, requires matching against a database of
> regexes which would have to be kept updated.
>
> And let's not even get started on Windows.

I think the only sane way to try and find the CPU speed is to just do a
busy loop of some sort (ideally something that somewhat resembles the main
code) and see how long it takes. You may have to do this a few times until
you get a loop that takes long enough (a few seconds) on a fast processor.
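The calibration idea can be sketched in a few lines: time a busy loop, doubling the iteration count until one run takes long enough to measure reliably. (A real probe would use several seconds and code resembling the server's hot paths; 0.05s keeps the example quick.)

```python
"""Sketch of the busy-loop CPU calibration described above."""

import time

def busy(n):
    x = 0
    for i in range(n):
        x += i * i      # stand-in for representative work
    return x

def loops_per_second(min_seconds=0.05):
    n = 10000
    while True:
        start = time.time()
        busy(n)
        elapsed = time.time() - start
        if elapsed >= min_seconds:
            return n / elapsed
        n *= 2          # run was too short to trust; lengthen the loop
```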

David Lang

From:
Carlos Moreno
Date:

>> CPUs, 32/64bit, or clock speeds.   So any attempt to determine "how
>> fast"
>> a CPU is, even on a 1-5 scale, requires matching against a database of
>> regexes which would have to be kept updated.
>>
>> And let's not even get started on Windows.
>
> I think the only sane way to try and find the cpu speed is to just do
> a busy loop of some sort (ideally something that somewhat resembles
> the main code) and see how long it takes. you may have to do this a
> few times until you get a loop that takes long enough (a few seconds)
> on a fast processor

I was going to suggest just that (but then was afraid that again I may have
been just being naive) --- I can't remember the exact name, but I remember
using (on some Linux flavor) an API call that fills a struct with data on
the resource usage for the process, including CPU time; I assume measured
with precision (that is, immune to issues of other applications running
simultaneously, or other random events polluting the measurement with
random noise).

As for 32/64 bit --- doesn't PG already know that information?  I mean,
./configure does gather that information --- does it not?

Carlos
--


From:
david@lang.hm
Date:

On Thu, 3 May 2007, Carlos Moreno wrote:

>> >  CPUs, 32/64bit, or clock speeds.   So any attempt to determine "how
>> >  fast"
>> >  a CPU is, even on a 1-5 scale, requires matching against a database of
>> >  regexes which would have to be kept updated.
>> >
>> >  And let's not even get started on Windows.
>>
>>  I think the only sane way to try and find the cpu speed is to just do a
>>  busy loop of some sort (ideally something that somewhat resembles the main
>>  code) and see how long it takes. you may have to do this a few times until
>>  you get a loop that takes long enough (a few seconds) on a fast processor
>
> I was going to suggest just that (but then was afraid that again I may have
> been just being naive) --- I can't remember the exact name, but I remember
> using (on some Linux flavor) an API call that fills a struct with data on the
> resource usage for the process, including CPU time;  I assume measured
> with precision  (that is, immune to issues of other applications running
> simultaneously, or other random events causing the measurement to be
> polluted by random noise).

Since what we are looking for here is a reasonable first approximation,
not perfection, I don't think we should worry much about pollution of the
value. If the person has other things running while they run this test
that will also be running when they run the database, it's no longer
'pollution', it's part of the environment. I think a runtime message
warning that the results may be inaccurate if heavy processes are running
during configuration that won't be running with the database would be
good enough (remember it's not only CPU time that's affected like this,
it's disk performance as well).

> As for 32/64 bit --- doesn't PG already know that information?  I mean,
> ./configure does gather that information --- does it not?

We're not talking about compiling PG, we're talking about getting sane
defaults for a pre-compiled binary: if it's a 32-bit binary assume a 32-bit
CPU, if it's a 64-bit binary assume a 64-bit CPU (all hardcoded into
the binary at compile time).

David Lang

From:
Carlos Moreno
Date:

>> been just being naive) --- I can't remember the exact name, but I
>> remember
>> using (on some Linux flavor) an API call that fills a struct with
>> data on the
>> resource usage for the process, including CPU time;  I assume measured
>> with precision  (that is, immune to issues of other applications running
>> simultaneously, or other random events causing the measurement to be
>> polluted by random noise).
>
> since what we are looking for here is a reasonable first
> approximation, not perfection I don't think we should worry much about
> pollution of the value.

Well, it's not so much worrying as it is choosing the better of two
equally difficult options --- what I mean is that obtaining the *real*
resource usage as reported by the kernel is, from what I remember, just as
hard as obtaining the time with milli- or micro-second resolution.

So, why not choose this option?  (In fact, if we wanted to do it "the
scripted way", I guess we could still use "time test_cpuspeed_loop" and
read the report from the time command, specifically the CPU time and
system-call time.)

>> As for 32/64 bit --- doesn't PG already know that information?  I mean,
>> ./configure does gather that information --- does it not?
>
> we're not talking about compiling PG, we're talking about getting sane
> defaults for a pre-compiled binary. if it's a 32 bit binary assume a
> 32 bit cpu, if it's a 64 bit binary assume a 64 bit cpu (all hardcoded
> into the binary at compile time)

Right --- I was thinking that configure, which as I understand generates
the Makefiles to compile applications including initdb, could plug those
values in as compile-time constants, so that initdb (or a hypothetical
additional utility that would do what we're discussing in this thread)
already has them.  Anyway, yes, that would go for the binaries as well ---
we're pretty much saying the same thing  :-)

Carlos
--


From:
david@lang.hm
Date:

On Thu, 3 May 2007, Carlos Moreno wrote:

>> >  been just being naive) --- I can't remember the exact name, but I
>> >  remember
>> >  using (on some Linux flavor) an API call that fills a struct with data
>> >  on the
>> >  resource usage for the process, including CPU time;  I assume measured
>> >  with precision  (that is, immune to issues of other applications running
>> >  simultaneously, or other random events causing the measurement to be
>> >  polluted by random noise).
>>
>>  since what we are looking for here is a reasonable first approximation,
>>  not perfection I don't think we should worry much about pollution of the
>>  value.
>
> Well, it's not as much worrying as it is choosing the better among two
> equally
> difficult options --- what I mean is that obtaining the *real* resource usage
> as
> reported by the kernel is, from what I remember, equally hard as it is
> obtaining
> the time with milli- or micro-seconds resolution.
>
> So, why not choosing this option?  (in fact, if we wanted to do it "the
> scripted
> way", I guess we could still use "time test_cpuspeed_loop" and read the
> report
> by the command time, specifying CPU time and system calls time.

I don't think it's that hard to get system time to a reasonable level (if
this config tuner needs to run for a minute or two to generate numbers,
that's acceptable; it's only run once).

But I don't think that the results are really that critical.

Do we really care if the loop runs 1,000,000 times per second or 1,001,000
times per second? I'd argue that we don't even care about 1,000,000 times
per second vs 1,100,000 times per second; what we care about is 1,000,000
times per second vs 100,000 times per second. If you do a 10-second test
and it runs for 11 seconds you are still in the right ballpark (i.e. close
enough that you really need to move to the stage-2 tuning to figure out
the exact values).

>> >  As for 32/64 bit --- doesn't PG already know that information?  I mean,
>> >  ./configure does gather that information --- does it not?
>>
>>  we're not talking about compiling PG, we're talking about getting sane
>>  defaults for a pre-compiled binary. if it's a 32 bit binary assume a 32
>>  bit cpu, if it's a 64 bit binary assume a 64 bit cpu (all hardcoded into
>>  the binary at compile time)
>
> Right --- I was thinking that configure, which as I understand, generates the
> Makefiles to compile applications including initdb, could plug those values
> as compile-time constants, so that initdb (or a hypothetical additional
> utility
> that would do what we're discussing in this thread) already has them.
> Anyway,
> yes, that would go for the binaries as well --- we're pretty much saying the
> same thing  :-)

I'm thinking along the lines of a script or pre-compiled binary (_not_
initdb) that you could run and have it generate a new config file whose
values are within about an order of magnitude of being correct.

David Lang

From:
Carlos Moreno
Date:

> I don't think it's that hard to get system time to a reasonable level
> (if this config tuner needs to run for a min or two to generate
> numbers that's acceptable, it's only run once)
>
> but I don't think that the results are really that critical.

Still --- this does not provide a valid argument against my claim.

Ok, we don't need precision --- but do we *need* to have less
precision??   I mean, you seem to be proposing that we deliberately
go out of our way to discard a solution with higher precision and
choose the one with lower precision --- just because we do not
have a critical requirement for the extra precision.

That would be a valid argument if the extra precision came at a
considerable cost  (well, or at whatever cost, considerable or not).

But my point is still that obtaining the time in the right ballpark
and obtaining the time with good precision are two things that
have, from any conceivable point of view  (programming effort,
resources consumption when executing it, etc. etc.), the exact
same cost --- why not pick the one that gives us the better results?

Mostly when you consider that:

> I'd argue that we don't even care about 1,000,000 times per second vs
> 1,100,000 times per second, what we care about is 1,000,000 times per
> second vs 100,000 times per second

Part of my claim is that measuring real-time you could get an
error like this or even a hundred times this!!   Most of the time
you wouldn't, and definitely if the user is careful it would not
happen --- but it *could* happen!!!  (and when I say could, I
really mean:  trust me, I have actually seen it happen)

Why not just use an *extremely simple* solution that is getting
information from the kernel reporting the actual CPU time that
has been used???

Of course, this goes under the premise that in all platforms there
is such a simple solution like there is on Linux  (the exact name
of the API function still eludes me, but I have used it in the past,
and I recall that it was just three or five lines of code).

Carlos
--


From:
david@lang.hm
Date:

On Thu, 3 May 2007, Carlos Moreno wrote:

>>  I don't think it's that hard to get system time to a reasonable level (if
>>  this config tuner needs to run for a min or two to generate numbers that's
>>  acceptable, it's only run once)
>>
>>  but I don't think that the results are really that critical.
>
> Still --- this does not provide a valid argument against my claim.
>
> Ok, we don't need precision --- but do we *need* to have less
> precision??   I mean, you seem to be proposing that we deliberately
> go out of our way to discard a solution with higher precision and
> choose the one with lower precision --- just because we do not
> have a critical requirement for the extra precision.
>
> That would be a valid argument if the extra precision came at a
> considerable cost  (well, or at whatever cost, considerable or not).

the cost I am seeing is the cost of portability (getting similarly
accurate info from all the different operating systems)

> But my point is still that obtaining the time in the right ballpark
> and obtaining the time with good precision are two things that
> have, from any conceivable point of view  (programming effort,
> resources consumption when executing it, etc. etc.), the exact
> same cost --- why not pick the one that gives us the better results?
>
> Mostly when you consider that:
>
>>  I'd argue that we don't even care about 1,000,000 times per second vs
>>  1,100,000 times per second, what we care about is 1,000,000 times per
>>  second vs 100,000 times per second
>
> Part of my claim is that measuring real-time you could get an
> error like this or even a hundred times this!!   Most of the time
> you wouldn't, and definitely if the user is careful it would not
> happen --- but it *could* happen!!!  (and when I say could, I
> really mean:  trust me, I have actually seen it happen)

if you have errors of several orders of magnitude in the number of loops
it can run in a given time period then you don't have something that you
can measure to any accuracy (and it wouldn't matter anyway; if your loops
are that variable, your code execution would be as well)

> Why not just use an *extremely simple* solution that is getting
> information from the kernel reporting the actual CPU time that
> has been used???
>
> Of course, this goes under the premise that in all platforms there
> is such a simple solution like there is on Linux  (the exact name
> of the API function still eludes me, but I have used it in the past,
> and I recall that it was just three or five lines of code).

I think the problem is that it's a _different_ 3-5 lines of code for each
OS.

If I'm wrong and it's the same for the different operating systems, then I
agree that we should use the most accurate clock we can get. I just don't
think we have that.

David Lang

From:
Carlos Moreno
Date:

>> That would be a valid argument if the extra precision came at a
>> considerable cost  (well, or at whatever cost, considerable or not).
>
> the cost I am seeing is the cost of portability (getting similarly
> accurate info from all the different operating systems)

Fair enough --- as I mentioned, I was arguing under the premise that
there would be a quite similar solution for all the Unix-flavours (and
hopefully an equivalent --- and equivalently simple --- one for Windows)
...
Whether or not that premise holds, I wouldn't bet either way.

>> Part of my claim is that measuring real-time you could get an
>> error like this or even a hundred times this!!   Most of the time
>> you wouldn't, and definitely if the user is careful it would not
>> happen --- but it *could* happen!!!  (and when I say could, I
>> really mean:  trust me, I have actually seen it happen)
>
> if you have errors of several orders of magnitude in the number of
> loops it can run in a given time period then you don't have something
> that you can measure to any accuracy (and it wouldn't matter anyway,
> if your loops are that variable, your code execution would be as well)

Not necessarily --- operating conditions may change drastically from
one second to the next;  that does not mean that your system is useless;
simply that the measuring mechanism is way too vulnerable to the
particular operating conditions at the exact moment it was executed.

I'm not sure if that was intentional, but you bring up an interesting
issue --- or in any case, your comment made me drastically re-think
my whole argument: do we *want* to measure the exact speed, or
rather the effective speed under normal operating conditions on the
target machine?

I know the latter is almost impossible --- we're talking about estimating
a random process's parameter (and we need to do it in a short period
of time) ...  But the argument goes more or less like this: if you have a
machine that runs at 1000 MIPS, but it's usually busy running things
that on average consume 500 of those 1000 MIPS, would we want PG's
configuration file to be based on 1000 or on 500 MIPS???
After all, the CPU is, as far as PostgreSQL will be able to see, 500 MIPS
fast, *not* 1000.

I think I better stop, if we want to have any hope that the PG team will
ever actually implement this feature (or similar) ...  We're probably just
scaring them!!  :-)

Carlos
--


From:
david@lang.hm
Date:

On Thu, 3 May 2007, Carlos Moreno wrote:

>
>> >  Part of my claim is that measuring real-time you could get an
>> >  error like this or even a hundred times this!!   Most of the time
>> >  you wouldn't, and definitely if the user is careful it would not
>> >  happen --- but it *could* happen!!!  (and when I say could, I
>> >  really mean:  trust me, I have actually seen it happen)
>>
>>  if you have errors of several orders of magnitude in the number of loops
>>  it can run in a given time period then you don't have something that you
>>  can measure to any accuracy (and it wouldn't matter anyway, if your loops
>>  are that variable, your code execution would be as well)
>
> Not necessarily --- operating conditions may change drastically from
> one second to the next;  that does not mean that your system is useless;
> simply that the measuring mechanism is way too vulnerable to the
> particular operating conditions at the exact moment it was executed.
>
> I'm not sure if that was intentional, but you bring up an interesting
> issue --- or in any case, your comment made me drastically re-think
> my whole argument: do we *want* to measure the exact speed, or
> rather the effective speed under normal operating conditions on the
> target machine?
>
> I know the latter is almost impossible --- we're talking about an estimate
> of a random process' parameter (and we need to do it in a short period
> of time) ...  But the argument goes more or less like this:  if you have a
> machine that runs at  1000 MIPS, but it's usually busy running things
> that in average consume 500 of those 1000 MIPS, would we want PG's
> configuration file to be obtained based on 1000 or based on 500 MIPS???
> After all, the CPU is, as far as PostgreSQL will be able see, 500 MIPS
> fast, *not* 1000.
>
> I think I better stop, if we want to have any hope that the PG team will
> ever actually implement this feature (or similar) ...  We're probably just
> scaring them!!  :-)

Simpler is better (or: perfect is the enemy of good enough).

If you do your sample over a few seconds (or a few tens of seconds) things
will average out quite a bit.

The key is to be going for a reasonable starting point. After that the
full-analysis folks can start in with all their monitoring and tuning, but
the 80/20 rule really applies here: 80% of the gain is from getting
'fairly close' to the right values, and that should only be 20% of the
full 'tuning project'.

David Lang


From:
Greg Smith
Date:

On Thu, 3 May 2007, Josh Berkus wrote:

> So any attempt to determine "how fast" a CPU is, even on a 1-5 scale,
> requires matching against a database of regexes which would have to be
> kept updated.

This comment, along with the subsequent commentary today going far astray
into CPU measurement land, serves as a perfect example to demonstrate why
I advocate attacking this from the perspective that assumes there is
already a database around we can query.

We don't have to care how fast the CPU is in any real terms; all we need
to know is how many of them are (which as you point out is relatively easy
to find), and approximately how fast each one of them can run PostgreSQL.
Here's the first solution to this problem that I came up with in one
minute of R&D:

-bash-3.00$ psql
postgres=# \timing
Timing is on.
postgres=# select count(*) from generate_series(1,100000,1);
  count
--------
  100000
(1 row)

Time: 106.535 ms

There you go, a completely cross-platform answer.  You should run the
statement twice and only use the second result for better consistency.  I
ran this on all the systems I was around today and got these results:

P4 2.4GHz    107ms
Xeon 3GHz    100ms
Opteron 275    65ms
Athlon X2 4600    61ms

For comparison sake, these numbers are more useful at predicting actual
application performance than Linux's bogomips number, which completely
reverses the relative performance of the Intel vs. AMD chips in this set
from the reality of how well they run Postgres.

My philosophy in this area is that if you can measure something
performance-related with reasonable accuracy, don't even try to estimate
it instead.  All you have to do is follow some of the downright bizarre
dd/bonnie++ results people post here to realize that there can be a vast
difference between the performance you'd expect given a particular
hardware class and what you actually get.

While I'm ranting here, I should mention that I also sigh every time I see
people suggest we should ask the user how big their database is.  The kind
of newbie user people keep talking about helping has *no idea whatsoever*
how big the data actually is after it gets into the database and all the
indexes are built.  But if you tell someone "right now this database has 1
million rows and takes up 800MB; what multiple of its current size do you
expect it to grow to?", now that's something people can work with.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Michael Stone
Date:

On Fri, May 04, 2007 at 12:33:29AM -0400, Greg Smith wrote:
>-bash-3.00$ psql
>postgres=# \timing
>Timing is on.
>postgres=# select count(*) from generate_series(1,100000,1);
>  count
>--------
>  100000
>(1 row)
>
>Time: 106.535 ms
>
>There you go, a completely cross-platform answer.  You should run the
>statement twice and only use the second result for better consistency.  I
>ran this on all the systems I was around today and got these results:
>
>P4 2.4GHz    107ms
>Xeon 3GHz    100ms
>Opteron 275    65ms
>Athlon X2 4600    61ms

PIII 1GHz    265ms
Opteron 250    39ms

something seems inconsistent here.

>For comparison sake, these numbers are more useful at predicting actual
>application performance than Linux's bogomips number, which completely
>reverses the relative performance of the Intel vs. AMD chips in this set
>from the reality of how well they run Postgres.

You misunderstand the purpose of bogomips; they have no absolute
meaning, and a comparison between different types of CPUs is not
possible.

>While I'm ranting here, I should mention that I also sigh every time I see
>people suggest we should ask the user how big their database is.  The kind
>of newbie user people keep talking about helping has *no idea whatsoever*
>how big the data actually is after it gets into the database and all the
>indexes are built.

100% agreed.

Mike Stone

From:
Sebastian Hennebrueder
Date:

david@lang.hm wrote:
> On Tue, 1 May 2007, Carlos Moreno wrote:
>
>>>  large problem from a slog perspective; there is no standard way even
>>>  within Linux to describe CPUs, for example. Collecting available disk
>>>  space information is even worse.   So I'd like some help on this
>>> portion.
>>>
>>
>> Quite likely, naiveness follows...  But, aren't things like
>> /proc/cpuinfo ,
>> /proc/meminfo, /proc/partitions / /proc/diskstats standard, at the very
>> least across Linux distros?  I'm not familiar with BSD or other Unix
>> flavours, but I would expect these (or their equivalent) to exist in
>> those,
>> no?
>>
>> Am I just being naive?
>
> unfortunately yes.
>
> across different linux distros they are fairly standard (however
> different kernel versions will change them)
>
> however different kernels need drastically different tools to get the
> info from them.
>
> David Lang
>
Before inventing a hyper tool, we might consider providing 3-5 example
scenarios for common hardware configurations. This consumes less time and
could be discussed and defined in a couple of days. This is of course not
the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
system, but those are probably not the target for default configurations.

If we carefully document these scenarios they would be a great help for
people whose hardware falls "between" the scenarios.

Sebastian Hennebrueder



From:
Josh Berkus
Date:

Sebastian,

> Before inventing a hyper tool, we might consider providing 3-5 example
> scenarios for common hardware configurations. This consumes less time and
> could be discussed and defined in a couple of days. This is of course not
> the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
> system, but those are probably not the target for default configurations.

That's been suggested a number of times, but some GUCs are really tied to the
*exact* amount of RAM you have available.  So I've never seen how "example
configurations" could help.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

From:
Sebastian Hennebrueder
Date:

Josh Berkus schrieb:
> Sebastian,
>
>
>> Before inventing a hyper tool, we might consider providing 3-5 example
>> scenarios for common hardware configurations. This consumes less time and
>> could be discussed and defined in a couple of days. This is of course not
>> the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
>> system, but those are probably not the target for default configurations.
>>
>
> That's been suggested a number of times, but some GUCs are really tied to the
> *exact* amount of RAM you have available.  So I've never seen how "example
> configurations" could help.
>
>

I would define the scenario as
256 MB freely available for PostgreSQL
=> setting x can be of size ...


From:
Greg Smith
Date:

On Fri, 4 May 2007, Michael Stone wrote:

>> P4 2.4GHz    107ms
>> Xeon 3GHz    100ms
>> Opteron 275    65ms
>> Athlon X2 4600    61ms
> PIII 1GHz    265ms
> Opteron 250    39ms
> something seems inconsistent here.

I don't see what you mean.  The PIII results are exactly what I'd expect,
and I wouldn't be surprised that your Opteron 250 has significantly faster
memory than my two AMD samples (which are slower than average in this
regard) such that it runs this particular task better.

Regardless, the requirement here, as Josh put it, was to get a way to
grade the CPUs on approximately a 1-5 scale.  In that context, there isn't
a need for an exact value.  The above has 3 major generations of
processors involved, and they sort out appropriately into groups; that's
all that needs to happen here.

> You misunderstand the purpose of bogomips; they have no absolute meaning, and
> a comparison between different type of cpus is not possible.

As if I don't know what the bogo stands for, ha!  I brought that up
because someone suggested testing CPU speed using some sort of idle loop.
That's exactly what bogomips does.  My point was that something that
simple can give dramatically less useful results for predicting PostgreSQL
performance than what you can find out running a real query.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Mark Kirkwood
Date:

Josh Berkus wrote:
> Sebastian,
>
>> Before inventing a hyper tool, we might consider providing 3-5 example
>> scenarios for common hardware configurations. This consumes less time and
>> could be discussed and defined in a couple of days. This is of course not
>> the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
>> system, but those are probably not the target for default configurations.
>
> That's been suggested a number of times, but some GUCs are really tied to the
> *exact* amount of RAM you have available.  So I've never seen how "example
> configurations" could help.
>

I'm not convinced about this objection - having samples gives a bit of a
heads up on *what* knobs you should at least look at changing.

Also it might be helpful on the -general or -perf lists to be able to
say "try config 3 (or whatever we call 'em) and see what changes..."

I've certainly found the sample config files supplied with that database
whose name begins with 'M' a useful *start* when I want something better
than default...

Cheers

Mark

From:
"Steinar H. Gunderson"
Date:

On Fri, May 04, 2007 at 09:07:53PM -0400, Greg Smith wrote:
> As if I don't know what the bogo stands for, ha!  I brought that up
> because someone suggested testing CPU speed using some sort of idle loop.
> That's exactly what bogomips does.

Just for reference (I'm sure you know, but others might not): BogoMIPS are
measured using a busy loop because that's precisely the number the kernel is
interested in -- it's used to know for how long to loop when doing small
delays in the kernel (say, "sleep 20 microseconds"), which is done using a
busy loop.

/* Steinar */
--
Homepage: http://www.sesse.net/

From:
Sebastian Hennebrueder
Date:

Mark Kirkwood schrieb:
> Josh Berkus wrote:
>> Sebastian,
>>
>>> Before inventing a hyper tool, we might consider providing 3-5 example
>>> scenarios for common hardware configurations. This consumes less time and
>>> could be discussed and defined in a couple of days. This is of course not
>>> the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
>>> system, but those are probably not the target for default configurations.
>>
>> That's been suggested a number of times, but some GUCs are really
>> tied to the *exact* amount of RAM you have available.  So I've never
>> seen how "example configurations" could help.
>>
>
> I'm not convinced about this objection - having samples gives a bit of
> a heads up on *what* knobs you should at least look at changing.
>
> Also it might be helpful on the -general or -perf lists to be able to
> say "try config 3 (or whatever we call 'em) and see what changes..."
>
> I've certainly found the sample config files supplied with that
> database whose name begins with 'M' a useful *start* when I want
> something better than default...
>
> Cheers
>
> Mark
>
Some ideas about scenarios and settings. This is meant as a discussion
proposal; I am by far not a database guru!
The settings do not provide a perfect setup, but a more efficient one
compared to the default setup.

Criteria:
free memory
CPU ? what is the consequence?
separate spindles
total connections
Windows/Linux/Solaris ?

adapted settings:
max_connections
shared_buffers
effective_cache_size
work_mem
maintenance_work_mem

checkpoint_segments ?
checkpoint_timeout ?
checkpoint_warning ?


Scenario a) 256 MB free memory, one disk or RAID where all disks are in
the RAID
max_connections = 40
shared_buffers = 64 MB
effective_cache_size = 180 MB
work_mem = 1 MB
maintenance_work_mem = 4 MB

Scenario b) 1024 MB free memory, one disk or RAID where all disks are in
the RAID
max_connections = 80
shared_buffers = 128 MB
effective_cache_size = 600 MB
work_mem = 1.5 MB
maintenance_work_mem = 16 MB

Scenario c) 2048 MB free memory, one disk or RAID where all disks are in
the RAID
max_connections = 160
shared_buffers = 256 MB
effective_cache_size = 1200 MB
work_mem = 2 MB
maintenance_work_mem = 32 MB

Scenario d) 2048 MB free memory, RAID of multiple disks, second RAID or
disk
max_connections = 160
shared_buffers = 256 MB
effective_cache_size = 1200 MB
work_mem = 2 MB
maintenance_work_mem = 32 MB
WAL on second spindle
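Scenario a) above, rendered as the postgresql.conf fragment it implies.
These are Sebastian's proposed starting values for a 256 MB machine, not
tested recommendations, and the memory-unit spellings assume a PostgreSQL
version (8.2 or later) that accepts units directly:

```
# Scenario a) 256 MB free memory, single disk or one RAID volume
# (discussion proposal only, not a tuned configuration)
max_connections = 40
shared_buffers = 64MB
effective_cache_size = 180MB
work_mem = 1MB
maintenance_work_mem = 4MB
```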






From:
Jim Nasby
Date:

On May 4, 2007, at 12:11 PM, Josh Berkus wrote:
> Sebastian,
>> Before inventing a hyper tool, we might consider providing 3-5 example
>> scenarios for common hardware configurations. This consumes less time and
>> could be discussed and defined in a couple of days. This is of course not
>> the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
>> system, but those are probably not the target for default configurations.
>
> That's been suggested a number of times, but some GUCs are really
> tied to the
> *exact* amount of RAM you have available.  So I've never seen how
> "example
> configurations" could help.

Uh... what GUCs are that exacting on the amount of memory? For a
decent, base-line configuration, that is.
--
Jim Nasby                                            
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



From:
Andreas Kostyrka
Date:


Only some problems that come to my mind with this:

a) Hardware is sometimes changed behind the scenes without telling the
customer. Even for server-level hardware. (Been there.)

b) Hardware recommendations would get stale quickly. What use is a
hardware spec that names particular Xeon models once the supply dries up?
(The example is not contrived; certain versions of PG and Xeons under
certain usage patterns don't work that well together. Google for
"context switch storms".)

c) All of that depends on the PG version too, so with every new version
somebody would have to re-verify that the recommendations are still valid.
(Big example: partitioned tables got much better support in recent
versions, so a setup that anticipated seqscans over big tables might
suddenly perform way better. OTOH, there are sometimes performance
regressions as well.)

d) And to add insult to injury, all that tuning (hardware and software
side) is sensitive to your workload. Before you start yelling: have you
ever rolled back an application version because you noticed what
stupidities the developers had added? (And yes, you can try to avoid this
by adding better staging to your processes, but it's really, really hard
to set up a staging environment that has the same performance
characteristics as production.)

So, while it's a nice idea to have a set of recommended hardware setups,
I don't see much of a point. What makes a sensible database server is not
exactly a secret. Sizing is slightly harder. And after that one enters the
realm of fine-tuning the complete system, which does not end at the socket
on port 5432.

Andreas

Jim Nasby wrote:
> On May 4, 2007, at 12:11 PM, Josh Berkus wrote:
>> Sebastian,
>>> Before inventing a hyper tool, we might consider providing 3-5 example
>>> scenarios for common hardware configurations. This consumes less time and
>>> could be discussed and defined in a couple of days. This is of course not
>>> the correct option for a brand-new 20-spindle SATA 10,000 RPM RAID 10
>>> system, but those are probably not the target for default configurations.
>>
>> That's been suggested a number of times, but some GUCs are really tied
>> to the
>> *exact* amount of RAM you have available.  So I've never seen how
>> "example
>> configurations" could help.
>
> Uh... what GUCs are that exacting on the amount of memory? For a decent,
> base-line configuration, that is.
> --
> Jim Nasby                                            
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>
>
>