Обсуждение: Changing pg_type records

Поиск
Список
Период
Сортировка

Changing pg_type records

От
"Rob Richardson"
Дата:
Greetings!
 
I just discovered the existence of a couple of functions I don't understand in one customer's PostgreSQL database:
 
-- Function: c_mode()
 
-- DROP FUNCTION c_mode();
 
CREATE OR REPLACE FUNCTION c_mode()
  RETURNS text AS
$BODY$  UPDATE pg_type SET typoutput='c_textout'    WHERE typname='SET';
 
      UPDATE pg_type SET typoutput='c_varcharout' WHERE typname='bpchar';
 
      UPDATE pg_type SET typoutput='c_textout'    WHERE typname='bytea';
 
      UPDATE pg_type SET typoutput='c_charout'    WHERE typname='char';
 
      UPDATE pg_type SET typoutput='c_textout'    WHERE typname='text';
 
      UPDATE pg_type SET typoutput='c_textout'    WHERE typname='unknown';
 
      UPDATE pg_type SET typoutput='c_varcharout' WHERE typname='varchar';
 
      select 'c_mode'::text;$BODY$
  LANGUAGE 'sql' VOLATILE;
ALTER FUNCTION c_mode() OWNER TO postgres;
-- Function: pg_mode()
 
-- DROP FUNCTION pg_mode();
 
CREATE OR REPLACE FUNCTION pg_mode()
  RETURNS text AS
$BODY$  UPDATE pg_type SET typoutput='textout'    WHERE typname='SET';
 
      UPDATE pg_type SET typoutput='varcharout' WHERE typname='bpchar';
 
      UPDATE pg_type SET typoutput='textout'    WHERE typname='bytea';
 
      UPDATE pg_type SET typoutput='charout'    WHERE typname='char';
 
      UPDATE pg_type SET typoutput='textout'    WHERE typname='text';
 
      UPDATE pg_type SET typoutput='textout'    WHERE typname='unknown';
 
      UPDATE pg_type SET typoutput='varcharout' WHERE typname='varchar';
 
      select 'pg_mode'::text;$BODY$
  LANGUAGE 'sql' VOLATILE;
ALTER FUNCTION pg_mode() OWNER TO postgres;
This customer has demanded extensive modifications to their database and our application.  I think these functions are used in encryption, since there are several other non-standard functions in this database that are related to encryption.
 
Our lead developer is in another state, and is naturally very uncommunicative.  When I asked him about the c_mode() function and expressed concern that changing pg_type records might not be a good idea, his reply was:
 
"Ignore it,  it is a database contrib routine for string I/O.   It doesn't change the database properties."
 
Are these functions really as innocuous as he claims?  It seems to me that between the time c_mode() is called and the time that pg_mode() is called, any other database access is likely to be confused. 
 
Thank you very much.
 
RobR
 

Re: Changing pg_type records

От
Tom Lane
Дата:
"Rob Richardson" <Rob.Richardson@rad-con.com> writes:
> I just discovered the existence of a couple of functions I don't
> understand in one customer's PostgreSQL database:

What do the substitute output functions do differently?

> UPDATE pg_type SET typoutput='c_textout'    WHERE typname='SET';

This one has been a no-op since 7.4, because there is no longer any such
row in pg_type; and I'm pretty sure the row was vestigial long before that.
If the underlying code is old enough that it was designed when changing
that row did something interesting, it certainly all needs a fresh look
anyway...

> Are these functions really as innocuous as he claims?  It seems to me
> that between the time c_mode() is called and the time that pg_mode() is
> called, any other database access is likely to be confused.

Yes, you definitely are taking big risks there.  Aside from the question
of how data appears to the client, this could bollix the behavior of
most PL languages too.

            regards, tom lane

Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
Hi.  I'm writing up a database comparison paper in my department at
work, with Postgres being a major candidate.  I have been attempting to
research various issues and provide a meaningful comparison.

One issue I would like to give some kind of information on is
comparative performance.  When I look for such things in Google or
Wikipedia or the pgsql email archives, it's hard to find anything
reasonably definitive.  I've found isolated claims and anecdotes here
and there, and a fellow on the list here who attempted to do a
comparison between Postgres, MySQL, and Oracle but gave it up for now.

Some of the claims I've seen said that in some cases MySQL with MyISAM
ran 2x faster than Postgres, but that may have been for a special case
with only read access to the database;  whereas another one claimed that
MySQL with InnoDB was slower than Postgres.  Other people commented that
it depends on how you tune the databases.

Maybe there's nothing definitive out there.  However I'd like to get a
ballpark idea of how some databases compare, using some kind of average
case schema and application, in terms of transactions per second, on a
common hardware platform.  I would like to be able to point to a
reasonable reference, rather than engaging in handwaving myself.

Does anyone know where I could look?

-Will


Re: Is there a meaningful benchmark?

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Will Rutherdale (rutherw)
> Sent: Thursday, March 19, 2009 1:32 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Is there a meaningful benchmark?
>
> Hi.  I'm writing up a database comparison paper in my department at
> work, with Postgres being a major candidate.  I have been attempting
to
> research various issues and provide a meaningful comparison.
>
> One issue I would like to give some kind of information on is
> comparative performance.  When I look for such things in Google or
> Wikipedia or the pgsql email archives, it's hard to find anything
> reasonably definitive.  I've found isolated claims and anecdotes here
> and there, and a fellow on the list here who attempted to do a
> comparison between Postgres, MySQL, and Oracle but gave it up for now.
>
> Some of the claims I've seen said that in some cases MySQL with MyISAM
> ran 2x faster than Postgres, but that may have been for a special case
> with only read access to the database;  whereas another one claimed
> that
> MySQL with InnoDB was slower than Postgres.  Other people commented
> that
> it depends on how you tune the databases.
>
> Maybe there's nothing definitive out there.  However I'd like to get a
> ballpark idea of how some databases compare, using some kind of
average
> case schema and application, in terms of transactions per second, on a
> common hardware platform.  I would like to be able to point to a
> reasonable reference, rather than engaging in handwaving myself.
>
> Does anyone know where I could look?

The only way to get an answer to a question like this is to actually
benchmark the application you have in mind.
And the answer won't be very good unless you have an expert on each
given system install and tune the application.

There is a regular benchmark that is run against the PostgreSQL
database.  I don't remember where to find the graphs.  Probably, someone
on the list can tell us the location.


Here are some benchmark figures:
http://tweakers.net/reviews/657/6
http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-2007
0606-00065.html
http://www.informationweek.com/news/software/linux/showArticle.jhtml?art
icleID=201001901
http://www.kaltenbrunner.cc/blog/index.php?/archives/21-guid.html
http://benchw.sourceforge.net/benchw_results_open3.html


P.S.
PostgreSQL seems to scale pretty well:
http://www.computerworld.com/action/article.do?command=viewArticleBasic&
taxonomyId=18&articleId=9087918&intsrc=hm_topic


My opinion:
Most benchmarks are run by someone with an axe to grind.  I never
believe them.  The TPC benchmarks are probably the most trustworthy,
because they have to be certified.  But a fast TPC/whatever benchmark is
no guarantee that *your* application will run fast.  So if you want to
evaluation several different technologies do your own benchmark.  Do
your own calculations to find out the total cost of ownership over the
lifetime of the project.  Examine all the features that are available,
and what kind of technical support is possible.  Consider the impact of
product licensing.  What happens if you need to scale up to titanic
volume?  After you have thought all factors over very carefully, make
your choice.

If you rely on someone else to do the work for you, it's really begging
for trouble.  MySQL guys will show you why MySQL is faster.  PostgreSQL
guys will show you why PostgreSQL is faster.  Oracle guys will show you
why Oracle is faster.  SQL*Server guys will show you why SQL*Server is
faster.  DB/2 guys will show you why DB/2 is faster.  Now, none of them
are lying (at least hopefully) but they are experts in their own domain
and not in the domain of the other product and they are also going to
choose those tight little corners where their product has the biggest
advantage.

IMO-YMMV.
P.S.
I'm a PostgreSQL fan and so I am likely to (perhaps unconsciously) favor
PostgreSQL in my remarks.


Re: Is there a meaningful benchmark?

От
John Cheng
Дата:
Comparison between MySQL using the MyISAM engine with PostgreSQL is really not sensible. For one, the MyISAM engine does not have transaction and foreign key support, while PostgreSQL supports transaction and foreign key. Would anyone really give up transaction and integrity for slightly more performance?

So if it makes your research easier, you can rule MySQL w/MyISAM out as an option.

On Thu, Mar 19, 2009 at 1:32 PM, Will Rutherdale (rutherw) <rutherw@cisco.com> wrote:
Hi.  I'm writing up a database comparison paper in my department at
work, with Postgres being a major candidate.  I have been attempting to
research various issues and provide a meaningful comparison.

One issue I would like to give some kind of information on is
comparative performance.  When I look for such things in Google or
Wikipedia or the pgsql email archives, it's hard to find anything
reasonably definitive.  I've found isolated claims and anecdotes here
and there, and a fellow on the list here who attempted to do a
comparison between Postgres, MySQL, and Oracle but gave it up for now.

Some of the claims I've seen said that in some cases MySQL with MyISAM
ran 2x faster than Postgres, but that may have been for a special case
with only read access to the database;  whereas another one claimed that
MySQL with InnoDB was slower than Postgres.  Other people commented that
it depends on how you tune the databases.

Maybe there's nothing definitive out there.  However I'd like to get a
ballpark idea of how some databases compare, using some kind of average
case schema and application, in terms of transactions per second, on a
common hardware platform.  I would like to be able to point to a
reasonable reference, rather than engaging in handwaving myself.

Does anyone know where I could look?

-Will


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
- John L Cheng

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 3:16 PM, Dann Corbit <DCorbit@connx.com> wrote:
> Here are some benchmark figures:
> http://tweakers.net/reviews/657/6

SNIP

> My opinion:
> Most benchmarks are run by someone with an axe to grind.  I never
> believe them.

Generally I agree with that sentiment.  However, it's interesting that
the guys at tweakers.net run (or ran?) their chat board on mysql, and
expected it to win.  They certainly didn't expect the kind of stomping
that postgresql gave to mysql, so I look at that benchmark as a
"statement against interest" kind of thing.  They were genuinely
surprised that postgresql didn't just win, but behaved much much
better than mysql under heavier loads.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
It isn't actually possible at this stage for me to benchmark "the
application" because it doesn't yet exist.  There are a number of
potential projects floating around, with as yet unwritten
specifications, to run on different platforms ranging from embedded to
larger servers.  People just want to hear what I think is a good RDBMS
to use.  My opinion won't necessarily be followed.

Nobody at this point is expecting the RDBMS to become a bottleneck, if
they are planning to actually use one at all.  However someone is sure
to ask the question, for an average application with an average
database, how is performance?

Even if such a question is answered, it isn't going to be the only
factor.  For example I have collected reasonable numbers already on
footprints of different RDBMSs, because embedded guys might find that
important if they're restricted to 64MB of flash.  On the other hand if
they went with some of the newer solid state drives with gigs of space,
then a few packages using 10s of MB wouldn't be such a problem any more.

In short, all bets are off and I'm just looking for baseline
information.  This is just a general feasibility and technology
exploration phase.

I'm aware of the limitations of hard numbers, but the more simple
information I have in different dimensions, the easier it is to convince
people not to lock in too early.

Thanks for the info, I'll check some of those references.

-Will


-----Original Message-----
From: Dann Corbit [mailto:DCorbit@connx.com]
Sent: 19 March 2009 17:16
To: Will Rutherdale (rutherw); pgsql-general@postgresql.org
Subject: RE: [GENERAL] Is there a meaningful benchmark?

The only way to get an answer to a question like this is to actually
benchmark the application you have in mind.
And the answer won't be very good unless you have an expert on each
given system install and tune the application.

There is a regular benchmark that is run against the PostgreSQL
database.  I don't remember where to find the graphs.  Probably, someone
on the list can tell us the location.


Here are some benchmark figures:
http://tweakers.net/reviews/657/6
http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-2007
0606-00065.html
http://www.informationweek.com/news/software/linux/showArticle.jhtml?art
icleID=201001901
http://www.kaltenbrunner.cc/blog/index.php?/archives/21-guid.html
http://benchw.sourceforge.net/benchw_results_open3.html


P.S.
PostgreSQL seems to scale pretty well:
http://www.computerworld.com/action/article.do?command=viewArticleBasic&
taxonomyId=18&articleId=9087918&intsrc=hm_topic


My opinion:
Most benchmarks are run by someone with an axe to grind.  I never
believe them.  The TPC benchmarks are probably the most trustworthy,
because they have to be certified.  But a fast TPC/whatever benchmark is
no guarantee that *your* application will run fast.  So if you want to
evaluation several different technologies do your own benchmark.  Do
your own calculations to find out the total cost of ownership over the
lifetime of the project.  Examine all the features that are available,
and what kind of technical support is possible.  Consider the impact of
product licensing.  What happens if you need to scale up to titanic
volume?  After you have thought all factors over very carefully, make
your choice.

If you rely on someone else to do the work for you, it's really begging
for trouble.  MySQL guys will show you why MySQL is faster.  PostgreSQL
guys will show you why PostgreSQL is faster.  Oracle guys will show you
why Oracle is faster.  SQL*Server guys will show you why SQL*Server is
faster.  DB/2 guys will show you why DB/2 is faster.  Now, none of them
are lying (at least hopefully) but they are experts in their own domain
and not in the domain of the other product and they are also going to
choose those tight little corners where their product has the biggest
advantage.

IMO-YMMV.
P.S.
I'm a PostgreSQL fan and so I am likely to (perhaps unconsciously) favor
PostgreSQL in my remarks.


Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 3:50 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> Even if such a question is answered, it isn't going to be the only
> factor.  For example I have collected reasonable numbers already on
> footprints of different RDBMSs, because embedded guys might find that
> important if they're restricted to 64MB of flash.  On the other hand if
> they went with some of the newer solid state drives with gigs of space,
> then a few packages using 10s of MB wouldn't be such a problem any more.

If you're looking at embedded usage, and footprint is an issue (it
usually is even if you think it won't be) look at sqllite.  Pretty
good embedded db and lightweight.  Pgsql is not intended to compete in
the embedded space.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
I am *not* primarily interested in embedded, but I know people who are, and I have already compared with SQLite.

My main point of concern right now is for more middle sized platforms (such as an average workstation), to be able to
answerthe question of how Postgres shows in transactions per second against another RDBMS or two.   

-Will


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 19 March 2009 17:57
To: Will Rutherdale (rutherw)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

On Thu, Mar 19, 2009 at 3:50 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> Even if such a question is answered, it isn't going to be the only
> factor.  For example I have collected reasonable numbers already on
> footprints of different RDBMSs, because embedded guys might find that
> important if they're restricted to 64MB of flash.  On the other hand if
> they went with some of the newer solid state drives with gigs of space,
> then a few packages using 10s of MB wouldn't be such a problem any more.

If you're looking at embedded usage, and footprint is an issue (it
usually is even if you think it won't be) look at sqllite.  Pretty
good embedded db and lightweight.  Pgsql is not intended to compete in
the embedded space.

Re: Is there a meaningful benchmark?

От
"Dann Corbit"
Дата:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Scott Marlowe
> Sent: Thursday, March 19, 2009 2:57 PM
> To: Will Rutherdale (rutherw)
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Is there a meaningful benchmark?
>
> On Thu, Mar 19, 2009 at 3:50 PM, Will Rutherdale (rutherw)
> <rutherw@cisco.com> wrote:
> > Even if such a question is answered, it isn't going to be the only
> > factor.  For example I have collected reasonable numbers already on
> > footprints of different RDBMSs, because embedded guys might find that
> > important if they're restricted to 64MB of flash.  On the other hand
> if
> > they went with some of the newer solid state drives with gigs of
> space,
> > then a few packages using 10s of MB wouldn't be such a problem any
> more.
>
> If you're looking at embedded usage, and footprint is an issue (it
> usually is even if you think it won't be) look at sqllite.  Pretty
> good embedded db and lightweight.  Pgsql is not intended to compete in
> the embedded space.

FastDB is another good option there (it's a portable, embedded memory mapped database):
http://www.garret.ru/fastdb.html

An advantage for the SQLite option is that the grammar is a subset of PostgreSQL grammar, so if you need to scale up,
youhave a ready path. 


Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
I'm having trouble with the tweakers reference below.

I was hoping to see something where hardware platform is held constant while RDBMS is varied, but it seems to be just
theopposite.  Or maybe I didn't read the article the right way. 

-Will


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 19 March 2009 17:36
To: Dann Corbit
Cc: Will Rutherdale (rutherw); pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

On Thu, Mar 19, 2009 at 3:16 PM, Dann Corbit <DCorbit@connx.com> wrote:
> Here are some benchmark figures:
> http://tweakers.net/reviews/657/6

SNIP

> My opinion:
> Most benchmarks are run by someone with an axe to grind.  I never
> believe them.

Generally I agree with that sentiment.  However, it's interesting that
the guys at tweakers.net run (or ran?) their chat board on mysql, and
expected it to win.  They certainly didn't expect the kind of stomping
that postgresql gave to mysql, so I look at that benchmark as a
"statement against interest" kind of thing.  They were genuinely
surprised that postgresql didn't just win, but behaved much much
better than mysql under heavier loads.

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 4:02 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> I am *not* primarily interested in embedded, but I know people who are, and I have already compared with SQLite.
>
> My main point of concern right now is for more middle sized platforms (such as an average workstation), to be able to
answerthe question of how Postgres shows in transactions per second against another RDBMS or two. 
>

Way back right after the earth's crust had cooled I deployed a
corporate intranet on pgsql 6.5.2.  I believe we had Wooly Mammoth for
dinner that night.  It took a fair bit of work to keep that machine
happy, and every new version was an eye opener in terms of
performance, reliability, and capability improvements.  By the time
8.0 came out I was more than prepared to use it for some pretty hefty
work.  Now that 8.3 is out and 8.4 is out, for any kind of
intermediate size application (thousands of users daily, hundreds of
gigs of data) I'll put PostgreSQL against any other DBMS and expect it
to do well.

Where I work now, we handle 1.5 million or so users, a large chunk of
which log in each day, several times a day.  We had to beef up our
servers to 8 core / 16 drive machines (two of them) to handle the load
reliably.  We have enough spare capacity to handle about 3 to 4 times
the number of users we now have.

You are far more likely to be bitten by lack of familiarity with ANY
db you choose, and the real issue will be support and training.
Oracle, Pgsql, DB2, Interbase / Firebird, MySQL w/ innodb can all
handle the kind of load you've kind of hand waved about here.  How
much money and time you'll spend setting them up, learning them,
supporting them, and using them will be far more important than
anything a benchmark is likely to tell you right now.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:

I am already aware of this issue, and am preparing to explain it to people.

 

Having said that,  if it were possible to set up a reasonably average database, with a test application that hits it with a reasonable mix of select, insert, and update operations, and run it one at a time against different RDBMSs on the same machine, then it might yield some simple numbers that could be quoted to people in case they asked.

 

The goal is not to absolutely determine which is fastest in the made-up scenario, I don’t think anyone cares.  However it would be interesting to see if the different RDBMSs came in within a reasonable percentage of each other.

 

An analogy would be BogoMIPS.  Nobody takes it that seriously because they know there are numerous factors that affect how a machine runs under different applications.  But as a quick sanity check BogoMIPS can be useful at times.

 

-Will

 

 


From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John Cheng
Sent: 19 March 2009 17:27
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

 

Comparison between MySQL using the MyISAM engine with PostgreSQL is really not sensible. For one, the MyISAM engine does not have transaction and foreign key support, while PostgreSQL supports transaction and foreign key. Would anyone really give up transaction and integrity for slightly more performance?

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 3:56 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> I'm having trouble with the tweakers reference below.
>
> I was hoping to see something where hardware platform is held constant while RDBMS is varied, but it seems to be just
theopposite.  Or maybe I didn't read the article the right way. 

The tweakers test is actually VERY telling.  With a constant load, how
well do the database scale as you improve the hardware you've given
them to run on.  Hardware is cheap, downtime is not.  If you can toss
a 16 core server at a performance problem for $20,000 or so, that's
probably way cheaper than watching your main db chug under load and go
down twice a day.

Conversely, telling the bossman you need that 16 core server to
improve performance and seeing the new server collapse under load
faster than the old one due to poor concurrency is not gonna win you a
lot of brownie points.

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 4:11 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> I am already aware of this issue, and am preparing to explain it to people.

Well, keep in mind that MOST people are gonna wave you off, and figure
it doesn't matter that much.  Lots of developers are pretty cavalier
with their user's data.  Now, if the data is easily replaceable, then
fine.  Put a price on the data.  Pay attention to that price, and
don't underestimate it.  It's usually the most expensive thing on a db
server.

> Having said that,  if it were possible to set up a reasonably average
> database, with a test application that hits it with a reasonable mix of
> select, insert, and update operations, and run it one at a time against
> different RDBMSs on the same machine, then it might yield some simple
> numbers that could be quoted to people in case they asked.

But will you be comparing apples to apples.  MySQL plays fast and
loose with data integrity (with myisam tables).  So at least make sure
you're comparing what you need to compare.  If data integrity is
important, you need to use innodb.  If it isn't, then just use flat
files.

> The goal is not to absolutely determine which is fastest in the made-up
> scenario, I don’t think anyone cares.  However it would be interesting to
> see if the different RDBMSs came in within a reasonable percentage of each
> other.

But again, if you're comparing a Go kart to a pickup truck, you need
to know that's what you're doing.  The abstract numbers mean little
outside of that fact.

> An analogy would be BogoMIPS.  Nobody takes it that seriously because they
> know there are numerous factors that affect how a machine runs under
> different applications.  But as a quick sanity check BogoMIPS can be useful
> at times.

Sorry, but I respectfully disagree.  Bogomips has the word bogo in it
on purpose.  It means, literally, almost nothing useful to the user.

Any modern db is fast enough for the kind of low level stuff you've
mentioned so far.  Until you have a better idea what your
application(s) might look like, it's hard to offer any real advice
besides "avoid myisam"

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 3:26 PM, John Cheng <chonger.cheng@gmail.com> wrote:
> Comparison between MySQL using the MyISAM engine with PostgreSQL is really
> not sensible. For one, the MyISAM engine does not have transaction and
> foreign key support, while PostgreSQL supports transaction and foreign key.
> Would anyone really give up transaction and integrity for slightly more
> performance?

Sure, if the application fit.  If I have to load 100Meg files into a
db, run some simple extraction on them, and output it back out, and
can recreate all my data at the drop of a hat, then mysql / myisam
might be a good match.

I am no a fan of MySQL, more because the company behind it seems to be
lost and drifting than the db itself.  Bugs that are 5 years old
finally getting fixed after Monty chided them in december?  Come on,
PostgreSQL comes out with a near bug free new version every 1 to 2
years.  PostgreSQL stomps bugs in hours or days that MySQL AB takes
YEARS to fix, and then get reintroduced (see the order by on innodb
indexed field debacle for that story) and I just don't trust the
company or the db for anything complex.  But as a tool it has some
uses that it's good enough at I could use it if I had to.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
Okay, you've given me some useful information.

As the original subject line indicates, I'm open to the idea that no such benchmark exists.

If anyone asks about this stuff, I can just say that performance varies widely by database and application, that
Postgresperforms well enough against other RDBMSs, that Postgres is known to scale up well and make good use of
concurrency,and that I couldn't find any clear benchmark results to back it up. 

Of course, if I *did* find any benchmark values then I could have used that to dispel false rumours from the MySQL
guys. However it looks like simple measured indicators aren't easy to come by. 

-Will


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 19 March 2009 18:14
To: Will Rutherdale (rutherw)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

On Thu, Mar 19, 2009 at 3:56 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> I'm having trouble with the tweakers reference below.
>
> I was hoping to see something where hardware platform is held constant while RDBMS is varied, but it seems to be just
theopposite.  Or maybe I didn't read the article the right way. 

The tweakers test is actually VERY telling.  With a constant load, how
well do the database scale as you improve the hardware you've given
them to run on.  Hardware is cheap, downtime is not.  If you can toss
a 16 core server at a performance problem for $20,000 or so, that's
probably way cheaper than watching your main db chug under load and go
down twice a day.

Conversely, telling the bossman you need that 16 core server to
improve performance and seeing the new server collapse under load
faster than the old one due to poor concurrency is not gonna win you a
lot of brownie points.

Re: Is there a meaningful benchmark?

От
"Dann Corbit"
Дата:
Here is another interesting benchmark with a particular user's
application:
http://blog.page2rss.com/2007/01/postgresql-vs-mysql-performance.html

P.S.
Oracle won't let you publish any benchmark numbers.
So if you find an Oracle comparison, it's "unauthorized"


Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 19, 2009 at 4:24 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> Okay, you've given me some useful information.
>
> As the original subject line indicates, I'm open to the idea that no such benchmark exists.
>
> If anyone asks about this stuff, I can just say that performance varies widely by database and application, that
Postgresperforms well enough against other RDBMSs, that Postgres is known to scale up well and make good use of
concurrency,and that I couldn't find any clear benchmark results to back it up. 
>
> Of course, if I *did* find any benchmark values then I could have used that to dispel false rumours from the MySQL
guys. However it looks like simple measured indicators aren't easy to come by. 

Well, the tweakers benchmarks are a pretty good mysql pgsql
comparison, although getting older now.  Apparently mysql has improved
a lot of their concurrency issues that were uncovered in that one.
Pgsql has just continued to get faster.

Google MySQL gotchas for a list of reasons to avoid it.  There's a
pgsql gotcha list, it's shorter, and the gotchas are a lot less likely
to induce a case of the vapors.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
This looks similar to things I've seen before.  MyISAM can be made to
look twice as fast as Postgres if the application is cooked to throw
away transaction processing, updates, and referential integrity, none of
which MyISAM seems to support well.

I plan to make a point of this to people, as I personally have
experience working with RDBMSs in the past and understand the importance
of these capabilities.  However not everyone I talk to will have any
experiences with databases and understand the issues.

That's why I was looking for a more balanced benchmark that exercises
said capabilities.

-Will


-----Original Message-----
From: Dann Corbit [mailto:DCorbit@connx.com]
Sent: 19 March 2009 18:26
To: Scott Marlowe; Will Rutherdale (rutherw)
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] Is there a meaningful benchmark?

Here is another interesting benchmark with a particular user's
application:
http://blog.page2rss.com/2007/01/postgresql-vs-mysql-performance.html

Re: Is there a meaningful benchmark?

От
ries van Twisk
Дата:
On Mar 19, 2009, at 4:50 PM, Will Rutherdale (rutherw) wrote:

> It isn't actually possible at this stage for me to benchmark "the
> application" because it doesn't yet exist.  There are a number of
> potential projects floating around, with as yet unwritten
> specifications, to run on different platforms ranging from embedded to
> larger servers.  People just want to hear what I think is a good RDBMS
> to use.  My opinion won't necessarily be followed.

If you name embedded... and you are even thinking of using MySQl,
don't forget you need to sell a MySQL license for each product you
sell to a customer.

First know what your requirements are, then strip away the systems that
don't comply with your requirements. Then think of what sort of
transactions
you require.. then check what RDBM might be good for you... for the
couple that are
left over do the tests...

Ries


>
> Nobody at this point is expecting the RDBMS to become a bottleneck, if
> they are planning to actually use one at all.  However someone is sure
> to ask the question, for an average application with an average
> database, how is performance?
>
> Even if such a question is answered, it isn't going to be the only
> factor.  For example I have collected reasonable numbers already on
> footprints of different RDBMSs, because embedded guys might find that
> important if they're restricted to 64MB of flash.  On the other hand
> if
> they went with some of the newer solid state drives with gigs of
> space,
> then a few packages using 10s of MB wouldn't be such a problem any
> more.
>
> In short, all bets are off and I'm just looking for baseline
> information.  This is just a general feasibility and technology
> exploration phase.
>
> I'm aware of the limitations of hard numbers, but the more simple
> information I have in different dimensions, the easier it is to
> convince
> people not to lock in too early.
>
> Thanks for the info, I'll check some of those references.
>
> -Will
>
>
> -----Original Message-----
> From: Dann Corbit [mailto:DCorbit@connx.com]
> Sent: 19 March 2009 17:16
> To: Will Rutherdale (rutherw); pgsql-general@postgresql.org
> Subject: RE: [GENERAL] Is there a meaningful benchmark?
>
> The only way to get an answer to a question like this is to actually
> benchmark the application you have in mind.
> And the answer won't be very good unless you have an expert on each
> given system install and tune the application.
>
> There is a regular benchmark that is run against the PostgreSQL
> database.  I don't remember where to find the graphs.  Probably,
> someone
> on the list can tell us the location.
>
>
> Here are some benchmark figures:
> http://tweakers.net/reviews/657/6
> http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-2007
> 0606-00065.html
> http://www.informationweek.com/news/software/linux/showArticle.jhtml?art
> icleID=201001901
> http://www.kaltenbrunner.cc/blog/index.php?/archives/21-guid.html
> http://benchw.sourceforge.net/benchw_results_open3.html
>
>
> P.S.
> PostgreSQL seems to scale pretty well:
> http://www.computerworld.com/action/article.do?command=viewArticleBasic&
> taxonomyId=18&articleId=9087918&intsrc=hm_topic
>
>
> My opinion:
> Most benchmarks are run by someone with an axe to grind.  I never
> believe them.  The TPC benchmarks are probably the most trustworthy,
> because they have to be certified.  But a fast TPC/whatever
> benchmark is
> no guarantee that *your* application will run fast.  So if you want to
> evaluation several different technologies do your own benchmark.  Do
> your own calculations to find out the total cost of ownership over the
> lifetime of the project.  Examine all the features that are available,
> and what kind of technical support is possible.  Consider the impact
> of
> product licensing.  What happens if you need to scale up to titanic
> volume?  After you have thought all factors over very carefully, make
> your choice.
>
> If you rely on someone else to do the work for you, it's really
> begging
> for trouble.  MySQL guys will show you why MySQL is faster.
> PostgreSQL
> guys will show you why PostgreSQL is faster.  Oracle guys will show
> you
> why Oracle is faster.  SQL*Server guys will show you why SQL*Server is
> faster.  DB/2 guys will show you why DB/2 is faster.  Now, none of
> them
> are lying (at least hopefully) but they are experts in their own
> domain
> and not in the domain of the other product and they are also going to
> choose those tight little corners where their product has the biggest
> advantage.
>
> IMO-YMMV.
> P.S.
> I'm a PostgreSQL fan and so I am likely to (perhaps unconsciously)
> favor
> PostgreSQL in my remarks.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



            regards, Ries van Twisk


-------------------------------------------------------------------------------------------------
Ries van Twisk
tags: Freelance TYPO3 Glassfish JasperReports JasperETL Flex Blaze-DS
WebORB PostgreSQL DB-Architect
email: ries@vantwisk.nl
web:   http://www.rvantwisk.nl/
skype: callto://r.vantwisk
Phone: +1-810-476-4196
SIP: +1-747-690-5133







Вложения

Re: Is there a meaningful benchmark?

От
John Cheng
Дата:


On Thu, Mar 19, 2009 at 3:23 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Mar 19, 2009 at 3:26 PM, John Cheng <chonger.cheng@gmail.com> wrote:
> Comparison between MySQL using the MyISAM engine with PostgreSQL is really
> not sensible. For one, the MyISAM engine does not have transaction and
> foreign key support, while PostgreSQL supports transaction and foreign key.
> Would anyone really give up transaction and integrity for slightly more
> performance?

Sure, if the application fit.  If I have to load 100Meg files into a
db, run some simple extraction on them, and output it back out, and
can recreate all my data at the drop of a hat, then mysql / myisam
might be a good match.

Right . There will be situations where MySQL with MyISAM will be a good fit. My point was more that it doesn't make sense to simply compare "speed" becuase other things needs to be taken into account.



I am no a fan of MySQL, more because the company behind it seems to be
lost and drifting than the db itself.  Bugs that are 5 years old
finally getting fixed after Monty chided them in december?  Come on,
PostgreSQL comes out with a near bug free new version every 1 to 2
years.  PostgreSQL stomps bugs in hours or days that MySQL AB takes
YEARS to fix, and then get reintroduced (see the order by on innodb
indexed field debacle for that story) and I just don't trust the
company or the db for anything complex.  But as a tool it has some
uses that it's good enough at I could use it if I had to.



--
- John L Cheng

Re: Is there a meaningful benchmark?

От
Greg Williamson
Дата:
Dann Corbit wrote:



>
> Here is another interesting benchmark with a particular user's
> application:
> http://blog.page2rss.com/2007/01/postgresql-vs-mysql-performance.html
>
> P.S.
> Oracle won't let you publish any benchmark numbers.
> So if you find an Oracle comparison, it's "unauthorized"

True enough. That said I feel comfortable in revealing that a former employer of mine ran some serious tests of
PostgreSQL vs Oracle (with an emphasis of postGIS and Oracle's equivalent) about a year and a half ago. 

Oracle was consistently 5-15% faster depending on the precise benchmark. This was judged to be not worth the extra
moneyfor more Oracle licenses. In some other environments that edge might be worth the money. Oracle does spend
resourceson its products and so I don't find the speed difference surprising. But when you consider the speed with
whichOracle produces patches vs. the Postgres folks the winner is clearly the latter. 

Greg Williamson





Re: Is there a meaningful benchmark?

От
Scott Ribe
Дата:
> That's why I was looking for a more balanced benchmark that exercises
> said capabilities.

OK, here's the thing, I will give you *one* sample issue to think about, as
an illustration of the kinds of differences there are.

- PostgresQL uses MVCC instead of row or page locking, which means only
writer vs writer locks, which means in many situations less contention and
better throughput as your concurrent load goes up.

- But it also means slower count(*) and no covering indexes. (Keeping
multiple versions of index pages would be too slow for many other
operations, so PG has go to the table pages to check which rows are actually
visible in the current transaction.)

See? Overall, PG performs very well and is certainly comparable to the "big
boys" on the same hardware. But the strengths & weaknesses can be *very*
specific to particular queries.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
You have a point, as do a lot of the other folks.

However, keeping the KISS principle in mind, you can create a benchmark
that simply sets up a sample database and forks off a bunch of processes
to do random updates for an hour, say.  Dead simple.

In fact, it's so simple that I've already written the code and have it
running against Postgres now.  A Perl DBI script runs in a loop
updating, and later prints out the number of transactions it completed
in the given time frame.  At the end I just tally up the numbers and I
have the Will Rutherdale benchmark number for Postgres.  It will give me
a simple number in units of transactions per second.

When I get time I'll set up an equivalent MySQL database on the same
machine and run it against that.

I have 'top' running in one screen and can see all the copies of my
script and all the copies of postmaster and the CPU they're using.

The degree to which I got excoriated just for asking the question
convinced me that I was onto something and had to do it myself.

-Will


-----Original Message-----
From: Scott Ribe [mailto:scott_ribe@killerbytes.com]
Sent: 20 March 2009 11:27
To: Will Rutherdale (rutherw); pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

> That's why I was looking for a more balanced benchmark that exercises
> said capabilities.

OK, here's the thing, I will give you *one* sample issue to think about,
as
an illustration of the kinds of differences there are.

- PostgresQL uses MVCC instead of row or page locking, which means only
writer vs writer locks, which means in many situations less contention
and
better throughput as your concurrent load goes up.

- But it also means slower count(*) and no covering indexes. (Keeping
multiple versions of index pages would be too slow for many other
operations, so PG has go to the table pages to check which rows are
actually
visible in the current transaction.)

See? Overall, PG performs very well and is certainly comparable to the
"big
boys" on the same hardware. But the strengths & weaknesses can be *very*
specific to particular queries.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Fri, Mar 20, 2009 at 9:55 AM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> You have a point, as do a lot of the other folks.
>
> However, keeping the KISS principle in mind, you can create a benchmark
> that simply sets up a sample database and forks off a bunch of processes
> to do random updates for an hour, say.  Dead simple.
>
> In fact, it's so simple that I've already written the code and have it
> running against Postgres now.  A Perl DBI script runs in a loop
> updating, and later prints out the number of transactions it completed
> in the given time frame.  At the end I just tally up the numbers and I
> have the Will Rutherdale benchmark number for Postgres.  It will give me
> a simple number in units of transactions per second.

Just keep in mind that a single thread updating the database is not a
very realistic benchmark.  Databases tend to not get interesting until
there are dozens to hundreds of threads running against it at the same
time.

Re: Is there a meaningful benchmark?

От
Tom Lane
Дата:
"Will Rutherdale (rutherw)" <rutherw@cisco.com> writes:
> However, keeping the KISS principle in mind, you can create a benchmark
> that simply sets up a sample database and forks off a bunch of processes
> to do random updates for an hour, say.  Dead simple.

Indeed, and more than likely dead useless.  The only benchmark that
really counts is one's live application, which is probably not
update-only and probably has a fairly non-random update pattern too.

What people have been trying to point out to you is that you can
certainly measure *something* with a benchmark test that has no thought
behind it, but it's not clear whether the numbers you come up with will
have any real-world value.

            regards, tom lane

Re: Is there a meaningful benchmark?

От
Greg Smith
Дата:
On Thu, 19 Mar 2009, Dann Corbit wrote:

> Oracle won't let you publish any benchmark numbers.
> So if you find an Oracle comparison, it's "unauthorized"

You can find some useful comparisons that include Oracle if you look at
the audited benchmarks from the TPC.  I've collected links to a bunch at

http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Sun_Microsystems_2007_jAppServer2004_Benchmark_Results

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Is there a meaningful benchmark?

От
Greg Smith
Дата:
On Fri, 20 Mar 2009, Will Rutherdale (rutherw) wrote:

> However, keeping the KISS principle in mind, you can create a benchmark
> that simply sets up a sample database and forks off a bunch of processes
> to do random updates for an hour, say.  Dead simple.

There's a benchmark tool that does something like this that comes with
PostgreSQL named pgbench.  A MySQL-oriented tool named sysbench also can
do that, and it supports running against PostgreSQL as well--badly though,
so it's hard to use that to do a fair comparison.

Simple benchmarks tend to measure only one thing though, and it's often
not what you think you're measuring.  For example, pgbench produces a
transactions/per second number.  It's useful for comparing the relative
performance between two PostgreSQL instances, and people think it gives
you an idea of transactional performance.  What the actual magnitude of
the result measures in many cases is instead how well the generated data
set fits in cache.

If you're doing something update heavy, a lot of the time what you
actually will measure is how fast your disk can seek, process a disk
commit done using fsync, or some combination of the two.  If you're not
careful to make sure you're using the same level of disk commit guarantee
on both installations, it's real easy to get bad benchmark results here.
The intro to that subject from the PostgreSQL perspective is at
http://www.postgresql.org/docs/8.3/static/wal-reliability.html

On MySQL, the parameters that controls this behavior are described
starting at
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

For something with lots of disk commits, it's critical that you have both
systems configured identically here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Is there a meaningful benchmark?

От
Paul Ramsey
Дата:
http://blog.redfin.com/devblog/2007/11/elephant_versus_dolphin_which_is_faster_which_is_smarter.html

Here's another PgSQL/MySQL testimonial, with a spatial twist :)

P

On Fri, Mar 20, 2009 at 10:15 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Fri, 20 Mar 2009, Will Rutherdale (rutherw) wrote:
>
>> However, keeping the KISS principle in mind, you can create a benchmark
>> that simply sets up a sample database and forks off a bunch of processes
>> to do random updates for an hour, say.  Dead simple.
>
> There's a benchmark tool that does something like this that comes with
> PostgreSQL named pgbench.  A MySQL-oriented tool named sysbench also can do
> that, and it supports running against PostgreSQL as well--badly though, so
> it's hard to use that to do a fair comparison.
>
> Simple benchmarks tend to measure only one thing though, and it's often not
> what you think you're measuring.  For example, pgbench produces a
> transactions/per second number.  It's useful for comparing the relative
> performance between two PostgreSQL instances, and people think it gives you
> an idea of transactional performance.  What the actual magnitude of the
> result measures in many cases is instead how well the generated data set
> fits in cache.
>
> If you're doing something update heavy, a lot of the time what you actually
> will measure is how fast your disk can seek, process a disk commit done
> using fsync, or some combination of the two.  If you're not careful to make
> sure you're using the same level of disk commit guarantee on both
> installations, it's real easy to get bad benchmark results here. The intro
> to that subject from the PostgreSQL perspective is at
> http://www.postgresql.org/docs/8.3/static/wal-reliability.html
>
> On MySQL, the parameters that controls this behavior are described starting
> at
> http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
>
> For something with lots of disk commits, it's critical that you have both
> systems configured identically here.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
Point taken.

Thank you for the help.

-Will


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 20 March 2009 12:06
To: Will Rutherdale (rutherw)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

"Will Rutherdale (rutherw)" <rutherw@cisco.com> writes:
> However, keeping the KISS principle in mind, you can create a
benchmark
> that simply sets up a sample database and forks off a bunch of
processes
> to do random updates for an hour, say.  Dead simple.

Indeed, and more than likely dead useless.  The only benchmark that
really counts is one's live application, which is probably not
update-only and probably has a fairly non-random update pattern too.

What people have been trying to point out to you is that you can
certainly measure *something* with a benchmark test that has no thought
behind it, but it's not clear whether the numbers you come up with will
have any real-world value.

            regards, tom lane

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
Thanks for the references.

-Will


-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Paul Ramsey
Sent: 20 March 2009 13:29
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

http://blog.redfin.com/devblog/2007/11/elephant_versus_dolphin_which_is_faster_which_is_smarter.html

Here's another PgSQL/MySQL testimonial, with a spatial twist :)

P

On Fri, Mar 20, 2009 at 10:15 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> On Fri, 20 Mar 2009, Will Rutherdale (rutherw) wrote:
>
>> However, keeping the KISS principle in mind, you can create a benchmark
>> that simply sets up a sample database and forks off a bunch of processes
>> to do random updates for an hour, say.  Dead simple.
>
> There's a benchmark tool that does something like this that comes with
> PostgreSQL named pgbench.  A MySQL-oriented tool named sysbench also can do
> that, and it supports running against PostgreSQL as well--badly though, so
> it's hard to use that to do a fair comparison.
>
> Simple benchmarks tend to measure only one thing though, and it's often not
> what you think you're measuring.  For example, pgbench produces a
> transactions/per second number.  It's useful for comparing the relative
> performance between two PostgreSQL instances, and people think it gives you
> an idea of transactional performance.  What the actual magnitude of the
> result measures in many cases is instead how well the generated data set
> fits in cache.
>
> If you're doing something update heavy, a lot of the time what you actually
> will measure is how fast your disk can seek, process a disk commit done
> using fsync, or some combination of the two.  If you're not careful to make
> sure you're using the same level of disk commit guarantee on both
> installations, it's real easy to get bad benchmark results here. The intro
> to that subject from the PostgreSQL perspective is at
> http://www.postgresql.org/docs/8.3/static/wal-reliability.html
>
> On MySQL, the parameters that controls this behavior are described starting
> at
> http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
>
> For something with lots of disk commits, it's critical that you have both
> systems configured identically here.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Fri, Mar 20, 2009 at 9:27 AM, Scott Ribe <scott_ribe@killerbytes.com> wrote:
>> That's why I was looking for a more balanced benchmark that exercises
>> said capabilities.
>
> OK, here's the thing, I will give you *one* sample issue to think about, as
> an illustration of the kinds of differences there are.
>
> - PostgresQL uses MVCC instead of row or page locking, which means only
> writer vs writer locks, which means in many situations less contention and
> better throughput as your concurrent load goes up.

It's also important to point out that writers don't necessarily block
other writers.  As long as they're operating on different ranges of
the data set.  You can have dozens of writers streaming data in with
differening primary keys all running together.

Then of course, you have the ability to have readers block writers
using for update, which turns a reader into a writer-to-(possibly)-be.

A lot of older dbs had locking by the page, not by the record, or as
with myisam, by the whole table.  Page locking lead to records that
shared pages locking each other needlessly.  Table locking leads to
even longer queues forming under heavy write load.   This does NOT
happen in pgsql if you're updating / inserting independent records.
One of the reasons postgres scales so well is it keeps writes 'cheap'
in that they don't have to interact with anything other than writes to
the same records.  If you've got millions of records and thousands
being updated, writes not blocking writes combined with record level
locking versus page level locking (or worse table level locking), pg
can handle pretty high concurrent write loads on the right hardware
and still maintain a good throughput on reads.  And concurrent write
load is what usually cripples a server.

Re: Is there a meaningful benchmark?

От
Raymond O'Donnell
Дата:
On 26/03/2009 23:10, Scott Marlowe wrote:
> It's also important to point out that writers don't necessarily block
> other writers.  As long as they're operating on different ranges of
> the data set.  You can have dozens of writers streaming data in with
> differening primary keys all running together.

Do you have to do anything special to have this happen - e.g. table
partitioning? - Or does it just happen automagically based on the
primary key?

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 26, 2009 at 5:23 PM, Raymond O'Donnell <rod@iol.ie> wrote:
> On 26/03/2009 23:10, Scott Marlowe wrote:
>> It's also important to point out that writers don't necessarily block
>> other writers.  As long as they're operating on different ranges of
>> the data set.  You can have dozens of writers streaming data in with
>> differening primary keys all running together.
>
> Do you have to do anything special to have this happen - e.g. table
> partitioning? - Or does it just happen automagically based on the
> primary key?

No, it's pretty much automatic.  Pgsql creates new records for every
update or insert, so there's no weird locking on the original records
to make it slow.  Everything just goes in the WAL and gets flushed out
to disk later.  Setting up commit siblings helps on some loads.
That's about it. No rocket science or triggers really needed for lots
of writes at the same time.

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Thu, Mar 26, 2009 at 5:10 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> It's also important to point out that writers don't necessarily block
> other writers.  As long as they're operating on different ranges of
> the data set.  You can have dozens of writers streaming data in with
> differening primary keys all running together.

To be fare, some database apps have a few rows they update in a near
continuous stream, and they row lock.  These databases are often
better served by db2 or some other row locking database than pgsql
where you may or may not have problems with bloating.

There are times picking a database some will turn left when they
should have turned right.  Picking pgsql for this kind of app is
usually that kind of situation.  Innodb would have bloat problems I'd
assume too.  MyIsam's table locking puts it out, and you're left with
one of the other databases.  I wonder how firebird handles that
situation.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
I've had lots of experience fighting with the LOAD FROM command with an
old application using an old version of Informix.  When the data to be
loaded is large enough, Informix (dbaccess) hits a wall.  Tracking it
down with tools generally shows that zillions of locks are accumulating.
The system thrashes.

So far I haven't found this problem with Postgres.

Incidentally, in case people are interested in hearing the follow-up to
my original message, my home-cooked benchmark with lots of processes
doing tons of updates seemed to initially show MyISAM winning over
Postgres.

However I spent some time reading what I could from the manual on
performance improvement and realized that commits might be an issue.  I
moved a commit() call out of the main loop, and bang!  Postgres showed a
substantial performance advantage over MyISAM.

-Will


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 27 March 2009 02:25
To: Scott Ribe
Cc: Will Rutherdale (rutherw); pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

To be fare, some database apps have a few rows they update in a near
continuous stream, and they row lock.  These databases are often
better served by db2 or some other row locking database than pgsql
where you may or may not have problems with bloating.

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Fri, Mar 27, 2009 at 10:32 AM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
>
> Incidentally, in case people are interested in hearing the follow-up to
> my original message, my home-cooked benchmark with lots of processes
> doing tons of updates seemed to initially show MyISAM winning over
> Postgres.
>
> However I spent some time reading what I could from the manual on
> performance improvement and realized that commits might be an issue.  I
> moved a commit() call out of the main loop, and bang!  Postgres showed a
> substantial performance advantage over MyISAM.

This is one of those areas where postgresql acts very differently from
other dbs.  In lots of other dbs big transactions are the performance
killer.  In PostgreSQL big transactions are the way to get better
performace.

Re: Is there a meaningful benchmark?

От
"Will Rutherdale (rutherw)"
Дата:
I see.  Perhaps earlier I alienated some people by talking about
transactions per second, which is not necessarily an appropriate unit.

Thanks for the tip.

-Will


-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: 27 March 2009 16:08
To: Will Rutherdale (rutherw)
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Is there a meaningful benchmark?

This is one of those areas where postgresql acts very differently from
other dbs.  In lots of other dbs big transactions are the performance
killer.  In PostgreSQL big transactions are the way to get better
performace.

Re: Is there a meaningful benchmark?

От
Scott Marlowe
Дата:
On Fri, Mar 27, 2009 at 2:27 PM, Will Rutherdale (rutherw)
<rutherw@cisco.com> wrote:
> I see.  Perhaps earlier I alienated some people by talking about
> transactions per second, which is not necessarily an appropriate unit.

Hard to say, tps is still important.  A lot of loads need to change
one thing in a transaction and change it fairly fast.  A lot of other
loads update / insert 10,000 rows, and tps doesn't mean as much as
rows inserted / second.  I doubt anybody took it personally though.
More likely just got confused over what you and they were measuring.

Storytime!

One of my best friends came from MSSQL and MySQL shop, and when
introduced him to pgsql, he was bitching at how slow this file full of
inserts was inserting.  This was in the pg 7.0 days, when pg was not
super fast, but quite fast enough if you knew a few basic tricks of
the trade.  I looked at his load file, which was basically one insert
after another, 10k of them.  I pushed him aside, typed in begin; on
top and commit; on the bottom and told him to run it now.  It finished
in about 10 seconds.  It had taken > 10 minutes before that.  He's now
my boss at another company, and pretty hard core pgsql fan.