Обсуждение: Open request for benchmarking input

Поиск
Список
Период
Сортировка

Open request for benchmarking input

От
Jeff Frost
Дата:
Did you folks see this article on Slashdot with a fellow requesting input on
what sort of benchmarks to run to get a good Postgresql vs Mysql dataset?
Perhaps this would be a good opportunity for us to get some good benchmarking
done.  Here's the article link and top text:

http://ask.slashdot.org/article.pl?sid=05/11/26/0317213

  David Lang asks: "With the release of MySQL 5.0, PostgreSQL 8.1, and the flap
over Oracle purchasing InnoDB, the age old question of performance is coming
up again. I've got some boxes that were purchased for a data warehouse project
that isn't going to be installed for a month or two, and could probably
squeeze some time in to do some benchmarks on the machines. However, the
question is: what should be done that's reasonably fair to both MySQL and
PostgreSQL? We all know that careful selection of the benchmark can seriously
skew the results, and I want to avoid that (in fact I would consider it close
to ideal if the results came out that each database won in some tests). I
would also not like to spend time generating the benchmarks only to have the
losing side accuse me of being unfair. So, for both MySQL and PostgreSQL
advocates, what would you like to see in a series of benchmarks?"

  "The hardware I have available is as follows:

     * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI
     * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI
     * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA

I would prefer to use Debian Sarge as the base install of the systems (with
custom built kernels), but would compile the databases from source rather then
using binary packages.

For my own interests, I would like to at least cover the following bases: 32
bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type tests
(data >> memory); and web prefs test (active data RAM)

What specific benchmarks should be run, and what other things should be
tested? Where should I go for assistance on tuning each database, evaluating
the benchmark results, and re-tuning them?"

---
Jeff Frost, Owner
<jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: Open request for benchmarking input

От
"Qingqing Zhou"
Дата:
"Jeff Frost" <jeff@frostconsultingllc.com> wrote
>
> Did you folks see this article on Slashdot with a fellow requesting input
> on what sort of benchmarks to run to get a good Postgresql vs Mysql
> dataset? Perhaps this would be a good opportunity for us to get some good
> benchmarking done.
>  "The hardware I have available is as follows:
>
>     * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI
>     * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI
>     * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA
>

I see this as a good chance to evaluate and boost PostgreSQL performance in
general.

My two concerns:
(1) How long will David Lang spend on the benchmarking? We need *continous*
feedback after each tuning. This is important and Mark Wong has done great
job on this.
(2) The hardware configuration may not reflect all potentials of PostgreSQL.
For example, so far, PostgreSQL does not pay much attention in reducing I/O
cost, so a stronger RAID definitely will benefit PostgreSQL performance.

>
> For my own interests, I would like to at least cover the following bases:
> 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type
> tests (data >> memory); and web prefs test (active data RAM)
>

Don't forget TPCC (data > memory, with intensive updates). So the benchmarks
in my mind include TPCC, TPCH and TPCW.

Regards,
Qingqing



Re: Open request for benchmarking input

От
"Luke Lonergan"
Дата:
Jeff, Qingqing,

On 11/26/05 10:57 AM, "Qingqing Zhou" <zhouqq@cs.toronto.edu> wrote:

>
> "Jeff Frost" <jeff@frostconsultingllc.com> wrote
>>
>> Did you folks see this article on Slashdot with a fellow requesting input
>> on what sort of benchmarks to run to get a good Postgresql vs Mysql
>> dataset? Perhaps this would be a good opportunity for us to get some good
>> benchmarking done.
>>  "The hardware I have available is as follows:
>>
>>     * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI
>>     * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI
>>     * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA
>>

I suggest specifying a set of basic system / HW benchmarks to baseline the
hardware before each benchmark is run.  This has proven to be a major issue
with most performance tests.  My pick for I/O is bonnie++.

Your equipment allows you the opportunity to benchmark all 5 machines
running together as a cluster - this is important to measure maturity of
solutions for high performance warehousing.  Greenplum can provide you a
license for Bizgres MPP for this purpose.

> (2) The hardware configuration may not reflect all potentials of PostgreSQL.
> For example, so far, PostgreSQL does not pay much attention in reducing I/O
> cost, so a stronger RAID definitely will benefit PostgreSQL performance.

The 16x SATA drives should be great, provided you have a high performance
RAID adapter configured properly.  You should be able to get 800MB/s of
sequential scan performance by using a card like the 3Ware 9550SX.  I've
also heard that the Areca cards are good (how good?).  Configuration of the
I/O must be validated though - I've seen as low as 25MB/s from a
misconfigured system.

>> For my own interests, I would like to at least cover the following bases:
>> 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type
>> tests (data >> memory); and web prefs test (active data RAM)
>>
>
> Don't forget TPCC (data > memory, with intensive updates). So the benchmarks
> in my mind include TPCC, TPCH and TPCW.

I agree with Qingqing, though I think the OSTG DBT-3 (very similar to TPC-H)
is sufficient for data warehousing.

This is a fairly ambitious project - one problem I see is that MySQL may not
run all of these benchmarks, particularly the DBT-3.  Also - would the rules
allow for mixing / matching pluggable features of the DBMS?  Innodb versus
MyISAM?

- Luke



Re: Open request for benchmarking input

От
Ron
Дата:
At 03:15 PM 11/26/2005, Luke Lonergan wrote:

>I suggest specifying a set of basic system / HW benchmarks to baseline the
>hardware before each benchmark is run.  This has proven to be a major issue
>with most performance tests.  My pick for I/O is bonnie++.
>
>Your equipment allows you the opportunity to benchmark all 5 machines
>running together as a cluster - this is important to measure maturity of
>solutions for high performance warehousing.  Greenplum can provide you a
>license for Bizgres MPP for this purpose.
...and detailed config / tuning specs as well for it or everyone is
probably wasting their time.  For instance, it seems fairly clear
that the default 8KB table size and default read ahead size are both
pessimal, at least for non OLTP-like apps.  In addition, there's been
a reasonable amount of evidence that xfs should be the file system of
choice for pg.

Things like optimal RAID strip size, how to allocate tables to
various IO HW, and what levels of RAID to use for each RAID set also
have to be defined.


>The 16x SATA drives should be great, provided you have a high performance
>RAID adapter configured properly.  You should be able to get 800MB/s of
>sequential scan performance by using a card like the 3Ware 9550SX.  I've
>also heard that the Areca cards are good (how good?).  Configuration of the
>I/O must be validated though - I've seen as low as 25MB/s from a
>misconfigured system.
The Areca cards, particularly with 1-2GB of buffer cache, are the
current commodity RAID controller performance leader.  Better
performance can be gotten out of HW from vendors like Xyratex, but it
will cost much more.


Ron



Re: Open request for benchmarking input

От
Andreas Pflug
Дата:
Qingqing Zhou wrote:
> "Jeff Frost" <jeff@frostconsultingllc.com> wrote

>> "The hardware I have available is as follows:
>>
>>    * 2x dual Opteron 8G ram, 2x144G 15Krpm SCSI
>>    * 2x dual Opteron 8G ram, 2x72G 15Krpm SCSI
>>    * 1x dual Opteron 16G ram, 2x36G 15Krpm SCSI 16x400G 7200rpm SATA
>>
> (2) The hardware configuration may not reflect all potentials of PostgreSQL.

These boxes don't look like being designed for a DB server. The first
are very CPU bound, and the third may be a good choice for very large
amounts of streamed data, but not optimal for TP random access.

Hopefully, when publicly visible benchmarks are performed, machines are
used that comply with common engineering knowledge, ignoring those guys
who still believe that sequential performance is the most important
issue on disk subsystems for DBMS.

Regards,
Andreas

Re: Open request for benchmarking input

От
David Lang
Дата:
Ok, I've subscribed (hopefully list volume won't kill me :-)

I'm covering several things in this message since I didn't receive the
prior messages in the thread

first off these benchamrks are not being sponsered by my employer, they
need the machines burned in and so I'm going to use them for the tests
while burning them in. I can spend a little official time on this,
justifying it as learning the proper config/tuneing settings for our
project, but not too much. and I'm deliberatly not useing my work e-mail
and am not mentioning the company name, so please respect this and keep
the two seperate (some of you have dealt with us formally, others will
over the next few months)

this means no remote access for people (but I am willing to run tests and
send configs around). in fact the machines will not have Internet access
for the duration of the tests.

it also means I'm doing almost all the configuration work for this on my
own time (nights and weekends). the machines will not be moved to
production for a couple of months. this should mean that we can go back
and forth with questions and answers (albeit somewhat slowly, with me
checking in every night) while whatever tests are done happen during the
day. once we get past the system tuneing and start doing different tests
it would probably be helpful if people can send me scripts to run that I
can just let loose.

I don't have any money to pay for benchmark suites, so if things like the
TPC benchmarks cost money to do I won't be able to do them

to clarify the hardware

I have 5 machines total to work with, this includes client machines to
make the queries (I may be able to get hold of 2-3 more, but they are
similar configs)

none of these have dual-core processors on them, the CPU's are 246 or 252
Opterons (I'll have to double check which is in which machine, I think the
large disk machine has 246's and the others 252's)

I have access to a gig-E switch that's on a fairly idle network to use to
connect these machines

the large-disk machine has 3ware 9500 series 8-port SATA controllers in
them with battery backup. in our official dealings with Greenplum we
attempted to do a set of benchmarks on that machine, but had horrible
timing with me being too busy when they worked with us on this and we
never did figure out the best setting to use for this machine.

Part of the reason I posted this to /. rather then just contacting you and
MySQL folks directly is that I would like to see a reasonable set of
benchmarks agreed to and have people with different hardware then I have
run the same sets of tests. I know the tuneing will be different for
different hardware, but if we can have a bunch of people run similar tests
we should learn a lot.

David Lang

Re: Open request for benchmarking input

От
David Lang
Дата:
by the way, this is the discussion that promped me to start this project
http://lwn.net/Articles/161323/

David Lang

Re: Open request for benchmarking input

От
"Steinar H. Gunderson"
Дата:
On Sat, Nov 26, 2005 at 01:57:47PM -0500, Qingqing Zhou wrote:
> Don't forget TPCC (data > memory, with intensive updates). So the benchmarks
> in my mind include TPCC, TPCH and TPCW.

I'm lost in all those acronyms, but am I right in assuming that none of these
actually push the planner very hard? We keep on pushing that "PostgreSQL is a
lot better than MySQL when it comes to many joins and complex queries"
(mostly because the planner is a lot more mature -- does MySQL even keep
statistics yet?), but I'm not sure if there are any widely used benchmarks
available that actaully excercise that.

/* Steinar */
--
Homepage: http://www.sesse.net/