Data Warehouse Reevaluation - MySQL vs Postgres

Поиск
Список
Период
Сортировка
Hi all,
I had a difficult time deciding which list to post
this to, so please forgive me if this list doesn't
perfectly match my questions.  My decision will not
solely be based on performance, but it is the primary
concern.  I would be very appreciative if you all
could comment on my test plan.  Real world examples of
a Postgres implementation of >=600G with a web
front-end would be great, or any data warehouse with
some size to it.

The dilemma:
The time has come to reevaluate/rearchitect an
application which I built about 3 years ago.  There
are no performance concerns with MySQL, but it would
benefit greatly from stored procedures, views, etc.
It is a very large rolling data warehouse that inserts
about 4.5 million rows every 2 hours and subsequently
rolls this data off the back end of a 90 day window.
A web interface has been designed for querying the
data warehouse.

Migration planning is much easier with views and
stored procedures and this is my primary reason for
evaluating Postgres once again.  As the application
grows I want to have the ability to provide backward
compatible views for those who are accustomed to the
current structure.  This is not possible in MySQL.

Some of the mining that we do could benefit from
stored procedures as well.  MySQL may have these in
the works, but we won't be able to move to a version
of MySQL that supports stored procs for another year
or two.

Requirements:
Merge table definition equivalent.  We use these
extensively.

Merge table equivalent with all tables containing over
100M rows(and about 40 columns, some quite wide) will
need to do index scans in at least 5 seconds(MySQL
currently does 2, but we can live with 5) and return
~200 rows.

Um, gonna sound silly, but the web interface has to
remain "snappy" under load.  I don't see this as a
major concern since you don't require table locking.

If business logic is moved to the database(likely with
Postgres) performance for inserting with light logic
on each insert has to keep up with the 4.5M inserts
per 2 hours(which MySQL completes in ~35min
currently).  Acceptable numbers for this aggregation
would be 45-55min using stored procedures.

About 3 years ago I did some performance
characterizations of Postgres vs. MySQL and didn't
feel Postgres was the best solution.  3 years later
we've won runner-up for MySQL application of the
year(behind Saabre).  Oddly enough this reevaluting
database strategy is right on the coattails of this
award.  I'll  begin writing my business logic within
the next week and start migrating test data shortly
thereafter.  Case studies would be very beneficial as
I put together my analysis.

Also, this is for a Fortune 500 company that uses this
data warehouse extensively.  It is an internal
application that is widely used and gets about 4 hits
per employee per day.  Much of customer care, data
engineering, plant engineering(it's a cable company),
and marketing use the interface.  I've done a great
deal of press for MySQL and would be equally willing
to tout the benefits of Postgres to trade rags,
magazines, etc provided the results are favorable.

Here's our case study if you're interested . . .
http://www.mysql.com/customers/customer.php?id=16

Thoughts, suggestions?

'njoy,
Mark

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Bad performance with hashjoin
Следующее
От: Dennis Bjorklund
Дата:
Сообщение: Re: Data Warehouse Reevaluation - MySQL vs Postgres