Обсуждение: Case studies

Поиск
Список
Период
Сортировка

Case studies

От
"Elvis Henríquez"
Дата:
Hello everybody.

Rather than asking for a technical detail or writing about a problem,
I'm asking for up-to-date case studies involving PostgreSQL.

The company where I'm actually working is migrating some apps; one
requires a Data Warehouse, and I'm proposing PostgreSQL, but they're
thinking of Oracle, as the system has one table (among others) with 20
fields and more then 19 millions of records, and this exists for 2
subcompanies, which are actually in different databases, but the
migration project implies joining the two subcompanies data.

This amount of data is the result of 8 years of usage, and the data
growing rate has increased in the last two years.

They asked me for case studies that shows this, so they can go ahead
with the migration to PostgreSQL.

The ones shown in "http://www.postgresql.org/about/casestudies/" are
out-dated and don't show amount of records or database/table size or
growth.

I hope you could help me with this.

Re: Case studies

От
"Rodrigo De León"
Дата:
On Nov 26, 2007 10:34 PM, Elvis Henríquez <henriquez.elvis@gmail.com> wrote:
> I hope you could help me with this.

See:
http://www.lethargy.org/~jesus/archives/66-Big-Bad-PostgreSQL.html

Re: Case studies

От
"Scott Marlowe"
Дата:
On Nov 26, 2007 9:34 PM, Elvis Henríquez <henriquez.elvis@gmail.com> wrote:
> Hello everybody.
>
> Rather than asking for a technical detail or writing about a problem,
> I'm asking for up-to-date case studies involving PostgreSQL.
>
> The company where I'm actually working is migrating some apps; one
> requires a Data Warehouse, and I'm proposing PostgreSQL, but they're
> thinking of Oracle, as the system has one table (among others) with 20
> fields and more then 19 millions of records, and this exists for 2
> subcompanies, which are actually in different databases, but the
> migration project implies joining the two subcompanies data.
>
> This amount of data is the result of 8 years of usage, and the data
> growing rate has increased in the last two years.

That's actually pretty small.  Where I work we have a data warehouse
of similar design (a few large tables, a few small lookup tables).  It
has 86,840,447 rows and takes up 44 Gigs of space.  It sits on a
single CPU box with a 4 disk RAID-10 and runs queries covering a few
minutes to a few days worth of monitoring data.  Sequential scanning
the whole main table takes 621 seconds or so (10+ minutes).  We add
150 to 200k rows a day to it.

Selecting a days's worth of data takes ~ 350ms.  A week's worth takes
2 to 10 seconds depending on how much is cached.

This is a small database for either oracle or postgresql.  Talk your
bosses into giving postgresql a try if you can.  You should be able to
build a 20million test database in an afternoon or so, so it's not
like you'll be dedicating thousands of man hours to test it.

Re: Case studies

От
"Elvis Henríquez"
Дата:
Thank you very much for your reply. That's exactly what I was needing
to demonstrate the power of PostgreSQL.

On Nov 27, 2007 1:46 AM, Rodrigo De León <rdeleonp@gmail.com> wrote:
> On Nov 26, 2007 10:34 PM, Elvis Henríquez <henriquez.elvis@gmail.com> wrote:
> > I hope you could help me with this.
>
> See:
> http://www.lethargy.org/~jesus/archives/66-Big-Bad-PostgreSQL.html
>

Re: Case studies

От
"Elvis Henríquez"
Дата:
That's what I'm telling to them, but they want proven results in other
companies, but I'll manage it.

Thanks for your reply and information.

Best regards.

On Nov 27, 2007 2:12 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> That's actually pretty small.  Where I work we have a data warehouse
> of similar design (a few large tables, a few small lookup tables).  It
> has 86,840,447 rows and takes up 44 Gigs of space.  It sits on a
> single CPU box with a 4 disk RAID-10 and runs queries covering a few
> minutes to a few days worth of monitoring data.  Sequential scanning
> the whole main table takes 621 seconds or so (10+ minutes).  We add
> 150 to 200k rows a day to it.
>
> Selecting a days's worth of data takes ~ 350ms.  A week's worth takes
> 2 to 10 seconds depending on how much is cached.
>
> This is a small database for either oracle or postgresql.  Talk your
> bosses into giving postgresql a try if you can.  You should be able to
> build a 20million test database in an afternoon or so, so it's not
> like you'll be dedicating thousands of man hours to test it.
>

Re: Case studies

От
"Gregory Williamson"
Дата:

FWIW, we've got a billing database with several moderate tables, one is 48 million rows with about 75k per day; and some much larger tables that are effectively partitioned by date so that some client software (Excel, etc.) doesn't choke on them. And we've got some spatial tables that dwarf the billing stuff.

The case studies you looked at all still valid, really, if you think about it -- PostgreSQL has gotten *better* with successive releases, and has also gotten faster. So fewer worries and more speed. If 7.4.x could handle something it is almost certain that 8.2 can do so as well and probably better.

You might also take a quick look at the case studies for the PostGIS extension at <http://postgis.refractions.net/documentation/casestudies/>. (The GlobeXplorer stuff is us; DigitalGlobe is now using PostgreSQL as well.)

We migrated from Informix 9.3 some years ago (mainly due to cost) and haven't regretted it.

Greg Williamson
Senior DBA
GlobeXplorer, a DigitalGlobe Company

(some legal disclaimers snipped since this is a public mail ... please excuse the top-posting but this is a lame reader and the post I am responding to was top-posted so at least this is consistent)

-----Original Message-----
From: pgsql-admin-owner@postgresql.org on behalf of Elvis Henríquez
Sent: Tue 11/27/2007 7:32 PM
To: Scott Marlowe
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Case studies

That's what I'm telling to them, but they want proven results in other
companies, but I'll manage it.

Thanks for your reply and information.

Best regards.

On Nov 27, 2007 2:12 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> That's actually pretty small.  Where I work we have a data warehouse
> of similar design (a few large tables, a few small lookup tables).  It
> has 86,840,447 rows and takes up 44 Gigs of space.  It sits on a
> single CPU box with a 4 disk RAID-10 and runs queries covering a few
> minutes to a few days worth of monitoring data.  Sequential scanning
> the whole main table takes 621 seconds or so (10+ minutes).  We add
> 150 to 200k rows a day to it.
>
> Selecting a days's worth of data takes ~ 350ms.  A week's worth takes
> 2 to 10 seconds depending on how much is cached.
>
> This is a small database for either oracle or postgresql.  Talk your
> bosses into giving postgresql a try if you can.  You should be able to
> build a 20million test database in an afternoon or so, so it's not
> like you'll be dedicating thousands of man hours to test it.
>

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Case studies

От
"=?ISO-8859-1?Q?Elvis_Henr=EDquez?="
Дата:

Good evening everybody!!!

Here is an extract of the mail I sent to the project coordinators with the information gathered thanks to the help offered by all the mailing lists I asked, this is, you; I wanted to post it so future people looking the same information can find it all in one place:

(a few information is in Spanish, as I'm from Venezuela, where the project is being developed. I left it as it is an extract of the mail; but you can check the URLs I quote)

##############
## Mail Begin ##
##############

Entre algunos Casos de Estudio exitosos de implementación de PostgreSQL
como Data Warehouse, te puedo mencionar los siguientes (cito las partes
resaltantes y coloco el enlace donde se puede observar el documento
completo):

1)
El mejor de todos los Casos de Estudio revisados, que incluye hasta
aspectos técnicos de implementación, está disponible un PDF de 3.7 Mb

Sugiero que lean dicho documento.

"Oracle features we need
Partitioning
Statistics and Aggregations
rank over partition, lead, lag, etc.
Large selects (100GB)
Autonomous transactions
Replication from Oracle (to Oracle)"

Todos esos aspectos los consiguieron con PostgreSQL

http://www.lethargy.org/~jesus/archives/66-Big-Bad-PostgreSQL.html


2)
"Greenplum is the supplier of Bizgres MPP, a massively parallel
implementation of the PostgreSQL open source database system. Frontier
Airlines, the second largest carrier at Denver International Airport,
uses Bizgres for competitive flight pricing. Bizgres, like PostgreSQL,
is an open source code project. Greenplum sponsors the Bizgres project
and sells technical support and services on top of it."
...
"The appliance can be stacked with 10, 40, or 100 terabytes of disk"
...

http://www.informationweek.com/showArticle.jhtml;jsessionid=KQDGEIMWGP3D0QSNDLRCKH0CJUNN2JVN?articleID=191600197&queryText=postgresql

3)

...
"This talk was on converting a really large (over 3 terabytes, largest
table is 1.8 billion rows) data warehouse database from Oracle 8i to
PostgreSQL"
...
"The reason for choosing Postgres over MySQL was that Postgres has a
much longer history with the kind of advanced features he needed. They
needed (and were able to hack PostgreSQL to get) the following features:
1. Data partitioning (spreading tables over multiple drives)
2. Large selects (50-million-row return sets, over 100GB of data)
2. Incremental COMMITs for really, really long queries
3. Replication"

http://www.sitepoint.com/blogs/2006/08/05/oscon-2006-big-bad-postgresql/

4)

4.1)
"FWIW, we've got a billing database with several moderate tables, one is
48 million rows with about 75k per day; and some much larger tables that
are effectively partitioned by date so that some client software (Excel,
etc.) doesn't choke on them. And we've got some spatial tables that
dwarf the billing stuff."

4.2)
"> Where I work we have a data warehouse
> of similar design (a few large tables, a few small lookup tables).  It
> has 86,840,447 rows and takes up 44 Gigs of space.  It sits on a
> single CPU box with a 4 disk RAID-10 and runs queries covering a few
> minutes to a few days worth of monitoring data.  Sequential scanning
> the whole main table takes 621 seconds or so (10+ minutes).  We add
> 150 to 200k rows a day to it.
>
> Selecting a days's worth of data takes ~ 350ms.  A week's worth takes
> 2 to 10 seconds depending on how much is cached."

http://archives.postgresql.org/pgsql-admin/2007-11/msg00237.php

5)

5.1)
"GlobeXplorer serves terrabytes of imagery to clients around the world
using PostGIS as their production database server. In 2004, GlobeXplorer
migrated from Informix to PostGIS, and now they are serving over a
million requests a day with PostGIS."

5.2)
"The national mapping agency of France manages over 100 million
topographic features in PostGIS/PostgreSQL and provides read/write
access to over 100 field researchers around the country."

http://postgis.refractions.net/documentation/casestudies/

6)
Afilias, Ltd. - A Database for the Internet
"The primary technical factors for using PostgreSQL are its support of
SQL standard and multi-version concurrency control (MVCC). MVCC helps
ensure that each database transaction sees a consistent view of the
database while ensuring that high transaction volumes are supported.
Additionally, PostgreSQL supports over 65,000 GB of data in a single
table, which is more than adequate for the needs if .ORG or .INFO
database management support. .INFO and .ORG each use PostgreSQL version 7.2"

http://www.netezza.com/products/prodlit.cfm

Nota: en el enlace anterior existen varios documentos en Formato de
Documento Portátil (PDF) que incluyen a su vez varios casos de estudio
en diferentes áreas, como E-Business, Gobierno, Venta,
Telecomunicaciones. y otros.

7)
En el site de PostgreSQL aparecen varios Casos de Estudio de
implementaciones exitosas de PostgreSQL, pero los mismos no indican
cantidades de registros o tamaño de bases de datos; pero también sirven
como soporte.
Estos casos de estudio están basados en versiones anteriores de
PostgreSQL (7.2) , pero como me comentó un DBA en uno de los foros de
PostgreSQL: Si la versión 7.2 pudo manejar esos caso, con la gran
cantidad de mejoras que ha tenido PostgreSQL en las últimas versiones
(sobre todo en la 8.2), se conseguirán mejores resultados.

8)
En Wikipedia, en la acepción de PostgreSQL, se hacen referencias a
usuarios del gestor.

Textualmente:

"Prominent users

   * Afilias, domain registries for .org, .info and others.[10]
   * Sony Online multiplayer online games.[11]
   * BASF, shopping platform for their agribusiness portal.[12]
   * hi5.com social networking portal.[13]
   * Skype VoIP application, central business databases.[14]"

http://en.wikipedia.org/wiki/PostgreSQL#Prominent_users

En la acepción en Wikipedia en Español aparecen más:

" Usuarios Destacados

   * .org, .info, .mobi y .aero registros de dominios por Afilias [1]
   * La American Chemical Society
   * BASF
   * IMDB
   * Skype
   * TiVo
   * Penny Arcade
   * Sony Online [2]
   * U.S. Departamento de Trabajo
   * USPS
   * VeriSign
   * Wisconsin Circuit Court Access con 6 * 180GB DBs replicados en
tiempo real.
   * OpenACS y .LRN"

http://es.wikipedia.org/wiki/PostgreSQL#Usuarios_Destacados





Adicional a los Casos de Estudio antes mencionados, están las pruebas de
rendimiento (Benchmarks).

De Wikipedia:

"Many informal performance studies of PostgreSQL have been done[7] but
the first industry-standard and peer-validated benchmark was completed
in June 2007 using the Sun Java Systems Application Server 9.0 Platform
Edition, UltraSPARC T1 based Sun Fire server and Postgres 8.2[8]. This
result of 778.14 SPECjAppServer2004 JOPS@Standard compares favourably
with the 874 JOPS@Standard with Oracle 10 on an Itanium based HP-UX [7]

In August 2007, Sun submitted an improved benchmark score of 813.73
SPECjAppServer2004 JOPS@Standard. With the system under test at a
reduced price, the price/performance improved from $US 84.98/JOPS to $US
70.57/JOPS. [9]"

http://en.wikipedia.org/wiki/PostgreSQL#Benchmarks

#############
## Mail Ends ##
#############

I hope the above information be helpful for somebody else.

Based in the previous information, I let you know that PostgreSQL was approved for the development of the Data Warehouse!!! :-D, but, anyway, the main application will be developed in Oracle. :-(

Again, thank you very much to everybody for your help, collaboration, comments y giving time in achieving what we have today: "The world's most advanced open source database".

Cheers and good night.