Re: Postgres for a "data warehouse", 5-10 TB

From: Andy Colson
Subject: Re: Postgres for a "data warehouse", 5-10 TB
Date:
Msg-id: 4E6CC2BB.1060704@squeakycode.net
Whole thread Raw
In response to: Postgres for a "data warehouse", 5-10 TB  (Igor Chudov <ichudov@gmail.com>)
Responses: Re: Postgres for a "data warehouse", 5-10 TB  (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-performance
On 09/11/2011 07:35 AM, Igor Chudov wrote:
> I have been a MySQL user for years, including owning a few
> multi-gigabyte databases for my websites, and using it to host
> algebra.com <http://algebra.com> (about 12 GB database).
>
> I have had my ups and downs with MySQL. The ups were ease of use and
> decent performance for small databases such as algebra.com <http://algebra.com>. The downs
> were things like twenty hour REPAIR TABLE operations on a 35 GB
> table, etc.
>
> Right now I have a personal (one user) project to create a 5-10
> Terabyte data warehouse. The largest table will consume the most space
> and will take, perhaps, 200,000,000 rows.
>
> I want to use it to obtain valuable business intelligence and to make
> money.
>
> I expect it to grow, never shrink, and to be accessed via batch
> queries. I do not care for batch queries to be super fast, for example
> an hour per query would be just fine.
>
> However, while an hour is fine, two weeks per query is NOT fine.
>
> I have a server with about 18 TB of storage and 48 GB of RAM, and 12
> CPU cores.
>
> My initial plan was to use MySQL, InnoDB, and deal with problems as
> they arise. Perhaps, say, I would implement my own joining
> procedures.
>
> After reading some disparaging stuff about InnoDB performance on large
> datasets, however, I am getting cold feet. I have a general feeling
> that, perhaps, I will not be able to succeed with MySQL, or, perhaps,
> with either MySQL and Postgres.
>
> I do not know much about Postgres, but I am very eager to learn and
> see if I can use it for my purposes more effectively than MySQL.
>
> I cannot shell out $47,000 per CPU for Oracle for this project.
>
> To be more specific, the batch queries that I would do, I hope,
> would either use small JOINS of a small dataset to a large dataset, or
> just SELECTS from one big table.
>
> So... Can Postgres support a 5-10 TB database with the use pattern
> stated above?
>
> Thanks!
>
> i
>

That is a scale or two larger than I have experience with.  I converted my website database from mysql to PG, and it
has several db's between 1 and 10 gig.  There are parts of the website that were faster with mysql, and there are parts
faster with PG.  In one spot, because PG has superior join support on select statements, I was able to change the code to
generate a single more complicated sql statement, vs. mysql, which had to fire off several simpler statements.  It's a
search screen where you can type in 15-ish different options.  I was able to generate a single sql statement which joins
8-some-odd tables with plenty of where clauses.  PG runs it in the blink of an eye.  It's astonishing compared to the
pain of mysql.  If you ever have to write your own join, or your own lookup function, that's a failure of your database.
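To give a sketch of the shape of that search query (all table and column names here are made up for illustration, not from my actual site):

```sql
-- One statement instead of several: each optional search field the user
-- filled in becomes one more WHERE clause on the same multi-table join.
SELECT p.id, p.title, c.name AS category, a.name AS author
FROM   products   p
JOIN   categories c ON c.id = p.category_id
JOIN   authors    a ON a.id = p.author_id
LEFT JOIN reviews r ON r.product_id = p.id
WHERE  p.title ILIKE '%widget%'          -- added only if the user typed a title
  AND  c.name = 'hardware'               -- added only if a category was picked
  AND  p.price BETWEEN 10 AND 50         -- added only if a price range was given
GROUP BY p.id, p.title, c.name, a.name
HAVING count(r.id) >= 3;
```

The application builds the WHERE list dynamically from whichever of the ~15 fields were filled in, and the planner sorts out a good join order on its own.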

One spot that was slower was a batch insert of data.  It's not so much slower that it was a problem.  I use COPY on PG
vs prepared inserts on mysql.  It was pretty close, but mysql still won.
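For reference, the two approaches look like this (file path, table, and columns are hypothetical):

```sql
-- PG side: one COPY statement bulk-loads the whole file in a single command.
COPY sales (sale_date, customer_id, amount)
FROM '/tmp/sales.csv' WITH (FORMAT csv, HEADER true);

-- mysql side was a prepared statement executed once per row:
--   INSERT INTO sales (sale_date, customer_id, amount) VALUES (?, ?, ?);
```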

Seeing as you can setup and test both databases, have you considered a trial run?

Things to watch for:


I think the same amount of data will use more disk space in PG than in mysql.

Importing data into PG should use COPY and multiple connections at the same time.

PG will only use multiple cores if you use multiple connections (each connection uses one core).
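Those two points combine for bulk loading: run several COPYs over several connections and each one gets its own core.  A rough sketch, assuming the data is already split into chunk files and psql can reach a database named warehouse (names are hypothetical):

```shell
#!/bin/sh
# Load 4 chunks over 4 connections, so 4 backends (and 4 cores) work at once.
for chunk in chunk1.csv chunk2.csv chunk3.csv chunk4.csv; do
    psql -d warehouse -c "\copy bigtable FROM '$chunk' WITH (FORMAT csv)" &
done
wait   # block until all four loads have finished
```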

Huge result sets (like a select statement that returns 1,000,000 rows) will be slow.
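The usual workaround is a server-side cursor, so the client pulls the rows in batches instead of materializing a million of them at once (cursor and table names hypothetical):

```sql
BEGIN;
DECLARE big_cur CURSOR FOR
    SELECT * FROM bigtable WHERE region = 'EU';
FETCH 10000 FROM big_cur;   -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;
```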

PG is a much fuller database than mysql, and as such you can influence its join types and function calls (table scan
vs index, immutable function vs stable, perl function vs sql).  So if at first it appears slow, you have a million
options.  I think the only option you have in mysql is to pull the data back and code it yourself.
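As a small taste of those options (a sketch; the table and function are hypothetical):

```sql
-- Per-session planner nudge: discourage a sequential scan so an index
-- scan gets chosen, then inspect the plan.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM bigtable WHERE customer_id = 42;

-- Declaring volatility: IMMUTABLE tells the planner the result depends
-- only on the arguments, so calls with constant arguments can be folded.
CREATE FUNCTION tax_rate(state text) RETURNS numeric
    AS $$ SELECT 0.08 $$ LANGUAGE sql IMMUTABLE;
```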

Upgrading to major versions of PG may or may not be painful.  (mysql sometimes works seamlessly between versions; it
appears brilliant.  But I have had problems with an upgrade, and when it goes bad, you don't have a lot of options.)  In
the past PG's only method of upgrade was a full backup of the old version, restore into the new.  Things have gotten better: there is new
pg_upgrade support (still kinda new though), and there is some 3rd party replication support where you replicate your
9.0 database to a new 9.1 database, and at some point you promote the new 9.1 database as the new master.  Or something
like that.  I've only read posts about it, never done it.  But with that much data, you'll need an upgrade plan.
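The pg_upgrade run itself is short once both versions are installed and both servers are stopped (the paths below are hypothetical; read its documentation before trusting it with 10 TB):

```shell
# In-place upgrade of a 9.0 cluster to 9.1; both clusters must be shut down.
pg_upgrade \
    --old-bindir  /usr/pgsql-9.0/bin \
    --new-bindir  /usr/pgsql-9.1/bin \
    --old-datadir /var/lib/pgsql/9.0/data \
    --new-datadir /var/lib/pgsql/9.1/data
```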

All in all, if I can summarize my personal view: mysql is fast at the expense of safety and usability (mysql still
cannot do updates with subselects).  PG is safe and usable at the expense of speed, and you won't be disappointed by the
speed.

-Andy

In pgsql-performance by date:

Previous
From: Claudio Freire
Date:
Message: Re: Postgres for a "data warehouse", 5-10 TB
Next
From: Igor Chudov
Date:
Message: Re: Postgres for a "data warehouse", 5-10 TB