Re: Thousands databases or schemas

From: Magnus Hagander
Subject: Re: Thousands databases or schemas
Date:
Msg-id: CABUevEx_TPdrbJSU3Zt5yrMFRGJ=Dbsmxki7eZF7-mnN-HHDxw@mail.gmail.com
In reply to: Re: Thousands databases or schemas (Craig Ringer <craig@2ndQuadrant.com>)
List: pgsql-performance
On Fri, Nov 9, 2012 at 7:15 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 11/08/2012 09:29 PM, Denis wrote:
>> Ok guys, it was not my intention to hurt anyone's feelings by mentioning
>> MySQL. Sorry about that.
> It's pretty silly to be upset by someone mentioning another DB product.
> I wouldn't worry.
>> There simply was a project with a similar
>> architecture built using MySQL. When we started the current project, I
>> made the decision to give PostgreSQL a try.
> It's certainly interesting that MySQL currently scales to much larger
> table counts than PostgreSQL appears to.
>
> I'd like to see if this can be improved down the track. Various people
> are doing work on PostgreSQL scaling and performance, so with luck huge
> table counts will come into play there. If nothing else, supporting
> large table counts is important when dealing with very large amounts of
> data in partitioned tables.
>
> I think I saw mention of better performance with higher table counts in
> 9.3 in -hackers, too.
>
>> I would recommend refreshing the info here:
>> http://wiki.postgresql.org/wiki/FAQ. There is a question "What is the
>> maximum size for a row, a table, and a database?". Please add info
>> there on the maximum number of databases and tables one DB can contain
>> while PostgreSQL continues to work properly.
> Yeah, a number of people have been thrown by that. Technical limitations
> aren't the same as practical limitations, and in some cases the
> practical limitations are lower.

Yes. And the fact is that PostgreSQL doesn't actually have a big
problem with this scenario; pg_dump does. It's mainly a tool issue,
not a database engine one. For example, pgAdmin used to have problems
even with a much smaller number of databases than that - I think that
is better in current releases, but I'm not 100% sure. And you can
imagine what happens when a tool tries to graph per-database values
into a single graph when you have thousands of databases.

PostgreSQL isn't perfect in these cases - as noted, just creating a new
schema can take slightly more than the usual millisecond. But it
works, and I've personally never come across a scenario where it's not
"good enough" (that doesn't mean one doesn't exist, of course).

pg_dump and pg_dumpall, however, do have issues, as noted elsewhere.

But as Pavel mentioned, and others certainly have before as well,
there are other ways to deal with backups that solve this problem.
They may not be perfect, but they get you pretty far.
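
For example (a sketch only - the host, user and paths here are made
up), a physical base backup with continuous WAL archiving sidesteps
pg_dump entirely, since it copies the data directory instead of
walking the catalogs object by object:

    # take a compressed tar-format base backup, with progress reporting
    pg_basebackup -h dbhost -U backup -D /backups/base -Ft -z -P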

That may not be a match for the requirements in this case - but it's
not a general limitation of the db.

FWIW, have you looked into using proper backups with PITR and just
rolling forward through the transaction log with
pause_at_recovery_target set? That's an excellent way to track down
the "we lost data sometime between <x> and <y> but don't know when"
kind of problem. I've yet to find a case where that's not easier than
repeatedly restoring even a fairly small pg_dump-based backup to find
it. Plus, it gives you much better granularity, which leads to better
results for your customer. *And* it takes away the dependency on
pg_dump's performance issues.
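
To make that concrete (a sketch, assuming a 9.1+ server restored from
a base backup - the archive path and timestamp are made up),
recovery.conf would look something like:

    # recovery.conf
    restore_command = 'cp /backups/wal/%f %p'      # fetch archived WAL
    recovery_target_time = '2012-11-08 14:00:00'   # stop here...
    pause_at_recovery_target = true                # ...and pause

With hot_standby = on you can connect and inspect the data while
recovery is paused, then either resume with pg_xlog_replay_resume()
or restart with a later target time to replay further.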

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

