Обсуждение: multi-tenant vs. multi-cluster

Поиск
Список
Период
Сортировка

multi-tenant vs. multi-cluster

От
Ben Chobot
Дата:
We're considering using postgres as a way to host database services for many, many independent applications. One
obviousway to do this is with schemas, roles, and proper permissions, but that still leaves open the possibility for
somepoorly written application to leave open transactions and affect others.  

Another possible way to go is to use debian's ability to easily run multiple clusters, but while this option gives
betterisolation, I imaging it comes at a cost of more overhead, both in terms of memory and in terms of support. I
supposemy question is, how much overhead? I'm less worried about support (that's what scripts are for) but if we're
talkingan extra 50MB of memory per cluster, that will start to add up. 

How have you guys offered multi-tenant postgres services? Am I forgetting something?

Re: multi-tenant vs. multi-cluster

От
Ivan Voras
Дата:
On 18/03/2011 19:17, Ben Chobot wrote:

> if we're talking an extra 50MB of memory per cluster, that will start to add up.

Consider this: each such cluster will have:

a) its own database files on the drives (WAL, data - increasing IO)
b) its own postgresql processes (many of them) running in memory
c) its own shared_buffers in memory.

It is highly unlikely that you will manage anything decent with this
type of configuration with a non-trivial number of clusters.

Re: multi-tenant vs. multi-cluster

От
Ben Chobot
Дата:
On Mar 18, 2011, at 11:47 AM, Ivan Voras wrote:

> On 18/03/2011 19:17, Ben Chobot wrote:
>
>> if we're talking an extra 50MB of memory per cluster, that will start to add up.
>
> Consider this: each such cluster will have:
>
> a) its own database files on the drives (WAL, data - increasing IO)

Oh, I hadn't thought about WAL. Good point.
But data files are a function of tables and indexes, right? Having them in different schemas or different clusters
isn'tgoing to change that. I guess there are system tables but those are relatively trivial - I think? 

> b) its own postgresql processes (many of them) running in memory

I believe this is entirely a function of client connections.

> c) its own shared_buffers in memory.

Given that each application will be independent, I don't see a different between clusters and schemas here either.


Re: multi-tenant vs. multi-cluster

От
"Nicholson, Brad (Toronto, ON, CA)"
Дата:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Ben Chobot
> Sent: Friday, March 18, 2011 3:10 PM
> To: Ivan Voras
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] multi-tenant vs. multi-cluster
>
> On Mar 18, 2011, at 11:47 AM, Ivan Voras wrote:
>
> > On 18/03/2011 19:17, Ben Chobot wrote:
> >
> >> if we're talking an extra 50MB of memory per cluster, that will
> start to add up.
> >
> > Consider this: each such cluster will have:
> >
> > a) its own database files on the drives (WAL, data - increasing IO)
>
> Oh, I hadn't thought about WAL. Good point.
> But data files are a function of tables and indexes, right? Having them
> in different schemas or different clusters isn't going to change that.
> I guess there are system tables but those are relatively trivial - I
> think?

Correct, but with different clusters you are going to have different back ends handling writes without regard to each
other. How this unfolds will depend on your underlying disk structure and filsystems.  I've had bad experiences in the
pasthaving multiple Postgres instances fighting for the same disk. 

> > b) its own postgresql processes (many of them) running in memory
>
> I believe this is entirely a function of client connections.

With a single instance, you can use connection pooling to reduce the overall number of backend connections which will
reduceyour memory footprint.  
>
> > c) its own shared_buffers in memory.
>
> Given that each application will be independent, I don't see a
> different between clusters and schemas here either.

The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or
not. This is an overly simplified example - but illustrates the point.  Say you have 4GB of RAM available to dedicate
toa shared buffers on a server, and two databases (DB A and DB B) to run.  You either set up a single instance with a
4GBpool, or two instances with 2GB pools each.  Let's say that DB A gets really busy, and DB B is not.  In the shared
instanceapproach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A.  In the
splitinstance, you can't. 

Brad.

Re: multi-tenant vs. multi-cluster

От
Ben Chobot
Дата:
On Mar 18, 2011, at 12:34 PM, Nicholson, Brad (Toronto, ON, CA) wrote:

>>> b) its own postgresql processes (many of them) running in memory
>>
>> I believe this is entirely a function of client connections.
>
> With a single instance, you can use connection pooling to reduce the overall number of backend connections which will
reduceyour memory footprint.  

Er, right, for some reason I was thinking I could use connection pooling against multiple clusters, but now that I
thinkabout it that doesn't make much sense, does it? 

>>
>>> c) its own shared_buffers in memory.
>>
>> Given that each application will be independent, I don't see a
>> different between clusters and schemas here either.
>
> The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or
not. This is an overly simplified example - but illustrates the point.  Say you have 4GB of RAM available to dedicate
toa shared buffers on a server, and two databases (DB A and DB B) to run.  You either set up a single instance with a
4GBpool, or two instances with 2GB pools each.  Let's say that DB A gets really busy, and DB B is not.  In the shared
instanceapproach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A.  In the
splitinstance, you can't. 

Ah, that's an illustrative example. Thanks.

OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like
statement_timeout,but for transactions, seems like it would be idle. 

Re: multi-tenant vs. multi-cluster

От
"Nicholson, Brad (Toronto, ON, CA)"
Дата:
> -----Original Message-----
> From: Ben Chobot [mailto:bench@silentmedia.com]
> Sent: Friday, March 18, 2011 3:45 PM
> To: Nicholson, Brad (Toronto, ON, CA)
> Cc: pgsql-general General
> Subject: Re: [GENERAL] multi-tenant vs. multi-cluster
>
>
> On Mar 18, 2011, at 12:34 PM, Nicholson, Brad (Toronto, ON, CA) wrote:
>
> >>> b) its own postgresql processes (many of them) running in memory
> >>
> >> I believe this is entirely a function of client connections.
> >
> > With a single instance, you can use connection pooling to reduce the
> overall number of backend connections which will reduce your memory
> footprint.
>
> Er, right, for some reason I was thinking I could use connection
> pooling against multiple clusters, but now that I think about it that
> doesn't make much sense, does it?

Not for reducing overall numbers of connections on the server.

> >>
> >>> c) its own shared_buffers in memory.
> >>
> >> Given that each application will be independent, I don't see a
> >> different between clusters and schemas here either.
> >
> > The difference is that in a single cluster, a single instance is
> going to make decisions about what data to cache or not.  This is an
> overly simplified example - but illustrates the point.  Say you have
> 4GB of RAM available to dedicate to a shared buffers on a server, and
> two databases (DB A and DB B) to run.  You either set up a single
> instance with a 4GB pool, or two instances with 2GB pools each.  Let's
> say that DB A gets really busy, and DB B is not.  In the shared
> instance approach, the instance can evict buffers cached for DB B in
> order to load buffers needed for DB A.  In the split instance, you
> can't.
>
> Ah, that's an illustrative example. Thanks.
>
> OK, so are there any good ways to keep a bad/clueless user from gumming
> up a whole cluster? Something like statement_timeout, but for
> transactions, seems like it would be idle.

statement_timeout will only time out SQL queries, not DB transactions.  There is nothing internal for that.  It's a
fairlyeasy query to terminate all IDLE transactions, but you have to be careful that you aren't terminating active
sessions.

Brad.

Re: multi-tenant vs. multi-cluster

От
Merlin Moncure
Дата:
On Fri, Mar 18, 2011 at 2:44 PM, Ben Chobot <bench@silentmedia.com> wrote:
>
> On Mar 18, 2011, at 12:34 PM, Nicholson, Brad (Toronto, ON, CA) wrote:
>
>>>> b) its own postgresql processes (many of them) running in memory
>>>
>>> I believe this is entirely a function of client connections.
>>
>> With a single instance, you can use connection pooling to reduce the overall number of backend connections which
willreduce your memory footprint. 
>
> Er, right, for some reason I was thinking I could use connection pooling against multiple clusters, but now that I
thinkabout it that doesn't make much sense, does it? 
>
>>>
>>>> c) its own shared_buffers in memory.
>>>
>>> Given that each application will be independent, I don't see a
>>> different between clusters and schemas here either.
>>
>> The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or
not. This is an overly simplified example - but illustrates the point.  Say you have 4GB of RAM available to dedicate
toa shared buffers on a server, and two databases (DB A and DB B) to run.  You either set up a single instance with a
4GBpool, or two instances with 2GB pools each.  Let's say that DB A gets really busy, and DB B is not.  In the shared
instanceapproach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A.  In the
splitinstance, you can't. 
>
> Ah, that's an illustrative example. Thanks.
>
> OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like
statement_timeout,but for transactions, seems like it would be idle. 

single cluster, multiple database is likely the best way to go.
postgres is a little thin on resource provisioning features but at the
end of the day, restricting a single client app to N simultaneous
connections on a suitably powerful server is going to be your best way
to deal with this.  I highly advise using pgbouncer in front of your
database to do this.

you can always generate scripts to watch for high load situations and
kill off offending backends in emergencies.

statement_timeout is ok, but don't be too aggressive with it or it
will become annoying.

merlin

Re: multi-tenant vs. multi-cluster

От
Scott Marlowe
Дата:
On Fri, Mar 18, 2011 at 1:44 PM, Ben Chobot <bench@silentmedia.com> wrote:

> OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like
statement_timeout,but for transactions, seems like it would be idle. 

Newer versions of postgresql aren't quite as susceptible to problems
of open transactions in one DB affecting another in the same cluster /
instance.  I.e. if dbA has an open transaction, that will no longer
prevent dbB from being properly vacuumed etc.

I'd suggest using check_postgresql.pl to keep track of things like
open transactions, vacuum wraparound etc.

Note that it might be worthwhile to have two or more but not dozens of
individual clusters. Let's say you've got a critical app, and a bunch
of not so critical apps.  You can make a cluster for the more critical
app(s) so it's / they're isolated from the other apps if things go
wrong.

A rogue query using all the memory or IO in the machine up can still
be a bit of a problem, of course, but it will be limited if it happens
in another cluster.