Discussion: multi-tenant vs. multi-cluster
We're considering using postgres as a way to host database services for many, many independent applications. One obvious way to do this is with schemas, roles, and proper permissions, but that still leaves open the possibility for some poorly written application to leave open transactions and affect others. Another possible way to go is to use debian's ability to easily run multiple clusters, but while this option gives better isolation, I imagine it comes at a cost of more overhead, both in terms of memory and in terms of support. I suppose my question is, how much overhead? I'm less worried about support (that's what scripts are for) but if we're talking an extra 50MB of memory per cluster, that will start to add up. How have you guys offered multi-tenant postgres services? Am I forgetting something?
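[For reference, a minimal sketch of the schemas-and-roles approach being described. The tenant name, password, and limits below are made-up placeholders, not recommendations:]

```sql
-- One role + one schema per tenant; names and numbers are hypothetical.
CREATE ROLE tenant_a LOGIN PASSWORD 'changeme' CONNECTION LIMIT 10;
CREATE SCHEMA tenant_a AUTHORIZATION tenant_a;

-- Stop tenants from dumping objects into the shared public schema:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- Resolve unqualified names to the tenant's own schema:
ALTER ROLE tenant_a SET search_path = tenant_a;

-- A per-role statement_timeout limits runaway queries
-- (though, as discussed below, it does nothing about idle transactions):
ALTER ROLE tenant_a SET statement_timeout = '60s';
```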
On 18/03/2011 19:17, Ben Chobot wrote:

> if we're talking an extra 50MB of memory per cluster, that will start to add up.

Consider this: each such cluster will have:

a) its own database files on the drives (WAL, data - increasing IO)
b) its own postgresql processes (many of them) running in memory
c) its own shared_buffers in memory.

It is highly unlikely that you will manage anything decent with this type of configuration with a non-trivial number of clusters.
On Mar 18, 2011, at 11:47 AM, Ivan Voras wrote:

> On 18/03/2011 19:17, Ben Chobot wrote:
>
>> if we're talking an extra 50MB of memory per cluster, that will start to add up.
>
> Consider this: each such cluster will have:
>
> a) its own database files on the drives (WAL, data - increasing IO)

Oh, I hadn't thought about WAL. Good point. But data files are a function of tables and indexes, right? Having them in different schemas or different clusters isn't going to change that. I guess there are system tables but those are relatively trivial - I think?

> b) its own postgresql processes (many of them) running in memory

I believe this is entirely a function of client connections.

> c) its own shared_buffers in memory.

Given that each application will be independent, I don't see a difference between clusters and schemas here either.
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ben Chobot
> Sent: Friday, March 18, 2011 3:10 PM
> To: Ivan Voras
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] multi-tenant vs. multi-cluster
>
> On Mar 18, 2011, at 11:47 AM, Ivan Voras wrote:
>
> > On 18/03/2011 19:17, Ben Chobot wrote:
> >
> >> if we're talking an extra 50MB of memory per cluster, that will start to add up.
> >
> > Consider this: each such cluster will have:
> >
> > a) its own database files on the drives (WAL, data - increasing IO)
>
> Oh, I hadn't thought about WAL. Good point.
> But data files are a function of tables and indexes, right? Having them in different schemas or different clusters isn't going to change that. I guess there are system tables but those are relatively trivial - I think?

Correct, but with different clusters you are going to have different back ends handling writes without regard to each other. How this unfolds will depend on your underlying disk structure and filesystems. I've had bad experiences in the past having multiple Postgres instances fighting for the same disk.

> > b) its own postgresql processes (many of them) running in memory
>
> I believe this is entirely a function of client connections.

With a single instance, you can use connection pooling to reduce the overall number of backend connections, which will reduce your memory footprint.

> > c) its own shared_buffers in memory.
>
> Given that each application will be independent, I don't see a difference between clusters and schemas here either.

The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or not. This is an overly simplified example - but it illustrates the point. Say you have 4GB of RAM available to dedicate to shared buffers on a server, and two databases (DB A and DB B) to run. You either set up a single instance with a 4GB pool, or two instances with 2GB pools each. Let's say that DB A gets really busy, and DB B is not. In the shared instance approach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A. In the split instance, you can't.

Brad.
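[This buffer-sharing behavior can be observed directly with the contrib pg_buffercache extension, which shows how the single shared pool is split across databases at any moment. A sketch, assuming the extension is installed in the cluster:]

```sql
-- Requires the contrib pg_buffercache extension to be installed.
-- Shows how many shared buffers each database currently occupies:
SELECT d.datname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_database d ON b.reldatabase = d.oid
GROUP BY d.datname
ORDER BY buffers DESC;
```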
On Mar 18, 2011, at 12:34 PM, Nicholson, Brad (Toronto, ON, CA) wrote:

>>> b) its own postgresql processes (many of them) running in memory
>>
>> I believe this is entirely a function of client connections.
>
> With a single instance, you can use connection pooling to reduce the overall number of backend connections, which will reduce your memory footprint.

Er, right, for some reason I was thinking I could use connection pooling against multiple clusters, but now that I think about it that doesn't make much sense, does it?

>>> c) its own shared_buffers in memory.
>>
>> Given that each application will be independent, I don't see a difference between clusters and schemas here either.
>
> The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or not. This is an overly simplified example - but it illustrates the point. Say you have 4GB of RAM available to dedicate to shared buffers on a server, and two databases (DB A and DB B) to run. You either set up a single instance with a 4GB pool, or two instances with 2GB pools each. Let's say that DB A gets really busy, and DB B is not. In the shared instance approach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A. In the split instance, you can't.

Ah, that's an illustrative example. Thanks.

OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like statement_timeout, but for transactions, seems like it would be ideal.
> -----Original Message-----
> From: Ben Chobot [mailto:bench@silentmedia.com]
> Sent: Friday, March 18, 2011 3:45 PM
> To: Nicholson, Brad (Toronto, ON, CA)
> Cc: pgsql-general General
> Subject: Re: [GENERAL] multi-tenant vs. multi-cluster
>
> On Mar 18, 2011, at 12:34 PM, Nicholson, Brad (Toronto, ON, CA) wrote:
>
> >>> b) its own postgresql processes (many of them) running in memory
> >>
> >> I believe this is entirely a function of client connections.
> >
> > With a single instance, you can use connection pooling to reduce the overall number of backend connections, which will reduce your memory footprint.
>
> Er, right, for some reason I was thinking I could use connection pooling against multiple clusters, but now that I think about it that doesn't make much sense, does it?

Not for reducing overall numbers of connections on the server.

> >>> c) its own shared_buffers in memory.
> >>
> >> Given that each application will be independent, I don't see a difference between clusters and schemas here either.
> >
> > The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or not. This is an overly simplified example - but it illustrates the point. Say you have 4GB of RAM available to dedicate to shared buffers on a server, and two databases (DB A and DB B) to run. You either set up a single instance with a 4GB pool, or two instances with 2GB pools each. Let's say that DB A gets really busy, and DB B is not. In the shared instance approach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A. In the split instance, you can't.
>
> Ah, that's an illustrative example. Thanks.
>
> OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like statement_timeout, but for transactions, seems like it would be ideal.

statement_timeout will only time out SQL queries, not DB transactions.
There is nothing internal for that. It's a fairly easy query to terminate all IDLE transactions, but you have to be careful that you aren't terminating active sessions.

Brad.
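[The "fairly easy query" usually looks something like the sketch below. Note the pg_stat_activity column names are version-dependent: releases up to 9.1 (current when this thread was written) expose procpid and a current_query of '<IDLE> in transaction', while later releases renamed them to pid and state = 'idle in transaction'. The 10-minute threshold is an arbitrary example:]

```sql
-- Terminate sessions that have sat idle in an open transaction too long.
-- Assumes PostgreSQL <= 9.1 column names (procpid / current_query);
-- on newer releases use pid and state = 'idle in transaction' instead.
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
  AND xact_start < now() - interval '10 minutes';
```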
On Fri, Mar 18, 2011 at 2:44 PM, Ben Chobot <bench@silentmedia.com> wrote:

> On Mar 18, 2011, at 12:34 PM, Nicholson, Brad (Toronto, ON, CA) wrote:
>
>>>> b) its own postgresql processes (many of them) running in memory
>>>
>>> I believe this is entirely a function of client connections.
>>
>> With a single instance, you can use connection pooling to reduce the overall number of backend connections, which will reduce your memory footprint.
>
> Er, right, for some reason I was thinking I could use connection pooling against multiple clusters, but now that I think about it that doesn't make much sense, does it?
>
>>>> c) its own shared_buffers in memory.
>>>
>>> Given that each application will be independent, I don't see a difference between clusters and schemas here either.
>>
>> The difference is that in a single cluster, a single instance is going to make decisions about what data to cache or not. This is an overly simplified example - but it illustrates the point. Say you have 4GB of RAM available to dedicate to shared buffers on a server, and two databases (DB A and DB B) to run. You either set up a single instance with a 4GB pool, or two instances with 2GB pools each. Let's say that DB A gets really busy, and DB B is not. In the shared instance approach, the instance can evict buffers cached for DB B in order to load buffers needed for DB A. In the split instance, you can't.
>
> Ah, that's an illustrative example. Thanks.
>
> OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like statement_timeout, but for transactions, seems like it would be ideal.

single cluster, multiple database is likely the best way to go. postgres is a little thin on resource provisioning features but at the end of the day, restricting a single client app to N simultaneous connections on a suitably powerful server is going to be your best way to deal with this. I highly advise using pgbouncer in front of your database to do this.
you can always generate scripts to watch for high load situations and kill off offending backends in emergencies. statement_timeout is ok, but don't be too aggressive with it or it will become annoying. merlin
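[A sketch of what the pgbouncer front end suggested above might look like. The database names, port, paths, and pool sizes are made-up examples, not recommendations:]

```ini
; pgbouncer.ini sketch: cap each tenant at its own small server-side pool
; while allowing many more client connections than backend processes.
[databases]
tenant_a = host=127.0.0.1 port=5432 dbname=tenant_a pool_size=10
tenant_b = host=127.0.0.1 port=5432 dbname=tenant_b pool_size=10

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 10
```

[Postgres itself can enforce a similar cap with a per-role CONNECTION LIMIT, but pooling has the extra benefit that idle clients don't consume backend processes at all.]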
On Fri, Mar 18, 2011 at 1:44 PM, Ben Chobot <bench@silentmedia.com> wrote:

> OK, so are there any good ways to keep a bad/clueless user from gumming up a whole cluster? Something like statement_timeout, but for transactions, seems like it would be ideal.

Newer versions of postgresql aren't quite as susceptible to problems of open transactions in one DB affecting another in the same cluster / instance. I.e. if dbA has an open transaction, that will no longer prevent dbB from being properly vacuumed etc. I'd suggest using check_postgres.pl to keep track of things like open transactions, vacuum wraparound etc.

Note that it might be worthwhile to have two or more, but not dozens, of individual clusters. Let's say you've got a critical app and a bunch of not-so-critical apps. You can make a cluster for the more critical app(s) so it's / they're isolated from the other apps if things go wrong. A rogue query using up all the memory or IO in the machine can still be a bit of a problem, of course, but it will be limited if it happens in another cluster.
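[The checks mentioned above can also be eyeballed by hand; the following is a rough sketch of the kind of catalog queries such monitoring relies on, useful for ad-hoc scripting:]

```sql
-- Wraparound headroom: how old is each database's frozen xid?
-- (autovacuum forces a vacuum once age passes autovacuum_freeze_max_age,
-- 200 million by default.)
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Longest-running open transaction anywhere in the cluster:
SELECT max(now() - xact_start) AS oldest_xact
FROM pg_stat_activity
WHERE xact_start IS NOT NULL;
```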