Discussion: sensible configuration of max_connections


sensible configuration of max_connections

From:
Chris Withers
Date:
Hi All,

What's a sensible way to pick the number to use for max_connections?

I'm looking after a reasonably sized multi-tenant cluster, where the master handles all the load and there's a slave in case of hardware failure in the master.
The machine is used to host what I suspect are mainly Django applications, so lots of short web requests; I'm not sure how much connection pooling, if any, Django's ORM does.

I arbitrarily picked 1000 for max_connections and haven't had any problems, but we're onboarding another app on Monday that handles a few million requests per day, and I'm thinking about potential problems related to the number of available connections.

What's "too much" for max_connections? What happens when you set it to 
high? What factors affect that number?

cheers,

Chris



Fwd: sensible configuration of max_connections

From:
Chris Ellis
Date:
Hi Chris

On Fri, 7 Feb 2020, 08:36 Chris Withers, <chris@withers.org> wrote:
Hi All,

What's a sensible way to pick the number to use for max_connections?

Sensible in this context is somewhat variable. Each connection in PostgreSQL will be allocated a backend process. These are not the lightest weight of things.

Each connection takes up space in shared memory, as mentioned in the manual.
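If you want a quick feel for how much headroom you have, something along these lines (just the standard pg_stat_activity view; the exact query is only an example) shows client backends against the limit:

    SHOW max_connections;

    SELECT count(*) AS backends,
           count(*) FILTER (WHERE state = 'active') AS active
    FROM pg_stat_activity
    WHERE backend_type = 'client backend';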


I'm looking after a reasonably sized multi-tenant cluster, where the master handles all the load and there's a slave in case of hardware failure in the master.
The machine is used to host what I suspect are mainly Django applications, so lots of short web requests; I'm not sure how much connection pooling, if any, Django's ORM does.

I arbitrarily picked 1000 for max_connections and haven't had any problems, but we're onboarding another app on Monday that handles a few million requests per day, and I'm thinking about potential problems related to the number of available connections.

What's "too much" for max_connections? What happens when you set it too high? What factors affect that number?

When sizing max_connections you need to trade off how many connections your application will use at peak vs how much RAM and CPU you have.

Each connection is capable of allocating work_mem and has a stack etc.

As such you don't want max_connections to be able to run your system out of RAM.

Given your situation I'd very seriously look at connection pooling using PgBouncer or similar. That way you can run with a far smaller max_connections and still cope with applications configured with large, usually idle connection pools.
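As a rough sketch of what that can look like - the names and sizes below are made-up examples, not recommendations - a transaction-pooling pgbouncer.ini might be along the lines of:

    [databases]
    ; "myapp" is a placeholder database
    myapp = host=127.0.0.1 port=5432 dbname=myapp

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; hold a server connection only for the duration of a transaction,
    ; so a small pool can serve many mostly-idle clients
    pool_mode = transaction
    ; client connections pgbouncer will accept
    max_client_conn = 1000
    ; server connections per database/user pair
    default_pool_size = 20

With something like that in front, max_connections on the PostgreSQL side only needs to cover the pool sizes plus some admin headroom. One caveat: transaction pooling doesn't play well with session state (session-level prepared statements, temporary tables, SET, advisory locks), so check the apps first.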


cheers,

Chris

Regards,
Chris Ellis

Re: Fwd: sensible configuration of max_connections

From:
Chris Withers
Date:
On 07/02/2020 12:49, Chris Ellis wrote:
What's "too much" for max_connections? What happens when you set it to
high? What factors affect that number?

When sizing max_connections you need to trade off how many connections your application will use at peak vs how much RAM and CPU you have.

Each connection is capable of allocating work_mem and has a stack etc.

As such you don't want max_connections to be able to run your system out of RAM.
Sure, but that's where I'm trying to find out what's sensible. The box has 196GB memory, most of that in hugepages, an 18-core Intel Skylake with HT on giving 36 cores, and tonnes of SSD for storage. How would I turn that spec into a sensible number for max_connections? As that number grows, what contention points in postgres will start creaking (shared memory, where the IPC happens)?

In case I forgot to say, this is PostgreSQL 11...

Chris

PS: definitely thinking of PgBouncer, but still trying to figure out what to sensibly set for max_connections.

Re: sensible configuration of max_connections

From:
Justin
Date:
Hi Chris Withers

As stated, each connection uses a certain amount of resources, and it's very easy to configure PostgreSQL so that even a small number of connections will eat up all the RAM.

work_mem is the biggest consumer of resources. Let's say it's set to 5 megs per connection: at 1000 connections, that's 5,000 megs that can be allocated.

Whether a connection pooler is a valid option depends on how security is laid out in the application. If the application understands how to keep tenants out of each other's data with no leaks, then yes. If each tenant is assigned a schema or a specific database, a connection pooler can still work, but the configuration is going to be difficult.
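A quick way to see what you're actually working with on a given cluster (this is just the standard pg_settings view):

    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('max_connections', 'work_mem', 'shared_buffers');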


Re: Fwd: sensible configuration of max_connections

From:
Steve Atkins
Date:


On 07/02/2020 13:18, Chris Withers wrote:
On 07/02/2020 12:49, Chris Ellis wrote:
What's "too much" for max_connections? What happens when you set it to
high? What factors affect that number?

When sizing max_connections you need to trade off how many connections your application will use at peak vs how much RAM and CPU you have.

Each connection is capable of allocating work_mem and has a stack etc.

As such you don't want max_connections to be able to run your system out of RAM.
Sure, but that's where I'm trying to find out what's sensible. The box has 196GB memory, most of that in hugepages, an 18-core Intel Skylake with HT on giving 36 cores, and tonnes of SSD for storage. How would I turn that spec into a sensible number for max_connections? As that number grows, what contention points in postgres will start creaking (shared memory, where the IPC happens)?

The max_connections setting is an upper limit after which postgresql will reject connections. You don't really want to hit that limit; rather, you want to keep the number of concurrent connections to a reasonable number (and have max_connections somewhere above that).

Each connection is a postgresql process, so active connections are competing for resources and even idle connections take up some RAM. Creating a new connection means launching a new process (and doing some setup), so it's relatively expensive.

Doing some sort of connection pooling is a good idea, especially for web-apps that connect, do a few short queries and disconnect. Django is probably doing a passable job at pooling already, so you might want to see how many connections it's using under normal load. Adding a dedicated pooler in between Django and PostgreSQL would give you more flexibility and might be a good idea, but if what Django is already doing is reasonable you may not need it.
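One way to check that is to group pg_stat_activity by application - the grouping here is just an example, the columns are the standard ones:

    SELECT datname, usename, application_name, state, count(*)
    FROM pg_stat_activity
    GROUP BY 1, 2, 3, 4
    ORDER BY count(*) DESC;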

What's a good number of active connections to aim for? It probably depends on whether they tend to be CPU-bound or IO-bound, but I've seen the rule of thumb of "around twice the number of CPU cores" tossed around, and it's probably a decent place to start, then run it under normal load and see how it behaves - cpu usage, RAM, IO, request latency and throughput.

Cheers,
  Steve

Re: sensible configuration of max_connections

From:
Michael Lewis
Date:

On Fri, Feb 7, 2020 at 6:29 AM Justin <zzzzz.graf@gmail.com> wrote:
work_mem is the biggest consumer of resources. Let's say it's set to 5 megs per connection: at 1000 connections, that's 5,000 megs that can be allocated.

Clarification: work_mem is used per operation (sort, hash, etc.) and could be allocated many times over by a complicated query, and/or with parallel processing enabled. It could be that a single connection uses 10x work_mem or more.
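To put made-up numbers on that (an illustration, not a measurement): a query with four sort/hash operations, two parallel workers per gather and work_mem of 16MB could in the worst case reserve roughly:

    -- 4 work_mem-sized operations x (leader + 2 workers) x 16MB each
    SELECT 4 * (1 + 2) * 16 AS worst_case_mb_for_one_query;  -- 192MB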


Re: Fwd: sensible configuration of max_connections

From:
Sam Gendler
Date:


On Fri, Feb 7, 2020 at 5:36 AM Steve Atkins <steve@blighty.com> wrote:

What's a good number of active connections to aim for? It probably depends on whether they tend to be CPU-bound or IO-bound, but I've seen the rule of thumb of "around twice the number of CPU cores" tossed around, and it's probably a decent place to start, then run it under normal load and see how it behaves - cpu usage, RAM, IO, request latency and throughput.


Back in the day of spinning media, when I was more active on the list and postgresql 9 was just released, conventional wisdom for a starting number was 2*cores + 1*spindles, if memory serves. You can set max_connections higher, but that was the number you wanted to have active, and then adjust for workload - OLTP vs warehouse, how much disk access vs buffer cache, etc. Benchmarks, at the time, showed that performance started to fall off due to contention if the number of processes got much larger. I imagine that the speed of storage today would maybe make 3 or 4x core count a pretty reasonable place to start. There will be a point of diminishing returns somewhere, but you can probably construct your own benchmarks to determine where that point is likely to be for your workload.
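If you do want to roll your own numbers, pgbench at a few different client counts is the usual starting point - the scale, client counts and duration here are only placeholders to adapt:

    createdb bench
    pgbench -i -s 100 bench        # initialise test tables at scale factor 100
    for c in 36 72 108 144; do     # e.g. 1x, 2x, 3x, 4x the core count
        pgbench -c "$c" -j 12 -T 120 bench
    done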

I was doing a lot of java development at the time, and tended to use a connection pool per application server rather than an external connection pool in front of postgresql, just for ease of administration. That meant I might have more connections than the desired pool size, but I tried to keep the number of active connections under that limit and set max_connections to a value that prevented me from being locked out if each application server was at max pool size. I'm not sure how well that strategy would work now that autoscaling is so ubiquitous, since there is memory allocation overhead associated even with idle connections and the sheer number of per-server pools could get quite high.
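As a made-up example of that arithmetic: with 8 application servers each capped at a pool of 50, plus a little slack so admin and monitoring sessions (and superuser_reserved_connections) still fit, you'd end up with something like:

    -- 8 app servers x 50 pooled connections each, plus ~10 for admin/monitoring
    SELECT 8 * 50 + 10 AS max_connections_floor;  -- 410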
 

Re: Fwd: sensible configuration of max_connections

From:
Justin
Date:

On Fri, Feb 7, 2020 at 1:56 PM Sam Gendler <sgendler@ideasculptor.com> wrote:
Benchmarks, at the time, showed that performance started to fall off due to contention if the number of processes got much larger.  I imagine that the speed of storage today would maybe make 3 or 4x core count a pretty reasonable place to start.  There will be a point of diminishing returns somewhere, but you can probably construct your own benchmarks to determine where that point is likely to be for your workload.

I wonder if anyone has run a benchmark like that lately? Doing such a benchmark may be worthwhile given that so much is now running either in the cloud, in a VM, or in some other kind of container. All this abstraction from the hardware layer surely has had an impact on the numbers and rules of thumb...

I still run on real hardware and spinning disk.  

Re: Fwd: sensible configuration of max_connections

From:
Sam Gendler
Date:


On Fri, Feb 7, 2020 at 11:14 AM Justin <zzzzz.graf@gmail.com> wrote:

On Fri, Feb 7, 2020 at 1:56 PM Sam Gendler <sgendler@ideasculptor.com> wrote:
Benchmarks, at the time, showed that performance started to fall off due to contention if the number of processes got much larger.  I imagine that the speed of storage today would maybe make 3 or 4x core count a pretty reasonable place to start.  There will be a point of diminishing returns somewhere, but you can probably construct your own benchmarks to determine where that point is likely to be for your workload.

I wonder if anyone has run a benchmark like that lately? Doing such a benchmark may be worthwhile given that so much is now running either in the cloud, in a VM, or in some other kind of container. All this abstraction from the hardware layer surely has had an impact on the numbers and rules of thumb...

I still run on real hardware and spinning disk.  

To be honest, I don't even know if the old rule of thumb would still apply, given the changes that have likely occurred within the postgresql codebase over the course of a decade. But there were plenty of people benchmarking and writing about how to administer large installations and do performance tuning back then, and I imagine they still exist today. They'll probably chime in on this thread soon enough.

A quick Amazon search for 'postgresql performance' turns up plenty of books on the topic that address more recent versions of the db. I'd go hit the O'Reilly bookshelf website and use a trial membership to see what they have to say (I generally consider the O'Reilly bookshelf, which gives you access to pretty much all books by all technical publishers, to be an invaluable tool and worth every penny). I wouldn't be surprised if the postgresql documentation itself provides insight as to appropriate numbers, but no one ever reads the manual any longer.