Обсуждение: Connection pooling for a mixture of lightweight and heavyweight jobs?

Поиск
Список
Период
Сортировка

Connection pooling for a mixture of lightweight and heavyweight jobs?

От
Craig James
Дата:
I have a question that may be related to connection pooling.

We create a bunch of high-performance lightweight Postgres clients that serve up images (via mod_perl and Apache::DBI).
We have roughly ten web sites, with ten mod_perl instances each, so we always have around 100 Postgres backends sitting
aroundall the time waiting.  When a lightweight request comes in, it's a single query on an primary key with no joins,
soit's very fast. 

We also have a very heavyweight process (our primary search technology) that can take many seconds, even minutes, to do
asearch and generate a web page. 

The lightweight backends are mostly idle, but when a heavyweight search finishes, it causes a burst on the lightweight
backends,which must be very fast. (They provide all of the images in the results page.) 

This mixture seems to make it hard to configure Postgres with the right amount of memory and such.  The primary query
needssome elbow room to do its work, but the lightweight queries all get the same resources. 

I figured that having these lightweight Postgres backends sitting around was harmless -- they allocate shared memory
andother resources, but they never use them, so what's the harm?  But recent discussions about connection pooling seem
tosuggest otherwise, that merely having 100 backends sitting around might be a problem. 

Craig

Re: Connection pooling for a mixture of lightweight and heavyweight jobs?

От
"Kevin Grittner"
Дата:
Craig James <craig_james@emolecules.com> wrote:

> We create a bunch of high-performance lightweight Postgres clients
> that serve up images (via mod_perl and Apache::DBI).  We have
> roughly ten web sites, with ten mod_perl instances each, so we
> always have around 100 Postgres backends sitting around all the
> time waiting.  When a lightweight request comes in, it's a single
> query on an primary key with no joins, so it's very fast.
>
> We also have a very heavyweight process (our primary search
> technology) that can take many seconds, even minutes, to do a
> search and generate a web page.
>
> The lightweight backends are mostly idle, but when a heavyweight
> search finishes, it causes a burst on the lightweight backends,
> which must be very fast. (They provide all of the images in the
> results page.)
>
> This mixture seems to make it hard to configure Postgres with the
> right amount of memory and such.  The primary query needs some
> elbow room to do its work, but the lightweight queries all get the
> same resources.
>
> I figured that having these lightweight Postgres backends sitting
> around was harmless -- they allocate shared memory and other
> resources, but they never use them, so what's the harm?  But
> recent discussions about connection pooling seem to suggest
> otherwise, that merely having 100 backends sitting around might be
> a problem.

Well, the "if it ain't broke, don't fix it" rule might come into
play here.  The current configuration might leave you vulnerable to
occasional less-than-optimal performance, if two or more heavyweight
processes finish at the same time, and cause a "thundering herd" of
lightweight processes.  Having the lightweight requests go through a
connection pool could mitigate that problem, but they introduce
their own overhead on every request.  So, I would advise keeping an
eye out for problems which might match the above, but not to take
hasty action in the absence of evidence.  You might buy back 400MB
of RAM for caching (which you may or may not need) at the cost of
extra latency and CPU per request.

-Kevin

Re: Connection pooling for a mixture of lightweight and heavyweight jobs?

От
Craig James
Дата:
On 7/30/10 8:57 AM, Kevin Grittner wrote:
> Craig James<craig_james@emolecules.com>  wrote:
>
>> We create a bunch of high-performance lightweight Postgres clients
>> that serve up images (via mod_perl and Apache::DBI).  We have
>> roughly ten web sites, with ten mod_perl instances each, so we
>> always have around 100 Postgres backends sitting around all the
>> time waiting.  When a lightweight request comes in, it's a single
>> query on an primary key with no joins, so it's very fast.
>>
>> We also have a very heavyweight process (our primary search
>> technology) that can take many seconds, even minutes, to do a
>> search and generate a web page.
>>
>> The lightweight backends are mostly idle, but when a heavyweight
>> search finishes, it causes a burst on the lightweight backends,
>> which must be very fast. (They provide all of the images in the
>> results page.)
>>
>> This mixture seems to make it hard to configure Postgres with the
>> right amount of memory and such.  The primary query needs some
>> elbow room to do its work, but the lightweight queries all get the
>> same resources.
>>
>> I figured that having these lightweight Postgres backends sitting
>> around was harmless -- they allocate shared memory and other
>> resources, but they never use them, so what's the harm?  But
>> recent discussions about connection pooling seem to suggest
>> otherwise, that merely having 100 backends sitting around might be
>> a problem.
>
> Well, the "if it ain't broke, don't fix it" rule might come into
> play here.

I should have given one more detail here: We've been the victim of persistent "CPU spikes" that were discussed
extensivelyin postgres-performance.  Tom suggested upgrading to 8.4.4, but that can't happen for a couple more months
(we'reworking on it). 

   http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Craig

> The current configuration might leave you vulnerable to
> occasional less-than-optimal performance, if two or more heavyweight
> processes finish at the same time, and cause a "thundering herd" of
> lightweight processes.  Having the lightweight requests go through a
> connection pool could mitigate that problem, but they introduce
> their own overhead on every request.  So, I would advise keeping an
> eye out for problems which might match the above, but not to take
> hasty action in the absence of evidence.  You might buy back 400MB
> of RAM for caching (which you may or may not need) at the cost of
> extra latency and CPU per request.
>
> -Kevin
>


Re: Connection pooling for a mixture of lightweight and heavyweight jobs?

От
"Kevin Grittner"
Дата:
Craig James <craig_james@emolecules.com> wrote:

>> Well, the "if it ain't broke, don't fix it" rule might come into
>> play here.
>
> I should have given one more detail here: We've been the victim
> of persistent "CPU spikes" that were discussed extensively in
> postgres-performance.  Tom suggested upgrading to 8.4.4, but that
> can't happen for a couple more months (we're working on it).
>
>
http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

Ah, I hadn't connected that thread with this.  After rereading that
thread with the information from this thread in mind, I think what
you describe on the other thread could well be the "thundering herd"
problem.  Some form of connection pooling could well help.

BTW, I hope you've updated to the latest 8.3.x by now.  If not, you
should expedite that.

-Kevin

Re: Connection pooling for a mixture of lightweight and heavyweight jobs?

От
Craig James
Дата:
On 7/30/10 10:37 AM, Kevin Grittner wrote:
> Craig James<craig_james@emolecules.com>  wrote:
>
>>> Well, the "if it ain't broke, don't fix it" rule might come into
>>> play here.
>>
>> I should have given one more detail here: We've been the victim
>> of persistent "CPU spikes" that were discussed extensively in
>> postgres-performance.  Tom suggested upgrading to 8.4.4, but that
>> can't happen for a couple more months (we're working on it).
>>
>>
> http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php
>
> Ah, I hadn't connected that thread with this.  After rereading that
> thread with the information from this thread in mind, I think what
> you describe on the other thread could well be the "thundering herd"
> problem.  Some form of connection pooling could well help.
>
> BTW, I hope you've updated to the latest 8.3.x by now.  If not, you
> should expedite that.

Yes, I updated to 8.3.10, partly to see if it would solve this problem.

I'm not clear on how connection pooling would help this problem.  I would have 100 lightweight backends, whether they
werepooled or not, always sitting around.  Or are you suggesting that I not use Apache::DBI to maintain persistent
connections,and instead rely on the connection pooler to provide fast connect/disconnect from Postgres? 

Thanks,
Craig


Re: Connection pooling for a mixture of lightweight and heavyweight jobs?

От
"Kevin Grittner"
Дата:
Craig James <craig_james@emolecules.com> wrote:

> I'm not clear on how connection pooling would help this problem.
> I would have 100 lightweight backends, whether they were pooled or
> not, always sitting around.

Not if you use a good pooler with the right configuration.  You
should have 10 to 20 database connections which the lightweight
query requests share.  You want something which will queue requests
above that maximum, and immediately release one more request when a
previous request completes.  We have this built in to our
application framework, so I'm not familiar with how to configure
pgbouncer or pgpool, but from what I've heard I would expect both of
these to support such behavior.

(Someone with more experience with one or both of these products,
please jump in here....)

At this point some people protest that they can't let requests queue
up like that because they need fast response time.  Trust me; if you
set up a good connection pool to the right size, this *will* give
you much better throughput *and* response time than letting the
thundering herd loose to compete for resources.

-Kevin