Обсуждение: Persistent Connections in Webserver Environment

Поиск
Список
Период
Сортировка

Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
Hi,
as the subject says I need some advice on setting up connection handling
to PG in a webserver environment. It's a typical dual Xeon FreeBSD box
running Apache2 with mod_php5 and PG 8. About 20 different applications
(ecommerce systems) will be running on this box. Each app resides in
it's own schema inside a single database. As far as I understand
persistent connections from apache processes can only be reused if the
authentication information of the allready existing connection is the
same. So in case an apache process holds a persistent connection to
database "test", auth'ed with username "user1" and another app wants to
connect as "user2" the connection can't be reused and a new one will be
spawned.

So what we are doing atm is telling all apps to use the user "apache",
grant access for this user to all schemas and fire "SET search_path TO
<app_schema>;" at the startup of each app / script. It works, but I
really would like to have an dedicated user for each app / schema for
security reasons.

The next better idea I came up with was to fire "SET SESSION
AUTHORIZATION TO <user>;" at each app / script startup, but for this to
work I would need to initially connect as superuser - and I really
dislike the idea of having a webserver connecting as superuser :/

Any ideas? I can't be the first person on earth with that problem ;/


Thanks in advance

Re: Persistent Connections in Webserver Environment

От
"Sean Davis"
Дата:
I have only a few connections, but I just connect with the equivalent of
your "apache" user.  My database is pretty much query-only with a few
exceptions that are not "sensitive".  But for you, could you just write a
stored function to do the transaction and write the audit trail for
data-altering queries?  That way, the application can still provide a
"username" to the function for the audit trail and the audit trail can be
made "safe" within the database framework (ie., it will only be written if
the transaction succeeds).  Alternatively, this could be done on the client
side by doing all data changes and auditing within the same transaction
block, but having all the code on the server side makes altering the schema
later easier (?).  This should be a balance between having cached
connections (VERY important for any even slightly-loaded system, in my very
limited experience) and having robust auditing.

Sean
----- Original Message -----
From: "Hannes Dorbath" <light@theendofthetunnel.de>
To: <pgsql-general@postgresql.org>
Sent: Monday, May 02, 2005 8:45 AM
Subject: [GENERAL] Persistent Connections in Webserver Environment


> Hi,
> as the subject says I need some advice on setting up connection handling
> to PG in a webserver environment. It's a typical dual Xeon FreeBSD box
> running Apache2 with mod_php5 and PG 8. About 20 different applications
> (ecommerce systems) will be running on this box. Each app resides in it's
> own schema inside a single database. As far as I understand persistent
> connections from apache processes can only be reused if the authentication
> information of the allready existing connection is the same. So in case an
> apache process holds a persistent connection to database "test", auth'ed
> with username "user1" and another app wants to connect as "user2" the
> connection can't be reused and a new one will be spawned.
>
> So what we are doing atm is telling all apps to use the user "apache",
> grant access for this user to all schemas and fire "SET search_path TO
> <app_schema>;" at the startup of each app / script. It works, but I really
> would like to have an dedicated user for each app / schema for security
> reasons.
>
> The next better idea I came up with was to fire "SET SESSION AUTHORIZATION
> TO <user>;" at each app / script startup, but for this to work I would
> need to initially connect as superuser - and I really dislike the idea of
> having a webserver connecting as superuser :/
>
> Any ideas? I can't be the first person on earth with that problem ;/
>
>
> Thanks in advance
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>



Re: Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
Hm. That would work, but there are so many data-altering queries, it's a
lot of work :/

I'm dreaming of a simple proxy that securely holds a pool of
su-connections and uses:

SET SESSION AUTHORIZATION $foo;
$query;
RESET SESSION AUTHORIZATION;

It would just have to filter queries that contain "SESSION
AUTHORIZATION" to prevent sql injection..

I wonder why pgPool doesn't work that way..


On 02.05.2005 15:23, Sean Davis wrote:
> I have only a few connections, but I just connect with the equivalent of
> your "apache" user.  My database is pretty much query-only with a few
> exceptions that are not "sensitive".  But for you, could you just write
> a stored function to do the transaction and write the audit trail for
> data-altering queries?  That way, the application can still provide a
> "username" to the function for the audit trail and the audit trail can
> be made "safe" within the database framework (ie., it will only be
> written if the transaction succeeds).  Alternatively, this could be done
> on the client side by doing all data changes and auditing within the
> same transaction block, but having all the code on the server side makes
> altering the schema later easier (?).  This should be a balance between
> having cached connections (VERY important for any even slightly-loaded
> system, in my very limited experience) and having robust auditing.
>
> Sean
> ----- Original Message ----- From: "Hannes Dorbath"
> <light@theendofthetunnel.de>
> To: <pgsql-general@postgresql.org>
> Sent: Monday, May 02, 2005 8:45 AM
> Subject: [GENERAL] Persistent Connections in Webserver Environment
>
>
>> Hi,
>> as the subject says I need some advice on setting up connection
>> handling to PG in a webserver environment. It's a typical dual Xeon
>> FreeBSD box running Apache2 with mod_php5 and PG 8. About 20 different
>> applications (ecommerce systems) will be running on this box. Each app
>> resides in it's own schema inside a single database. As far as I
>> understand persistent connections from apache processes can only be
>> reused if the authentication information of the allready existing
>> connection is the same. So in case an apache process holds a
>> persistent connection to database "test", auth'ed with username
>> "user1" and another app wants to connect as "user2" the connection
>> can't be reused and a new one will be spawned.
>>
>> So what we are doing atm is telling all apps to use the user "apache",
>> grant access for this user to all schemas and fire "SET search_path TO
>> <app_schema>;" at the startup of each app / script. It works, but I
>> really would like to have an dedicated user for each app / schema for
>> security reasons.
>>
>> The next better idea I came up with was to fire "SET SESSION
>> AUTHORIZATION TO <user>;" at each app / script startup, but for this
>> to work I would need to initially connect as superuser - and I really
>> dislike the idea of having a webserver connecting as superuser :/
>>
>> Any ideas? I can't be the first person on earth with that problem ;/
>>
>>
>> Thanks in advance
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>

Re: Persistent Connections in Webserver Environment

От
Marco Colombo
Дата:
On Mon, 2005-05-02 at 14:45 +0200, Hannes Dorbath wrote:
> Hi,
> as the subject says I need some advice on setting up connection handling
> to PG in a webserver environment. It's a typical dual Xeon FreeBSD box
> running Apache2 with mod_php5 and PG 8. About 20 different applications
> (ecommerce systems) will be running on this box. Each app resides in
> it's own schema inside a single database. As far as I understand
> persistent connections from apache processes can only be reused if the
> authentication information of the allready existing connection is the
> same. So in case an apache process holds a persistent connection to
> database "test", auth'ed with username "user1" and another app wants to
> connect as "user2" the connection can't be reused and a new one will be
> spawned.
>
> So what we are doing atm is telling all apps to use the user "apache",
> grant access for this user to all schemas and fire "SET search_path TO
> <app_schema>;" at the startup of each app / script. It works, but I
> really would like to have an dedicated user for each app / schema for
> security reasons.
>
> The next better idea I came up with was to fire "SET SESSION
> AUTHORIZATION TO <user>;" at each app / script startup, but for this to
> work I would need to initially connect as superuser - and I really
> dislike the idea of having a webserver connecting as superuser :/
>
> Any ideas? I can't be the first person on earth with that problem ;/
>

Have you measured the real gain in using persistent connections at all?

In my experience, it's just a CPU vs RAM tradeoff. Before you go thru
the pain of setting up a weird authentication mechanism, try and
consider whether you really need persistent connections. Search the
lists, it has been discussed in the past. I remember of this thread:
http://archives.postgresql.org/pgsql-php/2005-02/msg00009.php

There may be others, too.

.TM.
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it


Re: Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
On 02.05.2005 16:41, Marco Colombo wrote:
> Have you measured the real gain in using persistent connections at all?

I measured it about one year ago on a linux box. Swichting from
multi-user-pg_connect to single-user-pg_pconnect was a big improvment on
that box -- 50% and more on pages with just 1-2 simple SELECT queries. I
haven't done tests on BSD though.

Re: Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
On 02.05.2005 16:41, Marco Colombo wrote:

> Have you measured the real gain in using persistent connections at all?

As simple as possible:

<?php
require_once('Benchmark/Timer.php');
$timer =& new Benchmark_Timer();
$timer->start();

pg_pconnect('host=myhost dbname=database user=user');
pg_query("SET search_path TO myschema;");

$q = "SELECT u.login FROM users WHERE u.user_id = 1;";

$qr = pg_query($q);

print_r(pg_fetch_all($qr));

$timer->setMarker('Database');
$timer->stop();
$timer->display();
?>

Results:

pconnect: 0.001435995101928
connect:  0.016793966293335

It's factor 10 on such simple things on the BSD box.

Re: Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
On 02.05.2005 17:32, Hannes Dorbath wrote:
> $q = "SELECT u.login FROM users WHERE u.user_id = 1;";

Sorry, it should read:

$q = "SELECT u.login FROM users u WHERE u.user_id = 1;";

I accidently removed the "u" after users while removing line breaks to
make it shorter to post here.

Re: Persistent Connections in Webserver Environment

От
Scott Marlowe
Дата:
On Mon, 2005-05-02 at 10:32, Hannes Dorbath wrote:
> On 02.05.2005 16:41, Marco Colombo wrote:
>
> > Have you measured the real gain in using persistent connections at all?
>
> As simple as possible:
>
> <?php
> require_once('Benchmark/Timer.php');
> $timer =& new Benchmark_Timer();
> $timer->start();
>
> pg_pconnect('host=myhost dbname=database user=user');
> pg_query("SET search_path TO myschema;");
>
> $q = "SELECT u.login FROM users WHERE u.user_id = 1;";
>
> $qr = pg_query($q);
>
> print_r(pg_fetch_all($qr));
>
> $timer->setMarker('Database');
> $timer->stop();
> $timer->display();
> ?>
>
> Results:
>
> pconnect: 0.001435995101928
> connect:  0.016793966293335

But if the average PHP script takes 50 milliseconds to start up and 100
milliseconds to run, then either one is still pretty much noise.

Plus, neither benchmark is interesting really until you have pretty good
parallel load running.

It may well be that pconnect makes a difference under heavier load.  But
most the time, I've seen one or two queries that could be tuned make a
much larger difference than using pconnect.

And, if you haven't got apache / postgresql configured properly and run
out of connections, the connect time goes from 16 mS to hours as your
users wait for you to fix the connectivity issues.  :)

Re: Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
On 02.05.2005 17:41, Scott Marlowe wrote:

> But if the average PHP script takes 50 milliseconds to start up and 100 milliseconds to run, then either one is still
prettymuch noise. 

Yeah, _IF_ :)
Our scripts reside precompiled in a bytecode cache so there just isn't
much start up time ;)

I just replaced the simple query with a real world one:

http://hannes.imos.net/real_world.txt

http://hannes.imos.net/pconnect.php
http://hannes.imos.net/connect.php

Refresh both a few times to get meaningful result.

Still factor 3-4. And this is one of the most complex queries we have --
and factor 3-4 just makes a differece for us :)

> Plus, neither benchmark is interesting really until you have pretty good parallel load running.

The scripts are on a production box with decent load.

> It may well be that pconnect makes a difference under heavier load.  But most the time, I've seen one or two queries
thatcould be tuned make a much larger difference than using pconnect. 

We allways try to optimize and get most of performance from a query
anyway :)

> And, if you haven't got apache / postgresql configured properly and run out of connections, the connect time goes
from16 mS to hours as your users wait for you to fix the connectivity issues.  :)  

True, but we will test high load and will make sure that this won't
happen :)


--
imos  Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen  // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net

Re: Persistent Connections in Webserver Environment

От
Marco Colombo
Дата:
On Mon, 2005-05-02 at 17:32 +0200, Hannes Dorbath wrote:
> On 02.05.2005 16:41, Marco Colombo wrote:
>
> > Have you measured the real gain in using persistent connections at all?
>
> As simple as possible:
>
> <?php
> require_once('Benchmark/Timer.php');
> $timer =& new Benchmark_Timer();
> $timer->start();
>
> pg_pconnect('host=myhost dbname=database user=user');
> pg_query("SET search_path TO myschema;");
>
> $q = "SELECT u.login FROM users WHERE u.user_id = 1;";
>
> $qr = pg_query($q);
>
> print_r(pg_fetch_all($qr));
>
> $timer->setMarker('Database');
> $timer->stop();
> $timer->display();
> ?>
>
> Results:
>
> pconnect: 0.001435995101928
> connect:  0.016793966293335
>
> It's factor 10 on such simple things on the BSD box.

Ok, but the difference is going to be unnoticed, that's not the point
at all.

The question was: have you measured any difference in the server load?
I did in the past and wasn't really able to measure it, with more than
300 http processes active. The web server load is _way_ lower than the
db server. Currently we're about at 100 processes (but with pconnect)
and:

(web) load average: 0.31, 0.27, 0.21
(db)  load average: 0.24, 0.21, 0.18

and I know that turning to use simple connect won't change much as page
load time is dominated by the time spent in the queries (and the
overhead of 1/100 or 1/1000 of second in the startup time goes unnoticed
at all).

With any modern operating system, the overhead is very low (15ms is very
good actually).

In my experience, pconnect my cause RAM problems. The number of
processes is useless high. You have make provisions for a large
number of backends, and that means little RAM to single backend.

My advice is: use pconnect only when you have CPU problems,
unless your case is very degenerated one (your db host being on the
other side of the globe).

And, in my experience again, the first reasons for CPU problems on
the database server are:

- wrong/missing vacuum/analyze (or similar);
- bad coding on the application side (placing silly load on the server);
- bad queries (misuse/lack of indexes);
- bad tuning of PostgreSQL (expecially RAM);
...
...
- connect overhead.

I've never managed to reach the last item in the list in real world
cases. I think it is by far the least important item.

#1 Golden Rule for optimizing:
- Don't.

(Expecially when it causes _real_ troubles elsewhere.)

Have a nice day,
.TM.
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it


Re: Persistent Connections in Webserver Environment

От
Hannes Dorbath
Дата:
On 02.05.2005 18:33, Marco Colombo wrote:

> #1 Golden Rule for optimizing:
> - Don't.
>
> (Expecially when it causes _real_ troubles elsewhere.)

hmm.. :/

I'll do some more meaningful testing on server load this night..

Thanks so far!