Discussion: Parallel queries for a web-application | performance testing

From:
Balkrishna Sharma
Date:

Hello,
I will have a web application with postgres 8.4+ as the backend. At any given time there will be a max of 1000 parallel web users interacting with the database (read/write).
I wish to do performance testing of 1000 simultaneous reads/writes to the database.

I can run a simple unix script on the postgres server and have parallel updates fired, for example with an ampersand at the end. Example:

echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =0;' | "psql" test1 postgres|grep "Time:"|cut -d' ' -f2- >> "/home/user/Documents/temp/logs/$NUM.txt" &
pid1=$!   
echo '\timing \\update "DAPP".emp_data set f1 = 123 where  emp_id =2;' | "psql" test1 postgres|grep "Time:"|cut -d' ' -f2- >> "/home/user/Documents/temp/logs/$NUM.txt" &
pid2=$!    
echo '\timing \\update "DAPP".emp_data set f1 = 123 where  emp_id =4;' | "psql" test1 postgres|grep "Time:"|cut -d' ' -f2- >> "/home/user/Documents/temp/logs/$NUM.txt" &
pid3=$!    
.......
......


My question is:
Am I losing something by firing these queries directly off the server, or should I look at firing the queries from different IP addresses (as would happen in a web application)? Would the way postgres opens sockets/allocates buffers etc. change between the two approaches, such that I get non-realistic results from a unix script on the server?
It would be a very tedious exercise to have 1000 different machines (IP addresses), each firing a query at the same time. But I want to be absolutely sure my test would give the same result as production (the latency requirements for read/write are very, very low).
I am not interested in the network time; just the database read/write time.


Thanks for any tips !
-Bala


From:
"Kevin Grittner"
Date:

Balkrishna Sharma <> wrote:

> I wish to do performance testing of 1000 simultaneous read/write
> to the database.

You should definitely be using a connection pool of some sort.  Both
your throughput and response time will be better that way.  You'll
want to test with different pool sizes, but I've found that a size
which allows the number of active queries in PostgreSQL to be
somewhere around (number_of_cores * 2) + effective_spindle_count to
be near the optimal size.
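As a quick sketch of that rule of thumb (the core and spindle counts below are illustrative assumptions, not measurements from any real box):

```shell
# Pool size heuristic: (number_of_cores * 2) + effective_spindle_count.
# Example box: 8 cores, 4 effective spindles (both assumed values).
cores=8
spindles=4
pool_size=$(( cores * 2 + spindles ))
echo "$pool_size"   # prints 20 for this example
```

You would then benchmark pool sizes around that value (say 15-25 here) rather than trusting the formula blindly.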

> My question is: Am I losing something by firing these queries
> directly off the server and should I look at firing the queries
> from different IP address (as it would happen in a web application).

If you run the client side of your test on the database server, the
CPU time used by the client will probably distort your results.  I
would try using one separate machine to generate the requests, but
monitor to make sure that the client machine isn't hitting some
bottleneck (like CPU time).  If the client is the limiting factor,
you may need to use more than one client machine.  No need to use
1000 different client machines.  :-)

-Kevin

From:
Dimitri Fontaine
Date:

Balkrishna Sharma <> writes:
> I will have a web application having postgres 8.4+ as backend. At any
> given time, there will be max of 1000 parallel web-users interacting
> with the database (read/write)
> I wish to do performance testing of 1000 simultaneous read/write to
> the database.

Have a look at Tsung: either benchmark only the PostgreSQL side of
things, or benchmark the HTTP side of things directly; that will run
your application code against PostgreSQL.

  http://tsung.erlang-projects.org/

And as Kevin said, consider using a connection pool, such as
pgbouncer. Once you have set up the benchmark with Tsung, adding
pgbouncer and comparing the results will be easy.

Regards,
--
dim

From:
Matthew Wakeling
Date:

On Wed, 16 Jun 2010, Balkrishna Sharma wrote:
> Hello, I will have a web application having postgres 8.4+ as backend.
> At any given time, there will be max of 1000 parallel web-users
> interacting with the database (read/write). I wish to do performance
> testing of 1000 simultaneous read/write to the database.

When you set up a server that has high throughput requirements, the last
thing you want to do is use it in a manner that cripples its throughput.
Don't try to have 1000 parallel Postgres backends - it will process those
queries slower than the optimal setup. You should aim to have
approximately ((2 * cpu core count) + effective spindle count) number of
backends, as that is the point at which throughput is the greatest. You
can use pgbouncer to achieve this.
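As an illustrative sketch, a minimal pgbouncer configuration capping the backend count at such a pool size might look like this (the database name, addresses, and pool size are assumptions for an example box):

```
[databases]
test1 = host=127.0.0.1 port=5432 dbname=test1

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
pool_mode = transaction
default_pool_size = 20   ; roughly (2 * cores) + spindles for this box
max_client_conn = 1000   ; the 1000 web users connect here instead
```

The application then connects to port 6432 instead of 5432, and pgbouncer multiplexes the 1000 client connections over 20 server backends.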

> I can do a simple unix script on the postgres server and have parallel
> updates fired for example with an ampersand at the end. Example:

> echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =0;' |
>   "psql" test1 postgres | grep "Time:" | cut -d' ' -f2- >>
>   "/home/user/Documents/temp/logs/$NUM.txt" &
> pid1=$!
> echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =2;' |
>   "psql" test1 postgres | grep "Time:" | cut -d' ' -f2- >>
>   "/home/user/Documents/temp/logs/$NUM.txt" &
> pid2=$!
> echo '\timing \\update "DAPP".emp_data set f1 = 123 where emp_id =4;' |
>   "psql" test1 postgres | grep "Time:" | cut -d' ' -f2- >>
>   "/home/user/Documents/temp/logs/$NUM.txt" &
> pid3=$!
> .............

Don't do that. The overhead of starting up an echo, a psql, and a grep
will limit the rate at which these queries can be fired at Postgres, and
consume quite a lot of CPU. Use a proper benchmarking tool, possibly on a
different server.
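For instance, pgbench (shipped with PostgreSQL) can replay a custom script mirroring the UPDATE from the original post. A sketch, where the random emp_id range is an assumption and the final invocation assumes a running server:

```shell
# Write a pgbench script mirroring the UPDATE workload above.
# \setrandom is the 8.4-era pgbench syntax for picking a random id
# per transaction; the 0..100000 range is an assumed table size.
cat > emp_update.sql <<'EOF'
\setrandom id 0 100000
UPDATE "DAPP".emp_data SET f1 = 123 WHERE emp_id = :id;
EOF

# Then drive it with a realistic number of connections for 60 seconds
# (requires a running PostgreSQL; shown here, not executed):
#   pgbench -n -f emp_update.sql -c 20 -T 60 test1
```

pgbench reports transactions per second and handles the concurrency itself, so there is no per-query fork/exec overhead distorting the numbers.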

Also, you should be using a username other than "postgres" - that one is
kind of reserved for superuser operations.

Matthew

--
 People who love sausages, respect the law, and work with IT standards
 shouldn't watch any of them being made.  -- Peter Gutmann

From:
"Pierre C"
Date:

> When you set up a server that has high throughput requirements, the last
> thing you want to do is use it in a manner that cripples its throughput.
> Don't try and have 1000 parallel Postgres backends - it will process
> those queries slower than the optimal setup. You should aim to have
> approximately ((2 * cpu core count) + effective spindle count) number of
> backends, as that is the point at which throughput is the greatest. You
> can use pgbouncer to achieve this.

The same is true of a web server: 1000 active php interpreters (each
eating several megabytes or more) are not ideal for performance!

For php, I like lighttpd with php-fastcgi: the webserver proxies requests
to a small pool of php processes, which are only busy while generating the
page. Once the page is generated, the webserver handles all (slow) IO to
the client.

An interesting side effect is that the number of database connections is
limited to the number of PHP processes in the pool, so you don't even need
a postgres connection pooler (unless you have lots of php boxes)...
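As an illustrative sketch of that setup (paths and pool sizes below are assumptions), the lighttpd side might look like:

```
# lighttpd mod_fastcgi: a fixed pool of PHP workers.
fastcgi.server = ( ".php" => ((
    "bin-path"  => "/usr/bin/php-cgi",
    "socket"    => "/tmp/php-fastcgi.socket",
    "max-procs" => 4,
    "bin-environment" => ( "PHP_FCGI_CHILDREN" => "4" )
)))
```

With 4 processes of 4 children each, at most 16 PHP workers run at once, which also caps the number of concurrent database connections at 16.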

From:
Dimitri Fontaine
Date:

"Pierre C" <> writes:
> The same is true of a web server: 1000 active php interpreters (each
> eating several megabytes or more) are not ideal for performance!
>
> For php, I like lighttpd with php-fastcgi: the webserver proxies
> requests to a small pool of php processes, which are only busy while
> generating the page. Once the page is generated, the webserver handles
> all (slow) IO to the client.

I use haproxy for that, it handles requests queues very effectively.
--
dim