Thread: improving concurrent transaction commit rate

improving concurrent transaction commit rate

From:
Sam Mason
Date:
Hi,

I had an idea while going home last night and still can't think why it's
not implemented already as it seems obvious.

The conceptual idea is to have at most one outstanding flush for the
log going through the filesystem at any one time.  The effect, as far
as I can think through, would be to trade latency for bandwidth.  In
commit heavy situations you're almost always going to be starved for
rotational latency with the log while the full bandwidth of the log
device is rarely going to be much of a problem.

I don't understand PG well enough to know if/how this could be
implemented; I've had a look through transam/xlog.c and sort of
understand what's going on but will have missed all the subtleties of
its operation.  So, please take what I say below with a little salt!

The way I'm imagining it working is as follows; when a flush gets issued
the code does:
  global Lock l;
  global int writtento = 0, flushedto = 0;

  /* where are we known to have written data up to currently */
  writtento = max(writtento, myrecord);
  /* try and acquire the flush lock */
  if (!conditionalacquire(l)) {
      /* lock already taken, block ourself until they finish by acquiring it */
      acquire(l);
      /* if somebody "later" in the queue got unblocked then their flush is
         OK for us and we're winning */
      if (myrecord <= flushedto) {
          goto out;
      }
  }
  /* flush needed, record the latest write's position in the queue */
  local int curat = writtento;
  /* actually perform the flush */
  fdatasync(log_fd);
  /* record where we're done flushing to so others can finish early */
  flushedto = curat;
out:
  /* send the next process off */
  release(l);

To simplify; I've assumed that access to globals is always atomic,
locking would obviously need to be different in a real implementation.
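
For what it's worth, here's the same idea with real locking spelled out; a
minimal sketch only, using a pthread mutex, and every name in it
(wal_state, wal_flush_upto, log_fd) is made up for illustration rather
than taken from PostgreSQL:

  #include <pthread.h>
  #include <unistd.h>

  typedef unsigned long long lsn_t;

  static struct {
      pthread_mutex_t flush_lock;   /* serialises the actual fdatasync */
      pthread_mutex_t state_lock;   /* protects the two counters below */
      lsn_t written_to;             /* highest record handed to the OS */
      lsn_t flushed_to;             /* highest record known durable */
      int   log_fd;
  } wal_state = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, 0, 0, -1 };

  /* Flush the log at least up to 'myrecord'. */
  static void wal_flush_upto(lsn_t myrecord)
  {
      pthread_mutex_lock(&wal_state.state_lock);
      if (wal_state.written_to < myrecord)
          wal_state.written_to = myrecord;
      pthread_mutex_unlock(&wal_state.state_lock);

      /* queue up behind whoever is flushing right now (if anyone) */
      pthread_mutex_lock(&wal_state.flush_lock);

      pthread_mutex_lock(&wal_state.state_lock);
      if (myrecord <= wal_state.flushed_to) {
          /* somebody else's flush already covered us; nothing to do */
          pthread_mutex_unlock(&wal_state.state_lock);
          pthread_mutex_unlock(&wal_state.flush_lock);
          return;
      }
      lsn_t curat = wal_state.written_to;   /* flush everything queued so far */
      pthread_mutex_unlock(&wal_state.state_lock);

      fdatasync(wal_state.log_fd);          /* one flush covers all waiters */

      pthread_mutex_lock(&wal_state.state_lock);
      if (wal_state.flushed_to < curat)
          wal_state.flushed_to = curat;
      pthread_mutex_unlock(&wal_state.state_lock);

      pthread_mutex_unlock(&wal_state.flush_lock);  /* wake the next waiter */
  }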

In the case of a single client the performance hit is going to be in
a disk flush anyway; as this is likely to be a somewhat expensive
operation I'm hoping that taking a lock here isn't going to matter
much.  Two clients is going to be worse (I think) as it's going to wait
for the first client to finish flushing before sending the second flush
request off.  Three clients and more will be a win; the two clients will
wait while the first flush completes and then they'll both flush at the
same time.  This would appear to speed things up by n-2 times where n is
the number of clients waiting to commit.

What have I missed?

If this has been explored in the literature I'd appreciate any pointers;
I had a search but couldn't find anything---I'm not sure what the
terminology would be for this sort of thing anyway.

-- 
  Sam  http://samason.me.uk/


Re: improving concurrent transaction commit rate

From:
Tom Lane
Date:
Sam Mason <sam@samason.me.uk> writes:
> The conceptual idea is to have at most one outstanding flush for the
> log going through the filesystem at any one time.

I think this is a variant of the "group commit" or "commit delay"
stuff that's already in there (and doesn't work real well :-().
The problem is to sync multiple transactions without a lot of extra
overhead.

Realize also that if the kernel's not completely brain dead, some
of this happens already by virtue of the fact that everyone's
fsync'ing the same WAL file.
        regards, tom lane


Re: improving concurrent transaction commit rate

From:
Andrew Gierth
Date:
>>>>> "Sam" == Sam Mason <sam@samason.me.uk> writes:
 Sam> Hi,

 Sam> I had an idea while going home last night and still can't think
 Sam> why it's not implemented already as it seems obvious.

 [snip idea about WAL fsyncs]

Unless I'm badly misunderstanding you, I think it already has (long
ago).

Only the holder of the WALWriteLock can write and fsync the WAL, and
XLogFlush implements pretty much exactly the logic you described.

-- 
Andrew (irc:RhodiumToad)


Re: improving concurrent transaction commit rate

From:
Greg Stark
Date:
Sorry for top-posting -- blame apple.

Isn't this just a good description of exactly how it works today?

-- 
Greg


On 24 Mar 2009, at 20:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Sam Mason <sam@samason.me.uk> writes:
>> The conceptual idea is to have at most one outstanding flush for the
>> log going through the filesystem at any one time.
>
> I think this is a variant of the "group commit" or "commit delay"
> stuff that's already in there (and doesn't work real well :-().
> The problem is to sync multiple transactions without a lot of extra
> overhead.
>
> Realize also that if the kernel's not completely brain dead, some
> of this happens already by virtue of the fact that everyone's
> fsync'ing the same WAL file.
>
>            regards, tom lane


Re: improving concurrent transaction commit rate

From:
Greg Smith
Date:
On Tue, 24 Mar 2009, Sam Mason wrote:

> The conceptual idea is to have at most one outstanding flush for the
> log going through the filesystem at any one time.

Quoting from src/backend/access/transam/xlog.c, inside XLogFlush:

"Since fsync is usually a horribly expensive operation, we try to 
piggyback as much data as we can on each fsync: if we see any more data 
entered into the xlog buffer, we'll write and fsync that too, so that the 
final value of LogwrtResult.Flush is as large as possible. This gives us 
some chance of avoiding another fsync immediately after."
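
In other words, the flusher re-reads how far the buffers have been filled
and flushes to there rather than just to its own record.  Very roughly (a
conceptual sketch, not the real XLogFlush; all names here are invented):

  #include <unistd.h>

  typedef unsigned long long lsn_t;

  static lsn_t insert_upto;    /* newest record copied into the WAL buffers */
  static lsn_t flushed_upto;   /* newest record known to be on disk */
  static int   wal_fd;         /* WAL segment currently being written */

  static void flush_upto(lsn_t my_lsn)   /* caller holds the WAL write lock */
  {
      if (my_lsn <= flushed_upto)
          return;                    /* an earlier fsync already covered us */

      lsn_t target = insert_upto;    /* flush everything buffered so far,
                                      * not just our own record */
      /* write out buffers up to 'target' here, then make them durable */
      fdatasync(wal_fd);
      flushed_upto = target;
  }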

The logic implementing that idea takes care of bunching up flushes for WAL 
data that also happens to be ready to go at that point.  You can see this 
most easily by doing inserts into a system that's limited by a slow fsync, 
like a single disk without write cache where you're bound by RPM speed. 
If you have, say, a 7200RPM disk, no one client can commit faster than 120 
times/second.  But if you have 10 clients all pushing small inserts in, 
it's fairly easy to see >500 transactions/second, because a bunch of 
commits will get batched up during the time the last fsync is waiting for 
the disk to finish.

The other idea you'll already find implemented in there is controlled by 
commit_delay.  If there are more than commit_siblings worth of open 
transactions at the point where a commit is supposed to happen, that will 
pause commit_delay microseconds in hopes that other transactions will jump 
onboard via the mechanism described above.  In practice, it's very hard to 
tune that usefully.  You can use it to help bunch together commits a bit 
better into bigger batches on a really busy system (where not having more 
than one commit ready is unexpected), it's not much help outside of that 
context.
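
For reference, the decision being described boils down to something like
this (a paraphrase of the behaviour, not the server's source; everything
here apart from the GUC names is a stand-in):

  #include <unistd.h>

  static int commit_delay    = 10;  /* microseconds to nap; 0 disables this */
  static int commit_siblings = 5;   /* other open xacts needed to bother */
  static int fsync_enabled   = 1;

  static int count_open_sibling_xacts(void)
  {
      return 0;                     /* placeholder; the server tracks this */
  }

  static void maybe_delay_before_flush(void)
  {
      if (commit_delay > 0 && fsync_enabled &&
          count_open_sibling_xacts() >= commit_siblings)
          usleep(commit_delay);     /* hope more commits queue up behind us */

      /* ...then write and fsync the WAL as usual */
  }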

Check out the rest of the comments in xlog.c, there's a lot in there 
that's not really covered in the README.  If you turn on WAL_DEBUG and 
XLOG_DEBUG you can actually watch some of this happen.  I found time spent 
reading the source to that file and src/backend/storage/buffer/bufmgr.c to 
be really well spent, some of the most interesting parts of the codebase 
to understand from a low-level performance tuning perspective are in those 
two.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: improving concurrent transaction commit rate

From:
Sam Mason
Date:
 [ I'm arbitrarily replying to Greg as his was the most verbose ]

On Tue, Mar 24, 2009 at 11:23:36PM -0400, Greg Smith wrote:
> On Tue, 24 Mar 2009, Sam Mason wrote:
> >The conceptual idea is to have at most one outstanding flush for the
> >log going through the filesystem at any one time.
> 
> Quoting from src/backend/access/transam/xlog.c, inside XLogFlush:
> 
> "Since fsync is usually a horribly expensive operation[...]

I think I read that but for some reason thought this was only within
one backend. i.e. if several separate transactions are being committed
at once then they will all be twiddling their own locks.  This seems
somewhat silly now and I'm not sure what I was thinking; there wouldn't
be any need for locks in that case.

> You can see this 
> most easily by doing inserts into a system that's limited by a slow fsync, 
> like a single disk without write cache where you're bound by RPM speed. 

Yes, I did a test like this and wasn't getting the scaling I was
expecting--hence my post.  I thought I'd need more clients to see any
effect so my base line was at 10 clients.  I've just redone my tests
and am getting the scaling you describe (and lower down than I was
expecting).
 http://samason.me.uk/~sam/scaling.svg

Why does it top out so much though?  It goes up nicely to around ten
clients (I tested with 8 and 12) and then tops out and levels off.  The
log is chugging along at around 2MB/s which is well above where they
are for a single client, but it still seems as though things could go
further.

Then again, I'm not sure how likely this is to occur in the real-world.

> The other idea you'll already find implemented in there is controlled by 
> commit_delay.  If there are more than commit_siblings worth of open 
> transactions at the point where a commit is supposed to happen, that will 
> pause commit_delay microseconds in hopes that other transactions will jump 
> onboard via the mechanism described above.  In practice, it's very hard to 
> tune that usefully.  You can use it to help bunch together commits a bit 
> better into bigger batches on a really busy system (where not having more 
> than one commit ready is unexpected), it's not much help outside of that 
> context.

OK; that's above and beyond what I was thinking and I can see how this
could have a negative impact on performance.  How much does it usefully
affect things on a busy system?  Is this mopping up the slack left over
that I described above when you've got more than 10 clients?

> Check out the rest of the comments in xlog.c, there's a lot in there 
> that's not really covered in the README.  If you turn on WAL_DEBUG and 
> XLOG_DEBUG you can actually watch some of this happen.  I found time spent 
> reading the source to that file and src/backend/storage/buffer/bufmgr.c to 
> be really well spent, some of the most interesting parts of the codebase 
> to understand from a low-level performance tuning perspective are in those 
> two.

Thanks for the pointers; I'll have a read!

-- 
  Sam  http://samason.me.uk/


Re: improving concurrent transaction commit rate

From:
Greg Stark
Date:
Sam Mason <sam@samason.me.uk> writes:

>> You can see this
>> most easily by doing inserts into a system that's limited by a slow fsync,
>> like a single disk without write cache where you're bound by RPM speed.
>
> Yes, I did a test like this and wasn't getting the scaling I was
> expecting--hence my post.  I thought I'd need more clients to see any
> effect so my base line was at 10 clients.  I've just redone my tests
> and am getting the scaling you describe (and lower down than I was
> expecting).
>
>   http://samason.me.uk/~sam/scaling.svg
>
> Why does it top out so much though?  It goes up nicely to around ten
> clients (I tested with 8 and 12) and then tops out and levels off.  The
> log is chugging along at around 2MB/s which is well above where they
> are for a single client, but it still seems as though things could go
> further.

Well 2MB/s sounds about right actually:

You have: 8kB / ( 1|7200|2min)
You want: MB/s
    * 1.92
    / 0.52083333

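(To spell that out, assuming units(1)'s a|b fraction syntax: 1|7200|2 min
is 1/14400 of a minute, i.e. half of one 7200 RPM rotation, about 4.17 ms,
and one 8 kB page per such interval comes to about 1.92 MB/s -- right
around the 2 MB/s in the log.)

  #include <stdio.h>

  int main(void)
  {
      double page_bytes   = 8000.0;          /* units(1) reads 8kB as 8000 B;
                                              * a real XLOG page is 8192 B */
      double interval_sec = 60.0 / 7200 / 2; /* half a rotation ~= 4.17 ms */

      printf("%.2f MB/s\n", page_bytes / interval_sec / 1e6);   /* 1.92 */
      return 0;
  }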

Heikki looked at this a while back and we concluded that the existing
algorithm will only get 1/2 the optimal rate unless you have twice as many
sessions as you ought to need to saturate the log i/o.

What happens is that the first backend comes along, finds nobody else waiting
and does an fsync for its own work. While that fsync is happening the rest of
the crowd -- N-1 backends -- comes along and blocks waiting on the lock. The
first backend to get the lock fsyncs the whole N-1 transactions. When it's
done, though, the whole crowd finds the log already synced and goes back to work.
The first transaction to commit again finds nobody waiting and syncs alone
again.  Rinse, lather, repeat.

But that only kicks in if you don't have enough sessions running enough
transactions.
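
Modelled crudely: every two fsyncs retire one "lone" commit plus the N-1
that queued up behind it, so with F flushes per second the throughput
tends towards F*N/2 until the log saturates.  A toy calculation of my own,
not anything from the server:

  #include <stdio.h>

  int main(void)
  {
      double flushes_per_sec = 250.0;        /* e.g. a 15k RPM disk */

      for (int n = 1; n <= 32; n *= 2) {
          /* two flushes retire 1 + (n - 1) = n commits once n >= 2 */
          double tps = (n < 2) ? flushes_per_sec
                               : flushes_per_sec * n / 2.0;
          printf("%2d clients -> ~%4.0f tps\n", n, tps);
      }
      return 0;
  }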

-- 
greg


Re: improving concurrent transaction commit rate

From:
Tom Lane
Date:
Greg Stark <stark@enterprisedb.com> writes:
> What happens is that the first backend comes along, finds nobody else waiting
> and does an fsync for its own work. While that fsync is happening the rest of
> the crowd -- N-1 backends -- comes along and blocks waiting on the lock. The
> first backend to get the lock fsyncs the whole N-1 transactions. When it's
> done, though, the whole crowd finds the log already synced and goes back to work.
> The first transaction to commit again finds nobody waiting and syncs alone
> again.  Rinse, lather, repeat.

Right.  The idea of the commit-delay stuff is to avoid that by letting
the first guy wait a little bit before starting to sync, but as
mentioned, we've never been able to get it to work real well.
        regards, tom lane


Re: improving concurrent transaction commit rate

From:
Sam Mason
Date:
On Wed, Mar 25, 2009 at 02:38:45PM +0000, Greg Stark wrote:
> Sam Mason <sam@samason.me.uk> writes:
> > Why does it top out so much though?  It goes up nicely to around ten
> > clients (I tested with 8 and 12) and then tops out and levels off.  The
> > log is chugging along at around 2MB/s which is well above where they
> > are for a single client, but it still seems as though things could go
> > further.
> 
> Well 2MB/s sounds about right actually:
> 
> You have: 8kB / ( 1|7200|2min)
> You want: MB/s
>     * 1.92
>     / 0.52083333

I'd need more explanation (or other pointers) to follow what you mean
there.  I've actually got a 15k disk, but it shouldn't matter much.
2MB/s seems to be consistent across any number of clients (specifically
1 to 48 here).

> Heikki looked at this a while back and we concluded that the existing
> algorithm will only get 1/2 the optimal rate unless you have twice as many
> sessions as you ought to need to saturate the log i/o.

I'm writing to a 15k disk which gives me 250 rotations per second.  In
the case of a single client I'm getting about 220 transactions per
second.  That seems reasonable.  When I have two clients this stays
at about 220 transactions per second.  Also reasonable, they end up
serialising after each other.

Three clients; I get about 320 tps.  This appears to be consistent with
1.5*220 and would imply that there's always a "spare" client behind the
lock that gets committed for free.  Four clients; I get 430 tps which
would mean the queueing is all good.

Below I've calculated the (mean) transactions per second over a series
of runs and calculated the value I'd expect to get (i.e. clients/2) and
then the ratio of the two.
  clients  tps     calc  ratio
   1      221.5
   2      223.8   220.0   102%
   3      323.5   330.0    98%
   4      427.7   440.0    97%
   6      647.4   660.0    98%
   8      799.7   880.0    91%
  12      946.0  1320.0    72%
  18     1020.6  1980.0    52%
  24     1089.2  2640.0    41%
  32     1116.6  3520.0    32%
  48     1141.8  5280.0    22%

As you can see the ratio between the tps I'm seeing and expecting drops
off significantly after 18 clients, with the trend starting somewhere
around seven clients.  I don't understand why this would be happening.

My highly involved and complicated benchmarking is a shell script that
does:
  #!/bin/bash
  nclients=$1
  ittrs=$2
  function gensql  {
      echo "INSERT INTO bm (c,v) VALUES ('$1','0');"
      for (( i = 1; i < $ittrs; i++ )); do
          echo "UPDATE bm SET v = '$i' WHERE c = '$1';"
      done
      echo "DELETE FROM bm WHERE c = '$1';"
  }
  for (( c = 0; c < $nclients; c++)); do
      gensql $c | psql -Xq -f - &
  done
  for (( c = 0; c < $nclients; c++)); do
      wait
  done

I'm running "time test.sh 8 1000" and recording the time; tps = nclients
* ittrs / time.  Where the time is the "wall clock" time expired.  I'm
repeating measurements four times and the "error bars" in my SVG from
before were the standard deviation of the runs.

Something (the HOT code?) keeps the number of dead tuples consistent so
I don't think this would be confounding things.  But improvements would
be appreciated.

-- 
  Sam  http://samason.me.uk/


Re: improving concurrent transaction commit rate

From:
Kenneth Marshall
Date:
On Wed, Mar 25, 2009 at 03:58:06PM +0000, Sam Mason wrote:
> On Wed, Mar 25, 2009 at 02:38:45PM +0000, Greg Stark wrote:
> > Sam Mason <sam@samason.me.uk> writes:
> > > Why does it top out so much though?  It goes up nicely to around ten
> > > clients (I tested with 8 and 12) and then tops out and levels off.  The
> > > log is chugging along at around 2MB/s which is well above where they
> > > are for a single client, but it still seems as though things could go
> > > further.
> > 
> > Well 2MB/s sounds about right actually:
> > 
> > You have: 8kB / ( 1|7200|2min)
> > You want: MB/s
> >     * 1.92
> >     / 0.52083333
> 
> I'd need more explanation (or other pointers) to follow what you mean
> there.  I've actually got a 15k disk, but it shouldn't matter much.
> 2MB/s seems to be consistent across any number of clients (specifically
> 1 to 48 here).
> 
> > Heikki looked at this a while back and we concluded that the existing
> > algorithm will only get 1/2 the optimal rate unless you have twice as many
> > sessions as you ought to need to saturate the log i/o.
> 
> I'm writing to a 15k disk which gives me 250 rotations per second.  In
> the case of a single client I'm getting about 220 transactions per
> second.  That seems reasonable.  When I have two clients this stays
> at about 220 transactions per second.  Also reasonable, they end up
> serialising after each other.
> 
> Three clients; I get about 320 tps.  This appears to be consistent with
> 1.5*220 and would imply that there's always a "spare" client behind the
> lock that gets committed for free.  Four clients; I get 430 tps which
> would mean the queueing is all good.
> 
> Below I've calculated the (mean) transactions per second over a series
> of runs and calculated the value I'd expect to get (i.e. clients/2) and
> then the ratio of the two.
> 
>   clients  tps     calc  ratio
>    1      221.5
>    2      223.8   220.0   102%
>    3      323.5   330.0    98%
>    4      427.7   440.0    97%
>    6      647.4   660.0    98%
>    8      799.7   880.0    91%
>   12      946.0  1320.0    72%
>   18     1020.6  1980.0    52%
>   24     1089.2  2640.0    41%
>   32     1116.6  3520.0    32%
>   48     1141.8  5280.0    22%
> 
> As you can see the ratio between the tps I'm seeing and expecting drops
> off significantly after 18 clients, with the trend starting somewhere
> around seven clients.  I don't understand why this would be happening.
> 
> My highly involved and complicated benchmarking is a shell script that
> does:
> 
>   #!/bin/bash
>   nclients=$1
>   ittrs=$2
>   function gensql  {
>       echo "INSERT INTO bm (c,v) VALUES ('$1','0');"
>       for (( i = 1; i < $ittrs; i++ )); do
>           echo "UPDATE bm SET v = '$i' WHERE c = '$1';"
>       done
>       echo "DELETE FROM bm WHERE c = '$1';"
>   }
>   for (( c = 0; c < $nclients; c++)); do
>       gensql $c | psql -Xq -f - &
>   done
>   for (( c = 0; c < $nclients; c++)); do
>       wait
>   done
> 
> I'm running "time test.sh 8 1000" and recording the time; tps = nclients
> * ittrs / time.  Where the time is the "wall clock" time expired.  I'm
> repeating measurements four times and the "error bars" in my SVG from
> before were the standard deviation of the runs.
> 
> Something (the HOT code?) keeps the number of dead tuples consistent so
> I don't think this would be confounding things.  But improvements would
> be appreciated.
> 
> -- 
>   Sam  http://samason.me.uk/
> 

Are you sure that you are able to actually drive the load at the
high end of the test regime? You may need to use multiple clients
to simulate the load effectively.

Cheers,
Ken


Re: improving concurrent transaction commit rate

From:
Sam Mason
Date:
On Wed, Mar 25, 2009 at 12:01:57PM -0500, Kenneth Marshall wrote:
> On Wed, Mar 25, 2009 at 03:58:06PM +0000, Sam Mason wrote:
> >   #!/bin/bash
> >   nclients=$1
> >   ittrs=$2
> >   function gensql  {
> >       echo "INSERT INTO bm (c,v) VALUES ('$1','0');"
> >       for (( i = 1; i < $ittrs; i++ )); do
> >           echo "UPDATE bm SET v = '$i' WHERE c = '$1';"
> >       done
> >       echo "DELETE FROM bm WHERE c = '$1';"
> >   }
> >   for (( c = 0; c < $nclients; c++)); do
> >       gensql $c | psql -Xq -f - &
> >   done
> >   for (( c = 0; c < $nclients; c++)); do
> >       wait
> >   done
> 
> Are you sure that you are able to actually drive the load at the
> high end of the test regime? You may need to use multiple clients
> to simulate the load effectively.

Notice that the code is putting things into the background and then
waiting for them to finish so there will be multiple clients.  Or maybe
I'm misunderstanding what you mean.

I've just tried modifying the code to write the generated SQL out to
a set of files first and this speeds things up by about 6% (the 48
client case goes from taking ~42 seconds to ~39 seconds) indicating that
everything is probably OK with the test harness.  Also note that this 6%
improvement is roughly constant across the board and hence should just
appear as slightly reduced absolute performance for my system.  As I'm less
interested in absolute performance than in how the system scales as load
increases, this effect matters even less.

-- 
  Sam  http://samason.me.uk/


Re: improving concurrent transaction commit rate

From:
Kenneth Marshall
Date:
On Wed, Mar 25, 2009 at 05:56:02PM +0000, Sam Mason wrote:
> On Wed, Mar 25, 2009 at 12:01:57PM -0500, Kenneth Marshall wrote:
> > On Wed, Mar 25, 2009 at 03:58:06PM +0000, Sam Mason wrote:
> > >   #!/bin/bash
> > >   nclients=$1
> > >   ittrs=$2
> > >   function gensql  {
> > >       echo "INSERT INTO bm (c,v) VALUES ('$1','0');"
> > >       for (( i = 1; i < $ittrs; i++ )); do
> > >           echo "UPDATE bm SET v = '$i' WHERE c = '$1';"
> > >       done
> > >       echo "DELETE FROM bm WHERE c = '$1';"
> > >   }
> > >   for (( c = 0; c < $nclients; c++)); do
> > >       gensql $c | psql -Xq -f - &
> > >   done
> > >   for (( c = 0; c < $nclients; c++)); do
> > >       wait
> > >   done
> > 
> > Are you sure that you are able to actually drive the load at the
> > high end of the test regime? You may need to use multiple clients
> > to simulate the load effectively.
> 
> Notice that the code is putting things into the background and then
> waiting for them to finish so there will be multiple clients.  Or maybe
> I'm misunderstanding what you mean.
> 
> I've just tried modifying the code to write the generated SQL out to
> a set of files first and this speeds things up by about 6% (the 48
> client case goes from taking ~42 seconds to ~39 seconds) indicating that
> everything is probably OK with the test harness.  Also note that this 6%
> improvement is roughly constant across the board and hence should just
> appear as slightly reduced absolute performance for my system.  As I'm less
> interested in absolute performance than in how the system scales as load
> increases, this effect matters even less.
> 
> -- 
>   Sam  http://samason.me.uk/
> 

I did notice how your test harness was designed. It just seemed that
process contention on your load-generation system will itself become a
bottleneck as the number of clients increases, and that may be the cause
of your fall-off, or at least a contributor.  You could test it by
generating the load from independent boxes and seeing how the performance
falls off as you add additional load clients+boxes.

My two cents,
Ken


Re: improving concurrent transaction commit rate

From:
Sam Mason
Date:
On Wed, Mar 25, 2009 at 01:48:03PM -0500, Kenneth Marshall wrote:
> On Wed, Mar 25, 2009 at 05:56:02PM +0000, Sam Mason wrote:
> > On Wed, Mar 25, 2009 at 12:01:57PM -0500, Kenneth Marshall wrote:
> > > Are you sure that you are able to actually drive the load at the
> > > high end of the test regime? You may need to use multiple clients
> > > to simulate the load effectively.
> > 
> > Notice that the code is putting things into the background and then
> > waiting for them to finish so there will be multiple clients.  Or maybe
> > I'm misunderstanding what you mean.
> 
> I did notice how your test harness was designed.

OK, that's turned out to be a good point.  I've now written five
different versions and they don't seem to give the results I'm expecting
at all!

Running tests from another machine seems to slow all tests down; I'd put
this down to the increased latency between server and client but am not
sure how to demonstrate (i.e. "prove" in layman's terms) this.

I've got my original shell based approach, a Python version and three
C versions (fork, pthreads and select based concurrency).  The most
scalable, by quite a long way, is the Python version and I don't
understand why.  I've plotted the mean transactions per second (and
standard deviation) for all tests in the following SVG file:
 http://samason.me.uk/~sam/pg-concurrency/compare.svg

The Python version is pretty linear up to 18 clients and then seems to
hit a wall; all the other versions petered out much earlier.  The fact
I'm IO bound would mean the shell and C based approaches are going to
be similar, but why is the Python version so much faster?  CPU time was
highest in the shell based version, generally topping out around 50%
utilisation but the others topped out at around 35%; so I'd say I was
still IO bound.

The source for the tests is available here:
  http://samason.me.uk/~sam/pg-concurrency/concurrent.sh
  http://samason.me.uk/~sam/pg-concurrency/concurrent.py
  http://samason.me.uk/~sam/pg-concurrency/concurrent-fork.c
  http://samason.me.uk/~sam/pg-concurrency/concurrent-pthreads.c
  http://samason.me.uk/~sam/pg-concurrency/concurrent-select.c

I think I'm abusing things a bit with my fork based version; it
all seems to work OK but I wouldn't trust this style in real code.
Otherwise, if people have comments about how to improve things I'd be
interested to know.
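
For comparison, this is roughly the shape of a select()-based driver using
libpq's asynchronous API; an illustrative sketch of the approach only, not
the concurrent-select.c above, and it fires a single stand-in statement
instead of the INSERT/UPDATE/DELETE mix:

  /* Build with: cc bench.c -lpq
   * Connection settings come from the usual PG* environment variables. */
  #include <libpq-fe.h>
  #include <stdio.h>
  #include <sys/select.h>

  #define NCLIENTS 8
  #define NQUERIES 1000

  int main(void)
  {
      PGconn *conn[NCLIENTS];
      int sent[NCLIENTS] = {0}, done[NCLIENTS] = {0}, remaining = NCLIENTS;
      const char *sql = "INSERT INTO bm (c,v) VALUES ('x','0')"; /* stand-in */

      for (int i = 0; i < NCLIENTS; i++) {
          conn[i] = PQconnectdb("");
          if (PQstatus(conn[i]) != CONNECTION_OK) {
              fprintf(stderr, "connect: %s", PQerrorMessage(conn[i]));
              return 1;
          }
          PQsendQuery(conn[i], sql);          /* start the first statement */
          sent[i] = 1;
      }

      while (remaining > 0) {
          fd_set rfds;
          int maxfd = -1;

          FD_ZERO(&rfds);
          for (int i = 0; i < NCLIENTS; i++) {
              if (done[i]) continue;
              int fd = PQsocket(conn[i]);
              FD_SET(fd, &rfds);
              if (fd > maxfd) maxfd = fd;
          }
          if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
              break;                          /* interrupted; give up */

          for (int i = 0; i < NCLIENTS; i++) {
              if (done[i] || !FD_ISSET(PQsocket(conn[i]), &rfds))
                  continue;
              PQconsumeInput(conn[i]);
              if (PQisBusy(conn[i]))
                  continue;                   /* result not complete yet */

              PGresult *res;
              while ((res = PQgetResult(conn[i])) != NULL)
                  PQclear(res);               /* this statement is finished */

              if (sent[i] < NQUERIES) {
                  PQsendQuery(conn[i], sql);  /* fire off the next one */
                  sent[i]++;
              } else {
                  done[i] = 1;
                  remaining--;
              }
          }
      }

      for (int i = 0; i < NCLIENTS; i++)
          PQfinish(conn[i]);
      return 0;
  }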

-- 
  Sam  http://samason.me.uk/


Re: improving concurrent transaction commit rate

From:
Dimitri Fontaine
Date:
Hi,

On 27 Mar 09, at 21:42, Sam Mason wrote:
> OK, that's turned out to be a good point.  I've now written five
> different versions and they don't seem to give the results I'm
> expecting
> at all!

If you really want a good concurrent load-simulator client for PostgreSQL,
my suggestion is to try tsung.  That will surely take less time than
writing your own, with a result I'd suspect would be less efficient than
the 20+ years of research that went into Erlang's design and
implementation :)
  http://tsung.erlang-projects.org/
  http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php

Your task will be to install Erlang and tsung, then write a load-parameter
XML script embedding your SQL requests as CDATA.  You can define more than
one session and specify the mix of sessions each simulated user will run
(70% of this, 20% of that, 10% of the other), and give a think time between
some requests, which will be respected on average (with a grain of
randomness).

Have fun,
--
dim