Discussion: Slow concurrent update of same row in a given table

Slow concurrent update of same row in a given table

From: Rajesh Kumar Mallah
Date:
Hi

While doing some stress testing for updates in a small sized table
we found the following results. We are not too happy about the speed
of the updates particularly at high concurrency (10 clients).

Initially we get 119 updates / sec but it drops to 10 updates/sec
as concurrency is increased.

PostgreSQL: 8.0.3
-------------------------------
TABLE STRUCTURE: general.stress
-------------------------------
+------------------+--------------------------+-----------+
| Column           | Type                     | Modifiers |
+------------------+--------------------------+-----------+
| dispatch_id      | integer                  | not null  |
| query_id         | integer                  |           |
| generated        | timestamp with time zone |           |
| unsubscribes     | integer                  |           |
| read_count       | integer                  |           |
| status           | character varying(10)    |           |
| bounce_tracking  | boolean                  |           |
| dispatch_hour    | integer                  |           |
| dispatch_date_id | integer                  |           |
+------------------+--------------------------+-----------+
Indexes:
    "stress_pkey" PRIMARY KEY, btree (dispatch_id)

UPDATE STATEMENT:
update general.stress set read_count=read_count+1 where dispatch_id=114
TOOL USED: Perl/DBI, with prepared statement handles
CONCURRENCY METHOD: executing multiple copies of the same program
from different shells (Linux environment)
CLIENT-SERVER LINK: 10/100 Mbit LAN

CLIENT CODE: stress.pl
-------------------------------------------------------------------------
#!/opt/perl/bin/perl -I/usr/local/masonapache/lib/perl
################################################
# Overview: update the table as fast as possible (while(1){}).
# On every 100th commit, print the average update frequency
# of the last 100 updates.
################################################
use strict;
use Time::HiRes qw(gettimeofday tv_interval);
use Utils;

my $dbh = &Utils::db_connect();
my $sth = $dbh->prepare("update general.stress set
read_count=read_count+1 where dispatch_id=114");
my $cnt = 0;
my $t0  = [ gettimeofday ];
while (1) {
        $sth->execute();
        $dbh->commit();
        $cnt++;
        if ($cnt % 100 == 0)
        {
                my $t1 = [ gettimeofday ];
                my $elapsed = tv_interval( $t0, $t1 );
                $t0 = $t1;
                printf "Rate: %d updates / sec\n", 100.0 / $elapsed;
        }
}
$sth->finish();
$dbh->disconnect();
--------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------
RESULTS:
--------------------------------------------------------------------------------------

Number of Copies | Updates per Sec

 1 --> 119
 2 -->  59
 3 -->  38
 4 -->  28
 5 -->  22
 6 -->  19
 7 -->  16
 8 -->  14
 9 -->  11
10 -->  11
11 -->  10

-------------------------------------------------------------------------------------
Note that the table was vacuum analyzed during the tests.
Total number of records in the table: 93
-------------------------------------------------------------------------------------

Regds
Rajesh Kumar Mallah.

Re: Slow concurrent update of same row in a given table

From: Gavin Sherry
Date:
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> Hi
>
> While doing some stress testing for updates in a small sized table
> we found the following results. We are not too happy about the speed
> of the updates particularly at high concurrency (10 clients).
>
> Initially we get 119 updates / sec but it drops to 10 updates/sec
> as concurrency is increased.
>
> PostgreSQL: 8.0.3
> -------------------------------
> TABLE STRUCTURE: general.stress
> -------------------------------
> | dispatch_id      | integer                  | not null  |
> | query_id         | integer                  |           |
> | generated        | timestamp with time zone |           |
> | unsubscribes     | integer                  |           |
> | read_count       | integer                  |           |
> | status           | character varying(10)    |           |
> | bounce_tracking  | boolean                  |           |
> | dispatch_hour    | integer                  |           |
> | dispatch_date_id | integer                  |           |
> +------------------+--------------------------+-----------+
> Indexes:
>     "stress_pkey" PRIMARY KEY, btree (dispatch_id)
>
> UPDATE STATEMENT:
> update general.stress set read_count=read_count+1 where dispatch_id=114

This means you are updating only one row, correct?

> Number of Copies | Updates per Sec
>
> 1  --> 119
> 2  ---> 59
> 3  --->  38
> 4  ---> 28
> 5 --> 22
> 6 --> 19
> 7 --> 16
> 8 --> 14
> 9 --> 11
> 10 --> 11
> 11 --> 10

So, 11 instances result in 10 updated rows per second, database wide or
per instance? If it is per instance, then 11 * 10 is close to the
performance for one connection.

That being said, when you've got 10 connections fighting over one row, I
wouldn't be surprised if you had bad performance.

Also, at 119 updates a second, you're more than doubling the table's
initial size (dead tuples) each second. How often are you vacuuming and
are you using vacuum or vacuum full?
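
For reference, a rough sketch (illustrative, not part of the original mail)
of how to watch the dead-tuple churn and clean it up without blocking the
updaters; on 8.0 the per-table counters need the stats collector running
with stats_row_level = on:

-- Plain VACUUM reclaims dead row versions without locking out the
-- concurrent updates; VACUUM FULL compacts the file but takes an
-- exclusive lock on the table.
VACUUM ANALYZE general.stress;

-- Each UPDATE counted in n_tup_upd leaves one dead row version behind
-- until a vacuum reclaims it.
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables
 WHERE relname = 'stress';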

Gavin

Re: Slow concurrent update of same row in a given table

From: Rajesh Kumar Mallah
Date:
On 9/28/05, Gavin Sherry <swm@alcove.com.au> wrote:
> On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:
>
> > Hi
> >
> > While doing some stress testing for updates in a small sized table
> > we found the following results. We are not too happy about the speed
> > of the updates particularly at high concurrency (10 clients).
> >
> > Initially we get 119 updates / sec but it drops to 10 updates/sec
> > as concurrency is increased.
> >
> > PostgreSQL: 8.0.3
> > -------------------------------
> > TABLE STRUCTURE: general.stress
> > -------------------------------
> > | dispatch_id      | integer                  | not null  |
> > | query_id         | integer                  |           |
> > | generated        | timestamp with time zone |           |
> > | unsubscribes     | integer                  |           |
> > | read_count       | integer                  |           |
> > | status           | character varying(10)    |           |
> > | bounce_tracking  | boolean                  |           |
> > | dispatch_hour    | integer                  |           |
> > | dispatch_date_id | integer                  |           |
> > +------------------+--------------------------+-----------+
> > Indexes:
> >     "stress_pkey" PRIMARY KEY, btree (dispatch_id)
> >
> > UPDATE STATEMENT:
> > update general.stress set read_count=read_count+1 where dispatch_id=114
>
> This means you are updating only one row, correct?

Correct.


>
> > Number of Copies | Updates per Sec
> >
> > 1  --> 119
> > 2  ---> 59
> > 3  --->  38
> > 4  ---> 28
> > 5 --> 22
> > 6 --> 19
> > 7 --> 16
> > 8 --> 14
> > 9 --> 11
> > 10 --> 11
> > 11 --> 10
>
> So, 11 instances result in 10 updated rows per second, database wide or
> per instance? If it is per instance, then 11 * 10 is close to the
> performance for one connection.


Sorry, I do not understand the difference between "database wide"
and "per instance".

>
> That being said, when you've got 10 connections fighting over one row, I
> wouldn't be surprised if you had bad performance.
>
> Also, at 119 updates a second, you're more than doubling the table's
> initial size (dead tuples) each second. How often are you vacuuming and
> are you using vacuum or vacuum full?


Yes, I realize the obvious phenomenon now (and the uselessness of the script);
we should not consider it a performance degradation.

I am having a performance issue in my live database; that's why I tried to
simulate the situation (maybe the script was an over-stresser).

My original problem is that I send hundreds of thousands of emails carrying a
beacon for tracking readership every Tuesday, and on Wednesday I see a lot of
the said query in pg_stat_activity. Each of these queries updates the SAME
row, the one that corresponds to the previous day's dispatch, and that is
when I face the performance problem.
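
For what it's worth, a rough diagnostic sketch (illustrative, not from the
original mail) of what this pile-up looks like on the server; current_query
is only populated when stats_command_string is on:

-- The beacon updates currently in progress ...
SELECT procpid, query_start, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE 'update general.stress%';

-- ... and the lock requests they are blocked on. Updates queued behind the
-- same row wait for the earlier updater's transaction to finish, which shows
-- up here as an ungranted lock of locktype 'transactionid'.
SELECT locktype, transaction, pid, mode, granted
  FROM pg_locks
 WHERE granted = false;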

I think I can only post further details next Wednesday. Please let me
know how I should deal with the situation if each update takes 100 times
longer than a normal update.

Best Regards
Mallah.


>
> Gavin
>

Re: Slow concurrent update of same row in a given table

From: Gavin Sherry
Date:
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> > > Number of Copies | Updates per Sec
> > >
> > > 1  --> 119
> > > 2  ---> 59
> > > 3  --->  38
> > > 4  ---> 28
> > > 5 --> 22
> > > 6 --> 19
> > > 7 --> 16
> > > 8 --> 14
> > > 9 --> 11
> > > 10 --> 11
> > > 11 --> 10
> >
> > So, 11 instances result in 10 updated rows per second, database wide or
> > per instance? If it is per instance, then 11 * 10 is close to the
> > performance for one connection.
>
>
> Sorry, I do not understand the difference between "database wide"
> and "per instance".

Per instance.

>
> >
> > That being said, when you've got 10 connections fighting over one row, I
> > wouldn't be surprised if you had bad performance.
> >
> > Also, at 119 updates a second, you're more than doubling the table's
> > initial size (dead tuples) each second. How often are you vacuuming and
> > are you using vacuum or vacuum full?
>
>
> Yes, I realize the obvious phenomenon now (and the uselessness of the script);
> we should not consider it a performance degradation.
>
> I am having a performance issue in my live database; that's why I tried to
> simulate the situation (maybe the script was an over-stresser).
>
> My original problem is that I send hundreds of thousands of emails carrying a
> beacon for tracking readership every Tuesday, and on Wednesday I see a lot of
> the said query in pg_stat_activity. Each of these queries updates the SAME
> row, the one that corresponds to the previous day's dispatch, and that is
> when I face the performance problem.
>
> I think I can only post further details next Wednesday. Please let me
> know how I should deal with the situation if each update takes 100 times
> longer than a normal update.

I see. These problems regularly come up in database design. The best thing
you can do is modify your database design/application such that instead of
incrementing a count in a single row, you insert a row into a table,
recording the 'dispatch_id'. Counting the number of rows for a given
dispatch id will give you your count.
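
A minimal sketch of that design (the table and column names here are just
examples, not from the thread):

-- One cheap, contention-free insert per beacon hit ...
CREATE TABLE dispatch_hits (
    dispatch_id integer NOT NULL,
    hit_time    timestamp with time zone NOT NULL DEFAULT now()
);

INSERT INTO dispatch_hits (dispatch_id) VALUES (114);

-- ... and the read count is derived whenever it is needed.
SELECT count(*) FROM dispatch_hits WHERE dispatch_id = 114;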

Thanks,

Gavin

Re: Slow concurrent update of same row in a given table

From: Rajesh Kumar Mallah
Date:
On 9/29/05, Gavin Sherry <swm@alcove.com.au> wrote:
> On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:
>
> > > > Number of Copies | Updates per Sec
> > > >
> > > > 1  --> 119
> > > > 2  ---> 59
> > > > 3  --->  38
> > > > 4  ---> 28
> > > > 5 --> 22
> > > > 6 --> 19
> > > > 7 --> 16
> > > > 8 --> 14
> > > > 9 --> 11
> > > > 10 --> 11
> > > > 11 --> 10
> > >
> > > So, 11 instances result in 10 updated rows per second, database wide or
> > > per instance? If it is per instance, then 11 * 10 is close to the
> > > performance for one connection.
> >
> >
> > Sorry, I do not understand the difference between "database wide"
> > and "per instance".
>
> Per instance.
>
> >
> > >
> > > That being said, when you've got 10 connections fighting over one row, I
> > > wouldn't be surprised if you had bad performance.
> > >
> > > Also, at 119 updates a second, you're more than doubling the table's
> > > initial size (dead tuples) each second. How often are you vacuuming and
> > > are you using vacuum or vacuum full?
> >
> >
> > Yes, I realize the obvious phenomenon now (and the uselessness of the script);
> > we should not consider it a performance degradation.
> >
> > I am having a performance issue in my live database; that's why I tried to
> > simulate the situation (maybe the script was an over-stresser).
> >
> > My original problem is that I send hundreds of thousands of emails carrying a
> > beacon for tracking readership every Tuesday, and on Wednesday I see a lot of
> > the said query in pg_stat_activity. Each of these queries updates the SAME
> > row, the one that corresponds to the previous day's dispatch, and that is
> > when I face the performance problem.
> >
> > I think I can only post further details next Wednesday. Please let me
> > know how I should deal with the situation if each update takes 100 times
> > longer than a normal update.
>
> I see. These problems regularly come up in database design. The best thing
> you can do is modify your database design/application such that instead of
> incrementing a count in a single row, you insert a row into a table,
> recording the 'dispatch_id'. Counting the number of rows for a given
> dispatch id will give you your count.
>

Sorry, but I will be accumulating a huge number of rows in a separate table
with no extra info when all I really want is the count. Do you have
a better database design in mind?

Also, I encounter the same problem when implementing read counts of
articles on sites and when counting banner impressions, where the same
row gets updated frequently by multiple processes.


Thanks & Regds
mallah.

> Thanks,
>
> Gavin
>

Re: Slow concurrent update of same row in a given table

From: Gavin Sherry
Date:
On Thu, 29 Sep 2005, Rajesh Kumar Mallah wrote:

> On 9/29/05, Gavin Sherry <swm@alcove.com.au> wrote:
> > On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:
> >
> > > > > Number of Copies | Updates per Sec
> > > > >
> > > > > 1  --> 119
> > > > > 2  ---> 59
> > > > > 3  --->  38
> > > > > 4  ---> 28
> > > > > 5 --> 22
> > > > > 6 --> 19
> > > > > 7 --> 16
> > > > > 8 --> 14
> > > > > 9 --> 11
> > > > > 10 --> 11
> > > > > 11 --> 10
> > > >
> > > > So, 11 instances result in 10 updated rows per second, database wide or
> > > > per instance? If it is per instance, then 11 * 10 is close to the
> > > > performance for one connection.
> > >
> > >
> > > Sorry, I do not understand the difference between "database wide"
> > > and "per instance".
> >
> > Per instance.
> >
> > >
> > > >
> > > > That being said, when you've got 10 connections fighting over one row, I
> > > > wouldn't be surprised if you had bad performance.
> > > >
> > > > Also, at 119 updates a second, you're more than doubling the table's
> > > > initial size (dead tuples) each second. How often are you vacuuming and
> > > > are you using vacuum or vacuum full?
> > >
> > >
> > > Yes, I realize the obvious phenomenon now (and the uselessness of the script);
> > > we should not consider it a performance degradation.
> > >
> > > I am having a performance issue in my live database; that's why I tried to
> > > simulate the situation (maybe the script was an over-stresser).
> > >
> > > My original problem is that I send hundreds of thousands of emails carrying a
> > > beacon for tracking readership every Tuesday, and on Wednesday I see a lot of
> > > the said query in pg_stat_activity. Each of these queries updates the SAME
> > > row, the one that corresponds to the previous day's dispatch, and that is
> > > when I face the performance problem.
> > >
> > > I think I can only post further details next Wednesday. Please let me
> > > know how I should deal with the situation if each update takes 100 times
> > > longer than a normal update.
> >
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application such that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch id will give you your count.
> >
>
> Sorry, but I will be accumulating a huge number of rows in a separate table
> with no extra info when all I really want is the count. Do you have
> a better database design in mind?
>
> Also, I encounter the same problem when implementing read counts of
> articles on sites and when counting banner impressions, where the same
> row gets updated frequently by multiple processes.

As I said in private email, accumulating large numbers of rows is not a
problem. In your current application, you are write bound, not read bound.
I've designed many similar systems which have hundreds of millions of rows.
It takes a while to generate the count, but you just do it periodically in
non-busy periods.

With 8.1, constraint exclusion will give you significantly better
performance with this system, as well.
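
For example, a hedged sketch of 8.1-style partitioning of such an insert-only
hit log by date (table names are illustrative, not from the thread); with
constraint_exclusion = on, a count restricted by date only scans the matching
child table:

CREATE TABLE hit_log (
    dispatch_id integer NOT NULL,
    hit_date    date    NOT NULL
);

-- One child table per month, each carrying a CHECK constraint that lets
-- the planner exclude it from queries outside its date range.
CREATE TABLE hit_log_2005_09 (
    CHECK (hit_date >= DATE '2005-09-01' AND hit_date < DATE '2005-10-01')
) INHERITS (hit_log);

CREATE TABLE hit_log_2005_10 (
    CHECK (hit_date >= DATE '2005-10-01' AND hit_date < DATE '2005-11-01')
) INHERITS (hit_log);

SET constraint_exclusion = on;
SELECT count(*)
  FROM hit_log
 WHERE hit_date >= DATE '2005-10-01' AND hit_date < DATE '2005-11-01';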

Thanks,

Gavin

Re: Slow concurrent update of same row in a given table

From: "Jim C. Nasby"
Date:
On Thu, Sep 29, 2005 at 07:59:34AM +0530, Rajesh Kumar Mallah wrote:
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application such that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch id will give you your count.
> >
>
> Sorry, but I will be accumulating a huge number of rows in a separate table
> with no extra info when all I really want is the count. Do you have
> a better database design in mind?
>
> Also, I encounter the same problem when implementing read counts of
> articles on sites and when counting banner impressions, where the same
> row gets updated frequently by multiple processes.

Databases like to work on *sets* of data, not individual rows. Something
like this would probably perform much better than what you've got now,
and would prevent having a huge table lying around:

INSERT INTO holding_table ... -- Done for every incoming
connection/what-have-you

CREATE OR REPLACE FUNCTION summarize() RETURNS void AS $$
DECLARE
    v_rows int;
BEGIN
    DELETE FROM holding_table;
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    UPDATE count_table
        SET count = count + v_rows
    ;
END;
$$ LANGUAGE plpgsql;

Periodically (say, once a minute):
SELECT summarize();
VACUUM holding_table;
VACUUM count_table;
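
For completeness, a possible definition of the two tables the function above
references (an assumption, not part of the original mail; it keeps a single
global counter, matching the UPDATE with no WHERE clause):

-- holding_table receives one cheap INSERT per event;
-- count_table holds the running total that summarize() increments.
CREATE TABLE holding_table (
    dispatch_id integer NOT NULL,
    hit_time    timestamp with time zone NOT NULL DEFAULT now()
);

CREATE TABLE count_table (
    count integer NOT NULL DEFAULT 0
);

-- Seed the counter row so the UPDATE in summarize() has a target.
INSERT INTO count_table (count) VALUES (0);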
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461