Обсуждение: Performance problems???

Поиск
Список
Период
Сортировка

Performance problems???

От
James Patterson
Дата:
I have observed some disturbing behavior with the latest (7.1.3) version of
PotgreSQL.

In an application that I am working on with a team of folks, there is a
specific need to execute a series of SQL statements similar to those used in
the 'loaddb.pl' script included below. Without getting into better ways to
increment rowid's (this code is part of another tool that we are using), I'd
like to know why I get the following results with PostgreSQL and MySQL.

In 3 separate runs I get the following PostgreSQL results:
   o 1 -    2000 records inserted in 12 seconds.   o 2001 - 4000 records inserted in 16 seconds.   o 4001 - 6000
recordsinserted in 20 seconds.   
 
You see, there is a clear performance degradation here that is associated
with the number of records in the database. It appears that the main culprit
is the update statement that is issued (see 'loaddb.pl' script below). This
performance behavior is not expected. Especially with so few rows in such a
small table. 

In 3 separate runs I get the following MySQL results:
   o 1 -    2000 records inserted in 6 seconds.   o 2001 - 4000 records inserted in 5 seconds.   o 4001 - 6000 records
insertedin 6 seconds.
 

You see, MySQL performs as expected. There is no performance degradation
here that is related to the number of records in the database tables.

I have been a huge fan and advocate of PostgreSQL. I was stunned to see this
behavior. I am hoping that it is either a bug that has been fixed, or that I
can alter my PostgreSQL configuration to eliminate this behavior.

I have an urgent need to resolve this situation. If I cannot solve the
problem soon, I will be forced to drop PostgreSQL in favor of MySQL. This is
not something that I wish to do.

Please help.

Thanks in advance.

- Jim


########################################################################
#!/usr/bin/perl -w
#
## setupdb.pl 
#
## Simple perl script that creates the 'problemtest' db.
#
## Usage: ./setupdb.pl <db-type>
#
## Assumes that the 'problemtest' PostgreSQL and MySQL databases exist.
## and that there is a user 'problemtest' with proper privileges.
#
########################################################################

use strict;
use DBI;

my $dbd;

if (@ARGV) {   if (uc($ARGV[0]) eq 'POSTGRESQL') {       $dbd = 'Pg';   } elsif (uc($ARGV[0]) eq 'MYSQL') {       $dbd
='mysql';   } else {       &DoUsage();   }
 
} else {   &DoUsage();
}

my $dsn = "DBI:$dbd:dbname=problemtest";
my $usr = 'problemtest';
my $pwd = 'problemtest';

my $dbh = DBI->connect($dsn,$usr,$pwd,   { AutoCommit => 1, RaiseError => 1 });

$dbh->do(<<END);
drop table foo
END

$dbh->do(<<END);
drop table control
END

$dbh->do(<<END);
create table foo (   id  integer not null,   primary key (id),   name varchar(100))
END

$dbh->do(<<END);
create table control (   next_id integer not null)
END

$dbh->do(<<END);
insert into control (next_id) values(1)
END

$dbh->disconnect();


sub DoUsage {   print "\n\tUsage: ./setupdb.pl <db-type>\n";   print "\tWhere db-type is 'PostgreSQL' or 'MySQL'\n\n";
exit 0;
 
}

########################################################################
#!/usr/bin/perl -w
#
## loaddb.pl 
#
## Simple perl script to illustrate the performance degradation
## of the update statement with PostgreSQL as compared to MySQL.
#
## Usage: ./loaddb.pl <db-type> <range-start> <range-end>
#
########################################################################

use strict;
use DBI;

my $dbd;

if (@ARGV == 3) {   if (uc($ARGV[0]) eq 'POSTGRESQL') {       $dbd = 'Pg';   } elsif (uc($ARGV[0]) eq 'MYSQL') {
$dbd= 'mysql';   } else {       &DoUsage();   }
 
} else {   &DoUsage();
}

my $dsn = "DBI:$dbd:dbname=problemtest";
my $usr = 'problemtest';
my $pwd = 'problemtest';

my $dbh = DBI->connect($dsn,$usr,$pwd,   { AutoCommit => 1, RaiseError => 1 });

my $inc_id = $dbh->prepare("update control set next_id = next_id + 1");
my $get_id = $dbh->prepare("select next_id from control");
my $insert = $dbh->prepare("insert into foo (id,name) values(?,?)");

my $start = time;
foreach($ARGV[1]..$ARGV[2]){   $inc_id->execute();   $get_id->execute();   my $id = $get_id->fetchrow_array();
$insert->execute($id,"name$id");
}
my $duration = time - $start;
print "duration = $duration\n";

$inc_id->finish();
$get_id->finish();
$insert->finish();

$dbh->disconnect();

sub DoUsage {   print "\n\tUsage: ./loaddb.pl <db-type> <range-start> <range-end>.\n";   print "\tWhere db-type is
'PostgreSQL'or 'MySQL'.\n\n";   exit 0;
 
}







Re: Performance problems???

От
Doug McNaught
Дата:
James Patterson <jpatterson@amsite.com> writes:

> I have observed some disturbing behavior with the latest (7.1.3) version of
> PotgreSQL.
> 
> In an application that I am working on with a team of folks, there is a
> specific need to execute a series of SQL statements similar to those used in
> the 'loaddb.pl' script included below. Without getting into better ways to
> increment rowid's (this code is part of another tool that we are using), I'd
> like to know why I get the following results with PostgreSQL and MySQL.
> 
> In 3 separate runs I get the following PostgreSQL results:
> 
>     o 1 -    2000 records inserted in 12 seconds.
>     o 2001 - 4000 records inserted in 16 seconds.
>     o 4001 - 6000 records inserted in 20 seconds.
>     
> You see, there is a clear performance degradation here that is associated
> with the number of records in the database. It appears that the main culprit
> is the update statement that is issued (see 'loaddb.pl' script below). This
> performance behavior is not expected. Especially with so few rows in such a
> small table. 

One thing you should definitely do is wrap the entire load loop
((update/select/insert) * N) in a transaction.  This will give you a
huge speedup.  Otherwise you are forcing a disk sync after every SQL
statement.

You may still see some degradation as the table size grows, but actual 
times should be more comparable to MySQL.  

> In 3 separate runs I get the following MySQL results:
> 
>     o 1 -    2000 records inserted in 6 seconds.
>     o 2001 - 4000 records inserted in 5 seconds.
>     o 4001 - 6000 records inserted in 6 seconds.
> 
> You see, MySQL performs as expected. There is no performance degradation
> here that is related to the number of records in the database tables.
> 
> I have been a huge fan and advocate of PostgreSQL. I was stunned to see this
> behavior. I am hoping that it is either a bug that has been fixed, or that I
> can alter my PostgreSQL configuration to eliminate this behavior.
> 
> I have an urgent need to resolve this situation. If I cannot solve the
> problem soon, I will be forced to drop PostgreSQL in favor of MySQL. This is
> not something that I wish to do.

I think the main problem, or one of them, is that you're not using the
proper mechanism for generating sequential numbers.  If you used a
real SEQUENCE instead of a one-row table you wouldn't get the MVCC
penalty from updating that table thousands of times, which is part of
your problem I think.

I understand your issue with not wanting to change existing code, but
the fact is that a sequence is the right way to do this in PostgreSQL.
Updating a one-row table as you're doing requires a new copy of the
row to be created each time it's updated (because of MVCC) which slows
things down until VACUUM is run.

Try using a sequence along with wrapping everything in a transaction
(turn off autocommit and use BEGIN and COMMIT) and I think you'll be
pleasantly surprised. 

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: Performance problems???

От
Tom Lane
Дата:
James Patterson <jpatterson@amsite.com> writes:
> I have observed some disturbing behavior with the latest (7.1.3) version of
> PotgreSQL.

Try vacuuming the "control" table every so often --- you're accumulating
huge numbers of dead rows in it.

Even better, replace "control" with a sequence object.
        regards, tom lane


Re: Performance problems???

От
Hannu Krosing
Дата:
James Patterson wrote:
> 
> I have observed some disturbing behavior with the latest (7.1.3) version of
> PotgreSQL.
> 
> In an application that I am working on with a team of folks, there is a
> specific need to execute a series of SQL statements similar to those used in
> the 'loaddb.pl' script included below. Without getting into better ways to
> increment rowid's (this code is part of another tool that we are using), I'd
> like to know why I get the following results with PostgreSQL and MySQL.
> 
> In 3 separate runs I get the following PostgreSQL results:
> 
>     o 1 -    2000 records inserted in 12 seconds.
>     o 2001 - 4000 records inserted in 16 seconds.
>     o 4001 - 6000 records inserted in 20 seconds.
> 
> You see, there is a clear performance degradation here that is associated
> with the number of records in the database. It appears that the main culprit
> is the update statement that is issued (see 'loaddb.pl' script below). This
> performance behavior is not expected. Especially with so few rows in such a
> small table.
> 
> In 3 separate runs I get the following MySQL results:
> 
>     o 1 -    2000 records inserted in 6 seconds.
>     o 2001 - 4000 records inserted in 5 seconds.
>     o 4001 - 6000 records inserted in 6 seconds.
> 
> You see, MySQL performs as expected. There is no performance degradation
> here that is related to the number of records in the database tables.
> 
> I have been a huge fan and advocate of PostgreSQL. I was stunned to see this
> behavior. I am hoping that it is either a bug that has been fixed, or that I
> can alter my PostgreSQL configuration to eliminate this behavior.
> 
> I have an urgent need to resolve this situation. If I cannot solve the
> problem soon, I will be forced to drop PostgreSQL in favor of MySQL. This is
> not something that I wish to do.

You really should us e a sequence.

You will most likely need to change the way you create sequence numbers
even for mysql
as the following is not safe on non-transactional DB.

> my $inc_id = $dbh->prepare("update control set next_id = next_id + 1");
> my $get_id = $dbh->prepare("select next_id from control");

if two backends happen to interleave their queries

1> my $inc_id = $dbh->prepare("update control set next_id = next_id +
1");
2> my $inc_id = $dbh->prepare("update control set next_id = next_id +
1");
1> my $get_id = $dbh->prepare("select next_id from control");
2> my $get_id = $dbh->prepare("select next_id from control");

then both will get the same next_id which is probably not what you want.

-------------
Hannu