Re: Re: Slowdown problem when writing 1.7million records
От | Marc SCHAEFER |
---|---|
Тема | Re: Re: Slowdown problem when writing 1.7million records |
Дата | |
Msg-id | Pine.LNX.3.96.1010227150911.2241C-100000@defian.alphanet.ch обсуждение исходный текст |
Ответ на | Re: Slowdown problem when writing 1.7million records (Emmanuel Charpentier <charpent@bacbuc.dyndns.org>) |
Список | pgsql-general |
> Server (v6.2), I am accessing the files using JDBC from a Windows 2000 PC. I don't use Microsoft software, nor Java, but a few general suggestions for you: - you may want to create the INDEX at the end (using CREATE INDEX) in one operation - you may want to run your requests in transactions: e.g. a transaction which is committed every N insertion. - in some rare case you may want to disable the fsync() of the PostgreSQL - you may want to use a more efficient interface, such as the COPY command. Example (using transactions) #! /usr/bin/perl -wI /home/schaefer/perl-libs # $Id$ use DBI; use test_db; use strict; my $debug = 0; my $result = 0; # success my $reason = "of an unknown error"; $| = 1; my $dbh = &open_database(); if (defined($dbh)) { my $amount_entries = 4000000; my $commit_every = 10000; # Sometimes, large data entries are done better with a COPY. $dbh->{AutoCommit} = 0; # Use transactions. foreach (('sol_f', 'sol_i', 'sol_d')) { # Using transactions should make insertion faster, since fsync() # are probably not required. However, when changes are very big, # it might actually make it slower or using much space, this # is why we have this $commit_every above and below. # was expecting a BEGIN WORK; but that # seem to be implicit. eval { my $i; my $failed_reason = "unknown db error"; print "Populating " . $_ . " ..."; for ($i = 0; ($i < $amount_entries) && ($result == 0); $i++) { my @titles = ('id', 'ref', 'sentpos', 'wordpos'); if (!&do_query($dbh, "INSERT INTO " . $_ . "(" . join(", ", @titles) . ") VALUES (" . join(", ", ('?') x @titles) . ")", \@titles, undef, [ int(rand(32768)), 'truc', int(rand(32768)), 'temp' ], undef, \$failed_reason)) { $result = 1; $reason = "can't insert " . $i . ": " . $failed_reason; } else { if (($i % $commit_every) == 0) { if ($dbh->commit) { print "C "; } else { $result = 1; $reason = "can't commit: " . $dbh->errstr; } } } } if ($result == 0) { print " POPULATED.\n"; } else { print " FAILED.\n"; } }; # DB is not set to die mode, so we will catch only our bugs. if ($@) { print; $result = 1; $reason = "transaction failed: " . $@; # Not always right. $dbh->rollback; # res. ign. (in failure mode anyway) } elsif ($result) { $dbh->rollback; # res. ign. (in failure mode anyway) } else { if ($dbh->commit) { $result = 1; $reason = "can't commit: " . $dbh->errstr; } print "COMMITTED.\n"; } } $dbh->{AutoCommit} = 1; # No transactions if (!$dbh->disconnect) { $result = 1; $reason = "disconnect error: " . $dbh->errstr; } undef $dbh; } else { # Obviously, can't use $dbh->errstr here. $reason = "can't database connect: " . $DBI::errstr; $result = 1; } if ($result) { print $0 . ": failed " . $result . " because " . $reason . "\n"; } else { print "SUCCESSFUL.\n"; } exit $result; sub create_table { my($dbh, $name, $val) = @_; my $result = 0; my $sth = $dbh->prepare("CREATE TABLE $name ($val)"); if (defined($sth)) { my $rv = $sth->execute; if (defined($rv)) { if ($debug) { print "$name: succeeded.\n"; } $result = 1; } $sth->finish; undef $sth; } return $result; } And using COPY: #! /usr/bin/perl -wI /home/schaefer/perl-libs # USAGE # ./copy.pl | psql test_db # $Id$ my $amount_entries = 4000000; my $tell_every = 100000; print "COPY sol_f FROM stdin;\n"; my $i; for ($i = 1; $i <= $amount_entries; $i++) { print int(rand(32768)) . "\t" . 'truc' . "\t" . int(rand(32768)) . "\t" . 'temp' . "\n"; if (($i % $tell_every) == 0) { print STDERR $i . "\n"; } } print ".\n";
В списке pgsql-general по дате отправления: