Re: Re: Slowdown problem when writing 1.7million records

Поиск
Список
Период
Сортировка
От Marc SCHAEFER
Тема Re: Re: Slowdown problem when writing 1.7million records
Дата
Msg-id Pine.LNX.3.96.1010227150911.2241C-100000@defian.alphanet.ch
обсуждение исходный текст
Ответ на Re: Slowdown problem when writing 1.7million records  (Emmanuel Charpentier <charpent@bacbuc.dyndns.org>)
Список pgsql-general
> Server (v6.2), I am accessing the files using JDBC from a Windows 2000 PC.

I don't use Microsoft software, nor Java, but a few general suggestions
for you:

  - you may want to create the INDEX at the end (using CREATE INDEX)
    in one operation

  - you may want to run your requests in transactions: e.g. a transaction
    which is committed every N insertion.

  - in some rare case you may want to disable the fsync() of the
    PostgreSQL

  - you may want to use a more efficient interface, such as the
    COPY command.

Example (using transactions)

#! /usr/bin/perl -wI /home/schaefer/perl-libs
# $Id$

use DBI;
use test_db;
use strict;

my $debug = 0;

my $result = 0; # success
my $reason = "of an unknown error";

$| = 1;

my $dbh = &open_database();
if (defined($dbh)) {
   my $amount_entries =  4000000;
   my $commit_every    = 10000;

   # Sometimes, large data entries are done better with a COPY.

   $dbh->{AutoCommit} = 0; # Use transactions.

   foreach (('sol_f', 'sol_i', 'sol_d')) {
      # Using transactions should make insertion faster, since fsync()
      # are probably not required. However, when changes are very big,
      # it might actually make it slower or using much space, this
      # is why we have this $commit_every above and below.
      # was expecting a BEGIN WORK; but that
      # seem to be implicit.

      eval {
         my $i;
         my $failed_reason = "unknown db error";

     print "Populating " . $_ . " ...";
     for ($i = 0; ($i < $amount_entries) && ($result == 0); $i++) {
            my @titles = ('id', 'ref', 'sentpos', 'wordpos');
        if (!&do_query($dbh,
               "INSERT INTO " . $_ . "("
               . join(", ", @titles)
               . ") VALUES ("
               . join(", ", ('?') x @titles)
               . ")",
               \@titles,
               undef,
               [ int(rand(32768)),
                             'truc',
                             int(rand(32768)),
                             'temp'
                           ],
               undef,
                           \$failed_reason)) {
           $result = 1;
           $reason = "can't insert " . $i . ": " . $failed_reason;
        }
        else {
           if (($i % $commit_every) == 0) {
                  if ($dbh->commit) {
                     print "C ";
          }
          else {
             $result = 1;
             $reason = "can't commit: " . $dbh->errstr;
          }
           }
        }
     }

         if ($result == 0) {
            print " POPULATED.\n";
         }
         else {
            print " FAILED.\n";
         }
      }; # DB is not set to die mode, so we will catch only our bugs.
      if ($@) {
         print;
         $result = 1;
         $reason = "transaction failed: " . $@; # Not always right.
         $dbh->rollback; # res. ign. (in failure mode anyway)
      }
      elsif ($result) {
         $dbh->rollback; # res. ign. (in failure mode anyway)
      }
      else {
         if ($dbh->commit) {
            $result = 1;
        $reason = "can't commit: " . $dbh->errstr;
         }
         print "COMMITTED.\n";
      }
   }

   $dbh->{AutoCommit} = 1; # No transactions

   if (!$dbh->disconnect) {
      $result = 1;
      $reason = "disconnect error: " . $dbh->errstr;
   }
   undef $dbh;
}
else {
   # Obviously, can't use $dbh->errstr here.
   $reason = "can't database connect: " . $DBI::errstr;
   $result = 1;
}

if ($result) {
   print $0 . ": failed " . $result . " because " . $reason . "\n";
}
else {
   print "SUCCESSFUL.\n";
}
exit $result;

sub create_table {
   my($dbh, $name, $val) = @_;
   my $result = 0;

   my $sth = $dbh->prepare("CREATE TABLE $name ($val)");
   if (defined($sth)) {
      my $rv = $sth->execute;
      if (defined($rv)) {
         if ($debug) {
            print "$name: succeeded.\n";
         }

     $result = 1;
      }

      $sth->finish;
      undef $sth;
   }

   return $result;
}

And using COPY:

#! /usr/bin/perl -wI /home/schaefer/perl-libs
# USAGE
#    ./copy.pl | psql test_db
# $Id$

my $amount_entries = 4000000;
my $tell_every     =  100000;

print "COPY sol_f FROM stdin;\n";
my $i;
for ($i = 1; $i <= $amount_entries; $i++) {
   print int(rand(32768)) . "\t" . 'truc' . "\t" . int(rand(32768)) . "\t"
         . 'temp' . "\n";
   if (($i % $tell_every) == 0) {
      print STDERR $i . "\n";
   }
}
print ".\n";





В списке pgsql-general по дате отправления:

Предыдущее
От: "Gregory Wood"
Дата:
Сообщение: Re: Slowdown problem when writing 1.7million records
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Last x records