Обсуждение: Temp tables, indexes and DELETE vs. TRUNCATE

Поиск
Список
Период
Сортировка

Temp tables, indexes and DELETE vs. TRUNCATE

От
Jeff Boes
Дата:
[Apologies if you have seen this before. I just discovered that posting
to the group via NNTP, at Teranews.com, apparently isn't working since
my posts aren't showing up in the mailing list archives.]


Summary: CREATE INDEX gets slower the more INSERTs you have done on a
table. VACUUM doesn't appear to fix this, but TRUNCATE TABLE does.

Environment: Linux 2.4, Postgres 7.2.4, Perl 5.6.1, DBI

Our application includes the use of a temporary table to help optimize a
query. The query needs to make a pass over a table containing around 0.6
Mrows to gather up about 40 Krows which it stores in the temp table,
then runs several queries joining this table to another that has around
5 Mrows.

The approach was to create the temporary table once, load it with the
"first pass" query, index the table, analyze the table, and then proceed
to run the "second pass" query several different ways.  Then the
application would delete all the rows from the temporary table. As it
operated as a daemon, on receiving its next request it would repeat the
process (except for creating the table) with different parameters.

What we found was surprising. The index-creation step got slower and
slower.  I wrote a script to investigate this separately from our
application. I won't include the whole source here, but the essentials
follow in pseudocode:

CREATE TEMPORARY TABLE foo (a INTEGER);

my @rows = map(int(rand(1000)),1..10000);
my $sth = $dbh->prepare(q!INSERT INTO foo VALUES (?)!);

for my $i (1..100) {
  # start timer 'load'
  $sth->execute($_) foreach @rows;
  $dbh->commit;
  # stop timer 'load'

  # start timer 'create-index'
  $dbh->do(q!CREATE INDEX ix_foo ON foo(a)!);
  # stop timer 'index'

  # start timer 'drop-index'
  $dbh->do(q!DROP INDEX ix_foo!);
  # stop timer 'drop'

  # start timer 'delete'
  $dbh->do(q!DELETE FROM foo!);
  # stop timer 'delete'
> [quoted text muted]

Excerpts from my timing results follow:

Load #1 took 11.54 seconds
Create-index #1 took 0.26 seconds
Drop-index #1 took 0.00 seconds
Delete #1 took 0.16 seconds

Load #2 took 11.44 seconds
Create-index #2 took 0.42 seconds
Drop-index #2 took 0.01 seconds
Delete #2 took 0.27 seconds

Load #3 took 18.90 seconds
Create-index #3 took 7.26 seconds *** [1]
Drop-index #3 took 0.10 seconds
Delete #3 took 2.88 seconds

Load #4 took 11.89 seconds
Create-index #4 took 0.55 seconds
Drop-index #4 took 0.01 seconds
Delete #4 took 0.22 seconds

Load #5 took 10.23 seconds
Create-index #5 took 0.99 seconds
Drop-index #5 took 0.00 seconds
Delete #5 took 0.28 seconds

Load #6 took 10.71 seconds
Create-index #6 took 0.82 seconds
Drop-index #6 took 0.01 seconds
Delete #6 took 0.24 seconds

Load #7 took 10.02 seconds
Create-index #7 took 1.09 seconds
Drop-index #7 took 0.00 seconds
Delete #7 took 0.19 seconds

[1] Probably an anomaly. I'm not sure, but I think a CREATE INDEX will
block on another DDL statement.

...

Load #17 took 12.97 seconds
Create-index #17 took 2.33 seconds
Drop-index #17 took 0.01 seconds
Delete #17 took 0.30 seconds

Load #18 took 11.22 seconds
Create-index #18 took 2.80 seconds
Drop-index #18 took 0.01 seconds
Delete #18 took 0.30 seconds

Load #19 took 12.08 seconds
Create-index #19 took 2.54 seconds
Drop-index #19 took 0.01 seconds
Delete #19 took 0.27 seconds

Load #20 took 10.38 seconds
Create-index #20 took 3.06 seconds
Drop-index #20 took 0.01 seconds
Delete #20 took 0.40 seconds


The 'create-index" step is already an order of magnitude slower.

...

Load #35 took 11.99 seconds
Create-index #35 took 5.57 seconds
Drop-index #35 took 0.01 seconds
Delete #35 took 0.60 seconds

Load #36 took 8.64 seconds
Create-index #36 took 7.83 seconds
Drop-index #36 took 0.01 seconds
Delete #36 took 0.36 seconds

Load #37 took 8.57 seconds
Create-index #37 took 5.11 seconds
Drop-index #37 took 0.01 seconds
Delete #37 took 0.38 seconds

Load #38 took 17.94 seconds
Create-index #38 took 8.67 seconds
Drop-index #38 took 0.01 seconds
Delete #38 took 4.48 seconds

Load #39 took 14.66 seconds
Create-index #39 took 5.88 seconds
Drop-index #39 took 0.01 seconds
Delete #39 took 0.35 seconds

And so on.  Adding a VACUUM step, then a VACUUM FULL step, had no effect
on this creeping slowdown. Eventually, I replaced the DELETE with a
TRUNCATE TABLE, and found that the CREATE INDEX was much more
consistent.

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise


Re: Temp tables, indexes and DELETE vs. TRUNCATE

От
Mike Mascari
Дата:
Jeff Boes wrote:
> [Apologies if you have seen this before. I just discovered that posting
> to the group via NNTP, at Teranews.com, apparently isn't working since
> my posts aren't showing up in the mailing list archives.]
>
>
> Summary: CREATE INDEX gets slower the more INSERTs you have done on a
> table. VACUUM doesn't appear to fix this, but TRUNCATE TABLE does.

The CREATE INDEX will have to ignore dead tuples left by the DELETE.
TRUNCATE TABLE truncates the underlying relation and rebuilds the
indexes. That's why CREATE INDEX takes an incrementally longer time
over each pass.

> And so on.  Adding a VACUUM step, then a VACUUM FULL step, had no effect
> on this creeping slowdown. Eventually, I replaced the DELETE with a
> TRUNCATE TABLE, and found that the CREATE INDEX was much more
> consistent.

Your pseudo-code did:

LOOP:
 INSERT DATA
 CREATE INDEX
 DROP INDEX
 DELETE FROM foo;

Where was the VACUUM/VACUUM FULL added in this? A VACUUM added after
the DELETE should be sufficient to allow the dead tuple-space to be
reused. A VACUUM FULL added after the DELETE should result in
basically the same on-disk structure as TRUNCATE TABLE, albeit through
a slower process.

Mike Mascari
mascarm@mascari.com