Thread: VACUUM - How long should it take?! - Here are some test results

VACUUM - How long should it take?! - Here are some test results

From:
Aldor
Date:
Hi,

I want to get some statistical information from people who have experience
with vacuuming big tables:

I was a little confused about why some vacuums take much, much longer than
others, so I ran some tests against a test database. All tests were made
with PostgreSQL 8.0.3. In total I ran three tests with table structures and
indexes that have been problematic in some databases I have built. I took
the structure of the tables, anonymized the columns and filled the tables
with random data generated by a script.

In this test scenario the tables are only filled with data and NO CHANGES
are made to them afterwards, because I just wanted to find out how long it
takes VACUUM to just go through the tables and indexes.

First table
-----------

CREATE TABLE "public"."tbl1" (
  "id" BIGSERIAL,
  "col1" BIGINT NOT NULL,
  "col2" VARCHAR NOT NULL,
  "col3" BIGINT NOT NULL,
  "col4" INTEGER NOT NULL,
  "col5" VARCHAR NOT NULL,
  "col6" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  "col7" "public"."tsvector",
  "col8" VARCHAR NOT NULL,
  "col9" BOOLEAN NOT NULL,
  "col10" BOOLEAN NOT NULL,
  "col11" "public"."tsvector",
  "col12" INTEGER,
  CONSTRAINT "tbl1_pkey" PRIMARY KEY("id")
) WITH OIDS;

CREATE INDEX "idx_tbl1_col1" ON "public"."tbl1"
  USING btree ("col1");

CREATE INDEX "idx_tbl1_col11" ON "public"."tbl1"
  USING gist ("col11");

CREATE INDEX "idx_tbl1_col8" ON "public"."tbl1"
  USING btree ("col8");

CREATE INDEX "idx_tbl1_col7" ON "public"."tbl1"
  USING gist ("col7");

CREATE INDEX "idx_tbl1_col3" ON "public"."tbl1"
  USING btree ("col3");

=> Filled up with 400,000 rows

LOG:  duration: 148187.934 ms  statement: VACUUM VERBOSE tbl1;

Second table
------------

CREATE TABLE "public"."tbl2" (
  "id" BIGSERIAL,
  "col1" BIGINT,
  "col2" VARCHAR NOT NULL,
  "col3" INTEGER NOT NULL,
  "col4" BIGINT NOT NULL,
  "col5" TIMESTAMP WITHOUT TIME ZONE,
  "col6" BOOLEAN NOT NULL,
  "col7" BIGINT,
  CONSTRAINT "tbl2_pkey" PRIMARY KEY("id")
) WITH OIDS;

CREATE INDEX "idx_tbl2_col7" ON "public"."tbl2"
  USING btree ("col7");

CREATE INDEX "idx_tbl2_col1" ON "public"."tbl2"
  USING btree ("col1");

=> Filled up with 6,000,000 rows

LOG:  duration: 138934.027 ms  statement: VACUUM VERBOSE tbl2;
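
The fill script itself is not shown here; a random fill of this kind can be
done in plain SQL with generate_series() and random(). The following is only
a sketch of that approach for tbl2, not the script that was actually used
(the column expressions are purely illustrative):

-- Sketch only: random data roughly matching tbl2's column types
INSERT INTO tbl2 (col1, col2, col3, col4, col5, col6, col7)
SELECT (random() * 1000000)::bigint,
       md5(random()::text),                     -- random VARCHAR content
       (random() * 1000)::integer,
       (random() * 1000000)::bigint,
       now() - random() * interval '365 days',  -- random timestamp in the last year
       random() < 0.5,                          -- random boolean
       (random() * 1000000)::bigint
FROM generate_series(1, 6000000);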

Third table
-----------

CREATE TABLE "public"."tbl3" (
  "id" BIGSERIAL,
  "col1" BIGINT NOT NULL,
  "col2" BIGINT,
  "col3" BIGINT NOT NULL,
  "col4" VARCHAR,
  "col5" VARCHAR,
  "col6" INTEGER,
  "col7" TIMESTAMP WITHOUT TIME ZONE,
  CONSTRAINT "tbl3_pkey" PRIMARY KEY("id")
) WITH OIDS;

CREATE INDEX "idx_tbl3_col2" ON "public"."tbl3"
  USING btree ("col2");

CREATE INDEX "idx_tbl3_col3" ON "public"."tbl3"
  USING btree ("col3");

CREATE INDEX "idx_tbl3_col1_col3" ON "public"."tbl3"
  USING btree ("col1", "col3");

=> Filled up with 100,000,000 rows

The VACUUM could not be completed - it took longer than 24h and I aborted
it manually.

Postgresql.conf
---------------

listen_addresses = '*'
shared_buffers = 32768
maintenance_work_mem = 1048576
max_stack_depth = 81920

max_fsm_pages = 30000000
max_fsm_relations = 1000

vacuum_cost_delay = 0

bgwriter_percent = 0
bgwriter_maxpages = 0

fsync = false

enable_hashagg = true
enable_hashjoin = true
enable_indexscan = true
enable_mergejoin = true
enable_nestloop = true
enable_seqscan = true
enable_sort = true
enable_tidscan = true

cpu_tuple_cost = 0.01
cpu_index_tuple_cost = 0.001
cpu_operator_cost = 0.0025

effective_cache_size = 102400

geqo = false

default_statistics_target = 1
from_collapse_limit = 8
join_collapse_limit = 8

log_destination = 'stderr'
redirect_stderr = true
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = false
log_rotation_age = 1440

client_min_messages = log
log_min_messages = log
log_error_verbosity = default
log_min_duration_statement = 1

log_connections = false
log_disconnections = false

log_line_prefix = ''

stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
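
For a single VACUUM run, the two settings above that matter most for plain
vacuuming (maintenance_work_mem and vacuum_cost_delay) can also be inspected
and overridden per session. A sketch, reusing the value from the
postgresql.conf above (8.0 expects the value in KB):

SHOW maintenance_work_mem;
SHOW vacuum_cost_delay;

SET maintenance_work_mem = 1048576;   -- 1 GB, same as in postgresql.conf
VACUUM VERBOSE tbl3;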

Problem
-------

In my opinion the VACUUM of the third table takes too long. The table is
less complex than the other ones - so assuming the worst case, that every
row takes as long as in the most complex table (tbl1), I end up with this
calculation:

148 seconds for 400,000 rows
= 2,702 rows per second

100,000,000 rows = 37,009 seconds (= 617 minutes, about 10 hours)
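
The same extrapolation can be done directly in SQL (pure arithmetic, no
table access):

SELECT 400000 / 148.0                      AS rows_per_second,  -- ~2702
       100000000 / (400000 / 148.0)        AS est_seconds,      -- ~37000
       100000000 / (400000 / 148.0) / 3600 AS est_hours;        -- ~10.3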

But the VACUUM of tbl3 takes more than 24h, even though no changes were
made to the table at all.

Hardware + Software configuration
---------------------------------

SCSI RAID5 with usable space of 120GB
2 x Intel(R) Xeon(TM) CPU 3.20GHz
RAM: 4148840 kB (4GB)
Kernel: 2.6.12.2

PGDATA is running on a separate partition

Nothing else is running on this server.

Question
--------

The duration of the VACUUM of tbl1 and tbl2 is okay - so how can I speed
up the VACUUM of tbl3?
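
One way to quantify how much VACUUM actually has to scan would be to compare
the physical size of the relations involved; relpages and reltuples in
pg_class are rough estimates maintained by VACUUM/ANALYZE. A sketch of such
a check (this was not run as part of the tests above):

SELECT relname, relkind, relpages, reltuples
FROM pg_class
WHERE relname IN ('tbl1', 'tbl2', 'tbl3')
   OR relname LIKE 'idx_tbl%'
ORDER BY relpages DESC;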


              --- Regards, Aldor









Re: VACUUM - How long should it take?! - Here are some test results

From:
Tom Lane
Date:
Aldor <an@mediaroot.de> writes:
> bgwriter_percent = 0
> bgwriter_maxpages = 0

Why did you do that?  There's certainly nothing in the docs recommending
disabling the bgwriter.

> The duration of the VACUUM of tbl1 and tbl2 is okay - so how can I speed
> up the VACUUM of tbl3?

You have not shown us anything that would explain any speed differences
between these tables.  Maybe there was some difference in the data you
loaded them with?

            regards, tom lane
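
One quick way to check for such a difference in the loaded data would be to
compare average column widths, for example with a sketch like this against
the VARCHAR columns of tbl3:

SELECT avg(length(col4)) AS avg_col4_len,
       avg(length(col5)) AS avg_col5_len
FROM tbl3;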

Re: VACUUM - How long should it take?! - Here are some test results

From:
Aldor
Date:
>>bgwriter_percent = 0
>>bgwriter_maxpages = 0

> Why did you do that?  There's certainly nothing in the docs recommending
> disabling the bgwriter.

I remember that loading large amounts of data into the database ran faster
with these settings.

>>The duration of the VACUUM of tbl1 and tbl2 is okay - so how can I speed
>>up the VACUUM of tbl3?
>
>
> You have not shown us anything that would explain any speed differences
> between these tables.  Maybe there was some difference in the data you
> loaded them with?

The strings in tbl3 are longer - I took the average length from the
original table, which is about 94 characters.

I will run some more tests, increasing the data in tbl3 to 100k,
1 million and 100 million rows.
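
For the scaled-up tbl3 tests, strings of roughly that average length could
be generated with an expression along these lines (a sketch, not the actual
fill script):

-- md5() yields 32 hex characters; repeating three times and trimming gives 94
SELECT substr(repeat(md5(random()::text), 3), 1, 94) AS random_string;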

I'll also try activating the bgwriter.

Do the other settings in postgresql.conf look OK to you?

Some other information
----------------------

Distribution: Debian
Encoding of Database: LATIN10
Locale: C

Regards, Aldor
