Thread: VACUUM FREEZE output more than double input


VACUUM FREEZE output more than double input

From:
"Kevin Grittner"
Date:
I'm grooming a new server to replace one that is soon to be retired.
Most of the data was loaded very close together, in terms of
database transaction numbers, and probably 95% of it will never be
updated.  To assess the potential impact of a "freeze everything in
the database at once" sort of night, I started a VACUUM FREEZE at
the database level, to see what the load looked like.
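For reference, the test was essentially just the following (the
vmstat interval here is illustrative):

    -- connected to the target database in psql
    VACUUM FREEZE VERBOSE;

    # meanwhile, in another shell, sampling I/O
    vmstat 5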

This seems to be a remarkably good way to cause extreme checkpoint
spikes under 8.2.5, even with an aggressive background writer.  The
I/O pattern was surprising in other respects, too, so I'm looking
to see if someone can help me understand it.

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    156 314760   1888 63457580    0    0 19704 39107 3070 11681  3  4 84 10  0
 0  2    156 313004   1888 63459636    0    0 17176 34654 2807 10696  2  5 84  9  0
 1  0    156 311020   1888 63461692    0    0 17152 34288 2662 10675  4  4 83 10  0
 0  2    156 306404   1888 63465804    0    0 20056 40488 3085 12500  1  3 86  9  0
 3  2    156 304780   1888 63468888    0    0 16936 33403 2798 11424  5  4 80 10  0
 3  2    156 304236   1888 63468888    0    0 15768 37570 3066 10988  3  3 82 12  0
 0  2    156 311384   1888 63462720    0    0 23800 48821 3866 14732  2  5 76 17  0
 3  1    156 304244   1888 63468888    0    0 22440 46684 3609 13133  2  4 79 14  0
 1  2    156 313672   1888 63459636    0    0 21528 43784 3433 12416  2  4 74 20  0
 1  3    156 311856   1888 63461692    0    0 16968 101366 2876 8769  2  7 71 20  0
 0  6    156 307892   1888 63464776    0    0  3824 71225 1178 2592  0  1 79 20  0
 0  5    156 316172   1888 63456552    0    0  6904 99629 1883 5645  3  2 78 17  0
 0  8    156 313232   1888 63459636    0    0  2880 82617 1259 3196  2  1 68 29  0
 0  7    156 310892   1888 63461692    0    0  2384 81262 1118 4415  4  1 55 40  0
 0  5    156 317728   1888 63453468    0    0  8616 104245 2080 8266  5  3 64 29  0
 0  8    156 314368   1888 63457580    0    0  3352 82142 1280 4316  2  1 67 30  0
 0  4    156 310160   1888 63460664    0    0  3928 96361 1466 3885  1  1 70 28  0
 0  9    156 308240   1896 63462712    0    0  1880 77801 1092 2665  1  1 64 33  0
 1  1    156 313044   1904 63460648    0    0 10568 61796 2423 8942  4  2 65 29  0
 1  3    156 311952   1904 63461676    0    0 16112 84713 3038 9919  3  6 69 22  0
 1  2    156 304212   1904 63469900    0    0 23200 78289 4094 14690  3  5 72 20  0
 1  2    156 310516   1896 63463740    0    0 24384 52418 3995 14139  4  4 70 23  0
 1  2    156 303192   1896 63470936    0    0 22608 46513 3689 10554  2  4 73 21  0
 1  2    156 314660   1896 63459628    0    0 19464 40452 3362 9239  1  5 74 20  0
 0  2    156 305652   1896 63467852    0    0 24080 49241 3803 10274  2  4 74 20  0
 0  2    156 312012   1896 63461684    0    0 24360 49745 3995 11190  2  4 71 23  0
 3  2    156 305596   1896 63466824    0    0 21896 45210 3670 12122  3  4 73 20  0

Note that outside of the checkpoints (where writes shoot up and
reads drop down), the writes track along at just over double the
reads.  This is on a database which has had relatively little
activity since the last database vacuum.

Why double writes per read, plus massive writes at checkpoint?

Is there any harm in doing a VACUUM FREEZE after loading from
pg_dump output, before putting the machine into production?
While the normal nightly vacuum, with scattered row freezes,
doesn't seem to cause any problems, a freeze on a mass scale
sure seems to do so.  I'd rather not slow down our regular
nightly vacuum to accommodate the mass freeze case at some
unpredictable time.
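
If the mass freeze itself ever needs throttling, the cost-based
delay could presumably be set for just that session instead of
touching the nightly job (delay value here is arbitrary):

    -- per-session throttle; other backends are unaffected
    SET vacuum_cost_delay = 20;  -- ms to sleep each time the cost limit is reached
    VACUUM FREEZE;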

-Kevin


 PostgreSQL 8.2.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)

listen_addresses = '*'
port = 5412
max_connections = 200
shared_buffers = 160MB
temp_buffers = 50MB
work_mem = 32MB
maintenance_work_mem = 1GB
max_fsm_pages = 800000
bgwriter_lru_percent = 20.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 10.0
bgwriter_all_maxpages = 600
wal_buffers = 1MB
checkpoint_segments = 50
checkpoint_timeout = 30min
seq_page_cost = 0.5
random_page_cost = 0.8
effective_cache_size = 63GB
geqo = off
from_collapse_limit = 15
join_collapse_limit = 15
redirect_stderr = on
log_line_prefix = '[%m] %p %q<%u %d %r> '
stats_block_level = on
stats_row_level = on
autovacuum = on
autovacuum_naptime = 10s
autovacuum_vacuum_threshold = 1
autovacuum_analyze_threshold = 1
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
escape_string_warning = off
standard_conforming_strings = on
sql_inheritance = off

BINDIR = /usr/local/pgsql-8.2.5/bin
DOCDIR = /usr/local/pgsql-8.2.5/doc
INCLUDEDIR = /usr/local/pgsql-8.2.5/include
PKGINCLUDEDIR = /usr/local/pgsql-8.2.5/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.2.5/include/server
LIBDIR = /usr/local/pgsql-8.2.5/lib
PKGLIBDIR = /usr/local/pgsql-8.2.5/lib
LOCALEDIR =
MANDIR = /usr/local/pgsql-8.2.5/man
SHAREDIR = /usr/local/pgsql-8.2.5/share
SYSCONFDIR = /usr/local/pgsql-8.2.5/etc
PGXS = /usr/local/pgsql-8.2.5/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.2.5' '--enable-integer-datetimes' '--enable-debug' '--disable-nls'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql-8.2.5/lib'
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.2.5

kgrittn@SOCRATES:~> cat /proc/version
Linux version 2.6.16.53-0.8-smp (geeko@buildhost) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Fri Aug 31 13:07:27 UTC 2007
kgrittn@SOCRATES:~> cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 1
kgrittn@SOCRATES:~> free -m
             total       used       free     shared    buffers     cached
Mem:         64446      64145        300          0          1      61972
-/+ buffers/cache:       2171      62274
Swap:         1027          0       1027



Re: VACUUM FREEZE output more than double input

From:
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Why double writes per read, plus massive writes at checkpoint?

The double writes aren't surprising: freezing has to be WAL-logged,
and the odds are that each page hasn't been touched since the last
checkpoint, so the WAL log will include a complete page image.
So in the steady state where all shared buffers are dirty, the
per-page cycle is:
    * write out a dirty buffer so it can be reclaimed
    * read in a page
    * modify it to mark tuples frozen
    * write an image of the page to WAL
    * leave the now-dirty page in shared buffers for later writing
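
In round numbers, ignoring WAL record overhead, that works out to,
per 8K heap page:

    read:    8K  (the page itself)
    write:   8K  (full-page image in WAL, first touch since checkpoint)
           + 8K  (the dirtied heap page, written out later)
    total:  16K written per 8K read --- the 2X you're seeing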

The checkpoint spikes would come from trying to flush out all the
dirty buffers at once.

You'd expect a bit of a valley after each peak, since the vacuum
could presumably recycle some buffers without having to flush 'em
first; but I don't see one in your data.  That may just be because
the numbers are too noisy, but I kinda suspect that the vacuum is
dirtying buffers nearly as fast as the bgwriter can clean them,
leaving not a lot of daylight for a valley.

8.3 should pretty well eliminate the checkpoint spike in this scenario,
because vacuum will work in a limited number of shared buffers instead
of dirtying the whole cache.  But you'll still see 2X writes over reads.

If this is data that you could re-generate at need, it might make sense
to turn off full_page_writes during the initial data load and vacuum.
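Something like this, say (full_page_writes only needs a reload, not
a restart --- and turning it off is only sane here because the data
can be re-created if there's a crash mid-load):

    # postgresql.conf, before the load
    full_page_writes = off

    $ pg_ctl reload
    # ... run the load and the VACUUM FREEZE ...
    # then set full_page_writes = on and reload again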

I concur with trying to FREEZE all the data while you do this, else
you'll see the same work done whenever the data happens to slip past
the auto freeze threshold.
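
To see how close tables are getting to that threshold, something
like this works with 8.2's per-table tracking (compare the ages
against autovacuum_freeze_max_age):

    SELECT relname, age(relfrozenxid) AS xid_age
      FROM pg_class
     WHERE relkind = 'r'
     ORDER BY xid_age DESC
     LIMIT 10;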

            regards, tom lane

Re: VACUUM FREEZE output more than double input

From:
"Kevin Grittner"
Date:
>>> On Fri, Dec 14, 2007 at  5:59 PM, in message <15401.1197676791@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Why double writes per read, plus massive writes at checkpoint?
>
> The double writes aren't surprising: freezing has to be WAL-logged,
> and the odds are that each page hasn't been touched since the last
> checkpoint, so the WAL log will include a complete page image.
> So in the steady state where all shared buffers are dirty, the
> per-page cycle is:
>     * write out a dirty buffer so it can be reclaimed
>     * read in a page
>     * modify it to mark tuples frozen
>     * write an image of the page to WAL
>     * leave the now-dirty page in shared buffers for later writing
>
> The checkpoint spikes would come from trying to flush out all the
> dirty buffers at once.

Got it.  Thanks.

> You'd expect a bit of a valley after each peak, since the vacuum
> could presumably recycle some buffers without having to flush 'em
> first; but I don't see one in your data.  That may just be because
> the numbers are too noisy, but I kinda suspect that the vacuum is
> dirtying buffers nearly as fast as the bgwriter can clean them,
> leaving not a lot of daylight for a valley.

Yeah, the pattern was pretty consistent and without valleys.

> 8.3 should pretty well eliminate the checkpoint spike in this scenario,
> because vacuum will work in a limited number of shared buffers instead
> of dirtying the whole cache.  But you'll still see 2X writes over reads.

Testing 8.3beta4 so far has shown both smoother I/O and better
performance in all respects.  The preliminary post I did where I
thought I saw some regression on loading a pg_dump turned out to
be an "apples to oranges" comparison; comparing the same load on
the same hardware and OS, 8.3 wins.  (Kudos to all who worked on
these improvements!)

> If this is data that you could re-generate at need, it might make sense
> to turn off full_page_writes during the initial data load and vacuum.

Thanks for the suggestions; I'll try that.

> I concur with trying to FREEZE all the data while you do this, else
> you'll see the same work done whenever the data happens to slip past
> the auto freeze threshold.

Thanks.  I thought that made sense, but I'm still trying to get my
head around some of the dynamics of PostgreSQL and MVCC.  I'll
suggest that as policy here.

-Kevin