Thread: RE: WAL Log using all my disk space!

RE: WAL Log using all my disk space!

From: "Mikheev, Vadim"
Date:
> I do a large bulk copy once a day (100,000 records of Radius data),
> tearing down indices, truncating a large table that contains summary
> information, and rebuilding everything after the copy. Over the course
> of this operation, I can generate up to 1.5 gigs of WAL data in
> pg_xlog.  Sometimes (like just now), I will run out of disk space and
> the postmaster will crash.  I try to restart it, and it errors out.
> Then I delete all the WAL logs, try to restart, and (surprise) it errors
> out again.

(Removing WAL logs is not a good idea.)

> I tried to set some of the of the WAL parameters in postgres.conf like
> so:
>
> wal_buffers = 4 # min 4

(More is better and doesn't affect disk space usage).

> wal_files = 8 # range 0-64

Oops. With wal_files > 0 the server pre-allocates log files in advance!
It should be used only if disk space is not a problem...
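
For a disk-constrained setup, a minimal postgres.conf sketch (parameter
names as quoted above; the values are illustrative assumptions, not
tested recommendations):

    wal_files = 0     # do not pre-allocate extra log segments
    wal_buffers = 8   # more buffers cost memory, not disk space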

> I would like to recover without an initdb, but if that isn't
> possible, I would definitely like to avoid this problem in the
> future.

So, are you able to restart? If not, send us the server's startup log.
You should be able to remove some of the preallocated wal_files, but
I need the numbers from the server log to say which files are safe to remove.

Now, how to reduce disk space usage.
First, keep in mind that the server removes old (useless) log files at
checkpoint time. Second, a log file becomes useless (from the transaction
subsystem's POV) once it holds no record from any running transaction.
Third - unfortunately (from my POV), we now require two checkpoints in
the log files, so we do not remove files with records between the last
two checkpoints.
Recommendation: try to split your bulk operation into a few transactions
with smaller write traffic and run CHECKPOINT commands between them.
You could also try changing the checkpoint_segments and/or checkpoint_timeout
parameters, but IMHO an explicit CHECKPOINT is better for bulk ops, because
it will not affect normal operations.
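
For example, a sketch of such a split (table, column, and index names
here are hypothetical, not taken from the original report):

    -- Load the day's data in its own transaction, then force a
    -- checkpoint so the WAL written so far can be recycled.
    BEGIN;
    COPY radius_log FROM '/tmp/radius.dump';
    COMMIT;
    CHECKPOINT;

    -- Rebuild the summary table as a second, smaller burst of WAL.
    TRUNCATE summary;
    BEGIN;
    INSERT INTO summary
        SELECT username, count(*) AS sessions
          FROM radius_log
         GROUP BY username;
    COMMIT;
    CHECKPOINT;

    -- Recreate indices last; index builds also generate WAL.
    CREATE INDEX radius_log_username_idx ON radius_log (username);
    CHECKPOINT;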

Vadim

Re: WAL Log using all my disk space!

From: webb sprague
Date:
I just put a CHECKPOINT command on both sides of my COPY statement, and
it seemed to go quite a bit faster and not give me any problems.  Does
that make sense?

Thanks
W

"Mikheev, Vadim" wrote:
>
> > I do a large bulk copy once a day (100,000 records of Radius data),
> > tearing down indices, truncating a large table that contains summary
> > information, and rebuilding everything after the copy. Over the course
> > of this operation, I can generate up to 1.5 gigs of WAL data in
> > pg_xlog.  Sometimes (like just now), I will run out of disk space and
> > the postmaster will crash.  I try to restart it, and it errors out.
> > Then I delete all the WAL logs, try to restart, and (surprise) it errors
> > out again.
>
> (Removing WAL logs is not good idea).
>
> > I tried to set some of the of the WAL parameters in postgres.conf like
> > so:
> >
> > wal_buffers = 4 # min 4
>
> (More is better and doesn't affect disk space usage).
>
> > wal_files = 8 # range 0-64
>
> Ops. With wal_files > 0 server pre-allocates log files in advance!
> Should be used only if disk space is not problem...
>
> > I would like to recover without an initdb, but if that isn't
> > possible, I would definitely like to avoid this problem in the
> > future.
>
> So, are you able to restart? If not - send us startup server log.
> You should be able to remove some of preallocated wal_files but
> I need in numbers from server log to say what is safe to remove.
>
> Now how to reduce disk space usage.
> First to keep in mind - server removes old (useless) log files at
> checkpoint time. Second - log file becomes useless (from transaction
> subsystem POV) if it keeps no record from any running transaction.
> Third - unfortunately (from my POV), we requires two checkpoint in
> log files now, so we do not remove files with records between last
> two checkpoints.
> Recommendation: try to split your bulk operation into a few transactions
> with smaller write traffic and run CHECKPOINT commands between them.
> You could also try to change checkpoint_segments and/or checkpoint_timeout
> params, but imho explicit CHECKPOINT is better for bulk ops, because of
> it will not affect normal operations.
>
> Vadim

RE: WAL Log using all my disk space!

From: "Mikheev, Vadim"
Date:
> I just put a CHECKPOINT command on both sides of my COPY
> statement, and it seemed to go quite a bit faster and not

Hmm, why would COPY be faster just after a CHECKPOINT?
I assume this was caused by some fortuity (like the preparation of a
new log file at checkpoint time - the server does this even when
wal_files = 0, whenever the current log file is 75% full).

> give me any problems.  Does that make sense?

Of course. Also remember:

> > Third - unfortunately (from my POV), we now require two checkpoints in
> > the log files, so we do not remove files with records between the last
> > two checkpoints.

Some additional checkpoints may be required.

BUT, I forgot about a fourth point - the server logs the *entire* data page
the first time a page is modified after the last checkpoint. This doesn't
affect COPY-ed tables (when the first row goes to a new page, only that row
is logged), but it may affect indices and UPDATE/DELETE ops.

So, one should be very careful when using checkpoints.

Vadim

Diagnostic

From: webb sprague
Date:
I am trying to determine how many seconds a query takes, and I can't
figure it out.  I seem to remember some weird, mostly undocumented query
that went like:

foo# get diagnostics select processed into <int4 variable>;
foo# show <int4 variable>;

But this doesn't work, and I think it may only work in a plpgsql function
anyway.

I know how to use EXPLAIN, but I want to see whether postgres is correct
about which query is fastest.

Does anybody know how to find this info out?
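
One hedged sketch, assuming features from PostgreSQL versions later than
the one discussed in this thread (psql's \timing toggle and EXPLAIN
ANALYZE):

    -- In psql, toggle client-side timing of every statement:
    \timing
    SELECT count(*) FROM radius_log;   -- radius_log is hypothetical

    -- EXPLAIN ANALYZE actually runs the query and reports real
    -- execution times, not just the planner's estimates:
    EXPLAIN ANALYZE SELECT count(*) FROM radius_log;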

Thx