Thread: Vacuum full - disk space eaten by WAL logfiles

Vacuum full - disk space eaten by WAL logfiles

From: "Lee Wu"

Hi all,

 

When we do our weekly "vacuum full", PG uses up all the disk space and goes down.

 

checkpoint_segments            | 30
checkpoint_timeout             | 300

 

select version();

                           version

-------------------------------------------------------------

 PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96

 

The error message is:

Jan  8 20:25:33 mybox postgres[1602]: [13] PANIC:  ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device

Jan  8 20:25:35 mybox postgres[8213]: [163] LOG:  removing transaction log file 00001AB2000000EC

Jan  8 20:25:35 mybox postgres[1602]: [14-1] LOG:  statement: COPY xxxxx (domain, domain_id, customer_id, action_unspecified, action_unknown,

Jan  8 20:25:35 mybox postgres[8213]: [164] LOG:  removing transaction log file 00001AB2000000ED

Jan  8 20:25:35 mybox postgres[8213]: [165] LOG:  removing transaction log file 00001AB2000000EE

Jan  8 20:25:35 mybox postgres[1602]: [14-2]  action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,

Jan  8 20:25:35 mybox postgres[8213]: [166] LOG:  removing transaction log file 00001AB2000000F0

Jan  8 20:25:35 mybox postgres[1602]: [14-3]  module, period, created) FROM stdin

Jan  8 20:25:35 mybox postgres[8213]: [167] LOG:  removing transaction log file 00001AB2000000F1

Jan  8 20:25:35 mybox postgres[8213]: [168] LOG:  removing transaction log file 00001AB2000000F2

Jan  8 20:25:36 mybox postgres[8213]: [169] LOG:  removing transaction log file 00001AB2000000F3

Jan  8 20:25:36 mybox postgres[8213]: [170] LOG:  removing transaction log file 00001AB2000000F4

Jan  8 20:25:36 mybox postgres[8213]: [171] LOG:  removing transaction log file 00001AB2000000F5

Jan  8 20:25:36 mybox postgres[862]: [13] LOG:  server process (pid 1602) was terminated by signal 6

Jan  8 20:25:36 mybox postgres[862]: [14] LOG:  terminating any other active server processes

Jan  8 20:25:36 mybox postgres[8601]: [13-1] WARNING:  copy: line 1, Message from PostgreSQL backend:

Jan  8 20:25:36 mybox postgres[8601]: [13-2] ^IThe Postmaster has informed me that some other backend

Jan  8 20:25:36 mybox postgres[8601]: [13-3] ^Idied abnormally and possibly corrupted shared memory.

Jan  8 20:25:36 mybox postgres[8601]: [13-4] ^II have rolled back the current transaction and am

Jan  8 20:25:36 mybox postgres[7284]: [13-1] WARNING:  Message from PostgreSQL backend:

Jan  8 20:25:36 mybox postgres[3658]: [13-1] WARNING:  Message from PostgreSQL backend:

 

While the error is happening, we see as many as 563 WAL files under pg_xlog at one
time. My understanding is that we should expect about 61 WAL files
(2 * checkpoint_segments + 1), each 16MB.
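
To put numbers on that: 2 * 30 + 1 = 61 segments at 16MB each is about 1GB,
while 563 segments at 16MB each is about 8.8GB, more than the free space we
normally have on that partition.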

 

Our /my/pg_xlog is a 10GB partition that typically has 8GB free.

 

The table being vacuumed full is 35GB.

 

What can I do to fix this mess?

 

Regards,

 

Re: Vacuum full - disk space eaten by WAL logfiles

From: Tom Lane
Date: Monday, January 10, 2005 2:27 PM
"Lee Wu" <Lwu@mxlogic.com> writes:
> When we do our weekly "vacuum full", PG uses up all the disk space and goes down.

This implies that checkpoints aren't completing for some reason.
If they were, they'd be recycling WAL space.

I'm not aware of any problems in 7.3 that would block a checkpoint
indefinitely, but we have seen cases where it just took too darn long
to do the checkpoint --- implying either a ridiculously large
shared_buffers setting, or a drastic shortage of I/O bandwidth.

You might want to try strace'ing the checkpoint process to see if it
seems to be making progress or not.
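
For instance, something like this (just a sketch; substitute the actual pid
of the backend doing the checkpoint, which shows up in your syslog prefix,
e.g. postgres[8213] in the log you posted):

    # attach to the running checkpoint process and print timestamped
    # system calls, to see whether it is advancing or stuck
    strace -tt -p 8213 -e trace=open,write,unlink,fsync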

Also, are you certain that this is happening during a VACUUM?  The
log messages you show refer to COPY commands.

>  PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96

Are you aware of the number and significance of post-7.3.2 bug fixes
in the 7.3 branch?  You really ought to be on 7.3.8, if you can't afford
to migrate to 7.4 right now.

            regards, tom lane

Re: Vacuum full - disk space eaten by WAL logfiles

From: "Lee Wu"
Hi Tom,

1. shared_buffers                 | 32768
2. I/O bandwidth is not an issue, to the best of our knowledge
3. It is "vacuum full" as shown:
Jan  8 20:25:38 mybox postgres[8603]: [15] FATAL:  The database system is in recovery mode
Jan  8 20:25:38 mybox postgres[7284]: [14] LOG:  statement: vacuum full analyze the_35G_table
Jan  8 20:25:39 mybox postgres[8604]: [15] FATAL:  The database system is in recovery mode

Also, this error happened on each of the last two Saturdays and matched our
vacuum log timing:
20050108-194000         End vacuum full analyze on table1.
20050108-194000         Begin vacuum full analyze on the_35G_table.
WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
20050108-202539         End vacuum full analyze on the_35G_table.
20050108-202539         Begin vacuum full analyze on table3.
psql: FATAL:  The database system is in recovery mode

We only do vacuum full on Saturdays. This error has not been seen at any
other time.

4. PG upgrade: out of my (a DBA's) control

Thanks,

Re: Vacuum full - disk space eaten by WAL logfiles

From: Tom Lane
Date: Monday, January 10, 2005 4:01 PM
"Lee Wu" <Lwu@mxlogic.com> writes:
> 3. It is "vacuum full" as shown:

Hmm.  I'm not able to replicate any such problem here: AFAICS,
checkpoints go through to completion just fine while a VACUUM is
running.  This might mean that the problem was fixed between 7.3.2
and 7.3.8 (though I see nothing relevant-looking in the changelog).
Or it might mean that you've not given enough information to let
someone else duplicate the problem.  What else is going on in parallel
with the VACUUM?  (We know at least some COPY operations...)

Again, it would be useful to know whether the checkpoint process is
making progress or not.

            regards, tom lane

Re: Vacuum full - disk space eaten by WAL logfiles

From: "Lee Wu"
Hi Tom,

I am re-sending this because my last email, which had an attachment, did not
get through.

It is a live production database, and I do not know what else happened on
Saturdays because I do not work on Saturdays. However, I can ask our shift
to find out what other queries are running in PG next Saturday if that would
help. On the other hand, shouldn't PG use a fixed number of log files, as
Oracle does, even when there are other activities going on at the same time?

From the archives, I see a post:
http://archives.postgresql.org/pgsql-hackers/2004-07/msg01055.php
It said:
> > You have 35 because the max files in pg_xlog is
> > 2*checkpoint_segments +1 or something like that. This is documented
> > in the SGML.

Here is the postgres log file; I changed only the hostname and table names.

I do not know how to strace the checkpoint process and would appreciate it
if you could show me:
1. how
2. when

Thanks,

Postgres.log:
Jan  8 20:15:52 mybox postgres[8037]: [73] LOG:  recycled transaction log file 00001AB100000060
all other recycling transaction log ...
Jan  8 20:15:52 mybox postgres[8037]: [74] LOG:  removing transaction log file 00001AB100000061
all other removing transaction log ...
Jan  8 20:17:27 mybox postgres[8213]: [13] LOG:  recycled transaction log file 00001AB2000000A3
all other recycling transaction log ...
Jan  8 20:17:42 mybox postgres[8213]: [74] LOG:  removing transaction log file 00001AB200000077
all other removing transaction log ...
Jan  8 20:25:33 mybox postgres[1602]: [13] PANIC:  ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device
Jan  8 20:25:35 mybox postgres[8213]: [163] LOG:  removing transaction log file 00001AB2000000EC
Jan  8 20:25:35 mybox postgres[1602]: [14-1] LOG:  statement: COPY table1 (domain, domain_id, customer_id, action_unspecified, action_unknown,
Jan  8 20:25:35 mybox postgres[8213]: [164] LOG:  removing transaction log file 00001AB2000000ED
Jan  8 20:25:35 mybox postgres[8213]: [165] LOG:  removing transaction log file 00001AB2000000EE
Jan  8 20:25:35 mybox postgres[1602]: [14-2]  action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,
Jan  8 20:25:35 mybox postgres[8213]: [166] LOG:  removing transaction log file 00001AB2000000F0
Jan  8 20:25:35 mybox postgres[1602]: [14-3]  module, period, created) FROM stdin
Jan  8 20:25:35 mybox postgres[8213]: [167] LOG:  removing transaction log file 00001AB2000000F1
Jan  8 20:25:35 mybox postgres[8213]: [168] LOG:  removing transaction log file 00001AB2000000F2
Jan  8 20:25:36 mybox postgres[8213]: [169] LOG:  removing transaction log file 00001AB2000000F3
Jan  8 20:25:36 mybox postgres[8213]: [170] LOG:  removing transaction log file 00001AB2000000F4
Jan  8 20:25:36 mybox postgres[8213]: [171] LOG:  removing transaction log file 00001AB2000000F5
Jan  8 20:25:36 mybox postgres[862]: [13] LOG:  server process (pid 1602) was terminated by signal 6
Jan  8 20:25:36 mybox postgres[862]: [14] LOG:  terminating any other active server processes
Jan  8 20:25:36 mybox postgres[8601]: [13-1] WARNING:  copy: line 1, Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[8601]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[8601]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:36 mybox postgres[8601]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:36 mybox postgres[7284]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[3658]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[29685]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[1605]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[29687]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[1603]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[29684]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[1604]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[29686]: [13-1] WARNING:  Message from PostgreSQL backend:
Jan  8 20:25:36 mybox postgres[8601]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:36 mybox postgres[7284]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[3658]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[29685]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[1605]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[29687]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[1603]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[29684]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:36 mybox postgres[1604]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:37 mybox postgres[29686]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan  8 20:25:37 mybox postgres[8601]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:37 mybox postgres[7284]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[3658]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[29685]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[1605]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[29687]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[1603]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[29684]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[1604]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[29686]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan  8 20:25:37 mybox postgres[8601]: [14-1] LOG:  copy: line 1, statement: COPY table1 (domain, domain_id, customer_id, email, bytes_received, mails,
Jan  8 20:25:37 mybox postgres[7284]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[3658]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[29685]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[1605]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[29687]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[1603]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[29684]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[1604]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[29686]: [13-4] ^II have rolled back the current transaction and am
Jan  8 20:25:37 mybox postgres[8601]: [14-2]  period, created) FROM stdin
Jan  8 20:25:37 mybox postgres[7284]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[3658]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[29685]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[1605]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[29687]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[1603]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[29684]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[1604]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[29686]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan  8 20:25:38 mybox postgres[8602]: [15] FATAL:  The database system is in recovery mode
Jan  8 20:25:38 mybox postgres[7284]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[3658]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[29685]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[1605]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[29687]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[1603]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[29684]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[1604]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[29686]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan  8 20:25:38 mybox postgres[8603]: [15] FATAL:  The database system is in recovery mode
Jan  8 20:25:38 mybox postgres[7284]: [14] LOG:  statement: set vacuum_mem=131072; vacuum full analyze the_35G_table
Jan  8 20:25:39 mybox postgres[8604]: [15] FATAL:  The database system is in recovery mode
All other The database system is in recovery mode...
Jan  8 20:25:41 mybox postgres[862]: [15] LOG:  all server processes terminated; reinitializing shared memory and semaphores
Jan  8 20:25:41 mybox postgres[9355]: [16] FATAL:  The database system is starting up
Jan  8 20:25:41 mybox postgres[9346]: [16] LOG:  database system was interrupted at 2005-01-08 20:25:32 MST
Jan  8 20:25:41 mybox postgres[9365]: [16] FATAL:  The database system is starting up
Jan  8 20:25:41 mybox postgres[9346]: [17] LOG:  checkpoint record is at 1AB3/692F2358
Jan  8 20:25:41 mybox postgres[9378]: [16] FATAL:  The database system is starting up
Jan  8 20:25:41 mybox postgres[9346]: [18] LOG:  redo record is at 1AB3/584199FC; undo record is at 0/0; shutdown FALSE
Jan  8 20:25:42 mybox postgres[9477]: [16] FATAL:  The database system is starting up
Jan  8 20:25:42 mybox postgres[9346]: [19] LOG:  next transaction id: 85474993; next oid: 901857280
Jan  8 20:25:42 mybox postgres[9478]: [16] FATAL:  The database system is starting up
Jan  8 20:25:42 mybox postgres[9346]: [20] LOG:  database system was not properly shut down; automatic recovery in progress
Jan  8 20:25:42 mybox postgres[9479]: [16] FATAL:  The database system is starting up
Jan  8 20:25:42 mybox postgres[9346]: [21] LOG:  redo starts at 1AB3/584199FC
Jan  8 20:25:42 mybox postgres[9481]: [16] FATAL:  The database system is starting up
All other The database system is starting up...
Jan  8 20:37:38 mybox postgres[9346]: [22] LOG:  open of /my/pg_xlog/00001AB500000030 (log file 6837, segment 48) failed: No such file or directory
Jan  8 20:37:38 mybox postgres[9346]: [23] LOG:  redo done at 1AB5/2FFFC1F8
Jan  8 20:37:45 mybox postgres[9346]: [24] LOG:  recycled transaction log file 00001AB300000021
All other recycling transaction log ...
Jan  8 20:37:48 mybox postgres[9346]: [85] LOG:  removing transaction log file 00001AB2000000FB
Jan  8 20:37:48 mybox postgres[9346]: [86] LOG:  removing transaction log file 00001AB2000000FC
Jan  8 20:37:48 mybox postgres[14114]: [16] FATAL:  The database system is starting up
Jan  8 20:37:48 mybox postgres[9346]: [87] LOG:  removing transaction log file 00001AB2000000FD
All other removing transaction log ...
Jan  8 20:43:28 mybox postgres[14419]: [484] LOG:  removing transaction log file 00001AB50000002E
Jan  8 20:48:38 mybox postgres[14547]: [16] LOG:  recycled transaction log file 00001AB500000031

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, January 10, 2005 4:01 PM
To: Lee Wu
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Vacuum full - disk space eaten by WAL logfiles

"Lee Wu" <Lwu@mxlogic.com> writes:
> 3. It is "vacuum full" as shown:

Hmm.  I'm not able to replicate any such problem here: AFAICS,
checkpoints go through to completion just fine while a VACUUM is
running.  This might mean that the problem was fixed between 7.3.2
and 7.3.8 (though I see nothing relevant-looking in the changelog).
Or it might mean that you've not given enough information to let
someone else duplicate the problem.  What else is going on in parallel
with the VACUUM?  (We know at least some COPY operations...)

Again, it would be useful to know whether the checkpoint process is
making progress or not.

            regards, tom lane

Re: Vacuum full - disk space eaten by WAL logfiles

From: Tom Lane
Date: Tuesday, January 11, 2005 11:05 AM
"Lee Wu" <Lwu@mxlogic.com> writes:
> On the other hand, shouldn't PG use a fixed number of log files, as Oracle
> does, even when there are other activities going on at the same time?

No, it certainly should not.  Anyone who's admin'd an Oracle
installation will tell you what a PITA it is that Oracle keels over
and dies when you exceed the fixed log space allocation.

The real question here is why the log space isn't getting recycled in
a timely fashion.
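
A cheap way to watch that (just a sketch, using the pg_xlog path from your
report) is to sample the segment count once a minute across the vacuum
window:

    # log a timestamped count of xlog segments every 60 seconds
    while true; do
        echo "$(date '+%H:%M:%S') $(ls /my/pg_xlog | wc -l)"
        sleep 60
    done

If the count climbs steadily and never comes back down, recycling is
falling behind.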

> Postgres.log:
> Jan  8 20:15:52 mybox postgres[8037]: [73] LOG:  recycled transaction log file 00001AB100000060
> all other recycling transaction log ...
> Jan  8 20:15:52 mybox postgres[8037]: [74] LOG:  removing transaction log file 00001AB100000061
> all other removing transaction log ...
> Jan  8 20:17:27 mybox postgres[8213]: [13] LOG:  recycled transaction log file 00001AB2000000A3
> all other recycling transaction log ...
> Jan  8 20:17:42 mybox postgres[8213]: [74] LOG:  removing transaction log file 00001AB200000077
> all other removing transaction log ...
> Jan  8 20:25:33 mybox postgres[1602]: [13] PANIC:  ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device
> Jan  8 20:25:35 mybox postgres[8213]: [163] LOG:  removing transaction log file 00001AB2000000EC
> Jan  8 20:25:35 mybox postgres[1602]: [14-1] LOG:  statement: COPY table1 (domain, domain_id, customer_id, action_unspecified, action_unknown,
> Jan  8 20:25:35 mybox postgres[8213]: [164] LOG:  removing transaction log file 00001AB2000000ED
> Jan  8 20:25:35 mybox postgres[8213]: [165] LOG:  removing transaction log file 00001AB2000000EE
> Jan  8 20:25:35 mybox postgres[1602]: [14-2]  action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,

Hmm.  You seem to have removed all the evidence about the interesting
question, which is what process 8213 (which was evidently doing a
checkpoint) was doing between 20:17:42 and 20:25:35.
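
(Pulling that window back out of syslog should be straightforward if you
still have the raw file; something like

    # extract the missing window, then keep only the checkpoint process;
    # adjust the file name to wherever your syslog actually writes
    sed -n '/^Jan  8 20:17:42/,/^Jan  8 20:25:35/p' Postgres.log | grep 'postgres\[8213\]'

would do it.)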

Also, what postgresql.conf parameters are you using?  The mix
of "removing" and "recycling" operations seems a bit odd.

            regards, tom lane

Re: Vacuum full - disk space eaten by WAL logfiles

From: "Lee Wu"
Here is the postgres log from 20:17:42 to 20:25:35.
I did NOT remove anything except the hostname and table names.

Thanks,

              name              |                setting
--------------------------------+---------------------------------------
 australian_timezones           | off
 authentication_timeout         | 60
 autocommit                     | on
 checkpoint_segments            | 30
 checkpoint_timeout             | 300
 client_encoding                | SQL_ASCII
 client_min_messages            | notice
 commit_delay                   | 0
 commit_siblings                | 5
 cpu_index_tuple_cost           | 0.001
 cpu_operator_cost              | 0.0025
 cpu_tuple_cost                 | 0.01
 DateStyle                      | ISO with US (NonEuropean) conventions
 db_user_namespace              | off
 deadlock_timeout               | 2500
 debug_pretty_print             | off
 debug_print_parse              | off
 debug_print_plan               | off
 debug_print_rewritten          | off
 default_statistics_target      | 10
 default_transaction_isolation  | read committed
 dynamic_library_path           | $libdir
 effective_cache_size           | 32000
 enable_hashjoin                | on
 enable_indexscan               | on
 enable_mergejoin               | on
 enable_nestloop                | on
 enable_seqscan                 | on
 enable_sort                    | on
 enable_tidscan                 | on
 explain_pretty_print           | on
 fixbtree                       | on
 fsync                          | off
 geqo                           | on
 geqo_effort                    | 1
 geqo_generations               | 0
 geqo_pool_size                 | 0
 geqo_random_seed               | -1
 geqo_selection_bias            | 2
 geqo_threshold                 | 11
 hostname_lookup                | off
 krb_server_keyfile             | unset
 lc_messages                    | en_US.iso885915
 lc_monetary                    | en_US.iso885915
 lc_numeric                     | en_US.iso885915
 lc_time                        | en_US.iso885915
 log_connections                | off
 log_duration                   | off
 log_min_error_statement        | debug1
 log_pid                        | off
 log_statement                  | off
 log_timestamp                  | off
 max_connections                | 400
 max_expr_depth                 | 10000
 max_files_per_process          | 1000
 max_fsm_pages                  | 5000000
 max_fsm_relations              | 500
 max_locks_per_transaction      | 64
 password_encryption            | on
 port                           | 5432
 pre_auth_delay                 | 0
 random_page_cost               | 1.1
 search_path                    | $user,public
 server_encoding                | SQL_ASCII
 server_min_messages            | notice
 shared_buffers                 | 32768
 show_executor_stats            | off
 show_parser_stats              | off
 show_planner_stats             | off
 show_source_port               | on
 show_statement_stats           | off
 silent_mode                    | off
 sort_mem                       | 131072
 sql_inheritance                | on
 ssl                            | off
 statement_timeout              | 0
 stats_block_level              | on
 stats_command_string           | on
 stats_reset_on_server_start    | on
 stats_row_level                | on
 stats_start_collector          | on
 superuser_reserved_connections | 2
 syslog                         | 2
 syslog_facility                | LOCAL0
 syslog_ident                   | postgres
 tcpip_socket                   | on
 TimeZone                       | unknown
 trace_notify                   | off
 TRANSACTION ISOLATION LEVEL    | READ COMMITTED
 transform_null_equals          | off
 unix_socket_directory          | unset
 unix_socket_group              | unset
 unix_socket_permissions        | 511
 vacuum_mem                     | 524288
 virtual_host                   | unset
 wal_buffers                    | 2048
 wal_debug                      | 0
 wal_sync_method                | open_sync
(98 rows)

Jan  8 20:16:59 mybox postgres[8037]: [294] LOG:  removing transaction log file 00001AB200000055
Jan  8 20:17:27 mybox postgres[8213]: [13] LOG:  recycled transaction log file 00001AB2000000A3
Jan  8 20:17:27 mybox postgres[8213]: [14] LOG:  recycled transaction log file 00001AB2000000A4
Jan  8 20:17:27 mybox postgres[8213]: [15] LOG:  recycled transaction log file 00001AB2000000A5
Jan  8 20:17:27 mybox postgres[8213]: [16] LOG:  recycled transaction log file 00001AB2000000A6
Jan  8 20:17:27 mybox postgres[8213]: [17] LOG:  recycled transaction log file 00001AB2000000A7
Jan  8 20:17:27 mybox postgres[8213]: [18] LOG:  recycled transaction log file 00001AB2000000A8
Jan  8 20:17:27 mybox postgres[8213]: [19] LOG:  recycled transaction log file 00001AB2000000A9
Jan  8 20:17:27 mybox postgres[8213]: [20] LOG:  recycled transaction log file 00001AB2000000AA
Jan  8 20:17:27 mybox postgres[8213]: [21] LOG:  recycled transaction log file 00001AB2000000AB
Jan  8 20:17:27 mybox postgres[8213]: [22] LOG:  recycled transaction log file 00001AB2000000AC
Jan  8 20:17:27 mybox postgres[8213]: [23] LOG:  recycled transaction log file 00001AB2000000AD
Jan  8 20:17:27 mybox postgres[8213]: [24] LOG:  recycled transaction log file 00001AB2000000AE
Jan  8 20:17:27 mybox postgres[8213]: [25] LOG:  recycled transaction log file 00001AB2000000AF
Jan  8 20:17:30 mybox postgres[8213]: [26] LOG:  recycled transaction log file 00001AB2000000B0
Jan  8 20:17:36 mybox postgres[8213]: [27] LOG:  recycled transaction log file 00001AB2000000A2
Jan  8 20:17:36 mybox postgres[8213]: [28] LOG:  recycled transaction log file 00001AB2000000B1
Jan  8 20:17:36 mybox postgres[8213]: [29] LOG:  recycled transaction log file 00001AB2000000B2
Jan  8 20:17:37 mybox postgres[8213]: [30] LOG:  recycled transaction log file 00001AB2000000B3
Jan  8 20:17:37 mybox postgres[8213]: [31] LOG:  recycled transaction log file 00001AB2000000B4
Jan  8 20:17:38 mybox postgres[8213]: [32] LOG:  recycled transaction log file 00001AB2000000B5
Jan  8 20:17:39 mybox postgres[8213]: [33] LOG:  recycled transaction log file 00001AB2000000B8
Jan  8 20:17:39 mybox postgres[8213]: [34] LOG:  recycled transaction log file 00001AB2000000BC
Jan  8 20:17:39 mybox postgres[8213]: [35] LOG:  recycled transaction log file 00001AB2000000C0
Jan  8 20:17:40 mybox postgres[8213]: [36] LOG:  recycled transaction log file 00001AB2000000C4
Jan  8 20:17:40 mybox postgres[8213]: [37] LOG:  recycled transaction log file 00001AB2000000CA
Jan  8 20:17:40 mybox postgres[8213]: [38] LOG:  recycled transaction log file 00001AB2000000D4
Jan  8 20:17:40 mybox postgres[8213]: [39] LOG:  recycled transaction log file 00001AB2000000DE
Jan  8 20:17:40 mybox postgres[8213]: [40] LOG:  recycled transaction log file 00001AB2000000E9
Jan  8 20:17:40 mybox postgres[8213]: [41] LOG:  recycled transaction log file 00001AB2000000EF
Jan  8 20:17:40 mybox postgres[8213]: [42] LOG:  recycled transaction log file 00001AB200000057
Jan  8 20:17:40 mybox postgres[8213]: [43] LOG:  recycled transaction log file 00001AB200000058
Jan  8 20:17:40 mybox postgres[8213]: [44] LOG:  recycled transaction log file 00001AB200000059
Jan  8 20:17:40 mybox postgres[8213]: [45] LOG:  recycled transaction log file 00001AB20000005A
Jan  8 20:17:40 mybox postgres[8213]: [46] LOG:  recycled transaction log file 00001AB20000005B
Jan  8 20:17:40 mybox postgres[8213]: [47] LOG:  recycled transaction log file 00001AB20000005C
Jan  8 20:17:40 mybox postgres[8213]: [48] LOG:  recycled transaction log file 00001AB20000005D
Jan  8 20:17:40 mybox postgres[8213]: [49] LOG:  recycled transaction log file 00001AB20000005E
Jan  8 20:17:40 mybox postgres[8213]: [50] LOG:  recycled transaction log file 00001AB20000005F
Jan  8 20:17:40 mybox postgres[8213]: [51] LOG:  recycled transaction log file 00001AB200000060
Jan  8 20:17:40 mybox postgres[8213]: [52] LOG:  recycled transaction log file 00001AB200000061
Jan  8 20:17:40 mybox postgres[8213]: [53] LOG:  recycled transaction log file 00001AB200000062
Jan  8 20:17:41 mybox postgres[8213]: [54] LOG:  recycled transaction log file 00001AB200000063
Jan  8 20:17:41 mybox postgres[8213]: [55] LOG:  recycled transaction log file 00001AB200000064
Jan  8 20:17:41 mybox postgres[8213]: [56] LOG:  recycled transaction log file 00001AB200000065
Jan  8 20:17:41 mybox postgres[8213]: [57] LOG:  recycled transaction log file 00001AB200000066
Jan  8 20:17:41 mybox postgres[8213]: [58] LOG:  recycled transaction log file 00001AB200000067
Jan  8 20:17:41 mybox postgres[8213]: [59] LOG:  recycled transaction log file 00001AB200000068
Jan  8 20:17:41 mybox postgres[8213]: [60] LOG:  recycled transaction log file 00001AB200000069
Jan  8 20:17:41 mybox postgres[8213]: [61] LOG:  recycled transaction log file 00001AB20000006A
Jan  8 20:17:41 mybox postgres[8213]: [62] LOG:  recycled transaction log file 00001AB20000006B
Jan  8 20:17:41 mybox postgres[8213]: [63] LOG:  recycled transaction log file 00001AB20000006C
Jan  8 20:17:41 mybox postgres[8213]: [64] LOG:  recycled transaction log file 00001AB20000006D
Jan  8 20:17:41 mybox postgres[8213]: [65] LOG:  recycled transaction log file 00001AB20000006E
Jan  8 20:17:41 mybox postgres[8213]: [66] LOG:  recycled transaction log file 00001AB20000006F
Jan  8 20:17:41 mybox postgres[8213]: [67] LOG:  recycled transaction log file 00001AB200000070
Jan  8 20:17:42 mybox postgres[8213]: [68] LOG:  recycled transaction log file 00001AB200000071
Jan  8 20:17:42 mybox postgres[8213]: [69] LOG:  recycled transaction log file 00001AB200000072
Jan  8 20:17:42 mybox postgres[8213]: [70] LOG:  recycled transaction log file 00001AB200000073
Jan  8 20:17:42 mybox postgres[8213]: [71] LOG:  recycled transaction log file 00001AB200000074
Jan  8 20:17:42 mybox postgres[8213]: [72] LOG:  recycled transaction log file 00001AB200000075
Jan  8 20:17:42 mybox postgres[8213]: [73] LOG:  recycled transaction log file 00001AB200000076
Jan  8 20:17:42 mybox postgres[8213]: [74] LOG:  removing transaction log file 00001AB200000077
Jan  8 20:17:42 mybox postgres[8213]: [75] LOG:  removing transaction log file 00001AB200000078
Jan  8 20:17:43 mybox postgres[8213]: [76] LOG:  removing transaction log file 00001AB200000079
Jan  8 20:17:44 mybox postgres[8213]: [77] LOG:  removing transaction log file 00001AB20000007A
Jan  8 20:17:46 mybox postgres[8213]: [78] LOG:  removing transaction log file 00001AB20000007B
Jan  8 20:17:46 mybox postgres[8213]: [79] LOG:  removing transaction log file 00001AB20000007C
Jan  8 20:17:46 mybox postgres[8213]: [80] LOG:  removing transaction log file 00001AB20000007D
Jan  8 20:17:46 mybox postgres[8213]: [81] LOG:  removing transaction log file 00001AB20000007E
Jan  8 20:17:46 mybox postgres[8213]: [82] LOG:  removing transaction log file 00001AB20000007F
Jan  8 20:17:46 mybox postgres[8213]: [83] LOG:  removing transaction log file 00001AB200000080
Jan  8 20:17:46 mybox postgres[8213]: [84] LOG:  removing transaction log file 00001AB200000081
Jan  8 20:17:46 mybox postgres[8213]: [85] LOG:  removing transaction log file 00001AB200000082
Jan  8 20:17:47 mybox postgres[8213]: [86] LOG:  removing transaction log file 00001AB200000083
Jan  8 20:17:48 mybox postgres[8213]: [87] LOG:  removing transaction log file 00001AB200000084
Jan  8 20:17:49 mybox postgres[8213]: [88] LOG:  removing transaction log file 00001AB200000085
Jan  8 20:17:54 mybox postgres[8213]: [89] LOG:  removing transaction log file 00001AB200000086
Jan  8 20:17:57 mybox postgres[8213]: [90] LOG:  removing transaction log file 00001AB200000087
Jan  8 20:18:21 mybox postgres[8213]: [91] LOG:  removing transaction log file 00001AB200000088
Jan  8 20:18:32 mybox postgres[8213]: [92] LOG:  removing transaction log file 00001AB200000089
Jan  8 20:18:40 mybox postgres[8213]: [93] LOG:  removing transaction log file 00001AB20000008A
Jan  8 20:18:43 mybox postgres[8213]: [94] LOG:  removing transaction log file 00001AB20000008B
Jan  8 20:18:46 mybox postgres[8213]: [95] LOG:  removing transaction log file 00001AB20000008C
Jan  8 20:18:52 mybox postgres[8213]: [96] LOG:  removing transaction log file 00001AB20000008D
Jan  8 20:19:01 mybox postgres[8213]: [97] LOG:  removing transaction log file 00001AB20000008E
Jan  8 20:19:01 mybox postgres[8213]: [98] LOG:  removing transaction log file 00001AB20000008F
Jan  8 20:19:11 mybox postgres[8213]: [99] LOG:  removing transaction log file 00001AB200000090
Jan  8 20:19:27 mybox postgres[8213]: [100] LOG:  removing transaction log file 00001AB200000091
Jan  8 20:19:31 mybox postgres[8213]: [101] LOG:  removing transaction log file 00001AB200000092
Jan  8 20:19:34 mybox postgres[8213]: [102] LOG:  removing transaction log file 00001AB200000093
Jan  8 20:19:36 mybox postgres[8213]: [103] LOG:  removing transaction log file 00001AB200000094
Jan  8 20:19:36 mybox postgres[8213]: [104] LOG:  removing transaction log file 00001AB200000095
Jan  8 20:19:36 mybox postgres[8213]: [105] LOG:  removing transaction log file 00001AB200000096
Jan  8 20:19:36 mybox postgres[8213]: [106] LOG:  removing transaction log file 00001AB200000097
Jan  8 20:19:36 mybox postgres[8213]: [107] LOG:  removing transaction log file 00001AB200000098
Jan  8 20:19:36 mybox postgres[8213]: [108] LOG:  removing transaction log file 00001AB200000099
Jan  8 20:19:37 mybox postgres[8213]: [109] LOG:  removing transaction log file 00001AB20000009A
Jan  8 20:19:37 mybox postgres[8213]: [110] LOG:  removing transaction log file 00001AB20000009B
Jan  8 20:19:37 mybox postgres[8213]: [111] LOG:  removing transaction log file 00001AB20000009C
Jan  8 20:19:37 mybox postgres[8213]: [112] LOG:  removing transaction log file 00001AB20000009D
Jan  8 20:19:38 mybox postgres[8213]: [113] LOG:  removing transaction log file 00001AB20000009E
Jan  8 20:19:38 mybox postgres[8213]: [114] LOG:  removing transaction log file 00001AB20000009F
Jan  8 20:19:40 mybox postgres[8213]: [115] LOG:  removing transaction log file 00001AB2000000A0
Jan  8 20:19:40 mybox postgres[8213]: [116] LOG:  removing transaction log file 00001AB2000000A1
Jan  8 20:19:43 mybox postgres[8213]: [117] LOG:  removing transaction log file 00001AB2000000B6
Jan  8 20:19:43 mybox postgres[8213]: [118] LOG:  removing transaction log file 00001AB2000000B7
Jan  8 20:19:44 mybox postgres[8213]: [119] LOG:  removing transaction log file 00001AB2000000B9
Jan  8 20:19:53 mybox postgres[8213]: [120] LOG:  removing transaction log file 00001AB2000000BA
Jan  8 20:20:25 mybox postgres[8213]: [121] LOG:  removing transaction log file 00001AB2000000BB
Jan  8 20:21:02 mybox postgres[8213]: [122] LOG:  removing transaction log file 00001AB2000000BD
Jan  8 20:21:23 mybox postgres[8213]: [123] LOG:  removing transaction log file 00001AB2000000BE
Jan  8 20:21:33 mybox postgres[8213]: [124] LOG:  removing transaction log file 00001AB2000000BF
Jan  8 20:21:38 mybox postgres[8213]: [125] LOG:  removing transaction log file 00001AB2000000C1
Jan  8 20:21:41 mybox postgres[8213]: [126] LOG:  removing transaction log file 00001AB2000000C2
Jan  8 20:21:43 mybox postgres[8213]: [127] LOG:  removing transaction log file 00001AB2000000C3
Jan  8 20:22:01 mybox postgres[8213]: [128] LOG:  removing transaction log file 00001AB2000000C5
Jan  8 20:22:01 mybox postgres[8213]: [129] LOG:  removing transaction log file 00001AB2000000C6
Jan  8 20:22:01 mybox postgres[8213]: [130] LOG:  removing transaction log file 00001AB2000000C7
Jan  8 20:22:01 mybox postgres[8213]: [131] LOG:  removing transaction log file 00001AB2000000C8
Jan  8 20:22:01 mybox postgres[8213]: [132] LOG:  removing transaction log file 00001AB2000000C9
Jan  8 20:22:08 mybox postgres[8213]: [133] LOG:  removing transaction log file 00001AB2000000CB
Jan  8 20:22:08 mybox postgres[8213]: [134] LOG:  removing transaction log file 00001AB2000000CC
Jan  8 20:22:08 mybox postgres[8213]: [135] LOG:  removing transaction log file 00001AB2000000CD
Jan  8 20:22:14 mybox postgres[8213]: [136] LOG:  removing transaction log file 00001AB2000000CE
Jan  8 20:22:29 mybox postgres[8213]: [137] LOG:  removing transaction log file 00001AB2000000CF
Jan  8 20:22:36 mybox postgres[8213]: [138] LOG:  removing transaction log file 00001AB2000000D0
Jan  8 20:22:49 mybox postgres[8213]: [139] LOG:  removing transaction log file 00001AB2000000D1
Jan  8 20:22:56 mybox postgres[8213]: [140] LOG:  removing transaction log file 00001AB2000000D2
Jan  8 20:23:01 mybox postgres[8213]: [141] LOG:  removing transaction log file 00001AB2000000D3
Jan  8 20:23:01 mybox postgres[8213]: [142] LOG:  removing transaction log file 00001AB2000000D5
Jan  8 20:23:01 mybox postgres[8213]: [143] LOG:  removing transaction log file 00001AB2000000D6
Jan  8 20:23:27 mybox postgres[8213]: [144] LOG:  removing transaction log file 00001AB2000000D7
Jan  8 20:23:31 mybox postgres[8213]: [145] LOG:  removing transaction log file 00001AB2000000D8
Jan  8 20:23:35 mybox postgres[8213]: [146] LOG:  removing transaction log file 00001AB2000000D9
Jan  8 20:23:35 mybox postgres[8213]: [147] LOG:  removing transaction log file 00001AB2000000DA
Jan  8 20:23:43 mybox postgres[8213]: [148] LOG:  removing transaction log file 00001AB2000000DB
Jan  8 20:23:47 mybox postgres[8213]: [149] LOG:  removing transaction log file 00001AB2000000DC
Jan  8 20:24:05 mybox postgres[8213]: [150] LOG:  removing transaction log file 00001AB2000000DD
Jan  8 20:24:17 mybox postgres[8213]: [151] LOG:  removing transaction log file 00001AB2000000DF
Jan  8 20:24:17 mybox postgres[8213]: [152] LOG:  removing transaction log file 00001AB2000000E0
Jan  8 20:24:17 mybox postgres[8213]: [153] LOG:  removing transaction log file 00001AB2000000E1
Jan  8 20:24:21 mybox postgres[8213]: [154] LOG:  removing transaction log file 00001AB2000000E2
Jan  8 20:24:26 mybox postgres[8213]: [155] LOG:  removing transaction log file 00001AB2000000E3
Jan  8 20:24:26 mybox postgres[8213]: [156] LOG:  removing transaction log file 00001AB2000000E4
Jan  8 20:24:26 mybox postgres[8213]: [157] LOG:  removing transaction log file 00001AB2000000E5
Jan  8 20:24:26 mybox postgres[8213]: [158] LOG:  removing transaction log file 00001AB2000000E6
Jan  8 20:24:31 mybox postgres[8213]: [159] LOG:  removing transaction log file 00001AB2000000E7
Jan  8 20:24:51 mybox postgres[8213]: [160] LOG:  removing transaction log file 00001AB2000000E8
Jan  8 20:25:02 mybox postgres[8213]: [161] LOG:  removing transaction log file 00001AB2000000EA
Jan  8 20:25:25 mybox postgres[8213]: [162] LOG:  removing transaction log file 00001AB2000000EB
Jan  8 20:25:33 mybox postgres[1602]: [13] PANIC:  ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device
Jan  8 20:25:35 mybox postgres[8213]: [163] LOG:  removing transaction log file 00001AB2000000EC
Jan  8 20:25:35 mybox postgres[1602]: [14-1] LOG:  statement: COPY table1 (domain, domain_id, customer_id, action_unspecified, action_unknown,
Jan  8 20:25:35 mybox postgres[8213]: [164] LOG:  removing transaction log file 00001AB2000000ED
Jan  8 20:25:35 mybox postgres[8213]: [165] LOG:  removing transaction log file 00001AB2000000EE
Jan  8 20:25:35 mybox postgres[1602]: [14-2]  action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,
Jan  8 20:25:35 mybox postgres[8213]: [166] LOG:  removing transaction log file 00001AB2000000F0
Jan  8 20:25:35 mybox postgres[1602]: [14-3]  module, period, created) FROM stdin
Jan  8 20:25:35 mybox postgres[8213]: [167] LOG:  removing transaction log file 00001AB2000000F1
Jan  8 20:25:35 mybox postgres[8213]: [168] LOG:  removing transaction log file 00001AB2000000F2
Jan  8 20:25:36 mybox postgres[8213]: [169] LOG:  removing transaction log file 00001AB2000000F3

Re: Vacuum full - disk space eaten by WAL logfiles

From: Tom Lane
Date: Tuesday, January 11, 2005 1:00 PM
"Lee Wu" <Lwu@mxlogic.com> writes:
> Here is the postgres log from 20:17:42 to 20:25:35.
> I did NOT remove anything except the hostname and table names.

Hmm.  That shows the checkpoint process (8213) in a tight loop removing
unwanted XLOG files.  But apparently it is being starved for cycles ---
look at the gaps between log entries.  There's no problem with removing
eight per second at 20:17:46, but after that there are periods of 10 to
20 seconds where it doesn't get to run at all.  It really should have
finished this phase in a few seconds, but instead it's still trying to
do the removal seven minutes later when the COPY process fails.
Apparently the COPY and VACUUM and whatever else is going on are able
to create/use XLOG files faster than the checkpoint process can remove
them.  Which makes no sense.
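
To see those gaps at a glance, something like this should work (a rough
sketch; it assumes the standard syslog prefix so the time is field 3, and
that the file is the one you posted as Postgres.log):

    # print every entry from pid 8213 that arrived 5 seconds or more
    # after its predecessor, along with the size of the gap
    grep 'postgres\[8213\]' Postgres.log |
      awk '{ split($3, t, ":"); s = t[1]*3600 + t[2]*60 + t[3]
             if (NR > 1 && s - prev >= 5) print s - prev "s gap before:", $0
             prev = s }'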

I think there is something fairly odd about either your kernel scheduler
or your filesystem.  What sort of platform is this anyway?
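
If you can, have someone watch the box while next Saturday's vacuum runs.
Assuming the sysstat tools are installed (an assumption on my part),
something like

    iostat -x 5     # extended per-device stats every 5 seconds
    vmstat 5        # run queue, context switches, CPU

would show whether the disks are saturated or processes are being starved
for CPU.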

            regards, tom lane

Re: Vacuum full - disk space eaten by WAL logfiles

From: "Lee Wu"
cat /etc/redhat-release
Red Hat Linux release 7.3 (Valhalla)

pg_xlog is on local disk while $PGDATA is on RAID 10.

Thanks,
