Обсуждение: Why so long between archive calls?

Поиск
Список
Период
Сортировка

Why so long between archive calls?

От
"Chris Hoover"
Дата:
I am working on getting my development system working with archive logging so I can use the pitr features of Postgres.  However, I am finding that the actual archiving is not keeping up with the generation of the logs.  I am now over 500 wal archive files behind (8+ GB).

What controls how often the archive_command is called?

The actual running of my archive script is taking 1-2 seconds, so I'm not sure why we are so far behind.

Thanks for any advise,

Chris

PG 8.1.3
RH AS 4

Re: Why so long between archive calls?

От
Andy Shellam
Дата:
Chris,

I take it you've checked that the archive_command is actually succeeding
(check the postgresql log file.)
Perhaps try running it manually with an early log file as the PostgreSQL
user account?

When you say you're 500 files behind, how old is the earliest log?

Andy.

Chris Hoover wrote:
> I am working on getting my development system working with archive
> logging so I can use the pitr features of Postgres.  However, I am
> finding that the actual archiving is not keeping up with the
> generation of the logs.  I am now over 500 wal archive files behind
> (8+ GB).
>
> What controls how often the archive_command is called?
>
> The actual running of my archive script is taking 1-2 seconds, so I'm
> not sure why we are so far behind.
>
> Thanks for any advise,
>
> Chris
>
> PG 8.1.3
> RH AS 4
> !DSPAM:37,44f45f03143293039237985!


Re: Why so long between archive calls?

От
"Chris Hoover"
Дата:
The oldest log file that is not archived is:
-rw-------  1 postgres postgres 16777216 Aug 28 15:24 00000001000001930000005A

The command I am using is running successfully and I am getting good entries in my log files.
,15282,,2006-08-29 11:07:04.443 EDT,,13552,,,,LOG:  archived transaction log file "00000001000001930000004E"
,15282,,2006-08-29 11:22:54.982 EDT,,13553,,,,LOG:  archived transaction log file "00000001000001930000004F"
,15282,,2006-08-29 11:24:40.126 EDT,,13554,,,,LOG:  archived transaction log file "000000010000019300000050"
,15282,,2006-08-29 11:32:04.650 EDT,,13555,,,,LOG:  archived transaction log file "000000010000019300000051"
,15282,,2006-08-29 11:33:30.900 EDT,,13556,,,,LOG:  archived transaction log file "000000010000019300000052"
,15282,,2006-08-29 11:37:21.333 EDT,,13557,,,,LOG:  archived transaction log file "000000010000019300000053"
,15282,,2006-08-29 11:44:50.792 EDT,,13558,,,,LOG:  archived transaction log file "000000010000019300000054"
,15282,,2006-08-29 12:24:54.566 EDT,,13559,,,,LOG:  archived transaction log file "000000010000019300000055"
,15282,,2006-08-29 12:25:08.530 EDT,,13560,,,,LOG:  archived transaction log file "000000010000019300000056"
,15282,,2006-08-29 12:25:10.754 EDT,,13561,,,,LOG:  archived transaction log file "000000010000019300000057"
,15282,,2006-08-29 12:25:14.619 EDT,,13562,,,,LOG:  archived transaction log file "000000010000019300000058"

So, it looks like everything is good, just the archive command is not getting called enough.

Thanks,

Chris

On 8/29/06, Andy Shellam <andy.shellam-lists@mailnetwork.co.uk > wrote:
Chris,

I take it you've checked that the archive_command is actually succeeding
(check the postgresql log file.)
Perhaps try running it manually with an early log file as the PostgreSQL
user account?

When you say you're 500 files behind, how old is the earliest log?

Andy.

Chris Hoover wrote:
> I am working on getting my development system working with archive
> logging so I can use the pitr features of Postgres.  However, I am
> finding that the actual archiving is not keeping up with the
> generation of the logs.  I am now over 500 wal archive files behind
> (8+ GB).
>
> What controls how often the archive_command is called?
>
> The actual running of my archive script is taking 1-2 seconds, so I'm
> not sure why we are so far behind.
>
> Thanks for any advise,
>
> Chris
>
> PG 8.1.3
> RH AS 4
> !DSPAM:37,44f45f03143293039237985!


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: Why so long between archive calls?

От
"Chris Hoover"
Дата:
Any other ideas?  I am still 508 archive log files behind. 

Here is my postgresql.conf

listen_addresses = '*'
port = 50001
max_connections = 256
shared_buffers = 32768 
max_prepared_transactions = 256        
work_mem = 4096                
maintenance_work_mem = 1048576         
max_fsm_pages = 8000000                
max_fsm_relations = 10000              
vacuum_cost_delay = 1000                       
vacuum_cost_page_hit = 1               
vacuum_cost_page_miss = 10             
vacuum_cost_page_dirty = 20            
vacuum_cost_limit = 8000                       
wal_buffers = 64                       
checkpoint_segments = 256              
checkpoint_timeout = 1800              
archive_command = '/home/postgres/bin/archive_pg_xlog.sh %p %f 50001'  
effective_cache_size = 383490          
random_page_cost = 2                   
default_statistics_target = 100        
constraint_exclusion = on
redirect_stderr = on 
log_directory = 'pg_log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 1048576
log_min_messages = debug2
log_min_duration_statement = 0
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%d,%p,%u,%m,%c,%l,%s,%x,%i,'
log_statement = 'all'             
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = on                
autovacuum_naptime = 60        
autovacuum_vacuum_threshold = 1000     
autovacuum_analyze_threshold = 500     
autovacuum_vacuum_scale_factor = 0.001 
autovacuum_analyze_scale_factor = 0.0005       
autovacuum_vacuum_cost_delay = -1      
autovacuum_vacuum_cost_limit = -1      
statement_timeout = 0
lc_messages = 'C'     
lc_monetary = 'C'           
lc_numeric = 'C' 
lc_time = 'C'       
add_missing_from = on


On 8/29/06, Chris Hoover <revoohc@gmail.com> wrote:
The oldest log file that is not archived is:
-rw-------  1 postgres postgres 16777216 Aug 28 15:24 00000001000001930000005A

The command I am using is running successfully and I am getting good entries in my log files.
,15282,,2006-08-29 11:07:04.443 EDT,,13552,,,,LOG:  archived transaction log file "00000001000001930000004E"
,15282,,2006-08-29 11:22:54.982 EDT,,13553,,,,LOG:  archived transaction log file "00000001000001930000004F"
,15282,,2006-08-29 11:24:40.126 EDT,,13554,,,,LOG:  archived transaction log file "000000010000019300000050"
,15282,,2006-08-29 11:32:04.650 EDT,,13555,,,,LOG:  archived transaction log file "000000010000019300000051"
,15282,,2006-08-29 11:33:30.900 EDT,,13556,,,,LOG:  archived transaction log file "000000010000019300000052"
,15282,,2006-08-29 11:37:21.333 EDT,,13557,,,,LOG:  archived transaction log file "000000010000019300000053"
,15282,,2006-08-29 11:44:50.792 EDT,,13558,,,,LOG:  archived transaction log file "000000010000019300000054"
,15282,,2006-08-29 12:24:54.566 EDT,,13559,,,,LOG:  archived transaction log file "000000010000019300000055"
,15282,,2006-08-29 12:25:08.530 EDT,,13560,,,,LOG:  archived transaction log file "000000010000019300000056"
,15282,,2006-08-29 12:25:10.754 EDT,,13561,,,,LOG:  archived transaction log file "000000010000019300000057"
,15282,,2006-08-29 12:25:14.619 EDT,,13562,,,,LOG:  archived transaction log file "000000010000019300000058"

So, it looks like everything is good, just the archive command is not getting called enough.

Thanks,

Chris


On 8/29/06, Andy Shellam < andy.shellam-lists@mailnetwork.co.uk > wrote:
Chris,

I take it you've checked that the archive_command is actually succeeding
(check the postgresql log file.)
Perhaps try running it manually with an early log file as the PostgreSQL
user account?

When you say you're 500 files behind, how old is the earliest log?

Andy.

Chris Hoover wrote:
> I am working on getting my development system working with archive
> logging so I can use the pitr features of Postgres.  However, I am
> finding that the actual archiving is not keeping up with the
> generation of the logs.  I am now over 500 wal archive files behind
> (8+ GB).
>
> What controls how often the archive_command is called?
>
> The actual running of my archive script is taking 1-2 seconds, so I'm
> not sure why we are so far behind.
>
> Thanks for any advise,
>
> Chris
>
> PG 8.1.3
> RH AS 4
> !DSPAM:37,44f45f03143293039237985!


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Re: Why so long between archive calls?

От
Tom Lane
Дата:
"Chris Hoover" <revoohc@gmail.com> writes:
> Any other ideas?  I am still 508 archive log files behind.

Try "strace -f -p <pid>" on the archiver subprocess to see what it's
doing and what the launched scripts are doing.

            regards, tom lane

Re: Why so long between archive calls?

От
"Chris Hoover"
Дата:
Tom,

I just ran this for several minutes and have about an 11MB trace file.  Is there anything particular you want to see? 

Also, is the data in the file deciferable or have actual database data in it?  The reason I ask, is that I work under HIPPA regulations, so I can't release anything if it has potential personal/patient data in it.

Chris

On 8/31/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Chris Hoover" <revoohc@gmail.com> writes:
> Any other ideas?  I am still 508 archive log files behind.

Try "strace -f -p <pid>" on the archiver subprocess to see what it's
doing and what the launched scripts are doing.

                        regards, tom lane

Re: Why so long between archive calls?

От
Tom Lane
Дата:
"Chris Hoover" <revoohc@gmail.com> writes:
> I just ran this for several minutes and have about an 11MB trace file.  Is
> there anything particular you want to see?

> Also, is the data in the file deciferable or have actual database data in
> it?  The reason I ask, is that I work under HIPPA regulations, so I can't
> release anything if it has potential personal/patient data in it.

Yeah, you probably want to strip out the data portion of any read() or
write() calls shown in the trace.

BTW, I forgot to ask for timestamp information in the trace --- if you
didn't use -t please do it over.  It might be possible to get strace
to skip the read/write data by itself so you don't have to do it
manually, but I don't see anything obvious in its help output.

            regards, tom lane

Re: Why so long between archive calls?

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> "Chris Hoover" <revoohc@gmail.com> writes:
> > I just ran this for several minutes and have about an 11MB trace file.  Is
> > there anything particular you want to see?
>
> > Also, is the data in the file deciferable or have actual database data in
> > it?  The reason I ask, is that I work under HIPPA regulations, so I can't
> > release anything if it has potential personal/patient data in it.
>
> Yeah, you probably want to strip out the data portion of any read() or
> write() calls shown in the trace.
>
> BTW, I forgot to ask for timestamp information in the trace --- if you
> didn't use -t please do it over.  It might be possible to get strace
> to skip the read/write data by itself so you don't have to do it
> manually, but I don't see anything obvious in its help output.

strace -s0

That'll cut any strings though, not only for read/writes.  You'll still
get file names.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Why so long between archive calls?

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> strace -s0

> That'll cut any strings though, not only for read/writes.  You'll still
> get file names.

Oh, that would work then.  I saw that but assumed it would trim file
names too, which would be a problem here.

            regards, tom lane

Setting CLIENT_MIN_MESSAGES dynamically...

От
"Eamonn Kent"
Дата:

 

Hi,

 

My config file has the following setting:

log_min_messages = fatal

 

This setting assures that postgres logging is minimal.  However, I would like to change this setting dynamically – while our embedded application is running, without restarting postgres.  For example, in the case where the application appears to be running sluggishly, I would like to change this setting to debug1 so that I can collect detailed information.  The problem is the SET command applies only to the current session:

set log_min_messages='debug1';

 

Is there a way to accomplish this? 

 

Currently, we  have postgres generate a lot of debug messages (debug1) and then have syslog filter the messages as required.  This works since I can edit the syslog.conf and restart syslog independent of our application.  This is a less than ideal situation since we normally don’t want postgres producing lots of log data (wasting cpu)  since we require it only when doing debugging.

 

Thanks

 

Ike

 

Re: Why so long between archive calls?

От
"Chris Hoover"
Дата:
Tom,

Did you get my reply to this message with my data?  I have not seen it come onto the list.  I sent it out yesterday.

Chris

On 9/1/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera < alvherre@commandprompt.com> writes:
> strace -s0

> That'll cut any strings though, not only for read/writes.  You'll still
> get file names.

Oh, that would work then.  I saw that but assumed it would trim file
names too, which would be a problem here.

                        regards, tom lane

Re: Why so long between archive calls?

От
Alvaro Herrera
Дата:
Chris Hoover wrote:
> Tom,
>
> Did you get my reply to this message with my data?  I have not seen it come
> onto the list.  I sent it out yesterday.

I got it at least (but then, I'm on Cc).  Not sure if the list received it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Why so long between archive calls?

От
Tom Lane
Дата:
"Chris Hoover" <revoohc@gmail.com> writes:
> Right now, I have 510 log files waiting to be archived totaling 8GB.
> ...
> Why is the server so far behind?

Good question.  The trace shows the archiver scanning pg_xlog/archive_status/
and apparently not finding any .ready files.  What do you see in that
directory?

            regards, tom lane

Re: Why so long between archive calls?

От
"Chris Hoover"
Дата:
Currenty, there are no .ready files in the pg_xlog/archive_status file.

Like I mentioned, the db server was stoped and restarted on 9/5 am, but the oldest unarchived log file is:
-rw-------  1 postgres postgres 16M Sep  1 04:04 0000000100000197000000F1

I'm not sure what is going on, since the db was stopped and restarted, there should be no hanging transactions.  I would have believed all of the archive files from before the reboot would have been archived by now.

Let me know if I can provide you with further information.  This is my development/non-production db server, so I do have a little bit of lattitude on what I do with it.

Chris


On 9/6/06, Tom Lane < tgl@sss.pgh.pa.us> wrote:
"Chris Hoover" < revoohc@gmail.com> writes:
> Right now, I have 510 log files waiting to be archived totaling 8GB.
> ...
> Why is the server so far behind?

Good question.  The trace shows the archiver scanning pg_xlog/archive_status/
and apparently not finding any .ready files.  What do you see in that
directory?

                        regards, tom lane

Re: Why so long between archive calls?

От
Tom Lane
Дата:
"Chris Hoover" <revoohc@gmail.com> writes:
> Currenty, there are no .ready files in the pg_xlog/archive_status file.

Well, that explains why the archiver thinks it has nothing to do.

> Like I mentioned, the db server was stoped and restarted on 9/5 am, but the
> oldest unarchived log file is:
> -rw-------  1 postgres postgres 16M Sep  1 04:04 0000000100000197000000F1

Are you sure that's an unarchived file, and not a file waiting to be
used later?  What's the current WAL endpoint address?  (pg_controldata
would tell you ... but your trace showed 000000010000019700000078 being
archived, which would suggest that this one is just waiting to be reused.)
Have you got a particularly large checkpoint_segments setting?

            regards, tom lane

Re: Why so long between archive calls?

От
"Chris Hoover"
Дата:
Tom,

I am able to tell how far behind the archiving is running since every minute, I copy the current archive_logs to a backup directory.  This is allowing me to be able to do an up to the minute pitr if required.  The archive_command removes the file from this backup dir when the file is archived by PostgreSQL.

Here my script for that:
#!/bin/bash

## archive the current log file every minute to make sure we can recover within 1 minute

###set -vx

source ~postgres/etc/postgres.profile

date >~/log
if [ -z $1 ]
then

  echo "archive_current_pg_xlog.sh <port>"
  exit

fi

if [ -f /tmp/archive_log ]
then

  echo "archive is already running - exiting"
  echo "`/bin/hostname` - archive is running long" | mail -s "`/bin/hostname` problem" $DBAPAGER $DBAMAIL
  exit
fi

touch /tmp/archive_log

serverPort=$1
serverName=`/bin/hostname`

currentArchiveDir=`echo ${archiveBaseDir}/${serverName}/${serverPort}/current_log`

echo "currentArchiveDir: "$currentArchiveDir >> ~/log
mkdir -p $currentArchiveDir

find $DB_PATH/50001/pg_xlog/ -cmin 1 -type f | grep -v "archive_status" > ~/cp_list
ls $DB_PATH/50001/pg_xlog/archive_status/*.done | while read FILE
do
  FILE2=`echo $FILE|sed 's/\.done//'`
  cat ~/cp_list | grep -v $FILE2 > ~/cp_list2
  mv ~/cp_list2 ~/cp_list
  rm $currentArchiveDir/$FILE2 2>/dev/null
done

cat ~/cp_list | while read FILE
do

  echo "archiving $FILE" >> ~/log
  cp $FILE ${currentArchiveDir}/.

done

find ${archiveBaseDir}/${serverName}/${serverPort}/pg_xlog_arch/ -type f -mmin -5 |
  sed 's/.*pg_xlog_arch\///' |
  while read FILE
    do
      rm -v ${currentArchiveDir}/$FILE >> log 2>/dev/null
    done

rm /tmp/archive_log

date >> ~/log

--------------------

Here is my postgresql.conf

listen_addresses = '*'
port = 50001
max_connections = 256
shared_buffers = 32768 
max_prepared_transactions = 256        
work_mem = 4096                
maintenance_work_mem = 1048576         
max_fsm_pages = 8000000                
max_fsm_relations = 10000              
vacuum_cost_delay = 1000                       
vacuum_cost_page_hit = 1               
vacuum_cost_page_miss = 10             
vacuum_cost_page_dirty = 20            
vacuum_cost_limit = 8000                       
wal_buffers = 64                       
checkpoint_segments = 256              
checkpoint_timeout = 1800              
archive_command = '/home/postgres/bin/archive_pg
_xlog.sh %p %f 50001'  
effective_cache_size = 383490          
random_page_cost = 2                   
default_statistics_target = 100        
constraint_exclusion = on
redirect_stderr = on 
log_directory = 'pg_log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 1048576
log_min_messages = debug2
log_min_duration_statement = 0
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%d,%p,%u,%m,%c,%l,%s,%x,%i,'
log_statement = 'all'             
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = on                
autovacuum_naptime = 60        
autovacuum_vacuum_threshold = 1000     
autovacuum_analyze_threshold = 500     
autovacuum_vacuum_scale_factor = 0.001 
autovacuum_analyze_scale_factor = 0.0005       
autovacuum_vacuum_cost_delay = -1      
autovacuum_vacuum_cost_limit = -1      
statement_timeout = 0
lc_messages = 'C'     
lc_monetary = 'C'           
lc_numeric = 'C' 
lc_time = 'C'       
add_missing_from = on


Chris

On 9/6/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Chris Hoover" <revoohc@gmail.com> writes:
> Currenty, there are no .ready files in the pg_xlog/archive_status file.

Well, that explains why the archiver thinks it has nothing to do.

> Like I mentioned, the db server was stoped and restarted on 9/5 am, but the
> oldest unarchived log file is:
> -rw-------  1 postgres postgres 16M Sep  1 04:04 0000000100000197000000F1

Are you sure that's an unarchived file, and not a file waiting to be
used later?  What's the current WAL endpoint address?  (pg_controldata
would tell you ... but your trace showed 000000010000019700000078 being
archived, which would suggest that this one is just waiting to be reused.)
Have you got a particularly large checkpoint_segments setting?

                        regards, tom lane

Re: Why so long between archive calls?

От
Tom Lane
Дата:
"Chris Hoover" <revoohc@gmail.com> writes:
> I am able to tell how far behind the archiving is running since every
> minute, I copy the current archive_logs to a backup directory.  This is
> allowing me to be able to do an up to the minute pitr if required.  The
> archive_command removes the file from this backup dir when the file is
> archived by PostgreSQL.

> Here my script for that:

It looks to me like your script does not understand the difference
between xlog files that have been used and xlog files that haven't been
used yet.  Since you've got checkpoint_segments = 256, the xlog code
will allow as many as 513 (2*checkpoint_segments+1) not-yet-used xlog
files to be allocated.

The only simple way to tell where the end of WAL is at the moment is to
look for the xlog file with the latest mod time.  (There'll be more
support for this in 8.2.)  Any files with names numerically larger than
that one are just sitting there waiting to be used, they are not
interesting for archiving purposes.

            regards, tom lane