Обсуждение: FW: Setting up of PITR system.

Поиск
Список
Период
Сортировка

FW: Setting up of PITR system.

От
"Andy Shellam"
Дата:

we want to remove archived WAL log files which are no longer needed
so that the size of wal_archive folder is under control.

The DOCs say that filenames numerically less than the WAL record that pg_stop_backup()
suggests can be removed. Will an alphabetical sorting be different from numerically sorted
filename ?    Sorting numerically is not easy as the filenames look like HEX numbers
ut they are are so huge that they cant be stored in normal integers (32bits).

The WAL files are all numerical – however the backup file has the checkpoint appended to it – eg. 00009012514000916.A0AC91.backup.

You need to examine the contents of this file to find the earliest file needed (it’s usually the one just before it – i.e.  00009012514000916 in this case) and any numerically later than the .backup file.

 

Also you wouldn’t particular need any heavy programming – I’m sure a simple shell script could be written in bash to pick out the correct files.



Q2. We are attempting to automate the process of taking base backup and removal
     of the unneeded WAL files. Is there any reliable way of knowing the WAL file X from
     inside the shell script , such that files prior to X can be removed ?

      from a shell script we issue pg_stop_backup() by psql -c , it returns something
      which does not looks like a WAL file.

 

Pg_stop_backup() returns the checkpoint record – something like 9/A0AC91 – this is purely a guess, but you might be able to find which backup file contains this checkpoint by taking the digits after the forward-slash in the checkpoint (i.e. A0AC91 in this case), then finding the filename that contains this – in my example it’s 00009012514000916.A0AC91.backup, and grep the file for the number after the text “Start WAL Location: “ in this file – then remove anything numerically less.

 

It’d be a very “bitty” process, but I’m certain it could be done – it would need heavy testing over a period of backups though to ensure the wrong files are not being deleted.



Q3. tar exits with non zero status for the same reasons as mentioned in docs
      is there any better archiving tool for this purpose ? can we use cp -a ?

 

The WAL archive command can be set to use either cp or mv – then why not have a regular cronjob that runs a shell script to add any new files to your tar archive every time a new file is detected in your WAL-archive directory?



Regds
Rajesh Kumar Mallah.
  
!DSPAM:14,44293ed135042000516834!


Regards

 

Andy Shellam

Re: FW: Setting up of PITR system.

От
Tom Lane
Дата:
"Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes:
> The DOCs say that filenames numerically less than the WAL record that
> pg_stop_backup() suggests can be removed. Will an alphabetical sorting
> be different from numerically sorted filename ?

If you're worried about that, try "LANG=C ls ..." to make sure the
sorting is done in C locale.  I don't know of any locales that would
sort hex numbers differently from C, but maybe there are some.

            regards, tom lane

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
Hi,

Thanks everyone for the reply,

is it reasonable/advisable to start with the output of
pg_stop_backup() in a shell script to find the WAL
file being discussed, ie the FILE which can be used
as a reference for removing the older files ?

secondly , i was asking about the tool for copying the
database directory not the one for archiving the logs.
when i use tar , it gives warning that the file changed
while it was reading the file, i need a tool that does not
give warning for this kind of activity.

Regds
Rajesh Kumar Mallah.

On 3/28/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes:
> > The DOCs say that filenames numerically less than the WAL record that
> > pg_stop_backup() suggests can be removed. Will an alphabetical sorting
> > be different from numerically sorted filename ?
>
> If you're worried about that, try "LANG=C ls ..." to make sure the
> sorting is done in C locale.  I don't know of any locales that would
> sort hex numbers differently from C, but maybe there are some.
>
>                         regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

Re: FW: Setting up of PITR system.

От
"Andy Shellam"
Дата:
If pg_start_backup is used correctly, no data pages will be written to disk
until pg_stop_backup is called, so this shouldn't be an issue - HOWEVER -
check that you're excluding your pg_xlog directory (within your database
directory) from the tar backup as these will be changing.

Whatever archive utility you're using, if a file is being changed at the
time of reading, it won't be archived correctly.

Andy

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Rajesh Kumar Mallah
Sent: Tuesday, 28 March, 2006 5:27 PM
To: Tom Lane
Cc: andy.shellam@mailnetwork.co.uk; pgsql-admin@postgresql.org
Subject: Re: FW: [ADMIN] Setting up of PITR system.

---

secondly , i was asking about the tool for copying the
database directory not the one for archiving the logs.
when i use tar , it gives warning that the file changed
while it was reading the file, i need a tool that does not
give warning for this kind of activity.

---



Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 3/28/06, Andy Shellam <andy.shellam@mailnetwork.co.uk> wrote:
>
>
>
> we want to remove archived WAL log files which are no longer needed
>  so that the size of wal_archive folder is under control.
>
>  The DOCs say that filenames numerically less than the WAL record that
> pg_stop_backup()
>  suggests can be removed. Will an alphabetical sorting be different from
> numerically sorted
>  filename ?    Sorting numerically is not easy as the filenames look like
> HEX numbers
>  ut they are are so huge that they cant be stored in normal integers
> (32bits).
>
> The WAL files are all numerical – however the backup file has the checkpoint
> appended to it – eg. 00009012514000916.A0AC91.backup.
>
> You need to examine the contents of this file to find the earliest file
> needed (it's usually the one just before it – i.e.  00009012514000916 in
> this case) and any numerically later than the .backup file.
>
>
>
> Also you wouldn't particular need any heavy programming – I'm sure a simple
> shell script could be written in bash to pick out the correct files.

OK i am posting my full script [ its not heavy programming i guess :) ]
shall be grateful if  you/someone could review it . (its well commented i think)
script also carries sample data.

it does following
1. takes base backup to a destined folder by rsync
2. waits for .backup file to arrive in archive folder
   after pg_stop_bacup()
3. searches and removes unwanted archived log files.

I have run it many times in my server and it seems to
be working fine.


------------------------------------ BEGIN
--------------------------------------------
#!/bin/bash

# folder where base_backup is put
BACKUPFOLDER=/mnt/disk3/base_backups
today=`date +%d-%m-%Y-%H-%M-%S`
PSQL=/opt/usr/local/pgsql/bin/psql
RSYNC="/usr/bin/rsync  -a"
PGDATADIR=/mnt/disk5/pgdatadir

# two table spaces.

TS1=/mnt/disk4/bigtables
TS2=/mnt/disk3/indexspace

# folder where *archived* logs are put.
WAL_ARCHIVE=/mnt/wal_archive

label=base_backup_${today}

echo "Executing pg_start_backup with label $label in server ... "

# get the checkpoint at which backup starts
# the .backup files seems to be bearing this string in it.

CP=`$PSQL -q -Upostgres -d template1 -c "SELECT
pg_start_backup('$label');" -P tuples_only -P format=unaligned`

echo "Begin CheckPoint is $CP" # this contain string like A/681D1214

if [ $? -ne 0 ]
        then
        echo "PSQL pg_start_backup failed"
        exit 1;
fi
echo "pg_start_backup executed successfully"

echo "RSYNC begins.."

# rsync each of the folders to the backup folder.
for i in $TS1 $TS2 $PGDATADIR ;
do
        echo "Syncing $i .. "
        time $RSYNC $i $BACKUPFOLDER
        echo "Done"
done


# fortunately rsync does *not* seems to be exitting with non zero exit code
# for expected file disappearances and modifications.
if [ $? -ne 0 ]
        then
        echo "RSYNC failed"
        exit 1;
fi

echo "RSYNC Done successfully"

echo "Executing pg_stop_backup in server ... "
$PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
if [ $? -ne 0 ]
        then
        echo "PSQL pg_stop_backup failed"
        exit 1;
fi
echo "pg_stop_backup done successfully"
TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214

# now remove the unneeded files.

# strip off first 4 chars from CP and prefix 00 to the result.
# search the file that has the Checkpoint in its filename.
# it takes a while to come, so wait till it comes.

while true; do
        REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH`
        if [ ! $REF_FILE ]; then
                echo  "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE ... "
        else
                break
        fi
        sleep 1
done

# REF_FILE is 000000010000000A00000068.001D1214.backup.bz2

# take only first 24 chars and store.
REF_FILE_NUM=${REF_FILE:0:24}

# REF_FILE_NUM is 000000010000000A00000068

echo "REF_FILE_NUM=$REF_FILE_NUM"

# iterate list of files in the WAL_ARCHIVE folder
for i in `ls -1 $WAL_ARCHIVE` ;
do
        # $i is :000000010000000A0000005D.bz2 eg
        # get first 24 chars in filename
        FILE_NUM=${i:0:24}

        # compare if the number is less than the reference
        # here string comparison is being used.
        if [[ $FILE_NUM  < $REF_FILE_NUM ]]
        then
                echo "$FILE_NUM [ $i ] removed"
                rm -f $WAL_ARCHIVE/$i
        else
                echo "$FILE_NUM [ $i ] not removed"
        fi
done
------------------------------------ END
-----------------------------------------------

---------------------------------- REAL OUTPUT OF A RUN-----------------------
Executing pg_start_backup with label base_backup_29-03-2006-17-29-01
in server ...
Begin CheckPoint is A/681D1214
pg_start_backup executed successfully
RSYNC begins..
Syncing /mnt/disk4/bigtables ..

real    6m24.338s
user    1m12.831s
sys     0m55.295s
Done
Syncing /mnt/disk3/indexspace ..

real    5m45.245s
user    0m32.520s
sys     0m26.567s
Done
Syncing /mnt/disk5/pgdatadir ..
readlink pgdatadir/global/pgstat.tmp: No such file or directory
rsync error: some files could not be transferred (code 23) at main.c(620)

real    12m1.844s
user    1m43.698s
sys     1m24.486s
Done
RSYNC Done successfully
Executing pg_stop_backup in server ...
 pg_stop_backup
----------------
 A/6FA82D74
(1 row)

pg_stop_backup done successfully
Waitng for file with 001D1214 in /mnt/wal_archive ...
REF_FILE_NUM=000000010000000A00000068
000000010000000A0000005D [
000000010000000A0000005D.003C0B54.backup.bz2 ] removed
000000010000000A0000005D [ 000000010000000A0000005D.bz2 ] removed
000000010000000A0000005E [ 000000010000000A0000005E.bz2 ] removed
000000010000000A0000005F [ 000000010000000A0000005F.bz2 ] removed
000000010000000A00000060 [ 000000010000000A00000060.bz2 ] removed
000000010000000A00000061 [ 000000010000000A00000061.bz2 ] removed
000000010000000A00000062 [ 000000010000000A00000062.bz2 ] removed
000000010000000A00000063 [ 000000010000000A00000063.bz2 ] removed
000000010000000A00000064 [ 000000010000000A00000064.bz2 ] removed
000000010000000A00000065 [ 000000010000000A00000065.bz2 ] removed
000000010000000A00000066 [ 000000010000000A00000066.bz2 ] removed
000000010000000A00000067 [ 000000010000000A00000067.bz2 ] removed
000000010000000A00000068 [
000000010000000A00000068.001D1214.backup.bz2 ] not removed
000000010000000A00000068 [ 000000010000000A00000068.bz2 ] not removed
000000010000000A00000069 [ 000000010000000A00000069.bz2 ] not removed
000000010000000A0000006A [ 000000010000000A0000006A.bz2 ] not removed
000000010000000A0000006B [ 000000010000000A0000006B.bz2 ] not removed
000000010000000A0000006C [ 000000010000000A0000006C.bz2 ] not removed
000000010000000A0000006D [ 000000010000000A0000006D.bz2 ] not removed
000000010000000A0000006E [ 000000010000000A0000006E.bz2 ] not removed

-------------------------------
END-----------------------------------------------------


>
>
>  Q2. We are attempting to automate the process of taking base backup and
> removal
>       of the unneeded WAL files. Is there any reliable way of knowing the
> WAL file X from
>       inside the shell script , such that files prior to X can be removed ?
>
>        from a shell script we issue pg_stop_backup() by psql -c , it returns
> something
>        which does not looks like a WAL file.
>
>
>
> Pg_stop_backup() returns the checkpoint record – something like 9/A0AC91 –
> this is purely a guess, but you might be able to find which backup file
> contains this checkpoint by taking the digits after the forward-slash in the
> checkpoint (i.e. A0AC91 in this case), then finding the filename that
> contains this – in my example it's 00009012514000916.A0AC91.backup,

I think its not correct , pg_start_backup() returns the checkpoint record
which becomes part of the backup filename.



 and grep
> the file for the number after the text "Start WAL Location: " in this file –
> then remove anything numerically less.
>
>
>
> It'd be a very "bitty" process, but I'm certain it could be done – it would
> need heavy testing over a period of backups though to ensure the wrong files
> are not being deleted.
>
>
>
>  Q3. tar exits with non zero status for the same reasons as mentioned in
> docs
>        is there any better archiving tool for this purpose ? can we use cp
> -a ?
>
>
>
> The WAL archive command can be set to use either cp or mv – then why not
> have a regular cronjob that runs a shell script to add any new files to your
> tar archive every time a new file is detected in your WAL-archive directory?

Hmmm you got me wrong, am talking about taking the base_backup not
archival of WAL segments

>
>
>
>  Regds
>  Rajesh Kumar Mallah.
>
>  !DSPAM:14,44293ed135042000516834!
>
>
>  Regards
>
>
>
> Andy Shellam

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 3/29/06, Andy Shellam <andy.shellam@mailnetwork.co.uk> wrote:
> If pg_start_backup is used correctly, no data pages will be written to disk
> until pg_stop_backup is called, so this shouldn't be an issue - HOWEVER -
> check that you're excluding your pg_xlog directory (within your database
> directory) from the tar backup as these will be changing.
>


> Whatever archive utility you're using, if a file is being changed at the
> time of reading, it won't be archived correctly.

??????  What you are saying is not in sync with docs

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
23.3.2. Making a Base Backup
The procedure for making a base backup is relatively simple:

3. Perform the backup, using any convenient file-system-backup tool
such as tar or cpio. It is neither necessary nor desirable to stop
normal operation of the database while you do this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
I used rsync like many others , it seems to work fine for me.

Regds
mallah.




>
> Andy
>
> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Rajesh Kumar Mallah
> Sent: Tuesday, 28 March, 2006 5:27 PM
> To: Tom Lane
> Cc: andy.shellam@mailnetwork.co.uk; pgsql-admin@postgresql.org
> Subject: Re: FW: [ADMIN] Setting up of PITR system.
>
> ---
>
> secondly , i was asking about the tool for copying the
> database directory not the one for archiving the logs.
> when i use tar , it gives warning that the file changed
> while it was reading the file, i need a tool that does not
> give warning for this kind of activity.
>
> ---
>
>
>

Re: FW: Setting up of PITR system.

От
Tom Lane
Дата:
"Andy Shellam" <andy.shellam@mailnetwork.co.uk> writes:
> If pg_start_backup is used correctly, no data pages will be written to disk
> until pg_stop_backup is called, so this shouldn't be an issue

That's incorrect, the database will continue to operate normally.  The
only thing pg_start_backup really does is lay down a marker so you can
tell how far back you need to save WAL files in order to do a PITR
restore using your backup.

The fact that the backup tool might get an inconsistent copy of some
files is not an issue, because replay of the WAL files against the
copied database will fix any inconsistencies.

It definitely is a pain in the neck that GNU tar complains about files
changing underneath it --- I've looked for a way to disable that, or at
least reduce it to a warning instead of an error condition, but gtar
doesn't seem to have such a switch.  You should try alternative backup
tools such as cpio or rsync.

            regards, tom lane

Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Rajesh Kumar Mallah wrote:
|
| OK i am posting my full script [ its not heavy programming i guess :) ]
| shall be grateful if  you/someone could review it . (its well
commented i think)
| script also carries sample data.
|
| it does following
| 1. takes base backup to a destined folder by rsync
| 2. waits for .backup file to arrive in archive folder
|    after pg_stop_bacup()
| 3. searches and removes unwanted archived log files.
|
| I have run it many times in my server and it seems to
| be working fine.

Hello, Rajesh.

Just a couple of comments on the script itself, not what it actually
does - I never tried WAL archiving before, so I can't comment on that. I
inserted the comments at relevant points in the script. I'm sorting them
into three categories, one is just improvements in style, the other is
optimization and the third is correction of an error.

| ------------------------------------ BEGIN
| --------------------------------------------
| #!/bin/bash
|
| # folder where base_backup is put
| BACKUPFOLDER=/mnt/disk3/base_backups
| today=`date +%d-%m-%Y-%H-%M-%S`
| PSQL=/opt/usr/local/pgsql/bin/psql
| RSYNC="/usr/bin/rsync  -a"
| PGDATADIR=/mnt/disk5/pgdatadir
|
| # two table spaces.
|
| TS1=/mnt/disk4/bigtables
| TS2=/mnt/disk3/indexspace

(optimization) Since you're using bash, you can use arrays. This could
be better written as

~  TS[0]=/mnt/disk5/pgdatadir
~  TS[1]=/mnt/disk4/bigtables
~  TS[2]=/mnt/disk3/indexspace

or even

~  TS=(/mnt/disk5/pgdatadir \
~      /mnt/disk4/bigtables \
~      /mnt/disk3/indexspace)

That way, you can add tablespaces at will and just use a while loop to
back them up, which greatly simplifies adding new tablespaces or moving
the script somewhere else. See below for how to implement that.

| # folder where *archived* logs are put.
| WAL_ARCHIVE=/mnt/wal_archive
|
| label=base_backup_${today}
|
| echo "Executing pg_start_backup with label $label in server ... "
|
| # get the checkpoint at which backup starts
| # the .backup files seems to be bearing this string in it.
|
| CP=`$PSQL -q -Upostgres -d template1 -c "SELECT
| pg_start_backup('$label');" -P tuples_only -P format=unaligned`
|
| echo "Begin CheckPoint is $CP" # this contain string like A/681D1214
|
| if [ $? -ne 0 ]
|         then
|         echo "PSQL pg_start_backup failed"
|         exit 1;
| fi
| echo "pg_start_backup executed successfully"

(style) If you want to capture any error messages pg_start_backup may
have caused and store them into ${CP}, you should add 2>&1 at the end of
the psql invocation, see below snippet.

(error) Checking for exit status of pg_start_backup using $? at this
point will never report an error, as you've used echo prior to checking
what pg_start_backup returned. You should either move the echo below the
if statement (by adding an "else" clause) or store the exit status of
pg_start_backup into RVAL like this:

~  CP="`$PSQL ... 2>&1`"
~  RVAL=$?
~  echo "Begin CheckPoint says: ${CP}"
~  if [ ${RVAL} -ne 0 ]; then
~    ...
~  fi

| echo "RSYNC begins.."
|
| # rsync each of the folders to the backup folder.
| for i in $TS1 $TS2 $PGDATADIR ;
| do
|         echo "Syncing $i .. "
|         time $RSYNC $i $BACKUPFOLDER
|         echo "Done"
| done

(optimization) If you store locations into an array, you could rewrite
this as follows:

~  CTR=0
~  while [ -n "${TS[${CTR}]}" ]; do
~    echo "Syncing ${TS[${CTR}]}..."
~    time ${RSYNC} ${TS[${CTR}]} ${BACKUPFOLDER}
~    RVAL=$?
~    echo "Sync finished with exit status ${RVAL}"
~    if [ ${RVAL} -ne 0 ]; then
~      <handle errors>
~    fi
~    CTR=$((CTR + 1))
~  done
~  unset CTR

| # fortunately rsync does *not* seems to be exitting with non zero exit
code
| # for expected file disappearances and modifications.
| if [ $? -ne 0 ]
|         then
|         echo "RSYNC failed"
|         exit 1;
| fi

(error) Same error as above - what you're checking here is whether the
last command in the last for loop run was successful, and this is always
going to be true as echoing to stdout will never fail until stdout is
closed for some reason.

| echo "RSYNC Done successfully"
|
| echo "Executing pg_stop_backup in server ... "
| $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
| if [ $? -ne 0 ]
|         then
|         echo "PSQL pg_stop_backup failed"
|         exit 1;
| fi
| echo "pg_stop_backup done successfully"
| TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214
|
| # now remove the unneeded files.
|
| # strip off first 4 chars from CP and prefix 00 to the result.
| # search the file that has the Checkpoint in its filename.
| # it takes a while to come, so wait till it comes.
|
| while true; do
|         REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH`
|         if [ ! $REF_FILE ]; then
|                 echo  "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE
... "
|         else
|                 break
|         fi
|         sleep 1
| done

(optimization) You could simplify this significantly using the test builtin:

~  while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do
~    echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2"
~    sleep 1
~  done
~  REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`"

| # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2
|
| # take only first 24 chars and store.
| REF_FILE_NUM=${REF_FILE:0:24}
|
| # REF_FILE_NUM is 000000010000000A00000068
|
| echo "REF_FILE_NUM=$REF_FILE_NUM"
|
| # iterate list of files in the WAL_ARCHIVE folder
| for i in `ls -1 $WAL_ARCHIVE` ;
| do
|         # $i is :000000010000000A0000005D.bz2 eg
|         # get first 24 chars in filename
|         FILE_NUM=${i:0:24}
|
|         # compare if the number is less than the reference
|         # here string comparison is being used.
|         if [[ $FILE_NUM  < $REF_FILE_NUM ]]
|         then
|                 echo "$FILE_NUM [ $i ] removed"
|                 rm -f $WAL_ARCHIVE/$i
|         else
|                 echo "$FILE_NUM [ $i ] not removed"
|         fi
| done

(optimization) Perhaps using find -newer/-anewer/-cnewer could be used
here to find files older than the reference file:

~  # "-not -newer" or "\! -newer" will also return REF_FILE
~  # so you have to grep it out and use xargs; otherwise you
~  # could also use the -delete action
~  find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \
~    grep -v "^${REF_FILE}$" | \
~    xargs rm -f

|>The WAL archive command can be set to use either cp or mv ? then why not
|>have a regular cronjob that runs a shell script to add any new files
to your
|>tar archive every time a new file is detected in your WAL-archive
directory?
|
| Hmmm you got me wrong, am talking about taking the base_backup not
| archival of WAL segments

Just a thought - it might be worth trying star (Schilly's tar); it won't
bail out completely if a file changes during read, and I'm 100% sure
that when it only changes in size, it is going to be stored
nevertheless. It also has functionality for storing ACLs and xattrs in
the tar header, works on numerous UNIX flavours and comes as a system
package with most of the popular distributions.

Hope this helped,
- --
~    Grega Bremec
~    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFEK5Pdfu4IwuB3+XoRA7KVAJ9wbK6RDBnA4s0cjbdR3FE6+oUvGwCfWeAO
SecCgFnus70LKNYCDNeV3Z0=
=oUrb
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Jim C. Nasby"
Дата:
On Wed, Mar 29, 2006 at 06:15:59PM +0530, Rajesh Kumar Mallah wrote:
> > Also you wouldn't particular need any heavy programming ? I'm sure a simple
> > shell script could be written in bash to pick out the correct files.
>
> OK i am posting my full script [ its not heavy programming i guess :) ]
> shall be grateful if  you/someone could review it . (its well commented i think)
> script also carries sample data.

Would you be willing to share this script at
http://pgfoundry.org/projects/pgpitrha/ ? Or maybe another project would
be worth starting... I know right now everyone is pretty much rolling
their own when it comes to dealing with PITR, so it'd be good to at
least get all the scripts in one place...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: FW: Setting up of PITR system.

От
"Jim C. Nasby"
Дата:
On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote:
> It definitely is a pain in the neck that GNU tar complains about files
> changing underneath it --- I've looked for a way to disable that, or at
> least reduce it to a warning instead of an error condition, but gtar
> doesn't seem to have such a switch.  You should try alternative backup
> tools such as cpio or rsync.

Or you might submit a patch to GNU tar. There's also a BSD version of
tar, it's on at least FreeBSD (not sure if there's a stand-alone version
avaiable). It might not suffer from the same problem, though I haven't
tried it. I do have a machine I could try it with if that would help.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: FW: Setting up of PITR system.

От
"Sriram Dandapani"
Дата:
We have high volume inserts happening with wal files being generated
very rapidly. Will the fact that the files change as the tar happens
have any effect at all on the recovery..assuming all the appropriate wal
archive logs are available.

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jim C. Nasby
Sent: Friday, March 31, 2006 10:11 AM
To: Tom Lane
Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah';
pgsql-admin@postgresql.org
Subject: Re: FW: [ADMIN] Setting up of PITR system.

On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote:
> It definitely is a pain in the neck that GNU tar complains about files
> changing underneath it --- I've looked for a way to disable that, or
at
> least reduce it to a warning instead of an error condition, but gtar
> doesn't seem to have such a switch.  You should try alternative backup
> tools such as cpio or rsync.

Or you might submit a patch to GNU tar. There's also a BSD version of
tar, it's on at least FreeBSD (not sure if there's a stand-alone version
avaiable). It might not suffer from the same problem, though I haven't
tried it. I do have a machine I could try it with if that would help.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 3/30/06, Grega Bremec <gregab@p0f.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Rajesh Kumar Mallah wrote:
> |
> | OK i am posting my full script [ its not heavy programming i guess :) ]
> | shall be grateful if  you/someone could review it . (its well
> commented i think)
> | script also carries sample data.
> |
> | it does following
> | 1. takes base backup to a destined folder by rsync
> | 2. waits for .backup file to arrive in archive folder
> |    after pg_stop_bacup()
> | 3. searches and removes unwanted archived log files.
> |
> | I have run it many times in my server and it seems to
> | be working fine.
>
> Hello, Rajesh.
>
> Just a couple of comments on the script itself, not what it actually
> does - I never tried WAL archiving before, so I can't comment on that. I
> inserted the comments at relevant points in the script. I'm sorting them
> into three categories, one is just improvements in style, the other is
> optimization and the third is correction of an error.

Dear Grega ,

Thanks for the useful tips and error spotting,
i am incorporating some of them and testing
the script in my server . I have concerns regarding
some of your optimisations that makes the script less
generic , below find my comments.


>
> | ------------------------------------ BEGIN
> | --------------------------------------------
> | #!/bin/bash
> |
> | # folder where base_backup is put
> | BACKUPFOLDER=/mnt/disk3/base_backups
> | today=`date +%d-%m-%Y-%H-%M-%S`
> | PSQL=/opt/usr/local/pgsql/bin/psql
> | RSYNC="/usr/bin/rsync  -a"
> | PGDATADIR=/mnt/disk5/pgdatadir
> |
> | # two table spaces.
> |
> | TS1=/mnt/disk4/bigtables
> | TS2=/mnt/disk3/indexspace
>
> (optimization) Since you're using bash, you can use arrays. This could
> be better written as
>
> ~  TS[0]=/mnt/disk5/pgdatadir
> ~  TS[1]=/mnt/disk4/bigtables
> ~  TS[2]=/mnt/disk3/indexspace
>
> or even
>
> ~  TS=(/mnt/disk5/pgdatadir \
> ~      /mnt/disk4/bigtables \
> ~      /mnt/disk3/indexspace)


agreed , already incorporated.


>
> That way, you can add tablespaces at will and just use a while loop to
> back them up, which greatly simplifies adding new tablespaces or moving
> the script somewhere else. See below for how to implement that.
>
> | # folder where *archived* logs are put.
> | WAL_ARCHIVE=/mnt/wal_archive
> |
> | label=base_backup_${today}
> |
> | echo "Executing pg_start_backup with label $label in server ... "
> |
> | # get the checkpoint at which backup starts
> | # the .backup files seems to be bearing this string in it.
> |
> | CP=`$PSQL -q -Upostgres -d template1 -c "SELECT
> | pg_start_backup('$label');" -P tuples_only -P format=unaligned`
> |
> | echo "Begin CheckPoint is $CP" # this contain string like A/681D1214
> |
> | if [ $? -ne 0 ]
> |         then
> |         echo "PSQL pg_start_backup failed"
> |         exit 1;
> | fi
> | echo "pg_start_backup executed successfully"
>
> (style) If you want to capture any error messages pg_start_backup may
> have caused and store them into ${CP}, you should add 2>&1 at the end of
> the psql invocation, see below snippet.


Incorporated it.



>
> (error) Checking for exit status of pg_start_backup using $? at this
> point will never report an error, as you've used echo prior to checking
> what pg_start_backup returned. You should either move the echo below the
> if statement (by adding an "else" clause) or store the exit status of
> pg_start_backup into RVAL like this:
>
> ~  CP="`$PSQL ... 2>&1`"
> ~  RVAL=$?
> ~  echo "Begin CheckPoint says: ${CP}"
> ~  if [ ${RVAL} -ne 0 ]; then
> ~    ...
> ~  fi
>
> | echo "RSYNC begins.."
> |
> | # rsync each of the folders to the backup folder.
> | for i in $TS1 $TS2 $PGDATADIR ;
> | do
> |         echo "Syncing $i .. "
> |         time $RSYNC $i $BACKUPFOLDER
> |         echo "Done"
> | done
>
> (optimization) If you store locations into an array, you could rewrite
> this as follows:
>
> ~  CTR=0
> ~  while [ -n "${TS[${CTR}]}" ]; do
> ~    echo "Syncing ${TS[${CTR}]}..."
> ~    time ${RSYNC} ${TS[${CTR}]} ${BACKUPFOLDER}
> ~    RVAL=$?
> ~    echo "Sync finished with exit status ${RVAL}"
> ~    if [ ${RVAL} -ne 0 ]; then
> ~      <handle errors>
> ~    fi
> ~    CTR=$((CTR + 1))
> ~  done
> ~  unset CTR
>
> | # fortunately rsync does *not* seems to be exitting with non zero exit
> code
> | # for expected file disappearances and modifications.
> | if [ $? -ne 0 ]
> |         then
> |         echo "RSYNC failed"
> |         exit 1;
> | fi
>
> (error) Same error as above - what you're checking here is whether the
> last command in the last for loop run was successful, and this is always
> going to be true as echoing to stdout will never fail until stdout is
> closed for some reason.


Yes it was an error, i am doing repeat runs of the script
to find the non zero exit codes which should be treated
as normal in context of taking base backups.



>
> | echo "RSYNC Done successfully"
> |
> | echo "Executing pg_stop_backup in server ... "
> | $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
> | if [ $? -ne 0 ]
> |         then
> |         echo "PSQL pg_stop_backup failed"
> |         exit 1;
> | fi
> | echo "pg_stop_backup done successfully


I think i should improve the style here.


> | TO_SEARCH="00${CP:4}" # $TO_SEARCH contains 1D1214
> |
> | # now remove the unneeded files.
> |
> | # strip off first 4 chars from CP and prefix 00 to the result.
> | # search the file that has the Checkpoint in its filename.
> | # it takes a while to come, so wait till it comes.
> |
> | while true; do
> |         REF_FILE=`ls -1 $WAL_ARCHIVE | grep $TO_SEARCH`
> |         if [ ! $REF_FILE ]; then
> |                 echo  "Waitng for file with $TO_SEARCH in $WAL_ARCHIVE
> ... "
> |         else
> |                 break
> |         fi
> |         sleep 1
> | done
>
> (optimization) You could simplify this significantly using the test builtin:
>
> ~  while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do
> ~    echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2"
> ~    sleep 1
> ~  done
> ~  REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`"


I would not like to incorporate this becuase this code assumes WAL archives
are being externally compressed to .bz2. Can you suggest an optimized but
generic alternative ?

>
> | # REF_FILE is 000000010000000A00000068.001D1214.backup.bz2
> |
> | # take only first 24 chars and store.
> | REF_FILE_NUM=${REF_FILE:0:24}
> |
> | # REF_FILE_NUM is 000000010000000A00000068
> |
> | echo "REF_FILE_NUM=$REF_FILE_NUM"
> |
> | # iterate list of files in the WAL_ARCHIVE folder
> | for i in `ls -1 $WAL_ARCHIVE` ;
> | do
> |         # $i is :000000010000000A0000005D.bz2 eg
> |         # get first 24 chars in filename
> |         FILE_NUM=${i:0:24}
> |
> |         # compare if the number is less than the reference
> |         # here string comparison is being used.
> |         if [[ $FILE_NUM  < $REF_FILE_NUM ]]
> |         then
> |                 echo "$FILE_NUM [ $i ] removed"
> |                 rm -f $WAL_ARCHIVE/$i
> |         else
> |                 echo "$FILE_NUM [ $i ] not removed"
> |         fi
> | done
>
> (optimization) Perhaps using find -newer/-anewer/-cnewer could be used
> here to find files older than the reference file:
>
> ~  # "-not -newer" or "\! -newer" will also return REF_FILE
> ~  # so you have to grep it out and use xargs; otherwise you
> ~  # could also use the -delete action
> ~  find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \
> ~    grep -v "^${REF_FILE}$" | \
> ~    xargs rm -f

Nopes , i have gone by the DOCS which tells to numerically compare the filenames
i do not want to assume more recently created files are numerically more that
later as i have not seen anything like that in the docs.


I am concerned if the comparison below used in the script

> |         if [[ $FILE_NUM  < $REF_FILE_NUM ]]

is correct , as it compares strings not numbers , i am assuming that
the results will be same as numerical comparison, as all the filenames
are padded with '0' form the left.

i thank you once again for your comments and shall post the improved version
once my observations are complete.

Regds
Rajesh Kumar Mallah.

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 3/31/06, Sriram Dandapani <sdandapani@counterpane.com> wrote:
> We have high volume inserts happening with wal files being generated
> very rapidly. Will the fact that the files change as the tar happens
> have any effect at all on the recovery..assuming all the appropriate wal
> archive logs are available.


Its absolutely safe. The documentation specifically and sufficiently
dispels this
doubt[ Refer Online Backup ] . All the changes to the data pages
(tables/indexes)
are logged to WAL before the data files are modified so the WAL replay during
recovery changes the files again in the same manner. As long as wal is archived
properly/securly there should not be any doubt.


Regds
Rajesh Kumar Mallah.




>
> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org
> [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jim C. Nasby
> Sent: Friday, March 31, 2006 10:11 AM
> To: Tom Lane
> Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah';
> pgsql-admin@postgresql.org
> Subject: Re: FW: [ADMIN] Setting up of PITR system.
>
> On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote:
> > It definitely is a pain in the neck that GNU tar complains about files
> > changing underneath it --- I've looked for a way to disable that, or
> at
> > least reduce it to a warning instead of an error condition, but gtar
> > doesn't seem to have such a switch.  You should try alternative backup
> > tools such as cpio or rsync.
>
> Or you might submit a patch to GNU tar. There's also a BSD version of
> tar, it's on at least FreeBSD (not sure if there's a stand-alone version
> avaiable). It might not suffer from the same problem, though I haven't
> tried it. I do have a machine I could try it with if that would help.
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Rajesh Kumar Mallah wrote:
| Dear Grega ,
|
| Thanks for the useful tips and error spotting,
| i am incorporating some of them and testing
| the script in my server . I have concerns regarding
| some of your optimisations that makes the script less
| generic , below find my comments.
|

Hello, Rajesh,

I'm glad you found some use to the comments :).

|>|
|>| echo "Executing pg_stop_backup in server ... "
|>| $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
|>| if [ $? -ne 0 ]
|>|         then
|>|         echo "PSQL pg_stop_backup failed"
|>|         exit 1;
|>| fi
|>| echo "pg_stop_backup done successfully
|
| I think i should improve the style here.
|

Basically, since you're not storing anything into a variable, as was the
case above, anything psql says will be written to stdout/stderr,
whichever it is that psql writes it to, so the only "nasty" thing that
could happen is that line buffering could cause the error message to
appear a line or two lower than where it actually happened. If you want
to capture psql output and manipulate it (for example, indenting it to
make it more readable using sed 's/^.*$/    &/' or similar), sure - you
can assign any output psql emits to a variable you did with
pg_start_backup and redirect stderr to stdout (2>&1). That way, you can
decide if the text is an error by checking $? and act accordingly:

~  SB="`${PSQL} ... 2>&1 | sed 's/^.*$/    &/'`"
~  if [ $? -ne 0 ]; then
~    echo "psql pg_stop_backup failed:"
~    echo "${SB}"
~    exit 1
~  fi

|>
|>(optimization) You could simplify this significantly using the test
builtin:
|>
|>~  while [ ! -e ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2 ]; do
|>~    echo "Waiting for ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2"
|>~    sleep 1
|>~  done
|>~  REF_FILE="`echo ${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup.bz2`"
|
|
| I would not like to incorporate this becuase this code assumes WAL
archives
| are being externally compressed to .bz2. Can you suggest an optimized but
| generic alternative ?
|

I understand your concern. I was a bit surprised you checked for .bz2
files in your original script, but I suppose that depends on WAL archive
method in use, and I didn't want to make any wild guesses there. :)

Assuming all the WAL files you deal with will either be compressed
already or not compressed at all (that is, you are not going to catch
any of them in the middle of compression taking place), you can simply
replace ".bz2" in the above tests with a glob, "*".

Also, you can define a ${MASK} variable at one place and just reuse
that, if you find there are different archive methods on different hosts
and want to keep the script configurable.

~  WAL_MASK="${WAL_ARCHIVE}/*.00${TO_SEARCH}.backup*"
~  ...
~  echo -n "Waiting for files masked \"${WAL_MASK}\" to appear..."
~  while [ ! -e ${WAL_MASK} ]; do
~    echo -n "."
~    sleep 1
~  done
~  echo " done."
~  REF_FILE="`echo ${WAL_MASK}`"

This does the basic trick. If there are any spaces in directory names,
make sure you store IFS into some variable, reset it to ^M (newline) and
reassign the old value after the above loop:

~  oldIFS="${IFS}"
~  IFS='
~  '
~  while ...; do
~    ...
~  done
~  IFS="${oldIFS}"
~  unset oldIFS

All this gets slightly more complicated if you drop the above
assumption, and expect some files to be caught in the middle of the
compression process, which means there will be more than one file
matching the WAL_MASK:

~  000000010000000A00000068.001D1214.backup (the original)
~  000000010000000A00000068.001D1214.backup.bz2 (partially compressed)

In this case, the test should be done using a different glob expansion
method in the while loop and using find and wc to count the number of
files found, then wait until there's only one file matching the mask left:

~  WAL_MASK="*.00${TO_SEARCH}.backup*"
~  WAL_FILE="${WAL_ARCHIVE}/${WAL_MASK}"
~  ...
~  echo -n "Waiting for files masked \"${WAL_MASK}\" to appear..."
~  # Until at least one file is found, the strings will both be equal.
~  # The echo will safely expand to more than one file, whereas the old
~  # test would complain a unary operator was expected.
~  while [ "${WAL_FILE}" = "`echo ${WAL_FILE}`" ] ||
~        [ `find ${WAL_ARCHIVE} -name "${WAL_MASK}" | wc -l` -gt 1 ]; do
~    # Are we in the middle of compression? Let user know some
~    # progress is taking place.
~    if [ `find ${WAL_ARCHIVE} -name "${WAL_MASK}" | wc -l` -gt 1 ]; then
~      echo -n "c" # for "compressing" :)
~    else
~      echo -n "."
~    fi
~    sleep 1
~  done
~  echo " done."
~  REF_FILE="`echo ${WAL_MASK}`"

The last step could be to protect yourself from waiting indefinitely
just in case something goes wrong (i.e. running out of disk space or
kernel OOM killing the compression program) by defining a maximum amount
of seconds you're willing to wait for a WAL:

~  MAX_SEC=600 # 10 minutes max
~  NSEC=0
~  while ...; do
~    ...
~    sleep 1
~    NSEC=$((NSEC + 1))
~    if [ ${NSEC} -gt ${MAX_SEC} ]; then
~      echo "error: timeout while waiting for WAL."
~      break
~    fi
~  done
~  if [ ${NSEC} -le ${MAX_SEC} ]; then
~    echo "done."
~    REF_FILE="`echo ${WAL_MASK}`"
~  else
~    echo "Bailing out (or something else appropriate)."
~    exit 1
~  fi
~  unset NSEC

Feel free to use whatever is appropriate for your environment, of
course; maybe implementing some of the above is simply more trouble than
it's worth, but I included it anyway, for the sake of completeness.

|>(optimization) Perhaps using find -newer/-anewer/-cnewer could be used
|>here to find files older than the reference file:
|>
|>~  # "-not -newer" or "\! -newer" will also return REF_FILE
|>~  # so you have to grep it out and use xargs; otherwise you
|>~  # could also use the -delete action
|>~  find ${WAL_ARCHIVE} \! -newer ${REF_FILE} -type f | \
|>~    grep -v "^${REF_FILE}$" | \
|>~    xargs rm -f
|
| Nopes , i have gone by the DOCS which tells to numerically compare the
filenames
| i do not want to assume more recently created files are numerically
more that
| later as i have not seen anything like that in the docs.

I see. I've been skipping classes, it seems. :)

| I am concerned if the comparison below used in the script
|
|>|         if [[ $FILE_NUM  < $REF_FILE_NUM ]]
|
| is correct , as it compares strings not numbers , i am assuming that
| the results will be same as numerical comparison, as all the filenames
| are padded with '0' form the left.

This is definitely true in the "C" locale, where "9" comes before "A"
and "Z" comes before "a". If you want to make sure it is so, run the
script with LC_ALL set to "C" like this:

~  $ env LC_ALL="C" backup_script.sh

However, you also have the option of using "sort -g", which compares
according to general numeric value:

~  RM_LIST=""
~  find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do
~    if [ ! "${archive}" = "${REF_FILE}" ]; then
~      RM_LIST="${RM_LIST:+${RM_LIST} }${archive}"
~    else
~      break
~    fi
~  done
~  rm -f ${RM_LIST}

If you use this method, make sure your bash is newer than 2.05, and make
it significantly newer just to be on the safe side, because there is a
bug in old bash2 versions that will fail to propagate variables inside
the loop back to the parent shell ("while" reading from a pipe executes
in a subshell in those versions).

Good luck with the script! :)

Kind regards,
- --
~    Grega Bremec
~    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFELhlEfu4IwuB3+XoRA/X7AJ9N9luGEuLMlz+eUTeWHh86X0CnkwCbB0Wa
WzjVwnefz5j4a1vwWdR5UzE=
=FXy8
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Andy Shellam"
Дата:
Are you talking about tar-ing your completed WAL logs, or the file-system
database directory?

1. If you're talking about tar-ing archived WAL logs - the logs you're
archiving shouldn't be changing, as they should be archived outside of your
PG data directory structure (with the use of archive_command) - perhaps your
archive command should move the files out of the pg_xlog directory BEFORE
tar-ing.

2. If you're talking about logs changing while you're tar-ing your PG data
directory, you shouldn't be including your pg_xlog directory in the archive,
as long as you've got all the WAL logs available AFTER the dump is done,
then it'll fix any inconsistencies in the data when you do the restore.

Hope this helps

Andy

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Sriram Dandapani
Sent: Friday, 31 March, 2006 7:19 pm
To: Jim C. Nasby; Tom Lane
Cc: andy.shellam@mailnetwork.co.uk; Rajesh Kumar Mallah;
pgsql-admin@postgresql.org
Subject: Re: FW: [ADMIN] Setting up of PITR system.

We have high volume inserts happening with wal files being generated
very rapidly. Will the fact that the files change as the tar happens
have any effect at all on the recovery..assuming all the appropriate wal
archive logs are available.

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jim C. Nasby
Sent: Friday, March 31, 2006 10:11 AM
To: Tom Lane
Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah';
pgsql-admin@postgresql.org
Subject: Re: FW: [ADMIN] Setting up of PITR system.

On Wed, Mar 29, 2006 at 09:46:30AM -0500, Tom Lane wrote:
> It definitely is a pain in the neck that GNU tar complains about files
> changing underneath it --- I've looked for a way to disable that, or
at
> least reduce it to a warning instead of an error condition, but gtar
> doesn't seem to have such a switch.  You should try alternative backup
> tools such as cpio or rsync.

Or you might submit a patch to GNU tar. There's also a BSD version of
tar, it's on at least FreeBSD (not sure if there's a stand-alone version
avaiable). It might not suffer from the same problem, though I haven't
tried it. I do have a machine I could try it with if that would help.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

!DSPAM:14,442d72c535048270212230!





Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Grega Bremec wrote:
| ~  find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do
| ~    if [ ! "${archive}" = "${REF_FILE}" ]; then

Actually, this test is only possible since ${REF_FILE} was expanded
using the ${WAL_FILE} mask and will contain full path to the log; if you
used your old method to obtain ${REF_FILE}, it will NOT match here, as
it only contains the filename. You should write the test like this:

~  if [ ! "`basename ${archive}`" = "${REF_FILE}" ]; then
~    ...

Kind regards,
- --
~    Grega Bremec
~    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFELlmqfu4IwuB3+XoRA7PNAJ9Ks5+6l0dmAgeIixcofNTUjXy3FgCfT+D1
kNvLWQIlKuo+GzF05IfPu2U=
=aiNg
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 4/1/06, Grega Bremec <gregab@p0f.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Rajesh Kumar Mallah wrote:
> | Dear Grega ,
> |
> | Thanks for the useful tips and error spotting,
> | i am incorporating some of them and testing
> | the script in my server . I have concerns regarding
> | some of your optimisations that makes the script less
> | generic , below find my comments.
> |
>
> Hello, Rajesh,
>
> I'm glad you found some use to the comments :).

Hmm i am glad , i have found one person whom i can ask any bash doubt   ;-)
i am discussing only the differentials and not posting the flab.

As described in docs pg_start_backup() puts the file backup_label
in the datadir which can has the info about the start log file.
Instead of taking a round about method i am using the content of the
file, (This was also suggested by Andy at somepoint)
below is the code .

# read the backup_label file in pgdatadir and get the name of start wal file
# below is example content.
##############################################################
#START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
#CHECKPOINT LOCATION: E/A92939C
#START TIME: 2006-04-01 14:36:48 IST
#LABEL: base_backup_01-04-2006-14-36-45
###############################################################

BACKUP_LABEL=$DATADIR/backup_label
# get the like containing line START WAL LOCATION

START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL`
# strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin.
START_LINE=${START_LINE/#START*file /}
# strip ')' from end.
START_LINE=${START_LINE/%)/}
# REF_FILE_NUM is something like 000000010000000A00000068
REF_FILE_NUM=$START_LINE
------------------ End of relevent portion
---------------------------------------
please optimize the above regex capturing process if possible.


lately i feel that we should not be discussing the topic over here
becoz it has less to do with postgresql and more of bash.


~  RM_LIST=""
~  find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do
~    if [ ! "${archive}" = "${REF_FILE}" ]; then

i think you meant < instead of '=' in above line.


~      RM_LIST="${RM_LIST:+${RM_LIST} }${archive}"
~    else
~      break
~    fi
~  done
~  rm -f ${RM_LIST}

last doubt:

regarding

  $ env LC_ALL="C" backup_script.sh
can i do
export LC_ALL="C"  or
LC_ALL="C"
inside the script itself to be sure ?


Thanks for you help once again.

Regds
Rajesh Kumar Mallah.

Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Rajesh Kumar Mallah wrote:
|
| Instead of taking a round about method i am using the content of the
| file, (This was also suggested by Andy at somepoint)

After reading the docs again, that's what I would do as well, indeed. :)

| lately i feel that we should not be discussing the topic over here
| becoz it has less to do with postgresql and more of bash.

I've been considering that seriously in the very first post I wrote, but
since there seems to be a lot of people interested in a working,
flexible WAL archiving script, I decided to keep it on the list. It is
an administration issue, afterall. I will stand corrected if someone
feels we're clogging their mailboxes.

| ##############################################################
| #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
| #CHECKPOINT LOCATION: E/A92939C
| #START TIME: 2006-04-01 14:36:48 IST
| #LABEL: base_backup_01-04-2006-14-36-45
| ###############################################################
|
| BACKUP_LABEL=$DATADIR/backup_label
| # get the like containing line START WAL LOCATION
|
| START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL`
| # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin.
| START_LINE=${START_LINE/#START*file /}
| # strip ')' from end.
| START_LINE=${START_LINE/%)/}
| # REF_FILE_NUM is something like 000000010000000A00000068
| REF_FILE_NUM=$START_LINE

Why not go for the entire filename? Record offset is never going to be
more than eight characters, as include/access/xlogdefs.h states:

~  typedef struct XLogRecPtr
~  {
~      uint32    xlogid;  /* log file #, 0 based */
~      uint32    xrecoff; /* byte offset of location in log file */
~  } XLogRecPtr;

A 32 bit unsigned integer can always be represented in eight hexadecimal
digits or less.

~  REF_FILE="`grep 'START WAL LOCATION' ${BACKUP_LABEL} | \
~    awk '{
~            sub(/)/, "", $6);
~            sub(/[0-9A-F]\//, "", $4);
~            printf("%s.%08s.backup", $6, $4);
~         }'`"

This will remove the trailing paren from WAL filename (field 6), the
leading xlogid and the slash from WAL location (field 4) and compose
them into the full filename, zero-padding WAL location to eight
characters and giving back something like this:

~    000000010000000E0000000A.0A9145E4.backup

What you need to do now is just appendd a glob (if your archive_method
consists of gzip/bzip2/...) and prepend ${WAL_ARCHIVE}:

~  REF_FILE="${WAL_ARCHIVE}/${REF_FILE}*"

| ~  RM_LIST=""
| ~  find ${WAL_ARCHIVE} -type f | sort -g | while read archive; do
| ~    if [ ! "${archive}" = "${REF_FILE}" ]; then
| ~      RM_LIST="${RM_LIST:+${RM_LIST} }${archive}"
| ~    else
| ~      break
| ~    fi
| ~  done
| ~  rm -f ${RM_LIST}
| i think you meant < instead of '=' in above [comparison].

Absolutely not. :) What we're doing here is we're looking at all files
in ${WAL_ARCHIVE} (find), sorting them according to their general
numeric value (sort, lowest first) and adding them one-by-one to the
list of WALs to remove (RM_LIST assignment) until we find REF_FILE (the
"equals not" comparison). As soon as we find REF_FILE, we escape the
while loop (break) and remove all the old log files (rm -f).

Since WALs are numbered in a sequence, and location identifiers in a WAL
which are also a part of the filename are sequential too, sorting will
always produce a list of WAL segments in chronological order, oldest
first, newest last.

What is critical to the above piece of code is that BOTH ${archive} and
${REF_FILE} are either absolute filenames or relative ones, of course,
otherwise they'll never match.

| regarding
|   $ env LC_ALL="C" backup_script.sh

If you do it inside the script, you shoud definitely export it to
subshells since all the backtick commands execute in a subshell. Using
"sort -g" to sort the listing according to general numeric value is the
safest option though, and it is also the least disruptive one as it
doesn't require any changes to the environment.

Kind regards,
- --
~    Grega Bremec
~    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY
4xAxFb3Ncd8RHWkBbgyag7U=
=7MXQ
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 4/2/06, Grega Bremec <gregab@p0f.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Rajesh Kumar Mallah wrote:
> |
> | Instead of taking a round about method i am using the content of the
> | file, (This was also suggested by Andy at somepoint)
>
> After reading the docs again, that's what I would do as well, indeed. :)
>
> | lately i feel that we should not be discussing the topic over here
> | becoz it has less to do with postgresql and more of bash.
>
> I've been considering that seriously in the very first post I wrote, but
> since there seems to be a lot of people interested in a working,
> flexible WAL archiving script, I decided to keep it on the list. It is
> an administration issue, afterall. I will stand corrected if someone
> feels we're clogging their mailboxes.
>
> | ##############################################################
> | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
> | #CHECKPOINT LOCATION: E/A92939C
> | #START TIME: 2006-04-01 14:36:48 IST
> | #LABEL: base_backup_01-04-2006-14-36-45
> | ###############################################################
> |
> | BACKUP_LABEL=$DATADIR/backup_label
> | # get the like containing line START WAL LOCATION
> |
> | START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL`
> | # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin.
> | START_LINE=${START_LINE/#START*file /}
> | # strip ')' from end.
> | START_LINE=${START_LINE/%)/}
> | # REF_FILE_NUM is something like 000000010000000A00000068
> | REF_FILE_NUM=$START_LINE
>
> Why not go for the entire filename?


it takes a while (i dont know how much) for the .backup file to get
archived and appear in the wal archive area. thats why i prefer to
use the wal log filename (000000010000000A00000068) instead of
something like 000000010000000A00000068.0A348A45.backup.

Do you see any problem in the current approach ?
 i have seen it working fine till now.

Another area i was thinking to improve this script was to
make it dig out all the tablespace folders to be archived by looking
into PGDATADIR/pg_tblspc . This shall make the script more generic.

pg_tblspc as contents like below:

 $ ls -l
total 0
lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/indexspace_new
lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/bigtables_new

can you suggest the sane/recommended way to get the destination
folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf "%???????" )
or do i parse output of ls !

if above is done i see the script INPUT/OUTPUT'ACTIVITY as below

INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY
OUTPUT: n/a
ACTIVITY:

1. it shall take a base backup of PGDATADIR (minus pg_xlog) and
    all tablespaces into *LOCAL* DUMP Directory
2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR

i am waiting for my another machine to get fitted with 2 more drives
so that i can test/develop scripts for the restoration part and modify
the script for remote base backups and archiving. that shall
happen in a week or so.

thanks for your support till now.

Regds
Rajesh Kumar Mallah.


>
> Kind regards,
> - --
> ~    Grega Bremec
> ~    gregab at p0f dot net
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.0 (GNU/Linux)
>
> iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY
> 4xAxFb3Ncd8RHWkBbgyag7U=
> =7MXQ
> -----END PGP SIGNATURE-----
>

Re: FW: Setting up of PITR system.

От
Alvaro Herrera
Дата:
Rajesh Kumar Mallah wrote:

> Another area i was thinking to improve this script was to
> make it dig out all the tablespace folders to be archived by looking
> into PGDATADIR/pg_tblspc . This shall make the script more generic.
>
> pg_tblspc as contents like below:
>
>  $ ls -l
> total 0
> lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/indexspace_new
> lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/bigtables_new
>
> can you suggest the sane/recommended way to get the destination
> folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf "%???????" )
> or do i parse output of ls !

Try the readlink utility (part of GNU coreutils, not sure about other
Unixen).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Setting up of PITR system.

От
Brendan Duddridge
Дата:
Whenever you get a finalized script including all the enhancements
you've been talking about,
I would really love to have a copy as I'm sure lots of people here
would. I think this is
something that should be included in the standard distribution.

Can I make a couple of suggestions?

1. Include a mail option to send the admin an email when the backup
succeeds or fails.
2. Call pg_stop_backup() if the script fails for whatever reason. You
can't run the script
    twice unless you stop the backup. I ran the script a few times to
see if I can get a
    PITR backup system going and due to directory permissions and
such and during basic
    testing, I had to manually call pg_stop_backup() after the script
exits abnormally.

Thanks very much for building this script and sharing it with us.
Many of us don't have the skills to
write such a script. I'm a Java programmer, but I have little bash
scripting skills, so this
is very much appreciated.

____________________________________________________________________
Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 2, 2006, at 10:14 AM, Rajesh Kumar Mallah wrote:

> On 4/2/06, Grega Bremec <gregab@p0f.net> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: RIPEMD160
>>
>> Rajesh Kumar Mallah wrote:
>> |
>> | Instead of taking a round about method i am using the content of
>> the
>> | file, (This was also suggested by Andy at somepoint)
>>
>> After reading the docs again, that's what I would do as well,
>> indeed. :)
>>
>> | lately i feel that we should not be discussing the topic over here
>> | becoz it has less to do with postgresql and more of bash.
>>
>> I've been considering that seriously in the very first post I
>> wrote, but
>> since there seems to be a lot of people interested in a working,
>> flexible WAL archiving script, I decided to keep it on the list.
>> It is
>> an administration issue, afterall. I will stand corrected if someone
>> feels we're clogging their mailboxes.
>>
>> | ##############################################################
>> | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
>> | #CHECKPOINT LOCATION: E/A92939C
>> | #START TIME: 2006-04-01 14:36:48 IST
>> | #LABEL: base_backup_01-04-2006-14-36-45
>> | ###############################################################
>> |
>> | BACKUP_LABEL=$DATADIR/backup_label
>> | # get the like containing line START WAL LOCATION
>> |
>> | START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL`
>> | # strip something like 'START WAL LOCATION: E/A9145E4 (file '
>> from begin.
>> | START_LINE=${START_LINE/#START*file /}
>> | # strip ')' from end.
>> | START_LINE=${START_LINE/%)/}
>> | # REF_FILE_NUM is something like 000000010000000A00000068
>> | REF_FILE_NUM=$START_LINE
>>
>> Why not go for the entire filename?
>
>
> it takes a while (i dont know how much) for the .backup file to get
> archived and appear in the wal archive area. thats why i prefer to
> use the wal log filename (000000010000000A00000068) instead of
> something like 000000010000000A00000068.0A348A45.backup.
>
> Do you see any problem in the current approach ?
>  i have seen it working fine till now.
>
> Another area i was thinking to improve this script was to
> make it dig out all the tablespace folders to be archived by looking
> into PGDATADIR/pg_tblspc . This shall make the script more generic.
>
> pg_tblspc as contents like below:
>
>  $ ls -l
> total 0
> lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/
> indexspace_new
> lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/
> bigtables_new
>
> can you suggest the sane/recommended way to get the destination
> folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf
> "%???????" )
> or do i parse output of ls !
>
> if above is done i see the script INPUT/OUTPUT'ACTIVITY as below
>
> INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY
> OUTPUT: n/a
> ACTIVITY:
>
> 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and
>     all tablespaces into *LOCAL* DUMP Directory
> 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR
>
> i am waiting for my another machine to get fitted with 2 more drives
> so that i can test/develop scripts for the restoration part and modify
> the script for remote base backups and archiving. that shall
> happen in a week or so.
>
> thanks for your support till now.
>
> Regds
> Rajesh Kumar Mallah.
>
>
>>
>> Kind regards,
>> - --
>> ~    Grega Bremec
>> ~    gregab at p0f dot net
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.0 (GNU/Linux)
>>
>> iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY
>> 4xAxFb3Ncd8RHWkBbgyag7U=
>> =7MXQ
>> -----END PGP SIGNATURE-----
>>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


Вложения

Re: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
On 4/3/06, Brendan Duddridge <brendan@clickspace.com> wrote:
> Whenever you get a finalized script including all the enhancements
> you've been talking about,
> I would really love to have a copy as I'm sure lots of people here
> would. I think this is
> something that should be included in the standard distribution.
>
> Can I make a couple of suggestions?
>
> 1. Include a mail option to send the admin an email when the backup
> succeeds or fails.

Considering the fact that the script is hugely unix oriented currently,
such a facility is anyway avialble in crontab using MAILTO variable.

> 2. Call pg_stop_backup() if the script fails for whatever reason. You
> can't run the script
>     twice unless you stop the backup. I ran the script a few times to
> see if I can get a
>     PITR backup system going and due to directory permissions and
> such and during basic
>     testing, I had to manually call pg_stop_backup() after the script
> exits abnormally.

yes this is a problem , i do not know if calling pg_stop_backup() is the
end to the damage control steps, but i shall surely add it for the time
being.


>
> Thanks very much for building this script and sharing it with us.
> Many of us don't have the skills to
> write such a script. I'm a Java programmer, but I have little bash
> scripting skills, so this
> is very much appreciated.

Thanks to prying eyes of the bash gurus who are supervising it :)
i only have the machines and situation to run it and seek
suggestions.

Regds
Rajesh Kumar Mallah.

>
> ____________________________________________________________________
> Brendan Duddridge | CTO | 403-277-5591 x24 |  brendan@clickspace.com
>
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB  T2G 0V9
>
> http://www.clickspace.com
>
> On Apr 2, 2006, at 10:14 AM, Rajesh Kumar Mallah wrote:
>
> > On 4/2/06, Grega Bremec <gregab@p0f.net> wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: RIPEMD160
> >>
> >> Rajesh Kumar Mallah wrote:
> >> |
> >> | Instead of taking a round about method i am using the content of
> >> the
> >> | file, (This was also suggested by Andy at somepoint)
> >>
> >> After reading the docs again, that's what I would do as well,
> >> indeed. :)
> >>
> >> | lately i feel that we should not be discussing the topic over here
> >> | becoz it has less to do with postgresql and more of bash.
> >>
> >> I've been considering that seriously in the very first post I
> >> wrote, but
> >> since there seems to be a lot of people interested in a working,
> >> flexible WAL archiving script, I decided to keep it on the list.
> >> It is
> >> an administration issue, afterall. I will stand corrected if someone
> >> feels we're clogging their mailboxes.
> >>
> >> | ##############################################################
> >> | #START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
> >> | #CHECKPOINT LOCATION: E/A92939C
> >> | #START TIME: 2006-04-01 14:36:48 IST
> >> | #LABEL: base_backup_01-04-2006-14-36-45
> >> | ###############################################################
> >> |
> >> | BACKUP_LABEL=$DATADIR/backup_label
> >> | # get the like containing line START WAL LOCATION
> >> |
> >> | START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL`
> >> | # strip something like 'START WAL LOCATION: E/A9145E4 (file '
> >> from begin.
> >> | START_LINE=${START_LINE/#START*file /}
> >> | # strip ')' from end.
> >> | START_LINE=${START_LINE/%)/}
> >> | # REF_FILE_NUM is something like 000000010000000A00000068
> >> | REF_FILE_NUM=$START_LINE
> >>
> >> Why not go for the entire filename?
> >
> >
> > it takes a while (i dont know how much) for the .backup file to get
> > archived and appear in the wal archive area. thats why i prefer to
> > use the wal log filename (000000010000000A00000068) instead of
> > something like 000000010000000A00000068.0A348A45.backup.
> >
> > Do you see any problem in the current approach ?
> >  i have seen it working fine till now.
> >
> > Another area i was thinking to improve this script was to
> > make it dig out all the tablespace folders to be archived by looking
> > into PGDATADIR/pg_tblspc . This shall make the script more generic.
> >
> > pg_tblspc as contents like below:
> >
> >  $ ls -l
> > total 0
> > lrwxrwxrwx 1 postgres postgres 19 Mar 27 21:45 16391 -> /mnt/
> > indexspace_new
> > lrwxrwxrwx 1 postgres postgres 18 Mar 27 21:45 16392 -> /mnt/
> > bigtables_new
> >
> > can you suggest the sane/recommended way to get the destination
> > folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf
> > "%???????" )
> > or do i parse output of ls !
> >
> > if above is done i see the script INPUT/OUTPUT'ACTIVITY as below
> >
> > INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY
> > OUTPUT: n/a
> > ACTIVITY:
> >
> > 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and
> >     all tablespaces into *LOCAL* DUMP Directory
> > 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR
> >
> > i am waiting for my another machine to get fitted with 2 more drives
> > so that i can test/develop scripts for the restoration part and modify
> > the script for remote base backups and archiving. that shall
> > happen in a week or so.
> >
> > thanks for your support till now.
> >
> > Regds
> > Rajesh Kumar Mallah.
> >
> >
> >>
> >> Kind regards,
> >> - --
> >> ~    Grega Bremec
> >> ~    gregab at p0f dot net
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.0 (GNU/Linux)
> >>
> >> iD8DBQFEL9x/fu4IwuB3+XoRA3IgAJ9Qn7dYsNhv3e9f+P64mJoiz+s77gCeLELY
> >> 4xAxFb3Ncd8RHWkBbgyag7U=
> >> =7MXQ
> >> -----END PGP SIGNATURE-----
> >>
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 5: don't forget to increase your free space map settings
> >
>
>
>
>

Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Rajesh Kumar Mallah wrote:
| it takes a while (i dont know how much) for the .backup file to get
| archived and appear in the wal archive area. thats why i prefer to
| use the wal log filename (000000010000000A00000068) instead of
| something like 000000010000000A00000068.0A348A45.backup.
|
| Do you see any problem in the current approach ?
|  i have seen it working fine till now.

I do, to be honest. The WAL location counter accounts for 4294967295
positions and while I'm certain that's WAY more than the average number
of transactions that go into a WAL, quite a number of small ones can
certainly happen before a WAL is rolled over, and until then, you're
dealing with the same log file.

If two backups happen in that period of time for whatever reason, you're
going to have a false positive by looking into ${WAL_ARCHIVE} and
searching just for the WAL name, so including the location in the search
of a WAL fragment is certainly necessary. Infact, going purely by
chance, the probability of hitting the same location in two different
log files in two subsequent backups is much lower than hitting the same
WAL twice.

| can you suggest the sane/recommended way to get the destination
| folders?( i was thinking find $PGDATADIR/pg_tblspc -type l -printf
| "%???????" ) or do i parse output of ls !

Well, one excellent idea is certainly Alvaro's - the readlink utility
can do that for you in no time:

~  $ find ${PGDATADIR}/pg_tblspc/ -type l -exec readlink {} \;
~  /export/works/space/zlxmon/tblspc
~  ...

The other option you have is piping '\db' into psql and have it display
the list of tablespaces on standard output (you can use ~/.pgpass to
supress password prompting):

~  $ echo '\db' | psql -A -F: -t template1
~  pg_default:postgres:
~  pg_global:postgres:
~  zlxmon_ts:gregab:/export/work/space/zlxmon/tblspc
~  ...

By reformatting this slightly you can also get the desired result and
most importantly, it doesn't rely on how postgres' objects are organized
in the main database cluster directory:

~  $ echo '\db' | psql -A -F: -t template1 | cut -f3 -d: | grep -v '^$'
~  /export/work/space/zlxmon/tblspc
~  ...

It's all options for you, of course. Use whichever you prefer.

| if above is done i see the script INPUT/OUTPUT'ACTIVITY as below
|
| INPUTS: PGDATADIR , WAL_ARCHIVE_DIR , LOCAL DUMP DIRECTORY
| OUTPUT: n/a
| ACTIVITY:
|
| 1. it shall take a base backup of PGDATADIR (minus pg_xlog) and
|     all tablespaces into *LOCAL* DUMP Directory

I think you can list that as an output. :)

| 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR

Perhaps moving the old log files into a father backup directory and
having them stick around for a period of time before removing them isn't
a bad idea either, just in case something goes wrong with your latest
backup. You could go about that using find as well; see the -ctime
predicate in find(1).

Kind regards,
- --
~    Grega Bremec
~    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFEMLxYfu4IwuB3+XoRA1zZAJ9y7HquxYLH5l0rYff6+cvpU+9lVACfVNG0
PPJZu8IPMBR5j/xPy1+CFDs=
=G3Bt
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
> | Do you see any problem in the current approach ?
> |  i have seen it working fine till now.
>
> I do, to be honest. The WAL location counter accounts for 4294967295
> positions and while I'm certain that's WAY more than the average number
> of transactions that go into a WAL, quite a number of small ones can
> certainly happen before a WAL is rolled over, and until then, you're
> dealing with the same log file.
>
> If two backups happen in that period of time for whatever reason, you're
> going to have a false positive by looking into ${WAL_ARCHIVE} and
> searching just for the WAL name, so including the location in the search
> of a WAL fragment is certainly necessary. Infact, going purely by
> chance, the probability of hitting the same location in two different
> log files in two subsequent backups is much lower than hitting the same
> WAL twice.

Dear Grega,

sincere thanks for your time,

The current wal log is not being removed from the wal archive area
in any case. The files less than the current ones are being rm'ed.

I am sorry i am not able to get your apprehension. But i shall
surely try harder to understand your point.

anyways have a look at the current script with following improvements.

1. Do some sanity checks about folder existance and permissions
2. accepts 3 mandatory args now ,
    PGDATADIR , BACKUP DUMP FOLDER and WAL ARCHIVE AREA
3. use readlink -f to probe all the directories to be included in basebackup
4. Attempt to probe psql and rsync in system and bail out if not found.

Regarding :

> | 2. Frees disk space by removing unwanted LOG files in WAL_ARCHIVE_DIR
>
> Perhaps moving the old log files into a father backup directory and
> having them stick around for a period of time before removing them isn't
> a bad idea either, just in case something goes wrong with your latest
> backup. You could go about that using find as well; see the -ctime
> predicate in find(1).

the old log files without the base backup are not useful. since
rsync is being used to optimise the copying by overwriting the
base backup everytime, i dont thing preserving the old files
makes sense. Had it been and non overwritng backup the files
would have made sense.

----------------   BEGIN -------------------------------------------------
#!/bin/bash

##################################################
# it does following
# 1. checks existance and permission of imp folders.
# 2. takes base backup to a destined folder by rsync
# 3. removes unwanted archived log files.
##################################################

if [ $# -ne 3 ]
  then
        echo "Usage: $0 <DATADIR> <BACKUP DIRECTORY> <WAL ARCHIVE DIRECTORY>"
        exit 1
fi
DATADIR_IN=$1
BACKUPFOLDER=$2
WAL_ARCHIVE=$3

if [ -z $BACKUPFOLDER ] || [ ! -d $BACKUPFOLDER ] || [ ! -w $BACKUPFOLDER ]
   then
        echo "Sorry base backup folder $BACKUPFOLDER does not exists
or is not writable or is not specified!"
        exit 1
fi
if [ -z $WAL_ARCHIVE ] || [ ! -d $WAL_ARCHIVE ] || [ ! -w $WAL_ARCHIVE ]
   then
        echo "Sorry WAL archive folder $WAL_ARCHIVE does not exists or
is not writable or is not specified!"
        exit 1
fi
if [ -L $DATADIR_IN ]
   then
        DATADIR=`readlink -f $DATADIR_IN`
        echo "Using $DATADIR instead of $DATADIR_IN as $DATADIR_IN is a link"
   else
        DATADIR=$DATADIR_IN
fi

# get all tablespaces from $DATADIR/pg_tblspc
DIRS=(`find $DATADIR/pg_tblspc -type l -exec readlink -f {} \;`)
# append DATADIR to it
DIRS=( "${DIRS[@]}" $DATADIR)

CTR=0
echo "Script shall backup following folders"
while [ -n "${DIRS[${CTR}]}" ]; do
    echo "${DIRS[${CTR}]}"
    CTR=$((CTR + 1))
done
unset CTR

PSQL_BIN=`which psql` || /usr/local/pgsql/bin/psql
RSYNC_BIN=`which rsync` || /usr/bin/rsync

for PROG in $PSQL_BIN $RSYNC_BIN ; do
if [ ! -f $PROG ] || [ ! -x $PROG ]
   then
        echo "Sorry $PROG does not exists or is not executable by you"
        echo "Please set env variable PATH to include psql and rsync"
        exit 1
   else
        echo "Using $PROG"
fi
done

RSYNC_OPTS="--delete-after -a --exclude pg_xlog"
RSYNC="$RSYNC_BIN $RSYNC_OPTS"
PSQL=$PSQL_BIN

today=`date +%d-%m-%Y-%H-%M-%S`
label=base_backup_${today}

echo "Executing pg_start_backup with label $label in server ... "

# get the checkpoint at which backup starts
# the .backup files seems to be bearing this string in it.

CP=`$PSQL -q -Upostgres -d template1 -c "SELECT
pg_start_backup('$label');" -P tuples_only -P format=unaligned`

RVAL=$?
if [ $RVAL -ne 0 ]
        then
        echo "PSQL pg_start_backup failed:$CP"
        exit 1;
fi
echo "pg_start_backup executed successfully"

# read the backup_label file in pgdatadir and get the name of start wal file
# below is example content.
#START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
#CHECKPOINT LOCATION: E/A92939C
#START TIME: 2006-04-01 14:36:48 IST
#LABEL: base_backup_01-04-2006-14-36-45

BACKUP_LABEL=$DATADIR/backup_label # assuming pg_start_backup
immediate puts backup_label in
                                   # pgdatadir on finish.
START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL` # get the
like containing START WAL LOCATION
START_LINE=${START_LINE/#START*file /} # strip something like 'START
WAL LOCATION: E/A9145E4 (file ' from begin.
START_LINE=${START_LINE/%)/} # strip ')' from end.

# REF_FILE_NUM is something like 000000010000000A00000068
REF_FILE_NUM=$START_LINE

echo "Content of $BACKUP_LABEL"
echo "------------- begin -----------"
cat $BACKUP_LABEL
echo "------------- end -----------"
echo "Read Start Wal as : $REF_FILE_NUM"


echo "RSYNC begins.."

# rsync each of the folders to the backup folder.
CTR=0
while [ -n "${DIRS[${CTR}]}" ]; do
    echo "Syncing ${DIRS[${CTR}]}..."
    echo "Executing:${RSYNC} ${DIRS[${CTR}]} ${BACKUPFOLDER}"
    time ${RSYNC} ${DIRS[${CTR}]} ${BACKUPFOLDER}
    RVAL=$?
    echo "Sync finished with exit status ${RVAL}"
    if [[ ${RVAL} -eq 0 || ${RVAL} -eq 23  ]]; then
        echo "Rsync success"
    else
        echo "Rsync failed"
        $PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
        exit 1
    fi
    CTR=$((CTR + 1))
done
unset CTR


echo "Executing pg_stop_backup in server ... "
$PSQL -Upostgres template1 -c "SELECT pg_stop_backup();"
if [ $? -ne 0 ]
        then
        echo "PSQL pg_stop_backup failed"
        exit 1;
fi
echo "pg_stop_backup done successfully"


echo "REF_FILE_NUM=$REF_FILE_NUM"

# iterate list of files in the WAL_ARCHIVE folder
for i in `ls -1 $WAL_ARCHIVE` ;
do
        # $i is :000000010000000A0000005D.bz2 eg
        # get first 24 chars in filename
        FILE_NUM=${i:0:24}

        # compare if the number is less than the reference
        # here string comparison is being used.
        if [[ $FILE_NUM  < $REF_FILE_NUM ]]
        then
                echo "$FILE_NUM [ $i ] removed"
                rm -f $WAL_ARCHIVE/$i
        else
                echo "$FILE_NUM [ $i ] not removed"
        fi
done
------------------ END -----------------------------------------------------

Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Rajesh Kumar Mallah wrote:
|>| Do you see any problem in the current approach ?
|>|  i have seen it working fine till now.
|>
|>I do, to be honest. The WAL location counter accounts for 4294967295
|>positions and while I'm certain that's WAY more than the average number
|>of transactions that go into a WAL, quite a number of small ones can
|>certainly happen before a WAL is rolled over, and until then, you're
|>dealing with the same log file.
|>
|>If two backups happen in that period of time for whatever reason, you're
|>going to have a false positive by looking into ${WAL_ARCHIVE} and
|>searching just for the WAL name, so including the location in the search
|>of a WAL fragment is certainly necessary. Infact, going purely by
|>chance, the probability of hitting the same location in two different
|>log files in two subsequent backups is much lower than hitting the same
|>WAL twice.
|
| The current wal log is not being removed from the wal archive area
| in any case. The files less than the current ones are being rm'ed.
|
| I am sorry i am not able to get your apprehension. But i shall
| surely try harder to understand your point.

Hi Rajesh, list.

I'm sorry I didn't get back to you earlier, I was at an IBM business
conference for a couple of days; not to say it rendered me incapable of
communicating via e-mail, but it did bring along certain social
responsibilities which caused me to both stay up and sleep late, if you
know what I mean. :)

Let me explain the above predicament in more practical terms.

Let us say you started a backup very soon after a WAL had been rolled
over. Current WAL at that time was called, for example,
${PGDATA}/pg_xlog/000000010000000E0000000A. The location at that time
was 000F594A (iow, early in the WAL cycle). [disclaimer: all events in
this story are entirely fictional, any similarity to actual persons and
events is purely coincidental :) ]

pg_start_backup() will create a WAL backup:
~  ${PGDATA}/pg_xlog/000000010000000E0000000A.000F594A.backup

which will be archived to ${WAL_ARCHIVE} under the same name, or
possibly given a different extension, depending on archive_method. Let
us assume for the purpose of this explanation, that archive_method
consists only of cp -i </dev/null, although the problem would have been
identical if one used gzip -c, for example.

Now, this backup fails for whatever reason (rsync trouble, etc.). You
abort it and leave WAL archive as it was. You diagnose the problem that
caused the backup to fail and repeat the procedure. And since your
diagnostic skills are so good it took you almost no time to fix it, the
database engine is now at location 002D94AF in that _same_ WAL.

Once you restart the backup script, pg_start_backup() is called and
${PGDATA}/pg_xlog/000000010000000E0000000A.002D94AF.backup is created
and archived to ${WAL_ARCHIVE} under that same name.

Your method of discovering logs to delete will now match _two_ "current"
log file archives instead of one, because they both come from the same
WAL, fail to actually delete the stale one (the one from position
000F594A) and thus clutter your backup with irrelevant WAL fragments.

The second part of the second paragraph was only to expose that,
following the same logic as outlined above, if you take WAL locations as
the criterion of removing stale WAL fragments instead of WAL names, it
is far less likely to hit a false positive, because you would have to
pg_start_backup() _exactly_ 4294967296 locations after the first one.

Of course, you want to be unambiguous in your search of the perfect WAL
archive, so you want to use _both_ WAL name and location as the criterion.

| the old log files without the base backup are not useful. since
| rsync is being used to optimise the copying by overwriting the
| base backup everytime, i dont thing preserving the old files
| makes sense. Had it been and non overwritng backup the files
| would have made sense.

I see. I was assuming you used rsync to copy the database cluster
somewhere then tar it there, while it was lying still ("Fell, Destroyed"
of Fugazi comes to mind :) ).

I will get back to you with the review of your script later. A quick
scan reveals there is not much left to be improved, though.

Kind regards,
- --
~    Grega Bremec
~    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFEN68Vfu4IwuB3+XoRAxbjAJ49Hg6d9J0RZLywEZLr3WeD1sHevQCfSOm8
Pr3jEYb/+viTp/OAoJ6a5/M=
=9e0i
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
> Rajesh Kumar Mallah wrote:
> |>| Do you see any problem in the current approach ?
> |>|  i have seen it working fine till now.
> |>
> |>I do, to be honest. The WAL location counter accounts for 4294967295
> |>positions and while I'm certain that's WAY more than the average number
> |>of transactions that go into a WAL, quite a number of small ones can
> |>certainly happen before a WAL is rolled over, and until then, you're
> |>dealing with the same log file.
> |>
> |>If two backups happen in that period of time for whatever reason, you're
> |>going to have a false positive by looking into ${WAL_ARCHIVE} and
> |>searching just for the WAL name, so including the location in the search
> |>of a WAL fragment is certainly necessary. Infact, going purely by
> |>chance, the probability of hitting the same location in two different
> |>log files in two subsequent backups is much lower than hitting the same
> |>WAL twice.
> |
> | The current wal log is not being removed from the wal archive area
> | in any case. The files less than the current ones are being rm'ed.
> |
> | I am sorry i am not able to get your apprehension. But i shall
> | surely try harder to understand your point.
>
> Hi Rajesh, list.
>
> I'm sorry I didn't get back to you earlier, I was at an IBM business
> conference for a couple of days; not to say it rendered me incapable of
> communicating via e-mail, but it did bring along certain social
> responsibilities which caused me to both stay up and sleep late, if you
> know what I mean. :)
>
> Let me explain the above predicament in more practical terms.
>
> Let us say you started a backup very soon after a WAL had been rolled
> over. Current WAL at that time was called, for example,
> ${PGDATA}/pg_xlog/000000010000000E0000000A. The location at that time
> was 000F594A (iow, early in the WAL cycle). [disclaimer: all events in
> this story are entirely fictional, any similarity to actual persons and
> events is purely coincidental :) ]
>
> pg_start_backup() will create a WAL backup:
> ~  ${PGDATA}/pg_xlog/000000010000000E0000000A.000F594A.backup
>
> which will be archived to ${WAL_ARCHIVE} under the same name, or
> possibly given a different extension, depending on archive_method. Let
> us assume for the purpose of this explanation, that archive_method
> consists only of cp -i </dev/null, although the problem would have been
> identical if one used gzip -c, for example.
>
> Now, this backup fails for whatever reason (rsync trouble, etc.). You
> abort it and leave WAL archive as it was. You diagnose the problem that
> caused the backup to fail and repeat the procedure. And since your
> diagnostic skills are so good it took you almost no time to fix it, the
> database engine is now at location 002D94AF in that _same_ WAL.
>
> Once you restart the backup script, pg_start_backup() is called and
> ${PGDATA}/pg_xlog/000000010000000E0000000A.002D94AF.backup is created
> and archived to ${WAL_ARCHIVE} under that same name.
>
> Your method of discovering logs to delete will now match _two_ "current"
> log file archives instead of one, because they both come from the same
> WAL, fail to actually delete the stale one (the one from position
> 000F594A) and thus clutter your backup with irrelevant WAL fragments.

Dear Grega,
Thanks for the reply.

now i have started understanding !

Is cluttering of the wal archive area in cases where that backup
had to be re-started for whatever reasons is the *only* concern ?

if its so , we should not be too much bothered becoz in the next
successfull backup the extra clutter will get deleted.

if there are other concern please lemme know.


>
> The second part of the second paragraph was only to expose that,
> following the same logic as outlined above, if you take WAL locations as
> the criterion of removing stale WAL fragments instead of WAL names, it
> is far less likely to hit a false positive, because you would have to
> pg_start_backup() _exactly_ 4294967296 locations after the first one.
>
> Of course, you want to be unambiguous in your search of the perfect WAL
> archive, so you want to use _both_ WAL name and location as the criterion.
>
> | the old log files without the base backup are not useful. since
> | rsync is being used to optimise the copying by overwriting the
> | base backup everytime, i dont thing preserving the old files
> | makes sense. Had it been and non overwritng backup the files
> | would have made sense.
>
> I see. I was assuming you used rsync to copy the database cluster
> somewhere then tar it there, while it was lying still ("Fell, Destroyed"
> of Fugazi comes to mind :) ).
>
> I will get back to you with the review of your script later. A quick
> scan reveals there is not much left to be improved, though.


Please do not put too much effort, as i the drives in my other
server has got installed and i am adapting the script for doing
remote backup ( which is a more common senerio).

Thank You
Regds
Rajesh Kumar Mallah.

>

Re: FW: Setting up of PITR system.

От
Bruce Momjian
Дата:
Tom Lane wrote:
> It definitely is a pain in the neck that GNU tar complains about files
> changing underneath it --- I've looked for a way to disable that, or at
> least reduce it to a warning instead of an error condition, but gtar
> doesn't seem to have such a switch.  You should try alternative backup
> tools such as cpio or rsync.

You mean the tar exits or that it just returns an error code on
completion?

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FW: Setting up of PITR system.

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> It definitely is a pain in the neck that GNU tar complains about files
>> changing underneath it --- I've looked for a way to disable that, or at
>> least reduce it to a warning instead of an error condition, but gtar
>> doesn't seem to have such a switch.  You should try alternative backup
>> tools such as cpio or rsync.

> You mean the tar exits or that it just returns an error code on
> completion?

I don't recall whether it finishes making the tarball, but it definitely
returns nonzero exit status, which makes it effectively unusable in any
automated script (you certainly don't want to ignore exit status in a
backup script ...)

            regards, tom lane

Re: FW: Setting up of PITR system.

От
"Spiegelberg, Greg"
Дата:
You should implement filesystem or volume level snapshots.  LVM, Veritas and the like all have the functionality and it gets the database out of "backup" mode quickly.
 
 
Before the lvcreate -s command issue the pg_start_backup() and after pg_stop_backup().  Then do what you will with the snapshot volume.  Just in case a fsck may be necessary I'd recommend mounting read-only.
 
Caveat emptor, I believe certain filesystems take issue with this, like xfs, however we have not seen anything unusual with ext3.  Also note that you must have some space unallocated in the proper LVM disk group for the snapshot volume.
 
Greg


From: pgsql-admin-owner@postgresql.org on behalf of Tom Lane
Sent: Mon 4/10/2006 7:19 PM
To: Bruce Momjian
Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; pgsql-admin@postgresql.org
Subject: Re: FW: [ADMIN] Setting up of PITR system.

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> It definitely is a pain in the neck that GNU tar complains about files
>> changing underneath it --- I've looked for a way to disable that, or at
>> least reduce it to a warning instead of an error condition, but gtar
>> doesn't seem to have such a switch.  You should try alternative backup
>> tools such as cpio or rsync.

> You mean the tar exits or that it just returns an error code on
> completion?

I don't recall whether it finishes making the tarball, but it definitely
returns nonzero exit status, which makes it effectively unusable in any
automated script (you certainly don't want to ignore exit status in a
backup script ...)

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly

Re: FW: Setting up of PITR system.

От
Christian Kratzer
Дата:
Hi,

On Tue, 11 Apr 2006, Spiegelberg, Greg wrote:
> You should implement filesystem or volume level snapshots.  LVM,
> Veritas and the like all have the functionality and it gets the
> database out of "backup" mode quickly.

from what I understand there is no need to keep the time between
pg_start_backup and pg_stop_backup small.  The datebase is in no
way restrcited in its operation or performance during this time.

The only magic seems to be the marker file containing the information
which transaction were commited at the time the backup started.

As we run freebsd and our tar does not complain of files changing
underneath we prefer to avoid the additional complexity and of
filesystem snapshots and just plain tar the data directory
without the pg_xlog subdirectory.

Greetings
Christian

--
Christian Kratzer                       ck@cksoft.de
CK Software GmbH                        http://www.cksoft.de/
Phone: +49 7452 889 135                 Fax: +49 7452 889 136

Re: FW: Setting up of PITR system.

От
Bruce Momjian
Дата:
We already recommend file system snapshots as a backup method in our
documentation.

---------------------------------------------------------------------------

Spiegelberg, Greg wrote:
> You should implement filesystem or volume level snapshots.  LVM, Veritas and the like all have the functionality and
itgets the database out of "backup" mode quickly. 
>
> http://www.tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html
>
> Before the lvcreate -s command issue the pg_start_backup() and after pg_stop_backup().  Then do what you will with
thesnapshot volume.  Just in case a fsck may be necessary I'd recommend mounting read-only. 
>
> Caveat emptor, I believe certain filesystems take issue with this, like xfs, however we have not seen anything
unusualwith ext3.  Also note that you must have some space unallocated in the proper LVM disk group for the snapshot
volume.
>
> Greg
>
> ________________________________
>
> From: pgsql-admin-owner@postgresql.org on behalf of Tom Lane
> Sent: Mon 4/10/2006 7:19 PM
> To: Bruce Momjian
> Cc: andy.shellam@mailnetwork.co.uk; 'Rajesh Kumar Mallah'; pgsql-admin@postgresql.org
> Subject: Re: FW: [ADMIN] Setting up of PITR system.
>
>
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> It definitely is a pain in the neck that GNU tar complains about files
> >> changing underneath it --- I've looked for a way to disable that, or at
> >> least reduce it to a warning instead of an error condition, but gtar
> >> doesn't seem to have such a switch.  You should try alternative backup
> >> tools such as cpio or rsync.
>
> > You mean the tar exits or that it just returns an error code on
> > completion?
>
> I don't recall whether it finishes making the tarball, but it definitely
> returns nonzero exit status, which makes it effectively unusable in any
> automated script (you certainly don't want to ignore exit status in a
> backup script ...)
>
>                         regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FW: Setting up of PITR system.

От
Grega Bremec
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Rajesh Kumar Mallah wrote:
> Is cluttering of the wal archive area in cases where that backup
> had to be re-started for whatever reasons is the *only* concern ?

Well, yes, to be honest. But it may in consequence cause problems of
another kind, which I don't feel qualified to reassure you on: I am not
positive about what happens if you try to replay an old WAL on a current
database backup.

If nothing else, it is going to make the person restoring the backup
rather unnerved about success of the operation they are currently
performing, which is not a good thing, IMHO. They are restoring a backup
afterall, which means they'd already undergone a fair amount of stress
as it is. :)

End all, it is your choice to decide which is more trouble and which is
worth more: fixing the script to produce clean backups or informing your
backup operators about the extra care they need to take when restoring
backups.

> Please do not put too much effort, as i the drives in my other
> server has got installed and i am adapting the script for doing
> remote backup ( which is a more common senerio).

Very nice! How is it going? And how are you copying the WALs? scp? rsync?

Kind regards,
- --
    Grega Bremec
    gregab at p0f dot net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFEPgU9fu4IwuB3+XoRA+MeAJ0dbbfcgBqP9SCYq0VICN8xrtGN0wCffE6i
kq1LlDwlJwmfrOtwRBwGqFg=
=olf3
-----END PGP SIGNATURE-----

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:


On 4/13/06, Grega Bremec <gregab@p0f.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
>
> Rajesh Kumar Mallah wrote:
> > Is cluttering of the wal archive area in cases where that backup
> > had to be re-started for whatever reasons is the *only* concern ?
>
> Well, yes, to be honest. But it may in consequence cause problems of
> another kind, which I don't feel qualified to reassure you on: I am not
> positive about what happens if you try to replay an old WAL on a current
> database backup.
>
> If nothing else, it is going to make the person restoring the backup
> rather unnerved about success of the operation they are currently
> performing, which is not a good thing, IMHO. They are restoring a backup
> afterall, which means they'd already undergone a fair amount of stress
> as it is. :)
>
> End all, it is your choice to decide which is more trouble and which is
> worth more: fixing the script to produce clean backups or informing your
> backup operators about the extra care they need to take when restoring
> backups.

Dear Grega,

gald to see the clarification.

The concern you have is valid only if something goes wrong
after pg_start_backup() . In such case the backup admin
can get notified in advance by using the MAILTO env variable
in crontab. so its not that we are stressing him in already stressed
situation.  

the reason of my aversion in incorporating you suggestion is however
different. There does not seem to be any realiable way of getting the
name of .backup file from the contents in backup_label. consider the
method you gave.

REF_FILE="`grep 'START WAL LOCATION' ${BACKUP_LABEL} | \
   awk '{
            sub(/)/, "", $6);
            sub(/[0-9A-F]\//, "", $4);
            printf("%s.%08s.backup", $6, $4);
   }'`"

firstly i feel there should be a + after [0-9A-F] , when i run the command on
my data following is the outcome.

-bash-2.05b$ cat  /mnt/disk2/base_backups/pgdatadir/backup_label
START WAL LOCATION: 1C/4C7E5E90 (file 000000010000001C0000004C)
CHECKPOINT LOCATION: 1C/4C824A44
START TIME: 2006-04-12 21:32:43 IST
LABEL: base_backup_12-04-2006-21-32-52

-bash-2.05b$ grep 'START WAL'  /mnt/disk2/base_backups/pgdatadir/backup_label | awk '{ sub (/)/, "", $6); sub(/[0-9A-F]+\//, "", $4); printf("%s.%08s.backup\n", $6, $4 );  }'
000000010000001C0000004C.4C7E5E90.backup

-bash-2.05b$ ls -l /mnt/disk2/wal_archive/*.backup
-rw-------    1 postgres postgres      270 Apr 12 21:53 /mnt/disk2/wal_archive/000000010000001C0000004C.007E5E90.backup


was you can see the predicted name
000000010000001C0000004C.4C7E5E90.backup is not same as
000000010000001C0000004C.007E5E90.backup

one may argue that replacing first 2 charachers of 4C7E5E90 with
'0's may yeild the correct result, but it is not so , i have observed
a case which runs against this hypothesis.

therefore unless someone tells the correct method of derieving
the filename , the current approach is the best that can be done.
I really hope some guru to throw some light here. i tried digging
xlog.c and xlogutils.c without any success.


>
> > Please do not put too much effort, as i the drives in my other
> > server has got installed and i am adapting the script for doing
> > remote backup ( which is a more common senerio).
>
> Very nice! How is it going? And how are you copying the WALs? scp? rsync?

I finished it long back but i was waiting for comments from the list
regarding the concern to which you replied. I think we discuss the
new version once this issue is resolved.

Regds
Rajesh Kumar Mallah.

>
> Kind regards,
> - --
>     Grega Bremec
>     gregab at p0f dot net
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.0 (GNU/Linux)
>
> iD8DBQFEPgU9fu4IwuB3+XoRA+MeAJ0dbbfcgBqP9SCYq0VICN8xrtGN0wCffE6i
> kq1LlDwlJwmfrOtwRBwGqFg=
> =olf3
> -----END PGP SIGNATURE-----
>

Re: FW: Setting up of PITR system.

От
"Rajesh Kumar Mallah"
Дата:
Hi List,

please find below a

1. A script that takes remote base backups of a postgresql database.
it assumes a password less communication between the backup
server and the database server. After taking a base backup it removes
that un neccesary WAL log files from the archive folder. The script has been
running in my server for past few weeks without any problems. It takes
apprx 40 mins to backup around 40 GB data.

sample execution on the backup server
$ ./pg_rsync.sh 216.247.238.130 /mnt/disk2/base_backups  /mnt/disk2/wal_archive
or in crontab as
30 20 * * * /usr/bin/time /home/postgres/pg_rsync.sh 216.247.238.130 /mnt/disk2/base_backups  /mnt/disk2/wal_archive

Note: script also assumes that you have setup remote wal archiving using a suitable
archive_command in postgresql.conf , these scripts are *not* posted here.


any suggestions welcome.


------------------------ script begins (pg_rsync.sh)------------------------------------------------
#!/bin/bash

##################################################
# it does following
# 1. checks existance and permission of imp folders.
# 2. takes base backup to a destined folder by rsync
# 3. removes unwanted archived log files.
# Contributions: Grega Bremec (gregab@p0f.net) ,
#                        Alvaro Herrera (alvherre@commandprompt.com)
# Preliminary Version: Mallah (mallah.rajesh@gmail.com)
##################################################

if [ $# -ne 3 ]
  then
        echo "Usage: $0 <HOSTNAME> <BACKUP DIRECTORY> <WAL ARCHIVE DIRECTORY>"
        exit 1
fi
HOSTNAME=$1
BACKUPFOLDER=$2
WAL_ARCHIVE=$3



echo "HOSTNAME=$HOSTNAME"
echo "BACKUPFOLDER=$BACKUPFOLDER"
echo "WAL_ARCHIVE=$WAL_ARCHIVE"

if [ -z $BACKUPFOLDER ] || [ ! -d $BACKUPFOLDER ] || [ ! -w $BACKUPFOLDER ]
   then
        echo "Sorry base backup folder $BACKUPFOLDER does not exists or is not writable or is not specified!"
        exit 1
fi
if [ -z $WAL_ARCHIVE ] || [ ! -d $WAL_ARCHIVE ] || [ ! -w $WAL_ARCHIVE ]
   then
        echo "Sorry WAL archive folder $WAL_ARCHIVE does not exists or is not writable or is not specified!"
        exit 1
fi

PSQL_BIN=`which psql` || /usr/local/pgsql/bin/psql
RSYNC_BIN=`which rsync` || /usr/bin/rsync
SSH_BIN=`which ssh` || /usr/bin/ssh

for PROG in $PSQL_BIN $RSYNC_BIN $SSH_BIN ; do
if [ ! -f $PROG ] || [ ! -x $PROG ]
   then
        echo "Sorry $PROG does not exists or is not executable by you"
        echo "Please set env variable PATH to include psql and rsync"
        exit 1
   else
        echo "Using $PROG"
fi
done

RSYNC_OPTS="--delete-after -a -e $SSH_BIN --exclude pg_xlog"
RSYNC="$RSYNC_BIN $RSYNC_OPTS"
PSQL=$PSQL_BIN

today=`date +%d-%m-%Y-%H-%M-%S`
label=base_backup_${today}

echo "Querying data_directory and tablespace folders from server $HOSTNAME"
DATA_DIR=`$PSQL -q -Upostgres -d template1 -h $HOSTNAME -c "show data_directory;" -P tuples_only -P format=unaligned 2>&1`
RVAL=$?
if [ $RVAL -ne 0 ]
        then
        echo "Some error in getting data_directory:$DATA_DIR"
        exit 1;
fi
echo "DATA_DIR:$DATA_DIR"

TBL_SPCS=(`$PSQL -q -Upostgres -d template1 -h $HOSTNAME -c "SELECT  spclocation from pg_catalog.pg_tablespace where length(spclocation)>0;" -P tuples_only -P format=unaligned`)
RVAL=$?
if [ $RVAL -ne 0 ]
        then
        echo "There is some problem in getting table spaces"
        exit 1;
fi
DIRS=( "${TBL_SPCS[@]}" $DATA_DIR)
echo "Folders for Backup:"
CTR=0
while [ -n "${DIRS[${CTR}]}" ]; do
    echo -n "${DIRS[${CTR}]} --> "
    DIRS[${CTR}]=`ssh $HOSTNAME readlink -f ${DIRS[${CTR}]}`
    echo "${DIRS[${CTR}]} (after symlink resolution)"
    CTR=$((CTR + 1))
done
unset CTR




CP=`$PSQL -q -Upostgres -d template1 -h$HOSTNAME -c "SELECT pg_start_backup('$label');" -P tuples_only -P format=unaligned 2>&1`

RVAL=$?
if [ $RVAL -ne 0 ]
        then
        echo "PSQL pg_start_backup failed:$CP"
        exit 1;
fi
echo "pg_start_backup executed successfully: $CP"



echo "RSYNC begins.."

# rsync each of the folders to the backup folder.
CTR=0
while [ -n "${DIRS[${CTR}]}" ]; do
    echo "Syncing ${DIRS[${CTR}]}..."
    time ${RSYNC} $HOSTNAME:${DIRS[${CTR}]} ${BACKUPFOLDER}
    RVAL=$?
    echo "Sync finished with exit status ${RVAL}"
    if [[ ${RVAL} -eq 0 || ${RVAL} -eq 23  ]]; then
        echo "Rsync success"
    else
        echo "Rsync failed"
        $PSQL -Upostgres -h$HOSTNAME template1 -c "SELECT pg_stop_backup();"
        exit 1
    fi
    CTR=$((CTR + 1))
done
unset CTR


echo "Executing pg_stop_backup in server ... "
$PSQL -Upostgres -h$HOSTNAME template1 -c "SELECT pg_stop_backup();"
if [ $? -ne 0 ]
        then
        echo "PSQL pg_stop_backup failed"
        exit 1;
fi
echo "pg_stop_backup done successfully"

# read the backup_label file in pgdatadir and get the name of start wal file
# below is example content.
#START WAL LOCATION: E/A9145E4 (file 000000010000000E0000000A)
#CHECKPOINT LOCATION: E/A92939C
#START TIME: 2006-04-01 14:36:48 IST
#LABEL: base_backup_01-04-2006-14-36-45

DATA_DIR_NAME=`basename $DATA_DIR`
BACKUP_LABEL=$BACKUPFOLDER/$DATA_DIR_NAME/backup_label
echo "BACKUP_LABEL: $BACKUP_LABEL"

START_LINE=`grep -i  "START WAL LOCATION"  $BACKUP_LABEL` # get the like containing START WAL LOCATION
START_LINE=${START_LINE/#START*file /} # strip something like 'START WAL LOCATION: E/A9145E4 (file ' from begin.
START_LINE=${START_LINE/%)/} # strip ')' from end.

# REF_FILE_NUM is something like 000000010000000A00000068
REF_FILE_NUM=$START_LINE

echo "Content of $BACKUP_LABEL"
echo "------------- begin -----------"
cat $BACKUP_LABEL
echo "------------- end -----------"
echo "Read Start Wal as : $REF_FILE_NUM"

echo "REF_FILE_NUM=$REF_FILE_NUM"

# iterate list of files in the WAL_ARCHIVE folder
for i in `ls -1 $WAL_ARCHIVE` ;
do
        # $i is :000000010000000A0000005D.bz2 eg
        # get first 24 chars in filename
        FILE_NUM=${i:0:24}

        # compare if the number is less than the reference
        # here string comparison is being used.
        if [[ $FILE_NUM  < $REF_FILE_NUM ]]
        then
                echo "$FILE_NUM [ $i ] removed"
                rm -f $WAL_ARCHIVE/$i
        else
                echo "$FILE_NUM [ $i ] not removed"
        fi
done
#------------------------- script ends -----------------------------------------------------------------