Discussion: Reliable WAL file shipping over unreliable network
Hello!

Let's suppose that a replication master is writing a WAL segment file into a directory, and that directory is mounted on the replication slave. Is it possible that the slave will try to read a WAL segment file that is not yet fully flushed to disk? I did not find any requirement about this in the official documentation. How does it work? Do I have to copy segments to temp files and rename them when they are fully flushed to disk? Or is it okay to have half-complete files in the archive dir for a while?

Actually, I cannot mount the slave's archive directory on the master, because the network is not reliable. The WAL files need to be copied from the master to the slave. The network can go down in the middle of a copy operation. I can write a program that detects this and retries the copy when the network comes back. But what happens on the slave side? What will a replication slave do if it sees a half-complete WAL file?

Is there a utility that is widely used for WAL file shipping and addresses these problems? I know that PostgreSQL itself does not care about how the log files are shipped, but I suspect that there are robust, proven methods for shipping WAL files over unreliable networks.

And finally: if I also enable streaming replication, then it seems that log file shipping is not needed at all. If I omit archive_command and restore_command from the configs and set up only the replication slots and primary_conninfo, then it seems to be working just fine. But when the network goes down for a while, the slave goes out of sync and cannot recover. It was not clear to me from the documentation, but am I right that I can combine log file shipping with streaming replication, and achieve small replication delays plus the ability to recover after a longer period of network outage?

Thanks,

Laszlo
Have you tried pglogical? It should work fine for cases like yours, and I know first hand that it works well on unstable connections: I have an 18-slave pglogical cluster over bad connections, and it works pretty well.
Regards,
Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.
Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103 | RPC: (+51) 954183248
Website: www.ocs.pe
Sent from my Sony Xperia™ smartphone
---- Nagy László Zsolt wrote ----
Hello!
Let's suppose that a replication master is writing a WAL segment file
into a directory. That directory is mounted on the replication slave. Is
it possible that the slave will try to read a WAL segment file that is
not yet fully flushed to disk? I did not find any requirement in the
official documentation about this. How does it work? Do I have to copy
segments to temp files, and rename them when they are fully flushed to
disk? Or is it okay to have half complete files in the archive dir for a
while?
Actually I cannot mount the slave's archive directory on the master,
because the network is not reliable. The WAL files need to be copied
from the master to the slave. The network can go down in the middle of a
copy operation. I can write a program that detects this, and retries the
copy when the network comes back. But what happens on the slave side?
What will a replication slave do if it sees a half complete WAL file?
Is there a utility that is widely used for WAL file shipping, and
addresses these problems? I know that PostgreSQL itself does not care
about how the log files are shipped, but I suspect that there are
robust, proven methods for shipping WAL files over unreliable networks.
And finally: if I also enable streaming replication, then it seems that
log file shipping is not needed at all. If I omit archive_command and
restore_command from the configs, and setup the replication slots and
primary_conninfo only, then it seems to be working just fine. But when
the network goes down for a while, then the slave goes out of sync and
it cannot recover. It was not clear for me from the documentation, but
am I right in that I can combine log file shipping with streaming
replication, and achieve small replication delays plus the ability to
recover after a longer period of network outage?
Thanks,
Laszlo
On Feb 28, 2018, at 7:57 AM, Nagy László Zsolt <gandalf@shopzeus.com> wrote:
>
> Is there a utility that is widely used for WAL file shipping, and
> addresses these problems?

rsync

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
On 2/28/18 9:57 AM, Nagy László Zsolt wrote:
> but am I right in that I can combine log file shipping with streaming
> replication, and achieve small replication delays plus the ability to
> recover after a longer period of network outage?

Yes. You might look at adjusting your wal_keep_segments to retain WAL for a period longer than expected network failures. This might require some storage planning/adjustments.
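For illustration, the setting being referred to is an ordinary postgresql.conf parameter on the primary; the value below is an arbitrary example and should be sized from your WAL generation rate and the longest outage you want to survive:

# postgresql.conf on the primary (illustrative value only)
wal_keep_segments = 1024    # keep at least 1024 segments in pg_wal (16 GB at the default 16 MB segment size)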
>> Is there a utility that is widely used for WAL file shipping, and
>> addresses these problems?
> rsync

Should I use rsync with the --remove-source-files option? Will that solve all problems with unreliable networks?
>>> Is there a utility that is widely used for WAL file shipping, and
>>> addresses these problems?
>> rsync
> Should I use rsync with the --remove-source-files option? Will that solve
> all problems with unreliable networks?

Another problem with rsync might be that it does not know whether a WAL file on the source side is still growing (e.g. being written), and it might start copying it before it is fully flushed to disk. I see this as a big problem, but I don't have experience with it.

My first idea to address this is to execute an archive command something like this:

test ! -f /pg_wal/%f && cp %p /volume/pg_wal_tmp/%f && mv /volume/pg_wal_tmp/%f /volume/pg_wal/%f

E.g. move the file into the archive dir only after it has been fully copied to the destination volume. But then we still have the problem of partially written files on the slave side.

I can come up with other ideas, but I do not want to reinvent the wheel. Can somebody please tell me how to deal with partially written WAL files?

Thanks,

Laszlo
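As a sketch only, the copy-then-rename idea above could be wrapped into a small archive_command script like the following. The script name and paths are made up for illustration, and it assumes the archive volume is locally mounted; the rename is only atomic because the temp directory and the archive directory sit on the same filesystem:

#!/bin/bash
# archive_wal.sh  -- hypothetical wrapper, called as: archive_command = 'archive_wal.sh %p %f'
SRC="$1"     # %p: path of the finished WAL segment under the data directory
FNAME="$2"   # %f: bare file name of the segment
ARCHIVE=/volume/pg_wal
TMP=/volume/pg_wal_tmp

[ -f "$ARCHIVE/$FNAME" ] && exit 1        # never overwrite an already archived segment
cp "$SRC" "$TMP/$FNAME" || exit 1         # write the data where readers never look...
exec mv "$TMP/$FNAME" "$ARCHIVE/$FNAME"   # ...then publish it with an atomic rename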
On Feb 28, 2018, at 8:46 AM, Nagy László Zsolt <gandalf@shopzeus.com> wrote:
>
> Another problem with rsync might be that it does not know if a WAL file
> on the source side is growing (e.g. being written) and it might start
> copy that before it is fully flushed to disk. I see this as a big
> problem, but I don't have experience.

You use rsync in the archive command, not by itself. Postgres does not try to copy the file until it is ready.

> E.g. move the file into the archive dir only after it has been fully
> copied to the destination volume. But then we still have the problem of
> partially written files on the slave side.

No, you do not have that problem at all.

Given an unreliable network, you probably do want to use --partial

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
Just use "-ac”; you want -c option to ensure no data corruption during the transfer. Do not delete the file; let Postgres manage that.
Here is a snippet from I script I use for archiving. You also want to make your script returns failure or success correctly.
# SSH Command and options
SSH_CMD="ssh -o ServerAliveInterval=20 $ARCH_SERVER"
STS=3
OUTPUT=$(rsync -ac --rsync-path="mkdir -p $ARCH_DIR && rsync" $XLOGFILE $ARCH_SERVER:$ARCH_DIR/$WALFILE)
if [ $? -eq 0 ]; then
    STS=0
fi
exit $STS
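For readers wiring something like this up, here is a hedged example of how the snippet's variables might get populated; the script path and host name are assumptions, not part of the original snippet:

# postgresql.conf (illustrative)
archive_mode = on
archive_command = '/usr/local/bin/archive_wal.sh %p %f'

# inside such a script, the variables used above would then be set roughly as:
XLOGFILE="$1"                     # %p: path to the segment in pg_wal
WALFILE="$2"                      # %f: bare segment file name
ARCH_SERVER=standby.example.com   # archive host
ARCH_DIR=/transfer/wal            # archive directory on that host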
>> Another problem with rsync might be that it does not know if a WAL file
>> on the source side is growing (e.g. being written) and it might start
>> copy that before it is fully flushed to disk. I see this as a big
>> problem, but I don't have experience.
> You use rsync in the archive command, not by itself. Postgres does not try to copy the file until it is ready.

Well, that will be a problem, because postgres is running in an isolated docker container. But that is a totally different problem. I can see that if rsync is used in the archive command, then postgres will retry unsuccessful transfers on the source side.

>> E.g. move the file into the archive dir only after it has been fully
>> copied to the destination volume. But then we still have the problem of
>> partially written files on the slave side.
> No, you do not have that problem at all.

Can you please explain how postgres on the replication slave can determine whether the file in the archive dir is completely flushed to disk?

> Given an unreliable network, you probably do want to use --partial

--partial can resume interrupted transfers. But on the slave side, there will be partially transferred files in the archive dir, maybe for a long time. As far as I understand, the restore command will not (and cannot) examine whether the source file has been completely copied. It will just be called, and if there is a file with the expected name, then it will copy it into pg_wal. Maybe I'm missing a key concept, but I still don't see how it works.

L
> Just use "-ac"; you want the -c option to ensure no data corruption during the transfer. Do not delete the file; let Postgres manage that.
>
> Here is a snippet from a script I use for archiving. You also want to make sure your script returns failure or success correctly.

Thanks for the script. So I need to use this on the master side in archive_command. It ensures that postgres will retry the transfer of partially transferred files until it succeeds. Unfortunately, I cannot use this in my docker container, which is isolated and does not contain SSH or rsync. But I get the idea, and I can come up with a simple script that runs on the host machine and transfers these files reliably to the slave side.
But I still don't understand what happens on the slave side when the slave tries to use a partially transferred WAL file. I have this recovery.conf on the slave side:
standby_mode='on'
primary_conninfo='host=postgres-master port=5432 user=not_telling password=not_telling'
trigger_file='/backup/trigger'
restore_command = 'cp /transfer/%f %p'
archive_cleanup_command = 'pg_archivecleanup /transfer %r'
So what happens when the slave postgres executes restore_command on a WAL file that was transferred partially? It cannot test the file for completeness before it has been copied to pg_wal. The restore command itself also cannot tell whether the file is complete or not. What does PostgreSQL do when it sees an incomplete file in pg_wal? Does it detect that the file was incomplete and try to execute the restore_command again? When? How often?

It seems inefficient to execute the restore_command multiple times just to find out that the file is not yet complete, but if this is how it works, then it is fine with me. But does it really work that way? I don't see it documented (or maybe I'm looking in the wrong place).
Sorry for the many questions. I need to understand every detail of this before we do it in production.
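For what it's worth, a restore_command does not have to be a bare cp; a wrapper can refuse to hand over a file that is obviously too short, so PostgreSQL simply fails that attempt and retries later. The script below is only a sketch under assumptions: the name is hypothetical, and the expected size must match your cluster's WAL segment size (16 MB by default; the later examples in this thread use 64 MB segments):

#!/bin/bash
# restore_wal.sh  -- hypothetical wrapper, called as: restore_command = 'restore_wal.sh /transfer/%f %p'
SRC="$1"; DST="$2"
EXPECTED=16777216        # assumed 16 MB WAL segment size; adjust if your cluster uses a different size
[ -f "$SRC" ] || exit 1                                            # not shipped yet: fail, PostgreSQL retries later
case "$SRC" in
    *.history|*.backup) ;;                                         # timeline/backup files are small; skip the size check
    *)
        SIZE=$(stat -c %s "$SRC" 2>/dev/null || stat -f %z "$SRC") # GNU stat, falling back to BSD stat
        [ "$SIZE" -eq "$EXPECTED" ] || exit 1                      # still partial: refuse to hand over a short file
        ;;
esac
exec cp "$SRC" "$DST"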
On Feb 28, 2018, at 10:05 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
>
> Just use "-ac"; you want the -c option to ensure no data corruption during the transfer. Do not delete the file; let Postgres manage that.

-c has nothing to do with data corruption during transfer

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
Easiest thing to do is to keep the replica from seeing the file until it’s successfully transferred; i.e.
1. Transfer the file successfully first with a temporary extension: $ARCH_DIR/$WALFILE.swap
2. If successfully transferred then move it into place:
mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE
3. Return status success/failure code back to primary
There seem to be 2 fundamental misunderstandings here:

1) That other processes cannot see data written to a file until it is flushed to disk; this is not true; while file data is still in file cache, it is visible to other processes.

2) That rsync writes the file on the destination directly; it does not; it writes into a temporary file and renames that file when it is complete.

> On Feb 28, 2018, at 10:53 AM, Nagy László Zsolt <gandalf@shopzeus.com> wrote:
>
>> Just use "-ac"; you want the -c option to ensure no data corruption during the transfer. Do not delete the file; let Postgres manage that.
>>
>> Here is a snippet from a script I use for archiving. You also want to make sure your script returns failure or success correctly.
>>
>> # SSH Command and options
>> SSH_CMD="ssh -o ServerAliveInterval=20 $ARCH_SERVER"
>> STS=3
>>
>> OUTPUT=$(rsync -ac --rsync-path="mkdir -p $ARCH_DIR && rsync" $XLOGFILE $ARCH_SERVER:$ARCH_DIR/$WALFILE)
>> if [ $? == 0 ]; then
>> STS=0
>> fi
>>
>> exit $STS

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
Sure it does… what happens if the archive server gets full? There is a situation that can fool rsync into thinking it was successfully transferred, as rsync only relies on filesystem metadata to verify the file is different. The "-c" option will validate the file using a checksum and not just rely on filesystem metadata.

Trust me on this one; I've seen this happen.

> On Feb 28, 2018, at 1:05 PM, scott ribe <scott_ribe@elevated-dev.com> wrote:
>
> On Feb 28, 2018, at 10:05 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
>>
>> Just use "-ac"; you want the -c option to ensure no data corruption during the transfer. Do not delete the file; let Postgres manage that.
>
> -c has nothing to do with data corruption during transfer
>
> --
> Scott Ribe
> https://www.linkedin.com/in/scottribe/
> (303) 722-0567
On Feb 28, 2018, at 1:12 PM, scott ribe <scott_ribe@elevated-dev.com> wrote:
> There seem to be 2 fundamental misunderstandings here:
>
> 1) That other processes cannot see data written to a file until it is flushed to disk; this is not true; while file data is still in file cache, it is visible to other processes.
>
> 2) That rsync writes the file on the destination directly; it does not; it writes into a temporary file and renames that file when it is complete.

While you're correct, I never made either of those assumptions.
On Wed, 28 Feb 2018 13:15:48 -0500 Rui DeSousa <rui.desousa@icloud.com> wrote:
> Sure it does… what happens if the archive server gets full? There is
> a situation that can fool rsync into thinking it was successfully
> transferred as rsync only relies on filesystem metadata to verify the
> file is different.

That's incorrect. The original transfer will fail and rsync will remove the temporary file. It doesn't rename the file to the final destination until and unless the transfer is successful.

"-c" is only used for files that might change multiple times per second; since most file systems have only 1-second granularity in file timestamps, you can't rely on a file being identical if the size and timestamps are the same.

> The "-c" option will validate the file using a checksum and not just
> rely on just filesystem metadata.

That's incorrect. "-c" applies only when rsync looks at existing files to see if they differ. It does not affect what happens when rsync actually copies the file data over; that's always verified with a checksum.

Regards,

Dianne.
Really? I think you really need to think about it. You are dealing with a race condition. While your thought process might seem to work and it will appear to work, in the end it will fail — like I said, I've seen it fail first hand.

You are dealing with a race condition involving filesystem metadata. If you choose not to use the "-c" option, then all I can tell you is that you have been warned.

> On Feb 28, 2018, at 1:26 PM, Dianne Skoll <dfs@roaringpenguin.com> wrote:
>
> On Wed, 28 Feb 2018 13:15:48 -0500
> Rui DeSousa <rui.desousa@icloud.com> wrote:
>
>> Sure it does… what happens if the archive server gets full? There is
>> a situation that can fool rsync into thinking it was successfully
>> transferred as rsync only relies on filesystem metadata to verify the
>> file is different.
>
> That's incorrect. The original transfer will fail and rsync will remove
> the temporary file. It doesn't rename the file to the final destination
> until and unless the transfer is successful.
>
> "-c" is only used for files that might change multiple times per second;
> since most file systems have only 1-second granularity in file timestamps,
> you can't rely on a file being identical if the size and timestamps are the
> same.
>
>> The "-c" option will validate the file using a checksum and not just
>> rely on just filesystem metadata.
>
> That's incorrect. "-c" applies only when rsync looks at existing files
> to see if they differ. It does not affect what happens when rsync actually
> copies the file data over; that's always verified with a checksum.
>
> Regards,
>
> Dianne.
On Wed, 28 Feb 2018 13:49:33 -0500 Rui DeSousa <rui.desousa@icloud.com> wrote:
> Really? I think you really need to think about it. You are dealing
> with a race condition.

Not at all, assuming rsync is being called by PostgreSQL's archive_command.

> You are dealing with a race condition involving filesystem metadata.

Nope. Not true. What are the things that are racing?

If you rsync data to make a base backup, then yes, use -c... but not because of any race condition. Instead, it's because of the coarse filesystem timestamp granularity.

Regards,

Dianne.
Number 1 is irrelevant, because files are sent from the master and received by the slave: two different computers that don't share the same cache.

Number 2 blows away my obscurity completely; I just didn't know that rsync works that way. It must be documented somewhere in the (very long) rsync manual. I just opened it and looked specifically for this behaviour, but I cannot find it. There is a --temp-dir option that suggests that data is written to temporary files first. But that is only a suggestion. I don't see anything explicit about writing data to temporary files and renaming them once they are complete. Although it seems logical and I believe you, I did not want to make such assumptions either.

> There seem to be 2 fundamental misunderstandings here:
>
> 1) That other processes cannot see data written to a file until it is flushed to disk; this is not true; while file data is still in file cache, it is visible to other processes.
>
> 2) That rsync writes the file on the destination directly; it does not; it writes into a temporary file and renames that file when it is complete.
>
> While you're correct; I never made either of those assumptions
So, problem solved in theory. The next step is to do tests by simulating network outages.
I also have a proposal: let's change the example in the PostgreSQL documentation! The example archive_command presented in the docs contains a simple cp command that does not write data to temporary files. Anyone who tries to use that example on a production server will shoot himself in the foot. At least we should have a note there, telling the reader that it is the user's responsibility to make sure that only complete WAL files appear in the slave's archive directory, and that the cp command is just a silly example that should never be used on a production server.
Thank you for your help!
Laszlo
On Feb 28, 2018, at 12:03 PM, Nagy László Zsolt <gandalf@shopzeus.com> wrote:
>
> There is a --temp-dir option that suggests that data is written to temporary files first. But that is only a suggestion. I don't see anything explicit about writing data to temporary files and renaming them once they are complete.

Seems hard to me to make it more clear:

    -T, --temp-dir=DIR
        This option instructs rsync to use DIR as a scratch directory
        when creating temporary copies of the files transferred on the
        receiving side. The default behavior is to create each temporary
        file in the same directory as the associated destination file.

And:

    --delay-updates
        This option puts the temporary file from each updated file into
        a holding directory until the end of the transfer, at which time
        all the files are renamed into place in rapid succession. This
        attempts to make the updating of the files a little more atomic.
        By default the files are placed into a directory named ".~tmp~"
        in each file's destination directory, but if you've specified
        the --partial-dir option, that directory will be used instead.
        See the comments in the --partial-dir section for a discussion
        of how this ".~tmp~" dir will be excluded from the transfer, and
        what you can do if you want rsync to cleanup old ".~tmp~" dirs
        that might be lying around. Conflicts with --inplace and
        --append.

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
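Putting the options quoted above together, a possible invocation for an interruption-prone link might look like this (file name and host are placeholders, not from the thread). Note that it is --partial-dir, rather than bare --partial, that keeps resumable partial data out of the final file name: the data lands in the partial directory first and is only renamed into place once the transfer completes.

rsync -a --partial-dir=.rsync-partial \
    /archive/00000001000000000000002A standby.example.com:/transfer/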
On 2018-02-28 19:11, Rui DeSousa wrote:
> Easiest thing to do is to keep the replica from seeing the file until
> it's successfully transferred; i.e.
>
> 1. Transfer the file successfully first with a temporary extension:
>    $ARCH_DIR/$WALFILE.swap
> 2. If successfully transferred then move it into place:
>    mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE

This works ONLY if the directory is mounted on the master, and you can execute "mv" on the master (as part of the archive_command) to rename the file that is stored on the slave. But the original question specified that mounting is not an option.
You are dealing with a race condition. Postgres will continually attempt to archive the same file over and over until rsync foolishly returns success.

You are making the assumption that rsync removed the file? What if it can't? If you're dealing with a logged filesystem and it can't log the metafile changes, what happens then? What happens when rsync core dumps?

Why would you not use "-c"? I have seen this fail without the "-c" option…

> On Feb 28, 2018, at 1:52 PM, Dianne Skoll <dfs@roaringpenguin.com> wrote:
>
> On Wed, 28 Feb 2018 13:49:33 -0500
> Rui DeSousa <rui.desousa@icloud.com> wrote:
>
>> Really? I think you really need to think about it. You are dealing
>> with a race condition.
>
> Not at all, assuming rsync is being called by PostgreSQL's archive_command.
>
>> You are dealing with a race condition involving filesystem metadata.
>
> Nope. Not true. What are the things that are racing?
>
> If you rsync data to make a base backup, then yes, use -c... but not because
> of any race condition. Instead, it's because of the coarse filesystem
> timestamp granularity.
>
> Regards,
>
> Dianne.
So, I decided to run a test. It turns out we are both wrong. My test script is as follows:

#======================================================================
#!/bin/bash

# Remote machine
MACH=dianne.skoll.ca

# Create a 10MB pseudo-random file
echo "`date` Creating 10MB file"
FILE=pseudo-random
dd if=/dev/urandom bs=1k count=10k of=$FILE 2>/dev/null

# Rsync it over at 500kB per second in the background
echo "`date` Starting rsync"
rsync -c --verbose --bwlimit=500k $FILE $MACH:/tmp &
RSYNC_PID=$!
echo "`date` rsync PID=$RSYNC_PID; sleeping for 3 seconds"

# Sleep for 3 seconds
sleep 3

echo "`date` Stopping rsync"
# Stop the rsync
kill -STOP $RSYNC_PID

echo "`date` Changing source file"
# Change the source file
dd if=/dev/urandom bs=1k count=10k of=$FILE 2>/dev/null

echo "`date` Continuing rsync"
# Restart the rsync and bring it into the foreground
kill -CONT $RSYNC_PID
wait

echo "`date` Rsync done; comparing files"
ssh $MACH "cd /tmp && sha1sum $FILE" | sha1sum -c -
#======================================================================

Running it produces this output:

Wed Feb 28 14:14:32 EST 2018 Creating 10MB file
Wed Feb 28 14:14:32 EST 2018 Starting rsync
Wed Feb 28 14:14:32 EST 2018 rsync PID=24456; sleeping for 3 seconds
pseudo-random
Wed Feb 28 14:14:35 EST 2018 Stopping rsync
Wed Feb 28 14:14:35 EST 2018 Changing source file
Wed Feb 28 14:14:35 EST 2018 Continuing rsync

sent 10,488,419 bytes  received 19,505 bytes  488,740.65 bytes/sec
total size is 10,485,760  speedup is 1.00
Wed Feb 28 14:14:53 EST 2018 Rsync done; comparing files
pseudo-random: FAILED
sha1sum: WARNING: 1 computed checksum did NOT match

So even *with* the -c flag, rsync didn't notice the file had been changed while it was running. Interesting.

Regards,

Dianne.
On Wed, 28 Feb 2018 14:12:39 -0500 Rui DeSousa <rui.desousa@icloud.com> wrote:
> You are dealing with a race condition. Postgres will continually
> attempt to archive the same file over and over until rsync foolishly
> returns success.

What do you mean by "rsync foolishly returns success"?

> You are making the assumption that rsync removed the file?

Rsync does not create the file with the final name until it has successfully transferred it completely. It copies it to a temporary file and then, if (and only if) the transfer succeeded, it atomically renames the temporary file to the final filename.

The only failure mode if metadata updates are lost is that the file will appear not to exist when it really should, not the other way around. You can never, ever have the file existing with the proper filename until a transfer has succeeded.

However, see my other post about a test. The -c option doesn't actually help if a file changes while rsync is running.

Regards,

Dianne.
> By default the files are placed into a directory named ".~tmp~"
> in each file's destination directory, but if you've specified
> the --partial-dir option, that directory will be used instead.
> See the comments in the --partial-dir section for a discussion
> of how this ".~tmp~" dir will be excluded from the transfer, and
> what you can do if you want rsync to cleanup old ".~tmp~" dirs
> that might be lying around. Conflicts with --inplace and
> --append.

I stand corrected. It is there. But still: this is a general behaviour of rsync that is independent of most command line options. And yet, "the temporary file" is just an unexplained term that appears in the middle of the description of a (rarely used) option, and the "renamed only after the transfer has completed" part is completely missing. That said, it is just an excuse on my side. I should have figured it out. :-)
On Wed, 28 Feb 2018 13:49:33 -0500 Rui DeSousa <rui.desousa@icloud.com> wrote:
> Really? I think you really need to think about it. You are dealing
> with a race condition.

You are not correct.

1) rsync copies the source file to a temporary file on the destination.

2) If and only if step (1) is successful, rsync renames the temporary file to the final filename.

3) If and only if step (2) is successful, rsync sets the timestamp on the destination file to match that of the source file, assuming the --archive or similar command-line option was given.

There is no possible sequence of events that could give you a destination file with the same size and timestamp as the source file and yet be incorrect (just so long as the source file has stayed the same.)

Regards,

Dianne.
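The three steps described above are essentially the classic write/rename/stamp pattern. The shell below only illustrates that ordering (it is not rsync's actual implementation, and the file names are made up):

# copy into a hidden temp name, publish it atomically, then copy the source timestamp
cp 000000010000000000000042 /archive/.000000010000000000000042.tmp && \
mv /archive/.000000010000000000000042.tmp /archive/000000010000000000000042 && \
touch -r 000000010000000000000042 /archive/000000010000000000000042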
Nagy László Zsolt wrote:
> Let's suppose that a replication master is writing a WAL segment file
> into a directory. That directory is mounted on the replication slave. Is
> it possible that the slave will try to read a WAL segment file that is
> not yet fully flushed to disk? I did not find any requirement in the
> official documentation about this. How does it work? Do I have to copy
> segments to temp files, and rename them when they are fully flushed to
> disk? Or is it okay to have half complete files in the archive dir for a
> while?

I suppose you are talking about "archive_command" here.

If the file restored with "restore_command" is too small, the operation fails, and you get a DEBUG1 message:

    archive file "..." has wrong size: ... instead of ...

So nothing can go wrong there.

> And finally: if I also enable streaming replication, then it seems that
> log file shipping is not needed at all. If I omit archive_command and
> restore_command from the configs, and setup the replication slots and
> primary_conninfo only, then it seems to be working just fine. But when
> the network goes down for a while, then the slave goes out of sync and
> it cannot recover. It was not clear for me from the documentation, but
> am I right in that I can combine log file shipping with streaming
> replication, and achieve small replication delays plus the ability to
> recover after a longer period of network outage?

If you use a replication slot, the standby will never get out of sync, because the primary will retain all WAL that the standby has not received yet.

Streaming replication together with archive recovery is only useful if you are *not* using replication slots.

Yours,

Laurenz Albe
>> Do I have to copy
>> segments to temp files, and rename them when they are fully flushed to
>> disk? Or is it okay to have half complete files in the archive dir for a
>> while?
> I suppose you are talking about "archive_command" here.
>
> If the file restored with "restore_command" is too small,
> the operation fails, and you get a DEBUG1 message:
>
>     archive file "..." has wrong size: ... instead of ...
>
> So nothing can go wrong there.

Nothing can go wrong? Does that mean that PostgreSQL will re-execute the restore_command if the file was too small? If it won't retry the restore_command, then everything goes wrong. It might be documented somewhere, but apparently I'm out of luck with the documentation.

>> And finally: if I also enable streaming replication, then it seems that
>> log file shipping is not needed at all. If I omit archive_command and
>> restore_command from the configs, and setup the replication slots and
>> primary_conninfo only, then it seems to be working just fine. But when
>> the network goes down for a while, then the slave goes out of sync and
>> it cannot recover. It was not clear for me from the documentation, but
>> am I right in that I can combine log file shipping with streaming
>> replication, and achieve small replication delays plus the ability to
>> recover after a longer period of network outage?
> If you use a replication slot, the standby will never get out of sync
> because the primary will retain all WAL that the standby has not
> received yet.
>
> Streaming replication together with archive recovery is only useful
> if you are *not* using replication slots.

So you are saying that if I use replication slots, then I can completely forget about manual WAL file shipping. There is one thing in the docs that contradicts the above statement. This is from https://www.postgresql.org/docs/10/static/warm-standby.html#STREAMING-REPLICATION

> If you use streaming replication without file-based continuous
> archiving, the server might recycle old WAL segments before the
> standby has received them. If this occurs, the standby will need to be
> reinitialized from a new base backup.

In other words, using file based archive recovery plus streaming replication at the same time can be useful, because it allows you to store WAL segments on a disk that is independent of the master, and it makes recovery possible after a longer period of time. Increasing wal_keep_segments may help the replication slave to recover after a longer time, but it is more risky. The segments are stored on the master, and they don't provide protection against disk failures (on the master).

Cheers,

Laszlo
Nagy László Zsolt wrote:
>>> Do I have to copy
>>> segments to temp files, and rename them when they are fully flushed to
>>> disk? Or is it okay to have half complete files in the archive dir for a
>>> while?
>>
>> I suppose you are talking about "archive_command" here.
>>
>> If the file restored with "restore_command" is too small,
>> the operation fails, and you get a DEBUG1 message:
>>
>>     archive file "..." has wrong size: ... instead of ...
>>
>> So nothing can go wrong there.
>
> Nothing can go wrong? Does that mean that PostgreSQL will re-execute the
> restore_command if the file was too small? If it won't retry the
> restore_command, then everything goes wrong. It might be documented
> somewhere, but apparently I'm out of luck with the documentation.

To verify that restore_command won't accept a file that's too short, read the code in backend/access/transam/xlogarchive.c

The behavior for streaming replication with a WAL archive is documented here:
https://www.postgresql.org/docs/current/static/warm-standby.html#STANDBY-SERVER-OPERATION

"At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_wal directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_wal. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_wal, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file."

>>> And finally: if I also enable streaming replication, then it seems that
>>> log file shipping is not needed at all. If I omit archive_command and
>>> restore_command from the configs, and setup the replication slots and
>>> primary_conninfo only, then it seems to be working just fine. But when
>>> the network goes down for a while, then the slave goes out of sync and
>>> it cannot recover. It was not clear for me from the documentation, but
>>> am I right in that I can combine log file shipping with streaming
>>> replication, and achieve small replication delays plus the ability to
>>> recover after a longer period of network outage?
>>
>> If you use a replication slot, the standby will never get out of sync
>> because the primary will retain all WAL that the standby has not
>> received yet.
>>
>> Streaming replication together with archive recovery is only useful
>> if you are *not* using replication slots.
>
> So you are saying that if I use replication slots, then I can completely
> forget about manual WAL file shipping.

Precisely.

> There is one thing in the docs
> that contradicts the above statement.
>
> This is from
> https://www.postgresql.org/docs/10/static/warm-standby.html#STREAMING-REPLICATION
>
>> If you use streaming replication without file-based continuous
>> archiving, the server might recycle old WAL segments before the
>> standby has received them. If this occurs, the standby will need to be
>> reinitialized from a new base backup.

This only talks about the case where you do not use replication slots.

Read https://www.postgresql.org/docs/current/static/warm-standby.html#STREAMING-REPLICATION-SLOTS:

"Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected."

The key word is "ensure".

Yours,

Laurenz Albe
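For readers who want to see the slot-based setup concretely, a minimal example follows. The slot, user, and password names are placeholders, and this reflects PostgreSQL 10, the version used in this thread (it also requires max_replication_slots > 0 on the primary):

-- on the primary:
SELECT pg_create_physical_replication_slot('standby1');

# recovery.conf on the standby:
standby_mode = 'on'
primary_conninfo = 'host=postgres-master port=5432 user=replication_user password=secret'
primary_slot_name = 'standby1'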
I've tested this and it seems that there is still a bug in rsync (rsync version 3.1.2 protocol version 31). I used a 1GB archive filesystem to allow for an out of space test case. Not sure of the actual cause, as it seems to work a few times; however, it then fails leaving a truncated file and returning a success code.

Example: 00000001000000590000003E - failed 4 times, and on the fifth try rsync returned success and left a truncated file.

When there is actually no space left, rsync fails immediately and never returns a success code; i.e. 00000001000000590000003F. When freeing up space, archiving resumes again. It seems that if rsync is already syncing when the filesystem fills up, then there is a high risk the bug will occur; i.e. 000000010000005A00000001 is also truncated and with an rsync success code.

Since rsync is returning success on a failed sync, even the "-c" option will not help here.

Archive script critical code:

OUTPUT=$(rsync -ac $XLOGFILE $ARCH_SERVER:$ARCH_DIR/$WALFILE)
if [ $? == 0 ]; then
    STS=0
    echo "Success: $WALFILE" >> /tmp/waltest.log
else
    echo "Failed: $WALFILE" >> /tmp/waltest.log
fi

exit $STS

Archive directory (note: using 64MB WALs):

[postgres@hades ~/arch/dbc1/wal]$ ls -al
total 1044351
drwxr-xr-x 2 postgres postgres       57 Feb 28 19:46 .
drwxr-xr-x 3 postgres postgres        3 Feb 28 17:33 ..
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000D
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000000F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000010
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000011
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000012
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000013
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000014
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000015
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000016
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000017
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000018
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000019
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001A
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001D
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000001F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000020
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000021
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000022
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000023
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000024
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000025
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000026
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000027
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000028
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000029
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002A
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002D
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000002F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000030
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000031
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000032
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000033
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000034
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000035
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000036
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000037
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000038
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005900000039
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003A
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003B
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003C
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003D
-rw------- 1 postgres postgres  3670016 Feb 28 17:33 00000001000000590000003E
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 00000001000000590000003F
-rw------- 1 postgres postgres 67108864 Feb 28 17:33 000000010000005A00000000
-rw------- 1 postgres postgres 12713984 Feb 28 17:33 000000010000005A00000001

Success/Failure log:

[postgres@hades ~/arch/dbc1/wal]$ cat /tmp/waltest.log
Success: 000000010000005900000008
Success: 000000010000005900000009
Success: 00000001000000590000000A
Success: 00000001000000590000000B
Success: 00000001000000590000000C
Success: 00000001000000590000000D
Success: 00000001000000590000000E
Success: 00000001000000590000000F
Success: 000000010000005900000010
Success: 000000010000005900000011
Success: 000000010000005900000012
Success: 000000010000005900000013
Success: 000000010000005900000014
Success: 000000010000005900000015
Success: 000000010000005900000016
Success: 000000010000005900000017
Success: 000000010000005900000018
Success: 000000010000005900000019
Success: 00000001000000590000001A
Success: 00000001000000590000001B
Success: 00000001000000590000001C
Success: 00000001000000590000001D
Success: 00000001000000590000001E
Success: 00000001000000590000001F
Success: 000000010000005900000020
Success: 000000010000005900000021
Success: 000000010000005900000022
Success: 000000010000005900000023
Success: 000000010000005900000024
Success: 000000010000005900000025
Success: 000000010000005900000026
Success: 000000010000005900000027
Success: 000000010000005900000028
Success: 000000010000005900000029
Success: 00000001000000590000002A
Success: 00000001000000590000002B
Success: 00000001000000590000002C
Success: 00000001000000590000002D
Success: 00000001000000590000002E
Success: 00000001000000590000002F
Success: 000000010000005900000030
Success: 000000010000005900000031
Success: 000000010000005900000032
Success: 000000010000005900000033
Success: 000000010000005900000034
Success: 000000010000005900000035
Success: 000000010000005900000036
Success: 000000010000005900000037
Success: 000000010000005900000038
Success: 000000010000005900000039
Success: 00000001000000590000003A
Success: 00000001000000590000003B
Success: 00000001000000590000003C
Success: 00000001000000590000003D
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Failed: 00000001000000590000003E
Success: 00000001000000590000003E
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Failed: 00000001000000590000003F
Success: 00000001000000590000003F
Success: 000000010000005A00000000
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Failed: 000000010000005A00000001
Success: 000000010000005A00000001
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Failed: 000000010000005A00000002
Wow, that's good sleuthing! Are you going to report the bug to the rsync maintainers?

You can work around it by not having your script trust rsync, but to do a sha1 hash on both ends after rsync exits. We do this in production because we once encountered a server with bad memory that was mysteriously corrupting the files; we ended up never removing the sha1-comparison code.

Regards,

Dianne.

On February 28, 2018 8:10:59 PM EST, Rui DeSousa <rui.desousa@icloud.com> wrote:
>
> I've tested this and it seems that there is still a bug in rsync
> (rsync version 3.1.2 protocol version 31). I used a 1GB archive
> filesystem to allow for an out of space test case. Not sure of the
> actual cause as it seems to work a few times; however, it then fails
> leaving a truncated file and returning a success code.
Hi

On 01/03/18 14:10, Rui DeSousa wrote:
> I've tested this and it seems that there is still a bug in rsync (rsync version 3.1.2 protocol version 31). I used a 1GB archive filesystem to allow for an out of space test case. Not sure of the actual cause as it seems to work a few times; however, it then fails leaving a truncated file and returning a success code.
>
> Archive script critical code:
>
> OUTPUT=$(rsync -ac $XLOGFILE $ARCH_SERVER:$ARCH_DIR/$WALFILE)
> if [ $? == 0 ]; then
>     STS=0
>     echo "Success: $WALFILE" >> /tmp/waltest.log
> else
>     echo "Failed: $WALFILE" >> /tmp/waltest.log
> fi
>
> exit $STS

Looking at the above snippet, $STS is not properly initialized in the failure section of the code... so you might be getting false 0 returns when rsync is actually setting some other value.

regards

Mark
STS is initialized as 3, and is only set to zero if rsync returns zero.

> On Feb 28, 2018, at 8:59 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
>
> Hi
>
> On 01/03/18 14:10, Rui DeSousa wrote:
>> I've tested this and it seems that there is still a bug in rsync (rsync version 3.1.2 protocol version 31). I used a 1GB archive filesystem to allow for an out of space test case. Not sure of the actual cause as it seems to work a few times; however, it then fails leaving a truncated file and returning a success code.
>
> Looking at the above snippet, $STS is not properly initialized in the failure section of the code... so you might be getting false 0 returns when rsync is actually setting some other value.
>
> regards
>
> Mark
Hi

On 01/03/18 15:27, Rui DeSousa wrote:
> STS is initialized as 3, and is only set to zero if rsync returns zero.

Ok, so if that is set correctly, then another possibility is the archive filesystem (what is that BTW?) not handling out of space nicely/properly. I'm thinking this because doing some testing against a small ext4 filesystem finds rsync reliably returning 11 when trying to copy a file that will not fit (version 3.1.2).

regards

Mark
It looks like rsync is not handling the disk quota when actually syncing the file — Disc quota exceeded (69) vs no space left (28). It only fails if it can't rename the truncated temp file — and in some cases there is just enough space left to allow the rename to succeed, thus leaving a truncated WAL file.

I'm using ZFS, and have a 1GB quota on the given filesystem.

[postgres@hades ~/dbc1/pg_wal]$ rsync -a 000000010000005B00000035 ~/arch/dbc1/wal/000000010000005B00000035
rsync: rename "/usr/home/postgres/arch/dbc1/wal/.000000010000005B00000035.6bjO94" -> "000000010000005B00000035": Disc quota exceeded (69)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1200) [sender=3.1.2]
On 01/03/18 17:57, Rui DeSousa wrote:
> It looks like rsync is not handling the disk quota when actually syncing the file — Disc quota exceeded (69) vs no space left (28). It only fails if it can't rename the truncated temp file — and in some cases there is just enough space left to allow the rename to succeed, thus leaving a truncated WAL file.
>
> I'm using ZFS, and have a 1GB quota on the given filesystem.
>
> [postgres@hades ~/dbc1/pg_wal]$ rsync -a 000000010000005B00000035 ~/arch/dbc1/wal/000000010000005B00000035
> rsync: rename "/usr/home/postgres/arch/dbc1/wal/.000000010000005B00000035.6bjO94" -> "000000010000005B00000035": Disc quota exceeded (69)
> rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1200) [sender=3.1.2]

Interesting - so it could be rsync not handling the quota error sensibly - or perhaps more likely ZFS and/or quota bugs. If the latter, the solution would be not to use these until the devs have fixed them... :-)

regards

Mark
I can tell you this; it's not ZFS or its quotas. I know that Postgres has no problems with ZFS and quotas; I have seen Postgres development environments run out of quota space on ZFS and Postgres never got corrupted — it would just crash and end up in a crash/restore loop. Just increase the quota and Postgres recovers without issue, exiting the crash/restore loop; clean up the rogue space usage, and reduce the quota back down. It makes recovering from out of space issues very quick.
On Feb 28, 2018, at 10:03 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
>
> Interesting - so it could be rsync not handling the quota error sensibly - or perhaps more likely ZFS and/or quota bugs. If the latter, the solution would be not to use these until the devs have fixed them... :-)

Or realize that if you have less disk space than needed for a consistent replica or archive, you cannot possibly have a consistent replica or archive. The false report of success is not good, but it's not the root problem.

--
Scott Ribe
https://www.linkedin.com/in/scottribe/
(303) 722-0567
On 01/03/18 18:18, Rui DeSousa wrote:
> I can tell you this; it's not ZFS or its quotas. I know that Postgres has no problems with ZFS and quotas; I have seen Postgres development environments run out of quota space on ZFS and Postgres never got corrupted — it would just crash and end up in a crash/restore loop. Just increase the quota and Postgres recovers without issue, exiting the crash/restore loop; clean up the rogue space usage, and reduce the quota back down. It makes recovering from out of space issues very quick.

With quotas enabled on the same ext4 filesystem, I reliably get 'quota exceeded' and an 11 return code from rsync. This is on Ubuntu 17.10 (not sure what you are on)... but to me it points the finger at a ZFS bug.

Also, as has just been pointed out - typically people enable monitoring and provision so as to specifically avoid the weird edge cases for out of disk/quota etc.

regards

Mark
On Mar 1, 2018, at 12:21 AM, scott ribe <scott_ribe@elevated-dev.com> wrote:
> The false report of success is not good, but it's not the root problem.
A false success is a problem; especially in this use case, as the source WAL file will be deleted by Postgres before it was truly successful. While monitoring is nice to avoid the issue, it is not a fix for the issue.
I personally cannot recommend the use of rsync in this application for two reasons.
1. It adds no value; it's a more complex cp command (no bandwidth saved, etc., as archiving processes a single file at a time).
2. It lies on success/failure — Period.
I have used "cat" longer than I have used rsync to archive WALs. I can say that I've lost zero WAL files using cat; I cannot say the same for rsync.
The following code is more reliable than rsync and works across multiple platforms and filesystems without fail.
STS=3
OUTPUT=$(cat $XLOGFILE | $SSH_CMD "(mkdir -p $ARCH_DIR && cat > $ARCH_DIR/$WALFILE.swap) && mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE")
if [ $? -eq 0 ]; then
    STS=0
fi
exit $STS
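One way to harden this further, in the spirit of the sha1 comparison Dianne mentioned doing in production, is to have the remote side hash the received data and only rename it into place if the hash matches the sender's. This is a sketch only: it assumes sha256sum is available on both ends (substitute sha256 or sha1sum as appropriate) and reuses the same $SSH_CMD/$ARCH_DIR/$WALFILE/$XLOGFILE variables as the snippet above:

STS=3
SUM=$(sha256sum < "$XLOGFILE" | cut -d' ' -f1)

# stream the segment, re-hash it remotely, and only publish it if the hashes agree
OUTPUT=$(cat "$XLOGFILE" | $SSH_CMD "(mkdir -p $ARCH_DIR && cat > $ARCH_DIR/$WALFILE.swap) \
    && [ \"\$(sha256sum < $ARCH_DIR/$WALFILE.swap | cut -d' ' -f1)\" = \"$SUM\" ] \
    && mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE")
if [ $? -eq 0 ]; then
    STS=0
fi

exit $STS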
On 02/03/18 03:21, Rui DeSousa wrote:
>> On Mar 1, 2018, at 12:21 AM, scott ribe <scott_ribe@elevated-dev.com> wrote:
>>
>> The false report of success is not good, but it's not the root problem.
>
> A false success is a problem; especially in this use case as the
> source WAL file will be deleted by Postgres before it was truly
> successful. While monitoring is nice to avoid the issue it is not a
> fix for the issue.
>
> I personally cannot recommend the use of rsync in this application for
> two reasons.
>
> 1. It adds no value; it's a more complex cp command (no bandwidth
> saved, etc., as archiving processes a single file at a time).
> 2. It lies on success/failure — Period.
>
> I have used "cat" longer than I have used rsync to archive WALs. I can
> say that I've lost zero WAL files using cat; I cannot say the same
> for rsync.
>
> The following code is more reliable than rsync and works across
> multiple platforms and filesystems without fail.
>
> STS=3
>
> OUTPUT=$(cat $XLOGFILE | $SSH_CMD "(mkdir -p $ARCH_DIR && cat > $ARCH_DIR/$WALFILE.swap) && mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE")
> if [ $? == 0 ]; then
> STS=0
> fi
>
> exit $STS

If you have a self contained case that demonstrates rsync returning 0 when it has actually failed, then please do get the rsync authors involved in investigating it (I'm sure they would be interested).

Now, I've been unable to reproduce any cases of bad return codes or zero length files (using rsync based archive command + quotas), however I'm probably not using the same setup as you (and probably a different platform as well).

regards

Mark
I opened a ticket this morning:

https://bugzilla.samba.org/show_bug.cgi?id=13317
Greetings,

* Rui DeSousa (rui.desousa@icloud.com) wrote:
>> On Mar 1, 2018, at 12:21 AM, scott ribe <scott_ribe@elevated-dev.com> wrote:
>> The false report of success is not good, but it's not the root problem.
>
> A false success is a problem; especially in this use case as the source WAL file will be deleted by Postgres before it was truly successful. While monitoring is nice to avoid the issue it is not a fix for the issue.
>
> I personally cannot recommend the use of rsync in this application for two reasons.

There's a bigger problem that I'm amazed hasn't been brought up already - the rsync, copy, cp, whatever, may finish just fine and then the system that the WAL file was copied to crashes and you end up losing a bunch of WAL, because none of these methods actually syncs the WAL to disk.

No archive_command should ever return success until the WAL is actually written out to permanent storage and synced. That's what tools like pgBackRest do, and is part of the reason why people really shouldn't try to hack up their own archive_command solution but should be using well tested existing solutions.

Thanks!

Stephen
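To make the durability point concrete, here is one way a home-grown script could at least attempt it. This is a sketch under assumptions (GNU dd with conv=fsync on the receiving side, and the same variables as the earlier snippets), not a substitute for a tested tool such as pgBackRest:

STS=3

# write to a temp name, force the data to disk, publish it, then run a blunt
# "sync" so the rename itself also reaches permanent storage
OUTPUT=$(cat "$XLOGFILE" | $SSH_CMD "mkdir -p $ARCH_DIR \
    && dd of=$ARCH_DIR/$WALFILE.swap conv=fsync 2>/dev/null \
    && mv $ARCH_DIR/$WALFILE.swap $ARCH_DIR/$WALFILE \
    && sync")
if [ $? -eq 0 ]; then
    STS=0
fi

exit $STS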
Not a problem; as you should be archiving your WALs to multiple sites simultaneously. Your DR site should not rely on any resources stored at the primary site.

> On Mar 2, 2018, at 1:09 PM, Stephen Frost <sfrost@snowman.net> wrote:
>
> No archive_command should ever return success until the WAL is actually
> written out to permanent storage and sync'd.
> [...]
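As an illustrative aside (not something proposed in the thread), multi-site shipping in an archive_command wrapper could look roughly like the sketch below; the host names, paths, and script name are made up, and a real deployment would be better served by a tested tool.

#!/bin/sh
# Hypothetical helper: ship one WAL segment to several archive hosts and
# report success only if every destination received and renamed it.
# Assumed usage: archive_command = '/usr/local/bin/ship_wal.sh %p %f'
WAL_PATH="$1"
WAL_NAME="$2"
ARCH_DIR="/wal_archive"                                # assumed remote directory
HOSTS="archive1.example.com archive2.example.com"      # assumed DR sites

for host in $HOSTS; do
    # Like the cat-over-ssh example earlier in the thread, this trusts ssh's
    # exit status and does not fsync on the remote side; the fwrite helper
    # discussed later addresses that gap.
    cat "$WAL_PATH" | ssh -o ServerAliveInterval=20 "$host" "mkdir -p $ARCH_DIR && cat > $ARCH_DIR/$WAL_NAME.swap && mv $ARCH_DIR/$WAL_NAME.swap $ARCH_DIR/$WAL_NAME" || exit 1
done
exit 0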
>> https://bugzilla.samba.org/show_bug.cgi?id=13317

Wow, I would never have thought that such a simple question would end up in an rsync bug report.
Greetings,

* Rui DeSousa (rui.desousa@icloud.com) wrote:
> Not a problem; as you should be archiving your WALs to multiple sites simultaneously. Your DR site should not rely on any resources stored at the primary site.

Please don't top-post on the PG mailing lists. We prefer in-line responses, such as this.

I agree that archiving WAL to multiple sites is a good idea, but none of the proposed examples does that or even provides a way to easily and reliably do it, so that doesn't actually address the issue.

Thanks!

Stephen
In that case I suggest reading up on replication slots and pg_receivexlog. Everyone's topology and requirements are different and there are many ways to skin a cat.

> On Mar 4, 2018, at 10:06 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> I agree that archiving WAL to multiple sites is a good idea, but none of
> the proposed examples does that or even provides a way to easily and
> reliably do it, so that doesn't actually address the issue.
> [...]
Greetings,

* Rui DeSousa (rui.desousa@icloud.com) wrote:
> In that case I suggest reading up on replication slots and pg_receivexlog. Everyone's topology and requirements are different and there are many ways to skin a cat.

Please don't top-post.

Using replication slots and pg_receivewal can be a good solution if you're on a version which supports the combination of both. That doesn't mean the proposed archive commands are any good though, just to be clear. An issue with pg_receivewal though is that it's single-threaded.

Thanks!

Stephen
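For reference, a minimal sketch of the slot plus pg_receivewal combination being discussed. The slot name, connection details, and archive path are invented; pg_receivewal is the PostgreSQL 10+ name of the tool, older releases ship it as pg_receivexlog.

# Create a physical replication slot on the primary so WAL is retained
# until the receiver has consumed it (assumed slot name: wal_archive_slot).
psql -h primary.example.com -U replicator -d postgres \
     -c "SELECT pg_create_physical_replication_slot('wal_archive_slot');"

# Stream WAL segments into a local archive directory; --synchronous makes
# pg_receivewal flush each write to disk as soon as it is received, so the
# segments in the archive directory are durable.
pg_receivewal -h primary.example.com -U replicator \
              -S wal_archive_slot --synchronous \
              -D /var/lib/pgsql/wal_archive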
> On Mar 5, 2018, at 6:15 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Using replication slots and pg_receivewal can be a good solution if
> you're on a version which supports the combination of both. That
> doesn't mean the proposed archive commands are any good though, just to
> be clear. An issue with pg_receivewal though is that it's
> single-threaded.
I must say this is a bit absurd; I didn’t realize that telling someone not to delete Postgres WAL files from underneath Postgres would require me to provide a complete redundant backup solution.
pg_receivexlog or the archive_command being single-threaded is not an issue; at least not for me, and I'm generating 1/2TB of WALs a day. The real problem is that applying the WALs is single-threaded: trying to apply a single day's worth of WALs takes too long, and that is one of the reasons I take multiple backups a day to reduce the number of WALs required during a PITR.
The solution you proposed would not be able to keep up with the rate of backups I issue daily, nor is it capable of taking backups on the replica, at least not yet, judging from the presentation I reviewed. I do my backups on replicas at multiple sites, with WAL files also being stored at multiple sites. I also do daily restores in a lower environment, which take less than 5 minutes to do; obviously I'm making extensive use of snapshots and snapshot replication.
If your archive server is crashing then you have other issues, and one should work to remove single points of failure. I'm not sure what filesystem you're using, but the one I use syncs to disk every 30 seconds.
The original poster stated that rsync wasn't even an option and that he is not even using it. I'm not here trying to push a backup solution or anything else; I was just trying to give some simple advice on the given problem.
Greetings,

* Rui DeSousa (rui.desousa@icloud.com) wrote:
>> On Mar 5, 2018, at 6:15 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> Using replication slots and pg_receivewal can be a good solution if
>> you're on a version which supports the combination of both. That
>> doesn't mean the proposed archive commands are any good though, just to
>> be clear. An issue with pg_receivewal though is that it's
>> single-threaded.
>
> I must say this is a bit absurd; I didn't realize that telling someone not to delete Postgres WAL files from underneath Postgres would require me to provide a complete redundant backup solution.

It doesn't, but please don't encourage partial solutions which have very clear issues.

> pg_receivexlog or the archive_command being single-threaded is not an issue; at least not for me, and I'm generating 1/2TB of WALs a day. The real problem is that applying the WALs is single-threaded [...]

I agree that the single-threaded nature of pg_basebackup and pg_receivewal isn't an issue for all environments, and that it would be good to have a way to replay WAL faster, and that performing frequent incrementals is a way to reduce the amount of WAL to be replayed.

> The solution you proposed would not be able to keep up with the rate of backups I issue daily, nor is it capable of taking backups on the replica [...]

The only solution which I mentioned on this thread was pgBackRest, which keeps up very well with very frequent backups, including those which utilize a replica, so I'm not sure what you're referring to here. pgBackRest is also able to perform fast restores when there have been few changes. Note that you had suggested pg_receivewal; I was just commenting on it.

Snapshots and snapshot replication can be useful, but it's vital to ensure that the entire snapshot is taken atomically across all tablespaces (unless you're doing both snapshots *and* using pg_start/stop_backup, collecting necessary WAL, and being sure to put the backup_label file into place during restore).

> If your archive server is crashing then you have other issues, and one should work to remove single points of failure. [...]

This thread began with a concern about an unreliable network; I don't think it's unreasonable to consider other failure scenarios. PostgreSQL performs a great deal of work to ensure that data is written out in a manner which can be recovered in the event of a server crash; utilities which are also working with that data should be similarly resilient.

> The original poster stated that rsync wasn't even an option and that he is not even using it. [...]

I agree that rsync is a bad approach. Similarly, proposing archive commands which don't do the basics, such as ensuring the WAL file has been sync'd to disk before returning success, isn't good.

Thanks!

Stephen
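For context, a minimal sketch of the snapshot-based backup being described, using the exclusive backup API that existed at the time (it was removed in later PostgreSQL releases). The label and the ZFS snapshot command are placeholders for whatever the filesystem provides.

# Tell PostgreSQL a base backup is starting; the second argument requests
# an immediate checkpoint so the call returns quickly.
psql -c "SELECT pg_start_backup('nightly_snapshot', true);"

# Take the filesystem snapshot; it must cover the data directory and every
# tablespace atomically (the command below assumes a ZFS dataset named
# tank/pgdata and is only an example).
zfs snapshot tank/pgdata@nightly

# End the backup; this ensures the WAL needed to make the snapshot
# consistent gets archived.
psql -c "SELECT pg_stop_backup();"

With the exclusive API the backup_label file is written into the data directory and so is captured by the snapshot; with the non-exclusive API (9.6+) pg_stop_backup returns the label contents, which must be saved separately and put into place during restore, as Stephen notes above.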
> On Mar 5, 2018, at 10:02 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> It doesn't, but please don't encourage partial solutions which have very clear issues.
The problem is there are no good base utilities that are useful with archive_command, unless you're writing directly to an NFS mount or a tape library. Even Barman recommends rsync with the archive_command if you are unable to use the pg_receivexlog solution. There is plenty of Postgres documentation out there that recommends the use of rsync with the archive_command.
Here is a solution that will fsync() the file on the other end.
# SSH command with a keepalive so a dead network is detected instead of hanging.
SSH_CMD="ssh -o ServerAliveInterval=20 $ARCH_SERVER"

STS=3

# Stream the segment to the remote fwrite helper, which writes to a temp
# file, fsync()s it, and renames it into place (see the C code below).
OUTPUT=$(cat $XLOGFILE | $SSH_CMD "(mkdir -p $ARCH_DIR && ~/bin/fwrite $ARCH_DIR/$WALFILE)")
if [ $? -eq 0 ]; then
    STS=0
fi

exit $STS
fwrite code:
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define BUFSIZE 131072

/*
 * fwrite: read a WAL segment from stdin, write it to <file>.XXXXXX,
 * fsync() it, and only then rename() it to the final name, so the
 * archive never exposes a partially written segment.
 */
int
main(int argc, char *argv[])
{
	int fd;
	ssize_t r, w, off;
	char *buf;
	char *name;

	if (argc != 2) {
		fprintf(stderr, "usage: fwrite [file]\n");
		exit(1);
	}

	if ((buf = malloc(BUFSIZE)) == NULL)
		err(1, "malloc");

	++argv;

	/* Build the temporary file name "<target>.XXXXXX" for mkstemp(). */
	if ((name = (char *) malloc(strlen(*argv) + 10)) == NULL)
		err(1, "malloc");
	strcat(strcpy(name, *argv), ".XXXXXX");

	if ((fd = mkstemp(name)) < 0)
		err(1, "mkstemp");

	/* Copy stdin to the temp file, retrying short writes. */
	while ((r = read(STDIN_FILENO, buf, BUFSIZE)) > 0)
		for (off = 0; off < r; off += w)
			if ((w = write(fd, buf + off, r - off)) == -1)
				err(1, "write");
	if (r < 0)
		err(1, "read");

	/* Force the data to stable storage before exposing the final name. */
	if (fsync(fd) != 0)
		err(1, "fsync");
	if (close(fd) != 0)
		err(1, "close");
	if (rename(name, *argv) != 0)
		err(1, "rename");

	free(name);
	exit(0);
}
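For completeness, one possible way to build and stage the helper on the archive host; the compiler invocation, paths, and test file name are assumptions.

# Build the helper and place it where the archive_command above expects it.
cc -O2 -o fwrite fwrite.c
mkdir -p ~/bin
mv fwrite ~/bin/fwrite

# Quick smoke test: the file only appears under its final name once the
# data has been written, fsync()ed, and renamed.
echo "test payload" | ~/bin/fwrite /tmp/fwrite_test && cat /tmp/fwrite_test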
On 3/6/18 12:55 PM, Rui DeSousa wrote:
>> On Mar 5, 2018, at 10:02 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>
>> It doesn't, but please don't encourage partial solutions which have
>> very clear issues.
>
> The problem is there are no good base utilities that are useful with
> archive_command, unless you're writing directly to an NFS mount or a
> tape library.

It's true that there are no good base utilities for this purpose, not sure why NFS would be an exception, though.

> Even Barman recommends rsync with the archive_command if you are unable
> to use the pg_receivexlog solution. There is plenty of Postgres
> documentation out there that recommends the use of rsync with the
> archive_command.

Also true, even though rsync has shown itself to be a poor tool for database backups.

> Here is a solution that will fsync() the file on the other end.

At a quick glance, that looks like it would work. But why go through the trouble when pgBackRest does all of that (and much more), and does not rely on rsync? You are obviously knowledgeable and like to do it yourself, but not everybody wants to develop a solution from scratch, test it, and maintain it.

Regards,
--
-David
david@pgmasters.net
> On Mar 6, 2018, at 1:47 PM, David Steele <david@pgmasters.net> wrote:
>
> It's true that there are no good base utilities for this purpose, not
> sure why NFS would be an exception, though.
NFS file writes are synchronous writes on the NFS server, unless you make them asynchronous, which makes for a bad NFS server.
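As an illustration of the distinction, a sketch using Linux-style exports (the export path and client network are made up, and the syntax differs on other NFS servers; the client's own mount options also matter):

# Append an export with the 'sync' option so the NFS server commits writes
# to stable storage before acknowledging them; 'async' would acknowledge
# early and could lose archived WAL if the server crashes.
echo '/wal_archive 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports

# Re-export and verify the active options.
exportfs -ra
exportfs -v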
On 3/6/18 1:53 PM, Rui DeSousa wrote:
>> On Mar 6, 2018, at 1:47 PM, David Steele <david@pgmasters.net> wrote:
>>
>> It's true that there are no good base utilities for this purpose, not
>> sure why NFS would be an exception, though.
>
> NFS file writes are synchronous writes on the NFS server, unless you
> make them asynchronous, which makes for a bad NFS server.

In my experience this is common, because it gives better performance and most people don't understand the risks.

--
-David
david@pgmasters.net
Greetings,

* Rui DeSousa (rui.desousa@icloud.com) wrote:
>> On Mar 5, 2018, at 10:02 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> It doesn't, but please don't encourage partial solutions which have very clear issues.
>
> The problem is there are no good base utilities that are useful with archive_command, unless you're writing directly to an NFS mount or a tape library. Even Barman recommends rsync with the archive_command if you are unable to use the pg_receivexlog solution. There is plenty of Postgres documentation out there that recommends the use of rsync with the archive_command.
>
> Here is a solution that will fsync() the file on the other end.

I'd encourage you to consider submitting this to a commitfest to get feedback on it, as a proper patch to add such a utility to PG and which we could use in our documentation instead of saying "cp", which is pretty terrible.

Then again, as David mentioned, there are already utilities which do this properly, even if they're not included in core today.

Thanks!

Stephen
On 3/6/18 2:00 PM, Stephen Frost wrote:
> * Rui DeSousa (rui.desousa@icloud.com) wrote:
>>> On Mar 5, 2018, at 10:02 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>> It doesn't, but please don't encourage partial solutions which have very clear issues.
>>
>> [...]
>>
>> Here is a solution that will fsync() the file on the other end.
>
> I'd encourage you to consider submitting this to a commitfest to get
> feedback on it, as a proper patch to add such a utility to PG and which
> we could use in our documentation instead of saying "cp", which is
> pretty terrible.

+1. I believe there's been talk of a pgcopy utility before but it never went anywhere. I think it would be worth it just to have a decent example in the docs.

--
-David
david@pgmasters.net
On Tue, Mar 06, 2018 at 02:31:55PM -0500, David Steele wrote:
> On 3/6/18 2:00 PM, Stephen Frost wrote:
>> [...]
>>
>> I'd encourage you to consider submitting this to a commitfest to get
>> feedback on it, as a proper patch to add such a utility to PG and which
>> we could use in our documentation instead of saying "cp", which is
>> pretty terrible.
>
> +1. I believe there's been talk of a pgcopy utility before but it never
> went anywhere. I think it would be worth it just to have a decent
> example in the docs.

I wrote a (rather trivial, but it has proven useful) utility for a minimalistic "archive_command" a while ago - it is available at https://johannes.truschnigg.info/code/pg_archive_wal_segment-2.0.0/ - because I was unhappy with the suboptimal approach that the shell one-liner cited in the docs took.

I'm not sure if it will survive a thousand eyeballs' close look, but it has served us well for more than three years now with anywhere between 10000 and 80000 WAL segments archived each day (to a destination filesystem that is then exported as an NFS share).

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www: https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.
On 07/03/18 08:31, David Steele wrote:
> On 3/6/18 2:00 PM, Stephen Frost wrote:
>> * Rui DeSousa (rui.desousa@icloud.com) wrote:
>>>> On Mar 5, 2018, at 10:02 AM, Stephen Frost <sfrost@snowman.net> wrote:
>>>> It doesn't, but please don't encourage partial solutions which have very clear issues.
>>> [...]
>>> Here is a solution that will fsync() the file on the other end.
>> I'd encourage you to consider submitting this to a commitfest to get
>> feedback on it, as a proper patch to add such a utility to PG and which
>> we could use in our documentation instead of saying "cp", which is
>> pretty terrible.
> +1. I believe there's been talk of a pgcopy utility before but it never
> went anywhere. I think it would be worth it just to have a decent
> example in the docs.

+1. I think if there is a strong feeling that the current docs are suggesting things that shouldn't actually be done, then we should provide an alternative in the distribution itself.

Re pgBackRest (and other backup tools that various Postgres companies have written): while these may be great (pgBackRest looks pretty good, for instance), they don't come with the Postgres package itself, and our docs suggest something else. Hence discussion threads like this one!

The added (new) issue that there may be some unusual bug with rsync + quotas and ZFS on BSD (at least) gives a bit more weight to this idea.

regards

Mark
On 07/03/18 16:18, Mark Kirkwood wrote:
> The added (new) issue that there may be some unusual bug with rsync +
> quotas and ZFS on BSD (at least) gives a bit more weight to this idea.

Looking at the progress of said bug (https://bugzilla.samba.org/show_bug.cgi?id=13317), it appears to be:

- due to the interaction of ZFS and quotas
- specific to FreeBSD
- addressed by a patch that has now been developed

However, code review of rsync shows it needs a patch too (it does not always check the success of writes). I'll keep an eye out for progress there.

regards

Mark