Обсуждение: [GENERAL] Shared WAL archive between master and standby: WALs not always identical

Поиск
Список
Период
Сортировка

[GENERAL] Shared WAL archive between master and standby: WALs not always identical

От
Sasa Vilic
Дата:
Hallo,

I am trying to setup shared WAL archive between master and standby. Standby is synchronously streaming from master and both servers run with archive_mode = always. The ideas is that when promoting standby to master we would not missed WALs.

My problem is that sometimes WAL uploaded from master and from slave are not 100% identical. In most cases they are but occasionally they are not. I have written small script that ensures that upload is free of race condition and I log md5 sum of each WAL. Aren't WALs from master and standby supposed to be identical? After all, standby is just consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve continuous incremental backup?

Thanks in advance

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Adrian Klaver
Дата:
On 02/27/2017 04:40 PM, Sasa Vilic wrote:
> Hallo,
>
> I am trying to setup shared WAL archive between master and standby.
> Standby is synchronously streaming from master and both servers run with
> archive_mode = always. The ideas is that when promoting standby to
> master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
                               |
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the archive that it
does not already have from the Master?

>
> My problem is that sometimes WAL uploaded from master and from slave are
> not 100% identical. In most cases they are but occasionally they are
> not. I have written small script that ensures that upload is free of
> race condition and I log md5 sum of each WAL. Aren't WALs from master
> and standby supposed to be identical? After all, standby is just
> consuming WAL that it is receiving from master ...
>
> Or do you have any better suggestion on how to achieve continuous
> incremental backup?
>
> Thanks in advance


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
"David G. Johnston"
Дата:
On Mon, Feb 27, 2017 at 5:40 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
Aren't WALs from master and standby supposed to be identical?

​This would seem unwise to assume on its face and at least one piece of documentation directly mentions that it is false:


"""
When continuous WAL archiving is used in a standby, there are two different scenarios: the WAL archive can be shared between the primary and the standby, or the standby can have its own WAL archive. When the standby has its own WAL archive, set archive_mode to always, and the standby will call the archive command for every WAL segment it receives, whether it's by restoring from the archive or by streaming replication. The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
"""

​The contents of both must match with respect to the data files but there are likely things that go into the master WAL stream solely for the purpose of communicating with a standby - ​and possibly some standby concepts that would be unique to the standby's WAL - that would cause them to differ.  Not familiar enough to quickly list examples of what those might be.  But IIUC the system seems designed around master->slave replication and doesn't support slave daisy-chains.

David J.

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
Master is streaming directly to standby. Both master and standby are pushing WALs to archive.

My point is that in case that master crashed completely (and we failover to standby) and wal archiver on master didn't push everything to wal archive, we would still have a wal pushed from slave. Therefore there is no interruption in WAL stream.

Regards,
Sasa

On 28 February 2017 at 01:57, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 02/27/2017 04:40 PM, Sasa Vilic wrote:
Hallo,

I am trying to setup shared WAL archive between master and standby.
Standby is synchronously streaming from master and both servers run with
archive_mode = always. The ideas is that when promoting standby to
master we would not missed WALs.

I seem to be missing the point of duplicating your effort.

You are doing this, correct?:

Master WAL --> WAL archive <--
                              |
Master stream --> Standby --> |

I can't see how the Standby contributes anything to the archive that it does not already have from the Master?



My problem is that sometimes WAL uploaded from master and from slave are
not 100% identical. In most cases they are but occasionally they are
not. I have written small script that ensures that upload is free of
race condition and I log md5 sum of each WAL. Aren't WALs from master
and standby supposed to be identical? After all, standby is just
consuming WAL that it is receiving from master ...

Or do you have any better suggestion on how to achieve continuous
incremental backup?

Thanks in advance


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
Hi David,

thanks for the answer. I read this in documentation but here there is a corner case that I am not sure how to handle:
"""
This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice.
"""
But what I am supposed to do when content differs? Still return success and ignore or return error? If I return error, wouldn't that prevent wal archiver slave from pushing further WALs?

Regards,
Sasa


On 28 February 2017 at 02:10, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Feb 27, 2017 at 5:40 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
Aren't WALs from master and standby supposed to be identical?

​This would seem unwise to assume on its face and at least one piece of documentation directly mentions that it is false:


"""
When continuous WAL archiving is used in a standby, there are two different scenarios: the WAL archive can be shared between the primary and the standby, or the standby can have its own WAL archive. When the standby has its own WAL archive, set archive_mode to always, and the standby will call the archive command for every WAL segment it receives, whether it's by restoring from the archive or by streaming replication. The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
"""

​The contents of both must match with respect to the data files but there are likely things that go into the master WAL stream solely for the purpose of communicating with a standby - ​and possibly some standby concepts that would be unique to the standby's WAL - that would cause them to differ.  Not familiar enough to quickly list examples of what those might be.  But IIUC the system seems designed around master->slave replication and doesn't support slave daisy-chains.

David J.


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
And also this:

"""
If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself. To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby
"""

If I understand it correctly, WAL started on master but finished on standby (after failover) will still not be (old master is presumably dead and new master must wait for next WAL segment). Of course, the next WAL segment will be sent by new master, but we are going to miss exactly this one WAL segment during which failover occurred and thus introduce interruption in our WAL stream. Am I right?

Regards,
Sasa

On 28 February 2017 at 02:33, Sasa Vilic <sasavilic@gmail.com> wrote:
Hi David,

thanks for the answer. I read this in documentation but here there is a corner case that I am not sure how to handle:
"""
This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice.
"""
But what I am supposed to do when content differs? Still return success and ignore or return error? If I return error, wouldn't that prevent wal archiver slave from pushing further WALs?

Regards,
Sasa


On 28 February 2017 at 02:10, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Feb 27, 2017 at 5:40 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
Aren't WALs from master and standby supposed to be identical?

​This would seem unwise to assume on its face and at least one piece of documentation directly mentions that it is false:


"""
When continuous WAL archiving is used in a standby, there are two different scenarios: the WAL archive can be shared between the primary and the standby, or the standby can have its own WAL archive. When the standby has its own WAL archive, set archive_mode to always, and the standby will call the archive command for every WAL segment it receives, whether it's by restoring from the archive or by streaming replication. The shared archive can be handled similarly, but the archive_command must test if the file being archived exists already, and if the existing file has identical contents. This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice. And all that must be done free of race conditions, if two servers attempt to archive the same file at the same time.
"""

​The contents of both must match with respect to the data files but there are likely things that go into the master WAL stream solely for the purpose of communicating with a standby - ​and possibly some standby concepts that would be unique to the standby's WAL - that would cause them to differ.  Not familiar enough to quickly list examples of what those might be.  But IIUC the system seems designed around master->slave replication and doesn't support slave daisy-chains.

David J.



Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
"David G. Johnston"
Дата:
On Mon, Feb 27, 2017 at 6:33 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
Hi David,

thanks for the answer. I read this in documentation but here there is a corner case that I am not sure how to handle:
"""
This requires more care in the archive_command, as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice.
"""
But what I am supposed to do when content differs? Still return success and ignore or return error? If I return error, wouldn't that prevent wal archiver slave from pushing further WALs?


​As Adrian said - I'm not seeing the point to even dealing with a shared archive.  My solution would be to avoid the problem completely by pointing the standby WAL elsewhere.

That said, if I was a guessing man, I would say that, yes, you indicate failure.  The file in question will exist within the archive and will contain the contents from the master.  The standby's view of the file would be discarded.​

"as it must be careful to not overwrite an existing file with different contents, but return success if the exactly same file is archived twice." - the unspoken flip side is not returning true if the "not overwrite" provision took precedence.

David J.

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Adrian Klaver
Дата:
On 02/27/2017 05:29 PM, Sasa Vilic wrote:
> Master is streaming directly to standby. Both master and standby are
> pushing WALs to archive.
>
> My point is that in case that master crashed completely (and we failover
> to standby) and wal archiver on master didn't push everything to wal
> archive, we would still have a wal pushed from slave. Therefore there is
> no interruption in WAL stream.

Still failing to see how the standby can have more information then what
the master had sent to it at the time of the crash.

>
> Regards,
> Sasa
>
> On 28 February 2017 at 01:57, Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 02/27/2017 04:40 PM, Sasa Vilic wrote:
>
>         Hallo,
>
>         I am trying to setup shared WAL archive between master and standby.
>         Standby is synchronously streaming from master and both servers
>         run with
>         archive_mode = always. The ideas is that when promoting standby to
>         master we would not missed WALs.
>
>
>     I seem to be missing the point of duplicating your effort.
>
>     You are doing this, correct?:
>
>     Master WAL --> WAL archive <--
>                                   |
>     Master stream --> Standby --> |
>
>     I can't see how the Standby contributes anything to the archive that
>     it does not already have from the Master?
>
>
>
>         My problem is that sometimes WAL uploaded from master and from
>         slave are
>         not 100% identical. In most cases they are but occasionally they are
>         not. I have written small script that ensures that upload is free of
>         race condition and I log md5 sum of each WAL. Aren't WALs from
>         master
>         and standby supposed to be identical? After all, standby is just
>         consuming WAL that it is receiving from master ...
>
>         Or do you have any better suggestion on how to achieve continuous
>         incremental backup?
>
>         Thanks in advance
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
"David G. Johnston"
Дата:
It is customary to inline or bottom-posts on these lists.  Please follow the example of those responding to your emails.

On Mon, Feb 27, 2017 at 6:45 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
And also this:

"""
If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself. To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby
"""

If I understand it correctly, WAL started on master but finished on standby (after failover) will still not be (old master is presumably dead and new master must wait for next WAL segment). Of course, the next WAL segment will be sent by new master, but we are going to miss exactly this one WAL segment during which failover occurred and thus introduce interruption in our WAL stream. Am I right?

​Requires knowledge and familiarity I present lack.  Sorry.  I think I see where you are going with all of this but it would probably help to explicitly state the overall concern or plan and not just ask how specific mechanics work in isolation.

David J.

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
Because standby is running in syncronous replication, whereby wal archiver is asynchronous. Therefore there is a small window where slave has received the data but master has not pushed it yet to wal archive.

Regards,
Sasa

Am 28.02.2017 02:48 schrieb "Adrian Klaver" <adrian.klaver@aklaver.com>:
On 02/27/2017 05:29 PM, Sasa Vilic wrote:
Master is streaming directly to standby. Both master and standby are
pushing WALs to archive.

My point is that in case that master crashed completely (and we failover
to standby) and wal archiver on master didn't push everything to wal
archive, we would still have a wal pushed from slave. Therefore there is
no interruption in WAL stream.

Still failing to see how the standby can have more information then what the master had sent to it at the time of the crash.


Regards,
Sasa

On 28 February 2017 at 01:57, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

    On 02/27/2017 04:40 PM, Sasa Vilic wrote:

        Hallo,

        I am trying to setup shared WAL archive between master and standby.
        Standby is synchronously streaming from master and both servers
        run with
        archive_mode = always. The ideas is that when promoting standby to
        master we would not missed WALs.


    I seem to be missing the point of duplicating your effort.

    You are doing this, correct?:

    Master WAL --> WAL archive <--
                                  |
    Master stream --> Standby --> |

    I can't see how the Standby contributes anything to the archive that
    it does not already have from the Master?



        My problem is that sometimes WAL uploaded from master and from
        slave are
        not 100% identical. In most cases they are but occasionally they are
        not. I have written small script that ensures that upload is free of
        race condition and I log md5 sum of each WAL. Aren't WALs from
        master
        and standby supposed to be identical? After all, standby is just
        consuming WAL that it is receiving from master ...

        Or do you have any better suggestion on how to achieve continuous
        incremental backup?

        Thanks in advance



    --
    Adrian Klaver
    adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>




--
Adrian Klaver
adrian.klaver@aklaver.com

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
"David G. Johnston"
Дата:
On Mon, Feb 27, 2017 at 6:10 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
But IIUC the system seems designed around master->slave replication and doesn't support slave daisy-chains.


​I thought that sounded wrong when I wrote it...


David J.

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
Am 28.02.2017 02:50 schrieb "David G. Johnston" <david.g.johnston@gmail.com>:
It is customary to inline or bottom-posts on these lists.  Please follow the example of those responding to your emails.

On Mon, Feb 27, 2017 at 6:45 PM, Sasa Vilic <sasavilic@gmail.com> wrote:
And also this:

"""
If archive_mode is set to on, the archiver is not enabled during recovery or standby mode. If the standby server is promoted, it will start archiving after the promotion, but will not archive any WAL it did not generate itself. To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby
"""

If I understand it correctly, WAL started on master but finished on standby (after failover) will still not be (old master is presumably dead and new master must wait for next WAL segment). Of course, the next WAL segment will be sent by new master, but we are going to miss exactly this one WAL segment during which failover occurred and thus introduce interruption in our WAL stream. Am I right?

​Requires knowledge and familiarity I present lack.  Sorry.  I think I see where you are going with all of this but it would probably help to explicitly state the overall concern or plan and not just ask how specific mechanics work in isolation.

David J.

Hi David,

sorry about email formating. I didn't realize that such convetions exists as I am first time here + I am currently using gmail mobile app. I hope it looks good this time. :)

My general idea is to have synchronous hot standby and asynchronous shared wal archive. If that were possible I could actually switch back and forth between master and slave without interrupting wal stream and with very short downtime. This also makes PostgreSQL upgrade very easy. Alternative to this is to have separate backup for master and slave, but this has other disadvantages:

* I need double disk space for backup if I am going to archive WALs from standby and master at the same time, or
* I could only archive WALs from current master, but that would require performing base backup immediately after failover. Otherwise archived WALs are useless. 

Both of these solutions are good solution but not perfect. I thought that shared wal archive is possible, based on PostgreSQL documention. I also assume that requirement "not to overwrite existing WAL with different content" was only there to prevent from accidental mis-configuration. 

I wasn't aware that standby server would produce different WAL. My current test setup looks just like that. And it works, except for 2-3 WALs per day that are not identical. Everything else is same. I can even restore backup without issues, but those non-identical WALs still throuble me because I fear that I am missing something big.

I wrote a small python script to perform WAL decoding and I hope get more information tommorow morning. Since I was playing today with repmgr (which internaly uses pg_rewind for switchover) I got suspicios that this could be it.

Anyway, I hopped that someone had similar setup and experience with it. :(

Sorry for such lengthly email.

Regards,
Sasa

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
"David G. Johnston"
Дата:
On Mon, Feb 27, 2017 at 7:32 PM, Sasa Vilic <sasavilic@gmail.com> wrote:

My general idea is to have synchronous hot standby and asynchronous shared wal archive. If that were possible I could actually switch back and forth between master and slave without interrupting wal stream and with very short downtime. This also makes PostgreSQL upgrade very easy. Alternative to this is to have separate backup for master and slave, but this has other disadvantages:

* I need double disk space for backup if I am going to archive WALs from standby and master at the same time, or
* I could only archive WALs from current master, but that would require performing base backup immediately after failover. Otherwise archived WALs are useless. 


I became a bit skeptical when I re​ad:

"To get a complete series of WAL files in the archive, you must ensure that all WAL is archived, before it reaches the standby. This is inherently true with file-based log shipping, as the standby can only restore files that are found in the archive, but not if streaming replication is enabled."

given the lack of any hint as to how one would accomplish "a complete series of WAL files in the archive" in streaming replication mode...

Maybe I'm just missing something here - but it does seem like you don't get to have your cake and eat it...

Dave

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Adrian Klaver
Дата:
On 02/27/2017 05:52 PM, Sasa Vilic wrote:
> Because standby is running in syncronous replication, whereby wal
> archiver is asynchronous. Therefore there is a small window where slave
> has received the data but master has not pushed it yet to wal archive.

Exactly. The standby already has the latest information, it would gain
nothing from fetching it from the archive and anything it wrote to the
archive would only have the information it got from the master at the
point of failure. That is before you promoted it, after it would be on
its own path independent of the master.

See here for more info:

https://www.postgresql.org/docs/9.6/static/warm-standby-failover.html

>
> Regards,
> Sasa
>
> Am 28.02.2017 02:48 schrieb "Adrian Klaver" <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>>:
>
>     On 02/27/2017 05:29 PM, Sasa Vilic wrote:
>
>         Master is streaming directly to standby. Both master and standby are
>         pushing WALs to archive.
>
>         My point is that in case that master crashed completely (and we
>         failover
>         to standby) and wal archiver on master didn't push everything to wal
>         archive, we would still have a wal pushed from slave. Therefore
>         there is
>         no interruption in WAL stream.
>
>
>     Still failing to see how the standby can have more information then
>     what the master had sent to it at the time of the crash.
>
>
>         Regards,
>         Sasa
>
>         On 28 February 2017 at 01:57, Adrian Klaver
>         <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>         <mailto:adrian.klaver@aklaver.com
>         <mailto:adrian.klaver@aklaver.com>>> wrote:
>
>             On 02/27/2017 04:40 PM, Sasa Vilic wrote:
>
>                 Hallo,
>
>                 I am trying to setup shared WAL archive between master
>         and standby.
>                 Standby is synchronously streaming from master and both
>         servers
>                 run with
>                 archive_mode = always. The ideas is that when promoting
>         standby to
>                 master we would not missed WALs.
>
>
>             I seem to be missing the point of duplicating your effort.
>
>             You are doing this, correct?:
>
>             Master WAL --> WAL archive <--
>                                           |
>             Master stream --> Standby --> |
>
>             I can't see how the Standby contributes anything to the
>         archive that
>             it does not already have from the Master?
>
>
>
>                 My problem is that sometimes WAL uploaded from master
>         and from
>                 slave are
>                 not 100% identical. In most cases they are but
>         occasionally they are
>                 not. I have written small script that ensures that
>         upload is free of
>                 race condition and I log md5 sum of each WAL. Aren't
>         WALs from
>                 master
>                 and standby supposed to be identical? After all, standby
>         is just
>                 consuming WAL that it is receiving from master ...
>
>                 Or do you have any better suggestion on how to achieve
>         continuous
>                 incremental backup?
>
>                 Thanks in advance
>
>
>
>             --
>             Adrian Klaver
>             adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>         <mailto:adrian.klaver@aklaver.com
>         <mailto:adrian.klaver@aklaver.com>>
>
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
On 2017-02-28 06:14, Adrian Klaver wrote:
> On 02/27/2017 05:52 PM, Sasa Vilic wrote:
>> Because standby is running in syncronous replication, whereby wal
>> archiver is asynchronous. Therefore there is a small window where slave
>> has received the data but master has not pushed it yet to wal archive.
>
> Exactly. The standby already has the latest information, it would gain
> nothing from fetching it from the archive and anything it wrote to the
> archive would only have the information it got from the master at the
> point of failure. That is before you promoted it, after it would be on
> its own path independent of the master.

Hi Adrian,

I am afraid that you don't understand me.

Standby is not fetching WAL from archive, it fetches it directly from
master and is done synchronously, which means that master will only
confirm transaction to client when WAL is streamed and applied at
standby. On the other hand, master does not have to wait for WAL
archiver. If master crashes before WAL archiver is able to send WAL, we
would still have it on standby.

Let us for the sake of demonstration consider that we have same very low
busy but very critical system:

1. Your client connects to primary server (master) and performs changes
on data
2. It just happen that this is a moment where PostgreSQL opens new WAL
segment. It writes few kilobytes in this new WAL segment but it has
almost 16MB to write before segment is complete. So the wal archiver has
to wait before it can push wal segment in wal archive
3. Secondary server (standby) is doing same, it is writing changes to
newly created WAL segment
4. Your client issues COMMIT
     - primary waits until changes are applied at secondary
     - primary flushes changes to WAL
     - secondary confirms transaction to primary
     - primary confirms transaction to client
     - WAL is still not processed by wal archiver because it is only
i.e. 1 MB big and we are still left 15MB to go
5. Primary server crashes, i.e. due to catastrophic disk failure
     - everything stops and can't be recovered
     - wal archiver is dead, but even if it were alive it wouldn't send
WAL to archive anyway because 16MB of wal segment was not filled up
6. We promote our secondary server to master
     - In secondary server's WAL we already got changes from primary
     - Secondary continues appending new changes to wal segment
7. Eventually WAL segment on secondary fills up and then pushes it to
wal archive.
8. Although primary is dead, we didn't loose anything because lost WAL
data was pushed by secondary.

Regards,
Sasa


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Adrian Klaver
Дата:
On 02/27/2017 11:14 PM, Sasa Vilic wrote:
> On 2017-02-28 06:14, Adrian Klaver wrote:
>> On 02/27/2017 05:52 PM, Sasa Vilic wrote:
>>> Because standby is running in syncronous replication, whereby wal
>>> archiver is asynchronous. Therefore there is a small window where slave
>>> has received the data but master has not pushed it yet to wal archive.
>>
>> Exactly. The standby already has the latest information, it would gain
>> nothing from fetching it from the archive and anything it wrote to the
>> archive would only have the information it got from the master at the
>> point of failure. That is before you promoted it, after it would be on
>> its own path independent of the master.
>
> Hi Adrian,
>
> I am afraid that you don't understand me.
>
> Standby is not fetching WAL from archive, it fetches it directly from
> master and is done synchronously, which means that master will only
> confirm transaction to client when WAL is streamed and applied at
> standby. On the other hand, master does not have to wait for WAL
> archiver. If master crashes before WAL archiver is able to send WAL, we
> would still have it on standby.
>
> Let us for the sake of demonstration consider that we have same very low
> busy but very critical system:
>
> 1. Your client connects to primary server (master) and performs changes
> on data
> 2. It just happen that this is a moment where PostgreSQL opens new WAL
> segment. It writes few kilobytes in this new WAL segment but it has
> almost 16MB to write before segment is complete. So the wal archiver has
> to wait before it can push wal segment in wal archive
> 3. Secondary server (standby) is doing same, it is writing changes to
> newly created WAL segment
> 4. Your client issues COMMIT
>     - primary waits until changes are applied at secondary
>     - primary flushes changes to WAL
>     - secondary confirms transaction to primary
>     - primary confirms transaction to client
>     - WAL is still not processed by wal archiver because it is only i.e.
> 1 MB big and we are still left 15MB to go
> 5. Primary server crashes, i.e. due to catastrophic disk failure
>     - everything stops and can't be recovered
>     - wal archiver is dead, but even if it were alive it wouldn't send
> WAL to archive anyway because 16MB of wal segment was not filled up
> 6. We promote our secondary server to master
>     - In secondary server's WAL we already got changes from primary
>     - Secondary continues appending new changes to wal segment
> 7. Eventually WAL segment on secondary fills up and then pushes it to
> wal archive.
> 8. Although primary is dead, we didn't loose anything because lost WAL
> data was pushed by secondary.

I understand the above, what I did not understand, from your original post:

"My problem is that sometimes WAL uploaded from master and from slave
are not 100% identical. In most cases they are but occasionally they are
not. I have written small script that ensures that upload is free of
race condition and I log md5 sum of each WAL."

To me that reads as you sending WALs to the archive from both the master
and the standby in parallel, instead of sequentially as you imply in the
outline above. It would seem to be confirmed by the setting of
archive_mode = always:

https://www.postgresql.org/docs/9.6/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVING

https://www.postgresql.org/docs/9.6/static/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY

Seems to mean the simpler thing to do would be to set standby to
archive_mode = on, in which case the standby would not contribute WAL's
until it was promoted which would seem to be what you want.

>
> Regards,
> Sasa
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Jon Nelson
Дата:


On Tue, Feb 28, 2017 at 9:41 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 02/27/2017 11:14 PM, Sasa Vilic wrote:
...
 
"My problem is that sometimes WAL uploaded from master and from slave are not 100% identical. In most cases they are but occasionally they are not. I have written small script that ensures that upload is free of race condition and I log md5 sum of each WAL."

The wisdom (or not!) in archiving WAL to the same location from multiple sources (even if they share a common ancestor) notwithstanding, I must admit to having my curiosity piqued. 

Let's assume a different situation:
- a master and one or more standby units each archiving every WAL file but to it's own archive
- we check to see if identically named WAL files are content identical

Does it surprise anybody else that, sometimes, an identically named WAL file from the master and from a standby have different contents?  It surprises me.

I would love to know if the differences are due to some oversight in the WAL archiving mechanism chosen by the OP or if, in fact, a master and a standby generate different WAL files!

What does pg_xlogdump say about the differences in the files?


--
Jon

Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
On 2017-02-28 16:41, Adrian Klaver wrote:
> Seems to mean the simpler thing to do would be to set standby to
> archive_mode = on, in which case the standby would not contribute WAL's
> until it was promoted which would seem to be what you want.

Yes, that was my first thought. Except that documentation states following:

"""
If archive_mode is set to on, the archiver is not enabled during
recovery or standby mode. If the standby server is promoted, it will
start archiving after the promotion, *but will not archive any WAL it
did not generate itself*
"""

What happens with WAL that are started on primary but finished on secondary?

Regards,
Sasa


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Sasa Vilic
Дата:
On 2017-02-28 16:57, Jon Nelson wrote:
> What does pg_xlogdump say about the differences in the files?

What a nice tool. I didn't realize that it exists for 9.6.

Unfortunately, we gave up on shared WAL archive, so I don't if I will
still have all both WALs. I have one conflicting WAL from one of the
servers and there is the *whole* content:

rmgr: Standby     len (rec/tot):     24/    50, tx:          0, lsn:
43/74000028, prev 43/73000140, desc: RUNNING_XACTS nextXid 11617888
latestCompletedXid 11617887 oldestRunningXid 11617888
rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn:
43/74000060, prev 43/74000028, desc: CHECKPOINT_ONLINE redo 43/74000028;
tli 13; prev tli 13; fpw true; xid 0:11617888; oid 25304; multi 1;
offset 0; oldest xid 1750 in DB 13322; oldest multi 1 in DB 1;
oldest/newest commit timestamp xid: 0/0; oldest running xid 11617888; online
rmgr: Standby     len (rec/tot):     24/    50, tx:          0, lsn:
43/740000D0, prev 43/74000060, desc: RUNNING_XACTS nextXid 11617888
latestCompletedXid 11617887 oldestRunningXid 11617888
rmgr: XLOG        len (rec/tot):      8/    34, tx:          0, lsn:
43/74000108, prev 43/740000D0, desc: BACKUP_END 43/74000028
rmgr: XLOG        len (rec/tot):      0/    24, tx:          0, lsn:
43/74000130, prev 43/74000108, desc: SWITCH

I don't have WAL from other server, but I hope I will be able to find it
tomorrow (if I haven't deleted it).

But I can share with you what I have observed by manually looking into
WALs with hex editor:

* If I remember correctly, first page was same (what we see above)
* For one of the logs (the one I have right now), all first 16 pages
except for the first had XLogPageHeaderData set (magic number=0xD093,
tli=13, pageaddr) but WITHOUT any payload. Everything except header was
zero. Remaining part of WAL WAS ALL ZEROED.
* Same WAL from other server was also similar. First page contained data
and remaining part of WAL was filled with XLogPageHeaderData WITHOUT
payload.
* The only different was that one WAL had XLogPageHeaderData without
payload and other zeros instead.

This was on system during initial setup so for most time we didn't have
any clients at all => not much to be logged in WAL. We were trying out
replication, failover/switchover scenarios with repmgr and creating and
restoring backup with barman.

Regards,
Sasa


Re: [GENERAL] Shared WAL archive between master and standby: WALs notalways identical

От
Adrian Klaver
Дата:
On 02/28/2017 02:20 PM, Sasa Vilic wrote:
> On 2017-02-28 16:41, Adrian Klaver wrote:
>> Seems to mean the simpler thing to do would be to set standby to
>> archive_mode = on, in which case the standby would not contribute WAL's
>> until it was promoted which would seem to be what you want.
>
> Yes, that was my first thought. Except that documentation states following:
>
> """
> If archive_mode is set to on, the archiver is not enabled during
> recovery or standby mode. If the standby server is promoted, it will
> start archiving after the promotion, *but will not archive any WAL it
> did not generate itself*
> """
>
> What happens with WAL that are started on primary but finished on
> secondary?

I thought that was covered in your scenario?:

"
5. Primary server crashes, i.e. due to catastrophic disk failure
     - everything stops and can't be recovered
     - wal archiver is dead, but even if it were alive it wouldn't send
WAL to archive anyway because 16MB of wal segment was not filled up
6. We promote our secondary server to master
     - In secondary server's WAL we already got changes from primary
     - Secondary continues appending new changes to wal segment
7. Eventually WAL segment on secondary fills up and then pushes it to
wal archive.
"

So the WAL that is sent to the archive by the standby is the one it
generated from the records it got via streaming replication from the master.

>
> Regards,
> Sasa
>


--
Adrian Klaver
adrian.klaver@aklaver.com