I did a quick test using nearly empty databases. I did not promote; I only took the base backup with two different methods.
When I did it on macOS with PostgreSQL 11.1, the .done file existed only under the data directory I created with -X fetch. The WAL files were the same.
When I did it on FreeBSD with PostgreSQL 10.6, the .done file existed only under the -X fetch directory, and the WAL files were also different. I don't know whether it is a problem or not, but I could reproduce it on the first attempt.
This was after the two basebackups:
$ pg_basebackup -p 5433 -v -R -P -D 1 -X fetch
$ pg_basebackup -p 5433 -v -R -P -D 2 -X stream
$ diff -ur 1/pg_wal/ 2/pg_wal/
Only in 1/pg_wal/: 00000001000000000000000C
Only in 1/pg_wal/: 00000001000000000000000D
Files 1/pg_wal/00000001000000000000000E and 2/pg_wal/00000001000000000000000E differ
Only in 1/pg_wal/archive_status: 00000001000000000000000C.done
Only in 1/pg_wal/archive_status: 00000001000000000000000D.done
Only in 1/pg_wal/archive_status: 00000001000000000000000E.done
$ less log/1/2019-02-16_19-48-29.log
2019-02-16 19:48:29 CET LOG: database system was interrupted; last known up at 2019-02-16 19:44:45 CET
2019-02-16 19:48:29 CET LOG: entering standby mode
2019-02-16 19:48:29 CET LOG: redo starts at 0/C000028
2019-02-16 19:48:29 CET LOG: consistent recovery state reached at 0/C000130
2019-02-16 19:48:29 CET LOG: database system is ready to accept read only connections
2019-02-16 19:48:29 CET LOG: started streaming WAL from primary at 0/D000000 on timeline 1
$ less log/2/2019-02-16_19-48-34.log
2019-02-16 19:48:34 CET LOG: database system was interrupted; last known up at 2019-02-16 19:45:15 CET
2019-02-16 19:48:34 CET LOG: entering standby mode
2019-02-16 19:48:34 CET LOG: redo starts at 0/E000028
2019-02-16 19:48:34 CET LOG: consistent recovery state reached at 0/E000130
2019-02-16 19:48:34 CET LOG: database system is ready to accept read only connections
2019-02-16 19:48:34 CET LOG: started streaming WAL from primary at 0/F000000 on timeline 1
$ diff -ur 1/base/ 2/base/
Files 1/base/16386/pg_internal.init and 2/base/16386/pg_internal.init differ
I did nothing except for starting the two clusters. There was no activity on the master. I did not promote.
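To make the archive_status comparison above easier to repeat, here is a rough sketch of a helper that lists WAL segments in a pg_wal directory that have no matching .done marker. The function name list_unmarked_wal is made up for illustration, and it assumes segment file names are exactly 24 uppercase hex digits, as in the diff output above.

```shell
# list_unmarked_wal <pg_wal-dir>
# Hypothetical helper (not part of PostgreSQL): print the names of WAL
# segments that have no archive_status/<segment>.done marker.
list_unmarked_wal() {
    wal_dir=$1
    for seg in "$wal_dir"/*; do
        # Skip directories such as archive_status and unmatched globs.
        [ -f "$seg" ] || continue
        name=${seg##*/}
        # Assumption: segment names are 24 uppercase hex digits, so
        # .partial, .history and .backup files are skipped here.
        case $name in
            *[!0-9A-F]*) continue ;;
        esac
        [ "${#name}" -eq 24 ] || continue
        [ -e "$wal_dir/archive_status/$name.done" ] || echo "$name"
    done
}
```

For the two directories above it could be run as list_unmarked_wal 1/pg_wal and list_unmarked_wal 2/pg_wal.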
M.
On Sat, Feb 16, 2019 at 12:26:13AM +0000, PG Bug reporting form wrote:
> When new slave is created by taking base backup from the primary using
> pg_basebackup with --wal-method=stream option the WAL file generated during
> the backup is different (as compared with diff or cmp command) than that on
> the master and in WAL archive directory. Furthermore, this file does not
> exist in pg_wal/archive_status with .done extension on new slave, though it
> exists in pg_wal directory, resulting in failed attempt to archive this file
> when slave node is promoted as master node.
> 2019-02-15 14:15:58.872 PST [5369] DETAIL: The failed archive command was:
> test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
> pg_wal/000000010000000000000002
> /mnt/pgsql/archive/000000010000000000000002
How do you promote your standby? In Postgres 10, the last, partial
WAL segment of a past timeline generated at promotion is renamed
.partial to avoid any conflicts, so this should normally not
happen if you do not use archive_mode = always.
Please note that your archive command is not safe. For one, it does
not sync the archived segment before archive_command returns to the
backend.
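
As a rough illustration of the sync point, a more careful archive step could look something like the sketch below. The function name archive_wal and its three-argument form are made up for this example. It also assumes that a plain sync waits for the data to reach disk, which depends on the platform; a tool that fsyncs the file and its directory explicitly would be stricter.

```shell
# archive_wal <path-to-segment (%p)> <segment-name (%f)> <archive-dir>
# Hypothetical helper, not part of PostgreSQL; a sketch only.
archive_wal() {
    src=$1
    name=$2
    dest_dir=$3
    tmp="$dest_dir/$name.tmp.$$"

    # Never overwrite a segment that is already in the archive.
    if [ -e "$dest_dir/$name" ]; then
        echo "archive_wal: $dest_dir/$name already exists" >&2
        return 1
    fi

    # Copy under a temporary name so that an interrupted copy never
    # leaves a partial file under the final segment name.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # Assumption: sync flushes the copied data before we rename, and
    # the second sync flushes the rename itself.
    sync || return 1
    mv "$tmp" "$dest_dir/$name" || return 1
    sync
}
```

A small wrapper script that calls archive_wal "$1" "$2" /mnt/pgsql/archive could then be used from archive_command with %p and %f as its arguments. A nonzero return tells the server that archiving failed, so it will retry the segment later.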
--
Michael