Thread: Replication: slave server has 3x size of production server?

Replication: slave server has 3x size of production server?

From
Edson Richter
Date:
Hi!

I have a database cluster that was created on PostgreSQL 9.6.10 on a RHEL Linux x64 server. I have upgraded it progressively, first the slave and then the master.
Currently both are running 9.6.17.
The production server is 196 GB in size.
The replicated (slave) server, however, is 598 GB in size.
The replication server is 3x the size of the production server. Is that normal?

Should I drop the slave server and re-create it? How can I avoid this situation in the future?

Thanks,

Edson


Re: Replication: slave server has 3x size of production server?

From
Adrian Klaver
Date:
On 2/22/20 9:25 AM, Edson Richter wrote:
> Hi!
> 
> I've a database cluster created at 9.6.10 linux x64 server rhel. I made 
> progressive upgrades, first upgrading slave and then upgrading master.
> Actually both are running 9.6.17.
> Current production server has 196Gb in size.
> Nevertheless, the replicated (slave) server has 598 Gb in size.
> Replication server has 3x size of production server, is that normal?

How are you measuring the sizes?

Where is the space being taken up on disk?

> 
> Shall I drop the slave server and re-create it? How to avoid this 
> situation in future?
> 
> Thanks,
> 
> Edson
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Replication: slave server has 3x size of production server?

From
Edson Richter
Date:


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 14:33
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
 
> How are you measuring the sizes?


This is the command:

du --max-depth 1 -h pgDbCluster


Production:

du --max-depth 1 -h pgDbCluster

56M     pgDbCluster/pg_log
444K    pgDbCluster/global
4,0K    pgDbCluster/pg_stat
4,0K    pgDbCluster/pg_snapshots
16K     pgDbCluster/pg_logical
20K     pgDbCluster/pg_replslot
61M     pgDbCluster/pg_subtrans
4,0K    pgDbCluster/pg_commit_ts
465M    pgDbCluster/pg_xlog
4,0K    pgDbCluster/pg_twophase
12M     pgDbCluster/pg_multixact
4,0K    pgDbCluster/pg_serial
195G    pgDbCluster/base
284K    pgDbCluster/pg_stat_tmp
12M     pgDbCluster/pg_clog
4,0K    pgDbCluster/pg_dynshmem
12K     pgDbCluster/pg_notify
4,0K    pgDbCluster/pg_tblspc
196G    pgDbCluster


Slave:

du -h --max-depth 1 pgDbCluster

403G    pgDbCluster/pg_xlog
120K    pgDbCluster/pg_log
424K    pgDbCluster/global
0       pgDbCluster/pg_stat
0       pgDbCluster/pg_snapshots
4,0K    pgDbCluster/pg_logical
8,0K    pgDbCluster/pg_replslot
60M     pgDbCluster/pg_subtrans
0       pgDbCluster/pg_commit_ts
0       pgDbCluster/pg_twophase
11M     pgDbCluster/pg_multixact
0       pgDbCluster/pg_serial
195G    pgDbCluster/base
12M     pgDbCluster/pg_clog
0       pgDbCluster/pg_dynshmem
8,0K    pgDbCluster/pg_notify
12K     pgDbCluster/pg_stat_tmp
0       pgDbCluster/pg_tblspc
598G    pgDbCluster


Edson




Re: Replication: slave server has 3x size of production server?

From
Adrian Klaver
Date:
On 2/22/20 10:05 AM, Edson Richter wrote:
>     How are you measuring the sizes?
> 
> 
> This is the command:
> 
> du --max-depth 1 -h pgDbCluster
> 
> 
> Production:
> 
> du --max-depth 1 -h pgDbCluster
> 
> 56M     pgDbCluster/pg_log
> 444K    pgDbCluster/global
> 4,0K    pgDbCluster/pg_stat
> 4,0K    pgDbCluster/pg_snapshots
> 16K     pgDbCluster/pg_logical
> 20K     pgDbCluster/pg_replslot
> 61M     pgDbCluster/pg_subtrans
> 4,0K    pgDbCluster/pg_commit_ts
> 465M    pgDbCluster/pg_xlog
> 4,0K    pgDbCluster/pg_twophase
> 12M     pgDbCluster/pg_multixact
> 4,0K    pgDbCluster/pg_serial
> 195G    pgDbCluster/base
> 284K    pgDbCluster/pg_stat_tmp
> 12M     pgDbCluster/pg_clog
> 4,0K    pgDbCluster/pg_dynshmem
> 12K     pgDbCluster/pg_notify
> 4,0K    pgDbCluster/pg_tblspc
> 196G    pgDbCluster
> 
> 
> Slave:
> 
> du -h --max-depth 1 pgDbCluster
> 
> 403G    pgDbCluster/pg_xlog
> 120K    pgDbCluster/pg_log
> 424K    pgDbCluster/global
> 0       pgDbCluster/pg_stat
> 0       pgDbCluster/pg_snapshots
> 4,0K    pgDbCluster/pg_logical
> 8,0K    pgDbCluster/pg_replslot
> 60M     pgDbCluster/pg_subtrans
> 0       pgDbCluster/pg_commit_ts
> 0       pgDbCluster/pg_twophase
> 11M     pgDbCluster/pg_multixact
> 0       pgDbCluster/pg_serial
> 195G    pgDbCluster/base
> 12M     pgDbCluster/pg_clog
> 0       pgDbCluster/pg_dynshmem
> 8,0K    pgDbCluster/pg_notify
> 12K     pgDbCluster/pg_stat_tmp
> 0       pgDbCluster/pg_tblspc
> 598G    pgDbCluster

So the WAL logs are not being cleared.

What replication method is being used?

What are the settings for the replication?

> 
> 
> Edson
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Replication: slave server has 3x size of production server?

From
Edson Richter
Date:


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 15:50
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
 
> So the WAL logs are not being cleared.
>
> What replication method is being used?
>
> What are the settings for the replication?

Streaming replication. Initiated via pg_basebackup.

Settings on master server:

# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
max_wal_senders = 2             # max number of walsender processes (change requires restart)
wal_keep_segments = 25          # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s       # in milliseconds; 0 disables
max_replication_slots = 2       # max number of replication slots (change requires restart)
#track_commit_timestamp = off   # collect timestamp of transaction commit (change requires restart)
# - Master Server -
# These settings are ignored on a standby server.
#synchronous_standby_names = '' # standby servers that provide sync rep number of sync standbys and comma-separated list of application_name from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed



Settings on slave server:

# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on                        # "on" allows queries during recovery (change requires restart)
max_standby_archive_delay = -1          # max delay before canceling queries when reading WAL from archive; -1 allows indefinite delay
max_standby_streaming_delay = -1        # max delay before canceling queries when reading streaming WAL; -1 allows indefinite delay
wal_receiver_status_interval = 10s      # send replies at least this often 0 disables
hot_standby_feedback = on               # send info from standby to prevent query conflicts
wal_receiver_timeout = 0                # time that receiver waits for communication from master in milliseconds; 0 disables
wal_retrieve_retry_interval = 5s        # time to wait before retrying to retrieve WAL after a failed attempt
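
For scale, the figures posted so far can be put side by side. The following is only a back-of-the-envelope sketch in Python: it assumes the 16 MB segment size mentioned in the configuration comment above and treats the `du -h` units as binary (MiB/GiB). The helper name `segments_in` is made up for this illustration. Note that wal_keep_segments is a minimum to retain, not a cap, so the comparison only shows the order of magnitude.

```python
# Rough comparison of pg_xlog sizes, using numbers quoted in this thread.
SEGMENT_MIB = 16  # WAL segment size, per the postgresql.conf comment (assumed default)


def segments_in(size_mib: float) -> int:
    """Approximate number of whole 16 MiB WAL segments that fit in size_mib."""
    return int(size_mib // SEGMENT_MIB)


wal_keep_segments = 25                                      # setting on the master
retained_by_setting_mib = wal_keep_segments * SEGMENT_MIB   # space implied by that setting

master_pg_xlog_mib = 465           # "465M pgDbCluster/pg_xlog" on production
standby_pg_xlog_mib = 403 * 1024   # "403G pgDbCluster/pg_xlog" on the slave

print(retained_by_setting_mib)           # 400
print(segments_in(master_pg_xlog_mib))   # 29
print(segments_in(standby_pg_xlog_mib))  # 25792
```

So the master's pg_xlog is in line with wal_keep_segments = 25, while the slave is holding on the order of 25,000 segments.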


Regards,

Edson


Re: Replication: slave server has 3x size of production server?

From
Adrian Klaver
Date:
What are the settings for:

archive_mode
archive_command

on the standby?

Are the files in pg_xlog on the standby mostly from well in the past?





-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Replication: slave server has 3x size of production server?

From
Edson Richter
Date:


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 16:16
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
 
> What are the settings for:
>
> archive_mode
> archive_command
>
> on the standby?
>
> Are the files in pg_xlog on the standby mostly from well in the past?

Actually, the standby server is sending WAL files to a backup (barman) server:

archive_mode = always           # enables archiving; off, on, or always (change requires restart)
archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'


The files are about 7 months old.


Thanks,

Edson


Re: Replication: slave server has 3x size of production server?

From
Adrian Klaver
Date:
On 2/22/20 11:23 AM, Edson Richter wrote:
> Actually, the standby server is sending WAL files to a backup (barman) server:
> 
> archive_mode = always           # enables archiving; off, on, or always 
> (change requires restart)
> archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p 
> barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'

And the above is working, the files are showing up on the barman server?


> 
> 
> The files are about 7 months old.

Are there newer files that would indicate that the streaming is working?


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Replication: slave server has 3x size of production server?

From
Edson Richter
Date:


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 18:12
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
 
> And the above is working, the files are showing up on the barman server?

Yes, it is working. The latest xlog file is present on all three servers.
Also, comparing the last transaction number on master and slave shows that all are in sync.
Last, but not least, select max(id) from a busy table shows the same id (when queried almost simultaneously using a simple test routine).

> Are there newer files that would indicate that the streaming is working?

Yes, streaming is working properly (as stated above).

Thanks,


Edson Richter



Re: Replication: slave server has 3x size of production server?

From
Adrian Klaver
Date:
On 2/22/20 2:51 PM, Edson Richter wrote:

> 
> Yes, it is working. The latest xlog file is present on all three servers.
> Also, comparing the last transaction number on master and slave shows that
> all are in sync.
> Last, but not least, select max(id) from a busy table shows the same id
> (when queried almost simultaneously using a simple test routine).

Well, something is keeping those WAL files around. You should probably
analyze your complete setup to see what else is touching those servers.




-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Replication: slave server has 3x size of production server?

From
Edson Richter
Date:


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Saturday, February 22, 2020 20:34
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
 
> Well, something is keeping those WAL files around. You should probably
> analyze your complete setup to see what else is touching those servers.

Is it safe to add "--remove-source-files" to the archive_command on my slave server, as follows,


archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'


so that the xlog file is removed after it has been copied to barman?
I mean, when the archive command starts, the WAL file has already been processed by the slave server, so we don't need it after copying it to the backup server, right?


Regards,

Edson


Re: Replication: slave server has 3x size of production server?

From
Adrian Klaver
Date:
On 2/23/20 8:04 AM, Edson Richter wrote:
> Is it safe to add "--remove-source-files" to the archive_command on my
> slave server, as follows?

I would say not. See:

https://www.postgresql.org/docs/12/wal-configuration.html

"Checkpoints are points in the sequence of transactions at which it is 
guaranteed that the heap and index data files have been updated with all 
information written before that checkpoint. At checkpoint time, all 
dirty data pages are flushed to disk and a special checkpoint record is 
written to the log file. (The change records were previously flushed to 
the WAL files.) In the event of a crash, the crash recovery procedure 
looks at the latest checkpoint record to determine the point in the log 
(known as the redo record) from which it should start the REDO 
operation. Any changes made to data files before that point are 
guaranteed to be already on disk. Hence, after a checkpoint, log 
segments preceding the one containing the redo record are no longer 
needed and can be recycled or removed. (When WAL archiving is being 
done, the log segments must be archived before being recycled or removed.)"

So there is a window where a WAL file has been written but the data it
represents has not yet been checkpointed, so it is still needed.
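
To make the "redo record" boundary from the quoted documentation concrete, here is a minimal Python sketch. It assumes the `pg_controldata` output contains a line labelled "Latest checkpoint's REDO WAL file:" and that WAL segment names are 24 uppercase hex digits. The sample text and segment names are invented for illustration, and the helpers `redo_wal_file` and `older_than_redo` are hypothetical. It is not a cleanup procedure: segments older than the REDO segment can still be needed, for example while they wait to be archived.

```python
import re


def redo_wal_file(controldata_text: str) -> str:
    """Extract the REDO WAL file name from pg_controldata-style output."""
    m = re.search(
        r"^Latest checkpoint's REDO WAL file:\s*([0-9A-F]{24})\s*$",
        controldata_text,
        re.MULTILINE,
    )
    if m is None:
        raise ValueError("REDO WAL file line not found")
    return m.group(1)


def older_than_redo(segment_names, redo_name):
    """Segments on the REDO segment's timeline that sort before it.

    Fixed-width uppercase hex names compare correctly as plain strings.
    """
    timeline = redo_name[:8]
    return sorted(
        n for n in segment_names
        if len(n) == 24 and n[:8] == timeline and n < redo_name
    )


# Invented sample data, for illustration only.
sample = """\
Latest checkpoint location:           2A/5C000028
Latest checkpoint's REDO WAL file:    000000010000002A0000005C
"""
segments = [
    "000000010000002A0000005A",
    "000000010000002A0000005B",
    "000000010000002A0000005C",
    "000000010000002A0000005D",
]

redo = redo_wal_file(sample)
print(redo)
print(older_than_redo(segments, redo))
```

In this made-up example only the two segments ending in 5A and 5B precede the REDO segment; the segment containing the redo record and everything after it would still be required for crash recovery.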

> 
> 
> archive_command = 'rsync --remove-source-files -e "ssh -2 -C -p 2022" 
> -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
> 
> 
> so that the xlog file is removed after it has been copied to barman?
> I mean, when the archive command starts, the WAL file has already been
> processed by the slave server, so we don't need it after copying it to the
> backup server, right?


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: Replication: slave server has 3x size of production server?

From
Edson Richter
Date:


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, February 23, 2020 15:42
To: Edson Richter <edsonrichter@hotmail.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Replication: slave server has 3x size of production server?
 
> I would say not. See:
>
> https://www.postgresql.org/docs/12/wal-configuration.html
>
> "Checkpoints are points in the sequence of transactions at which it is
> guaranteed that the heap and index data files have been updated with all
> information written before that checkpoint. At checkpoint time, all
> dirty data pages are flushed to disk and a special checkpoint record is
> written to the log file. (The change records were previously flushed to
> the WAL files.) In the event of a crash, the crash recovery procedure
> looks at the latest checkpoint record to determine the point in the log
> (known as the redo record) from which it should start the REDO
> operation. Any changes made to data files before that point are
> guaranteed to be already on disk. Hence, after a checkpoint, log
> segments preceding the one containing the redo record are no longer
> needed and can be recycled or removed. (When WAL archiving is being
> done, the log segments must be archived before being recycled or removed.)"
>
> So there is a window where a WAL file has been written but the data it
> represents has not yet been checkpointed, so it is still needed.

I see. Makes sense.
I suppose that long-lived xlog files are of no use then... I would expect PostgreSQL to delete them automatically.
Perhaps, since I have full backups happening on odd days, I can create a "post backup command" in the barman script so that it deletes files older than 1 week from the server it is backing up...
I understand there is no guarantee that these files have already been processed... but if they are needed, they can be recovered from the barman server...

Thanks,

Edson


Re: Replication: slave server has 3x size of production server?

From
Jehan-Guillaume de Rorthais
Date:
On Sat, 22 Feb 2020 19:23:05 +0000
Edson Richter <edsonrichter@hotmail.com> wrote:
[...]
> Actually, the standby server is sending WAL files to a backup (barman) server:
> 
> archive_mode = always           # enables archiving; off, on, or always (change requires restart)
> archive_command = 'rsync -e "ssh -2 -C -p 2022" -az %p barman@192.168.0.2:/dados/barman/dbcluster/incoming/%f'
> 
> 
> The files are about 7 months old.

Did you check the return code of your archive_command? 

Did you check the log produced by your archive_command and postmaster?

How many files with ".ready" extension in "$PGDATA/pg_xlog/archive_status/"?

Can you confirm there's no missing WAL between the older one and
the newer one in "$PGDATA/pg_xlog" in alphanum order?
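
The last two checks can be scripted. Below is a rough Python sketch, not a definitive tool. It assumes the default 16 MB segment size (256 segments per log file number), 24-hex-digit WAL file names, and that `PGDATA` points at the standby's data directory. The function names `count_ready`, `segment_number` and `find_gaps` are made up for this example.

```python
import os
import re

WAL_NAME = re.compile(r"^[0-9A-F]{24}$")
SEGMENTS_PER_LOG = 256  # assumption: default 16 MB WAL segments


def count_ready(archive_status_dir):
    """Count status files with the ".ready" extension (segments waiting to be archived)."""
    return sum(1 for n in os.listdir(archive_status_dir) if n.endswith(".ready"))


def segment_number(name):
    """Linear segment number from the log and segment parts of a WAL file name."""
    return int(name[8:16], 16) * SEGMENTS_PER_LOG + int(name[16:24], 16)


def find_gaps(names):
    """Return (previous, next) WAL names on the same timeline with segments missing between them."""
    wal = sorted(n for n in names if WAL_NAME.match(n))
    gaps = []
    for prev, cur in zip(wal, wal[1:]):
        if prev[:8] == cur[:8] and segment_number(cur) - segment_number(prev) > 1:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    pgdata = os.environ.get("PGDATA", "")
    xlog = os.path.join(pgdata, "pg_xlog")
    status = os.path.join(xlog, "archive_status")
    if pgdata and os.path.isdir(status):
        print(".ready files:", count_ready(status))
        print("gaps:", find_gaps(os.listdir(xlog)))
```

A large `.ready` count would suggest that archiving is falling behind or failing, which is one common reason for old segments to pile up in pg_xlog.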