Thread: BUG #6094: Streaming replication does not catch up when writing enough data

BUG #6094: Streaming replication does not catch up when writing enough data

From
"David Hartveld"
Date:
The following bug has been logged online:

Bug reference:      6094
Logged by:          David Hartveld
Email address:      david.hartveld@mendix.com
PostgreSQL version: 9.1-beta2
Operating system:   Debian GNU/Linux 6.0.2 "Squeeze"
Description:        Streaming replication does not catch up when writing enough data
Details:

After creation of two new clusters, and setting them up as master and slave
(in async mode, according to the current 9.1 docs), the execution of a large
SQL script (creating a db, tables, sequences, etc., filling them with data
through COPY) runs properly on the master, but does not stream to the slave,
i.e. the slave does not catch up. In the master log, the following line is
printed many times:

2011-07-07 13:48:27 CEST LOG:  could not send data to client: Connection reset by peer

In the slave log, the following corresponding lines are printed, for each
log line on the master:

2011-07-07 13:48:27 CEST LOG:  streaming replication successfully connected to primary
2011-07-07 13:48:27 CEST LOG:  record with zero length at 0/51E0010
2011-07-07 13:48:27 CEST FATAL:  terminating walreceiver process due to administrator command
cp: cannot stat `/walshipping/9.1/test/000000010000000000000005': No such file or directory
2011-07-07 13:48:27 CEST LOG:  record with zero length at 0/51E4010
cp: cannot stat `/walshipping/9.1/test/000000010000000000000005': No such file or directory

The 'record with zero length' line is printed many times.
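
The log positions and the missing archive file above can be related to each other. The following is only a sketch: it assumes the 16 MiB segment size mentioned later in this report and the three-part, 24-hex-digit segment file name (timeline, log id, segment number). The function name wal_file_for_location is made up for illustration and is not a PostgreSQL tool.

```shell
# Hypothetical helper: map a timeline ID and a WAL location such as
# 0/51E0010 to the name of the 16 MiB segment file that contains it.
wal_file_for_location() {
    # $1 = timeline ID (decimal), $2 = location in "logid/offset" hex form
    _logid=${2%%/*}     # hex digits before the slash
    _offset=${2##*/}    # hex digits after the slash
    printf '%08X%08X%08X\n' "$1" "$((0x$_logid))" \
        "$(( 0x$_offset / (16 * 1024 * 1024) ))"
}

wal_file_for_location 1 0/51E0010   # prints 000000010000000000000005
wal_file_for_location 1 0/51E4010   # prints 000000010000000000000005
```

Both positions fall into segment 000000010000000000000005, the file that the cp in the slave log cannot find. That would be expected if the segment had not been archived yet, or if the archive directory is not visible from the slave; the report does not say which.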

I have configured the clusters with the following 'script':

EDITOR=/usr/bin/vim

MASTER=pg-db-01
SLAVE=pg-db-02
PORT=3000
VERSION=9.1
CLUSTERNAME=test

BOTH
- Create 9.1 cluster on port 3000
   # pg_createcluster -p $PORT $VERSION $CLUSTERNAME
- Add line 'host all all samenet trust' to pg_hba.conf.
   # $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/pg_hba.conf
- Listen on all IPs: Change 'listen_addresses' to '*' in postgresql.conf.
   # $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/postgresql.conf

MASTER
- Enable wal archiving. Set the following configuration parameters in
postgresql.conf
  (and create directory /walshipping/9.1/test, owned by postgres):
   wal_level = hot_standby
   archive_mode = on
   archive_command = 'cp -i %p /walshipping/9.1/test/%f < /dev/null'
   # $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/postgresql.conf
- To enable streaming replication, set the following configuration
parameters in postgresql.conf:
   wal_keep_segments = 64 # * 16 MiB, 1 GiB disk space needed.
   max_wal_senders = 1 # Or some other number at least equal to the number
                       # of standby servers.
   # $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/postgresql.conf
- Also add line 'host replication postgres samenet trust' to pg_hba.conf
   # $EDITOR /etc/postgresql/$VERSION/$CLUSTERNAME/pg_hba.conf
- Start the cluster.
   # pg_ctlcluster $VERSION $CLUSTERNAME start
- Create a base backup for the slave.
   # psql -U postgres -h localhost -p $PORT \
        -c "SELECT pg_start_backup('base', true)"
   # rsync -a /var/lib/postgresql/$VERSION/$CLUSTERNAME/* \
        /pgbackup/$VERSION/$CLUSTERNAME/
   # psql -U postgres -h localhost -p $PORT \
        -c "SELECT pg_stop_backup()"
   # rm -rf /pgbackup/$VERSION/$CLUSTERNAME/{postmaster.pid,pg_xlog/*}
   # cd /pgbackup/$VERSION
   # tar jcvf $CLUSTERNAME.tar.bz2 ./$CLUSTERNAME/
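
The archive_command in the MASTER steps relies on cp -i reading from /dev/null so that an existing archive file is not overwritten. A sketch of the same idea with an explicit existence check might look as follows. The function archive_wal and its argument order are assumptions made for illustration, not part of the reported setup.

```shell
# Hypothetical helper: copy one WAL segment into the archive directory,
# failing instead of overwriting a file that is already there.
archive_wal() {
    # $1 = path of the segment (%p), $2 = archive directory, $3 = file name (%f)
    if [ -e "$2/$3" ]; then
        echo "refusing to overwrite $2/$3" >&2
        return 1
    fi
    cp "$1" "$2/$3"
}
```

A non-zero exit status tells the server that archiving failed, so the segment would be retried later rather than silently replaced. The PostgreSQL documentation shows a similar one-line form built from test ! -f and cp.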


SLAVE
- 'Restore' the created backup from the master.
   # cd /var/lib/postgresql/$VERSION
   # rm -rf $CLUSTERNAME.orig
   # mv -f $CLUSTERNAME $CLUSTERNAME.orig
   # tar jxvf /$CLUSTERNAME.tar.bz2
- Create recovery.conf with the following configuration parameters:
   standby_mode = 'on'
   primary_conninfo = 'host=$MASTER port=$PORT user=postgres'
   restore_command = 'cp /walshipping/$VERSION/$CLUSTERNAME/%f %p'
   # $EDITOR /var/lib/postgresql/$VERSION/$CLUSTERNAME/recovery.conf
- Start the cluster.
   # chown -R postgres.postgres $CLUSTERNAME
   # chmod 0700 $CLUSTERNAME
   # pg_ctlcluster $VERSION $CLUSTERNAME start
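
The recovery.conf settings in the SLAVE steps are written with shell-style variables, which the server itself would not expand. One way to produce the file with the values filled in is sketched below. The function name print_recovery_conf is hypothetical, and whether the values were originally substituted by hand is not stated in the report.

```shell
# Values taken from the variable list at the top of the 'script'.
MASTER=pg-db-01
PORT=3000
VERSION=9.1
CLUSTERNAME=test

# Hypothetical helper: print a recovery.conf with the variables expanded.
# %f and %p are left untouched for the server to substitute.
print_recovery_conf() {
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$MASTER port=$PORT user=postgres'
restore_command = 'cp /walshipping/$VERSION/$CLUSTERNAME/%f %p'
EOF
}

print_recovery_conf
```

Redirecting the output to /var/lib/postgresql/$VERSION/$CLUSTERNAME/recovery.conf would replace the manual editing step.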

Re: BUG #6094: Streaming replication does not catch up when writing enough data

From
Simon Riggs
Date:
On Thu, Jul 7, 2011 at 1:05 PM, David Hartveld
<david.hartveld@mendix.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      6094
> Logged by:          David Hartveld
> Email address:      david.hartveld@mendix.com
> PostgreSQL version: 9.1-beta2
> Operating system:   Debian GNU/Linux 6.0.2 "Squeeze"
> Description:        Streaming replication does not catch up when writing
> enough data
> Details:
>
> After creation of two new clusters, and setting them up as master and slave
> (in async mode, according to the current 9.1 docs), the execution of a large
> SQL script (creating a db, tables, sequences, etc., filling them with data
> through COPY) runs properly on the master, but does not stream to the slave,
> i.e. the slave does not catch up. In the master log, the following line is
> printed many times:

Your output indicates that there is a problem in your replication
setup and this is why the slave does not catch up.

This is not a performance issue. It is either a bug in replication, or
a user configuration issue. Since few things have changed in 9.1 in
this area, at the moment the balance of probability is user error. If
you can provide a more isolated bug report we may be able to
investigate.

This is being discussed in a thread on the General list and there is
no reason to post twice.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services