Обсуждение: Problems with Streaming Replication and PostgreSQL 9.2.2

Поиск
Список
Период
Сортировка

Problems with Streaming Replication and PostgreSQL 9.2.2

От
JotaComm
Дата:
Hello, everybody

I have one problem and I need some help.

My environment: one master and one slave (PostgreSQL 9.2.2).

My cluster has about 160GB and pg_basebackup to syncronize them (master and slave).

The syntax is:

pg_basebackup -h productionaddress -p productionport -U productionuser -D datadirectory -P -v

My recovery.conf:

standby_mode = 'on'
 
primary_conninfo = 'host=productionaddress port=productionport user=productionuser'

archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'

My postgresql.conf: (master)

wal_level = hot_standby

checkpoint_segments = 10

archive_mode = on

archive_command = 'rsync -Crap %p postgres@slaveaddress:/slave/transactionlogs/%f'

max_wal_senders = 1

wal_keep_segments = 50

My postgresql.conf: (slave)

checkpoint_segments = 10

hot_standby = on

In my slave (logs) I have the following erros: My first attempt

2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG:  database system was interrupted; last known up at 2013-03-07 15:55:43 BRT
2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG:  entering standby mode
2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG:  streaming replication successfully connected to primary
2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG:  consistent recovery state reached at 141/8FBB5F0
2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG:  redo starts at 141/2251F90
2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/001D" at offset 49152: Success.
2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT:  xlog redo commit: 2013-03-07 15:55:40.673623-03
2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG:  startup process (PID 11817) exited with exit code 1
2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG:  terminating any other active server processes

In my slave (logs) I have the following erros: My second attempt

2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG:  database system was interrupted; last known up at 2013-03-11 12:06:31 BRT
2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG:  entering standby mode
2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG:  streaming replication successfully connected to primary
2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG:  consistent recovery state reached at 168/816AE10
2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG:  redo starts at 167/FEC3D828
2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/0022" at offset 212992: Success.
2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT:  xlog redo commit: 2013-03-11 12:05:35.069759-03
2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG:  startup process (PID 5862) exited with exit code 1
2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG:  terminating any other active server processes

I had the same problem, but in different files (pg_clog):

First attempt:

2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access status of transaction 30622931
2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/001D" at offset 49152: Success.

Second attempt:

2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access status of transaction 36529670
2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from file "pg_clog/0022" at offset 212992: Success.

When I started the synchronism at the first time (no data), it was OK. Now, my cluster has about 160GB and when the replication finishes I have these problems. I noticed that pg_clog/current_file is the problem.

Any idea? Suggestions?

Thank you.

Best Regards

João Paulo
--
JotaComm
http://jotacomm.wordpress.com

Re: Problems with Streaming Replication and PostgreSQL 9.2.2

От
Lonni J Friedman
Дата:
If you're not running pg_basebackup with the -x option, how are you
getting the WAL files onto the slave?

On Mon, Apr 1, 2013 at 11:19 AM, JotaComm <jota.comm@gmail.com> wrote:
> Hello, everybody
>
> I have one problem and I need some help.
>
> My environment: one master and one slave (PostgreSQL 9.2.2).
>
> My cluster has about 160GB and pg_basebackup to syncronize them (master and
> slave).
>
> The syntax is:
>
> pg_basebackup -h productionaddress -p productionport -U productionuser -D
> datadirectory -P -v
>
> My recovery.conf:
>
> standby_mode = 'on'
>
> primary_conninfo = 'host=productionaddress port=productionport
> user=productionuser'
>
> archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'
>
> My postgresql.conf: (master)
>
> wal_level = hot_standby
>
> checkpoint_segments = 10
>
> archive_mode = on
>
> archive_command = 'rsync -Crap %p
> postgres@slaveaddress:/slave/transactionlogs/%f'
>
> max_wal_senders = 1
>
> wal_keep_segments = 50
>
> My postgresql.conf: (slave)
>
> checkpoint_segments = 10
>
> hot_standby = on
>
> In my slave (logs) I have the following erros: My first attempt
>
> 2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG:  database system was
> interrupted; last known up at 2013-03-07 15:55:43 BRT
> 2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG:  entering standby mode
> 2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG:  streaming replication
> successfully connected to primary
> 2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG:  consistent recovery
> state reached at 141/8FBB5F0
> 2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG:  redo starts at
> 141/2251F90
> 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access
> status of transaction 30622931
> 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read
> from file "pg_clog/001D" at offset 49152: Success.
> 2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT:  xlog redo commit:
> 2013-03-07 15:55:40.673623-03
> 2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG:  startup process (PID
> 11817) exited with exit code 1
> 2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG:  terminating any other
> active server processes
>
> In my slave (logs) I have the following erros: My second attempt
>
> 2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG:  database system was
> interrupted; last known up at 2013-03-11 12:06:31 BRT
> 2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG:  entering standby mode
> 2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG:  streaming replication
> successfully connected to primary
> 2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG:  consistent recovery
> state reached at 168/816AE10
> 2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG:  redo starts at
> 167/FEC3D828
> 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access
> status of transaction 36529670
> 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from
> file "pg_clog/0022" at offset 212992: Success.
> 2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT:  xlog redo commit:
> 2013-03-11 12:05:35.069759-03
> 2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG:  startup process (PID
> 5862) exited with exit code 1
> 2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG:  terminating any other
> active server processes
>
> I had the same problem, but in different files (pg_clog):
>
> First attempt:
>
> 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access
> status of transaction 30622931
> 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read
> from file "pg_clog/001D" at offset 49152: Success.
>
> Second attempt:
>
> 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access
> status of transaction 36529670
> 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from
> file "pg_clog/0022" at offset 212992: Success.
>
> When I started the synchronism at the first time (no data), it was OK. Now,
> my cluster has about 160GB and when the replication finishes I have these
> problems. I noticed that pg_clog/current_file is the problem.
>
> Any idea? Suggestions?
>
> Thank you.
>
> Best Regards
>
> João Paulo
> --
> JotaComm
> http://jotacomm.wordpress.com



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org


Re: Problems with Streaming Replication and PostgreSQL 9.2.2

От
JotaComm
Дата:
Hello, Lonni

I use rsync to transfer the files (pg_xlog) from master to slave.

rsync -Crap %p postgres@slaveaddress:/slavetransactionlogs/%f

In my recovery.conf (slave) I have the following statement:

restore_command = 'test -f $slavetransactionlogs/%f && cp $slavetransactionlogs/%f %p'

Best Regards



2013/4/1 Lonni J Friedman <netllama@gmail.com>
If you're not running pg_basebackup with the -x option, how are you
getting the WAL files onto the slave?

On Mon, Apr 1, 2013 at 11:19 AM, JotaComm <jota.comm@gmail.com> wrote:
> Hello, everybody
>
> I have one problem and I need some help.
>
> My environment: one master and one slave (PostgreSQL 9.2.2).
>
> My cluster has about 160GB and pg_basebackup to syncronize them (master and
> slave).
>
> The syntax is:
>
> pg_basebackup -h productionaddress -p productionport -U productionuser -D
> datadirectory -P -v
>
> My recovery.conf:
>
> standby_mode = 'on'
>
> primary_conninfo = 'host=productionaddress port=productionport
> user=productionuser'
>
> archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'
>
> My postgresql.conf: (master)
>
> wal_level = hot_standby
>
> checkpoint_segments = 10
>
> archive_mode = on
>
> archive_command = 'rsync -Crap %p
> postgres@slaveaddress:/slave/transactionlogs/%f'
>
> max_wal_senders = 1
>
> wal_keep_segments = 50
>
> My postgresql.conf: (slave)
>
> checkpoint_segments = 10
>
> hot_standby = on
>
> In my slave (logs) I have the following erros: My first attempt
>
> 2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG:  database system was
> interrupted; last known up at 2013-03-07 15:55:43 BRT
> 2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG:  entering standby mode
> 2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG:  streaming replication
> successfully connected to primary
> 2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG:  consistent recovery
> state reached at 141/8FBB5F0
> 2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG:  redo starts at
> 141/2251F90
> 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access
> status of transaction 30622931
> 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read
> from file "pg_clog/001D" at offset 49152: Success.
> 2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT:  xlog redo commit:
> 2013-03-07 15:55:40.673623-03
> 2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG:  startup process (PID
> 11817) exited with exit code 1
> 2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG:  terminating any other
> active server processes
>
> In my slave (logs) I have the following erros: My second attempt
>
> 2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG:  database system was
> interrupted; last known up at 2013-03-11 12:06:31 BRT
> 2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG:  entering standby mode
> 2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG:  streaming replication
> successfully connected to primary
> 2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG:  consistent recovery
> state reached at 168/816AE10
> 2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG:  redo starts at
> 167/FEC3D828
> 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access
> status of transaction 36529670
> 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from
> file "pg_clog/0022" at offset 212992: Success.
> 2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT:  xlog redo commit:
> 2013-03-11 12:05:35.069759-03
> 2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG:  startup process (PID
> 5862) exited with exit code 1
> 2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG:  terminating any other
> active server processes
>
> I had the same problem, but in different files (pg_clog):
>
> First attempt:
>
> 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not access
> status of transaction 30622931
> 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read
> from file "pg_clog/001D" at offset 49152: Success.
>
> Second attempt:
>
> 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access
> status of transaction 36529670
> 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read from
> file "pg_clog/0022" at offset 212992: Success.
>
> When I started the synchronism at the first time (no data), it was OK. Now,
> my cluster has about 160GB and when the replication finishes I have these
> problems. I noticed that pg_clog/current_file is the problem.
>
> Any idea? Suggestions?
>
> Thank you.
>
> Best Regards
>
> João Paulo
> --
> JotaComm
> http://jotacomm.wordpress.com



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org



--
JotaComm
http://jotacomm.wordpress.com

Re: Problems with Streaming Replication and PostgreSQL 9.2.2

От
Lonni J Friedman
Дата:
I'd be curious whether your problem persisted if you ran pg_basebackup
with the -x option.

On Mon, Apr 1, 2013 at 11:30 AM, JotaComm <jota.comm@gmail.com> wrote:
> Hello, Lonni
>
> I use rsync to transfer the files (pg_xlog) from master to slave.
>
> rsync -Crap %p postgres@slaveaddress:/slavetransactionlogs/%f
>
> In my recovery.conf (slave) I have the following statement:
>
> restore_command = 'test -f $slavetransactionlogs/%f && cp
> $slavetransactionlogs/%f %p'
>
> Best Regards
>
>
>
> 2013/4/1 Lonni J Friedman <netllama@gmail.com>
>>
>> If you're not running pg_basebackup with the -x option, how are you
>> getting the WAL files onto the slave?
>>
>> On Mon, Apr 1, 2013 at 11:19 AM, JotaComm <jota.comm@gmail.com> wrote:
>> > Hello, everybody
>> >
>> > I have one problem and I need some help.
>> >
>> > My environment: one master and one slave (PostgreSQL 9.2.2).
>> >
>> > My cluster has about 160GB and pg_basebackup to syncronize them (master
>> > and
>> > slave).
>> >
>> > The syntax is:
>> >
>> > pg_basebackup -h productionaddress -p productionport -U productionuser
>> > -D
>> > datadirectory -P -v
>> >
>> > My recovery.conf:
>> >
>> > standby_mode = 'on'
>> >
>> > primary_conninfo = 'host=productionaddress port=productionport
>> > user=productionuser'
>> >
>> > archive_cleanup_command = 'pg_archivecleanup /slave/transactionlogs %r'
>> >
>> > My postgresql.conf: (master)
>> >
>> > wal_level = hot_standby
>> >
>> > checkpoint_segments = 10
>> >
>> > archive_mode = on
>> >
>> > archive_command = 'rsync -Crap %p
>> > postgres@slaveaddress:/slave/transactionlogs/%f'
>> >
>> > max_wal_senders = 1
>> >
>> > wal_keep_segments = 50
>> >
>> > My postgresql.conf: (slave)
>> >
>> > checkpoint_segments = 10
>> >
>> > hot_standby = on
>> >
>> > In my slave (logs) I have the following erros: My first attempt
>> >
>> > 2013-03-07 15:58:21 BRT [11817]: [1-1] user=,db= LOG:  database system
>> > was
>> > interrupted; last known up at 2013-03-07 15:55:43 BRT
>> > 2013-03-07 15:58:21 BRT [11817]: [2-1] user=,db= LOG:  entering standby
>> > mode
>> > 2013-03-07 15:58:21 BRT [11818]: [1-1] user=,db= LOG:  streaming
>> > replication
>> > successfully connected to primary
>> > 2013-03-07 15:58:25 BRT [11817]: [3-1] user=,db= LOG:  consistent
>> > recovery
>> > state reached at 141/8FBB5F0
>> > 2013-03-07 15:58:25 BRT [11817]: [4-1] user=,db= LOG:  redo starts at
>> > 141/2251F90
>> > 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not
>> > access
>> > status of transaction 30622931
>> > 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read
>> > from file "pg_clog/001D" at offset 49152: Success.
>> > 2013-03-07 15:58:25 BRT [11817]: [7-1] user=,db= CONTEXT:  xlog redo
>> > commit:
>> > 2013-03-07 15:55:40.673623-03
>> > 2013-03-07 15:58:25 BRT [11767]: [1-1] user=,db= LOG:  startup process
>> > (PID
>> > 11817) exited with exit code 1
>> > 2013-03-07 15:58:25 BRT [11767]: [2-1] user=,db= LOG:  terminating any
>> > other
>> > active server processes
>> >
>> > In my slave (logs) I have the following erros: My second attempt
>> >
>> > 2013-03-11 12:07:49 BRT [5862]: [1-1] user=,db= LOG:  database system
>> > was
>> > interrupted; last known up at 2013-03-11 12:06:31 BRT
>> > 2013-03-11 12:07:49 BRT [5862]: [2-1] user=,db= LOG:  entering standby
>> > mode
>> > 2013-03-11 12:07:49 BRT [5864]: [1-1] user=,db= LOG:  streaming
>> > replication
>> > successfully connected to primary
>> > 2013-03-11 12:07:53 BRT [5862]: [3-1] user=,db= LOG:  consistent
>> > recovery
>> > state reached at 168/816AE10
>> > 2013-03-11 12:07:53 BRT [5862]: [4-1] user=,db= LOG:  redo starts at
>> > 167/FEC3D828
>> > 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access
>> > status of transaction 36529670
>> > 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read
>> > from
>> > file "pg_clog/0022" at offset 212992: Success.
>> > 2013-03-11 12:07:53 BRT [5862]: [7-1] user=,db= CONTEXT:  xlog redo
>> > commit:
>> > 2013-03-11 12:05:35.069759-03
>> > 2013-03-11 12:07:53 BRT [5762]: [1-1] user=,db= LOG:  startup process
>> > (PID
>> > 5862) exited with exit code 1
>> > 2013-03-11 12:07:53 BRT [5762]: [2-1] user=,db= LOG:  terminating any
>> > other
>> > active server processes
>> >
>> > I had the same problem, but in different files (pg_clog):
>> >
>> > First attempt:
>> >
>> > 2013-03-07 15:58:25 BRT [11817]: [5-1] user=,db= FATAL:  could not
>> > access
>> > status of transaction 30622931
>> > 2013-03-07 15:58:25 BRT [11817]: [6-1] user=,db= DETAIL:  Could not read
>> > from file "pg_clog/001D" at offset 49152: Success.
>> >
>> > Second attempt:
>> >
>> > 2013-03-11 12:07:53 BRT [5862]: [5-1] user=,db= FATAL:  could not access
>> > status of transaction 36529670
>> > 2013-03-11 12:07:53 BRT [5862]: [6-1] user=,db= DETAIL:  Could not read
>> > from
>> > file "pg_clog/0022" at offset 212992: Success.
>> >
>> > When I started the synchronism at the first time (no data), it was OK.
>> > Now,
>> > my cluster has about 160GB and when the replication finishes I have
>> > these
>> > problems. I noticed that pg_clog/current_file is the problem.
>> >
>> > Any idea? Suggestions?
>> >
>> > Thank you.
>> >
>> > Best Regards
>> >
>> > João Paulo
>> > --
>> > JotaComm
>> > http://jotacomm.wordpress.com
>>
>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> L. Friedman                                    netllama@gmail.com
>> LlamaLand                       https://netllama.linux-sxs.org
>
>
>
>
> --
> JotaComm
> http://jotacomm.wordpress.com



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@gmail.com
LlamaLand                       https://netllama.linux-sxs.org