Discussion: FATAL: could not receive data from WAL stream

FATAL: could not receive data from WAL stream

From
Patrick B
Date:
Hi guys,

I have a slave server running Postgres 9.2 with streaming replication and WAL archiving on an Amazon EC2 instance.

The Postgres logs show this error:
restored log file "000000020000179A000000F8" from archive
invalid record length at 179A/F8FFF3D0
WAL segment `/var/lib/pgsql/9.2/archive/00000003.history` not found
streaming replication successfully connected to primary
FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000020000179A000000F8 has already been removed

However, the file 000000020000179A000000F8 is in the /var/lib/pgsql/9.2/archive directory:
postgres@devops:/var/lib/pgsql/9.2/archive$ ls -la | grep 000000020000179A000000F8
-rw------- 1 postgres postgres 16777216 Sep 16 05:16 000000020000179A000000F8 


It's an Ubuntu instance, so my recovery.conf is:


/etc/postgresql/9.2/main/recovery.conf:
restore_command = 'exec /var/lib/pgsql/bin/restore_wal_segment.bash "/var/lib/pgsql/9.2/wal_archive/%f" "%p"'
archive_cleanup_command = '/var/lib/postgresql/bin/pg_archivecleaup_mv.bash'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=IP_MY_SLAVE port=5432 user=replicator application_name=devops' 


What could be happening, given that the file is there?

Thanks
Patrick 
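
For reference, the segment name and the LSN in the log lines above can be decoded by hand. The following is only a minimal sketch, assuming the default 16 MiB WAL segment size (which matches the 16777216-byte file in the listing); the function names are made up for illustration and are not part of PostgreSQL.

```python
import re

# Assumption: default WAL segment size of 16 MiB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_segment_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, seg)."""
    if not re.fullmatch(r"[0-9A-F]{24}", name):
        raise ValueError(f"not a WAL segment file name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def locate_lsn(lsn):
    """Map an LSN such as '179A/F8FFF3D0' to (log, seg, offset inside the segment)."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    return high, low // WAL_SEGMENT_SIZE, low % WAL_SEGMENT_SIZE


timeline, log, seg = parse_segment_name("000000020000179A000000F8")
lsn_log, lsn_seg, offset = locate_lsn("179A/F8FFF3D0")
print(f"timeline={timeline} log={log:X} seg={seg:X}")
# prints: timeline=2 log=179A seg=F8
print(f"LSN is in log={lsn_log:X} seg={lsn_seg:X} at offset {offset:#x}")
# prints: LSN is in log=179A seg=F8 at offset 0xfff3d0
```

So the "invalid record length" position lies inside the very segment that was restored, on timeline 2, about 3 kB before the end of that 16 MiB file.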

Re: FATAL: could not receive data from WAL stream

From
Venkata B Nagothi
Date:

On Tue, Sep 20, 2016 at 12:38 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
> What could be happening, given that the file is there?

Do you mean to say that the WAL file "000000020000179A000000F8" is available @ "/var/lib/pgsql/9.2/archive" location ?

Regards,
Venkata B N

Fujitsu Australia

Re: FATAL: could not receive data from WAL stream

From
Lucas Possamai
Date:


2016-09-20 15:14 GMT+12:00 Venkata B Nagothi <nag1010@gmail.com>:

> Do you mean to say that the WAL file "000000020000179A000000F8" is
> available @ "/var/lib/pgsql/9.2/archive" location ?

Yes.....  

Re: FATAL: could not receive data from WAL stream

From
Lucas Possamai
Date:


2016-09-20 16:29 GMT+12:00 Lucas Possamai <drum.lucas@gmail.com>:


> Yes.....



Oops.. sorry... sent to the wrong email

Re: FATAL: could not receive data from WAL stream

From
Patrick B
Date:


2016-09-20 15:14 GMT+12:00 Venkata B Nagothi <nag1010@gmail.com>:

> Do you mean to say that the WAL file "000000020000179A000000F8" is
> available @ "/var/lib/pgsql/9.2/archive" location ?



Yes!

Re: FATAL: could not receive data from WAL stream

From
Michael Paquier
Date:
On Tue, Sep 20, 2016 at 1:30 PM, Patrick B <patrickbakerbr@gmail.com> wrote:
> 2016-09-20 15:14 GMT+12:00 Venkata B Nagothi <nag1010@gmail.com>:
>> Do you mean to say that the WAL file "000000020000179A000000F8" is
>> available @ "/var/lib/pgsql/9.2/archive" location ?
>
> Yes!

Timeline 2 has visibly reached its end at segment
000000020000179A000000F8 and it cannot find in the archive the history
file to see from which timeline it needs to fetch afterwards. As the
timeline file cannot be found, it then attempts to fetch the segment
that it thinks is complete from the master itself.

Didn't you trigger a promotion which would make the master reach the
timeline 3? And are you sure that 00000003.history is not in the
archives?
--
Michael
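
One way to answer the question about 00000003.history is to list which timeline history files the archive actually holds. This is a rough sketch, not an official tool: the helper names are hypothetical, and the demo runs against a throwaway temporary directory rather than the real /var/lib/pgsql/9.2/archive.

```python
import re
import tempfile
from pathlib import Path

# Timeline history files are named with 8 hex digits plus ".history".
HISTORY_RE = re.compile(r"[0-9A-F]{8}\.history")


def archived_timelines(archive_dir):
    """Return the sorted timeline IDs that have a .history file in archive_dir."""
    timelines = []
    for entry in Path(archive_dir).iterdir():
        if HISTORY_RE.fullmatch(entry.name):
            timelines.append(int(entry.name[:8], 16))
    return sorted(timelines)


def has_history(archive_dir, timeline):
    """True if the history file for the given timeline exists in archive_dir."""
    return (Path(archive_dir) / f"{timeline:08X}.history").is_file()


# Demo with made-up contents: a timeline 2 history file and one WAL segment.
with tempfile.TemporaryDirectory() as demo:
    Path(demo, "00000002.history").touch()
    Path(demo, "000000020000179A000000F8").touch()
    print(archived_timelines(demo))  # prints: [2]
    print(has_history(demo, 3))      # prints: False
```

Pointed at the real archive directory, a missing timeline 3 entry would match the "00000003.history not found" line in the standby's log.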


Re: FATAL: could not receive data from WAL stream

From
Patrick B
Date:


2016-09-20 16:46 GMT+12:00 Michael Paquier <michael.paquier@gmail.com>:
> Didn't you trigger a promotion which would make the master reach the
> timeline 3? And are you sure that 00000003.history is not in the
> archives?



The server went down, and when it came back online I got those errors.


I found this in the logs: systemd1: Removed slice User Slice of postgres.

I believe something happened with the postgres user, and when the server came back online it started Postgres from a new path that did not include recovery.conf, so the server might have been promoted to master.

This means I'll have to rebuild the DB, right?

Patrick
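
A quick way to see how a 9.2 data directory was last started is to look at its recovery file. With recovery.conf present the server starts in recovery; once recovery ends (for example on promotion) the file is renamed to recovery.done; with neither file the server starts as an ordinary primary. The sketch below only inspects file names. The function name and the state labels are invented for illustration, and the demo uses a temporary directory standing in for the real data directory.

```python
import tempfile
from pathlib import Path


def recovery_state(data_dir):
    """Classify a 9.2-style data directory by its recovery file (hypothetical labels)."""
    data = Path(data_dir)
    if (data / "recovery.conf").exists():
        return "starts-in-recovery"   # standby if standby_mode = on
    if (data / "recovery.done").exists():
        return "recovery-finished"    # e.g. the standby was promoted
    return "no-recovery-file"         # starts as an ordinary primary


# Demo on a throwaway directory; pass the real data directory instead.
with tempfile.TemporaryDirectory() as demo:
    print(recovery_state(demo))            # prints: no-recovery-file
    Path(demo, "recovery.conf").touch()
    print(recovery_state(demo))            # prints: starts-in-recovery
```

If the directory Postgres was actually started from reports "no-recovery-file", that would be consistent with the server having come up as a master on a new timeline.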