Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error

From: Mohan NBSPS
Subject: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error
Msg-id: CAPCvfWcm0JDC+q54MSW7N90PYvh+PefaP6SxfonbkGcUwpS1+g@mail.gmail.com
Replies: Re: Postgresql 9.5: Streaming Replication: Secondaries Fail To Start Post WAL Error  (Johannes Truschnigg <johannes@truschnigg.info>)
List: pgsql-admin

Dear Community,

I am trying to understand why all of the secondary databases failed to start
after they had been logging a WAL-related error for some time.

Timeline:

- 2024-04-19: WAL errors appear on the secondary database nodes

```
LOG: invalid resource manager ID 55 at 40/F46CBCA8
```

- the secondaries did not lag in replication
  - monitored via query
```
pg_last_xact_replay_timestamp
```

- 2024-05-02: the secondaries reboot and fail to start up

```
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000004100000049 has already been removed
 FATAL:  the database system is starting up
```
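
To relate the two log lines to each other, the LSN in the first message and the segment name in the second can be mapped onto the same scale. This is a minimal sketch, assuming the default 16 MB WAL segment size and timeline 1 (the timeline is taken from the first eight hex digits of the requested segment name); the function names are hypothetical helpers, not PostgreSQL APIs.

```python
# Assumption: the cluster uses the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments per 4 GB "xlogid" (256 with 16 MB segments).
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def lsn_to_segment_name(lsn: str, timeline: int = 1) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    segno = ((high << 32) | low) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def segment_name_to_start_lsn(name: str) -> str:
    """Return the first LSN stored in the named WAL segment file."""
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    start = (log * SEGMENTS_PER_XLOGID + seg) * WAL_SEGMENT_SIZE
    return "%X/%X" % (start >> 32, start & 0xFFFFFFFF)


if __name__ == "__main__":
    # LSN from the 2024-04-19 log line.
    print(lsn_to_segment_name("40/F46CBCA8"))
    # Segment requested on 2024-05-02.
    print(segment_name_to_start_lsn("000000010000004100000049"))
```

If the 16 MB assumption holds, the LSN from the April message falls in segment 0000000100000040000000F4, while the segment requested in May starts at 41/49000000, which is 85 segments later.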

From my understanding, WAL is streamed over the network (the secondary pulls from the primary) and written to a WAL file on the secondary.
A separate process then replays that local WAL file.
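
Because receiving and replaying are separate steps, the gap between them can be measured in bytes as well as by timestamp. The following is a minimal sketch, assuming the two LSN strings are read on a secondary from `pg_last_xlog_receive_location()` and `pg_last_xlog_replay_location()` (the function names used by the 9.x releases); the Python helper names are hypothetical and the sample values are made up for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '40/F46CBCA8' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_backlog_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL that were received from the primary but not yet replayed."""
    return lsn_to_int(receive_lsn) - lsn_to_int(replay_lsn)


if __name__ == "__main__":
    # Illustrative values only, not taken from the affected servers.
    received = "40/F5000000"
    replayed = "40/F46CBCA8"
    print(replay_backlog_bytes(received, replayed))
```

A byte-based check complements `pg_last_xact_replay_timestamp`, which reports the commit time of the last replayed transaction and therefore says nothing about how much WAL is still waiting to be received or replayed.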

For the local WAL file to go out of sync, I can think of these causes:

1. the primary removed the WAL file that the secondary was still streaming
2. the WAL file on the secondary got corrupted
3. ....

Questions

- What do those error messages mean?
- How can I prevent this from happening?

- references

Any advice or information is highly appreciated.
Thank you,
Mohan
