[BUGS] BUG #14538: streaming replication same wal missing
От | vodevsh@gmail.com |
---|---|
Тема | [BUGS] BUG #14538: streaming replication same wal missing |
Дата | |
Msg-id | 20170209135808.1405.57354@wrigleys.postgresql.org обсуждение исходный текст |
Ответы |
Re: [BUGS] BUG #14538: streaming replication same wal missing
(Michael Paquier <michael.paquier@gmail.com>)
|
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 14538 Logged by: Vladimir Svedov Email address: vodevsh@gmail.com PostgreSQL version: 9.3.11 Operating system: Linux Description: Hi, I posted formatted text as a question here:http://dba.stackexchange.com/questions/163735/streaming-replication-same-wal-missing Several days ago I saw this in master log: replica 2017-02-06 16:10:55 UTC ERROR: requested WAL segment 000000030000096D00000052 has already been removed This message was repeating until I stopped slave. Checking pg_stat_activity on prod showed autovacuum task on table ~250GB, I decided that it produced huge amount of WALs. Checking them confirmed my assumption - all wal_keep_segments=500 were last minute. So I thought that network bandwidth was not sufficient to send WALs fast enough to slave to replay them. Master has archive_command = cd ., so no way back. I did not panic, I recreate replication. I: delete all data files on slave pg_start_backup() on prod rsync -zahv data pg_start slave pg_stop_backup() on prod and check logs. Imagine my desperation to see that slave has this: -bash-4.2$ head -n 20 /pg/data93/pg_log/postgresql-2017-02-09_121514.log LOG: could not open usermap file "/pg/data93/pg_ident.conf": No such file or directory LOG: database system was shut down in recovery at 2017-02-09 12:15:07 UTC LOG: entering standby mode LOG: redo starts at 982/39074640 FATAL: the database system is starting up FATAL: the database system is starting up FATAL: the database system is starting up LOG: consistent recovery state reached at 982/3AEC60F8 LOG: unexpected pageaddr 97C/83EC8000 in log segment 00000003000009820000003A, offset 15499264 LOG: database system is ready to accept read only connections LOG: started streaming WAL from primary at 982/3A000000 on timeline 3 ERROR: requested WAL segment 000000030000096D00000052 has already been removed ERROR: requested WAL segment 000000030000096D00000052 has already been removed ERROR: requested WAL segment 000000030000096D00000052 has already been removed ERROR: requested WAL segment 000000030000096D00000052 has already been removed ERROR: requested WAL segment 000000030000096D00000052 has already been removed ERROR: requested WAL segment 000000030000096D00000052 has already been removed and so on... 3 days passed! That segment has gone, but why is it looking for it??? I copied whole data directory. All pages are readable (pg_dump works on all dbs), changes from master go to slave. ERROR keeps appearing each several seconds on slave. Please, tell me I did something stupid and it is not a bug. It happened on prod. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
В списке pgsql-bugs по дате отправления: