Re: Recovery inconsistencies, standby much larger than primary
From | Andres Freund |
---|---|
Subject | Re: Recovery inconsistencies, standby much larger than primary |
Date | |
Msg-id | 20140126174514.GI30218@alap3.anarazel.de |
In reply to | Recovery inconsistencies, standby much larger than primary (Greg Stark <stark@mit.edu>) |
Responses | Re: Recovery inconsistencies, standby much larger than primary (Greg Stark <stark@mit.edu>); Re: Recovery inconsistencies, standby much larger than primary (Greg Stark <stark@mit.edu>) |
List | pgsql-hackers |
Hi,

On 2014-01-24 19:23:28 -0500, Greg Stark wrote:
> Since the point release we've run into a number of databases that, when
> we restore from a base backup, end up being larger than the primary
> database was. Sometimes by a large factor. The data below is from
> 9.1.11 (both primary and standby) but we've seen the same thing on
> 9.2.6.

What's the procedure for creating those standbys? Were they of similar size after being cloned?

> primary$ for i in 1261982 1364767 1366221 473158 ; do echo -n "$i " ; du -shc $i* | tail -1 ; done
> 1261982 29G total
> 1364767 23G total
> 1366221 12G total
> 473158 76G total
>
> standby$ for i in 1261982 1364767 1366221 473158 ; do echo -n "$i " ; du -shc $i* | tail -1 ; done
> 1261982 55G total
> 1364767 28G total
> 1366221 17G total
> 473158 139G total
> ...
> The first three are btrees and the fourth is a heap btw.

Are those all of the same underlying heap relation?

> We're also seeing log entries about "wal contains reference to invalid
> pages" but these errors seem only vaguely correlated. Sometimes we get
> the errors but the tables don't grow noticeably and sometimes we don't
> get the errors and the tables are much larger.

Uhm. I am a bit confused. You see those in the standby's log? At non-debug log levels? That'd imply that the standby is dead and needed to be recloned, no? How do you continue after that?

> Much of the added space is uninitialized pages as you might expect but
> what I don't understand is how the database can start up without running
> into the "reference to invalid pages" panic consistently. We check
> both that there are no references after consistency is reached *and*
> that any references before consistency are resolved by a truncate or
> unlink before consistency.

Well, it's pretty easy to get into a situation with lots of new pages. Lots of concurrent inserts that all fail before logging WAL. The next insert will extend the relation and only initialise that last page.
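The `du -shc $i*` loop in the quoted report also matches the `_fsm` and `_vm` fork files, plus any other relfilenode whose number merely starts with the same digits (`12619820` would match `1261982*`). A minimal Python sketch that sums only the main-fork segment files (`N`, `N.1`, `N.2`, ...) of a relfilenode, given the database's directory under `base/`. The script and the function name `main_fork_bytes` are hypothetical. It reports apparent file size, which can differ from `du`'s allocated-block count for sparse files.

```python
import os
import re
import sys


def main_fork_bytes(directory: str, relfilenode: int) -> int:
    """Sum the sizes of the main-fork segment files of one relfilenode.

    Matches exactly "<relfilenode>" and "<relfilenode>.<segno>", so the
    _fsm/_vm forks and unrelated relfilenodes sharing a digit prefix
    (e.g. 12619820 vs 1261982) are not counted.
    """
    pattern = re.compile(rf"{relfilenode}(\.\d+)?")
    total = 0
    for name in os.listdir(directory):
        if pattern.fullmatch(name):
            # Apparent size in bytes, not allocated blocks as du reports.
            total += os.path.getsize(os.path.join(directory, name))
    return total


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python relsize.py <database dir> <relfilenode> [...]
    db_dir = sys.argv[1]
    for arg in sys.argv[2:]:
        size = main_fork_bytes(db_dir, int(arg))
        print(f"{arg} {size / 2**30:.1f} GiB")
```

Running it with the same relfilenodes on both primary and standby gives a main-fork-only comparison that is not skewed by free-space or visibility-map files.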
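To measure how much of the extra space is uninitialized, one rough approach is to count all-zero blocks in each segment file. A minimal sketch, assuming the default 8 kB block size (BLCKSZ is a compile-time option) and treating a page as uninitialized only when every byte is zero. That approximates, but is not identical to, PostgreSQL's `PageIsNew()` test on `pd_upper`. The function name `count_zero_pages` is hypothetical.

```python
import sys

BLCKSZ = 8192  # assumed default block size; PostgreSQL can be built with another


def count_zero_pages(path: str, blcksz: int = BLCKSZ) -> tuple[int, int]:
    """Return (all_zero_pages, total_pages) for one relation segment file."""
    zero_block = bytes(blcksz)
    zero = total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(blcksz)
            if not block:
                break
            total += 1
            # A trailing partial block can never equal zero_block.
            if block == zero_block:
                zero += 1
    return zero, total


if __name__ == "__main__":
    # Hypothetical usage: python zeropages.py 1261982 1261982.1 ...
    for path in sys.argv[1:]:
        zero, total = count_zero_pages(path)
        print(f"{path}: {zero}/{total} pages all-zero")
```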
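The consistency checks Greg describes can be modelled as a set of pending invalid-page references: a truncate or unlink can resolve them, the set must be empty once recovery reaches consistency, and any new reference after that point is fatal. A toy Python model loosely based on `log_invalid_page()`, `forget_invalid_pages()` and `XLogCheckInvalidPages()` in PostgreSQL's `src/backend/access/transam/xlogutils.c`. The class and method names are hypothetical, and forks and per-database drops are left out.

```python
class InvalidPageTracker:
    """Toy model of a standby's invalid-page bookkeeping during recovery.

    Hypothetical simplification (no forks, no per-database drops);
    not a port of PostgreSQL's xlogutils.c.
    """

    def __init__(self) -> None:
        self.invalid: set[tuple[int, int]] = set()  # (relfilenode, blkno)
        self.consistent = False

    def reference_missing_page(self, rel: int, blkno: int) -> None:
        # Replay touched a page that does not exist on disk.
        if self.consistent:
            raise RuntimeError(f"PANIC: WAL contains references to invalid pages {(rel, blkno)}")
        self.invalid.add((rel, blkno))

    def truncate(self, rel: int, new_nblocks: int) -> None:
        # Truncating to new_nblocks resolves references at or past the new end.
        self.invalid = {(r, b) for (r, b) in self.invalid
                        if not (r == rel and b >= new_nblocks)}

    def unlink(self, rel: int) -> None:
        # Dropping the relation resolves all of its pending references.
        self.invalid = {(r, b) for (r, b) in self.invalid if r != rel}

    def reach_consistency(self) -> None:
        # Any reference still unresolved at the consistency point is fatal.
        if self.invalid:
            raise RuntimeError(f"PANIC: WAL contains references to invalid pages {sorted(self.invalid)}")
        self.consistent = True
```

Under this model a standby that starts cleanly has had every pre-consistency reference resolved, which is why startups without the panic are the puzzling part of the report.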
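The mechanism Andres describes for accumulating new pages can be illustrated with a toy model: every insert extends the relation by one zero-filled page, a failed insert leaves its page zeroed and writes no WAL, and only the successful insert initialises and WAL-logs its page. The model also assumes that replaying a record which initialises its target page extends the relation with zero pages up to that block. Everything here (function names, page labels) is a hypothetical simplification, not PostgreSQL code.

```python
ZERO = "zero"  # all-zero, never-initialised page
INIT = "init"  # page initialised by a successful insert


def primary_after_inserts(outcomes: list[bool]) -> tuple[list[str], list[int]]:
    """outcomes[i] is True if insert i succeeds and WAL-logs its new page.

    Returns (pages, wal), where wal lists the WAL-logged block numbers.
    """
    pages: list[str] = []
    wal: list[int] = []
    for ok in outcomes:
        blkno = len(pages)
        pages.append(ZERO)  # extension writes a zero-filled page
        if ok:
            pages[blkno] = INIT  # only a successful insert initialises it
            wal.append(blkno)
    return pages, wal


def replay(wal: list[int]) -> list[str]:
    """Assumed replay behaviour: an initialising record for block N
    extends the relation with zero pages up to N."""
    pages: list[str] = []
    for blkno in wal:
        while len(pages) <= blkno:
            pages.append(ZERO)
        pages[blkno] = INIT
    return pages


pages, wal = primary_after_inserts([False, False, False, True])
print(pages, wal, replay(wal))
```

In this model primary and standby end up with the same run of zero pages, so it illustrates how such pages appear rather than why a standby would outgrow its primary.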
It'd be interesting to look for TRUNCATE records using xlogdump. Could you show those for starters?

> I'm assuming this is somehow related to the multixact or transaction
> wraparound problems but I don't really understand how they could be
> hitting us when both the primary and standby are post-upgrade to the most
> recent point release, which has the fixes

That doesn't sound likely. For one, the symptoms don't fit; for another, those problems are mostly 9.3+.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services