Re: Recovery inconsistencies, standby much larger than primary

From: Andres Freund
Subject: Re: Recovery inconsistencies, standby much larger than primary
Msg-id: 20140126174514.GI30218@alap3.anarazel.de
In response to: Recovery inconsistencies, standby much larger than primary  (Greg Stark <stark@mit.edu>)
Responses: Re: Recovery inconsistencies, standby much larger than primary  (Greg Stark <stark@mit.edu>)
Re: Recovery inconsistencies, standby much larger than primary  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers

Hi,

On 2014-01-24 19:23:28 -0500, Greg Stark wrote:
> Since the point release we've run into a number of databases that when
> we restore from a base backup end up being larger than the primary
> database was. Sometimes by a large factor. The data below is from
> 9.1.11 (both primary and standby) but we've seen the same thing on
> 9.2.6.

What's the procedure for creating those standbys? Were they of similar
size after being cloned?

> primary$ for i in  1261982 1364767 1366221 473158 ; do echo -n "$i " ;
> du -shc $i* | tail -1 ; done
> 1261982 29G total
> 1364767 23G total
> 1366221 12G total
> 473158 76G total
> 
> standby$ for i in  1261982 1364767 1366221 473158 ; do echo -n "$i " ;
> du -shc $i* | tail -1 ; done
> 1261982 55G total
> 1364767 28G total
> 1366221 17G total
> 473158 139G total
> ...
> The first three are btrees and the fourth is a heap btw.

Are those all of the same underlying heap relation?

> We're also seeing log entries about "wal contains reference to invalid
> pages" but these errors seem only vaguely correlated. Sometimes we get
> the errors but the tables don't grow noticeably and sometimes we don't
> get the errors and the tables are much larger.

Uhm. I am a bit confused. You see those in the standby's log? At !debug
log levels? That'd imply that the standby is dead and needed to be
recloned, no? How do you continue after that?

> Much of the added space is uninitialized pages as you might expect but
> what I don't understand is how the database can start up without running
> into the "reference to invalid pages" panic consistently. We check
> both that there are no references after consistency is reached *and*
> that any references before consistency are resolved by a truncate or
> unlink before consistency.

Well, it's pretty easy to get into a situation with lots of new
pages. Lots of concurrent inserts that all fail before logging WAL. The
next insert will extend the relation and only initialise that last
value.
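
To quantify how much of the extra space is uninitialized, a minimal sketch along these lines could count all-zero pages in a single relation segment file. It assumes the default 8kB block size and that an uninitialized page consists entirely of zero bytes; count_zero_pages is a hypothetical helper written for illustration, not a PostgreSQL tool.

```python
BLCKSZ = 8192  # assumption: PostgreSQL's default block size; adjust for custom builds


def count_zero_pages(path, blcksz=BLCKSZ):
    """Return (total_pages, zero_pages) for one relation segment file.

    A page is counted as uninitialized when every byte in it is zero.
    A trailing partial page, if any, is counted as one page.
    """
    zero_page = bytes(blcksz)
    total = 0
    zero = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(blcksz)
            if not page:
                break
            total += 1
            # Compare against a zero buffer of the same length as what was read.
            if page == zero_page[:len(page)]:
                zero += 1
    return total, zero
```

Running it over every segment file of a relation (the base file plus any numbered segments) on both the primary and the standby would show whether the size difference is made up of zero pages.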

It'd be interesting to look for TRUNCATE records using xlogdump. Could
you show those for starters?

> I'm assuming this is somehow related to the multixact or transaction
> wraparound problems but I don't really understand how they could be
> hitting when both the primary and standby are post-upgrade to the most
> recent point release which have the fixes.

That doesn't sound likely. For one the symptoms don't fit, for another,
those problems are mostly 9.3+.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


