Re: BUG #7500: hot-standby replica crash after an initial rsync
От | Maxim Boguk |
---|---|
Тема | Re: BUG #7500: hot-standby replica crash after an initial rsync |
Дата | |
Msg-id | CAK-MWwQdxsE3fvi2HzHeiAr9sGNp3ESx=B6gD=TXqZW6e2mf8A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #7500: hot-standby replica crash after an initial rsync (Andres Freund <andres@2ndquadrant.com>) |
Ответы |
Re: BUG #7500: hot-standby replica crash after an initial rsync
|
Список | pgsql-bugs |
> > Does that mean that file was damaged during rsync? > Not necessarily. When did you initially set up that cluster? Normally the > file > should get zeroed out before its being used. If the cluster was copied > improperly (i.e. no pg_start/stop backup or such) it could easily happen. > But > I would guess that you did it properly? > That server was master before planned switchover to replica server was performed. Procedure: 1)stop former master 2)trigger replica failover (thus server become new master) There are unusual part because human error from system administrator: 2.1)start former master 2.2)stop former master 3)empty pg_xlog directory on former master 4)pg_start_backup on new master 5)re-resync datafiles to former master using: rsync -aq --delete --exclude pg_xlog /db/data [former master]:/db/ 6)pg_stop_backup on new master 7)start former master as replica... that procedure used more than 10 times without any problem in the past. May be I missing something in it. > > Could you show pg_controldata output for primary/replica? To see how new > that > multixact offset is. > On failing replica: [pgsql@db1 ~/tmp/postgresql-9.0.8]$ pg_controldata /db/data_back pg_control version number: 903 Catalog version number: 201008051 Database system identifier: 5687760461964967531 Database cluster state: in archive recovery pg_control last modified: Wed Aug 22 13:47:35 2012 Latest checkpoint location: 925/2D41CAA0 Prior checkpoint location: 925/2D41CAA0 Latest checkpoint's REDO location: 925/2C0004C8 Latest checkpoint's TimeLineID: 5 Latest checkpoint's NextXID: 0/1768511856 Latest checkpoint's NextOID: 59880938 Latest checkpoint's NextMultiXactId: 361788644 Latest checkpoint's NextMultiOffset: 822580965 Latest checkpoint's oldestXID: 1569182525 Latest checkpoint's oldestXID's DB: 16424 Latest checkpoint's oldestActiveXID: 1768498919 Time of latest checkpoint: Tue Aug 21 08:21:42 2012 Minimum recovery ending location: 925/2C0004C8 Backup start location: 925/2C0004C8 Current wal_level setting: hot_standby Current max_connections setting: 300 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 2048 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value On master no sense to look into it because master 1 day ahead of that replica. It's a production system. > > > What next I should examine? > If you could give me a DEBUG3 log from startup till it crashes it would be > helpful. Please #define MULTIXACT_DEBUG ontop of multixact.c before doing > so. > Thats probably going to be to big to go through the mailinglist. > I see nothing new after I rebuit postgres with that define ( /usr/local/pgsql/tmp/postgresql/bin/postgres -d 3 -D /db/data_back/ ). Just: LOCATION: readRecoveryCommandFile, xlog.c:5269 DEBUG: 00000: primary_conninfo = 'host=10.12.22.19 port=5433 user=replica password=acilper' LOCATION: readRecoveryCommandFile, xlog.c:5276 DEBUG: 00000: trigger_file = '/db/data/failover' LOCATION: readRecoveryCommandFile, xlog.c:5283 LOG: 00000: entering standby mode LOCATION: StartupXLOG, xlog.c:5827 DEBUG: 00000: checkpoint record is at 925/2D41CAA0 LOCATION: StartupXLOG, xlog.c:5894 DEBUG: 00000: redo record is at 925/2C0004C8; shutdown FALSE LOCATION: StartupXLOG, xlog.c:5929 DEBUG: 00000: next transaction ID: 0/1768511856; next OID: 59880938 LOCATION: StartupXLOG, xlog.c:5933 DEBUG: 00000: next MultiXactId: 361788644; next MultiXactOffset: 822580965 LOCATION: StartupXLOG, xlog.c:5936 DEBUG: 00000: oldest unfrozen transaction ID: 1569182525, in database 16424 LOCATION: StartupXLOG, xlog.c:5939 DEBUG: 00000: MultiXact: setting next multi to 361788644 offset 822580965 LOCATION: MultiXactSetNextMXact, multixact.c:1668 DEBUG: 00000: transaction ID wrap limit is 3716666172, limited by database with OID 16424 LOCATION: SetTransactionIdLimit, varsup.c:334 WARNING: 01000: pid of startup is: 16919, sleeping for 60s LOCATION: StartupXLOG, xlog.c:5983 DEBUG: 00000: initializing for hot standby LOCATION: StartupXLOG, xlog.c:6073 LOG: 00000: redo starts at 925/2C0004C8 LOCATION: StartupXLOG, xlog.c:6202 FATAL: XX000: could not access status of transaction 361788644 DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0. CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078 LOCATION: SlruReportIOError, slru.c:840 > > If youre feeling adventurous you could use the xlogdump project > (https://github.com/snaga/xlogdump) to see what xlog records happen from > the > time it starts up till it crashes. I don't immediately see a way a > XLOG_MULTIXACT_CREATE_ID can be generated without a preceding > XLOG_MULTIXACT_ZERO_OFF_PAGE zeroing the page. > But most of that should be already be visible in the log with the above > define. > > Did you have any crashes, -m immediate shutdowns, or similar on the > primary or > replica? > I don't think. But I'm not sure because task was performed by system administrator using provided instruction. And I was informed only after replica doesn't start. Kind Regards, Maksym
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Sachin SrivastavaДата:
Сообщение: Re: postgresql-9.1.4 macosx-10.6.8 (hostssl) psql -h hostname crashes
Следующее
От: Andres FreundДата:
Сообщение: Re: BUG #7500: hot-standby replica crash after an initial rsync