Re: Cascade Standby - HELP! (Problem and solution)

From: Murthy Nunna
Subject: Re: Cascade Standby - HELP! (Problem and solution)
Date:
Msg-id: BD7563973EF8894D905C80593EE9A4ED36EAF5@MAIL01.fnal.gov
List: pgsql-admin

Gentlemen:

 

This is how I worked around the issue…

 

Scenario:

Cascade replication using streaming

Master->slave1->slave2

 

Problem:

If slave2 is stopped for a long time AND slave1 no longer has enough WAL segments (governed by checkpoint_segments and wal_keep_segments) to cover slave2's downtime, then restarting slave2 fails with an obvious error like: "FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 000000010000001C00000013 has already been removed". This is because slave1 does not ship WAL files to slave2 automatically.

 

My (poor man’s) solution:

In slave1's recovery.conf, change archive_cleanup_command as follows (adding an rsync command that ships the logs to slave2 before they get deleted):

 

archive_cleanup_command = 'echo "**** `date` **** rsync is starting..." ; rsync -arv /data/postgres/archive/rep_poc/remote/* postgres@ifdb02.fnal.gov:/data/postgres/archive/rep_poc/remote/ ; /fnal/ups/prd/postgres/v9_2_4_x64/Linux-2-6/bin/pg_archivecleanup /data/postgres/archive/rep_poc/remote %r'

 

[Note: The above needs some tweaking… if rsync fails for any reason, I do not want the cleanup to occur. I am going to work on this.]
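One possible refinement along those lines (an untested sketch, reusing the same paths and host as above) is to chain the two commands with `&&`, so that pg_archivecleanup only runs when rsync exits successfully:

```
# recovery.conf on slave1 -- sketch only, not a tested configuration.
# '&&' makes cleanup conditional on rsync's exit status being 0,
# so WAL files are never deleted before they have reached slave2.
archive_cleanup_command = 'rsync -a /data/postgres/archive/rep_poc/remote/ postgres@ifdb02.fnal.gov:/data/postgres/archive/rep_poc/remote/ && /fnal/ups/prd/postgres/v9_2_4_x64/Linux-2-6/bin/pg_archivecleanup /data/postgres/archive/rep_poc/remote %r'
```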

 

Downside of poor man’s solution:

The copy will not occur until a restartpoint is reached on slave1, so there can be some delay before slave2 can see the log. However, these logs on slave2 are meant as a lifeline, to prevent a total rebuild of the slave when streaming cannot catch up.

 

Suggestion for improvement:

I don’t really know… We would somehow have to capture the event of a new WAL file appearing in the archive area of the cascading standby (slave1) and execute a command there (say, archive_copy_command).

 

Thanks,

Murthy

 

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Murthy Nunna
Sent: Wednesday, April 02, 2014 4:07 PM
To: Scott Whitney
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Cascade Standby - HELP!

 

Hi Scott,

 

Yes, I thought of increasing wal_keep_segments, but it doesn’t seem to be a bulletproof solution. It pins that much disk space, and most of the time that space is wasted, since streaming will be working and you really don’t need these logs hanging around.

 

Besides, things work fine in a primary->slave configuration because archive_command constantly ships the logs to the slave, and as long as the slave is keeping up via streaming it cleans those logs up, using them only when necessary for gap resolution. However, this doesn’t work in a slave->slave configuration, because archive_command has no meaning on a server that is in recovery mode: it has no logs of its own.
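For contrast, the working primary->slave arrangement described above boils down to a few settings like these (paths and host are illustrative; the parameters themselves are standard PostgreSQL 9.2 configuration):

```
# postgresql.conf on the master:
archive_mode    = on
# %p = path of the WAL file to archive, %f = its file name
archive_command = 'rsync -a %p slave1:/data/postgres/archive/rep_poc/%f'

# recovery.conf on the slave:
restore_command         = 'cp /data/postgres/archive/rep_poc/%f %p'
# %r = oldest WAL file still required by the last restartpoint
archive_cleanup_command = 'pg_archivecleanup /data/postgres/archive/rep_poc %r'
```

It is exactly the archive_command half of this pair that has no equivalent on a cascading standby.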

 

This restriction limits the cascade feature and makes it fragile, in my opinion. It could break at any time, and you end up rebuilding.

 

Thanks,

Murthy

 

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Scott Whitney
Sent: Wednesday, April 02, 2014 3:49 PM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Cascade Standby - HELP!

 

My only response that might help you with this is that my setup is:

 

Server 1) primary production db server (5432)

 

Server 2) replicates production (5432) also primary for non-production use (6432)

 

Server 3) (offsite) replicates #2 (5432 -> 7432 internally and 6432 -> 6432) AND hosts internal databases

 

So:

1) Production

2) Replication + non production

3) Internal + replication of NON production + replication of PRODUCTION (via the cascaded server)

 

I would find that when replicating the PRODUCTION server internally (#3), I had not set wal_keep_segments high enough. Due to the lag of WAN replication, production and non-production would replicate fine (gigabit backend), but sometimes, when a load of data was being updated/moved/deleted, the offsite server (#3) would fall behind far enough that it could not recover without a new pg_basebackup, in essence setting up replication again.

 

I solved this by upping the wal_keep_segments.
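As a sketch, that fix is a single setting in postgresql.conf on the upstream server; the value below is purely illustrative and should be sized to cover the longest downstream outage or lag you expect:

```
# postgresql.conf on the server being replicated from.
# Each WAL segment is 16 MB, so 256 segments pins roughly 4 GB of disk.
wal_keep_segments = 256
```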

 

 


Hi Robert,

 

Thanks for the response. Mine is 9.2.4 and the issue is specifically with cascading standby. I don’t think 8.3 had that feature.

 

Murthy

 

From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Burgholzer, Robert (DEQ)
Sent: Wednesday, April 02, 2014 3:29 PM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Cascade Standby - HELP!

 

Murthy,

I have experienced something similar to this issue, and recall that it had to do with log file difficulties. I tried my best to document this, but recall that I had problems working back through it, so please note that this is a procedure that carries no guarantees. Note that the step-by-step calls a couple of custom scripts that I wrote (they are at the end of the page), AND this is on an 8.3 postgresql install:

 

 

Regards,

/r/b

 

 
