Обсуждение: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

Поиск
Список
Период
Сортировка

[GENERAL] BDR, wal segment has been removed, is it possible move forward?

От
milist ujang
Дата:
Hi all,

due to space issue and high volume transaction, some wal segments removed from pg_xlog on bdr environment.

warning log at node1 saying "requested WAL segment ..... has already been removed" following Connection reset by peer.

log at node2 :
Sending replication command: START_REPLICATION SLOT
...
...
XX000: data stream ended


I had played streams and goldengate (oracle product) , that at capture side we can move forward to certain sequence (archivedlog/redolog - wal segment in postgres).

So, is it possible to move forward to read recent wal segment in bdr environment? (assume data reconciliation will be done manually).


--

Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

От
Craig Ringer
Дата:
On 6 September 2017 at 01:52, milist ujang <ujang.milist@gmail.com> wrote:
Hi all,

due to space issue and high volume transaction, some wal segments removed from pg_xlog on bdr environment.

What, you deleted them?
 
I had played streams and goldengate (oracle product) , that at capture side we can move forward to certain sequence (archivedlog/redolog - wal segment in postgres).

So, is it possible to move forward to read recent wal segment in bdr environment? (assume data reconciliation will be done manually).

BDR can, see bdr.skip_changes_upto .

But PostgreSQL's logical decoding requires a contiguous WAL stream to maintain a valid catalog_xmin and restart_lsn, so it'll still fail to advance. So your replication from that node is broken, and you have to part the node then rejoin. You'll need to manually recover any diverged data.

Don't go in and randomly delete things in the postgres data directory, or things will break.

The BDR manual warns of the importance of disk space monitoring...

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

От
milist ujang
Дата:
Hi Craig

On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

BDR can, see bdr.skip_changes_upto .

Unluckily my bdr is 0.9.3
 
But PostgreSQL's logical decoding requires a contiguous WAL stream to maintain a valid catalog_xmin and restart_lsn, so it'll still fail to advance. So your replication from that node is broken, and you have to part the node then rejoin. You'll need to manually recover any diverged data.

Yup, I willing to reconcile data manualy via dblink or else,
is it still possible to move wal segment in 0.9.3?

I've played these at dev env to simulate, but no luck:

# select * from pg_replication_identifier_progress;
 local_id |              external_id               | remote_lsn | local_lsn
----------+----------------------------------------+------------+-----------
        1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950  | 0/3007588
(1 row)

# SELECT pg_replication_identifier_setup_replaying_from('bdr_6461744703437035137_1_34424_30406_');
 pg_replication_identifier_setup_replaying_from
------------------------------------------------

(1 row)

# SELECT pg_replication_identifier_setup_tx_origin('0/4288960','2017-09-06');
 pg_replication_identifier_setup_tx_origin
-------------------------------------------

(1 row)

# select * from pg_replication_identifier_progress;
 local_id |              external_id               | remote_lsn | local_lsn
----------+----------------------------------------+------------+-----------
        1 | bdr_6461744703437035137_1_34424_30406_ | 0/4288950  | 0/3007588
(1 row)


--

Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

От
Craig Ringer
Дата:
On 6 September 2017 at 08:47, milist ujang <ujang.milist@gmail.com> wrote:
Hi Craig

On Wed, Sep 6, 2017 at 7:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

BDR can, see bdr.skip_changes_upto .

Unluckily my bdr is 0.9.3
 
But PostgreSQL's logical decoding requires a contiguous WAL stream to maintain a valid catalog_xmin and restart_lsn, so it'll still fail to advance. So your replication from that node is broken, and you have to part the node then rejoin. You'll need to manually recover any diverged data.

Yup, I willing to reconcile data manualy via dblink or else,
is it still possible to move wal segment in 0.9.3?

Well, you can skip changes on the logical decoding stream manually in 0.9.3 using the underlying postgres functions. But it won't help you because what you broke when you deleted the WAL segments wasn't BDR, it was postgres logical decoding.

It *requires* a contiguous stream of WAL. It cannot recover if you do not have that. And you cannot really reset it.

You could drop and re-create the replication slot, I guess. But your nodes would be hopelessly out of sync and need manual resync (with data replication disabled) of one node vs another.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

От
milist ujang
Дата:
Hi Craig,

On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

You could drop and re-create the replication slot, I guess. But your nodes would be hopelessly out of sync and need manual resync (with data replication disabled) of one node vs another.

Thanks for pointing to replication slot.
I Simulate the similar situation in dev env by remove the wal segment on node1, when  node2 keep inserting into a table, now it perfectly can move forward to latest wal segment, but the difference situation is at node_status.

In production node_status is  i  in node1 but  r  in node2, where on my dev both nodes keep  r  , even I waited to let it may change quite long.

can I safely update the node_status directy on bdr.bdr_nodes?


--

Re: [GENERAL] BDR, wal segment has been removed, is it possible move forward?

От
Craig Ringer
Дата:
On 7 September 2017 at 21:16, milist ujang <ujang.milist@gmail.com> wrote:
Hi Craig,

On Wed, Sep 6, 2017 at 4:07 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

You could drop and re-create the replication slot, I guess. But your nodes would be hopelessly out of sync and need manual resync (with data replication disabled) of one node vs another.

Thanks for pointing to replication slot.
I Simulate the similar situation in dev env by remove the wal segment on node1, when  node2 keep inserting into a table, now it perfectly can move forward to latest wal segment, but the difference situation is at node_status.

In production node_status is  i  in node1

there's a known bug in bdr1 where sometimes the node status doesn't update from 'i' after joining.
 
can I safely update the node_status directy on bdr.bdr_nodes?

Usually not. In this one specific case where a node is known to be fully joined and online, but its status is stuck at 'i', yes.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services