Обсуждение: Reversing pg_logical replication

Поиск
Список
Период
Сортировка

Reversing pg_logical replication

От
Craig James
Дата:
I dug around on the web for comments and tools about how to reverse the master/slave relationship of a pg_logical publisher/subscriber pair without having to start fresh and re-copy all the data. All I found were several discussions that seemed to say, "You can't do it. You have to promote the subscriber to publisher, then blow off the original publisher's tables, make it a subscriber, and start from scratch."

But the create-subscription SQL has a "with (copy_data = false)", which seems to do exactly what I need. So I tried it, and it works fine. For a publish/subscript from servers A --> B, the procedure is:
  1. Shut off activity
  2. delete subscription from B
  3. delete publication from A
  4. create publication on B
  5. create subscription on A with copy_data = false.
It seems simple. It works. Have I overlooked something? For all the hand-wringing I found when others asked about this, I'm worried that I overlooked something.

Thanks,
Craig

--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------

Re: Reversing pg_logical replication

От
Peter Eisentraut
Дата:
On 21/09/2018 17:21, Craig James wrote:
> I dug around on the web for comments and tools about how to reverse the
> master/slave relationship of a pg_logical publisher/subscriber pair
> without having to start fresh and re-copy all the data. All I found were
> several discussions that seemed to say, "You can't do it. You have to
> promote the subscriber to publisher, then blow off the original
> publisher's tables, make it a subscriber, and start from scratch."
> 
> But the create-subscription SQL has a "with (copy_data = false)", which
> seems to do exactly what I need. So I tried it, and it works fine.

This works as long as both nodes are in sync and no one is writing to
either of them while you do these changes.

The hard part is doing it while both nodes are active.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Data Corruption Errors

От
"Ferrell, Denise SDC"
Дата:
Using Linux/PostgreSQL 9.5 &  setup on a VM.

I'm receiving the following errors during an insert using a function but this does not happen every time.  When this
doesoccur the database/system is unavailable until the processes are killed or the service is restarted. 

"LOG:  server process (PID xxxxx) was terminated by signal 9:  Killed
DETAIL: Failed process was running
   select insert_data ($1, $2,....)
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory. 
HINT:  In a moment you should be able to reconnect ....
LOG:  connection received
FATAL:  the database system is in recovery mode"

The last 2 lines are repeated and logging is not available.

How can I determine data/table/shared memory corruption?

Thank you in advance for your responses.

D. Ferrell

Re: Data Corruption Errors

От
Achilleas Mantzios
Дата:
On 27/9/18 3:39 μ.μ., Ferrell, Denise SDC wrote:
> Using Linux/PostgreSQL 9.5 &  setup on a VM.
>
> I'm receiving the following errors during an insert using a function but this does not happen every time.  When this
doesoccur the database/system is unavailable until the processes are killed or the service is restarted.
 
>
> "LOG:  server process (PID xxxxx) was terminated by signal 9:  Killed
> DETAIL: Failed process was running
>     select insert_data ($1, $2,....)
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory.
 
> HINT:  In a moment you should be able to reconnect ....
> LOG:  connection received
> FATAL:  the database system is in recovery mode"

Did your DB eventually recover? Have you ruled reasons like OOM, non-optimal vm.swapiness , etc ?

>
> The last 2 lines are repeated and logging is not available.
>                  
> How can I determine data/table/shared memory corruption?
>
> Thank you in advance for your responses.
>
> D. Ferrell


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: Data Corruption Errors

От
Kevin Grittner
Дата:
On Thu, Sep 27, 2018 at 7:40 AM Ferrell, Denise SDC
<denise.ferrell@sdc-world.com> wrote:

> Using Linux/PostgreSQL 9.5 &  setup on a VM.

Please always show the entire version number -- it can matter a lot.
One way to see it is to run `select version();` on a database
connection.  From the shell you can run one of the executables with
the --version option, which will show the version and do nothing else.
If you are running in a VM, please say which product, and show the
exact version of that.

> I'm receiving the following errors during an insert using a function but this does not happen every time.  When this
doesoccur the database/system is unavailable until the processes are killed or the service is restarted.
 
>
> "LOG:  server process (PID xxxxx) was terminated by signal 9:  Killed
> DETAIL: Failed process was running
>    select insert_data ($1, $2,....)
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory.
 
> HINT:  In a moment you should be able to reconnect ....
> LOG:  connection received
> FATAL:  the database system is in recovery mode"
>
> The last 2 lines are repeated and logging is not available.
>
> How can I determine data/table/shared memory corruption?

You have not shown any evidence of corruption.  What happened was that
some database service process was killed with SIGKILL (kill -9).
Never do that.  It may have been done by the Out of Memory Killer
process.  You should disable that by setting vm.overcommit_memory=2
and vm.overcommit_ratio=80 *or maybe as high as 90).

When a database service process is killed with SIGKILL the service
cannot continue without *RISK* of corruption, so it does a crash and
restart.  These messages:

  FATAL:  the database system is in recovery mode

Just mean that an attempt was made to the database while the recovery
was in progress, so the connection attempt was rejected.  Just ignore
those messages.  Just wait for the service to restart.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


Re: Data Corruption Errors

От
"Ferrell, Denise SDC"
Дата:
This seems to have been the issue.  Thank you for the responses.
________________________________________
From: Kevin Grittner <kgrittn@gmail.com>
Sent: Thursday, September 27, 2018 9:24 AM
To: Ferrell, Denise SDC
Cc: pgsql-admin@postgresql.org
Subject: Re: Data Corruption Errors

On Thu, Sep 27, 2018 at 7:40 AM Ferrell, Denise SDC
<denise.ferrell@sdc-world.com> wrote:

> Using Linux/PostgreSQL 9.5 &  setup on a VM.

Please always show the entire version number -- it can matter a lot.
One way to see it is to run `select version();` on a database
connection.  From the shell you can run one of the executables with
the --version option, which will show the version and do nothing else.
If you are running in a VM, please say which product, and show the
exact version of that.

> I'm receiving the following errors during an insert using a function but this does not happen every time.  When this
doesoccur the database/system is unavailable until the processes are killed or the service is restarted. 
>
> "LOG:  server process (PID xxxxx) was terminated by signal 9:  Killed
> DETAIL: Failed process was running
>    select insert_data ($1, $2,....)
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory. 
> HINT:  In a moment you should be able to reconnect ....
> LOG:  connection received
> FATAL:  the database system is in recovery mode"
>
> The last 2 lines are repeated and logging is not available.
>
> How can I determine data/table/shared memory corruption?

You have not shown any evidence of corruption.  What happened was that
some database service process was killed with SIGKILL (kill -9).
Never do that.  It may have been done by the Out of Memory Killer
process.  You should disable that by setting vm.overcommit_memory=2
and vm.overcommit_ratio=80 *or maybe as high as 90).

When a database service process is killed with SIGKILL the service
cannot continue without *RISK* of corruption, so it does a crash and
restart.  These messages:

  FATAL:  the database system is in recovery mode

Just mean that an attempt was made to the database while the recovery
was in progress, so the connection attempt was rejected.  Just ignore
those messages.  Just wait for the service to restart.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/