Обсуждение: DROP SUBSCRIPTION with no slot

Поиск
Список
Период
Сортировка

DROP SUBSCRIPTION with no slot

От
Jeff Janes
Дата:
I recently had to cut loose (pg_drop_replication_slot) a logical replica that couldn't keep up and so was threatening to bring down the master.

In mopping up on the replica side, I couldn't just drop the subscription, because it couldn't drop the nonexistent slot on the master and so refused to work.  So I had to do a silly little dance where I first disable the subscription, then ALTER SUBSCRIPTION ... SET (slot_name = NONE), then drop it.

Wanting to clean up after itself is admirable, but if there is nothing to clean up, why should that be an error condition?  Should this be an item on https://wiki.postgresql.org/wiki/Todo (to whatever extent that is still used).

Cheers,

Jeff

Re: DROP SUBSCRIPTION with no slot

От
Peter Eisentraut
Дата:
On 2019-09-24 16:31, Jeff Janes wrote:
> I recently had to cut loose (pg_drop_replication_slot) a logical replica
> that couldn't keep up and so was threatening to bring down the master.
> 
> In mopping up on the replica side, I couldn't just drop the
> subscription, because it couldn't drop the nonexistent slot on the
> master and so refused to work.  So I had to do a silly little dance
> where I first disable the subscription, then ALTER SUBSCRIPTION ... SET
> (slot_name = NONE), then drop it.
> 
> Wanting to clean up after itself is admirable, but if there is nothing
> to clean up, why should that be an error condition?

The alternatives seem quite error prone to me.  Better be explicit.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: DROP SUBSCRIPTION with no slot

От
Ziga
Дата:
This also seems to be a problem for somewhat fringe case of subscriptions created with connect=false option.
They cannot be dropped in an obvious way, without knowing the ALTER SUBSCRIPTION trick.

For example:

contrib_regression=# create subscription test_sub connection 'dbname=contrib_regression' publication test_pub with ( connect=false );                                                                                                                 
WARNING:  tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION

contrib_regression=# drop subscription test_sub; -- fails
ERROR:  could not drop the replication slot "test_sub" on publisher
DETAIL:  The error was: ERROR:  replication slot "test_sub" does not exist

contrib_regression=# alter subscription test_sub set ( slot_name=none );
ALTER SUBSCRIPTION

contrib_regression=# drop subscription test_sub; -- succeeds
DROP SUBSCRIPTION


Note that the publication was never refreshed.
It seems that the first DROP should succeed in the above case. 
Or at least some hint should be given how to fix this.




On 24 Sep 2019, at 23:25, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2019-09-24 16:31, Jeff Janes wrote:
I recently had to cut loose (pg_drop_replication_slot) a logical replica
that couldn't keep up and so was threatening to bring down the master.

In mopping up on the replica side, I couldn't just drop the
subscription, because it couldn't drop the nonexistent slot on the
master and so refused to work.  So I had to do a silly little dance
where I first disable the subscription, then ALTER SUBSCRIPTION ... SET
(slot_name = NONE), then drop it.

Wanting to clean up after itself is admirable, but if there is nothing
to clean up, why should that be an error condition?

The alternatives seem quite error prone to me.  Better be explicit.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: DROP SUBSCRIPTION with no slot

От
Jeff Janes
Дата:
On Tue, Sep 24, 2019 at 5:25 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2019-09-24 16:31, Jeff Janes wrote:
> I recently had to cut loose (pg_drop_replication_slot) a logical replica
> that couldn't keep up and so was threatening to bring down the master.
>
> In mopping up on the replica side, I couldn't just drop the
> subscription, because it couldn't drop the nonexistent slot on the
> master and so refused to work.  So I had to do a silly little dance
> where I first disable the subscription, then ALTER SUBSCRIPTION ... SET
> (slot_name = NONE), then drop it.
>
> Wanting to clean up after itself is admirable, but if there is nothing
> to clean up, why should that be an error condition?

The alternatives seem quite error prone to me.  Better be explicit.

If you can connect to the master, and see that the slot already fails to exist, what is error prone about that?

If someone goes to the effort of setting up a different master, configures it to receive replica connections, and alters the subscription CONNECTION parameter on the replica to point to that poisoned master, will an error on the DROP SUBSCRIPTION really force them to see the error of their ways, or will they just succeed at explicitly doing the silly dance and finalize the process of shooting themselves in the foot via a roundabout mechanism?  (And at the point the CONNECTION is changed, he is in the same boat even if he doesn't try to drop the sub--either way the master has a dangling slot).  I'm in favor of protecting a fool from his foolishness, except when it is annoying to the rest of us (Well, annoying to me, I guess. I don't know yet about the rest of us).

Cheers,

Jeff

Re: DROP SUBSCRIPTION with no slot

От
Jeff Janes
Дата:
On Tue, Sep 24, 2019 at 6:42 PM Ziga <ziga@ljudmila.org> wrote:
This also seems to be a problem for somewhat fringe case of subscriptions created with connect=false option.
They cannot be dropped in an obvious way, without knowing the ALTER SUBSCRIPTION trick.

For example:

contrib_regression=# create subscription test_sub connection 'dbname=contrib_regression' publication test_pub with ( connect=false );                                                                                                                 
WARNING:  tables were not subscribed, you will have to run ALTER SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION

contrib_regression=# drop subscription test_sub; -- fails
ERROR:  could not drop the replication slot "test_sub" on publisher
DETAIL:  The error was: ERROR:  replication slot "test_sub" does not exist

contrib_regression=# alter subscription test_sub set ( slot_name=none );
ALTER SUBSCRIPTION

contrib_regression=# drop subscription test_sub; -- succeeds
DROP SUBSCRIPTION


Note that the publication was never refreshed.
It seems that the first DROP should succeed in the above case. 
Or at least some hint should be given how to fix this.

There is no HINT in the error message itself, but there is in the documentation, see note at end of https://www.postgresql.org/docs/current/sql-dropsubscription.html.  I agree with you that the DROP should just work in this case, even more so than in my case.  But if we go with the argument that doing that is too error prone, then do we want to include a HINT on how to be error prone more conveniently?

Cheers,

Jeff

Re: DROP SUBSCRIPTION with no slot

От
Žiga Kranjec
Дата:
On 25 Sep 2019, at 01:22, Jeff Janes <jeff.janes@gmail.com> wrote:

There is no HINT in the error message itself, but there is in the documentation, see note at end of https://www.postgresql.org/docs/current/sql-dropsubscription.html.  I agree with you that the DROP should just work in this case, even more so than in my case.  But if we go with the argument that doing that is too error prone, then do we want to include a HINT on how to be error prone more conveniently?

Cheers,

Jeff

Ah. I missed that bit in the documentation!

Perhaps a publication should remember, whether it actually created a replication slot and only try to remove it, if it did. Although that probably wouldn't help much in your case.


Ž.

Re: DROP SUBSCRIPTION with no slot

От
Isaac Morland
Дата:
On Wed, 25 Sep 2019 at 13:55, Žiga Kranjec <ziga@ljudmila.org> wrote:

Ah. I missed that bit in the documentation!

Perhaps a publication should remember, whether it actually created a replication slot and only try to remove it, if it did. Although that probably wouldn't help much in your case.

What about issuing a NOTICE if the slot doesn't exist? I'm thinking if it is able to connect to the primary and issue the command to delete the slot, but it doesn't exist, not in case of arbitrary errors. I believe there is precedent for similar behaviour from all the DROP ... IF EXISTS commands. 

Re: DROP SUBSCRIPTION with no slot

От
Petr Jelinek
Дата:
Hi,

On 25/09/2019 01:07, Jeff Janes wrote:
> On Tue, Sep 24, 2019 at 5:25 PM Peter Eisentraut 
> <peter.eisentraut@2ndquadrant.com 
> <mailto:peter.eisentraut@2ndquadrant.com>> wrote:
> 
>     On 2019-09-24 16:31, Jeff Janes wrote:
>      > I recently had to cut loose (pg_drop_replication_slot) a logical
>     replica
>      > that couldn't keep up and so was threatening to bring down the
>     master.
>      >
>      > In mopping up on the replica side, I couldn't just drop the
>      > subscription, because it couldn't drop the nonexistent slot on the
>      > master and so refused to work.  So I had to do a silly little dance
>      > where I first disable the subscription, then ALTER SUBSCRIPTION
>     ... SET
>      > (slot_name = NONE), then drop it.
>      >
>      > Wanting to clean up after itself is admirable, but if there is
>     nothing
>      > to clean up, why should that be an error condition?
> 
>     The alternatives seem quite error prone to me.  Better be explicit.
> 
> 
> If you can connect to the master, and see that the slot already fails to 
> exist, what is error prone about that?
> 
> If someone goes to the effort of setting up a different master, 
> configures it to receive replica connections, and alters the 
> subscription CONNECTION parameter on the replica to point to that 
> poisoned master, will an error on the DROP SUBSCRIPTION really force 
> them to see the error of their ways, or will they just succeed at 
> explicitly doing the silly dance and finalize the process of shooting 
> themselves in the foot via a roundabout mechanism? 

All that needs to happen to get into this situation is to have 
replication go through haproxy or some other loadbalancer or dns name 
that points to different server after failover. So user really does not 
have to touch the subscription

We should at least offer HINT though.

However, I'd be in favor of removing this restriction once the patch 
which limits how much wal a slot can retain gets in.

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/