Обсуждение: viewing connectioninfo used by subscriber on the publication serverwhen inactive

Поиск
Список
Период
Сортировка

viewing connectioninfo used by subscriber on the publication serverwhen inactive

От
Wim Bertels
Дата:
Hello,

(context: logical built-in replication)

On the publication server:

How can you see what is the subscription connection is when this connection is inactive?
Is there query that can be used?, or is this only possible with extra monitoring tools (or going through the pg.log files if logged)?
(If so, what would you suggest on debian, preferably available in the repository of apt.postresql.org or debian.org?)

An example:

# select * from pg_replication_slots;
 slot_name |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 db2_sub1  | pgoutput | logical   |  20751 | db2      | f         | f      |            |      |       171086 | C/C4C4EA08  | C/C4C4EA08
 db2_sub2  | pgoutput | logical   |  20751 | db2      | f         | t      |       9957 |      |       171113 | C/C4C90EE0  | C/C4CC0FE0

* so for db2_sub2 there is pid: pg_stat_activity gives me the userinfo and others,
* but how can you obtain this for the inactive subscription (db2_sub1)?


--
mvg,
Wim 
--
In India, "cold weather" is merely a conventional phrase and has come into
use through the necessity of having some way to distinguish between weather
which will melt a brass door-knob and weather which will only make it mushy.	-- Mark Twain

Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Keith Fiske
Дата:


On Thu, May 14, 2020 at 6:50 AM Wim Bertels <wim.bertels@ucll.be> wrote:
Hello,

(context: logical built-in replication)

On the publication server:

How can you see what is the subscription connection is when this connection is inactive?
Is there query that can be used?, or is this only possible with extra monitoring tools (or going through the pg.log files if logged)?
(If so, what would you suggest on debian, preferably available in the repository of apt.postresql.org or debian.org?)

An example:

# select * from pg_replication_slots;
 slot_name |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 db2_sub1  | pgoutput | logical   |  20751 | db2      | f         | f      |            |      |       171086 | C/C4C4EA08  | C/C4C4EA08
 db2_sub2  | pgoutput | logical   |  20751 | db2      | f         | t      |       9957 |      |       171113 | C/C4C90EE0  | C/C4CC0FE0

* so for db2_sub2 there is pid: pg_stat_activity gives me the userinfo and others,
* but how can you obtain this for the inactive subscription (db2_sub1)?


--
mvg,
Wim 
--
In India, "cold weather" is merely a conventional phrase and has come into
use through the necessity of having some way to distinguish between weather
which will melt a brass door-knob and weather which will only make it mushy.	-- Mark Twain

The replication slot connection could use any role that has been configured to allow replication. So there is no way to know which role is actually using a slot without it actually being connected. You can look and see which roles have replication privileges by checking the output of \du in psql, but there is nothing tying any roles to any specific slots.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Wim Bertels
Дата:
> 
> 
> The replication slot connection could use any role that has been
> configured to allow replication. So there is no way to know which
> role is actually using a slot without it actually being connected.
> You can look and see which roles have replication privileges by
> checking the output of \du in psql, but there is nothing tying any
> roles to any specific slots.

i assume that a replication slot can only be used by 1 subscriber,
and subscribers can migrate; just to be clear the conneinfo is only
kept on the subscriber and used on the publisher, but not stored on the
publisher (unless active)

so if a replication slot is inactive, you can't contact for example the
user that (implicitely) created the replication slot, to see if it is
still needed for example.

background: i'm trying out an atypical? setup with many students
subscribing to a single publication server, thus with subscribers that
are not always online. As the number of slots might quickly grow:)

> 
-- 
mvg,
Wim Bertels

--
Swerve me?  The path to my fixed purpose is laid with iron rails,
whereon my soul is grooved to run.  Over unsounded gorges, through
the rifled hearts of mountains, under torrents' beds, unerringly I rush!
        -- Captain Ahab, "Moby Dick"




Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Keith Fiske
Дата:


On Thu, May 14, 2020 at 8:33 AM Wim Bertels <wim.bertels@ucll.be> wrote:
>
>
> The replication slot connection could use any role that has been
> configured to allow replication. So there is no way to know which
> role is actually using a slot without it actually being connected.
> You can look and see which roles have replication privileges by
> checking the output of \du in psql, but there is nothing tying any
> roles to any specific slots.

i assume that a replication slot can only be used by 1 subscriber,
and subscribers can migrate; just to be clear the conneinfo is only
kept on the subscriber and used on the publisher, but not stored on the
publisher (unless active)

so if a replication slot is inactive, you can't contact for example the
user that (implicitely) created the replication slot, to see if it is
still needed for example.

background: i'm trying out an atypical? setup with many students
subscribing to a single publication server, thus with subscribers that
are not always online. As the number of slots might quickly grow:)

>
--
mvg,
Wim Bertels

--
Swerve me?  The path to my fixed purpose is laid with iron rails,
whereon my soul is grooved to run.  Over unsounded gorges, through
the rifled hearts of mountains, under torrents' beds, unerringly I rush!
                -- Captain Ahab, "Moby Dick"



While running CREATE SUBSCRIPTION on the subscriber side does conveniently make a slot for you back on the primary, it's also possible to manually create logical replication slots directly on the primary. So a replication slot could potentially be made directly on the publication side and have no awareness of what role may eventually connect.

If you're going to have multiple subscribers connecting to a single publication, you're going to have to have different slots for each of those subscribers anyway. Otherwise one subscriber could cause another one to miss data. Honestly, the logical replication system really isn't meant to have subscribers be offline for any extended period of time, especially if the publisher is continuing to write. I would question the design of this that is allowing the subscribers to be offline. You may want to rethink this design a bit.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Wim Bertels
Дата:
> period of time, especially if the publisher is continuing to write. I
> would question the design of this that is allowing the subscribers to
> be offline. You may want to rethink this design a bit.

That is another aspect, to achieve a goal there generally more options,
but not my question.

In this case: a small dataset with little changes over time is used,
(so the overhead in storage should be little) ;
as this seems an easy introduction exercise to replication for the
students who are not always online. So they see what happens, always
more appealing that just talking about it.

Just to be clear, not this case: a setup for a company

> 
-- 
mvg,
Wim 
--
Kindness is a language which the deaf can hear and the blind can read.
        -- Mark Twain




Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Keith Fiske
Дата:


On Thu, May 14, 2020 at 9:59 AM Wim Bertels <wim.bertels@ucll.be> wrote:
> period of time, especially if the publisher is continuing to write. I
> would question the design of this that is allowing the subscribers to
> be offline. You may want to rethink this design a bit.

That is another aspect, to achieve a goal there generally more options,
but not my question.

In this case: a small dataset with little changes over time is used,
(so the overhead in storage should be little) ;
as this seems an easy introduction exercise to replication for the
students who are not always online. So they see what happens, always
more appealing that just talking about it.

Just to be clear, not this case: a setup for a company

>
--
mvg,
Wim
--
Kindness is a language which the deaf can hear and the blind can read.
                -- Mark Twain


It doesn't matter how small the dataset change is. The same WAL stream is used for both logical and physical replication so it has to keep all WAL files until all subscribers for that publication have confirmed they have received them. If even a single subscriber goes offline, all WAL will be kept until that subscriber reconnects.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Wim Bertels
Дата:
Keith Fiske schreef op do 14-05-2020 om 10:08 [-0400]:
> It doesn't matter how small the dataset change is. The same WAL
> stream is used for both logical and physical replication so it has to
> keep all WAL files until all subscribers for that publication have
> confirmed they have received them. If even a single subscriber goes
> offline, all WAL will be kept until that subscriber reconnects.

That is interesting, i assume this the WAL for the whole cluster, as
logical decoding is then used on this WAL for the logical replication,
do you have an estimate of order of magnitude for the all WAL files?

So far this seems ok over here (with one subscriber inactive for 2
days):
# du -ch pg_logical/ pg_wal/ pg_replslot/
912K    pg_logical/snapshots
4,0K    pg_logical/mappings
924K    pg_logical/
4,0K    pg_wal/archive_status
81M    pg_wal/
8,0K    pg_replslot/db2_sub
8,0K    pg_replslot/db2_sub1
8,0K    pg_replslot/db2_sub2
28K    pg_replslot/
81M    totaal
this after two days of replication setup. 

assuming that students will be offline for at most 2 or 3 days,
this seems ok?

-- 
mvg,
Wim
--
Tell the truth or trump--but get the trick.
        -- Mark Twain, "Pudd'nhead Wilson's Calendar"




Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Keith Fiske
Дата:


On Thu, May 14, 2020 at 10:22 AM Wim Bertels <wim.bertels@ucll.be> wrote:
Keith Fiske schreef op do 14-05-2020 om 10:08 [-0400]:
> It doesn't matter how small the dataset change is. The same WAL
> stream is used for both logical and physical replication so it has to
> keep all WAL files until all subscribers for that publication have
> confirmed they have received them. If even a single subscriber goes
> offline, all WAL will be kept until that subscriber reconnects.

That is interesting, i assume this the WAL for the whole cluster, as
logical decoding is then used on this WAL for the logical replication,
do you have an estimate of order of magnitude for the all WAL files?

So far this seems ok over here (with one subscriber inactive for 2
days):
# du -ch pg_logical/ pg_wal/ pg_replslot/
912K    pg_logical/snapshots
4,0K    pg_logical/mappings
924K    pg_logical/
4,0K    pg_wal/archive_status
81M     pg_wal/
8,0K    pg_replslot/db2_sub
8,0K    pg_replslot/db2_sub1
8,0K    pg_replslot/db2_sub2
28K     pg_replslot/
81M     totaal
this after two days of replication setup.

assuming that students will be offline for at most 2 or 3 days,
this seems ok?

--
mvg,
Wim
--
Tell the truth or trump--but get the trick.
                -- Mark Twain, "Pudd'nhead Wilson's Calendar"



WAL is always cluster/instance-wide. Locally generated WAL files are, by default, always 16MB in size no matter how little has changed. So if you have archive_timeout set and only a single row has changed, that WAL file will still be 16MB (although it will compress down very small if you're archiving elsewhere).

How much of an impact this will be is entirely dependent on the write rate of your cluster. If you have very few writes you may be fine. But I would definitely suggest getting some monitoring in place if you expect to have offline subscribers for any long period of time.

But, again, I would still try and rethink this strategy. Offline subscribers can be a very big problem if they never come back. Not only because you'd eventually fill up your disk, but also because that no longer allows PG to recycle its WAL files, or can cause excessive cleanup operations when that subscriber finally comes back, which can have big IO impacts.
--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Scott Ribe
Дата:
> On May 14, 2020, at 8:34 AM, Keith Fiske <keith.fiske@crunchydata.com> wrote:
>
> But, again, I would still try and rethink this strategy.

For instance:

- single replication slot
- logical decoding locally that filters updates and records the relevant ones into a serialized updates table
- clients record their position in the updates table, ask for new updates on connect

Or, similar to above, but just use triggers on relevant tables.


Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Wim Bertels
Дата:
Keith Fiske schreef op do 14-05-2020 om 10:34 [-0400]:
> WAL is always cluster/instance-wide. Locally generated WAL files are,
> by default, always 16MB in size no matter how little has changed. So
> if you have archive_timeout set and only a single row has changed,
> that WAL file will still be 16MB (although it will compress down very
> small if you're archiving elsewhere).
> 
> How much of an impact this will be is entirely dependent on the write
> rate of your cluster. If you have very few writes you may be fine.
> But I would definitely suggest getting some monitoring in place if
> you expect to have offline subscribers for any long period of time. 

this is not a typical use case,
and this server is just setup for only this purpose, there are very
little writes, but i understand your concern for other people reading
this without the proper context of this thread.

monitoring: as this was part of my original question, do you have a
suggestion for this setup?, preferably using a solution that is
available in the debian repo (apt.postgresql or debian)

> 
> But, again, I would still try and rethink this strategy. Offline
> subscribers can be a very big problem if they never come back. Not
> only because you'd eventually fill up your disk, but also because
> that no longer allows PG to recycle its WAL files, or can cause
> excessive cleanup operations when that subscriber finally comes back,
> which can have big IO impacts.

as it is a demo/exercise setup with very little writes,
and was planning on using pg_cron weekly to cleanup inactive slots.

-- 
mvg,
Wim 
--
The bone-chilling scream split the warm summer night in two, the first
half being before the scream when it was fairly balmy and calm and
pleasant, the second half still balmy and quite pleasant for those who
hadn't heard the scream at all, but not calm or balmy or even very nice
for those who did hear the scream, discounting the little period of time
during the actual scream itself when your ears might have been hearing it
but your brain wasn't reacting yet to let you know.
        -- Winning sentence, 1986 Bulwer-Lytton bad fiction contest.




Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Wim Bertels
Дата:
Wim Bertels schreef op do 14-05-2020 om 16:22 [+0200]:
> Keith Fiske schreef op do 14-05-2020 om 10:08 [-0400]:
> > It doesn't matter how small the dataset change is. The same WAL
> > stream is used for both logical and physical replication so it has
> > to
> > keep all WAL files until all subscribers for that publication have
> > confirmed they have received them. If even a single subscriber goes
> > offline, all WAL will be kept until that subscriber reconnects.
> 
> That is interesting, i assume this the WAL for the whole cluster, as
> logical decoding is then used on this WAL for the logical
> replication,
> do you have an estimate of order of magnitude for the all WAL files?
> 
> So far this seems ok over here (with one subscriber inactive for 2
> days):
> # du -ch pg_logical/ pg_wal/ pg_replslot/
> 912K    pg_logical/snapshots
> 4,0K    pg_logical/mappings
> 924K    pg_logical/
> 4,0K    pg_wal/archive_status
> 81M    pg_wal/
> 8,0K    pg_replslot/db2_sub
> 8,0K    pg_replslot/db2_sub1
> 8,0K    pg_replslot/db2_sub2
> 28K    pg_replslot/
> 81M    totaal
> this after two days of replication setup. 
> 
> assuming that students will be offline for at most 2 or 3 days,
> this seems ok?

Follow-up info after 14 days: (this is a typical usecase)

the main stress seems to be the cpu, the load is 3-6 as high as normal,
storage and memory are ok

pg_logical/snapshots takes the most storage in this use case (little
writes, many async log replication users), not pg_wal

8,4G    pg_logical/snapshots
4,0K    pg_logical/mappings
8,4G    pg_logical/
4,0K    pg_wal/archive_status
161M    pg_wal/
8,0K    pg_replslot/db2_sub
8,0K    pg_replslot/db2_sub_ma
8,0K    pg_replslot/db2_sub_r074
8,0K    pg_replslot/db2_sub_si
8,0K    pg_replslot/db2_sub_r0604
8,0K    pg_replslot/db2_sub_r0761
8,0K    pg_replslot/db2_r0653
8,0K    pg_replslot/nook_subscription
8,0K    pg_replslot/r0745
8,0K    pg_replslot/db2_sub_ma
8,0K    pg_replslot/db2_sub_an
8,0K    pg_replslot/db2_sub_th
8,0K    pg_replslot/r07456
8,0K    pg_replslot/db2_sub5
8,0K    pg_replslot/mysub
12K    pg_replslot/db2_sub_mart
8,0K    pg_replslot/sub_db2
8,0K    pg_replslot/db2_r653
8,0K    pg_replslot/sub_rube
8,0K    pg_replslot/db2_sub_
8,0K    pg_replslot/db2_sub_b
8,0K    pg_replslot/db2_sub2
8,0K    pg_replslot/db2_sub_b
8,0K    pg_replslot/db2_sub73
8,0K    pg_replslot/db2_sub_r74
8,0K    pg_replslot/db2_sub_andr
8,0K    pg_replslot/db2_0653
8,0K    pg_replslot/db2_sub_i
8,0K    pg_replslot/db2_pub
240K    pg_replslot/
8,6G    totaal

simple user logging:
a psql outfile is 17MB after 14 days (where the replication connections
are appended each minute)

> 





Re: viewing connectioninfo used by subscriber on the publicationserver when inactive

От
Wim Bertels
Дата:
Wim Bertels schreef op di 26-05-2020 om 15:29 [+0200]:
> Wim Bertels schreef op do 14-05-2020 om 16:22 [+0200]:
> > Keith Fiske schreef op do 14-05-2020 om 10:08 [-0400]:
> > > 
> 
> Follow-up info after 14 days: (this is a typical usecase)

sorry, this is NOT a typical usecase