Discussion: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
digoal@126.com
Date:
The following bug has been logged on the website:

Bug reference:      13443
Logged by:          digoal
Email address:      digoal@126.com
PostgreSQL version: Unsupported/Unknown
Operating system:   CentOS 6.x x64
Description:

http://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION

25.2.6. Replication Slots
Replication slots provide an automated way to ensure that the master does
not remove WAL segments until they have been received by all standbys, and
that the master does not remove rows which could cause a recovery conflict
even when the standby is disconnected.

In my test, the master will remove rows when the standby disconnects.

standby:
postgres=# begin transaction isolation level repeatable read ;
BEGIN
postgres=# select * from test;
 id
----
  1
  1
  1
  1
  1
(5 rows)

master:
delete from test;
postgres=# vacuum verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 5 nonremovable row versions in 1 out of 1
pages
DETAIL:  5 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

But when I stop the standby:
pg95@db-172-16-3-150-> pg_ctl stop -m immediate -D
/data03/pg95_stdby/pg_root
waiting for server to shut down.... done
server stopped

the dead tuples are removed:
postgres=# vacuum verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": removed 5 row versions in 1 pages
INFO:  "test": found 5 removable, 0 nonremovable row versions in 1 out of 1
pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": truncated 1 to 0 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.03 sec.
VACUUM
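
For anyone reproducing this: whether the slot is actually pinning the vacuum
horizon can be checked on the master. A minimal sketch (assuming a physical
slot, as in the fuller example later in this thread):

-- On the master: a slot that protects dead rows shows a non-NULL xmin.
-- If xmin is NULL, the slot only retains WAL and vacuum can proceed.
SELECT slot_name, active, xmin, restart_lsn
FROM pg_replication_slots;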

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Jeff Janes
Date:
On Mon, Jun 15, 2015 at 2:05 AM, <digoal@126.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      13443
> Logged by:          digoal
> Email address:      digoal@126.com
> PostgreSQL version: Unsupported/Unknown
>


Does that mean you are using 9.5dev?  If so, do you see the same behavior
in 9.4.4?



> Operating system:   CentOS 6.x x64
> Description:
>
>
> http://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION
>
> 25.2.6. Replication Slots
> Replication slots provide an automated way to ensure that the master does
> not remove WAL segments until they have been received by all standbys, and
> that the master does not remove rows which could cause a recovery conflict
> even when the standby is disconnected.
>
> In my test, the master will remove rows when the standby disconnects.
>

I can't reproduce this.  In my hands when the standby crashes, tuples on
the master are not removed until either that slot is destroyed on the
master, or until the standby reconnects.

Can you show us all the settings changes you have made to postgresql.conf
and recovery.conf, and to the replication slots table?

One potential doc bug I see is that it seems to imply that replication
slots replace the need for hot_standby_feedback, when in fact they must be
used in conjunction with it.  Do you have hot_standby_feedback turned on in
the standby?
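
For reference, a minimal sketch of checking and enabling it on the standby
(assuming ALTER SYSTEM is available, i.e. 9.4+; the parameter only needs a
reload, not a restart):

-- On the standby: check the current value
SHOW hot_standby_feedback;

-- Enable it and reload the configuration
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();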

Cheers,

Jeff

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Michael Paquier
Date:
On Tue, Jun 16, 2015 at 4:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Mon, Jun 15, 2015 at 2:05 AM, <digoal@126.com> wrote:
>> http://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION
>>
>> 25.2.6. Replication Slots
>> Replication slots provide an automated way to ensure that the master does
>> not remove WAL segments until they have been received by all standbys, and
>> that the master does not remove rows which could cause a recovery conflict
>> even when the standby is disconnected.
>>
>> In my test, the master will remove rows when the standby disconnects.
>
>
> I can't reproduce this.  In my hands when the standby crashes, tuples on the
> master are not removed until either that slot is destroyed on the master, or
> until the standby reconnects.

Yep.

> Can you show us all the settings changes you have made to postgresql.conf
> and recovery.conf, and to the replication slots table?

Yes, perhaps the standby has already acknowledged the dead tuples
before you shut it down.

> One potential doc bug I see is that it seems to imply that replication
> slots replace the need for hot_standby_feedback, when in fact they must be
> used in conjunction with it.  Do you have hot_standby_feedback turned on in
> the standby?

As far as I recall, using replication slots implies that the
RecentGlobalXmin horizon is updated to guarantee the presence of
tuples on the standby once it reconnects. Perhaps I am missing
something?
--
Michael

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
德哥
Date:
At 2015-06-16 08:52:48, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>On Tue, Jun 16, 2015 at 4:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Mon, Jun 15, 2015 at 2:05 AM, <digoal@126.com> wrote:
>>> http://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION
>>>
>>> 25.2.6. Replication Slots
>>> Replication slots provide an automated way to ensure that the master does
>>> not remove WAL segments until they have been received by all standbys, and
>>> that the master does not remove rows which could cause a recovery conflict
>>> even when the standby is disconnected.
>>>
>>> In my test, the master will remove rows when the standby disconnects.
>>
>> I can't reproduce this.  In my hands when the standby crashes, tuples on the
>> master are not removed until either that slot is destroyed on the master, or
>> until the standby reconnects.
>
>Yep.
>
>> Can you show us all the settings changes you have made to postgresql.conf
>> and recovery.conf, and to the replication slots table?
>
>Yes, perhaps the standby has already acknowledged the dead tuples
>before you shut it down.
>
>> One potential doc bug I see is that it seems to imply that replication
>> slots replace the need for hot_standby_feedback, when in fact they must be
>> used in conjunction with it.  Do you have hot_standby_feedback turned on in
>> the standby?

Yes, I had not enabled hot_standby_feedback, thanks. After enabling it,
there is no problem now.

>As far as I recall, using replication slots implies that the
>RecentGlobalXmin horizon is updated to guarantee the presence of
>tuples on the standby once it reconnects. Perhaps I am missing
>something?
>--
>Michael

Example:
///////////////////////// Primary:

postgres=# select * from pg_file_settings;
                  sourcefile                   | sourceline | seqno |            name            |      setting
-----------------------------------------------+------------+-------+----------------------------+--------------------
 /data03/pgdata95/pg_root/postgresql.conf      |         63 |     1 | port                       | 1922
 /data03/pgdata95/pg_root/postgresql.conf      |         64 |     2 | max_connections            | 100
 /data03/pgdata95/pg_root/postgresql.conf      |         68 |     3 | unix_socket_directories    | .
 /data03/pgdata95/pg_root/postgresql.conf      |         71 |     4 | unix_socket_permissions    | 0700
 /data03/pgdata95/pg_root/postgresql.conf      |        102 |     5 | tcp_keepalives_idle        | 60
 /data03/pgdata95/pg_root/postgresql.conf      |        104 |     6 | tcp_keepalives_interval    | 10
 /data03/pgdata95/pg_root/postgresql.conf      |        106 |     7 | tcp_keepalives_count       | 10
 /data03/pgdata95/pg_root/postgresql.conf      |        116 |     8 | shared_buffers             | 4096MB
 /data03/pgdata95/pg_root/postgresql.conf      |        131 |     9 | dynamic_shared_memory_type | posix
 /data03/pgdata95/pg_root/postgresql.conf      |        160 |    10 | bgwriter_delay             | 10ms
 /data03/pgdata95/pg_root/postgresql.conf      |        176 |    11 | wal_level                  | logical
 /data03/pgdata95/pg_root/postgresql.conf      |        179 |    12 | synchronous_commit         | off
 /data03/pgdata95/pg_root/postgresql.conf      |        189 |    13 | wal_compression            | on
 /data03/pgdata95/pg_root/postgresql.conf      |        192 |    14 | wal_buffers                | 16MB
 /data03/pgdata95/pg_root/postgresql.conf      |        194 |    15 | wal_writer_delay           | 10ms
 /data03/pgdata95/pg_root/postgresql.conf      |        202 |    16 | max_wal_size               | 8GB
 /data03/pgdata95/pg_root/postgresql.conf      |        227 |    17 | max_wal_senders            | 32
 /data03/pgdata95/pg_root/postgresql.conf      |        232 |    18 | max_replication_slots      | 32
 /data03/pgdata95/pg_root/postgresql.conf      |        234 |    19 | track_commit_timestamp     | off
 /data03/pgdata95/pg_root/postgresql.conf      |        322 |    20 | log_destination            | csvlog
 /data03/pgdata95/pg_root/postgresql.conf      |        328 |    21 | logging_collector          | on
 /data03/pgdata95/pg_root/postgresql.conf      |        340 |    22 | log_truncate_on_rotation   | on
 /data03/pgdata95/pg_root/postgresql.conf      |        446 |    23 | log_timezone               | PRC
 /data03/pgdata95/pg_root/postgresql.conf      |        533 |    24 | datestyle                  | iso, mdy
 /data03/pgdata95/pg_root/postgresql.conf      |        535 |    25 | timezone                   | PRC
 /data03/pgdata95/pg_root/postgresql.conf      |        548 |    26 | lc_messages                | C
 /data03/pgdata95/pg_root/postgresql.conf      |        550 |    27 | lc_monetary                | C
 /data03/pgdata95/pg_root/postgresql.conf      |        551 |    28 | lc_numeric                 | C
 /data03/pgdata95/pg_root/postgresql.conf      |        552 |    29 | lc_time                    | C
 /data03/pgdata95/pg_root/postgresql.conf      |        555 |    30 | default_text_search_config | pg_catalog.english
 /data03/pgdata95/pg_root/postgresql.auto.conf |          3 |    31 | track_commit_timestamp     | on
 /data03/pgdata95/pg_root/postgresql.auto.conf |          4 |    32 | shared_buffers             | 8GB
 /data03/pgdata95/pg_root/postgresql.auto.conf |          5 |    33 | synchronous_commit         | on
(33 rows)

///////////////////////// Standby:

pg95@db-172-16-3-150-> psql -h 127.0.0.1 -p 1923
psql (9.5devel)
Type "help" for help.

postgres=# select * from pg_file_settings;
                   sourcefile                    | sourceline | seqno |             name             |      setting
-------------------------------------------------+------------+-------+------------------------------+--------------------
 /data03/pg95_stdby/pg_root/postgresql.conf      |         63 |     1 | port                         | 1923
 /data03/pg95_stdby/pg_root/postgresql.conf      |         64 |     2 | max_connections              | 100
 /data03/pg95_stdby/pg_root/postgresql.conf      |         68 |     3 | unix_socket_directories      | .
 /data03/pg95_stdby/pg_root/postgresql.conf      |         71 |     4 | unix_socket_permissions      | 0700
 /data03/pg95_stdby/pg_root/postgresql.conf      |        102 |     5 | tcp_keepalives_idle          | 60
 /data03/pg95_stdby/pg_root/postgresql.conf      |        104 |     6 | tcp_keepalives_interval      | 10
 /data03/pg95_stdby/pg_root/postgresql.conf      |        106 |     7 | tcp_keepalives_count         | 10
 /data03/pg95_stdby/pg_root/postgresql.conf      |        116 |     8 | shared_buffers               | 4096MB
 /data03/pg95_stdby/pg_root/postgresql.conf      |        131 |     9 | dynamic_shared_memory_type   | posix
 /data03/pg95_stdby/pg_root/postgresql.conf      |        160 |    10 | bgwriter_delay               | 10ms
 /data03/pg95_stdby/pg_root/postgresql.conf      |        176 |    11 | wal_level                    | logical
 /data03/pg95_stdby/pg_root/postgresql.conf      |        179 |    12 | synchronous_commit           | off
 /data03/pg95_stdby/pg_root/postgresql.conf      |        189 |    13 | wal_compression              | on
 /data03/pg95_stdby/pg_root/postgresql.conf      |        192 |    14 | wal_buffers                  | 16MB
 /data03/pg95_stdby/pg_root/postgresql.conf      |        194 |    15 | wal_writer_delay             | 10ms
 /data03/pg95_stdby/pg_root/postgresql.conf      |        202 |    16 | max_wal_size                 | 8GB
 /data03/pg95_stdby/pg_root/postgresql.conf      |        227 |    17 | max_wal_senders              | 32
 /data03/pg95_stdby/pg_root/postgresql.conf      |        232 |    18 | max_replication_slots        | 32
 /data03/pg95_stdby/pg_root/postgresql.conf      |        234 |    19 | track_commit_timestamp       | off
 /data03/pg95_stdby/pg_root/postgresql.conf      |        250 |    20 | hot_standby                  | on
 /data03/pg95_stdby/pg_root/postgresql.conf      |        258 |    21 | wal_receiver_status_interval | 1s
 /data03/pg95_stdby/pg_root/postgresql.conf      |        260 |    22 | hot_standby_feedback         | on
 /data03/pg95_stdby/pg_root/postgresql.conf      |        322 |    23 | log_destination              | csvlog
 /data03/pg95_stdby/pg_root/postgresql.conf      |        328 |    24 | logging_collector            | on
 /data03/pg95_stdby/pg_root/postgresql.conf      |        340 |    25 | log_truncate_on_rotation     | on
 /data03/pg95_stdby/pg_root/postgresql.conf      |        446 |    26 | log_timezone                 | PRC
 /data03/pg95_stdby/pg_root/postgresql.conf      |        533 |    27 | datestyle                    | iso, mdy
 /data03/pg95_stdby/pg_root/postgresql.conf      |        535 |    28 | timezone                     | PRC
 /data03/pg95_stdby/pg_root/postgresql.conf      |        548 |    29 | lc_messages                  | C
 /data03/pg95_stdby/pg_root/postgresql.conf      |        550 |    30 | lc_monetary                  | C
 /data03/pg95_stdby/pg_root/postgresql.conf      |        551 |    31 | lc_numeric                   | C
 /data03/pg95_stdby/pg_root/postgresql.conf      |        552 |    32 | lc_time                      | C
 /data03/pg95_stdby/pg_root/postgresql.conf      |        555 |    33 | default_text_search_config   | pg_catalog.english
 /data03/pg95_stdby/pg_root/postgresql.auto.conf |          3 |    34 | track_commit_timestamp       | on
 /data03/pg95_stdby/pg_root/postgresql.auto.conf |          4 |    35 | shared_buffers               | 8GB
 /data03/pg95_stdby/pg_root/postgresql.auto.conf |          5 |    36 | synchronous_commit           | on
(36 rows)

pg95@db-172-16-3-150-> cd /data03/pg95_stdby/pg_root/
pg95@db-172-16-3-150-> cat recovery.conf |grep ^[a-z]
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'hostaddr=127.0.0.1 port=1922'          # e.g. 'host=localhost port=5432'
primary_slot_name = 'abc'

///////////////////////// Primary:

pg95@db-172-16-3-150-> psql
psql (9.5devel)
Type "help" for help.

postgres=# select pg_create_physical_replication_slot('abc');
postgres=# select * from pg_replication_slots ;
 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn
-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------
 abc       |        | physical  |        |          | t      |       5093 | 1809 |              | 6/6E000000
(1 row)

postgres=# create table test(id int)
postgres-# ;
CREATE TABLE
postgres=# insert into test values (1),(2),(3);
INSERT 0 3

///////////////////////// Standby:

pg95@db-172-16-3-150-> psql -h 127.0.0.1 -p 1923
postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# select * from test;
 id
----
  1
  2
  3
(3 rows)

///////////////////////// Primary:

postgres=# delete from test;
DELETE 3
postgres=# vacuum verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 3 nonremovable row versions in 1 out of 1 pages
DETAIL:  3 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 5093
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 20694
backend_start    | 2015-06-16 08:50:58.779855+08
backend_xmin     |
state            | streaming
sent_location    | 7/77B09918
write_location   | 7/77B09918
flush_location   | 7/77B09918
replay_location  | 7/77B09918
sync_priority    | 0
sync_state       | async

postgres=# checkpoint;
CHECKPOINT
postgres=# vacuum VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 3 nonremovable row versions in 1 out of 1 pages
DETAIL:  3 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
postgres=# select * from pg_replication_slots ;
-[ RECORD 1 ]+-----------
slot_name    | abc
plugin       |
slot_type    | physical
datoid       |
database     |
active       | t
active_pid   | 5093
xmin         | 1815
catalog_xmin |
restart_lsn  | 7/77B099F8
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 5093
usesysid         | 10
usename          | postgres
application_name | walreceiver
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 20694
backend_start    | 2015-06-16 08:50:58.779855+08
backend_xmin     |
state            | streaming
sent_location    | 7/77B099F8
write_location   | 7/77B099F8
flush_location   | 7/77B099F8
replay_location  | 7/77B099F8
sync_priority    | 0
sync_state       | async

///////////////////////// HOST:

[root@db-172-16-3-150 backend]# ifdown lo

///////////////////////// Primary:

postgres=# select * from pg_stat_replication ;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----------------+-----------------+---------------+------------
(0 rows)

postgres=# vacuum VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 3 nonremovable row versions in 1 out of 1 pages
DETAIL:  3 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

///////////////////////// HOST:

[root@db-172-16-3-150 backend]# ifup lo

///////////////////////// Primary:

postgres=# vacuum VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": removed 3 row versions in 1 pages
INFO:  "test": found 3 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": truncated 1 to 0 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

///////////////////////// Shutting down the standby with "pg_ctl stop -m immediate" shows the same thing:

///////////////////////// Primary:

postgres=# insert into test values (1),(2),(3);
INSERT 0 3

///////////////////////// Standby:

postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# select * from test;
 id
----
  1
  2
  3
(3 rows)

///////////////////////// Primary:

postgres=# delete from test;
DELETE 3
postgres=# vacuum VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 3 nonremovable row versions in 1 out of 1 pages
DETAIL:  3 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

///////////////////////// Shut down the hot standby:

pg95@db-172-16-3-150-> pg_ctl stop -m immediate -D /data03/pg95_stdby/pg_root
waiting for server to shut down.... done
server stopped

///////////////////////// Primary:

postgres=# select * from pg_stat_replication ;
(0 rows)

postgres=# select * from pg_replication_slots;
-[ RECORD 1 ]+-----------
slot_name    | abc
plugin       |
slot_type    | physical
datoid       |
database     |
active       | f
active_pid   |
xmin         | 1818
catalog_xmin |
restart_lsn  | 7/77B0B690
postgres=# vacuum VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 3 nonremovable row versions in 1 out of 1 pages
DETAIL:  3 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
postgres=# select pg_drop_replication_slot('abc');
-[ RECORD 1 ]------------+-
pg_drop_replication_slot |

postgres=# vacuum VERBOSE test;
INFO:  vacuuming "public.test"
INFO:  "test": removed 3 row versions in 1 pages
INFO:  "test": found 3 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": truncated 1 to 0 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Jeff Janes
Date:
On Mon, Jun 15, 2015 at 5:52 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

> On Tue, Jun 16, 2015 at 4:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > On Mon, Jun 15, 2015 at 2:05 AM, <digoal@126.com> wrote:
> >> http://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION
> >>
> >> 25.2.6. Replication Slots
> >> Replication slots provide an automated way to ensure that the master does
> >> not remove WAL segments until they have been received by all standbys, and
> >> that the master does not remove rows which could cause a recovery conflict
> >> even when the standby is disconnected.
> >>
>
> > One potential doc bug I see is that it seems to imply that replication
> > slots replace the need for hot_standby_feedback, when in fact they must be
> > used in conjunction with it.  Do you have hot_standby_feedback turned on
> > in the standby?
>
> As far as I recall, using replication slots implies that the
> RecentGlobalXmin horizon is updated to guarantee the presence of
> tuples on the standby once it reconnects. Perhaps I am missing
> something?
>

I haven't looked at the code, or paid much attention when the feature went
in.  From the docs, I thought that is what would happen as well.  But
experimentally, that only happens if hot_standby_feedback is on.  I get the
same behavior on 9.4.4 and on 9.5devel.

postgresql.conf:

wal_level = logical
max_wal_senders = 3
max_replication_slots = 3
hot_standby = on
hot_standby_feedback = off    ## toggled on standby over course of experiment
port=5433     ### on standby only. default on master

on master:
SELECT * FROM pg_create_physical_replication_slot('node_a_slot');

recovery.conf:
standby_mode = 'on'
primary_conninfo = 'port=5432 user=jjanes'
primary_slot_name = 'node_a_slot'
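
Before testing, it's worth confirming the standby actually attached to the
slot; a minimal sketch (on 9.4.4 pg_replication_slots has no active_pid
column, so I just check the active flag and the walsender):

-- On the master: the slot should show active = t while a walsender is
-- connected through it.
SELECT slot_name, active FROM pg_replication_slots;
SELECT pid, application_name, state FROM pg_stat_replication;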

I repeat Digoal's example, except that I don't crash the standby.

I leave the standby connected, with a client sitting in a repeatable read
transaction, and then delete the tuples on the master side.  Vacuum will
quickly reclaim the tuples, and then after 30s the standby session gets
disconnected due to conflict.  When I toggle hot_standby_feedback on, it
behaves as expected, with the master never cleaning up the deleted tuples.
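
A condensed sketch of that sequence (table test as in Digoal's report; 30s
is the default max_standby_streaming_delay):

-- Standby session: pin a snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM test;

-- Master session, meanwhile:
DELETE FROM test;
VACUUM VERBOSE test;
-- With hot_standby_feedback = off the dead rows are reclaimed here, and
-- replaying that cleanup on the standby cancels the snapshot-holding
-- transaction after max_standby_streaming_delay (30s by default).
-- With hot_standby_feedback = on the rows stay "nonremovable" instead.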


But if I do set hot_standby_feedback to on, then I still can't reproduce
Digoal's example.

Cheers,

Jeff

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Andres Freund
Date:
On 2015-06-16 10:00:13 -0700, Jeff Janes wrote:
> On Mon, Jun 15, 2015 at 5:52 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
> I haven't looked at the code, or paid much attention when the feature went
> in.  From the docs, I thought that is what would happen as well.  But
> experimentally, that only happens if hot_standby_feedback is on.  I get the
> same behavior on 9.4.4 and on 9.5devel.
>
> postgresql.conf:
>
> wal_level = logical
> max_wal_senders = 3
> max_replication_slots = 3
> hot_standby = on
> hot_standby_feedback = off    ## toggled on standby over course of experiment
> port=5433     ### on standby only. default on master
>
> on master:
> SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
>
> recovery.conf:
> standby_mode = 'on'
> primary_conninfo = 'port=5432 user=jjanes'
> primary_slot_name = 'node_a_slot'
>
> I repeat Digoal's example, except that I don't crash the standby.
>
> I leave the standby connected, with a client sitting in a repeatable read
> transaction, and then delete the tuples on the master side.  Vacuum will
> quickly reclaim the tuples, and then after 30s the standby session gets
> disconnected due to conflict.  When I toggle hot_standby_feedback on, it
> behaves as expected, with the master never cleaning up the deleted tuples.

It's expected that we only hold up the xmin horizon on the primary via
slots if hot_standby_feedback is enabled. It seemed - and still seems -
like a bad idea to force hs_feedback to on if slots are used; using it
is much more expensive than just retaining WAL.  Do you disagree?
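
Concretely, the difference shows up in pg_replication_slots, consistent with
the transcripts earlier in this thread; a minimal sketch:

-- On the master:
--   hot_standby_feedback = off : xmin IS NULL, restart_lsn set
--                                (WAL retained, dead rows still vacuumable)
--   hot_standby_feedback = on  : xmin also set (dead rows retained)
SELECT slot_name, xmin, restart_lsn FROM pg_replication_slots;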

Where do the docs imply the contrary?

Greetings,

Andres Freund

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Jeff Janes
Date:
On Tue, Jun 16, 2015 at 12:21 PM, Andres Freund <andres@anarazel.de> wrote:

> On 2015-06-16 10:00:13 -0700, Jeff Janes wrote:
> > On Mon, Jun 15, 2015 at 5:52 PM, Michael Paquier <michael.paquier@gmail.com>
> > wrote:
> > I haven't looked at the code, or paid much attention when the feature went
> > in.  From the docs, I thought that is what would happen as well.  But
> > experimentally, that only happens if hot_standby_feedback is on.  I get the
> > same behavior on 9.4.4 and on 9.5devel.
> >
> > postgresql.conf:
> >
> > wal_level = logical
> > max_wal_senders = 3
> > max_replication_slots = 3
> > hot_standby = on
> > hot_standby_feedback = off    ## toggled on standby over course of experiment
> > port=5433     ### on standby only. default on master
> >
> > on master:
> > SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
> >
> > recovery.conf:
> > standby_mode = 'on'
> > primary_conninfo = 'port=5432 user=jjanes'
> > primary_slot_name = 'node_a_slot'
> >
> > I repeat Digoal's example, except that I don't crash the standby.
> >
> > I leave the standby connected, with a client sitting in a repeatable read
> > transaction, and then delete the tuples on the master side.  Vacuum will
> > quickly reclaim the tuples, and then after 30s the standby session gets
> > disconnected due to conflict.  When I toggle hot_standby_feedback on, it
> > behaves as expected, with the master never cleaning up the deleted tuples.
>
> It's expected that we only hold up the xmin horizon on the primary via
> slots if hot_standby_feedback is enabled. It seemed - and still seems -
> like a bad idea to force hs_feedback to on if slots are used; using it
> is much more expensive than just retaining WAL.  Do you disagree?
>

No disagreement.  I don't know much about the expense, and don't have an
opinion one way or the other.


>
> Where do the docs imply the contrary?
>

"Replication slots provide an automated way to ensure that the master does
not remove WAL segments until they have been received by all standbys, and
that the master does not remove rows which could cause a recovery conflict
even when the standby is disconnected."

To me, "automated way to ensure" implies you don't need to also set other
things, that it is automatic.

Perhaps a change to: "Replication slots provide an automated way to ensure
that the master does not remove WAL segments until they have been received
by all standbys.  With hs_feedback turned on, it also ensures that the
master does not remove rows which could cause a recovery conflict even when
the standby is disconnected."

Also "Similarly, hot_standby_feedback and vacuum_defer_cleanup_age provide
protection against relevant rows being removed by vacuum, but the former
provides no protection during any time period when the standby is not
connected, and the latter often needs to be set to a high value to provide
adequate protection. Replication slots overcome these disadvantages."

It reads to me like slots are being offered as an alternative, not an
adjunct, to hs_feedback.  It overcomes the disadvantages for the two things
in dissimilar ways: one by bridging the connectivity gap; and the other by
making the use of the setting unnecessary in the first place.  Having the
dissimilar things presented in parallel was unintuitive to me.  You can
read it correctly, provided you already know what the correct reading is,
but it doesn't really provide much illumination to the learner.  But if the
previous paragraph were explicit, then this would be fine.


Jeff

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Michael Paquier
Date:
On Wed, Jun 17, 2015 at 4:58 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Jun 16, 2015 at 12:21 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2015-06-16 10:00:13 -0700, Jeff Janes wrote:
>> > On Mon, Jun 15, 2015 at 5:52 PM, Michael Paquier <michael.paquier@gmail.com>
>> > wrote:
>> > I haven't looked at the code, or paid much attention when the feature went
>> > in.  From the docs, I thought that is what would happen as well.  But
>> > experimentally, that only happens if hot_standby_feedback is on.  I get the
>> > same behavior on 9.4.4 and on 9.5devel.
>> >
>> > postgresql.conf:
>> >
>> > wal_level = logical
>> > max_wal_senders = 3
>> > max_replication_slots = 3
>> > hot_standby = on
>> > hot_standby_feedback = off    ## toggled on standby over course of experiment
>> > port=5433     ### on standby only. default on master
>> >
>> > on master:
>> > SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
>> >
>> > recovery.conf:
>> > standby_mode = 'on'
>> > primary_conninfo = 'port=5432 user=jjanes'
>> > primary_slot_name = 'node_a_slot'
>> >
>> > I repeat Digoal's example, except that I don't crash the standby.
>> >
>> > I leave the standby connected, with a client sitting in a repeatable read
>> > transaction, and then delete the tuples on the master side.  Vacuum will
>> > quickly reclaim the tuples, and then after 30s the standby session gets
>> > disconnected due to conflict.  When I toggle hot_standby_feedback on, it
>> > behaves as expected, with the master never cleaning up the deleted tuples.
>>
>> It's expected that we only hold up the xmin horizon on the primary via
>> slots if hot_standby_feedback is enabled. It seemed - and still seems -
>> like a bad idea to force hs_feedback to on if slots are used; using it
>> is much more expensive than just retaining WAL.  Do you disagree?
>
> No disagreement.  I don't know much about the expense, and don't have an
> opinion one way or the other.

Agreement from here as well. So my recollection on the matter was
incorrect. Thanks for the clarification.
--
Michael

Re: BUG #13443: master will remove dead rows when hot standby(use slot) disconnect

From:
Jeff Janes
Date:
On Mon, Jun 15, 2015 at 6:20 PM, 德哥 <digoal@126.com> wrote:
> At 2015-06-16 08:52:48, "Michael Paquier" <michael.paquier@gmail.com> wrote:
> >On Tue, Jun 16, 2015 at 4:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> >> On Mon, Jun 15, 2015 at 2:05 AM, <digoal@126.com> wrote:
>
> >> One potential doc bug I see is that it seems to imply that replication
> >> slots replace the need for hot_standby_feedback, when in fact they must be
> >> used in conjunction with it.  Do you have hot_standby_feedback turned on in
> >> the standby?
>
> Yes, I had not enabled hot_standby_feedback, thanks. After enabling it,
> there is no problem now.

Sorry, I missed this sentence when I first read your reply.  I thought you
still had an issue, but now I see that the only problem is that both of us
were confused by the documentation wording.

Incidentally, for anyone wanting to do similar tests: 'ifdown lo' is ignored
on Ubuntu trusty; you have to use 'ifconfig lo down' instead to take down
the loopback interface.

Cheers,

Jeff