Re: Introduce XID age and inactive timeout based replication slot invalidation
От | Bharath Rupireddy |
---|---|
Тема | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Дата | |
Msg-id | CALj2ACVtLt0rPnr4NS6c72DdtQmpyQA=qcMaVfn6vu7j5yQvzw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Introduce XID age and inactive timeout based replication slot invalidation (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
Список | pgsql-hackers |
On Wed, Mar 13, 2024 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > BTW, is XID the based parameter 'max_slot_xid_age' not have similarity > with 'max_slot_wal_keep_size'? I think it will impact the rows we > removed based on xid horizons. Don't we need to consider it while > vacuum computing the xid horizons in ComputeXidHorizons() similar to > what we do for WAL w.r.t 'max_slot_wal_keep_size'? I'm having a hard time understanding why we'd need something up there in ComputeXidHorizons(). Can you elaborate it a bit please? What's proposed with max_slot_xid_age is that during checkpoint we look at slot's xmin and catalog_xmin, and the current system txn id. Then, if the XID age of (xmin, catalog_xmin) and current_xid crosses max_slot_xid_age, we invalidate the slot. Let me illustrate how all this works: 1. Setup a primary and standby with hot_standby_feedback set to on on standby. For instance, check my scripts at [1]. 2. Stop the standby to make the slot inactive on the primary. Check the slot is holding xmin of 738. ./pg_ctl -D sbdata -l logfilesbdata stop postgres=# SELECT * FROM pg_replication_slots; -[ RECORD 1 ]-------+------------- slot_name | sb_repl_slot plugin | slot_type | physical datoid | database | temporary | f active | f active_pid | xmin | 738 catalog_xmin | restart_lsn | 0/3000000 confirmed_flush_lsn | wal_status | reserved safe_wal_size | two_phase | f conflict_reason | failover | f synced | f 3. Start consuming the XIDs on the primary with the following script for instance ./psql -d postgres -p 5432 DROP TABLE tab_int; CREATE TABLE tab_int (a int); do $$ begin for i in 1..268435 loop -- use an exception block so that each iteration eats an XID begin insert into tab_int values (i); exception when division_by_zero then null; end; end loop; end$$; 4. Make some dead rows in the table. update tab_int set a = a+1; delete from tab_int where a%4=0; postgres=# SELECT n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del FROM pg_stat_user_tables WHERE relname = 'tab_int'; -[ RECORD 1 ]------ n_dead_tup | 335544 n_tup_ins | 268435 n_tup_upd | 268435 n_tup_del | 67109 5. Try vacuuming to delete the dead rows, observe 'tuples: 0 removed, 536870 remain, 335544 are dead but not yet removable'. The dead rows can't be removed because the inactive slot is holding an xmin, see 'removable cutoff: 738, which was 268441 XIDs old when operation ended'. postgres=# vacuum verbose tab_int; INFO: vacuuming "postgres.public.tab_int" INFO: finished vacuuming "postgres.public.tab_int": index scans: 0 pages: 0 removed, 2376 remain, 2376 scanned (100.00% of total) tuples: 0 removed, 536870 remain, 335544 are dead but not yet removable removable cutoff: 738, which was 268441 XIDs old when operation ended frozen: 0 pages from table (0.00% of total) had 0 tuples frozen index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s buffer usage: 4759 hits, 0 misses, 0 dirtied WAL usage: 0 records, 0 full page images, 0 bytes system usage: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.07 s VACUUM 6. Now, repeat the above steps but with setting max_slot_xid_age = 200000 on the primary. 7. Do a checkpoint to invalidate the slot. postgres=# checkpoint; CHECKPOINT postgres=# SELECT * FROM pg_replication_slots; -[ RECORD 1 ]-------+------------- slot_name | sb_repl_slot plugin | slot_type | physical datoid | database | temporary | f active | f active_pid | xmin | 738 catalog_xmin | restart_lsn | 0/3000000 confirmed_flush_lsn | wal_status | lost safe_wal_size | two_phase | f conflicting | failover | f synced | f invalidation_reason | xid_aged 8. And, then vacuum the table, observe 'tuples: 335544 removed, 201326 remain, 0 are dead but not yet removable'. postgres=# vacuum verbose tab_int; INFO: vacuuming "postgres.public.tab_int" INFO: finished vacuuming "postgres.public.tab_int": index scans: 0 pages: 0 removed, 2376 remain, 2376 scanned (100.00% of total) tuples: 335544 removed, 201326 remain, 0 are dead but not yet removable removable cutoff: 269179, which was 0 XIDs old when operation ended new relfrozenxid: 269179, which is 268441 XIDs ahead of previous value frozen: 1189 pages from table (50.04% of total) had 201326 tuples frozen index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 0.000 MB/s, avg write rate: 193.100 MB/s buffer usage: 4760 hits, 0 misses, 2381 dirtied WAL usage: 5942 records, 2378 full page images, 8343275 bytes system usage: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s VACUUM [1] cd /home/ubuntu/postgres/pg17/bin ./pg_ctl -D db17 -l logfile17 stop rm -rf db17 logfile17 rm -rf /home/ubuntu/postgres/pg17/bin/archived_wal mkdir /home/ubuntu/postgres/pg17/bin/archived_wal ./initdb -D db17 echo "archive_mode = on archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f'" | tee -a db17/postgresql.conf ./pg_ctl -D db17 -l logfile17 start ./psql -d postgres -p 5432 -c "SELECT pg_create_physical_replication_slot('sb_repl_slot', true, false);" rm -rf sbdata logfilesbdata ./pg_basebackup -D sbdata echo "port=5433 primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' primary_slot_name='sb_repl_slot' restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' hot_standby_feedback = on" | tee -a sbdata/postgresql.conf touch sbdata/standby.signal ./pg_ctl -D sbdata -l logfilesbdata start ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: