Proposal: Filter irrelevant change before reassemble transactions during logical decoding

Поиск
Список
Период
Сортировка
От li jie
Тема Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Дата
Msg-id CAGfChW51P944nM5h0HTV9HistvVfwBxNaMt_s-OZ9t=uXz+Zbg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding  (li jie <ggysxcq@gmail.com>)
Список pgsql-hackers
Hi,

During logical decoding, if there is a large write transaction, some
spill files will be written to disk,
depending on the setting of max_changes_in_memory.

This behavior can effectively avoid OOM, but if the transaction
generates a lot of change before commit,
a large number of files may fill the disk. For example, you can update
a TB-level table.
Of course,   this is also inevitable.

But I found an inelegant phenomenon. If the updated large table is not
published, its changes will also
be written with a large number of spill files. Look at an example below:

publisher:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432
user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:
```
begin;
insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba',
i),repeat('dfds', i) from generate_series(0,999999) i;
```

Later you will see a large number of spill files in the
"/$PGDATA/pg_replslot/mysub/" directory.
```
$ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres 200 Nov 30 09:24 state
17M -rw------- 1 postgres postgres 17M Nov 30 08:22 xid-750-lsn-0-10000000.spill
12M -rw------- 1 postgres postgres 12M Nov 30 08:20 xid-750-lsn-0-1000000.spill
17M -rw------- 1 postgres postgres 17M Nov 30 08:23 xid-750-lsn-0-11000000.spill
......
```

We can see that table tbl_t1 is not published in mypub. It is also not
sent downstream because it is subscribed.
After the transaction is reorganized, the pgoutput decoding plug-in
filters out these change  of unpublished relation
when sending logical changes. see function pgoutput_change.

Above all, if after constructing a change and before queuing a change
into a transaction, we filter out unpublished
relation-related changes, will it make logical decoding less laborious
and avoid disk growth as much as possible?


This is just an immature idea. I haven't started to implement it yet.
Maybe it was designed this way because there
are key factors that I didn't consider. So I want to hear everyone's
opinions, especially the designers of logic decoding.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node