Re: row filtering for logical replication

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: row filtering for logical replication
Дата
Msg-id CAA4eK1+xQb06NGs6Y7OzwMtKYYixEqR8tdWV5THAVE4SAqNrDg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
Ответы Re: row filtering for logical replication
Список pgsql-hackers
On Mon, Jul 12, 2021 at 1:09 AM Euler Taveira <euler@eulerto.com> wrote:
>
> I did another measure using as baseline the previous patch (v16).
>
> without cache (v16)
> ---------------------------
>
> mean:           1.46 us
> stddev:         2.13 us
> median:         1.39 us
> min-max:        [0.69 .. 1456.69] us
> percentile(99): 3.15 us
> mode:           0.91 us
>
> with cache (v18)
> -----------------------
>
> mean:           0.63 us
> stddev:         1.07 us
> median:         0.55 us
> min-max:        [0.29 .. 844.87] us
> percentile(99): 1.38 us
> mode:           0.41 us
>
> It represents -57%. It is a really good optimization for just a few extra lines
> of code.
>

Good improvement but I think it is better to measure the performance
by using synchronous_replication by setting the subscriber as
standby_synchronous_names, which will provide the overall saving of
time. We can probably see when the timings when no rows are filtered,
when 10% rows are filtered when 30% are filtered and so on.

I think the way caching has been done in the patch is a bit
inefficient. Basically, it always invalidates and rebuilds the
expressions even though some unrelated operation has happened on
publication. For example, say publication has initially table t1 with
rowfilter r1 for which we have cached the state. Now you altered
publication and added table t2, it will invalidate the entire state of
t1 as well. I think we can avoid that if we invalidate the rowfilter
related state only on relcache invalidation i.e in
rel_sync_cache_relation_cb and save it the very first time we prepare
the expression. In that case, we don't need to do it in advance when
preparing relsyncentry, this will have the additional advantage that
we won't spend cycles on preparing state unless it is required (for
truncate we won't require row_filtering, so it won't be prepared).

Few other things, I have noticed:
1.
I am seeing tupledesc leak by following below steps:
ERROR:  tupdesc reference 00000000008D7D18 is not owned by resource
owner TopTransaction
CONTEXT:  slot "tap_sub", output plugin "pgoutput", in the change
callback, associated LSN 0/170BD50

Publisher
CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 WHERE (a > 1000
AND b <> 'filtered');

Subscriber
CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE SUBSCRIPTION tap_sub
         CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION tap_pub_1;

Publisher
INSERT INTO tab_rowfilter_1 (a, b) VALUES (1980, 'not filtered');
Alter table tab_rowfilter_1 drop column b cascade;
INSERT INTO tab_rowfilter_1 (a) VALUES (1982);

2.
postgres=# Alter table tab_rowfilter_1 alter column b set data type varchar;
ERROR:  unexpected object depending on column: publication of table
tab_rowfilter_1 in publication tap_pub_1

I think for this you need to change ATExecAlterColumnType to handle
the publication case.


-- 
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: Support kerberos authentication for postgres_fdw
Следующее
От: vignesh C
Дата:
Сообщение: Re: Added schema level support for publication.