Re: row filtering for logical replication

From: Amit Kapila
Subject: Re: row filtering for logical replication
Date:
Msg-id: CAA4eK1JmmGY46xsniE1nXmUg+JoLTzw6BE4H+4gPLNkBN7MafQ@mail.gmail.com
In reply to: Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
List: pgsql-hackers
On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira <euler@eulerto.com> wrote:
>
> On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
>
> PS> I will update the commit message in the next version. I barely changed the
> documentation to reflect the current behavior. I probably missed some changes
> but I will fix in the next version.
>
> I realized that I forgot to mention a few things about the UPDATE behavior.
> Regardless of 0003, we need to define which tuple will be used to evaluate the
> row filter for UPDATEs. We already discussed it circa [1]. This current version
> chooses *new* tuple. Is it the best choice?
>

Apart from the data inconsistency problems you outlined below, I think
there is a major design problem with that w.r.t. TOAST tuples:
unchanged TOASTed key values won't be part of the *new* tuple.
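To illustrate the concern, here is a hypothetical sketch (table and
values are made up for this example): when an UPDATE does not modify a
TOASTed column, the new tuple written to WAL carries only a reference
to the unchanged out-of-line value, not the value itself, so a row
filter referencing that column cannot be evaluated against the new
tuple alone.

```sql
-- Hypothetical illustration of the TOAST problem with new-tuple evaluation.
CREATE TABLE t (id integer PRIMARY KEY, payload text);
-- Large value gets TOASTed (stored out of line).
INSERT INTO t VALUES (1, repeat('x', 1000000));
-- This UPDATE does not touch payload; the decoded new tuple contains an
-- "unchanged toast datum" placeholder for payload, so a row filter such
-- as WHERE (payload <> '') could not be evaluated from the new tuple.
UPDATE t SET id = 2 WHERE id = 1;
```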

> Let's check all cases. There are 2 rows on the provider. One row satisfies the
> row filter and the other one doesn't. For each case, I expect the initial rows
> to be there (no modifications). The DDLs are:
>
> CREATE TABLE foo (a integer, b text, PRIMARY KEY(a));
> INSERT INTO foo (a, b) VALUES(10, 'abc'),(30, 'abc');
> CREATE PUBLICATION bar FOR TABLE foo WHERE (a > 20);
>
> The table describes what happens on the subscriber. BEFORE is the current row on
> the subscriber. OLD, NEW and OLD & NEW are the action/row if we consider different ways
> to evaluate the row filter.
>
> -- case 1: old tuple (10, abc) ; new tuple (10, def)
> UPDATE foo SET b = 'def' WHERE a = 10;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> |    NA     |       NA           |       NA         |       NA         |
> +-----------+--------------------+------------------+------------------+
>
> If the old and new tuple don't satisfy the row filter, there is no issue.
>
> -- case 2: old tuple (30, abc) ; new tuple (30, def)
> UPDATE foo SET b = 'def' WHERE a = 30;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> | (30, abc) | UPDATE (30, def)   | UPDATE (30, def) | UPDATE (30, def) |
> +-----------+--------------------+------------------+------------------+
>
> If the old and new tuple satisfy the row filter, there is no issue.
>
> -- case 3: old tuple (30, abc) ; new tuple (10, def)
> UPDATE foo SET a = 10, b = 'def' WHERE a = 30;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> | (30, abc) | UPDATE (10, def) * | KEEP (30, abc) * | KEEP (30, abc) * |
> +-----------+--------------------+------------------+------------------+
>
> If the old tuple satisfies the row filter but the new tuple doesn't,  we have a
> data consistency issue. Since the old tuple satisfies the row filter, the
> initial table synchronization copies this row. However, after the UPDATE the
> new tuple doesn't satisfy the row filter then, from the data consistency
> perspective, that row should be removed on the subscriber.
>

This is the reason we decided to transform UPDATE to DELETE in such cases.

> The OLD sends the UPDATE because it satisfies the row filter (if it is a
> sharding solution this new row should be moved to another node). The new row
> would likely not be modified by replication again. That's a data inconsistency
> according to the row filter.
>
> The NEW and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
> filter. Keeping the old row is undesirable because it doesn't reflect what we have
> on the source. This row on the subscriber would likely not be modified by
> replication again. If someone inserted a new row with a = 30, replication will
> stop because there is already a row with that value.
>

This shouldn't be a problem with the v44 patch version (0003 handles it).

> -- case 4: old tuple (10, abc) ; new tuple (30, def)
> UPDATE foo SET a = 30, b = 'def' WHERE a = 10;
>
> +-----------+--------------------+------------------+------------------+
> |   BEFORE  |       OLD          |        NEW       |    OLD & NEW     |
> +-----------+--------------------+------------------+------------------+
> |    NA     |       NA !         |       NA !       |       NA         |
> +-----------+--------------------+------------------+------------------+
>
> The OLD and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
> filter. The NEW sends the UPDATE because it satisfies the row filter but there
> is no row to modify. The current behavior does nothing. However, it should
> INSERT the new tuple. Subsequent UPDATE or DELETE have no effect. It could be a
> surprise for an application that expects the same data set from the provider.
>

Again, this is addressed by v44, as an INSERT would be performed in this case.

> If we have to choose the default behavior I would say use the old tuple to
> evaluate the row filter. Why? The validation already restricts the columns to the
> replica identity, so there isn't an issue with missing (NULL) columns. Case
> 3 updates the row with a value that is not consistent, but keeping the old row
> is worse because it could stop the replication if someone inserted the old key
> in a new row on the provider. Case 4 ignores the UPDATE if it cannot find
> the tuple, but it could raise an error if there were a strict mode.
>

Hmm, I think it is much better to translate Update to Delete in case-3
and Update to Insert in case-4, as there shouldn't be any data
consistency issues after that. All these issues have been discussed in
detail in this thread and based on that we decided to follow the v44
(0003) patch version approach. We have also investigated some other
replication solutions and they were doing similar translations to
avoid such issues.
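For clarity, the resulting decision matrix can be sketched as a plain
SQL expression (illustrative only; the real logic lives in the output
plugin, not in SQL, and the column names old_match/new_match are made
up for this example):

```sql
-- Sketch of the v44 (0003) transformation rules: the published action
-- depends on whether the old and new tuples satisfy the row filter.
SELECT old_match, new_match,
       CASE
         WHEN old_match AND new_match     THEN 'UPDATE'  -- case 2
         WHEN old_match AND NOT new_match THEN 'DELETE'  -- case 3
         WHEN NOT old_match AND new_match THEN 'INSERT'  -- case 4
         ELSE 'skip'                                     -- case 1
       END AS published_action
FROM (VALUES (true, true), (true, false),
             (false, true), (false, false)) AS v(old_match, new_match);
```

Each branch keeps the subscriber's visible row set consistent with what
the filter would select on the publisher.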


> Since this change is very simple to revert, this new version contains this
> modification. I also improved the documentation and removed extra parentheses from
> psql/pg_dump. As I said in the previous email, I merged the validation patch too.
>

As said previously, it might be better to keep those separate for
easier review. It is anyway better to split such a big patch for ease
of review, even if we combine all the work in the end.

> FWIW in the previous version, I removed code that compares nodes to decide if
> it is necessary to remove the publication-relation entry. I had similar code
> in an ancient version of this patch but decided that the additional code is not
> worth it.
>
> There is at least one issue in the current code that should be addressed: PK or
> REPLICA IDENTITY modification could break the publication check for UPDATEs and
> DELETEs.
>

Please see patch 0005 [1]. I think it tries to address the issues
w.r.t. replica identity interaction with this feature. Feel free to
test/review and let us know if you see any issues.
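As a rough sketch of the kind of restriction 0005 is aiming at (the
exact behavior and error message are the patch's, not reproduced here):
a row filter used for UPDATE/DELETE publication should only reference
replica identity columns, since only those are reliably available in
the old tuple.

```sql
-- Hypothetical example of the replica-identity restriction.
CREATE TABLE foo (a integer, b text, PRIMARY KEY (a));
-- Allowed: the filter references only the replica identity column (a).
CREATE PUBLICATION ok_pub FOR TABLE foo WHERE (a > 20);
-- Expected to be rejected (or to error at UPDATE/DELETE time): the
-- filter references b, which is not part of the replica identity.
CREATE PUBLICATION bad_pub FOR TABLE foo WHERE (b = 'x');
```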

[1] - https://www.postgresql.org/message-id/CAHut%2BPtJnnM8MYQDf7xCyFAp13U_0Ya2dv-UQeFD%3DghixFLZiw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.


