Re: row filtering for logical replication

From	Amit Kapila
Subject	Re: row filtering for logical replication
Date	24 September 2021, 08:20:11
Msg-id	CAA4eK1JWpdUhwueSa-uc5Begez+kFW0vf+4DOZKeLThn3-TrXg@mail.gmail.com
In reply to	Re: row filtering for logical replication (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: row filtering for logical replication; Re: row filtering for logical replication
List	pgsql-hackers

On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> 6) parse_oper.c
>
> I'm having some second thoughts about (not) allowing UDFs ...
>
> Yes, I get that if the function starts failing, e.g. because querying a
> dropped table or something, that breaks the replication and can't be
> fixed without a resync.
>

The other problem is that a user-defined function can query any table,
and that won't work in a logical decoding environment either, because
decoding uses historic snapshots, with which we can access only catalog
tables.

> That's pretty annoying, but maybe disallowing anything user-defined
> (functions and operators) is maybe overly anxious? Also, extensibility
> is one of the hallmarks of Postgres, and disallowing all custom UDF and
> operators seems to contradict that ...
>
> Perhaps just explaining that the expression can / can't do in the docs,
> with clear warnings of the risks, would be acceptable.
>

I think the right way to support functions is to mark them explicitly;
Jeff Davis agreed with this in one of the emails above. I think we
should probably introduce a new marking for this. I feel this is
important because without it, it won't be safe to call even some of the
built-in functions that can access/update the database (non-immutable
functions), due to the restrictions of the logical decoding
environment.
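
To illustrate what explicit marking could mean in practice, here is a
minimal sketch in Python, not PostgreSQL code. It assumes a toy
expression tree and a hypothetical allowlist (FILTER_SAFE_FUNCTIONS)
standing in for whatever marking might eventually be introduced; all
class and function names are invented for the example. The walker
reports every function call in a filter expression that is not marked
safe.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an explicit "safe in row filters" marking.
FILTER_SAFE_FUNCTIONS = {"lower", "upper", "abs"}


@dataclass
class Column:
    name: str


@dataclass
class Const:
    value: object


@dataclass
class FuncCall:
    name: str
    args: list = field(default_factory=list)


def unsafe_functions(node):
    """Return names of function calls in the tree that are not marked safe."""
    found = []
    if isinstance(node, FuncCall):
        if node.name not in FILTER_SAFE_FUNCTIONS:
            found.append(node.name)
        # Arguments may themselves contain function calls, so recurse.
        for arg in node.args:
            found.extend(unsafe_functions(arg))
    return found


# lower(lookup_region(id)): lookup_region is an invented UDF that is not marked.
expr = FuncCall("lower", [FuncCall("lookup_region", [Column("id")])])
print(unsafe_functions(expr))
```

A check of this shape would reject the filter when the publication is
defined rather than letting it fail later during decoding.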

>
> 12) misuse of REPLICA IDENTITY
>
> The more I think about this, the more I think we're actually misusing
> REPLICA IDENTITY for something entirely different. The whole purpose of
> RI was to provide a row identifier for the subscriber.
>
> But now we're using it to ensure we have all the necessary columns,
> which is entirely orthogonal to the original purpose. I predict this
> will have rather negative consequences.
>
> People will either switch everything to REPLICA IDENTITY FULL, or create
> bogus unique indexes with extra columns. Which is really silly, because
> it wastes network bandwidth (transfers more data) or local resources
> (CPU and disk space to maintain extra indexes).
>
> IMHO this needs more infrastructure to request extra columns to decode
> (e.g. for the filter expression), and then remove them before sending
> the data to the subscriber.
>

Yeah, but that would put additional load on write operations. I am not
sure at this stage, but maybe there are other ways to extend the
current infrastructure so that we build snapshots with which we can
access user tables instead of only catalog tables. Such enhancements,
if feasible, would be useful not only for allowing additional column
access in row filters but also for other purposes, like allowing
functions that access user tables. I feel we can extend this later as
well, depending on usage and requests. For the first version, this
doesn't sound too limiting to me.
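
The first-version restriction discussed in point 12 amounts to
requiring that every column used by the filter be part of the replica
identity. A minimal sketch of that check in Python follows, again not
PostgreSQL code; the function name and the column names are
hypothetical.

```python
def missing_filter_columns(filter_columns, replica_identity_columns):
    """Return filter columns that the replica identity does not cover."""
    identity = set(replica_identity_columns)
    return sorted(c for c in set(filter_columns) if c not in identity)


# A filter on (id, region) with a replica identity of (id) alone is not
# covered, because region is not guaranteed to be available when decoding.
print(missing_filter_columns(["id", "region"], ["id"]))
```

An empty result means the filter can be evaluated from the replica
identity columns alone; anything else is what pushes users toward
REPLICA IDENTITY FULL or wider unique indexes.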

-- 
With Regards,
Amit Kapila.
