Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск

Список

Период

Сортировка

От	Marco Slot
Тема	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата	20 января 2023 г. 15:35:51
Msg-id	CAFMSG9E7_JgnaBj1f_jU4dEmOnL=K6UyBAPR7eZ0NukUcR2L9g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Список	pgsql-hackers

Дерево обсуждения

On Mon, Jan 9, 2023 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, to get back to the point at hand: if we do have a REPLICA IDENTITY
> FULL situation then we can make use of any unique index over a subset of
> the transmitted columns, and if there's more than one candidate index
> it's unlikely to matter which one we pick. Given your comment I guess
> we have to also compare the non-indexed columns, so we can't completely
> convert the FULL case to the straight index case. But still it doesn't
> seem to me to be appropriate to use the planner to find a suitable index.

The main purpose of REPLICA IDENTITY FULL seems to be to enable logical replication for tables that may have duplicates and therefore cannot have a unique index that can be used as a replica identity.

For those tables the user currently needs to choose between update/delete erroring (bad) or doing a sequential scan on the apply side per updated/deleted tuple (often worse). This issue currently prevents a lot of automation around logical replication, because users need to decide whether and when they are willing to accept partial downtime. The current REPLICA IDENTITY FULL implementation can work in some cases, but applying the effects of an update that affected a million rows through a million sequential scans will certainly not end well.

This patch solves the problem by allowing the apply side to pick a non-unique index to find any matching tuple instead of always using a sequential scan, but that either requires some planning/costing logic to avoid picking a lousy index, or allowing the user to manually preselect the index to use, which is less convenient.

An alternative might be to construct prepared statements and using the regular planner. If applied uniformly that would also be nice from the extensibility point-of-view, since there is currently no way for an extension to augment the apply side. However, I assume the current approach of using low-level functions in the common case was chosen for performance reasons.

I suppose the options are:

1. use regular planner uniformly

2. use regular planner only when there's no replica identity (or configurable?)

3. only use low-level functions

4. keep using sequential scans for every single updated row

5. introduce a hidden logical row identifier in the heap that is guaranteed unique within a table and can be used as a replica identity when no unique index exists

Any thoughts?

cheers,

Marco

В списке pgsql-hackers по дате отправления:

Предыдущее

От: "Karl O. Pinc"
Дата: 20 января 2023 г., 15:26:00
Сообщение: Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

Следующее

От: Ted Yu
Дата: 20 января 2023 г., 16:23:16
Сообщение: Re: Operation log for major operations

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Предыдущее

Следующее