Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск
Список
Период
Сортировка
От Marco Slot
Тема Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата
Msg-id CAFMSG9E7_JgnaBj1f_jU4dEmOnL=K6UyBAPR7eZ0NukUcR2L9g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Список pgsql-hackers
On Mon, Jan 9, 2023 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, to get back to the point at hand: if we do have a REPLICA IDENTITY
> FULL situation then we can make use of any unique index over a subset of
> the transmitted columns, and if there's more than one candidate index
> it's unlikely to matter which one we pick.  Given your comment I guess
> we have to also compare the non-indexed columns, so we can't completely
> convert the FULL case to the straight index case.  But still it doesn't
> seem to me to be appropriate to use the planner to find a suitable index.

The main purpose of REPLICA IDENTITY FULL seems to be to enable logical replication for tables that may have duplicates and therefore cannot have a unique index that can be used as a replica identity.

For those tables the user currently needs to choose between update/delete erroring (bad) or doing a sequential scan on the apply side per updated/deleted tuple (often worse). This issue currently prevents a lot of automation around logical replication, because users need to decide whether and when they are willing to accept partial downtime. The current REPLICA IDENTITY FULL implementation can work in some cases, but applying the effects of an update that affected a million rows through a million sequential scans will certainly not end well.

This patch solves the problem by allowing the apply side to pick a non-unique index to find any matching tuple instead of always using a sequential scan, but that either requires some planning/costing logic to avoid picking a lousy index, or allowing the user to manually preselect the index to use, which is less convenient.

An alternative might be to construct prepared statements and using the regular planner. If applied uniformly that would also be nice from the extensibility point-of-view, since there is currently no way for an extension to augment the apply side. However, I assume the current approach of using low-level functions in the common case was chosen for performance reasons.

I suppose the options are:
1. use regular planner uniformly
2. use regular planner only when there's no replica identity (or configurable?)
3. only use low-level functions
4. keep using sequential scans for every single updated row
5. introduce a hidden logical row identifier in the heap that is guaranteed unique within a table and can be used as a replica identity when no unique index exists

Any thoughts?

cheers,
Marco

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Karl O. Pinc"
Дата:
Сообщение: Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
Следующее
От: Ted Yu
Дата:
Сообщение: Re: Operation log for major operations