Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата
Msg-id CAA4eK1KyJ84HcPoQ0gZunZuXg_=F=+sZQdGJ0i4T46jzxuwRCQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Mon, Jul 18, 2022 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> > Hi hackers,
> >
> >
> > It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan
> >
> > per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other
> >
> > than some small number of use cases.
> >
>
> IIUC, this proposal is to optimize cases where users can't have a
> unique/primary key for a relation on the subscriber and those
> relations receive lots of updates or deletes?
>
> > With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index
> >
> > as long as the planner sub-modules picks any index over sequential scan.
> >
> > Majority of the logic on the subscriber side has already existed in the code. The subscriber is already
> >
> > capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the
> >
> > tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could
> >
> > realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function.
> >
> > In short, the changes on the subscriber are mostly combining parts of (unique) index scan and
> >
> > sequential scan codes.
> >
> > The decision on whether to use an index (or which index) is mostly derived from planner infrastructure.
> >
> > The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the
> >
> > restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let
> >
> > the planner sub-module -- `create_index_paths()` -- to give us the relevant  index `Path`s. On top of
> >
> > that adds the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.
> >
> > From the performance point of view, there are few things to note. First, the patch aims not to
> > change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
> > IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
> > to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
> > It mostly depends on the data size, index and the data distribution.
> >
>
> It seems that in favorable cases it will improve performance but we
> should consider unfavorable cases as well. Two things that come to
> mind in that regard are (a) while choosing index/seq. scan paths, the
> patch doesn't account for cost for tuples_equal() which needs to be
> performed for index scans, (b) it appears to me that the patch decides
> which index to use the first time it opens the rel (or if the rel gets
> invalidated) on subscriber and then for all consecutive operations it
> uses the same index. It is quite possible that after some more
> operations on the table, using the same index will actually be
> costlier than a sequence scan or some other index scan.
>

Point (a) won't matter because we perform tuples_equal both for
sequence and index scans. So, we can ignore point (a).

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: Fast COPY FROM based on batch insert
Следующее
От: David Geier
Дата:
Сообщение: Re: Lazy JIT IR code generation to increase JIT speed with partitions