Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата
Msg-id 3466340.1673117404@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Ответы Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
=?UTF-8?B?w5ZuZGVyIEthbGFjxLE=?= <onderkalaci@gmail.com> writes:
> Attached v22.

I took a very brief look through this.  I'm not too pleased with
this whole line of development TBH.  It seems to me that the core
design of execReplication.c and related code is "let's build our
own half-baked executor and much-less-than-half-baked planner,
because XXX".  (I'm not too sure what XXX was, really, but apparently
somebody managed to convince people that that is a sane and
maintainable design.)  Now this patch has decided that it *will*
use the real planner, or at least portions of it in some cases.
If we're going to do that ISTM we ought to replace all the existing
not-really-a-planner logic, but this has not done that; instead
we have a large net addition to the already very duplicative
replication code, with weird restrictions because it doesn't want
to make changes to the half-baked executor.

I think we should either live within the constraints set by this
overarching design, or else nuke execReplication.c from orbit and
start using the real planner and executor.  Perhaps the foreign
key enforcement mechanisms could be a model --- although if you
don't want to buy into using SPI as well, you probably should look
at Amit L's work at [1].

Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
sanely defined in the first place?  It looks to me that
RelationFindReplTupleSeq assumes without proof that there is a unique
full-tuple match, but that is only reasonable to assume if there is at
least one unique index (and maybe not even then, if nulls are involved).
If there is a unique index, why can't that be chosen as replica identity?
If there isn't, what semantics are we actually providing?

What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
defined as "the subscriber can pick any unique index to match on,
and is allowed to fail if the table has none".  Or if "fail" is a bridge
too far for you, we could fall back to the existing seqscan logic.
But thumbing through the existing indexes to find a non-expression unique
index wouldn't require invoking the full planner.  Any candidate index
would result in a plan estimated to fetch just one row, so there aren't
likely to be serious cost differences.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: [RFC] Add jit deform_counter
Следующее
От: Joseph Koshakow
Дата:
Сообщение: Re: Infinite Interval