Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск

Список

Период

Сортировка

От	Önder Kalacı
Тема	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата	3 марта 2023 г. 09:32:39
Msg-id	CACawEhWkzEwzLo3Pmb72Xz+y=EfGhV-JsA97uxTx0z-QihR9rQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Список	pgsql-hackers

Дерево обсуждения

Hi Amit, all

Few comments on 0001
====================
1.
+ such suitable indexes, the search on the subscriber s ide can be
very inefficient,

unnecessary space in 'side'

Fixed in v28

2.
- identity. If the table does not have any suitable key, then it can be set
- to replica identity <quote>full</quote>, which means the entire row becomes
- the key. This, however, is very inefficient and should only be used as a
- fallback if no other solution is possible. If a replica identity other
+ identity. When replica identity <literal>FULL</literal> is specified,
+ indexes can be used on the subscriber side for searching the rows.

I think it is better to retain the first sentence (If the table does
not ... entire row becomes the key.) as that says what will be part of
the key.

right, that sentence looks useful, added back.

3.
- comprising the same or fewer columns must also be set on the subscriber
- side. See <xref linkend="sql-altertable-replica-identity"/> for details on
- how to set the replica identity. If a table without a replica identity is
- added to a publication that replicates <command>UPDATE</command>
+ comprising the same or fewer columns must also be set on the
subscriber side.
+ See <xref linkend="sql-altertable-replica-identity"/> for
+ details on how to set the replica identity. If a table without a replica
+ identity is added to a publication that replicates <command>UPDATE</command>

I don't see any change in this except line length. If so, I don't
think we should change it as part of this patch.

Yes, fixed. But the first line (starting with See <xref ..) still shows as if it is changed,

probably because its line has changed. I couldn't make that line show as it had not

changed.

4.
/*
* Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
* is setup to match 'rel' (*NOT* idxrel!).
*
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
*
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.

The comment changes look quite asymmetric to me. Normally, we break
the line if the line length goes beyond 80 cols. Please check and
change other places in the patch if they have a similar symptom.

Went over the patch, and expanded each line ~80 chars.

I'm guessing 80 is not the hard limit, in some cases I went over 81-82.

5.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes.

Can we mention partial indexes in the above comment? It seems to me
that because the required tuple may not satisfy the expression (in the
case of partial indexes) it may not be easy to support it.

Expanded the comment and explained the differences a little further.

6.
build_replindex_scan_key()
{
...
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
...
...
+#ifdef USE_ASSERT_CHECKING
+ IndexInfo *indexInfo = BuildIndexInfo(idxrel);
+
+ Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
...
}

We can avoid building index info multiple times. This can be either
checked at the beginning of the function outside attribute offset loop
or we can probably cache it. I understand this is for assert builds
but seems easy to avoid it doing multiple times and it also looks odd
to do it multiple times for the same index.

Applied your suggestions. Although I do not have strong opinions, I think that

it was easier to follow with building the indexInfo for each iteration.

7.
- /* Build scankey for every attribute in the index. */
- for (attoff = 0; attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+ /* Build scankey for every non-expression attribute in the index. */
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
{
Oid operator;
Oid opfamily;
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];

I don't think here we need to change variable names if we retain
mainattno as it is instead of changing it to table_attno. The current
naming doesn't seem bad for the current usage of the patch.

Hmm, I'm actually not convinced that the variable naming on HEAD is good for

the current patch. The main difference is that now we allow indexes like:

CREATE INDEX idx ON table(foo(col), col_2)

(See # Testcase start: SUBSCRIPTION CAN USE INDEXES WITH

EXPRESSIONS AND COLUMNS)

In such indexes, we could skip the attributes of the index. So, skey_attoff is not

equal to index_attoff anymore. So, calling out this explicitly via the variable names

seems more robust to me. Plus, mainattno sounded vague to me when I first read

this function.

So, unless you have strong objections, I'm leaning towards having variable

names more explicit. I'm also open for suggestions if you think the names

I picked is not clear enough.

8.
+ TypeCacheEntry **eq = NULL; /* only used when the index is not RI or PK */

Normally, we don't add such comments as the usage is quite obvious by
looking at the code.

Sure, I also don't see much value for it, removed.

Attached v29 for this review. Note that I'll be working on the disable index scan changes after

I reply to some of the other pending reviews.

Thanks,

Onder

Вложения

v29_0001_use_index_on_subs_when_pub_rep_ident_full.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Вложения