Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Поиск
Список
Период
Сортировка
От Önder Kalacı
Тема Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Дата
Msg-id CACawEhUxEfVr64ywC0sj7nDwFg=vR30qehNoqSDcx6VV2=B_Lw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Marco Slot <marco.slot@gmail.com>)
Ответы Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Hi Marco, Tom,

> But still it doesn't seem to me to be appropriate to use the planner to find a suitable index.

As Marco noted, here we are trying to pick an index that is non-unique. We could pick the index based on information extracted from pg_index (or such), but then, it'd be a premature selection. Before sending the patch to pgsql-hackers, I initially tried to find a suitable one with such an approach. 

But then, I still ended up using costing functions (and some other low level functions). Overall, it felt like the planner is the module that makes this decision best. Why would we try to invent another immature way of doing this? With that reasoning, I ended up using the related planner functions directly.

However, I assume the current approach of using low-level functions in the common case was chosen for performance reasons.

That's partially the reason. If you look at the patch, we use the planner (or the low level functions) infrequently. It is only called when the logical replication relation cache is rebuilt. As far as I can see, that happens with (auto) ANALYZE or DDLs etc. I expect these are infrequent operations. Still, I wanted to make sure we do not create too much overhead even if there are frequent invalidations.

The main reason for using the low level functions over the planner itself is to have some more control over the decision. For example, due to the execution limitations, we currently cannot allow an index that consists of only expressions (similar to pkey restriction). With the current approach, we can easily filter those out.

Also, another minor reason is that, if we use planner, we'd get a PlannedStmt back. It also felt weird to check back the index used from a PlannedStmt. In the current patch, we iterate over Paths, which seems more intuitive to me.


I suppose the options are:
1. use regular planner uniformly
2. use regular planner only when there's no replica identity (or configurable?)
3. only use low-level functions
4. keep using sequential scans for every single updated row
5. introduce a hidden logical row identifier in the heap that is guaranteed unique within a table and can be used as a replica identity when no unique index exists
 
One other option I considered was to ask the index explicitly on the subscriber side from the user when REPLICA IDENTITY is FULL. But, it is a pretty hard choice for any user, even a planner sometimes fails to pick the right index :)  Also, it is probably controversial to change any of the APIs for this purpose?

I'd be happy to hear from more experienced hackers on the trade-offs for the above, and I'd be open to work on that if there is a clear winner. For me (3) is a decent solution for the problem.

Thanks,
Onder

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nitin Jadhav
Дата:
Сообщение: Add a test case related to the error "cannot fetch toast data without an active snapshot"
Следующее
От: Nitin Jadhav
Дата:
Сообщение: Re: Improve GetConfigOptionValues function