Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

Поиск
Список
Период
Сортировка
От Ajin Cherian
Тема Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Дата
Msg-id CAFPTHDbS_accCqVwAgk48eWL-w4o3jwS+YENqLr2oyPNrSEfDA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding  (Ajin Cherian <itsajin@gmail.com>)
Список pgsql-hackers


On Wed, May 22, 2024 at 2:17 PM Ajin Cherian <itsajin@gmail.com> wrote:

The reason for the crash is that the RelationSyncCache was NULL prior to reaching a consistent point.
Hi li jie, I see that you created a new thread with an updated version of this patch [1]. I used that patch and addressed the crash seen above, rebased the patch and addressed a few other comments.
I'm happy to help you with this patch and address comments if you are not available.

regards,
Ajin Cherian
Fujitsu Australia

I was discussing this with Kuroda-san who made a patch to add a table filter with test_decoding plugin. The filter does nothing, just returns false( doesn't filter anything) and I see that 8 test_decoding tests fail. In my analysis, I could see that some of the failures were because the new filter logic was accessing the relation cache using the latest snapshot for relids which was getting incorrect relation information while decoding attribute values.
for eg:
CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
BEGIN;
INSERT INTO replication_example(somedata, text) VALUES (1, 1);
INSERT INTO replication_example(somedata, text) VALUES (1, 2);
COMMIT;
ALTER TABLE replication_example RENAME COLUMN text TO somenum;
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');

Here, after decoding, the changes for the INSERT, were reflecting the new column name (somenum) which was altered in a later transaction. This is because the new Filterby Table() logic was getting relcache with latest snapshot and does not use a historic snapshot like logical decoding should be doing. This is because the changes are at the decode.c level and not the reorderbuffer level and does not have access to the txn from the reorderbuffer. This problem could be fixed by invalidating the cache in the FilterByTable() logic, but this does not solve other problems like the table name itself is changed in a later transaction. I think the patch has a fundamental problem that the table filter logic does not respect the snapshot of the transaction being decoded. The historic snapshot is currently only set up when the actual changes are committed or streamed at ReorderBufferProcessTXN().

If the purpose of the patch is to filter out unnecessary changes prior to actual decode, then it will use an invalid snapshot and have lots of problems. Otherwise this logic has to be moved to the reorderbuffer level and there will be a big overhead of extracting reorderbuffer while each change is queued in memory/disk.
 regards,
Ajin Cherian
Fujitsu Australia

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David E. Wheeler
Дата:
Сообщение: Re: Patch bug: Fix jsonpath .* on Arrays
Следующее
От: "Fujii.Yuki@df.MitsubishiElectric.co.jp"
Дата:
Сообщение: RE: Should consider materializing the cheapest inner path in consider_parallel_nestloop()