Re: Implementing Incremental View Maintenance

Поиск
Список
Период
Сортировка
От huyajun
Тема Re: Implementing Incremental View Maintenance
Дата
Msg-id tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809@qq.com
обсуждение исходный текст
Ответ на Re: Implementing Incremental View Maintenance  (Yugo NAGATA <nagata@sraoss.co.jp>)
Ответы Re: Implementing Incremental View Maintenance  (Yugo NAGATA <nagata@sraoss.co.jp>)
Список pgsql-hackers
Hi, Nagata-san
Thank you for your answer, I agree with your opinion, and found some new problems to discuss with you

3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better

I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f

When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.

Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.

The next version of the patch-set will include this change.

I read your patch and think this processing is greet, but there is a risk of deadlock. 
Although I have not thought of a suitable processing method for the time being, 
it is also acceptable for truncate scenarios.The deadlock scene is as follows:

Mv define is: select * from base_a,base_b;
S1: truncate base_a; — only AccessExclusiveLock base_a and not run into after trigger
S2: insert into base_b; — The update has been completed and the incremental refresh is started in the after trigger,RowExclusive on base_b and ExclusiveLock on mv
S1: continue truncate mv, wait for AccessExclusiveLock on mv, wait for S2
S2: continue refresh mv, wait for AccessShardLock on base_a, wait for S1
So deadlock occurred

I also found some new issues that I would like to discuss with you
1. Concurrent DML causes imv data error, case like below
Setup:
Create table t( a int);
Insert into t select 1 from generate_series(1,3);
create incremental materialized view s  as select count(*) from t;

S1: begin;delete from t where ctid in (select ctid from t limit 1);
S2: begin;delete from t where ctid in (select ctid from t limit 1 offset 1);
S1: commit;
S2: commit;

After this, The count data of s becomes 2 but correct data is 1.
I found out that the problem is probably because to our use of ctid update
Consider user behavior unrelated to imv:

Create table t( a int);
Insert into t select 1;
s1: BEGIN
s1: update t set a = 2 where ctid in (select ctid from t); -- UPDATE 1
s2: BEGIN
s2: update t set a = 3 where ctid in (select ctid from t); -- wait row lock
s1: COMMIT
s2: -- UPDATE 0 -- ctid change so can't UPDATE one rows
So we lost the s2 update

2. Sometimes it will crash when the columns of the created materialized view do not match
Create table t( a int);
create incremental materialized view s(z)  as select sum(1) as a, sum(1) as b from t;

The problem should be that colNames in rewriteQueryForIMMV does not consider this situation

3. Sometimes no error when the columns of the created materialized view do not match
 Create table t( a int);
 create incremental materialized view s(y,z)  as select  count(1) as b from t;

But the hidden column of IMV is overwritten to z which will cause refresh failed.

The problem should be that checkRuleResultList we should only skip imv hidden columns check

4. A unique index should not be created in the case of a Cartesian product

create table base_a (i int primary key, j varchar);
create table base_b (i int primary key, k varchar);
INSERT INTO base_a VALUES
(1,10),
(2,20),
(3,30),
(4,40),
(5,50);
INSERT INTO base_b VALUES
(1,101),
(2,102),
(3,103),
(4,104);
CREATE incremental MATERIALIZED VIEW s as
select base_a.i,base_a.j from base_a,base_b; — create error because of unique index

5. Besides, I would like to ask you if you have considered implementing an IMV with delayed refresh?
The advantage of delayed refresh is that it will not have much impact on write performance
I probably have some ideas about it now, do you think it works?
1. After the base table is updated, the delayed IMV's after trigger is used to record the delta
 information in another table similar to the incremental log of the base table
2. When incremental refresh, use the data in the log instead of the data in the trasient table
of the after trigger
3. We need to merge the incremental information in advance to ensure that the base_table
after transaction filtering UNION ALL old_delta is the state before the base table is updated
Case like below:
Create table t( a int);
—begin to record log
Insert into t select 1; — newlog: 1 oldlog: empty
Delete from t; —newlog:1, oldlog:1
— begin to incremental refresh
Select * from t where xmin < xid or (xmin = xid and cmin < cid); — empty
So this union all oldlog is not equal to  before the base table is updated
We need  merge the incremental log in advance to make newlog: empty, oldlog: empty

If implemented, incremental refresh must still be serialized, but the DML of the base table 
can not be blocked, that is to say, the base table can still record logs during incremental refresh,
as long as we use same snapshot when incrementally updating.

do you think there will be any problems with this solution?

Looking forward to your reply to answer my above doubts, thank you a lot!
Regards,
Yajun Hu

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Handle infinite recursion in logical replication setup
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Handle infinite recursion in logical replication setup