Re: Implementing Incremental View Maintenance

Поиск
Список
Период
Сортировка
От Yugo NAGATA
Тема Re: Implementing Incremental View Maintenance
Дата
Msg-id 20220708192211.a5a5210b13c2d1911b578a38@sraoss.co.jp
обсуждение исходный текст
Ответ на Re: Implementing Incremental View Maintenance  (huyajun <hu_yajun@qq.com>)
Ответы Re: Implementing Incremental View Maintenance  (huyajun <hu_yajun@qq.com>)
Список pgsql-hackers
Hi huyajun, 

Thank you for your comments!

On Wed, 29 Jun 2022 17:56:39 +0800
huyajun <hu_yajun@qq.com> wrote:


> Hi, Nagata-san
> I read your patch with v27 version and has some new comments,I want to discuss with you.
> 
> 1. How about use DEPENDENCY_INTERNAL instead of DEPENDENCY_AUTO
>   when record dependence on trigger created by IMV.( related code is in the end of CreateIvmTrigger)
> Otherwise, User can use sql to drop trigger and corrupt IVM, DEPENDENCY_INTERNAL is also semantically more correct
> Crash case like:
>    create table t( a int);
>    create incremental  materialized view s as select * from t;
>    drop trigger "IVM_trigger_XXXX”;
>    Insert into t values(1);

We use DEPENDENCY_AUTO because we want to delete the triggers when
REFRESH ... WITH NO DATA is performed on the materialized view in order
to disable IVM. Triggers created with DEPENDENCY_INTERNAL cannot be dropped.
Such triggers are re-created when REFRESH ... [WITH DATA] is performed.

We can use DEPENDENCY_INTERNAL if we disable/enable such triggers instead of
dropping/re-creating them, although users also can disable triggers using
ALTER TRIGGER.

> 2. In get_matching_condition_string, Considering NULL values, we can not use simple = operator.
> But how about 'record = record', record_eq treat NULL = NULL
> it should fast than current implementation for only one comparation
> Below is my simple implementation with this, Variables are named arbitrarily..
> I test some cases it’s ok
> 
> static char *
> get_matching_condition_string(List *keys)
> {
>     StringInfoData match_cond;
>     ListCell    *lc;
> 
>     /* If there is no key columns, the condition is always true. */
>     if (keys == NIL)
>         return "true";
>     else
>     {
>         StringInfoData s1;
>         StringInfoData s2;
>         initStringInfo(&match_cond);
>         initStringInfo(&s1);
>         initStringInfo(&s2);
>         /* Considering NULL values, we can not use simple = operator. */
>         appendStringInfo(&s1, "ROW(");
>         appendStringInfo(&s2, "ROW(");
>         foreach (lc, keys)
>         {
>             Form_pg_attribute attr = (Form_pg_attribute) lfirst(lc);
>             char   *resname = NameStr(attr->attname);
>             char   *mv_resname = quote_qualified_identifier("mv", resname);
>             char   *diff_resname = quote_qualified_identifier("diff", resname);
>             
>             appendStringInfo(&s1, "%s", mv_resname);
>             appendStringInfo(&s2, "%s", diff_resname);
> 
>             if (lnext(lc))
>             {
>                 appendStringInfo(&s1, ", ");
>                 appendStringInfo(&s2, ", ");
>             }
>         }
>         appendStringInfo(&s1, ")::record");
>         appendStringInfo(&s2, ")::record");
>         appendStringInfo(&match_cond, "%s operator(pg_catalog.=) %s", s1.data, s2.data);
>         return match_cond.data;
>     }
> }

As you say, we don't have to use IS NULL if we use ROW(...)::record, but we
cannot use an index in this case and it makes IVM ineffecient. As showed
bellow (#5), an index works even when we use simple = operations together
with together "IS NULL" operations.
 
> 3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better

I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f

When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated. 

Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.

The next version of the patch-set will include this change. 
 
> 4. In IVM_immediate_before,I know Lock base table with ExclusiveLock is 
>   for concurrent updates to the IVM correctly, But how about to Lock it when actually 
>   need  to maintain MV which in IVM_immediate_maintenance
>     In this way you don't have to lock multiple times.

Yes, as you say, we don't have to lock the view multiple times.
I'll investigate better locking ways including the way that you suggest.
 
> 5. Why we need CreateIndexOnIMMV, is it a optimize? 
>    It seems like when maintenance MV,
>    the index may not be used because of our match conditions  can’t use simple = operator

No, the index works even when we use simple = operator together with "IS NULL".
For example:

postgres=# \d mv
           Materialized view "public.mv"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           |          | 
 v1     | integer |           |          | 
 v2     | integer |           |          | 
Indexes:
    "mv_index" UNIQUE, btree (id) NULLS NOT DISTINCT

postgres=# EXPLAIN ANALYZE
     WITH diff(id, v1, v2) AS MATERIALIZED ((VALUES(42, 420, NULL::int)))
     SELECT mv.* FROM mv, diff 
     WHERE (mv.id = diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND 
           (mv.v1 = diff.v1 OR (mv.v1 IS NULL AND diff.v1 IS NULL)) AND 
           (mv.v2 = diff.v2 OR (mv.v2 IS NULL AND diff.v2 IS NULL));

                                                                                              QUERY PLAN
                                                                
 
             

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------
 Nested Loop  (cost=133.87..137.92 rows=1 width=12) (actual time=0.180..0.191 rows=1 loops=1)
   CTE diff
     ->  Result  (cost=0.00..0.01 rows=1 width=12) (actual time=0.027..0.028 rows=1 loops=1)
   ->  CTE Scan on diff  (cost=0.00..0.02 rows=1 width=12) (actual time=0.037..0.040 rows=1 loops=1)
   ->  Bitmap Heap Scan on mv  (cost=133.86..137.88 rows=1 width=12) (actual time=0.127..0.132 rows=1 loops=1)
         Recheck Cond: ((id = diff.id) OR (id IS NULL))
         Filter: (((id = diff.id) OR ((id IS NULL) AND (diff.id IS NULL))) AND ((v1 = diff.v1) OR ((v1 IS NULL) AND
(diff.v1IS NULL))) AND ((v2 = diff.v2) OR ((v2 IS NULL) AND (diff.v2
 
 IS NULL))))
         Heap Blocks: exact=1
         ->  BitmapOr  (cost=133.86..133.86 rows=1 width=0) (actual time=0.091..0.093 rows=0 loops=1)
               ->  Bitmap Index Scan on mv_index  (cost=0.00..4.43 rows=1 width=0) (actual time=0.065..0.065 rows=1
loops=1)
                     Index Cond: (id = diff.id)
               ->  Bitmap Index Scan on mv_index  (cost=0.00..4.43 rows=1 width=0) (actual time=0.021..0.021 rows=0
loops=1)
                     Index Cond: (id IS NULL)
 Planning Time: 0.666 ms
 Execution Time: 0.399 ms
(15 rows)



Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Add function to return backup_label and tablespace_map
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Support for grabbing multiple consecutive values with nextval()