Обсуждение: Implementing Incremental View Maintenance

Поиск

Список

Период

Сортировка

Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

27 декабря 2018 г., 12:57:26

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1]
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2]
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
(Japaneseonly)

[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

denty

Дата:

31 декабря 2018 г., 10:41:15

Hi Yugo.

> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.

Great. :-)

I think it would address an important gap in PostgreSQL’s feature set.

> 2. How to compute the delta to be applied to materialized views
>
> Essentially, IVM is based on relational algebra. Theorically, changes on
> base
> tables are represented as deltas on this, like "R <- R + dR", and the
> delta on
> the materialized view is computed using base table deltas based on "change
> propagation equations".  For implementation, we have to derive the
> equation from
> the view definition query (Query tree, or Plan tree?) and describe this as
> SQL
> query to compulte delta to be applied to the materialized view.

We had a similar discussion in this thread
https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu,
and I’m very much in agreement that the "change propagation equations”
approach can solve for a very substantial subset of common MV use cases.

> There could be several operations for view definition: selection,
> projection,
> join,  aggregation, union, difference, intersection, etc.  If we can
> prepare a
> module for each operation, it makes IVM extensable, so we can start a
> simple
> view definition, and then support more complex views.

Such a decomposition also allows ’stacking’, allowing complex MV definitions
to be attacked even with only a small handful of modules.

I did a bit of an experiment to see if "change propagation equations” could
be computed directly from the MV’s pg_node_tree representation in the
catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly
to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL
representation (I played with JSONB), then the next problem is making sense
of the structures, and unfortunately amongst the many plan/path/tree utility
functions in the code base, I figured only a very few could be sensibly
exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea,
and I think it could be made to work, native code is the way to go at least
for now.

> 4. When to maintain materialized views
>
> [...]
>
> In the previous discussion[4], it is planned to start from "eager"
> approach. In our PoC
> implementaion, we used the other aproach, that is, using REFRESH command
> to perform IVM.
> I am not sure which is better as a start point, but I begin to think that
> the eager
> approach may be more simple since we don't have to maintain base table
> changes in other
> past transactions.

Certainly the eager approach allows progress to be made with less
infrastructure.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

In the eager maintenance approache, we have to consider a race condition
where two
different transactions change base tables simultaneously as discussed in
[4].

I wonder if that nudges towards a logged approach. If the race is due to
fact of JOIN-worthy tuples been made visible after a COMMIT, but not before,
then does it not follow that the eager approach has to fire some kind of
reconciliation work at COMMIT time? That seems to imply a persistent queue
of some kind, since we can’t assume transactions to be so small to be able
to hold the queue in memory.

Hmm. I hadn’t really thought about that particular corner case. I guess a
‘catch' could be simply be to detect such a concurrent update and demote the
refresh approach by marking the MV stale awaiting a full refresh.

denty.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Adam Brusselback

Дата:

31 декабря 2018 г., 16:20:11

Hi all, just wanted to say I am very happy to see progress made on this, my codebase has multiple "materialized tables" which are maintained with statement triggers (transition tables) and custom functions. They are ugly and a pain to maintain, but they work because I have no other solution...for now at least.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

I will say that in my case, as long as my reads of the materialized view are always consistent with the underlying data, that's what's important. I don't mind if it's eager, or lazy (as long as lazy still means it will refresh prior to reading).

Re: Implementing Incremental View Maintenance

От

Nguyễn Trần Quốc Vinh

Дата:

01 января 2019 г., 07:46:25

Dear all,

We have some result on incremental update for MVs. We generate triggers on C to do the incremental maintenance. We posted the code to github about 1 year ago, but unfortunately i posted a not-right ctrigger.h header. The mistake was exposed to me when a person could not compile the generated triggers and reported to me. And now i re-posted with the right ctrigger.h file.

You can find the codes of the generator here: https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can find how did we do here: https://link.springer.com/article/10.1134/S0361768816050066. The paper is about generating of codes in pl/pgsql. Anyway i see it is useful for reading the codes. I don't know if i can share the paper or not so that i don't publish anywhere else. The text about how to generate triggers in C was published with open-access but unfortunately, it is in Vietnamese.

We are happy if the codes are useful for someone.

Thank you and best regards,

NTQ Vinh

TS. Nguyễn Trần Quốc Vinh

-----------------------------------------------

Chủ nhiệm khoa Tin học

Trường ĐH Sư phạm - ĐH Đà Nẵng

------------------------------------------------

Nguyen Tran Quoc Vinh, PhD

Dean

Faculty of Information Technology

Danang University of Education

Website: http://it.ued.udn.vn; http://www.ued.vn; http://www.ued.udn.vn

SCV: http://scv.ued.vn/~ntquocvinh

Phone: (+84) 511.6-512-586

Mobile: (+84) 914.78-08-98

On Mon, Dec 31, 2018 at 11:20 PM Adam Brusselback <adambrusselback@gmail.com> wrote:

Hi all, just wanted to say I am very happy to see progress made on this, my codebase has multiple "materialized tables" which are maintained with statement triggers (transition tables) and custom functions. They are ugly and a pain to maintain, but they work because I have no other solution...for now at least.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

I will say that in my case, as long as my reads of the materialized view are always consistent with the underlying data, that's what's important. I don't mind if it's eager, or lazy (as long as lazy still means it will refresh prior to reading).

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

07 января 2019 г., 01:59:56

> Hi all, just wanted to say  I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.
> 
> I am concerned that the eager approach only addresses a subset of the MV use
>> case space, though. For example, if we presume that an MV is present
>> because
>> the underlying direct query would be non-performant, then we have to at
>> least question whether applying the delta-update would also be detrimental
>> to some use cases.
>>
> 
> I will say that in my case, as long as my reads of the materialized view
> are always consistent with the underlying data, that's what's important.  I
> don't mind if it's eager, or lazy (as long as lazy still means it will
> refresh prior to reading).

Assuming that we want to implement IVM incrementally (that means, for
example, we implement DELETE for IVM in PostgreSQL XX, then INSERT for
IVM for PostgreSQL XX+1... etc.), I think it's hard to do it with an
eager approach if want to MV is always consistent with base tables.

On the other hand, a lazy approach allows to implement IVM
incrementally because we could always let full MV build from scratch
if operations on MV include queries we do not support.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Nguyễn Trần Quốc Vinh

Дата:

07 января 2019 г., 02:51:41

Dear All,

The tool analyzes the input query and then generates triggers (trigger functions and pl/pgsql scripts as well) on all manipulating events (insert/updates/delete) for all underlying base tables. The triggers do incremental updates to the table that contains the query result (MV). You can build the tool, then see the provided example and try the tool. It is for synchronous maintenance. It was hard tested but you can use it with your own risk.

For Asynchronous maintenance, we generate 1) triggers on all manipulating events on base tables to collect all the data changes and save to the 'special' tables; then 2) the tool to do incremental updates of MVs.

Best regards,

Vinh

TS. Nguyễn Trần Quốc Vinh

-----------------------------------------------

Chủ nhiệm khoa Tin học

Trường ĐH Sư phạm - ĐH Đà Nẵng

------------------------------------------------

Nguyen Tran Quoc Vinh, PhD

Dean

Faculty of Information Technology

Danang University of Education

Website: http://it.ued.udn.vn; http://www.ued.vn; http://www.ued.udn.vn

SCV: http://scv.ued.vn/~ntquocvinh

Phone: (+84) 511.6-512-586

Mobile: (+84) 914.78-08-98

On Mon, Jan 7, 2019 at 9:00 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> Hi all, just wanted to say I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.
>
> I am concerned that the eager approach only addresses a subset of the MV use
>> case space, though. For example, if we presume that an MV is present
>> because
>> the underlying direct query would be non-performant, then we have to at
>> least question whether applying the delta-update would also be detrimental
>> to some use cases.
>>
>
> I will say that in my case, as long as my reads of the materialized view
> are always consistent with the underlying data, that's what's important. I
> don't mind if it's eager, or lazy (as long as lazy still means it will
> refresh prior to reading).

Assuming that we want to implement IVM incrementally (that means, for
example, we implement DELETE for IVM in PostgreSQL XX, then INSERT for
IVM for PostgreSQL XX+1... etc.), I think it's hard to do it with an
eager approach if want to MV is always consistent with base tables.

On the other hand, a lazy approach allows to implement IVM
incrementally because we could always let full MV build from scratch
if operations on MV include queries we do not support.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Mitar

Дата:

07 января 2019 г., 08:39:00

Hi!

On Thu, Dec 27, 2018 at 4:57 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
> IVM is a technique to maintain materialized views which computes and applies
> only the incremental changes to the materialized views rather than
> recomputate the contents as the current REFRESH command does.

That sounds great! I am interested in this topic because I am
interested in reactive/live queries and support for them in
PostgreSQL. [1]

In that context, the problem is very similar: based on some state of
query results and updated source tables, determine what should be new
updates to send to the client describing changes to the query results.
So after computing those incremental changes, instead of applying them
to materialized view I would send them to the client. One could see
materialized views only type of consumers of such information about
incremental change.

So I would like to ask if whatever is done in this setting is done in
a way that one could also outside of the context of materialized view.
Not sure what would API be thought.

From the perspective of reactive/live queries, this package [2] is
interesting. To my understanding, it adds to all base tables two
columns, one for unique ID and one for revision of the row. And then
rewrites queries so that this information is passed all the way to
query results. In this way it can then determine mapping between
inputs and outputs. I am not sure if it then does incremental update
or just uses that to determine if view is invalidated. Not sure if
there is anything about such approach in literature. Or why both index
and revision columns are needed.

> For these reasons, we started to think to implement IVM without relying on OIDs
> and made a bit more surveys.

I also do not see much difference between asking users to have primary
key on base tables or asking them to have OIDs. Why do you think that
a requirement for primary keys is a hard one? I think we should first
focus on having IVM with base tables with primary keys. Maybe then
later on we could improve on that and make it also work without.

To me personally, having unique index on source tables and also on
materialized view is a reasonable restriction for this feature.
Especially for initial versions of it.

> However, the discussion about IVM is now stoped, so we would like to restart and
> progress this.

What would be next steps in your view to move this further?

> If we can represent a change of UPDATE on a base table as query-like rather than
> OLD and NEW, it may be possible to update the materialized view directly instead
> of performing delete & insert.

Why do you need OLD and NEW? Don't you need just NEW and a list of
columns which changed from those in NEW? I use such diffing query [4]
to represent changes: first column has a flag telling if the row is
representing insert, update, and remove, the second column tells which
column are being changed in the case of the update, and then the NEW
columns follow.

I think that maybe standardizing structure for representing those
changes would be a good step towards making this modular and reusable.
Because then we can have three parts:

* Recording and storing changes in a standard format.
* A function which given original data, stored changes, computes
updates needed, also in some standard format.
* A function which given original data and updates needed, applies them.

> In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
> implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
> I am not sure which is better as a start point, but I begin to think that the eager
> approach may be more simple since we don't have to maintain base table changes in other
> past transactions.

I think if we split things into three parts as I described above, then
this is just a question of configuration. Or you call all three inside
one trigger to update in "eager" fashion. Or you store computed
updates somewhere and then on demand apply those in "lazy" fashion.

> In the eager maintenance approache, we have to consider a race condition where two
> different transactions change base tables simultaneously as discussed in [4].

But in the case of "lazy" maintenance there is a mirror problem: what
if later changes to base tables invalidate some previous change to the
materialized view. Imagine that one cell in a base table is first
updated too "foo" and we compute an update for the materialized view
to set it to "foo". And then the same cell is updated to "bar" and we
compute an update for the materialized view again. If we have not
applied any of those updates (because we are "lazy") now the
previously computed update can be discarded. We could still apply
both, but it would not be efficient.

[1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
[2] https://github.com/nothingisdead/pg-live-query
[3] https://www.postgresql.org/docs/devel/sql-createtable.html
[4] https://github.com/tozd/node-reactive-postgres/blob/eeda4f28d096b6e552d04c5ea138c258cb5b9389/index.js#L329-L340

Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

31 января 2019 г., 12:38:58

On Tue, 1 Jan 2019 14:46:25 +0700
Nguyễn Trần Quốc Vinh <ntquocvinh@gmail.com> wrote:

> We have some result on incremental update for MVs. We generate triggers on
> C to do the incremental maintenance. We posted the code to github about 1
> year ago, but unfortunately i posted a not-right ctrigger.h header. The
> mistake was exposed to me when a person could not compile the generated
> triggers and reported to me. And now i re-posted with the right ctrigger.h
> file.
>
> You can find the codes of the generator here:
> https://github.com/ntqvinh/PgMvIncrementalUpdate/commits/master. You can
> find how did we do here:
> https://link.springer.com/article/10.1134/S0361768816050066. The paper is
> about generating of codes in pl/pgsql. Anyway i see it is useful for
> reading the codes. I don't know if i can share the paper or not so that i
> don't publish anywhere else. The text about how to generate triggers in C
> was published with open-access but unfortunately, it is in Vietnamese.
>
> We are happy if the codes are useful for someone.

I have read your paper. It is interesting and great so that the algorithm
is described concretely.

After reading this, I have a few questions about your implementation.
Although I may be able to understand by reading your paper and code carefully,
I would appreciate it if you could answer these.

- It is said there are many limitations on the view definition query.
  How does this work when the query is not supported?

- Is it possible to support materialized views that have DISTINCT,
  OUTER JOIN, or sub-query in your approach?

- It is said that AVG is splitted to SUM and COUNT. Are these new additional
  columns in MV visible for users?

- Does this can handle the race condition discussed in [1], that is,
  if concurrent transactions update different two tables in the join
  view definition, is MV updated sucessfully?

[1] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

31 января 2019 г., 14:20:32

On Mon, 7 Jan 2019 00:39:00 -0800
Mitar <mmitar@gmail.com> wrote:

> That sounds great! I am interested in this topic because I am
> interested in reactive/live queries and support for them in
> PostgreSQL. [1]
> 
> In that context, the problem is very similar: based on some state of
> query results and updated source tables, determine what should be new
> updates to send to the client describing changes to the query results.
> So after computing those incremental changes, instead of applying them
> to materialized view I would send them to the client. One could see
> materialized views only type of consumers of such information about
> incremental change.
> 
> So I would like to ask if whatever is done in this setting is done in
> a way that one could also outside of the context of materialized view.
> Not sure what would API be thought.

I didn't know about reactive/live queries but this seems share a part of
problem with IVM, so we might have common API.

BTW, what is uecase of reactive/live queries? (just curious)
 
> > For these reasons, we started to think to implement IVM without relying on OIDs
> > and made a bit more surveys.
> 
> I also do not see much difference between asking users to have primary
> key on base tables or asking them to have OIDs. Why do you think that
> a requirement for primary keys is a hard one? I think we should first
> focus on having IVM with base tables with primary keys. Maybe then
> later on we could improve on that and make it also work without.
>  
> To me personally, having unique index on source tables and also on
> materialized view is a reasonable restriction for this feature.
> Especially for initial versions of it.

Initially, I chose to use OIDs for theoretical reason, that is, to handle
"bag-semantics" which allows duplicate rows in tables. However, I agree
that we start from the restriction of having unique index on base tables.
 
> > If we can represent a change of UPDATE on a base table as query-like rather than
> > OLD and NEW, it may be possible to update the materialized view directly instead
> > of performing delete & insert.
> 
> Why do you need OLD and NEW? Don't you need just NEW and a list of
> columns which changed from those in NEW? I use such diffing query [4]
> to represent changes: first column has a flag telling if the row is
> representing insert, update, and remove, the second column tells which
> column are being changed in the case of the update, and then the NEW
> columns follow.

According the change propagation equation approach, OLD is necessary
to calculate tuples in MV to be deleted or modified. However, if tables
has unique keys, such tuples can be identifeid using the keys, so
OLD may not be needed, at least in eager approach.

In lazy approach, OLD contents of table is useful. For example, with
a join view MV = R * S, when dR is inserted into R and dS is inserted
into S, the delta to be inserted into MV will be
 
 dMV = (R_old * dS) + (dR * S_new)
    = (R_old * dS) + (dR * S_old) + (dR * dS)

, hence the old contents of tables R and S are needed.
 
> I think that maybe standardizing structure for representing those
> changes would be a good step towards making this modular and reusable.
> Because then we can have three parts:
> 
> * Recording and storing changes in a standard format.
> * A function which given original data, stored changes, computes
> updates needed, also in some standard format.
> * A function which given original data and updates needed, applies them.

> I think if we split things into three parts as I described above, then
> this is just a question of configuration. Or you call all three inside
> one trigger to update in "eager" fashion. Or you store computed
> updates somewhere and then on demand apply those in "lazy" fashion.

I agree that defining the format to represent changes is important. However,
I am not sure both of eager and lazy can be handled in the same manner. I'll
consider about this more.

> > In the eager maintenance approache, we have to consider a race condition where two
> > different transactions change base tables simultaneously as discussed in [4].
> 
> But in the case of "lazy" maintenance there is a mirror problem: what
> if later changes to base tables invalidate some previous change to the
> materialized view. Imagine that one cell in a base table is first
> updated too "foo" and we compute an update for the materialized view
> to set it to "foo". And then the same cell is updated to "bar" and we
> compute an update for the materialized view again. If we have not
> applied any of those updates (because we are "lazy") now the
> previously computed update can be discarded. We could still apply
> both, but it would not be efficient.

In our PoC implementation, I handled this situation by removing
old contents from NEW delata table. In your example, when the base
table is updated from "foo" to "bar", the "foo" tuple is removed
from and the "bar" tuple is inserted in NEW delta and the delta
of MV is computed using the final NEW delta.

Regards,
-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Mitar

Дата:

14 марта 2019 г., 07:41:49

Hi!

On Thu, Jan 31, 2019 at 6:20 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> BTW, what is uecase of reactive/live queries? (just curious)

It allows syncing the state between client and server. Client can then
have a subset of data and server can push changes as they are
happening to the client. Client can in a reactive manner render that
in the UI to the user. So you can easily create a reactive UI which
always shows up-to-date data without having to poll or something
similar.

How are things progressing? Any news on this topic?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

01 апреля 2019 г., 03:11:22

On Thu, 27 Dec 2018 21:57:26 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi,
>
> I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.

I am now working on an initial patch for implementing IVM on PostgreSQL.
This enables materialized views to be updated incrementally after one
of their base tables is modified.

At the first patch, I want to start from very simple features.

Firstly, this will handle simple definition views which includes only
selection, projection, and join. Standard aggregations (count, sum, avg,
min, max) are not planned to be implemented in the first patch, but these
are commonly used in materialized views, so I'll implement them later on.
Views which include sub-query, outer-join, CTE, and window functions are also
out of scope of the first patch. Also, views including self-join or views
including other views in their definition is not considered well, either.
I need more investigation on these type of views although I found some papers
explaining how to handle sub-quries and outer-joins.

Next, this will handle materialized views with no duplicates in their
tuples. I am thinking of implementing an algorithm to handle duplicates
called "counting-algorithm" afterward, but I'll start from this
no-duplicates assumption in the first patch for simplicity.

In the first patch, I will implement only "immediate maintenance", that is, materialized views are updated immediately
ina transaction where a base

table is modified. On other hand, in "deferred maintenance", materialized
views are updated after the transaction, for example, by the user command
like REFRESH. Although I plan to implement both eventually, I'll start from "immediate" because this seems to need
smallercode than "deferred". For

implementing "deferred", it is need to implement a mechanism to maintain logs
for recording changes and an algorithm to compute the delta to be applied to
materialized views are necessary.

I plan to implement the immediate maintenance using AFTER triggers created
automatically on a materialized view's base tables. In AFTER trigger using
transition table features, changes occurs on base tables is recorded ephemeral relations. We can compute the delta to
beapplied to materialized views by

using these ephemeral relations and the view definition query, then update
the view by applying this delta.

--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Greg Stark

Дата:

03 апреля 2019 г., 21:41:36

On Sun, 31 Mar 2019 at 23:22, Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> Firstly, this will handle simple definition views which includes only
> selection, projection, and join.  Standard aggregations (count, sum, avg,
> min, max) are not planned to be implemented in the first patch, but these
> are commonly used in materialized views, so I'll implement them later on.

It's fine to not have all the features from day 1 of course. But I
just picked up this comment and the followup talking about splitting
AVG into SUM and COUNT and I had a comment. When you do look at
tackling aggregates I don't think you should restrict yourself to
these specific standard aggregations. We have all the necessary
abstractions to handle all aggregations that are feasible, see
https://www.postgresql.org/docs/devel/xaggr.html#XAGGR-MOVING-AGGREGATES

What you need to do -- I think -- is store the "moving aggregate
state" before the final function. Then whenever a row is inserted or
deleted or updated (or whenever another column is updated which causes
the value to row to enter or leave the aggregation) apply either
aggtransfn or aggminvtransfn to the state. I'm not sure if you want to
apply the final function on every update or only lazily either may be
better in some usage.

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

14 мая 2019 г., 06:46:48

On Mon, 1 Apr 2019 12:11:22 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Thu, 27 Dec 2018 21:57:26 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > Hi,
> > 
> > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.  
> 
> I am now working on an initial patch for implementing IVM on PostgreSQL.
> This enables materialized views to be updated incrementally after one
> of their base tables is modified.

Attached is a WIP patch of Incremental View Maintenance (IVM).
Major part is written by me, and changes in syntax and pg_class
are Hoshiai-san's work.

Although this is sill a draft patch in work-in-progress, any
suggestions or thoughts would be appreciated.

* What it is

This allows a kind of Immediate Maintenance of materialized views. if a 
materialized view is created by CRATE INCREMENTAL MATERIALIZED VIEW command,
the contents of the mateview is updated automatically and incrementally
after base tables are updated. Noted this syntax is just tentative, so it
may be changed.

====== Example 1 ======
postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0;
SELECT 3
postgres=# SELECT * FROM m;
 i 
---
 3
 2
 1
(3 rows)

postgres=# INSERT INTO t0 VALUES (4);
INSERT 0 1
postgres=# SELECt * FROM m; -- automatically updated
 i 
---
 3
 2
 1
 4
(4 rows)
=============================

This implementation also supports matviews including duplicate tuples or
DISTINCT clause in its view definition query.  For example, even if a matview
is defined with DISTINCT to remove duplication of tuples in a base table, this
can perform incremental update of the matview properly. That is, the contents
of the matview doesn't change when exiting tuples are inserted into the base
tables, and a tuple in the matview is deleted only when duplicity of the
corresponding tuple in the base table becomes zero.

This is due to "colunting alogorithm" in which the number of each tuple is
stored in matviews as a special column value.

====== Example 2 ======
postgres=# SELECT * FROM t1;
 id | t 
----+---
  1 | A
  2 | B
  3 | C
  4 | A
(4 rows)

postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1;
SELECT 3
postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1;
SELECT 3
postgres=# SELECT * FROM m1; -- with duplicity
 t 
---
 A
 A
 C
 B
(4 rows)

postgres=# SELECT * FROM m2;
 t 
---
 A
 B
 C
(3 rows)

postgres=# INSERT INTO t1 VALUES (5, 'B');
INSERT 0 1
postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C)
DELETE 2
postgres=# SELECT * FROM m1; -- one A left and one more B
 t 
---
 B
 B
 A
(3 rows)

postgres=# SELECT * FROM m2; -- only C is removed
 t 
---
 B
 A
(2 rows)
=============================

* How it works

1. Creating matview

When a matview is created, AFTER triggers are internally created
on its base tables. When the base tables is modified (INSERT, DELETE,
UPDATE), the matview is updated incrementally in the trigger function.

When populating the matview, GROUP BY and count(*) are added to the
view definition query before this is executed for counting duplicity
of tuples in the matview. The result of count is stored in the matview
as a special column named "__ivm_count__". 

2. Maintenance of matview

When base tables are modified, the change set of the table can be
referred as Ephemeral Named Relations (ENRs) thanks to Transition Table
(a feature of trigger implemented since PG10). We can calculate the diff
set of the matview by replacing the base table in the view definition
query with the ENR (at least if it is Selection-Projection -Join view). 
As well as view definition time, GROUP BY and count(*) is added in order
to count the duplicity of tuples in the diff set. As a result, two diff
sets (to be deleted from and to be inserted into the matview) are
calculated, and the results are stored into temporary tables respectively.

The matiview is updated by merging these change sets. Instead of executing
DELETE or INSERT simply, the values of __ivm_count__ column in the matview
is decreased or increased. When the values becomes zero, the corresponding
tuple is deleted from the matview.

3. Access to matview

When SELECT is issued for IVM matviews defined with DISTINCT, all columns
except to __ivm_count__ of each tuple in the matview are returned.  This is 
correct because duplicity of tuples are eliminated by GROUP BY.

When DISTINCT is not used, SELECT for the IVM matviews returns each tuple
__ivm_count__ times. Currently, this is implemented by rewriting the SELECT
query to replace the matview RTE with a subquery which joins the matview
and generate_series function as bellow. 

 SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__);

__ivm_count__ column is invisible for users when "SELECT * FROM ..." is
issued, but users can see the value by specifying in target list explicitly.

====== Example 3 ======
postgres=# \d+ m1
                                  Materialized view "public.m1"
    Column     |  Type  | Collation | Nullable | Default | Storage  | Stats target | Description 
---------------+--------+-----------+----------+---------+----------+--------------+-------------
 t             | text   |           |          |         | extended |              | 
 __ivm_count__ | bigint |           |          |         | plain    |              | 
View definition:
 SELECT t1.t
   FROM t1;
Access method: heap

postgres=# \d+ m2
                                  Materialized view "public.m2"
    Column     |  Type  | Collation | Nullable | Default | Storage  | Stats target | Description 
---------------+--------+-----------+----------+---------+----------+--------------+-------------
 t             | text   |           |          |         | extended |              | 
 __ivm_count__ | bigint |           |          |         | plain    |              | 
View definition:
 SELECT DISTINCT t1.t
   FROM t1;
Access method: heap

postgres=# SELECT *, __ivm_count__ FROM m1;
 t | __ivm_count__ 
---+---------------
 B |             2
 B |             2
 A |             1
(3 rows)

postgres=# SELECT *, __ivm_count__ FROM m2;
 t | __ivm_count__ 
---+---------------
 B |             2
 A |             1
(2 rows)

postgres=# EXPLAIN SELECT * FROM m1;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..61.03 rows=3000 width=2)
   ->  Seq Scan on m1 mv  (cost=0.00..1.03 rows=3 width=10)
   ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=0)
(3 rows)
=============================

* Simple Performance Evaluation

I confirmed that "incremental" update of matviews is more effective
than the standard REFRESH by using simple exapmle.  I used tables
of pgbench (SF=100) here.

Create two matviews, that is, without and with IVM.

test=# CREATE MATERIALIZED VIEW bench1 AS
        SELECT aid, bid, abalance, bbalance
        FROM pgbench_accounts JOIN pgbench_branches USING (bid)
        WHERE abalance > 0 OR bbalance > 0;
SELECT 5001054
test=# CREATE INCREMENTAL MATERIALIZED VIEW bench2 AS
        SELECT aid, bid, abalance, bbalance
        FROM pgbench_accounts JOIN pgbench_branches USING (bid)
        WHERE abalance > 0 OR bbalance > 0;
SELECT 5001054

The standard REFRESH of bench1 took more than 10 seconds.

test=# \timing 
Timing is on.
test=# REFRESH MATERIALIZED VIEW bench1 ;
REFRESH MATERIALIZED VIEW
Time: 11210.563 ms (00:11.211)

Create an index on the IVM matview (bench2).

test=# CREATE INDEX on bench2(aid,bid);
CREATE INDEX

Updating a tuple in pgbench_accounts took 18ms. After this, bench2
was updated automatically and correctly.

test=# SELECT * FROM bench2 WHERE aid = 1;
 aid | bid | abalance | bbalance 
-----+-----+----------+----------
   1 |   1 |       10 |       10
(1 row)

Time: 2.498 ms
test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1;
UPDATE 1
Time: 18.634 ms
test=# SELECT * FROM bench2 WHERE aid = 1;
 aid | bid | abalance | bbalance 
-----+-----+----------+----------
   1 |   1 |     1000 |       10
(1 row)

However, if there is not the index on bench2, it took 4 sec, so
appropriate indexes are needed on IVM matviews.

test=# DROP INDEX bench2_aid_bid_idx ;
DROP INDEX
Time: 10.613 ms
test=# UPDATE pgbench_accounts SET abalance = 2000 WHERE aid = 1;
UPDATE 1
Time: 3931.274 ms (00:03.931)

* Restrictions on view definition

This patch is still in Work-in-Progress and there are many restrictions
on the view definition query of matviews.

The current implementation supports views including selection, projection,
and inner join with or without DISTINCT. Aggregation and GROUP BY are not
supported yet, but I plan to deal with these by the first release. 
Self-join, subqueries, OUTER JOIN, CTE, window functions are not
considered well, either. I need more investigation on these type of views
although I found some papers explaining how to handle sub-queries and
outer-joins. 

These unsupported views should be checked when a matview is created, but
this is not implemented yet.  Hoshiai-san are working on this.

* Timing of view maintenance

This patch implements a kind of Immediate Maintenance, that is, a matview
is updated immediately when a base table is modified.  On other hand, in
"Deferred Maintenance", matviews are updated after the transaction, for
example, by the user command like REFRESH. 

For implementing "deferred", it is need to implement a mechanism to maintain
logs for recording changes of base tables and an algorithm to compute the
delta to be applied to matviews. 

In addition, there could be another implementation of Immediate Maintenance
in which matview is updated at the end of a transaction that modified base
table, rather than in AFTER trigger. Oracle supports this type of IVM. To
implement this, we will need a mechanism to maintain change logs on base
tables as well as Deferred maintenance.

* Counting algorithm implementation

There will be also discussions on counting-algorithm implementation.
Firstly, the current patch treats "__ivm_count__" as a special column name
in a somewhat ad hoc way. This is used when maintaining and accessing matviews,
and when "SELECT * FROM ..." is issued,  __ivm_count__ column is invisible for
users.  Maybe this name has to be inhibited in user tables. Is it acceptable
to use such columns for IVM, and is there better way, if not?

Secondly, a matview with duplicate tuples is replaces with a subquery which
uses generate_series function. It does not have to be generate_series, and we
can make a new set returning function for this. Anyway, this internal behaviour
is visible in EXPLAIN results as shown in Example 3. Also, there is a
performance impact because   estimated rows number is wrong, and what is worse,
the cost of join is not small when the size of matview is large. Therefore, we
might have to add a new plan node for selecting from matviews rather than using
such a special set returning function.

Ragards,
-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

WIP_immediate_IVM.patch

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

20 июня 2019 г., 07:44:10

Hi hackers,

Thank you for your many questions and feedbacks at PGCon 2019.
Attached is the patch rebased for the current master branch.

Regards,
Yugo Nagata

On Tue, 14 May 2019 15:46:48 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Mon, 1 Apr 2019 12:11:22 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > On Thu, 27 Dec 2018 21:57:26 +0900
> > Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > 
> > > Hi,
> > > 
> > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.  
> > 
> > I am now working on an initial patch for implementing IVM on PostgreSQL.
> > This enables materialized views to be updated incrementally after one
> > of their base tables is modified.
> 
> Attached is a WIP patch of Incremental View Maintenance (IVM).
> Major part is written by me, and changes in syntax and pg_class
> are Hoshiai-san's work.
> 
> Although this is sill a draft patch in work-in-progress, any
> suggestions or thoughts would be appreciated.
>  
> * What it is
> 
> This allows a kind of Immediate Maintenance of materialized views. if a 
> materialized view is created by CRATE INCREMENTAL MATERIALIZED VIEW command,
> the contents of the mateview is updated automatically and incrementally
> after base tables are updated. Noted this syntax is just tentative, so it
> may be changed.
> 
> ====== Example 1 ======
> postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0;
> SELECT 3
> postgres=# SELECT * FROM m;
>  i 
> ---
>  3
>  2
>  1
> (3 rows)
> 
> postgres=# INSERT INTO t0 VALUES (4);
> INSERT 0 1
> postgres=# SELECt * FROM m; -- automatically updated
>  i 
> ---
>  3
>  2
>  1
>  4
> (4 rows)
> =============================
> 
> This implementation also supports matviews including duplicate tuples or
> DISTINCT clause in its view definition query.  For example, even if a matview
> is defined with DISTINCT to remove duplication of tuples in a base table, this
> can perform incremental update of the matview properly. That is, the contents
> of the matview doesn't change when exiting tuples are inserted into the base
> tables, and a tuple in the matview is deleted only when duplicity of the
> corresponding tuple in the base table becomes zero.
> 
> This is due to "colunting alogorithm" in which the number of each tuple is
> stored in matviews as a special column value.
> 
> ====== Example 2 ======
> postgres=# SELECT * FROM t1;
>  id | t 
> ----+---
>   1 | A
>   2 | B
>   3 | C
>   4 | A
> (4 rows)
> 
> postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1;
> SELECT 3
> postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1;
> SELECT 3
> postgres=# SELECT * FROM m1; -- with duplicity
>  t 
> ---
>  A
>  A
>  C
>  B
> (4 rows)
> 
> postgres=# SELECT * FROM m2;
>  t 
> ---
>  A
>  B
>  C
> (3 rows)
> 
> postgres=# INSERT INTO t1 VALUES (5, 'B');
> INSERT 0 1
> postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C)
> DELETE 2
> postgres=# SELECT * FROM m1; -- one A left and one more B
>  t 
> ---
>  B
>  B
>  A
> (3 rows)
> 
> postgres=# SELECT * FROM m2; -- only C is removed
>  t 
> ---
>  B
>  A
> (2 rows)
> =============================
> 
> * How it works
> 
> 1. Creating matview
> 
> When a matview is created, AFTER triggers are internally created
> on its base tables. When the base tables is modified (INSERT, DELETE,
> UPDATE), the matview is updated incrementally in the trigger function.
> 
> When populating the matview, GROUP BY and count(*) are added to the
> view definition query before this is executed for counting duplicity
> of tuples in the matview. The result of count is stored in the matview
> as a special column named "__ivm_count__". 
> 
> 2. Maintenance of matview
> 
> When base tables are modified, the change set of the table can be
> referred as Ephemeral Named Relations (ENRs) thanks to Transition Table
> (a feature of trigger implemented since PG10). We can calculate the diff
> set of the matview by replacing the base table in the view definition
> query with the ENR (at least if it is Selection-Projection -Join view). 
> As well as view definition time, GROUP BY and count(*) is added in order
> to count the duplicity of tuples in the diff set. As a result, two diff
> sets (to be deleted from and to be inserted into the matview) are
> calculated, and the results are stored into temporary tables respectively.
> 
> The matiview is updated by merging these change sets. Instead of executing
> DELETE or INSERT simply, the values of __ivm_count__ column in the matview
> is decreased or increased. When the values becomes zero, the corresponding
> tuple is deleted from the matview.
> 
> 3. Access to matview
> 
> When SELECT is issued for IVM matviews defined with DISTINCT, all columns
> except to __ivm_count__ of each tuple in the matview are returned.  This is 
> correct because duplicity of tuples are eliminated by GROUP BY.
> 
> When DISTINCT is not used, SELECT for the IVM matviews returns each tuple
> __ivm_count__ times. Currently, this is implemented by rewriting the SELECT
> query to replace the matview RTE with a subquery which joins the matview
> and generate_series function as bellow. 
> 
>  SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__);
> 
> __ivm_count__ column is invisible for users when "SELECT * FROM ..." is
> issued, but users can see the value by specifying in target list explicitly.
> 
> ====== Example 3 ======
> postgres=# \d+ m1
>                                   Materialized view "public.m1"
>     Column     |  Type  | Collation | Nullable | Default | Storage  | Stats target | Description 
> ---------------+--------+-----------+----------+---------+----------+--------------+-------------
>  t             | text   |           |          |         | extended |              | 
>  __ivm_count__ | bigint |           |          |         | plain    |              | 
> View definition:
>  SELECT t1.t
>    FROM t1;
> Access method: heap
> 
> postgres=# \d+ m2
>                                   Materialized view "public.m2"
>     Column     |  Type  | Collation | Nullable | Default | Storage  | Stats target | Description 
> ---------------+--------+-----------+----------+---------+----------+--------------+-------------
>  t             | text   |           |          |         | extended |              | 
>  __ivm_count__ | bigint |           |          |         | plain    |              | 
> View definition:
>  SELECT DISTINCT t1.t
>    FROM t1;
> Access method: heap
> 
> postgres=# SELECT *, __ivm_count__ FROM m1;
>  t | __ivm_count__ 
> ---+---------------
>  B |             2
>  B |             2
>  A |             1
> (3 rows)
> 
> postgres=# SELECT *, __ivm_count__ FROM m2;
>  t | __ivm_count__ 
> ---+---------------
>  B |             2
>  A |             1
> (2 rows)
> 
> postgres=# EXPLAIN SELECT * FROM m1;
>                                   QUERY PLAN                                  
> ------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..61.03 rows=3000 width=2)
>    ->  Seq Scan on m1 mv  (cost=0.00..1.03 rows=3 width=10)
>    ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=0)
> (3 rows)
> =============================
> 
> * Simple Performance Evaluation
> 
> I confirmed that "incremental" update of matviews is more effective
> than the standard REFRESH by using simple exapmle.  I used tables
> of pgbench (SF=100) here.
> 
> Create two matviews, that is, without and with IVM.
> 
> test=# CREATE MATERIALIZED VIEW bench1 AS
>         SELECT aid, bid, abalance, bbalance
>         FROM pgbench_accounts JOIN pgbench_branches USING (bid)
>         WHERE abalance > 0 OR bbalance > 0;
> SELECT 5001054
> test=# CREATE INCREMENTAL MATERIALIZED VIEW bench2 AS
>         SELECT aid, bid, abalance, bbalance
>         FROM pgbench_accounts JOIN pgbench_branches USING (bid)
>         WHERE abalance > 0 OR bbalance > 0;
> SELECT 5001054
> 
> The standard REFRESH of bench1 took more than 10 seconds.
> 
> test=# \timing 
> Timing is on.
> test=# REFRESH MATERIALIZED VIEW bench1 ;
> REFRESH MATERIALIZED VIEW
> Time: 11210.563 ms (00:11.211)
> 
> Create an index on the IVM matview (bench2).
> 
> test=# CREATE INDEX on bench2(aid,bid);
> CREATE INDEX
> 
> Updating a tuple in pgbench_accounts took 18ms. After this, bench2
> was updated automatically and correctly.
> 
> test=# SELECT * FROM bench2 WHERE aid = 1;
>  aid | bid | abalance | bbalance 
> -----+-----+----------+----------
>    1 |   1 |       10 |       10
> (1 row)
> 
> Time: 2.498 ms
> test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1;
> UPDATE 1
> Time: 18.634 ms
> test=# SELECT * FROM bench2 WHERE aid = 1;
>  aid | bid | abalance | bbalance 
> -----+-----+----------+----------
>    1 |   1 |     1000 |       10
> (1 row)
> 
> However, if there is not the index on bench2, it took 4 sec, so
> appropriate indexes are needed on IVM matviews.
> 
> test=# DROP INDEX bench2_aid_bid_idx ;
> DROP INDEX
> Time: 10.613 ms
> test=# UPDATE pgbench_accounts SET abalance = 2000 WHERE aid = 1;
> UPDATE 1
> Time: 3931.274 ms (00:03.931)
> 
> * Restrictions on view definition
> 
> This patch is still in Work-in-Progress and there are many restrictions
> on the view definition query of matviews.
> 
> The current implementation supports views including selection, projection,
> and inner join with or without DISTINCT. Aggregation and GROUP BY are not
> supported yet, but I plan to deal with these by the first release. 
> Self-join, subqueries, OUTER JOIN, CTE, window functions are not
> considered well, either. I need more investigation on these type of views
> although I found some papers explaining how to handle sub-queries and
> outer-joins. 
> 
> These unsupported views should be checked when a matview is created, but
> this is not implemented yet.  Hoshiai-san are working on this.
> 
> * Timing of view maintenance
> 
> This patch implements a kind of Immediate Maintenance, that is, a matview
> is updated immediately when a base table is modified.  On other hand, in
> "Deferred Maintenance", matviews are updated after the transaction, for
> example, by the user command like REFRESH. 
> 
> For implementing "deferred", it is need to implement a mechanism to maintain
> logs for recording changes of base tables and an algorithm to compute the
> delta to be applied to matviews. 
>  
> In addition, there could be another implementation of Immediate Maintenance
> in which matview is updated at the end of a transaction that modified base
> table, rather than in AFTER trigger. Oracle supports this type of IVM. To
> implement this, we will need a mechanism to maintain change logs on base
> tables as well as Deferred maintenance.
> 
> * Counting algorithm implementation
> 
> There will be also discussions on counting-algorithm implementation.
> Firstly, the current patch treats "__ivm_count__" as a special column name
> in a somewhat ad hoc way. This is used when maintaining and accessing matviews,
> and when "SELECT * FROM ..." is issued,  __ivm_count__ column is invisible for
> users.  Maybe this name has to be inhibited in user tables. Is it acceptable
> to use such columns for IVM, and is there better way, if not?
> 
> Secondly, a matview with duplicate tuples is replaces with a subquery which
> uses generate_series function. It does not have to be generate_series, and we
> can make a new set returning function for this. Anyway, this internal behaviour
> is visible in EXPLAIN results as shown in Example 3. Also, there is a
> performance impact because   estimated rows number is wrong, and what is worse,
> the cost of join is not small when the size of matview is large. Therefore, we
> might have to add a new plan node for selecting from matviews rather than using
> such a special set returning function.
> 
> 
> Ragards,
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

WIP_immediate_IVM_v2.patch

Re: Implementing Incremental View Maintenance

От

Jim Finnerty

Дата:

21 июня 2019 г., 15:41:11

Hi Yugo,

    I'd like to compare the performance of your MV refresh algorithm versus
an approach that logs changes into an mv log table, and can then apply the
changes at some later point in time.  I'd like to handle the materialized
join view (mjv) case first, specifically a 2-way left outer join, with a UDF
in the SELECT list of the mjv.

    Does your refresh algorithm handle mjv's with connected join graphs that
consist entirely of inner and left outer joins?

    If so, I'd like to measure the overhead of your refresh algorithm on
pgbench, modified to include an mjv, versus a (hand coded) incremental
maintenance algorithm that uses mv log tables populated by ordinary
triggers.  We may also want to look at capturing the deltas using logical
replication, which ought to be faster than a trigger-based solution. 

    I have someone available to do the performance testing for another 2
months, so if you can connect with me off-list to coordinate, we can set up
the performance experiments and run them on our AWS clusters.

best regards,

    /Jim F



-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

28 июня 2019 г., 10:01:43

Hi Jim,

On Fri, 21 Jun 2019 08:41:11 -0700 (MST)
Jim Finnerty <jfinnert@amazon.com> wrote:

> Hi Yugo,
> 
>     I'd like to compare the performance of your MV refresh algorithm versus
> an approach that logs changes into an mv log table, and can then apply the
> changes at some later point in time.  I'd like to handle the materialized
> join view (mjv) case first, specifically a 2-way left outer join, with a UDF
> in the SELECT list of the mjv.

Do you mean you have your implementation of IVM that using log tables?
I'm so interested in this, and  I would appreciate it if you explain the  
detail.

>     Does your refresh algorithm handle mjv's with connected join graphs that
> consist entirely of inner and left outer joins?

>     If so, I'd like to measure the overhead of your refresh algorithm on
> pgbench, modified to include an mjv, versus a (hand coded) incremental
> maintenance algorithm that uses mv log tables populated by ordinary
> triggers.  We may also want to look at capturing the deltas using logical
> replication, which ought to be faster than a trigger-based solution. 

In the current our implementation, outer joins is not yet supported though
we plan to handle this in future. So,we would not be able to compare these 
directly in the same workload in the current status.

However, the current our implementation supports only the way to update 
materialized views in a trigger, and the performance of modifying base tables 
will be lower than the approach  which uses log tables. This is because queries 
to update materialized views are issued in the trigger. This is not only a 
overhead itself, but also takes a lock on a materialized view, which has an ]
impact on concurrent execution performance. 

In the previous our PoC, we implemented IVM using log tables, in which logs are 
captured by triggers and materialized views are update incrementally by a user 
command[1]. However, to implement log table approach, we need a infrastructure 
to maintain these logs. For example, which logs are necessary and which logs 
can be discarded, etc. We thought this is not trivial work, so we decided to 
start from the current approach which doesn't use log tables. We are now 
preparing to implement this in the next step because this is also needed to 
support deferred maintenance of views.

[1]
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/

I agree that capturing the deltas using logical decoding will be faster than 
using a trigger although we haven't yet consider this well.

Best regadrds,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

28 июня 2019 г., 10:56:20

Hi,

Attached is a WIP patch of IVM which supports some aggregate functions.

Currently, only count and sum are supported. Avg, min, or max is not supported 
although I think supporting this would not be so hard. 

As a restriction, expressions specified in GROUP BY must appear in the target 
list of views because tuples to be updated in MV are identified by using this 
group keys.


In the case of views without aggregate functions, only the number of tuple 
duplicates (__ivm_count__) are updated at incremental maintenance. On the other 
hand, in the case of vies with aggregations, the aggregated values are also 
updated. The way of update depends the kind of aggregate function.

In the case of sum (or agg functions except to count), NULL in input values is 
ignored, and this returns a null value when no rows are selected. To support 
this specification, the number of not-NULL input values is counted and stored 
in MV as a hidden column whose name is like __ivm_count_sum__, for example.

In the case of count, this returns zero when no rows are selected, and count(*) 
doesn't ignore NULL input. These specification are also supported.

Tuples to be updated in MV are identified by using keys specified by GROUP BY 
clause. However, in the case of aggregation without GROUP BY, there is only one 
tuple in the view, so keys are not uses to identify tuples.


In addition, a race condition which occurred in the previous version is 
prevented in this patch. In the previous version, when two translocations 
change a base tables concurrently, an anormal update of MV was possible because 
a change in one transaction was not visible for another transaction even in 
READ COMMITTED level. 

To prevent this, I fix this to take a lock in early stage of view maintenance 
to wait for concurrent transactions which are updating the same MV end. Also, 
we have to get the latest snapshot before computting delta tables because any 
changes which occurs in other transaction during lock waiting is not visible 
even in READ COMMITTED level.

In REPEATABLE READ or SERIALIZABLE level, don't wait a lock, and raise an error 
immediately to prevent anormal update. These solutions might be ugly, but 
something to prevent anormal update is anyway necessary. There may be better 
way.


Moreover, some regression test are added for aggregate functions support.
This is Hoshiai-san's work.

Although the code is not refined yet and I will need a deal of refactoring
and  reorganizing, I submitted this to share the current status.


* Exapmle (from regression test)

=======================================================================
(1) creating tables

CREATE TABLE mv_base_a (i int, j int);
INSERT INTO mv_base_a VALUES
  (1,10),
  (2,20),
  (3,30),
  (4,40),
  (5,50);


(2) Views sith SUM() and COUNT() aggregation function

BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j), COUNT(i)  FROM mv_base_a GROUP BY i;
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-----+-------
 1 |  10 |     1
 2 |  20 |     1
 3 |  30 |     1
 4 |  40 |     1
 5 |  50 |     1
(5 rows)

INSERT INTO mv_base_a VALUES(2,100);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-----+-------
 1 |  10 |     1
 2 | 120 |     2
 3 |  30 |     1
 4 |  40 |     1
 5 |  50 |     1
(5 rows)

UPDATE mv_base_a SET j = 200 WHERE (i,j) = (2,100);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-----+-------
 1 |  10 |     1
 2 | 220 |     2
 3 |  30 |     1
 4 |  40 |     1
 5 |  50 |     1
(5 rows)

DELETE FROM mv_base_a WHERE (i,j) = (2,200);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-----+-------
 1 |  10 |     1
 2 |  20 |     1
 3 |  30 |     1
 4 |  40 |     1
 5 |  50 |     1
(5 rows)

ROLLBACK;


(3) Views with COUNT(*) aggregation function

BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_agg AS SELECT i, SUM(j),COUNT(*)  FROM mv_base_a GROUP BY i;
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-----+-------
 1 |  10 |     1
 2 |  20 |     1
 3 |  30 |     1
 4 |  40 |     1
 5 |  50 |     1
(5 rows)

INSERT INTO mv_base_a VALUES(2,100);
SELECT * FROM mv_ivm_agg ORDER BY 1,2,3;
 i | sum | count
---+-----+-------
 1 |  10 |     1
 2 | 120 |     2
 3 |  30 |     1
 4 |  40 |     1
 5 |  50 |     1
(5 rows)

ROLLBACK;

(4) Views with aggregation function without GROUP clause

BEGIN;
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_group AS SELECT SUM(j)FROM mv_base_a;
SELECT * FROM mv_ivm_group ORDER BY 1;
 sum
-----
 150
(1 row)

INSERT INTO mv_base_a VALUES(6,20);
SELECT * FROM mv_ivm_group ORDER BY 1;
 sum
-----
 170
(1 row)
=======================================================================



On Thu, 20 Jun 2019 16:44:10 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi hackers,
> 
> Thank you for your many questions and feedbacks at PGCon 2019.
> Attached is the patch rebased for the current master branch.
> 
> Regards,
> Yugo Nagata
> 
> On Tue, 14 May 2019 15:46:48 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > On Mon, 1 Apr 2019 12:11:22 +0900
> > Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > 
> > > On Thu, 27 Dec 2018 21:57:26 +0900
> > > Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.  
> > > 
> > > I am now working on an initial patch for implementing IVM on PostgreSQL.
> > > This enables materialized views to be updated incrementally after one
> > > of their base tables is modified.
> > 
> > Attached is a WIP patch of Incremental View Maintenance (IVM).
> > Major part is written by me, and changes in syntax and pg_class
> > are Hoshiai-san's work.
> > 
> > Although this is sill a draft patch in work-in-progress, any
> > suggestions or thoughts would be appreciated.
> >  
> > * What it is
> > 
> > This allows a kind of Immediate Maintenance of materialized views. if a 
> > materialized view is created by CRATE INCREMENTAL MATERIALIZED VIEW command,
> > the contents of the mateview is updated automatically and incrementally
> > after base tables are updated. Noted this syntax is just tentative, so it
> > may be changed.
> > 
> > ====== Example 1 ======
> > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m AS SELECT * FROM t0;
> > SELECT 3
> > postgres=# SELECT * FROM m;
> >  i 
> > ---
> >  3
> >  2
> >  1
> > (3 rows)
> > 
> > postgres=# INSERT INTO t0 VALUES (4);
> > INSERT 0 1
> > postgres=# SELECt * FROM m; -- automatically updated
> >  i 
> > ---
> >  3
> >  2
> >  1
> >  4
> > (4 rows)
> > =============================
> > 
> > This implementation also supports matviews including duplicate tuples or
> > DISTINCT clause in its view definition query.  For example, even if a matview
> > is defined with DISTINCT to remove duplication of tuples in a base table, this
> > can perform incremental update of the matview properly. That is, the contents
> > of the matview doesn't change when exiting tuples are inserted into the base
> > tables, and a tuple in the matview is deleted only when duplicity of the
> > corresponding tuple in the base table becomes zero.
> > 
> > This is due to "colunting alogorithm" in which the number of each tuple is
> > stored in matviews as a special column value.
> > 
> > ====== Example 2 ======
> > postgres=# SELECT * FROM t1;
> >  id | t 
> > ----+---
> >   1 | A
> >   2 | B
> >   3 | C
> >   4 | A
> > (4 rows)
> > 
> > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m1 AS SELECT t FROM t1;
> > SELECT 3
> > postgres=# CREATE INCREMENTAL MATERIALIZED VIEW m2 AS SELECT DISTINCT t FROM t1;
> > SELECT 3
> > postgres=# SELECT * FROM m1; -- with duplicity
> >  t 
> > ---
> >  A
> >  A
> >  C
> >  B
> > (4 rows)
> > 
> > postgres=# SELECT * FROM m2;
> >  t 
> > ---
> >  A
> >  B
> >  C
> > (3 rows)
> > 
> > postgres=# INSERT INTO t1 VALUES (5, 'B');
> > INSERT 0 1
> > postgres=# DELETE FROM t1 WHERE id IN (1,3); -- delete (1,A),(3,C)
> > DELETE 2
> > postgres=# SELECT * FROM m1; -- one A left and one more B
> >  t 
> > ---
> >  B
> >  B
> >  A
> > (3 rows)
> > 
> > postgres=# SELECT * FROM m2; -- only C is removed
> >  t 
> > ---
> >  B
> >  A
> > (2 rows)
> > =============================
> > 
> > * How it works
> > 
> > 1. Creating matview
> > 
> > When a matview is created, AFTER triggers are internally created
> > on its base tables. When the base tables is modified (INSERT, DELETE,
> > UPDATE), the matview is updated incrementally in the trigger function.
> > 
> > When populating the matview, GROUP BY and count(*) are added to the
> > view definition query before this is executed for counting duplicity
> > of tuples in the matview. The result of count is stored in the matview
> > as a special column named "__ivm_count__". 
> > 
> > 2. Maintenance of matview
> > 
> > When base tables are modified, the change set of the table can be
> > referred as Ephemeral Named Relations (ENRs) thanks to Transition Table
> > (a feature of trigger implemented since PG10). We can calculate the diff
> > set of the matview by replacing the base table in the view definition
> > query with the ENR (at least if it is Selection-Projection -Join view). 
> > As well as view definition time, GROUP BY and count(*) is added in order
> > to count the duplicity of tuples in the diff set. As a result, two diff
> > sets (to be deleted from and to be inserted into the matview) are
> > calculated, and the results are stored into temporary tables respectively.
> > 
> > The matiview is updated by merging these change sets. Instead of executing
> > DELETE or INSERT simply, the values of __ivm_count__ column in the matview
> > is decreased or increased. When the values becomes zero, the corresponding
> > tuple is deleted from the matview.
> > 
> > 3. Access to matview
> > 
> > When SELECT is issued for IVM matviews defined with DISTINCT, all columns
> > except to __ivm_count__ of each tuple in the matview are returned.  This is 
> > correct because duplicity of tuples are eliminated by GROUP BY.
> > 
> > When DISTINCT is not used, SELECT for the IVM matviews returns each tuple
> > __ivm_count__ times. Currently, this is implemented by rewriting the SELECT
> > query to replace the matview RTE with a subquery which joins the matview
> > and generate_series function as bellow. 
> > 
> >  SELECT mv.* FROM mv, generate_series(1, mv.__ivm_count__);
> > 
> > __ivm_count__ column is invisible for users when "SELECT * FROM ..." is
> > issued, but users can see the value by specifying in target list explicitly.
> > 
> > ====== Example 3 ======
> > postgres=# \d+ m1
> >                                   Materialized view "public.m1"
> >     Column     |  Type  | Collation | Nullable | Default | Storage  | Stats target | Description 
> > ---------------+--------+-----------+----------+---------+----------+--------------+-------------
> >  t             | text   |           |          |         | extended |              | 
> >  __ivm_count__ | bigint |           |          |         | plain    |              | 
> > View definition:
> >  SELECT t1.t
> >    FROM t1;
> > Access method: heap
> > 
> > postgres=# \d+ m2
> >                                   Materialized view "public.m2"
> >     Column     |  Type  | Collation | Nullable | Default | Storage  | Stats target | Description 
> > ---------------+--------+-----------+----------+---------+----------+--------------+-------------
> >  t             | text   |           |          |         | extended |              | 
> >  __ivm_count__ | bigint |           |          |         | plain    |              | 
> > View definition:
> >  SELECT DISTINCT t1.t
> >    FROM t1;
> > Access method: heap
> > 
> > postgres=# SELECT *, __ivm_count__ FROM m1;
> >  t | __ivm_count__ 
> > ---+---------------
> >  B |             2
> >  B |             2
> >  A |             1
> > (3 rows)
> > 
> > postgres=# SELECT *, __ivm_count__ FROM m2;
> >  t | __ivm_count__ 
> > ---+---------------
> >  B |             2
> >  A |             1
> > (2 rows)
> > 
> > postgres=# EXPLAIN SELECT * FROM m1;
> >                                   QUERY PLAN                                  
> > ------------------------------------------------------------------------------
> >  Nested Loop  (cost=0.00..61.03 rows=3000 width=2)
> >    ->  Seq Scan on m1 mv  (cost=0.00..1.03 rows=3 width=10)
> >    ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=0)
> > (3 rows)
> > =============================
> > 
> > * Simple Performance Evaluation
> > 
> > I confirmed that "incremental" update of matviews is more effective
> > than the standard REFRESH by using simple exapmle.  I used tables
> > of pgbench (SF=100) here.
> > 
> > Create two matviews, that is, without and with IVM.
> > 
> > test=# CREATE MATERIALIZED VIEW bench1 AS
> >         SELECT aid, bid, abalance, bbalance
> >         FROM pgbench_accounts JOIN pgbench_branches USING (bid)
> >         WHERE abalance > 0 OR bbalance > 0;
> > SELECT 5001054
> > test=# CREATE INCREMENTAL MATERIALIZED VIEW bench2 AS
> >         SELECT aid, bid, abalance, bbalance
> >         FROM pgbench_accounts JOIN pgbench_branches USING (bid)
> >         WHERE abalance > 0 OR bbalance > 0;
> > SELECT 5001054
> > 
> > The standard REFRESH of bench1 took more than 10 seconds.
> > 
> > test=# \timing 
> > Timing is on.
> > test=# REFRESH MATERIALIZED VIEW bench1 ;
> > REFRESH MATERIALIZED VIEW
> > Time: 11210.563 ms (00:11.211)
> > 
> > Create an index on the IVM matview (bench2).
> > 
> > test=# CREATE INDEX on bench2(aid,bid);
> > CREATE INDEX
> > 
> > Updating a tuple in pgbench_accounts took 18ms. After this, bench2
> > was updated automatically and correctly.
> > 
> > test=# SELECT * FROM bench2 WHERE aid = 1;
> >  aid | bid | abalance | bbalance 
> > -----+-----+----------+----------
> >    1 |   1 |       10 |       10
> > (1 row)
> > 
> > Time: 2.498 ms
> > test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1;
> > UPDATE 1
> > Time: 18.634 ms
> > test=# SELECT * FROM bench2 WHERE aid = 1;
> >  aid | bid | abalance | bbalance 
> > -----+-----+----------+----------
> >    1 |   1 |     1000 |       10
> > (1 row)
> > 
> > However, if there is not the index on bench2, it took 4 sec, so
> > appropriate indexes are needed on IVM matviews.
> > 
> > test=# DROP INDEX bench2_aid_bid_idx ;
> > DROP INDEX
> > Time: 10.613 ms
> > test=# UPDATE pgbench_accounts SET abalance = 2000 WHERE aid = 1;
> > UPDATE 1
> > Time: 3931.274 ms (00:03.931)
> > 
> > * Restrictions on view definition
> > 
> > This patch is still in Work-in-Progress and there are many restrictions
> > on the view definition query of matviews.
> > 
> > The current implementation supports views including selection, projection,
> > and inner join with or without DISTINCT. Aggregation and GROUP BY are not
> > supported yet, but I plan to deal with these by the first release. 
> > Self-join, subqueries, OUTER JOIN, CTE, window functions are not
> > considered well, either. I need more investigation on these type of views
> > although I found some papers explaining how to handle sub-queries and
> > outer-joins. 
> > 
> > These unsupported views should be checked when a matview is created, but
> > this is not implemented yet.  Hoshiai-san are working on this.
> > 
> > * Timing of view maintenance
> > 
> > This patch implements a kind of Immediate Maintenance, that is, a matview
> > is updated immediately when a base table is modified.  On other hand, in
> > "Deferred Maintenance", matviews are updated after the transaction, for
> > example, by the user command like REFRESH. 
> > 
> > For implementing "deferred", it is need to implement a mechanism to maintain
> > logs for recording changes of base tables and an algorithm to compute the
> > delta to be applied to matviews. 
> >  
> > In addition, there could be another implementation of Immediate Maintenance
> > in which matview is updated at the end of a transaction that modified base
> > table, rather than in AFTER trigger. Oracle supports this type of IVM. To
> > implement this, we will need a mechanism to maintain change logs on base
> > tables as well as Deferred maintenance.
> > 
> > * Counting algorithm implementation
> > 
> > There will be also discussions on counting-algorithm implementation.
> > Firstly, the current patch treats "__ivm_count__" as a special column name
> > in a somewhat ad hoc way. This is used when maintaining and accessing matviews,
> > and when "SELECT * FROM ..." is issued,  __ivm_count__ column is invisible for
> > users.  Maybe this name has to be inhibited in user tables. Is it acceptable
> > to use such columns for IVM, and is there better way, if not?
> > 
> > Secondly, a matview with duplicate tuples is replaces with a subquery which
> > uses generate_series function. It does not have to be generate_series, and we
> > can make a new set returning function for this. Anyway, this internal behaviour
> > is visible in EXPLAIN results as shown in Example 3. Also, there is a
> > performance impact because   estimated rows number is wrong, and what is worse,
> > the cost of join is not small when the size of matview is large. Therefore, we
> > might have to add a new plan node for selecting from matviews rather than using
> > such a special set returning function.
> > 
> > 
> > Ragards,
> > -- 
> > Yugo Nagata <nagata@sraoss.co.jp>
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

WIP_immediate_IVM_v3.patch

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

28 июня 2019 г., 11:03:54

Hi Greg,

On Wed, 3 Apr 2019 17:41:36 -0400
Greg Stark <stark@mit.edu> wrote:

> On Sun, 31 Mar 2019 at 23:22, Yugo Nagata <nagata@sraoss.co.jp> wrote:
> >
> > Firstly, this will handle simple definition views which includes only
> > selection, projection, and join.  Standard aggregations (count, sum, avg,
> > min, max) are not planned to be implemented in the first patch, but these
> > are commonly used in materialized views, so I'll implement them later on.
> 
> It's fine to not have all the features from day 1 of course. But I
> just picked up this comment and the followup talking about splitting
> AVG into SUM and COUNT and I had a comment. When you do look at
> tackling aggregates I don't think you should restrict yourself to
> these specific standard aggregations. We have all the necessary
> abstractions to handle all aggregations that are feasible, see
> https://www.postgresql.org/docs/devel/xaggr.html#XAGGR-MOVING-AGGREGATES
> 
> What you need to do -- I think -- is store the "moving aggregate
> state" before the final function. Then whenever a row is inserted or
> deleted or updated (or whenever another column is updated which causes
> the value to row to enter or leave the aggregation) apply either
> aggtransfn or aggminvtransfn to the state. I'm not sure if you want to
> apply the final function on every update or only lazily either may be
> better in some usage.

Thank you for your suggestion! I submitted the latest patch just now supporting 
some aggregate functions, but this supports only sum and count, and lacking a 
kind of generalization. However,  I would like to refine this to support more 
general aggregate functions. I think your suggestions is helpful for me to do 
this. Thank you!

Best regards,
Yugo Nagata


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Thomas Munro

Дата:

08 июля 2019 г., 06:31:47

On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Attached is a WIP patch of IVM which supports some aggregate functions.

Hi Nagata-san and Hoshiai-san,

Thank you for working on this.  I enjoyed your talk at PGCon.  I've
added Kevin Grittner just in case he missed this thread; he has talked
often about implementing the counting algorithm, and he wrote the
"trigger transition tables" feature to support exactly this.  While
integrating trigger transition tables with the new partition features,
we had to make a number of decisions about how that should work, and
we tried to come up with answers that would work for IMV, and I hope
we made the right choices!

I am quite interested to learn how IVM interacts with SERIALIZABLE.

A couple of superficial review comments:

+            const char *aggname = get_func_name(aggref->aggfnoid);
...
+            else if (!strcmp(aggname, "sum"))

I guess you need a more robust way to detect the supported aggregates
than their name, or I guess some way for aggregates themselves to
specify that they support this and somehow supply the extra logic.
Perhaps I just waid what Greg Stark already said, except not as well.

+                            elog(ERROR, "Aggrege function %s is not
supported", aggname);

s/Aggrege/aggregate/

Of course it is not helpful to comment on typos at this early stage,
it's just that this one appears many times in the test output :-)

+static bool
+isIvmColumn(const char *s)
+{
+    char pre[7];
+
+     strlcpy(pre, s, sizeof(pre));
+    return (strcmp(pre, "__ivm_") == 0);
+}

What about strncmp(s, "__ivm_", 6) == 0)?  As for the question of how
to reserve a namespace for system columns that won't clash with user
columns, according to our manual the SQL standard doesn't allow $ in
identifier names, and according to my copy SQL92 "intermediate SQL"
doesn't allow identifiers that end in an underscore.  I don't know
what the best answer is but we should probably decide on a something
based the standard.

As for how to make internal columns invisible to SELECT *, previously
there have been discussions about doing that using a new flag in
pg_attribute:

https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

+                            "WITH t AS ("
+                            "  SELECT diff.__ivm_count__,
(diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
+                            ", %s"
+                            "  FROM %s AS mv, %s AS diff WHERE (%s) = (%s)"
+                            "), updt AS ("
+                            "  UPDATE %s AS mv SET __ivm_count__ =
mv.__ivm_count__ - t.__ivm_count__"
+                            ", %s "
+                            "  FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt"
+                            ") DELETE FROM %s AS mv USING t WHERE
mv.ctid = t.ctid AND for_dlt;",

I fully understand that this is POC code, but I am curious about one
thing.  These queries that are executed by apply_delta() would need to
be converted to C, or at least used reusable plans, right?  Hmm,
creating and dropping temporary tables every time is a clue that the
ultimate form of this should be tuplestores and C code, I think,
right?

> Moreover, some regression test are added for aggregate functions support.
> This is Hoshiai-san's work.

Great.  Next time you post a WIP patch, could you please fix this
small compiler warning?

describe.c: In function ‘describeOneTableDetails’:
describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm)
^
describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here
} tableinfo;
^

Then our unofficial automatic CI system[1] will run these tests every
day, which sometimes finds problems.

[1] cfbot.cputube.org

--
Thomas Munro
https://enterprisedb.com

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

08 июля 2019 г., 08:04:38

> As for how to make internal columns invisible to SELECT *, previously
> there have been discussions about doing that using a new flag in
> pg_attribute:
> 
> https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

Now that I realized that there are several use cases for invisible
columns, I think this is the way what we shoud go for.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

10 июля 2019 г., 02:07:15

Hi Thomas,

2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>:

On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Attached is a WIP patch of IVM which supports some aggregate functions.

Hi Nagata-san and Hoshiai-san,

Thank you for working on this. I enjoyed your talk at PGCon. I've
added Kevin Grittner just in case he missed this thread; he has talked
often about implementing the counting algorithm, and he wrote the
"trigger transition tables" feature to support exactly this. While
integrating trigger transition tables with the new partition features,
we had to make a number of decisions about how that should work, and
we tried to come up with answers that would work for IMV, and I hope
we made the right choices!

I am quite interested to learn how IVM interacts with SERIALIZABLE.

Nagata-san has been studying this. Nagata-san, any comment?

A couple of superficial review comments:

Thank you for your review comments.
Please find attached patches. The some of your review is reflected in patch too.

We manage and update IVM on following github repository.
https://github.com/sraoss/pgsql-ivm
you also can found latest WIP patch here.

+ const char *aggname = get_func_name(aggref->aggfnoid);
...
+ else if (!strcmp(aggname, "sum"))

I guess you need a more robust way to detect the supported aggregates
than their name, or I guess some way for aggregates themselves to
specify that they support this and somehow supply the extra logic.
Perhaps I just waid what Greg Stark already said, except not as well.

We have recognized the issue and are welcome any input.

+ elog(ERROR, "Aggrege function %s is not
supported", aggname);

s/Aggrege/aggregate/

I fixed this typo.

Of course it is not helpful to comment on typos at this early stage,
it's just that this one appears many times in the test output :-)

+static bool
+isIvmColumn(const char *s)
+{
+ char pre[7];
+
+ strlcpy(pre, s, sizeof(pre));
+ return (strcmp(pre, "__ivm_") == 0);
+}

What about strncmp(s, "__ivm_", 6) == 0)?

I agree with you, I fixed it.

As for the question of how
to reserve a namespace for system columns that won't clash with user
columns, according to our manual the SQL standard doesn't allow $ in
identifier names, and according to my copy SQL92 "intermediate SQL"
doesn't allow identifiers that end in an underscore. I don't know
what the best answer is but we should probably decide on a something
based the standard.

As for how to make internal columns invisible to SELECT *, previously
there have been discussions about doing that using a new flag in
pg_attribute:

https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

+ "WITH t AS ("
+ " SELECT diff.__ivm_count__,
(diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
+ ", %s"
+ " FROM %s AS mv, %s AS diff WHERE (%s) = (%s)"
+ "), updt AS ("
+ " UPDATE %s AS mv SET __ivm_count__ =
mv.__ivm_count__ - t.__ivm_count__"
+ ", %s "
+ " FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt"
+ ") DELETE FROM %s AS mv USING t WHERE
mv.ctid = t.ctid AND for_dlt;",

I fully understand that this is POC code, but I am curious about one
thing. These queries that are executed by apply_delta() would need to
be converted to C, or at least used reusable plans, right? Hmm,
creating and dropping temporary tables every time is a clue that the
ultimate form of this should be tuplestores and C code, I think,
right?

Nagata-san is investing the issue.

> Moreover, some regression test are added for aggregate functions support.
> This is Hoshiai-san's work.

Great. Next time you post a WIP patch, could you please fix this
small compiler warning?

describe.c: In function ‘describeOneTableDetails’:
describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm)
^
describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here
} tableinfo;
^

It is fixed too.

Then our unofficial automatic CI system[1] will run these tests every
day, which sometimes finds problems.

[1] cfbot.cputube.org

--
Thomas Munro
https://enterprisedb.com

Best regards,

Takuma Hoshiai

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

10 июля 2019 г., 02:29:38

On Wed, 10 Jul 2019 11:07:15 +0900
Takuma Hoshiai <takuma.hoshiai@gmail.com> wrote:

> Hi Thomas,
> 
> 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>:
> 
> > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > Attached is a WIP patch of IVM which supports some aggregate functions.
> >
> > Hi Nagata-san and Hoshiai-san,
> >
> > Thank you for working on this.  I enjoyed your talk at PGCon.  I've
> > added Kevin Grittner just in case he missed this thread; he has talked
> > often about implementing the counting algorithm, and he wrote the
> > "trigger transition tables" feature to support exactly this.  While
> > integrating trigger transition tables with the new partition features,
> > we had to make a number of decisions about how that should work, and
> > we tried to come up with answers that would work for IMV, and I hope
> > we made the right choices!
> >
> > I am quite interested to learn how IVM interacts with SERIALIZABLE.
> >
> 
>  Nagata-san has been studying this. Nagata-san, any comment?
> 
> > A couple of superficial review comments:
> 
> Thank you for your review comments.
> Please find attached patches. The some of your review is reflected in patch
> too.

Sorry, I forgot attaching patch.
In addition, avg() function is supported newly. We found a issue
when use avg() with IVM, added its reproduction case in
regressio test. We are being to fix now.

> We manage and update IVM on following github repository.
> https://github.com/sraoss/pgsql-ivm
> you also can found latest WIP patch here.
> 
> 
> > +            const char *aggname = get_func_name(aggref->aggfnoid);
> > ...
> > +            else if (!strcmp(aggname, "sum"))
> >
> > I guess you need a more robust way to detect the supported aggregates
> > than their name, or I guess some way for aggregates themselves to
> > specify that they support this and somehow supply the extra logic.
> > Perhaps I just waid what Greg Stark already said, except not as well.
> >
> 
> We have recognized the issue and are welcome any input.
> 
> > +                            elog(ERROR, "Aggrege function %s is not
> > supported", aggname);
> >
> > s/Aggrege/aggregate/
> >
> 
> I fixed this typo.
> 
> > Of course it is not helpful to comment on typos at this early stage,
> > it's just that this one appears many times in the test output :-)
> >
> > +static bool
> > +isIvmColumn(const char *s)
> > +{
> > +    char pre[7];
> > +
> > +     strlcpy(pre, s, sizeof(pre));
> > +    return (strcmp(pre, "__ivm_") == 0);
> > +}
> >
> > What about strncmp(s, "__ivm_", 6) == 0)?
> 
> 
> I agree with you, I fixed it.
> 
> > As for the question of how
> > to reserve a namespace for system columns that won't clash with user
> > columns, according to our manual the SQL standard doesn't allow $ in
> > identifier names, and according to my copy SQL92 "intermediate SQL"
> > doesn't allow identifiers that end in an underscore.  I don't know
> > what the best answer is but we should probably decide on a something
> > based the standard.
> >
> > As for how to make internal columns invisible to SELECT *, previously
> > there have been discussions about doing that using a new flag in
> > pg_attribute:
> >
> >
> >
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
> >
> > +                            "WITH t AS ("
> > +                            "  SELECT diff.__ivm_count__,
> > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
> > +                            ", %s"
> > +                            "  FROM %s AS mv, %s AS diff WHERE (%s) =
> > (%s)"
> > +                            "), updt AS ("
> > +                            "  UPDATE %s AS mv SET __ivm_count__ =
> > mv.__ivm_count__ - t.__ivm_count__"
> > +                            ", %s "
> > +                            "  FROM t WHERE mv.ctid = t.ctid AND NOT
> > for_dlt"
> > +                            ") DELETE FROM %s AS mv USING t WHERE
> > mv.ctid = t.ctid AND for_dlt;",
> >
> > I fully understand that this is POC code, but I am curious about one
> > thing.  These queries that are executed by apply_delta() would need to
> > be converted to C, or at least used reusable plans, right?  Hmm,
> > creating and dropping temporary tables every time is a clue that the
> > ultimate form of this should be tuplestores and C code, I think,
> > right?
> >
> 
> Nagata-san is investing the issue.
> 
> 
> > > Moreover, some regression test are added for aggregate functions support.
> > > This is Hoshiai-san's work.
> >
> > Great.  Next time you post a WIP patch, could you please fix this
> > small compiler warning?
> >
> > describe.c: In function ‘describeOneTableDetails’:
> > describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used
> > uninitialized in this function [-Werror=maybe-uninitialized]
> > if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm)
> > ^
> > describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here
> > } tableinfo;
> > ^
> >
> 
> It is fixed too.
> 
> > Then our unofficial automatic CI system[1] will run these tests every
> > day, which sometimes finds problems.
> >
> > [1] cfbot.cputube.org
> >
> > --
> > Thomas Munro
> > https://enterprisedb.com
> >
> >
> Best regards,
> 
> Takuma Hoshiai


-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Вложения

WIP_immediate_IVM_v4.patch

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

11 июля 2019 г., 02:39:36

> I am quite interested to learn how IVM interacts with SERIALIZABLE.

Just for a fun, I have added:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

right after every BEGIN; in incremental_matview.sql in regression test
and it seems it works.

> A couple of superficial review comments:
> 
> +            const char *aggname = get_func_name(aggref->aggfnoid);
> ...
> +            else if (!strcmp(aggname, "sum"))
> 
> I guess you need a more robust way to detect the supported aggregates
> than their name, or I guess some way for aggregates themselves to
> specify that they support this and somehow supply the extra logic.
> Perhaps I just waid what Greg Stark already said, except not as well.

I guess we could use moving aggregate (or partial aggregate?)
functions for this purpose, but then we need to run executor directly
rather using SPI. It needs more codes...

> +                            "WITH t AS ("
> +                            "  SELECT diff.__ivm_count__,
> (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
> +                            ", %s"
> +                            "  FROM %s AS mv, %s AS diff WHERE (%s) = (%s)"
> +                            "), updt AS ("
> +                            "  UPDATE %s AS mv SET __ivm_count__ =
> mv.__ivm_count__ - t.__ivm_count__"
> +                            ", %s "
> +                            "  FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt"
> +                            ") DELETE FROM %s AS mv USING t WHERE
> mv.ctid = t.ctid AND for_dlt;",
> 
> I fully understand that this is POC code, but I am curious about one
> thing.  These queries that are executed by apply_delta() would need to
> be converted to C, or at least used reusable plans, right?  Hmm,
> creating and dropping temporary tables every time is a clue that the
> ultimate form of this should be tuplestores and C code, I think,
> right?

Yes, we could reuse the temp tables and plans.

> Then our unofficial automatic CI system[1] will run these tests every
> day, which sometimes finds problems.
> 
> [1] cfbot.cputube.org

I appreciate that you provide the system.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

11 июля 2019 г., 04:28:04

Hi Thomas,

Thank you for your review and discussion on this patch!

> > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>:
> > 
> > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > > Attached is a WIP patch of IVM which supports some aggregate functions.
> > >
> > > Hi Nagata-san and Hoshiai-san,
> > >
> > > Thank you for working on this.  I enjoyed your talk at PGCon.  I've
> > > added Kevin Grittner just in case he missed this thread; he has talked
> > > often about implementing the counting algorithm, and he wrote the
> > > "trigger transition tables" feature to support exactly this.  While
> > > integrating trigger transition tables with the new partition features,
> > > we had to make a number of decisions about how that should work, and
> > > we tried to come up with answers that would work for IMV, and I hope
> > > we made the right choices!

Transition tables is a great feature. I am now using this in my implementation
of IVM, but the first time I used this feature was when I implemented a PoC
for extending view updatability of PostgreSQL[1]. At that time, I didn't know
that this feature is made originally aiming to support IVM. 

[1] https://www.pgcon.org/2017/schedule/events/1074.en.html

I think transition tables is a good choice to implement a statement level
immediate view maintenance where materialized views are refreshed in a statement
level after trigger. However, when implementing a transaction level immediate
view maintenance where views are refreshed per transaction, or deferred view
maintenance, we can't update views in a after trigger, and we will need an
infrastructure to manage change logs of base tables. Transition tables can be
used to collect these logs, but using logical decoding of WAL is another candidate.
In any way, if these logs can be collected in a tuplestore, we might able to
convert this to "ephemeral named relation (ENR)" and use this to calculate diff
sets for views.

> > >
> > > I am quite interested to learn how IVM interacts with SERIALIZABLE.
> > >
> > 
> >  Nagata-san has been studying this. Nagata-san, any comment?

In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other 
ransactions are not visible, so views can not be maintained correctly in AFTER
triggers. If a view is defined on two tables and each table is modified in
different concurrent transactions respectively, the result of view maintenance
done in trigger functions can be incorrect due to the race condition. This is the
reason why such transactions are aborted immediately in that case in my current
implementation.

One idea to resolve this is performing view maintenance in two phases. Firstly, 
views are updated using only table changes visible in this transaction. Then, 
just after this transaction is committed, views have to be updated additionally 
using changes happened in other transactions to keep consistency. This is a just 
idea, but  to implement this idea, I think, we will need keep to keep and 
maintain change logs.

> > > A couple of superficial review comments:

> > > +            const char *aggname = get_func_name(aggref->aggfnoid);
> > > ...
> > > +            else if (!strcmp(aggname, "sum"))
> > >
> > > I guess you need a more robust way to detect the supported aggregates
> > > than their name, or I guess some way for aggregates themselves to
> > > specify that they support this and somehow supply the extra logic.
> > > Perhaps I just waid what Greg Stark already said, except not as well.

Yes. Using name is not robust because users can make same name aggregates like 
sum(text) (although I am not sure this makes sense). We can use oids instead 
of names, but it would be nice to extend pg_aggregate and add new attributes 
for informing that this supports IVM and for providing functions for IVM logic.

> > > As for the question of how
> > > to reserve a namespace for system columns that won't clash with user
> > > columns, according to our manual the SQL standard doesn't allow $ in
> > > identifier names, and according to my copy SQL92 "intermediate SQL"
> > > doesn't allow identifiers that end in an underscore.  I don't know
> > > what the best answer is but we should probably decide on a something
> > > based the standard.

Ok, so we should use "__ivm_count__" since this ends in "_" at least.

Another idea is that users specify the name of this special column when 
defining materialized views with IVM support. This way can avoid the conflict 
because users will specify a name which does not appear in the target list.

As for aggregates supports, it may be also possible to make it a restriction 
that count(expr) must be in the target list explicitly when sum(expr) or 
avg(expr) is included, instead of making hidden column like __ivm_count_sum__,
like Oracle does.

> > >
> > > As for how to make internal columns invisible to SELECT *, previously
> > > there have been discussions about doing that using a new flag in
> > > pg_attribute:
> > >
> > >
> > >
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

I agree implementing this feature in PostgreSQL since there are at least a few
use cases, IVM and temporal database.

> > >
> > > +                            "WITH t AS ("
> > > +                            "  SELECT diff.__ivm_count__,
> > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
> > > +                            ", %s"
> > > +                            "  FROM %s AS mv, %s AS diff WHERE (%s) =
> > > (%s)"
> > > +                            "), updt AS ("
> > > +                            "  UPDATE %s AS mv SET __ivm_count__ =
> > > mv.__ivm_count__ - t.__ivm_count__"
> > > +                            ", %s "
> > > +                            "  FROM t WHERE mv.ctid = t.ctid AND NOT
> > > for_dlt"
> > > +                            ") DELETE FROM %s AS mv USING t WHERE
> > > mv.ctid = t.ctid AND for_dlt;",
> > >
> > > I fully understand that this is POC code, but I am curious about one
> > > thing.  These queries that are executed by apply_delta() would need to
> > > be converted to C, or at least used reusable plans, right?  Hmm,
> > > creating and dropping temporary tables every time is a clue that the
> > > ultimate form of this should be tuplestores and C code, I think,
> > > right?

I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said,
it is inefficient to create/drop temp tables and perform parse/plan every times.
It seems to be enough to perform this once when creating materialized views or 
at the first maintenance time.

Best regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

26 июля 2019 г., 02:35:53

Hi,

I've updated the wiki page of Incremental View Maintenance.

https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

On Thu, 11 Jul 2019 13:28:04 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi Thomas,
> 
> Thank you for your review and discussion on this patch!
> 
> > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>:
> > > 
> > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > > > Attached is a WIP patch of IVM which supports some aggregate functions.
> > > >
> > > > Hi Nagata-san and Hoshiai-san,
> > > >
> > > > Thank you for working on this.  I enjoyed your talk at PGCon.  I've
> > > > added Kevin Grittner just in case he missed this thread; he has talked
> > > > often about implementing the counting algorithm, and he wrote the
> > > > "trigger transition tables" feature to support exactly this.  While
> > > > integrating trigger transition tables with the new partition features,
> > > > we had to make a number of decisions about how that should work, and
> > > > we tried to come up with answers that would work for IMV, and I hope
> > > > we made the right choices!
> 
> Transition tables is a great feature. I am now using this in my implementation
> of IVM, but the first time I used this feature was when I implemented a PoC
> for extending view updatability of PostgreSQL[1]. At that time, I didn't know
> that this feature is made originally aiming to support IVM. 
> 
> [1] https://www.pgcon.org/2017/schedule/events/1074.en.html
> 
> I think transition tables is a good choice to implement a statement level
> immediate view maintenance where materialized views are refreshed in a statement
> level after trigger. However, when implementing a transaction level immediate
> view maintenance where views are refreshed per transaction, or deferred view
> maintenance, we can't update views in a after trigger, and we will need an
> infrastructure to manage change logs of base tables. Transition tables can be
> used to collect these logs, but using logical decoding of WAL is another candidate.
> In any way, if these logs can be collected in a tuplestore, we might able to
> convert this to "ephemeral named relation (ENR)" and use this to calculate diff
> sets for views.
> 
> > > >
> > > > I am quite interested to learn how IVM interacts with SERIALIZABLE.
> > > >
> > > 
> > >  Nagata-san has been studying this. Nagata-san, any comment?
> 
> In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other 
> ransactions are not visible, so views can not be maintained correctly in AFTER
> triggers. If a view is defined on two tables and each table is modified in
> different concurrent transactions respectively, the result of view maintenance
> done in trigger functions can be incorrect due to the race condition. This is the
> reason why such transactions are aborted immediately in that case in my current
> implementation.
> 
> One idea to resolve this is performing view maintenance in two phases. Firstly, 
> views are updated using only table changes visible in this transaction. Then, 
> just after this transaction is committed, views have to be updated additionally 
> using changes happened in other transactions to keep consistency. This is a just 
> idea, but  to implement this idea, I think, we will need keep to keep and 
> maintain change logs.
> 
> > > > A couple of superficial review comments:
> 
> 
>  
> > > > +            const char *aggname = get_func_name(aggref->aggfnoid);
> > > > ...
> > > > +            else if (!strcmp(aggname, "sum"))
> > > >
> > > > I guess you need a more robust way to detect the supported aggregates
> > > > than their name, or I guess some way for aggregates themselves to
> > > > specify that they support this and somehow supply the extra logic.
> > > > Perhaps I just waid what Greg Stark already said, except not as well.
> 
> Yes. Using name is not robust because users can make same name aggregates like 
> sum(text) (although I am not sure this makes sense). We can use oids instead 
> of names, but it would be nice to extend pg_aggregate and add new attributes 
> for informing that this supports IVM and for providing functions for IVM logic.
> 
> > > > As for the question of how
> > > > to reserve a namespace for system columns that won't clash with user
> > > > columns, according to our manual the SQL standard doesn't allow $ in
> > > > identifier names, and according to my copy SQL92 "intermediate SQL"
> > > > doesn't allow identifiers that end in an underscore.  I don't know
> > > > what the best answer is but we should probably decide on a something
> > > > based the standard.
> 
> Ok, so we should use "__ivm_count__" since this ends in "_" at least.
> 
> Another idea is that users specify the name of this special column when 
> defining materialized views with IVM support. This way can avoid the conflict 
> because users will specify a name which does not appear in the target list.
> 
> As for aggregates supports, it may be also possible to make it a restriction 
> that count(expr) must be in the target list explicitly when sum(expr) or 
> avg(expr) is included, instead of making hidden column like __ivm_count_sum__,
> like Oracle does.
> 
> > > >
> > > > As for how to make internal columns invisible to SELECT *, previously
> > > > there have been discussions about doing that using a new flag in
> > > > pg_attribute:
> > > >
> > > >
> > > >
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
> 
> I agree implementing this feature in PostgreSQL since there are at least a few
> use cases, IVM and temporal database.
> 
> > > >
> > > > +                            "WITH t AS ("
> > > > +                            "  SELECT diff.__ivm_count__,
> > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
> > > > +                            ", %s"
> > > > +                            "  FROM %s AS mv, %s AS diff WHERE (%s) =
> > > > (%s)"
> > > > +                            "), updt AS ("
> > > > +                            "  UPDATE %s AS mv SET __ivm_count__ =
> > > > mv.__ivm_count__ - t.__ivm_count__"
> > > > +                            ", %s "
> > > > +                            "  FROM t WHERE mv.ctid = t.ctid AND NOT
> > > > for_dlt"
> > > > +                            ") DELETE FROM %s AS mv USING t WHERE
> > > > mv.ctid = t.ctid AND for_dlt;",
> > > >
> > > > I fully understand that this is POC code, but I am curious about one
> > > > thing.  These queries that are executed by apply_delta() would need to
> > > > be converted to C, or at least used reusable plans, right?  Hmm,
> > > > creating and dropping temporary tables every time is a clue that the
> > > > ultimate form of this should be tuplestores and C code, I think,
> > > > right?
> 
> I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said,
> it is inefficient to create/drop temp tables and perform parse/plan every times.
> It seems to be enough to perform this once when creating materialized views or 
> at the first maintenance time.
> 
> 
> Best regards,
> Yugo Nagata
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

31 июля 2019 г., 09:08:51

Hi,

Attached is the latest patch for supporting min and max aggregate functions.

When new tuples are inserted into base tables, if new values are small
(for min) or large (for max), matview just have to be updated with these
new values. Otherwise, old values just remains.
    
However, in the case of deletion, this is more complicated. If deleted
values exists in matview as current min or max, we have to recomputate
new min or max values from base tables for affected groups, and matview
should be updated with these recomputated values. 

Also, regression tests for min/max are also added.

In addition, incremental update algorithm of avg aggregate values is a bit
improved. If an avg result in materialized views is updated incrementally
y using the old avg value, numerical errors in avg values are accumulated
and the values get wrong eventually. To prevent this, both of sum and count
values are contained in views as hidden columns and use them to calculate
new avg value instead of using old avg values.

Regards,

On Fri, 26 Jul 2019 11:35:53 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi,
> 
> I've updated the wiki page of Incremental View Maintenance.
> 
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
> 
> On Thu, 11 Jul 2019 13:28:04 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > Hi Thomas,
> > 
> > Thank you for your review and discussion on this patch!
> > 
> > > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>:
> > > > 
> > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > > > > > Attached is a WIP patch of IVM which supports some aggregate functions.
> > > > >
> > > > > Hi Nagata-san and Hoshiai-san,
> > > > >
> > > > > Thank you for working on this.  I enjoyed your talk at PGCon.  I've
> > > > > added Kevin Grittner just in case he missed this thread; he has talked
> > > > > often about implementing the counting algorithm, and he wrote the
> > > > > "trigger transition tables" feature to support exactly this.  While
> > > > > integrating trigger transition tables with the new partition features,
> > > > > we had to make a number of decisions about how that should work, and
> > > > > we tried to come up with answers that would work for IMV, and I hope
> > > > > we made the right choices!
> > 
> > Transition tables is a great feature. I am now using this in my implementation
> > of IVM, but the first time I used this feature was when I implemented a PoC
> > for extending view updatability of PostgreSQL[1]. At that time, I didn't know
> > that this feature is made originally aiming to support IVM. 
> > 
> > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html
> > 
> > I think transition tables is a good choice to implement a statement level
> > immediate view maintenance where materialized views are refreshed in a statement
> > level after trigger. However, when implementing a transaction level immediate
> > view maintenance where views are refreshed per transaction, or deferred view
> > maintenance, we can't update views in a after trigger, and we will need an
> > infrastructure to manage change logs of base tables. Transition tables can be
> > used to collect these logs, but using logical decoding of WAL is another candidate.
> > In any way, if these logs can be collected in a tuplestore, we might able to
> > convert this to "ephemeral named relation (ENR)" and use this to calculate diff
> > sets for views.
> > 
> > > > >
> > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE.
> > > > >
> > > > 
> > > >  Nagata-san has been studying this. Nagata-san, any comment?
> > 
> > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other 
> > ransactions are not visible, so views can not be maintained correctly in AFTER
> > triggers. If a view is defined on two tables and each table is modified in
> > different concurrent transactions respectively, the result of view maintenance
> > done in trigger functions can be incorrect due to the race condition. This is the
> > reason why such transactions are aborted immediately in that case in my current
> > implementation.
> > 
> > One idea to resolve this is performing view maintenance in two phases. Firstly, 
> > views are updated using only table changes visible in this transaction. Then, 
> > just after this transaction is committed, views have to be updated additionally 
> > using changes happened in other transactions to keep consistency. This is a just 
> > idea, but  to implement this idea, I think, we will need keep to keep and 
> > maintain change logs.
> > 
> > > > > A couple of superficial review comments:
> > 
> > 
> >  
> > > > > +            const char *aggname = get_func_name(aggref->aggfnoid);
> > > > > ...
> > > > > +            else if (!strcmp(aggname, "sum"))
> > > > >
> > > > > I guess you need a more robust way to detect the supported aggregates
> > > > > than their name, or I guess some way for aggregates themselves to
> > > > > specify that they support this and somehow supply the extra logic.
> > > > > Perhaps I just waid what Greg Stark already said, except not as well.
> > 
> > Yes. Using name is not robust because users can make same name aggregates like 
> > sum(text) (although I am not sure this makes sense). We can use oids instead 
> > of names, but it would be nice to extend pg_aggregate and add new attributes 
> > for informing that this supports IVM and for providing functions for IVM logic.
> > 
> > > > > As for the question of how
> > > > > to reserve a namespace for system columns that won't clash with user
> > > > > columns, according to our manual the SQL standard doesn't allow $ in
> > > > > identifier names, and according to my copy SQL92 "intermediate SQL"
> > > > > doesn't allow identifiers that end in an underscore.  I don't know
> > > > > what the best answer is but we should probably decide on a something
> > > > > based the standard.
> > 
> > Ok, so we should use "__ivm_count__" since this ends in "_" at least.
> > 
> > Another idea is that users specify the name of this special column when 
> > defining materialized views with IVM support. This way can avoid the conflict 
> > because users will specify a name which does not appear in the target list.
> > 
> > As for aggregates supports, it may be also possible to make it a restriction 
> > that count(expr) must be in the target list explicitly when sum(expr) or 
> > avg(expr) is included, instead of making hidden column like __ivm_count_sum__,
> > like Oracle does.
> > 
> > > > >
> > > > > As for how to make internal columns invisible to SELECT *, previously
> > > > > there have been discussions about doing that using a new flag in
> > > > > pg_attribute:
> > > > >
> > > > >
> > > > >
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
> > 
> > I agree implementing this feature in PostgreSQL since there are at least a few
> > use cases, IVM and temporal database.
> > 
> > > > >
> > > > > +                            "WITH t AS ("
> > > > > +                            "  SELECT diff.__ivm_count__,
> > > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
> > > > > +                            ", %s"
> > > > > +                            "  FROM %s AS mv, %s AS diff WHERE (%s) =
> > > > > (%s)"
> > > > > +                            "), updt AS ("
> > > > > +                            "  UPDATE %s AS mv SET __ivm_count__ =
> > > > > mv.__ivm_count__ - t.__ivm_count__"
> > > > > +                            ", %s "
> > > > > +                            "  FROM t WHERE mv.ctid = t.ctid AND NOT
> > > > > for_dlt"
> > > > > +                            ") DELETE FROM %s AS mv USING t WHERE
> > > > > mv.ctid = t.ctid AND for_dlt;",
> > > > >
> > > > > I fully understand that this is POC code, but I am curious about one
> > > > > thing.  These queries that are executed by apply_delta() would need to
> > > > > be converted to C, or at least used reusable plans, right?  Hmm,
> > > > > creating and dropping temporary tables every time is a clue that the
> > > > > ultimate form of this should be tuplestores and C code, I think,
> > > > > right?
> > 
> > I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said,
> > it is inefficient to create/drop temp tables and perform parse/plan every times.
> > It seems to be enough to perform this once when creating materialized views or 
> > at the first maintenance time.
> > 
> > 
> > Best regards,
> > Yugo Nagata
> > 
> > 
> > -- 
> > Yugo Nagata <nagata@sraoss.co.jp>
> > 
> > 
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

WIP_immediate_IVM_v5.patch

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

06 августа 2019 г., 00:25:02

It's not mentioned below but some bugs including seg fault when
--enable-casser is enabled was also fixed in this patch.

BTW, I found a bug with min/max support in this patch and I believe
Yugo is working on it. Details:
https://github.com/sraoss/pgsql-ivm/issues/20

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

From: Yugo Nagata <nagata@sraoss.co.jp>
Subject: Re: Implementing Incremental View Maintenance
Date: Wed, 31 Jul 2019 18:08:51 +0900
Message-ID: <20190731180851.73856441d8abb494bf5e68e7@sraoss.co.jp>

> Hi,
> 
> Attached is the latest patch for supporting min and max aggregate functions.
> 
> When new tuples are inserted into base tables, if new values are small
> (for min) or large (for max), matview just have to be updated with these
> new values. Otherwise, old values just remains.
>     
> However, in the case of deletion, this is more complicated. If deleted
> values exists in matview as current min or max, we have to recomputate
> new min or max values from base tables for affected groups, and matview
> should be updated with these recomputated values. 
> 
> Also, regression tests for min/max are also added.
> 
> In addition, incremental update algorithm of avg aggregate values is a bit
> improved. If an avg result in materialized views is updated incrementally
> y using the old avg value, numerical errors in avg values are accumulated
> and the values get wrong eventually. To prevent this, both of sum and count
> values are contained in views as hidden columns and use them to calculate
> new avg value instead of using old avg values.
> 
> Regards,
> 
> On Fri, 26 Jul 2019 11:35:53 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
>> Hi,
>> 
>> I've updated the wiki page of Incremental View Maintenance.
>> 
>> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
>> 
>> On Thu, 11 Jul 2019 13:28:04 +0900
>> Yugo Nagata <nagata@sraoss.co.jp> wrote:
>> 
>> > Hi Thomas,
>> > 
>> > Thank you for your review and discussion on this patch!
>> > 
>> > > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.munro@gmail.com>:
>> > > > 
>> > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>> > > > > > Attached is a WIP patch of IVM which supports some aggregate functions.
>> > > > >
>> > > > > Hi Nagata-san and Hoshiai-san,
>> > > > >
>> > > > > Thank you for working on this.  I enjoyed your talk at PGCon.  I've
>> > > > > added Kevin Grittner just in case he missed this thread; he has talked
>> > > > > often about implementing the counting algorithm, and he wrote the
>> > > > > "trigger transition tables" feature to support exactly this.  While
>> > > > > integrating trigger transition tables with the new partition features,
>> > > > > we had to make a number of decisions about how that should work, and
>> > > > > we tried to come up with answers that would work for IMV, and I hope
>> > > > > we made the right choices!
>> > 
>> > Transition tables is a great feature. I am now using this in my implementation
>> > of IVM, but the first time I used this feature was when I implemented a PoC
>> > for extending view updatability of PostgreSQL[1]. At that time, I didn't know
>> > that this feature is made originally aiming to support IVM. 
>> > 
>> > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html
>> > 
>> > I think transition tables is a good choice to implement a statement level
>> > immediate view maintenance where materialized views are refreshed in a statement
>> > level after trigger. However, when implementing a transaction level immediate
>> > view maintenance where views are refreshed per transaction, or deferred view
>> > maintenance, we can't update views in a after trigger, and we will need an
>> > infrastructure to manage change logs of base tables. Transition tables can be
>> > used to collect these logs, but using logical decoding of WAL is another candidate.
>> > In any way, if these logs can be collected in a tuplestore, we might able to
>> > convert this to "ephemeral named relation (ENR)" and use this to calculate diff
>> > sets for views.
>> > 
>> > > > >
>> > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE.
>> > > > >
>> > > > 
>> > > >  Nagata-san has been studying this. Nagata-san, any comment?
>> > 
>> > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other 
>> > ransactions are not visible, so views can not be maintained correctly in AFTER
>> > triggers. If a view is defined on two tables and each table is modified in
>> > different concurrent transactions respectively, the result of view maintenance
>> > done in trigger functions can be incorrect due to the race condition. This is the
>> > reason why such transactions are aborted immediately in that case in my current
>> > implementation.
>> > 
>> > One idea to resolve this is performing view maintenance in two phases. Firstly, 
>> > views are updated using only table changes visible in this transaction. Then, 
>> > just after this transaction is committed, views have to be updated additionally 
>> > using changes happened in other transactions to keep consistency. This is a just 
>> > idea, but  to implement this idea, I think, we will need keep to keep and 
>> > maintain change logs.
>> > 
>> > > > > A couple of superficial review comments:
>> > 
>> > 
>> >  
>> > > > > +            const char *aggname = get_func_name(aggref->aggfnoid);
>> > > > > ...
>> > > > > +            else if (!strcmp(aggname, "sum"))
>> > > > >
>> > > > > I guess you need a more robust way to detect the supported aggregates
>> > > > > than their name, or I guess some way for aggregates themselves to
>> > > > > specify that they support this and somehow supply the extra logic.
>> > > > > Perhaps I just waid what Greg Stark already said, except not as well.
>> > 
>> > Yes. Using name is not robust because users can make same name aggregates like 
>> > sum(text) (although I am not sure this makes sense). We can use oids instead 
>> > of names, but it would be nice to extend pg_aggregate and add new attributes 
>> > for informing that this supports IVM and for providing functions for IVM logic.
>> > 
>> > > > > As for the question of how
>> > > > > to reserve a namespace for system columns that won't clash with user
>> > > > > columns, according to our manual the SQL standard doesn't allow $ in
>> > > > > identifier names, and according to my copy SQL92 "intermediate SQL"
>> > > > > doesn't allow identifiers that end in an underscore.  I don't know
>> > > > > what the best answer is but we should probably decide on a something
>> > > > > based the standard.
>> > 
>> > Ok, so we should use "__ivm_count__" since this ends in "_" at least.
>> > 
>> > Another idea is that users specify the name of this special column when 
>> > defining materialized views with IVM support. This way can avoid the conflict 
>> > because users will specify a name which does not appear in the target list.
>> > 
>> > As for aggregates supports, it may be also possible to make it a restriction 
>> > that count(expr) must be in the target list explicitly when sum(expr) or 
>> > avg(expr) is included, instead of making hidden column like __ivm_count_sum__,
>> > like Oracle does.
>> > 
>> > > > >
>> > > > > As for how to make internal columns invisible to SELECT *, previously
>> > > > > there have been discussions about doing that using a new flag in
>> > > > > pg_attribute:
>> > > > >
>> > > > >
>> > > > >
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
>> > 
>> > I agree implementing this feature in PostgreSQL since there are at least a few
>> > use cases, IVM and temporal database.
>> > 
>> > > > >
>> > > > > +                            "WITH t AS ("
>> > > > > +                            "  SELECT diff.__ivm_count__,
>> > > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid"
>> > > > > +                            ", %s"
>> > > > > +                            "  FROM %s AS mv, %s AS diff WHERE (%s) =
>> > > > > (%s)"
>> > > > > +                            "), updt AS ("
>> > > > > +                            "  UPDATE %s AS mv SET __ivm_count__ =
>> > > > > mv.__ivm_count__ - t.__ivm_count__"
>> > > > > +                            ", %s "
>> > > > > +                            "  FROM t WHERE mv.ctid = t.ctid AND NOT
>> > > > > for_dlt"
>> > > > > +                            ") DELETE FROM %s AS mv USING t WHERE
>> > > > > mv.ctid = t.ctid AND for_dlt;",
>> > > > >
>> > > > > I fully understand that this is POC code, but I am curious about one
>> > > > > thing.  These queries that are executed by apply_delta() would need to
>> > > > > be converted to C, or at least used reusable plans, right?  Hmm,
>> > > > > creating and dropping temporary tables every time is a clue that the
>> > > > > ultimate form of this should be tuplestores and C code, I think,
>> > > > > right?
>> > 
>> > I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said,
>> > it is inefficient to create/drop temp tables and perform parse/plan every times.
>> > It seems to be enough to perform this once when creating materialized views or 
>> > at the first maintenance time.
>> > 
>> > 
>> > Best regards,
>> > Yugo Nagata
>> > 
>> > 
>> > -- 
>> > Yugo Nagata <nagata@sraoss.co.jp>
>> > 
>> > 
>> 
>> 
>> -- 
>> Yugo Nagata <nagata@sraoss.co.jp>
>> 
>> 
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Alvaro Herrera

Дата:

12 сентября 2019 г., 15:19:50

On 2019-Aug-06, Tatsuo Ishii wrote:

> It's not mentioned below but some bugs including seg fault when
> --enable-casser is enabled was also fixed in this patch.
> 
> BTW, I found a bug with min/max support in this patch and I believe
> Yugo is working on it. Details:
> https://github.com/sraoss/pgsql-ivm/issues/20

So is he posting an updated patch soon?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Implementing Incremental View Maintenance

От

Paul Draper

Дата:

15 сентября 2019 г., 17:52:22

As I understand it, the current patch performs immediate IVM using AFTER STATEMENT trigger transition tables.

However, multiple tables can be modified *before* AFTER STATEMENT triggers are fired.

CREATE TABLE example1 (a int);

CREATE TABLE example2 (a int);

CREATE INCREMENTAL MATERIALIZED VIEW mv AS

SELECT example1.a, example2.a

FROM example1 JOIN example2 ON a;

WITH

insert1 AS (INSERT INTO example1 VALUES (1)),

insert2 AS (INSERT INTO example2 VALUES (1))

SELECT NULL;

Changes to example1 are visible in an AFTER STATEMENT trigger on example2, and vice versa. Would this not result in the (1, 1) tuple being "double-counted"?

IVM needs to either:

(1) Evaluate deltas "serially' (e.g. EACH ROW triggers)

(2) Have simultaneous access to multiple deltas:

delta_mv = example1 x delta_example2 + example2 x delta_example1 - delta_example1 x delta_example2

This latter method is the "logged" approach that has been discussed for deferred evaluation.

tl;dr It seems that AFTER STATEMENT triggers required a deferred-like implementation anyway.

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

17 сентября 2019 г., 02:49:13

> On 2019-Aug-06, Tatsuo Ishii wrote:
> 
>> It's not mentioned below but some bugs including seg fault when
>> --enable-casser is enabled was also fixed in this patch.
>> 
>> BTW, I found a bug with min/max support in this patch and I believe
>> Yugo is working on it. Details:
>> https://github.com/sraoss/pgsql-ivm/issues/20
> 
> So is he posting an updated patch soon?

I think he is going to post an updated patch by the end of this month.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

17 сентября 2019 г., 10:02:40

Hi Paul,

Thank you for your suggestion.

On Sun, 15 Sep 2019 11:52:22 -0600
Paul Draper <paulddraper@gmail.com> wrote:

> As I understand it, the current patch performs immediate IVM using AFTER
> STATEMENT trigger transition tables.
> 
> However, multiple tables can be modified *before* AFTER STATEMENT triggers
> are fired.
> 
> CREATE TABLE example1 (a int);
> CREATE TABLE example2 (a int);
> 
> CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> SELECT example1.a, example2.a
> FROM example1 JOIN example2 ON a;
> 
> WITH
>   insert1 AS (INSERT INTO example1 VALUES (1)),
>   insert2 AS (INSERT INTO example2 VALUES (1))
> SELECT NULL;
> 
> Changes to example1 are visible in an AFTER STATEMENT trigger on example2,
> and vice versa. Would this not result in the (1, 1) tuple being
> "double-counted"?
> 
> IVM needs to either:
> 
> (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
> 
> (2) Have simultaneous access to multiple deltas:
> delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> delta_example1 x delta_example2
> 
> This latter method is the "logged" approach that has been discussed for
> deferred evaluation.
> 
> tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> implementation anyway.

You are right,  the latest patch doesn't support the situation where
multiple tables are modified in a query. I noticed this when working
on self-join, which also virtually need to handle multiple table
modification.

I am now working on this issue and the next patch will enable to handle
this situation. I plan to submit the patch during this month. Roughly
speaking, in the new implementation, AFTER STATEMENT triggers are used to
collect  information of modified table and its changes (= transition tables), 
and then the only last trigger updates the view. This will avoid the
double-counting. I think this implementation also would be a base of
deferred approach implementation in future where "logs" are used instead
of transition tables.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Paul Draper

Дата:

17 сентября 2019 г., 18:03:20

Have you had any thoughts for more than two joined tables?

Either there needs to be an quadratic number of joins, or intermediate join results need to be stored and reused.

On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:

Hi Paul,

Thank you for your suggestion.

On Sun, 15 Sep 2019 11:52:22 -0600
Paul Draper <paulddraper@gmail.com> wrote:

> As I understand it, the current patch performs immediate IVM using AFTER
> STATEMENT trigger transition tables.
>
> However, multiple tables can be modified *before* AFTER STATEMENT triggers
> are fired.
>
> CREATE TABLE example1 (a int);
> CREATE TABLE example2 (a int);
>
> CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> SELECT example1.a, example2.a
> FROM example1 JOIN example2 ON a;
>
> WITH
> insert1 AS (INSERT INTO example1 VALUES (1)),
> insert2 AS (INSERT INTO example2 VALUES (1))
> SELECT NULL;
>
> Changes to example1 are visible in an AFTER STATEMENT trigger on example2,
> and vice versa. Would this not result in the (1, 1) tuple being
> "double-counted"?
>
> IVM needs to either:
>
> (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
>
> (2) Have simultaneous access to multiple deltas:
> delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> delta_example1 x delta_example2
>
> This latter method is the "logged" approach that has been discussed for
> deferred evaluation.
>
> tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> implementation anyway.

You are right, the latest patch doesn't support the situation where
multiple tables are modified in a query. I noticed this when working
on self-join, which also virtually need to handle multiple table
modification.

I am now working on this issue and the next patch will enable to handle
this situation. I plan to submit the patch during this month. Roughly
speaking, in the new implementation, AFTER STATEMENT triggers are used to
collect information of modified table and its changes (= transition tables),
and then the only last trigger updates the view. This will avoid the
double-counting. I think this implementation also would be a base of
deferred approach implementation in future where "logs" are used instead
of transition tables.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

27 сентября 2019 г., 02:47:40

> Have you had any thoughts for more than two joined tables?

I am not sure what you are asking here but if you are asking if IVM
supports two or more tables involved in a join, we already support it:

DROP MATERIALIZED VIEW mv1;
DROP MATERIALIZED VIEW
DROP TABLE t1;
DROP TABLE
DROP TABLE t2;
DROP TABLE
DROP TABLE t3;
DROP TABLE
CREATE TABLE t1(i int, j int);
CREATE TABLE
CREATE TABLE t2(k int, l int);
CREATE TABLE
CREATE TABLE t3(m int, n int);
CREATE TABLE
INSERT INTO t1 VALUES(1,10),(2,11);
INSERT 0 2
INSERT INTO t2 VALUES(1,20),(2,21);
INSERT 0 2
INSERT INTO t3 VALUES(1,30),(2,31);
INSERT 0 2
CREATE INCREMENTAL MATERIALIZED VIEW mv1 AS SELECT * FROM t1 INNER JOIN t2 ON t1.i = t2.k INNER JOIN t3 ON t1.i =
t3.m;
SELECT 2
SELECT * FROM mv1;
 i | j  | k | l  | m | n  
---+----+---+----+---+----
 1 | 10 | 1 | 20 | 1 | 30
 2 | 11 | 2 | 21 | 2 | 31
(2 rows)

UPDATE t1 SET j = 15 WHERE i = 1;
UPDATE 1
SELECT * FROM mv1;
 i | j  | k | l  | m | n  
---+----+---+----+---+----
 2 | 11 | 2 | 21 | 2 | 31
 1 | 15 | 1 | 20 | 1 | 30
(2 rows)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

> Either there needs to be an quadratic number of joins, or intermediate join
> results need to be stored and reused.
> 
> On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
>> Hi Paul,
>>
>> Thank you for your suggestion.
>>
>> On Sun, 15 Sep 2019 11:52:22 -0600
>> Paul Draper <paulddraper@gmail.com> wrote:
>>
>> > As I understand it, the current patch performs immediate IVM using AFTER
>> > STATEMENT trigger transition tables.
>> >
>> > However, multiple tables can be modified *before* AFTER STATEMENT
>> triggers
>> > are fired.
>> >
>> > CREATE TABLE example1 (a int);
>> > CREATE TABLE example2 (a int);
>> >
>> > CREATE INCREMENTAL MATERIALIZED VIEW mv AS
>> > SELECT example1.a, example2.a
>> > FROM example1 JOIN example2 ON a;
>> >
>> > WITH
>> >   insert1 AS (INSERT INTO example1 VALUES (1)),
>> >   insert2 AS (INSERT INTO example2 VALUES (1))
>> > SELECT NULL;
>> >
>> > Changes to example1 are visible in an AFTER STATEMENT trigger on
>> example2,
>> > and vice versa. Would this not result in the (1, 1) tuple being
>> > "double-counted"?
>> >
>> > IVM needs to either:
>> >
>> > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
>> >
>> > (2) Have simultaneous access to multiple deltas:
>> > delta_mv = example1 x delta_example2 + example2 x delta_example1 -
>> > delta_example1 x delta_example2
>> >
>> > This latter method is the "logged" approach that has been discussed for
>> > deferred evaluation.
>> >
>> > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
>> > implementation anyway.
>>
>> You are right,  the latest patch doesn't support the situation where
>> multiple tables are modified in a query. I noticed this when working
>> on self-join, which also virtually need to handle multiple table
>> modification.
>>
>> I am now working on this issue and the next patch will enable to handle
>> this situation. I plan to submit the patch during this month. Roughly
>> speaking, in the new implementation, AFTER STATEMENT triggers are used to
>> collect  information of modified table and its changes (= transition
>> tables),
>> and then the only last trigger updates the view. This will avoid the
>> double-counting. I think this implementation also would be a base of
>> deferred approach implementation in future where "logs" are used instead
>> of transition tables.
>>
>> Regards,
>> Yugo Nagata
>>
>> --
>> Yugo Nagata <nagata@sraoss.co.jp>
>>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

27 сентября 2019 г., 08:17:42

On Tue, 17 Sep 2019 12:03:20 -0600
Paul Draper <paulddraper@gmail.com> wrote:

> Have you had any thoughts for more than two joined tables?
> 
> Either there needs to be an quadratic number of joins, or intermediate join
> results need to be stored and reused.

I don't think that we need to store intermediate join results.

Suppose that  we have a view V joining table R,S, and new tuples are inserted
to each table, dR,dS, and dT respectively.

 V = R*S*T
 R_new = R + dR
 S_new = S + dS
 T_new = T + dT

In this situation, we can calculate the new view state as bellow.

V_new 
= R_new * S_new * T_new
= (R + dR) * (S + dS) * (T + dT)
= R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT

Although the number of terms is 2^3(=8), if we can use both of pre-update
state (eg. R) and post-update state (eg. R+dR), we only need only three joins.
Actually, post-update state is available in AFTER trigger, and pre-update state
can be calculated by using delta tables (transition tables) and cmin/xmin system
columns (or snapshot). This is the approach my implementation adopts.


> 
> On Tue, Sep 17, 2019 at 8:50 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> 
> > Hi Paul,
> >
> > Thank you for your suggestion.
> >
> > On Sun, 15 Sep 2019 11:52:22 -0600
> > Paul Draper <paulddraper@gmail.com> wrote:
> >
> > > As I understand it, the current patch performs immediate IVM using AFTER
> > > STATEMENT trigger transition tables.
> > >
> > > However, multiple tables can be modified *before* AFTER STATEMENT
> > triggers
> > > are fired.
> > >
> > > CREATE TABLE example1 (a int);
> > > CREATE TABLE example2 (a int);
> > >
> > > CREATE INCREMENTAL MATERIALIZED VIEW mv AS
> > > SELECT example1.a, example2.a
> > > FROM example1 JOIN example2 ON a;
> > >
> > > WITH
> > >   insert1 AS (INSERT INTO example1 VALUES (1)),
> > >   insert2 AS (INSERT INTO example2 VALUES (1))
> > > SELECT NULL;
> > >
> > > Changes to example1 are visible in an AFTER STATEMENT trigger on
> > example2,
> > > and vice versa. Would this not result in the (1, 1) tuple being
> > > "double-counted"?
> > >
> > > IVM needs to either:
> > >
> > > (1) Evaluate deltas "serially' (e.g. EACH ROW triggers)
> > >
> > > (2) Have simultaneous access to multiple deltas:
> > > delta_mv = example1 x delta_example2 + example2 x delta_example1 -
> > > delta_example1 x delta_example2
> > >
> > > This latter method is the "logged" approach that has been discussed for
> > > deferred evaluation.
> > >
> > > tl;dr It seems that AFTER STATEMENT triggers required a deferred-like
> > > implementation anyway.
> >
> > You are right,  the latest patch doesn't support the situation where
> > multiple tables are modified in a query. I noticed this when working
> > on self-join, which also virtually need to handle multiple table
> > modification.
> >
> > I am now working on this issue and the next patch will enable to handle
> > this situation. I plan to submit the patch during this month. Roughly
> > speaking, in the new implementation, AFTER STATEMENT triggers are used to
> > collect  information of modified table and its changes (= transition
> > tables),
> > and then the only last trigger updates the view. This will avoid the
> > double-counting. I think this implementation also would be a base of
> > deferred approach implementation in future where "logs" are used instead
> > of transition tables.
> >
> > Regards,
> > Yugo Nagata
> >
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

30 сентября 2019 г., 13:34:14

Hi,

Attached is the latest patch for supporting self-join views. This also
including the following fix mentioned by Tatsuo Ishii.

> > On 2019-Aug-06, Tatsuo Ishii wrote:
> > 
> >> It's not mentioned below but some bugs including seg fault when
> >> --enable-casser is enabled was also fixed in this patch.
> >> 
> >> BTW, I found a bug with min/max support in this patch and I believe
> >> Yugo is working on it. Details:
> >> https://github.com/sraoss/pgsql-ivm/issues/20

This patch allows to support self-join views, simultaneous updates of more
than one base tables, and also multiple updates of the same base table. 
I first tried to support just self-join, but I found that this is essentially
same as to support simultaneous table updates, so I decided to support them in 
the same commit. I think this will be a base for implementing
Deferred-maintenance in future.

In the new implementation, AFTER triggers are used to collecting tuplestores 
containing transition table contents. When multiple tables are changed, 
multiple AFTER triggers are invoked, then the final AFTER trigger performs 
actual update of the matview. In addition AFTER trigger, also BEFORE trigger
is used to handle global information for view maintenance. 

For example, suppose that we have a view V joining table R,S, and new tuples are
inserted to each table, dR,dS, and dT respectively.

 V = R*S*T
 R_new = R + dR
 S_new = S + dS
 T_new = T + dT

In this situation, we can calculate the new view state as bellow.

V_new 
= R_new * S_new * T_new
= (R + dR) * (S + dS) * (T + dT)
= R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + (dR *S_new*T_new) + (R*dS*T_new) + (R*S*dT)

To calculate view deltas, we need both pre-state (R,S, and T) and post-state 
(R_new, S_new, and T_new) of base tables. 

Post-update  states are available in AFTER trigger, and we calculate pre-update
states by filtering inserted tuples using cmin/xmin system columns, and appendding
deleted  tuples which are contained in a old transition table.

In the original core implementation, tuplestores of transition tables were 
freed for each query depth. However, we want to prolong their life plan because
we have to preserve these for a whole query assuming some base tables are changed
in other trigger functions, so I added a hack to trigger.c.

Regression tests are also added for self join view, multiple change on the same
table, simultaneous two table changes, and foreign reference constrains.

Here are behavior examples:

1. Table definition
- t: for self-join
- r,s: for 2-ways join

CREATE TABLE r (i int, v int);
CREATE TABLE
CREATE TABLE s (i int, v int);
CREATE TABLE
CREATE TABLE t (i int, v int);
CREATE TABLE

2. Initial data

INSERT INTO r VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3
INSERT INTO s VALUES (1, 100), (2, 200), (3, 300);
INSERT 0 3
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3

3. View definition

3.1. self-join（mv_self, v_slef)

CREATE INCREMENTAL MATERIALIZED VIEW mv_self(v1, v2) AS
 SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
SELECT 3
CREATE VIEW v_self(v1, v2) AS
 SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
CREATE VIEW

3.2. 2-ways join (mv, v)

CREATE INCREMENTAL MATERIALIZED VIEW mv(v1, v2) AS
 SELECT r.v, s.v FROM r JOIN s USING(i);
SELECT 3
CREATE VIEW v(v1, v2) AS
 SELECT r.v, s.v FROM r JOIN s USING(i);
CREATE VIEW

3.3 Initial contents

SELECT * FROM mv_self ORDER BY v1;
 v1 | v2 
----+----
 10 | 10
 20 | 20
 30 | 30
(3 rows)

SELECT * FROM mv ORDER BY v1;
 v1 | v2  
----+-----
 10 | 100
 20 | 200
 30 | 300
(3 rows)

4. Update a base table for the self-join view

INSERT INTO t VALUES (4,40);
INSERT 0 1
DELETE FROM t WHERE i = 1;
DELETE 1
UPDATE t SET v = v*10 WHERE i=2;
UPDATE 1

4.1. Results
- Comparison with the normal view

SELECT * FROM mv_self ORDER BY v1;
 v1  | v2  
-----+-----
  30 |  30
  40 |  40
 200 | 200
(3 rows)

SELECT * FROM v_self ORDER BY v1;
 v1  | v2  
-----+-----
  30 |  30
  40 |  40
 200 | 200
(3 rows)

5. pdate a base table for the 2-way join view

WITH
 ins_r AS (INSERT INTO r VALUES (1,11) RETURNING 1),
 ins_r2 AS (INSERT INTO r VALUES (3,33) RETURNING 1),
 ins_s AS (INSERT INTO s VALUES (2,222) RETURNING 1),
 upd_r AS (UPDATE r SET v = v + 1000 WHERE i = 2 RETURNING 1),
 dlt_s AS (DELETE FROM s WHERE i = 3 RETURNING 1)
SELECT NULL;
 ?column? 
----------

(1 row)

5.1. Results
- Comparison with the normal view

SELECT * FROM mv ORDER BY v1;
  v1  | v2  
------+-----
   10 | 100
   11 | 100
 1020 | 200
 1020 | 222
(4 rows)

SELECT * FROM v ORDER BY v1;
  v1  | v2  
------+-----
   10 | 100
   11 | 100
 1020 | 200
 1020 | 222
(4 rows)

========

Best Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

WIP_immediate_IVM_v6.patch

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

22 ноября 2019 г., 06:29:45

Attached is the latest patch to add support for Incremental
Materialized View Maintenance (IVM). IVM allows to reflect
modifications made on base tables immediately to the target
materialized views.

Up to now, IVM supports materialized views using:

- Inner joins
- Some aggregate functions (count, sum, min, max, avg)
- GROUP BY
- Self joins

With the latest patch now IVM supports subqueries in addition to
above.

Known limitations are listed here:

https://github.com/sraoss/pgsql-ivm/issues

See more details at:
https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

About subquery support:

The patch supports simple subqueries using EXISTS:

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_exists_subquery AS SELECT
a.i, a.j FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE
a.i = b.i);

and subqueries in the FROM clause:

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_subquery AS SELECT a.i,a.j
FROM mv_base_a a,( SELECT * FROM mv_base_b) b WHERE a.i = b.i;

Other form of subqueries such as below are not supported:

-- WHERE IN .. (subquery) is not supported
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm03 AS SELECT i,j FROM
mv_base_a WHERE i IN (SELECT i FROM mv_base_b WHERE k < 103 );

-- subqueries in target list is not supported
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm05 AS SELECT i,j, (SELECT k
FROM mv_base_b b WHERE a.i = b.i) FROM mv_base_a a;

-- nested EXISTS subqueries is not supported
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm11 AS SELECT a.i,a.j FROM
mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE EXISTS(SELECT
1 FROM mv_base_b c WHERE b.i = c.i));

-- EXISTS subquery with aggragate function is not supported
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_exists AS SELECT COUNT(*)
FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i =
b.i) OR a.i > 5;

--  EXISTS subquery with condition except AND is not supported.
CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm10 AS SELECT a.i,a.j FROM
mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = b.i) OR
a.i > 5;

This work has been done by Yugo Nagata (nagata@sraoss.co.jp), Takuma
Hoshiai (hoshiai@sraoss.co.jp).  Adding support for EXISTS clause has
been done by Takuma.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Вложения

IVM_v7.patch.gz

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

26 ноября 2019 г., 07:02:25

Hi,

Attached is the latest patch (v8) to add support for Incremental View
Maintenance (IVM).  This patch adds OUTER join support in addition
to the patch (v7) submitted last week in the following post.

On Fri, 22 Nov 2019 15:29:45 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Up to now, IVM supports materialized views using:
> 
> - Inner joins
> - Some aggregate functions (count, sum, min, max, avg)
> - GROUP BY
> - Self joins
> 
> With the latest patch now IVM supports subqueries in addition to
> above.
> 
> Known limitations are listed here:
> 
> https://github.com/sraoss/pgsql-ivm/issues
> 
> See more details at:
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

* About outer join support:

In case of outer-join, when a table is modified, in addition to deltas
which occur in inner-join case, we also need to deletion or insertion of
dangling tuples, that is, null-extended tuples generated when a join
condition isn't met.

[Example]
---------------------------------------------
-- Create base tables and an outer join view
CREATE TABLE r(i int);
CREATE TABLE s(j int);
INSERT INTO r VALUES (1);
CREATE INCREMENTAL MATERIALIZED VIEW mv 
 AS SELECT * FROM r LEFT JOIN s ON r.i=s.j;
SELECT * FROM mv;
 i | j 
---+---
(1 row)

-- After an insertion to a base table ...
INSERT INTO s VALUES (1);
 -- (1,1) is inserted and (1,null) is deleted from the view.
SELECT * FROM mv;
 i | j 
---+---
 1 | 1
(1 row)
---------------------------------------------

Our implementation is basically based on the algorithm of Larson & Zhou
(2007) [1]. Before view maintenances, the view definition query's jointree
is analysed to make "view maintenance graph". This graph represents
which tuples in the views are affected when a base table is modified.
Specifically, tuples which are not null-extended on the modified table
(that is, tuples generated by joins with the modiifed table) are directly
affected. The delta of such effects are calculated similarly to inner-joins.

On the other hand, dangling tuples generated by anti-joins with directly
affected tuples can be indirectly affected. This means that we may need to
delete dangling tuples when any tuples are inserted to a table, as well as
to insert dangling tuples when tuples are deleted from a table.

[1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007)
https://ieeexplore.ieee.org/document/4221654

Although the original paper assumes that every base table and view have a
unique key and tuple duplicates is disallowed, we allow this. If a view has
tuple duplicates, we have to determine the number of each dangling tuple to
be inserted into the view when tuples in a table are deleted. For this purpose,
we count the number of each tuples which constitute a deleted tuple. These
counts are stored as JSONB object in the delta table, and we use this
information to maintain outer-join views. Also, we support outer self-joins
that is not assumed in the original paper.

* Restrictions

Currently, we have following restrictions:

- outer join view's targetlist must contain attributes used in join conditions
- outer join view's targetlist cannot contain non-strict functions
- outer join supports only simple equijoin
- outer join view's WHERE clause cannot contain non null-rejecting predicates
- aggregate is not supported with outer join
- subquery (including EXSITS) is not supported with outer join

Regression tests for all patterns of 3-way outer join and are added. 

Moreover, I reordered IVM related functions in matview.c so that ones
which have relationship will be located closely. Moreover, I added more
comments.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

IVM_v8.patch.gz

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

26 ноября 2019 г., 07:14:21

Note that this is the last patch in the series of IVM patches: now we
would like focus on blushing up the patches, rather than adding new
SQL support to IVM, so that the patch is merged into PostgreSQL 13
(hopefully). We are very welcome reviews, comments on the patch.

BTW, the SGML docs in the patch is very poor at this point. I am going
to add more descriptions to the doc.

> Hi,
> 
> Attached is the latest patch (v8) to add support for Incremental View
> Maintenance (IVM).  This patch adds OUTER join support in addition
> to the patch (v7) submitted last week in the following post.
> 
> On Fri, 22 Nov 2019 15:29:45 +0900 (JST)
> Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>> Up to now, IVM supports materialized views using:
>> 
>> - Inner joins
>> - Some aggregate functions (count, sum, min, max, avg)
>> - GROUP BY
>> - Self joins
>> 
>> With the latest patch now IVM supports subqueries in addition to
>> above.
>> 
>> Known limitations are listed here:
>> 
>> https://github.com/sraoss/pgsql-ivm/issues
>> 
>> See more details at:
>> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
> 
> * About outer join support:
> 
> In case of outer-join, when a table is modified, in addition to deltas
> which occur in inner-join case, we also need to deletion or insertion of
> dangling tuples, that is, null-extended tuples generated when a join
> condition isn't met.
> 
> [Example]
> ---------------------------------------------
> -- Create base tables and an outer join view
> CREATE TABLE r(i int);
> CREATE TABLE s(j int);
> INSERT INTO r VALUES (1);
> CREATE INCREMENTAL MATERIALIZED VIEW mv 
>  AS SELECT * FROM r LEFT JOIN s ON r.i=s.j;
> SELECT * FROM mv;
>  i | j 
> ---+---
> (1 row)
> 
> -- After an insertion to a base table ...
> INSERT INTO s VALUES (1);
>  -- (1,1) is inserted and (1,null) is deleted from the view.
> SELECT * FROM mv;
>  i | j 
> ---+---
>  1 | 1
> (1 row)
> ---------------------------------------------
> 
> Our implementation is basically based on the algorithm of Larson & Zhou
> (2007) [1]. Before view maintenances, the view definition query's jointree
> is analysed to make "view maintenance graph". This graph represents
> which tuples in the views are affected when a base table is modified.
> Specifically, tuples which are not null-extended on the modified table
> (that is, tuples generated by joins with the modiifed table) are directly
> affected. The delta of such effects are calculated similarly to inner-joins.
> 
> On the other hand, dangling tuples generated by anti-joins with directly
> affected tuples can be indirectly affected. This means that we may need to
> delete dangling tuples when any tuples are inserted to a table, as well as
> to insert dangling tuples when tuples are deleted from a table.
> 
> [1] Efficient Maintenance of Materialized Outer-Join Views (Larson & Zhou, 2007)
> https://ieeexplore.ieee.org/document/4221654
> 
> Although the original paper assumes that every base table and view have a
> unique key and tuple duplicates is disallowed, we allow this. If a view has
> tuple duplicates, we have to determine the number of each dangling tuple to
> be inserted into the view when tuples in a table are deleted. For this purpose,
> we count the number of each tuples which constitute a deleted tuple. These
> counts are stored as JSONB object in the delta table, and we use this
> information to maintain outer-join views. Also, we support outer self-joins
> that is not assumed in the original paper.
> 
> * Restrictions
> 
> Currently, we have following restrictions:
> 
> - outer join view's targetlist must contain attributes used in join conditions
> - outer join view's targetlist cannot contain non-strict functions
> - outer join supports only simple equijoin
> - outer join view's WHERE clause cannot contain non null-rejecting predicates
> - aggregate is not supported with outer join
> - subquery (including EXSITS) is not supported with outer join
> 
> 
> Regression tests for all patterns of 3-way outer join and are added. 
> 
> Moreover, I reordered IVM related functions in matview.c so that ones
> which have relationship will be located closely. Moreover, I added more
> comments.
> 
> Regards,
> Yugo Nagata
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

28 ноября 2019 г., 02:26:40

> Note that this is the last patch in the series of IVM patches: now we
> would like focus on blushing up the patches, rather than adding new
> SQL support to IVM, so that the patch is merged into PostgreSQL 13
> (hopefully). We are very welcome reviews, comments on the patch.
> 
> BTW, the SGML docs in the patch is very poor at this point. I am going
> to add more descriptions to the doc.

As promised, I have created the doc (CREATE MATERIALIZED VIEW manual)
patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 964c9abbf7..92f5668771 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -62,36 +62,167 @@ CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_na
       of the materialized view are immediately updated when base tables of the
       materialized view are updated. In general, this allows faster update of
       the materialized view at a price of slower update of the base tables
-      because the triggers will be invoked.
+      because the triggers will be invoked. We call this form of materialized
+      view as "Incremantal materialized View Maintenance" (IVM).
      </para>
      <para>
       There are restrictions of query definitions allowed to use this
-      option. Followings are allowed query definitions:
+      option. Followings are supported query definitions for IVM:
       <itemizedlist>
+
        <listitem>
         <para>
          Inner joins (including self-joins).
         </para>
        </listitem>
+
        <listitem>
         <para>
-         Some of aggregations (count, sum, avg, min, max) without HAVING clause.
+         Outer joins with following restrictions:
+
+         <itemizedlist>
+          <listitem>
+           <para>
+            Outer join view's targetlist must contain attributes used in the
+            join conditions.
+            <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT a.i FROM mv_base_a a LEFT
+JOIN mv_base_b b ON a.i=b.i;
+ERROR:  targetlist must contain vars in the join condition for IVM with outer join
+            </programlisting>
+           </para>
+          </listitem>
+
+          <listitem>
+           <para>
+            Outer join view's targetlist cannot contain non strict functions.
+            <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv AS SELECT a.i, b.i, (k > 10 OR k = -1)
+FROM mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i;
+ERROR:  targetlist cannot contain non strict functions for IVM with outer join
+             </programlisting>
+           </para>
+          </listitem>
+
+          <listitem>
+           <para>
+            Outer join supports only simple equijoin.
+            <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a
+a LEFT JOIN mv_base_b b ON a.i>b.i;
+ERROR:  Only simple equijoin is supported for IVM with outer join
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b,k,j) AS SELECT a.i, b.i, k j FROM
+mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i AND k=j;
+ERROR:  Only simple equijoin is supported for IVM with outer join
+            </programlisting>
+           </para>
+           </listitem>
+
+          <listitem>
+           <para>
+            Outer join view's WHERE clause cannot contain non null-rejecting
+            predicates.
+            <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a
+a LEFT JOIN mv_base_b b ON a.i=b.i WHERE k IS NULL;
+ERROR:  WHERE cannot contain non null-rejecting predicates for IVM with outer join
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a
+a LEFT JOIN mv_base_b b ON a.i=b.i WHERE (k > 10 OR k = -1);
+ERROR:  WHERE cannot contain non null-rejecting predicates for IVM with outer join
+            </programlisting>
+           </para>
+          </listitem>
+
+          <listitem>
+           <para>
+            Aggregate is not supported with outer join.
+            <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b,v) AS SELECT a.i, b.i, sum(k) FROM
+mv_base_a a LEFT JOIN mv_base_b b ON a.i=b.i GROUP BY a.i, b.i;
+ERROR:  aggregate is not supported with IVM together with outer join
+</programlisting>
+           </para>
+          </listitem>
+
+          <listitem>
+           <para>
+            Subquery is not supported with outer join.
+            <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a
+a LEFT JOIN (SELECT * FROM mv_base_b) b ON a.i=b.i;
+ERROR:  subquery is not supported with IVM together with outer join
+CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a
+a LEFT JOIN mv_base_b b ON a.i=b.i WHERE EXISTS (SELECT 1 FROM mv_base_b b2
+WHERE a.j = b.k);
+ERROR:  subquery is not supported by IVM together with outer join
+             </programlisting>
+           </para>
+          </listitem>
+         </itemizedlist>
         </para>
-        </listitem>
-      </itemizedlist>
 
-      Prohibited queries with this option include followings:
-      <itemizedlist>
+       </listitem>
+
        <listitem>
         <para>
-         Outer joins.
+         Subqueries. However following forms are not supported.
         </para>
-       </listitem>
+
+        <para>
+         WHERE IN .. (subquery) is not supported:
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm03 AS SELECT i,j FROM
+mv_base_a WHERE i IN (SELECT i FROM mv_base_b WHERE k < 103 );
+         </programlisting>
+        </para>
+        <para>
+         subqueries in target list is not supported:
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm05 AS SELECT i,j, (SELECT k
+FROM mv_base_b b WHERE a.i = b.i) FROM mv_base_a a;
+         </programlisting>
+        </para>
+        <para>
+         Nested EXISTS subqueries is not supported:
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm11 AS SELECT a.i,a.j FROM
+mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE EXISTS(SELECT
+1 FROM mv_base_b c WHERE b.i = c.i));
+         </programlisting>
+        </para>
+        <para>
+         EXISTS subquery with aggregate function is not supported:
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_exists AS SELECT COUNT(*)
+FROM mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i =
+b.i) OR a.i > 5;
+         </programlisting>
+        </para>
+        <para>
+         EXISTS subquery with condition other than AND is not supported:
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm10 AS SELECT a.i,a.j FROM
+mv_base_a a WHERE EXISTS(SELECT 1 FROM mv_base_b b WHERE a.i = b.i) OR
+a.i > 5;
+         </programlisting>
+        </para>
+        </listitem>
+
        <listitem>
         <para>
-         Subqueries.
+         Some of aggregations (count, sum, avg, min, max) without HAVING
+         clause.  However, aggregate functions in subquery is not supported:
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm09 AS SELECT a.i,a.j FROM mv_base_a
+a, (SELECT i, COUNT(*) FROM mv_base_b GROUP BY i) b WHERE a.i = b.i;
+         </programlisting>
         </para>
         </listitem>
+      </itemizedlist>
+
+      Unsupported queries with this option include followings:
+
+      <itemizedlist>
        <listitem>
         <para>
          Aggregations other than count, sum, avg, min and max.
@@ -111,24 +242,50 @@ CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_na
 
       Other restrictions include:
       <itemizedlist>
+
        <listitem>
         <para>
-         Incremental materialized views must be based on simple base
-         tables. Views or materialized views are not allowed to create
-         incremental materialized views.
+         IVMs must be based on simple base tables. Views or materialized views
+         are not allowed to create IVM on them.
         </para>
        </listitem>
+
+       <listitem>
+        <para>
+         <command>pg_dump</command> and <command>pg_restore</command> do not
+         support IVMs. IVMs are dumped as ordinary materialized views.
+        </para>
+       </listitem>
+
+       <listitem>
+        <para>
+         <command>REFRESH MATERIALIZED VIEW</command> does not support IVMs.
+        </para>
+       </listitem>
+
        <listitem>
         <para>
          When TRUNCATE command is executed on a base table, nothing occurs and
          this is not applied to the materialized view.
         </para>
        </listitem>
+
+       <listitem>
+        <para>
+         IVM including system columns is not supported.
+         <programlisting>
+CREATE INCREMENTAL MATERIALIZED VIEW  mv_ivm02 AS SELECT i,j FROM mv_base_a WHERE xmin = '610';
+ERROR:  system column is not supported with IVM
+         </programlisting>
+        </para>
+       </listitem>
+
        <listitem>
         <para>
-         Incremental materialized views are not supported by logical replication.
+         IVMs not supported by logical replication.
         </para>
         </listitem>
+
       </itemizedlist>
 
      </para>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

28 ноября 2019 г., 08:10:52

On Thu, 28 Nov 2019 11:26:40 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > Note that this is the last patch in the series of IVM patches: now we
> > would like focus on blushing up the patches, rather than adding new
> > SQL support to IVM, so that the patch is merged into PostgreSQL 13
> > (hopefully). We are very welcome reviews, comments on the patch.
> > 
> > BTW, the SGML docs in the patch is very poor at this point. I am going
> > to add more descriptions to the doc.
> 
> As promised, I have created the doc (CREATE MATERIALIZED VIEW manual)
> patch.

-      because the triggers will be invoked.
+      because the triggers will be invoked. We call this form of materialized
+      view as "Incremantal materialized View Maintenance" (IVM).

This part seems incorrect to me.  Incremental (materialized) View
Maintenance (IVM) is a way to maintain materialized views and is not a
word to refer views to be maintained.

However, it would be useful if there is a term referring views which
can be maintained using IVM. Off the top of my head, we can call this
Incrementally Maintainable Views (= IMVs), but this might cofusable with
IVM, so I'll think about that a little more....

Regards,
Yugo Nagata

> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Alvaro Herrera

Дата:

28 ноября 2019 г., 14:03:33

One thing pending in this development line is how to catalogue aggregate
functions that can be used in incrementally-maintainable views.
I saw a brief mention somewhere that the devels knew it needed to be
done, but I don't see in the thread that they got around to doing it.
Did you guys have any thoughts on how it can be represented in catalogs?
It seems sine-qua-non ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

28 ноября 2019 г., 22:19:44

>> As promised, I have created the doc (CREATE MATERIALIZED VIEW manual)
>> patch.
> 
> -      because the triggers will be invoked.
> +      because the triggers will be invoked. We call this form of materialized
> +      view as "Incremantal materialized View Maintenance" (IVM).
> 
> This part seems incorrect to me.  Incremental (materialized) View
> Maintenance (IVM) is a way to maintain materialized views and is not a
> word to refer views to be maintained.
> 
> However, it would be useful if there is a term referring views which
> can be maintained using IVM. Off the top of my head, we can call this
> Incrementally Maintainable Views (= IMVs), but this might cofusable with
> IVM, so I'll think about that a little more....

But if we introduce IMV, IVM would be used in much less places in the
doc and source code, so less confusion would happen, I guess.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

29 ноября 2019 г., 00:50:49

> Hi,
> 
> Attached is the latest patch (v8) to add support for Incremental View
> Maintenance (IVM).  This patch adds OUTER join support in addition
> to the patch (v7) submitted last week in the following post.

There's a compiler warning:

matview.c: In function ‘getRteListCell’:
matview.c:2685:9: warning: ‘rte_lc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return rte_lc;
         ^~~~~~

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

29 ноября 2019 г., 00:56:15

On Fri, 29 Nov 2019 09:50:49 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > Hi,
> > 
> > Attached is the latest patch (v8) to add support for Incremental View
> > Maintenance (IVM).  This patch adds OUTER join support in addition
> > to the patch (v7) submitted last week in the following post.
> 
> There's a compiler warning:
> 
> matview.c: In function ‘getRteListCell’:
> matview.c:2685:9: warning: ‘rte_lc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   return rte_lc;
>          ^~~~~~

Thanks! I'll fix this.

> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

29 ноября 2019 г., 04:00:53

On Fri, 29 Nov 2019 07:19:44 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> >> As promised, I have created the doc (CREATE MATERIALIZED VIEW manual)
> >> patch.
> > 
> > -      because the triggers will be invoked.
> > +      because the triggers will be invoked. We call this form of materialized
> > +      view as "Incremantal materialized View Maintenance" (IVM).
> > 
> > This part seems incorrect to me.  Incremental (materialized) View
> > Maintenance (IVM) is a way to maintain materialized views and is not a
> > word to refer views to be maintained.
> > 
> > However, it would be useful if there is a term referring views which
> > can be maintained using IVM. Off the top of my head, we can call this
> > Incrementally Maintainable Views (= IMVs), but this might cofusable with
> > IVM, so I'll think about that a little more....
> 
> But if we introduce IMV, IVM would be used in much less places in the
> doc and source code, so less confusion would happen, I guess.

Make senses. However, we came to think that "Incrementally Maintainable
Materialized Views" (IMMs) would be good. So, how about using this for now?
When other better opinions are raised, let's discuss again

Regards,
Yugo Nagata

> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

29 ноября 2019 г., 04:46:05

>> But if we introduce IMV, IVM would be used in much less places in the
>> doc and source code, so less confusion would happen, I guess.
> 
> Make senses. However, we came to think that "Incrementally Maintainable
> Materialized Views" (IMMs) would be good. So, how about using this for now?
> When other better opinions are raised, let's discuss again

Sounds good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Amit Langote

Дата:

29 ноября 2019 г., 06:34:52

Hello,

Thanks a lot for working on this.  It's a great (and big!) feature and
I can see that a lot of work has been put into writing this patch.  I
started looking at the patch (v8), but as it's quite big:

 34 files changed, 5444 insertions(+), 69 deletions(-)

I'm having a bit of trouble reading through, which I suspect others
may be too.  Perhaps, it can be easier for you, as authors, to know
everything that's being changed (added, removed, existing code
rewritten), but certainly not for a reviewer, so I think it would be a
good idea to try to think dividing this into parts.  I still don't
have my head wrapped around the topic of materialized view
maintenance, but roughly it looks to me like there are really *two*
features that are being added:

1. Add a new method to refresh an MV incrementally; IIUC, there's
already one method that's used by REFRESH MATERIALIZED VIEW
CONCURRENTLY, correct?

2. Make the refresh automatic (using triggers on the component tables)

Maybe, there are even:

0. Infrastructure additions

As you can tell, having the patch broken down like this would allow us
to focus on the finer aspects of each of the problem being solved and
solution being adopted, for example:

* It would be easier for someone having an expert opinion on how to
implement incremental refresh to have to only look at the patch for
(1).  If the new method handles more query types than currently, which
obviously means more code is needed, which in turn entails possibility
of bugs, despite the best efforts.  It would be better to get more
eyeballs at this portion of the patch and having it isolated seems
like a good way to attract more eyeballs.

* Someone well versed in trigger infrastructure can help fine tune the
patch for (2)

and so on.

So, please consider giving some thought to this.

Thanks,
Amit

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

29 ноября 2019 г., 06:45:13

The following review on our patch was posted on another thread,
so I quote here. The tab completion is Hoshiai-san's work, so
he will handle this issue.

Regards,
Yugo Nagata.

On Thu, 28 Nov 2019 13:00:05 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Hi.
> 
> I'm using the "Incremental Materialized View Maintenance" patch and have
> reported the following issues.
> (https://commitfest.postgresql.org/25/2138/)
> 
> To Suggest a "DROP INCREMENTAL MATERIALIZED VIEW" in psql, but the syntax
> error when you run.
> ("DROP MATERIALIZED VIEW" command can drop Incremental Materialozed view
> normally.)
> 
> 
> ramendb=# CREATE INCREMENTAL MATERIALIZED VIEW pref_count AS SELECT pref,
> COUNT(pref) FROM shops GROUP BY pref;
> SELECT 48
> ramendb=# \d pref_count
>           Materialized view "public.pref_count"
>     Column     |  Type  | Collation | Nullable | Default
> ---------------+--------+-----------+----------+---------
>  pref          | text   |           |          |
>  count         | bigint |           |          |
>  __ivm_count__ | bigint |           |          |
> 
> ramendb=# DROP IN
> INCREMENTAL MATERIALIZED VIEW  INDEX
> ramendb=# DROP INCREMENTAL MATERIALIZED VIEW pref_count;
> 2019-11-27 11:51:03.916 UTC [9759] ERROR:  syntax error at or near
> "INCREMENTAL" at character 6
> 2019-11-27 11:51:03.916 UTC [9759] STATEMENT:  DROP INCREMENTAL
> MATERIALIZED VIEW pref_count;
> ERROR:  syntax error at or near "INCREMENTAL"
> LINE 1: DROP INCREMENTAL MATERIALIZED VIEW pref_count;
>              ^
> ramendb=# DROP MATERIALIZED VIEW pref_count ;
> DROP MATERIALIZED VIEW
> ramendb=#
> 
> 
> Regard.


-- 
Yugo Nagata <nagata@sraoss.co.jp>


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

29 ноября 2019 г., 07:10:40

On Fri, 29 Nov 2019 15:45:13 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> The following review on our patch was posted on another thread,
> so I quote here. The tab completion is Hoshiai-san's work, so
> he will handle this issue.
> 
> Regards,
> Yugo Nagata.
> 
> On Thu, 28 Nov 2019 13:00:05 +0900
> nuko yokohama <nuko.yokohama@gmail.com> wrote:
> 
> > Hi.
> > 
> > I'm using the "Incremental Materialized View Maintenance" patch and have
> > reported the following issues.
> > (https://commitfest.postgresql.org/25/2138/)
> > 
> > To Suggest a "DROP INCREMENTAL MATERIALIZED VIEW" in psql, but the syntax
> > error when you run.
> > ("DROP MATERIALIZED VIEW" command can drop Incremental Materialozed view
> > normally.)

Thank you for your review. This psql's suggestion is mistake, 
"INCREMENTAL MATERIALIZED" phrase is only used for CREATE statement.

I will fix it as the following:

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 2051bc3..8c4b211 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1001,7 +1001,7 @@ static const pgsql_thing_t words_after_create[] = {
     {"FOREIGN TABLE", NULL, NULL, NULL},
     {"FUNCTION", NULL, NULL, Query_for_list_of_functions},
     {"GROUP", Query_for_list_of_roles},
-    {"INCREMENTAL MATERIALIZED VIEW", NULL, NULL, &Query_for_list_of_matviews},
+    {"INCREMENTAL MATERIALIZED VIEW", NULL, NULL, &Query_for_list_of_matviews, THING_NO_DROP | THING_NO_ALTER},
     {"INDEX", NULL, NULL, &Query_for_list_of_indexes},
     {"LANGUAGE", Query_for_list_of_languages},
     {"LARGE OBJECT", NULL, NULL, NULL, THING_NO_CREATE | THING_NO_DROP},


Best Regards,
Takuma Hoshiai

> > ramendb=# CREATE INCREMENTAL MATERIALIZED VIEW pref_count AS SELECT pref,
> > COUNT(pref) FROM shops GROUP BY pref;
> > SELECT 48
> > ramendb=# \d pref_count
> >           Materialized view "public.pref_count"
> >     Column     |  Type  | Collation | Nullable | Default
> > ---------------+--------+-----------+----------+---------
> >  pref          | text   |           |          |
> >  count         | bigint |           |          |
> >  __ivm_count__ | bigint |           |          |
> > 
> > ramendb=# DROP IN
> > INCREMENTAL MATERIALIZED VIEW  INDEX
> > ramendb=# DROP INCREMENTAL MATERIALIZED VIEW pref_count;
> > 2019-11-27 11:51:03.916 UTC [9759] ERROR:  syntax error at or near
> > "INCREMENTAL" at character 6
> > 2019-11-27 11:51:03.916 UTC [9759] STATEMENT:  DROP INCREMENTAL
> > MATERIALIZED VIEW pref_count;
> > ERROR:  syntax error at or near "INCREMENTAL"
> > LINE 1: DROP INCREMENTAL MATERIALIZED VIEW pref_count;
> >              ^
> > ramendb=# DROP MATERIALIZED VIEW pref_count ;
> > DROP MATERIALIZED VIEW
> > ramendb=#
> > 
> > 
> > Regard.
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>
> 
> 
> -- 
> Yugo Nagata <nagata@sraoss.co.jp>
> 


-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

29 ноября 2019 г., 08:33:28

On Thu, 28 Nov 2019 11:03:33 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> One thing pending in this development line is how to catalogue aggregate
> functions that can be used in incrementally-maintainable views.
> I saw a brief mention somewhere that the devels knew it needed to be
> done, but I don't see in the thread that they got around to doing it.
> Did you guys have any thoughts on how it can be represented in catalogs?
> It seems sine-qua-non ...

Yes, this is a pending issue. Currently, supported aggregate functions are
identified their name, that is, we support aggregate functions named "count",
"sum", "avg", "min", or "max". As mentioned before, this is not robust
because there might be user-defined aggregates with these names although all
built-in aggregates can be used in IVM.

In our implementation, the new aggregate values are calculated using "+" and
"-" operations for sum and count, "/" for agv, and ">=" / "<=" for min/max.
Therefore, if there is a user-defined aggregate on a user-defined type which
doesn't support these operators, errors will raise. Obviously, this is a
problem.  Even if these operators are defined, the semantics of user-defined
aggregate functions might not match with the way of maintaining views, and
resultant might be incorrect.

I think there are at least three options to prevent these problems.

In the first option, we support only built-in aggregates which we know able
to handle correctly. Supported aggregates can be identified using their OIDs.
User-defined aggregates are not supported. I think this is the simplest and
easiest way.

Second,  supported aggregates can be identified using name, like the current
implementation, but also it is checked if required operators are defined. In
this case, user-defined aggregates are allowed to some　extent and we can
prevent errors during IVM although aggregates value in view might be
incorrect if the semantics doesn't match.

Third, we can add a new attribute to pg_aggregate which shows if each
aggregate can be used in IVM. We don't need to use names or OIDs list of
supported aggregates although we need modification of the system catalogue.

Regarding pg_aggregate, now we have aggcombinefn attribute for supporting
partial aggregation. Maybe we could use combine functions to calculate new
aggregate values in IVM when tuples are inserted into a table. However, in
the context of IVM, we also need other function used when tuples are deleted
from a table, so we can not use partial aggregation for IVM in the current
implementation. This might be another option to implement "inverse combine
function"(?) for IVM, but I am not sure it worth.

Regards,
Yugo Nagata

>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

29 ноября 2019 г., 09:16:00

On Fri, 29 Nov 2019 15:34:52 +0900
Amit Langote <amitlangote09@gmail.com> wrote:

> Thanks a lot for working on this.  It's a great (and big!) feature and
> I can see that a lot of work has been put into writing this patch.  I
> started looking at the patch (v8), but as it's quite big:
> 
>  34 files changed, 5444 insertions(+), 69 deletions(-)

Thank you for your reviewing the patch! Yes, this is a big patch
athough 

> I'm having a bit of trouble reading through, which I suspect others
> may be too.  Perhaps, it can be easier for you, as authors, to know
> everything that's being changed (added, removed, existing code
> rewritten), but certainly not for a reviewer, so I think it would be a
> good idea to try to think dividing this into parts.  I still don't

I agree with you. We also think the need to split the patch and we are
considering the way.

> have my head wrapped around the topic of materialized view
> maintenance, but roughly it looks to me like there are really *two*
> features that are being added:
> 
> 1. Add a new method to refresh an MV incrementally; IIUC, there's
> already one method that's used by REFRESH MATERIALIZED VIEW
> CONCURRENTLY, correct?

No, REFRESH MATERIALIZED VIEW CONCURRENTLY is not the way to refresh
materialized views. This just acquires weaker locks on views to not
prevent SELECT, so this calculate the content of the view completely
from scratch. There is no method to incrementally refresh materialized
views in the current PostgreSQL.

Also, we didn't implement incremental refresh on REFRESH command in
this patch. This supports only automatically refresh using triggers.
However, we used the code for REFRESH in our IVM implementation, so
I think splitting the patch according to this point of view can make
sense.

> 2. Make the refresh automatic (using triggers on the component tables)
> 
> Maybe, there are even:
> 
> 0. Infrastructure additions

Yes, we have a bit modification on the infrastructure, for example,
trigger.c.

> As you can tell, having the patch broken down like this would allow us
> to focus on the finer aspects of each of the problem being solved and
> solution being adopted, for example:
> 
> * It would be easier for someone having an expert opinion on how to
> implement incremental refresh to have to only look at the patch for
> (1).  If the new method handles more query types than currently, which
> obviously means more code is needed, which in turn entails possibility
> of bugs, despite the best efforts.  It would be better to get more
> eyeballs at this portion of the patch and having it isolated seems
> like a good way to attract more eyeballs.
> 
> * Someone well versed in trigger infrastructure can help fine tune the
> patch for (2)
> 
> and so on.
> 
> So, please consider giving some thought to this.

Agreed. Although I am not sure we will do it as above way, we will
consider to split the patch, anyway. Thanks. 

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

29 ноября 2019 г., 09:19:54

On Fri, 29 Nov 2019 18:16:00 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Fri, 29 Nov 2019 15:34:52 +0900
> Amit Langote <amitlangote09@gmail.com> wrote:
> 
> > Thanks a lot for working on this.  It's a great (and big!) feature and
> > I can see that a lot of work has been put into writing this patch.  I
> > started looking at the patch (v8), but as it's quite big:
> > 
> >  34 files changed, 5444 insertions(+), 69 deletions(-)
> 
> Thank you for your reviewing the patch! Yes, this is a big patch
> athough 

Sorry, an unfinished line was left... Please ignore this.

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Michael Paquier

Дата:

01 декабря 2019 г., 02:55:14

On Fri, Nov 29, 2019 at 06:19:54PM +0900, Yugo Nagata wrote:
> Sorry, an unfinished line was left... Please ignore this.

A rebase looks to be necessary, Mr Robot complains that the patch does
not apply cleanly.  As the thread is active recently, I have moved the
patch to next CF, waiting on author.
--
Michael

Вложения

signature.asc

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

02 декабря 2019 г., 01:01:18

Michael,

> A rebase looks to be necessary, Mr Robot complains that the patch does
> not apply cleanly.  As the thread is active recently, I have moved the
> patch to next CF, waiting on author.

Thank you for taking care of this patch. Hoshiai-san, can you please
rebase the patch?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

02 декабря 2019 г., 01:36:36

>> One thing pending in this development line is how to catalogue aggregate
>> functions that can be used in incrementally-maintainable views.
>> I saw a brief mention somewhere that the devels knew it needed to be
>> done, but I don't see in the thread that they got around to doing it.
>> Did you guys have any thoughts on how it can be represented in catalogs?
>> It seems sine-qua-non ...
> 
> Yes, this is a pending issue. Currently, supported aggregate functions are
> identified their name, that is, we support aggregate functions named "count",
> "sum", "avg", "min", or "max". As mentioned before, this is not robust
> because there might be user-defined aggregates with these names although all
> built-in aggregates can be used in IVM.
> 
> In our implementation, the new aggregate values are calculated using "+" and
> "-" operations for sum and count, "/" for agv, and ">=" / "<=" for min/max. 
> Therefore, if there is a user-defined aggregate on a user-defined type which
> doesn't support these operators, errors will raise. Obviously, this is a
> problem.  Even if these operators are defined, the semantics of user-defined
> aggregate functions might not match with the way of maintaining views, and
> resultant might be incorrect.
> 
> I think there are at least three options to prevent these problems.
> 
> In the first option, we support only built-in aggregates which we know able
> to handle correctly. Supported aggregates can be identified using their OIDs.
> User-defined aggregates are not supported. I think this is the simplest and
> easiest way.

I think this is enough for the first cut of IVM. So +1.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

02 декабря 2019 г., 01:57:29

> On Fri, Nov 29, 2019 at 06:19:54PM +0900, Yugo Nagata wrote:
>> Sorry, an unfinished line was left... Please ignore this.
> 
> A rebase looks to be necessary, Mr Robot complains that the patch does
> not apply cleanly.

Is this because the patch has ".gz" suffix?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

02 декабря 2019 г., 02:05:38

On Mon, 02 Dec 2019 10:01:18 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> Michael,
> 
> > A rebase looks to be necessary, Mr Robot complains that the patch does
> > not apply cleanly.  As the thread is active recently, I have moved the
> > patch to next CF, waiting on author.
> 
> Thank you for taking care of this patch. Hoshiai-san, can you please
> rebase the patch?


Sure,
I re-created a patch. This contains 'IVM_8.patch' and 'create_materialized_view.patch', 
and I checked to apply a patch on latest master.

> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 

Best Regards,

-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Вложения

IVM_v9.patch.gz

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

02 декабря 2019 г., 06:42:08

On Mon, 02 Dec 2019 10:36:36 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> >> One thing pending in this development line is how to catalogue aggregate
> >> functions that can be used in incrementally-maintainable views.
> >> I saw a brief mention somewhere that the devels knew it needed to be
> >> done, but I don't see in the thread that they got around to doing it.
> >> Did you guys have any thoughts on how it can be represented in catalogs?
> >> It seems sine-qua-non ...
> > 
> > Yes, this is a pending issue. Currently, supported aggregate functions are
> > identified their name, that is, we support aggregate functions named "count",
> > "sum", "avg", "min", or "max". As mentioned before, this is not robust
> > because there might be user-defined aggregates with these names although all
> > built-in aggregates can be used in IVM.
> > 
> > In our implementation, the new aggregate values are calculated using "+" and
> > "-" operations for sum and count, "/" for agv, and ">=" / "<=" for min/max. 
> > Therefore, if there is a user-defined aggregate on a user-defined type which
> > doesn't support these operators, errors will raise. Obviously, this is a
> > problem.  Even if these operators are defined, the semantics of user-defined
> > aggregate functions might not match with the way of maintaining views, and
> > resultant might be incorrect.
> > 
> > I think there are at least three options to prevent these problems.
> > 
> > In the first option, we support only built-in aggregates which we know able
> > to handle correctly. Supported aggregates can be identified using their OIDs.
> > User-defined aggregates are not supported. I think this is the simplest and
> > easiest way.
> 
> I think this is enough for the first cut of IVM. So +1.

If there is no objection, I will add the check of aggregate functions
by this way. Thanks.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Alvaro Herrera

Дата:

02 декабря 2019 г., 16:48:40

On 2019-Dec-02, Yugo Nagata wrote:

> On Mon, 02 Dec 2019 10:36:36 +0900 (JST)
> Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> 
> > >> One thing pending in this development line is how to catalogue aggregate
> > >> functions that can be used in incrementally-maintainable views.
> > >> I saw a brief mention somewhere that the devels knew it needed to be
> > >> done, but I don't see in the thread that they got around to doing it.
> > >> Did you guys have any thoughts on how it can be represented in catalogs?
> > >> It seems sine-qua-non ...

> > > In the first option, we support only built-in aggregates which we know able
> > > to handle correctly. Supported aggregates can be identified using their OIDs.
> > > User-defined aggregates are not supported. I think this is the simplest and
> > > easiest way.
> > 
> > I think this is enough for the first cut of IVM. So +1.
> 
> If there is no objection, I will add the check of aggregate functions
> by this way. Thanks.

The way I imagine things is that there's (one or more) new column in
pg_aggregate that links to the operator(s) (or function(s)?) that
support incremental update of the MV for that aggregate function.  Is
that what you're proposing?

All that query-construction business in apply_delta() looks quite
suspicious.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

03 декабря 2019 г., 05:41:22

On Mon, 2 Dec 2019 13:48:40 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2019-Dec-02, Yugo Nagata wrote:
> 
> > On Mon, 02 Dec 2019 10:36:36 +0900 (JST)
> > Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> > 
> > > >> One thing pending in this development line is how to catalogue aggregate
> > > >> functions that can be used in incrementally-maintainable views.
> > > >> I saw a brief mention somewhere that the devels knew it needed to be
> > > >> done, but I don't see in the thread that they got around to doing it.
> > > >> Did you guys have any thoughts on how it can be represented in catalogs?
> > > >> It seems sine-qua-non ...
> 
> > > > In the first option, we support only built-in aggregates which we know able
> > > > to handle correctly. Supported aggregates can be identified using their OIDs.
> > > > User-defined aggregates are not supported. I think this is the simplest and
> > > > easiest way.
> > > 
> > > I think this is enough for the first cut of IVM. So +1.
> > 
> > If there is no objection, I will add the check of aggregate functions
> > by this way. Thanks.
> 
> The way I imagine things is that there's (one or more) new column in
> pg_aggregate that links to the operator(s) (or function(s)?) that
> support incremental update of the MV for that aggregate function.  Is
> that what you're proposing?

The way I am proposing above is using OID to check if a aggregate can be
used in IVM. This allows only a part of built-in aggreagete functions.

This way you mentioned was proposed as one of options as following.

On Fri, 29 Nov 2019 17:33:28 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Third, we can add a new attribute to pg_aggregate which shows if each
> aggregate can be used in IVM. We don't need to use names or OIDs list of
> supported aggregates although we need modification of the system catalogue.
> 
> Regarding pg_aggregate, now we have aggcombinefn attribute for supporting
> partial aggregation. Maybe we could use combine functions to calculate new
> aggregate values in IVM when tuples are inserted into a table. However, in
> the context of IVM, we also need other function used when tuples are deleted
> from a table, so we can not use partial aggregation for IVM in the current
> implementation. This might be another option to implement "inverse combine
> function"(?) for IVM, but I am not sure it worth.

If we add "inverse combine function" in pg_aggregate that takes two results
of aggregating over tuples in a view and tuples in a delta, and produces a
result of aggregating over tuples in the view after tuples in the delta are
deleted from this, it would allow to calculate new aggregate values in IVM
using aggcombinefn together when the aggregate function provides both
functions.

Another idea is to use support functions for moving-aggregate mode which are
already provided in pg_aggregate. However, in this case, we have to apply
tuples in the delta to the view one by one instead of applying after
aggregating tuples in the delta.

In both case, we can not use these support functions in SQL via SPI because
the type of some aggregates is internal. We have to alter the current
apply_delta implementation if we adopt a way using these support functions.
Instead, we also can add support functions for IVM independent to partial
aggregate or moving-aggregate. Maybe this is also one of options.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

04 декабря 2019 г., 12:18:02

Hi.

I found the problem after running "ALTER MATERIALIZED VIEW ... RENAME TO".

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

04 декабря 2019 г., 12:43:09

2019年12月3日(火) 14:42 Yugo Nagata <nagata@sraoss.co.jp>:

On Mon, 2 Dec 2019 13:48:40 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2019-Dec-02, Yugo Nagata wrote:
>
> > On Mon, 02 Dec 2019 10:36:36 +0900 (JST)
> > Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> >
> > > >> One thing pending in this development line is how to catalogue aggregate
> > > >> functions that can be used in incrementally-maintainable views.
> > > >> I saw a brief mention somewhere that the devels knew it needed to be
> > > >> done, but I don't see in the thread that they got around to doing it.
> > > >> Did you guys have any thoughts on how it can be represented in catalogs?
> > > >> It seems sine-qua-non ...
>
> > > > In the first option, we support only built-in aggregates which we know able
> > > > to handle correctly. Supported aggregates can be identified using their OIDs.
> > > > User-defined aggregates are not supported. I think this is the simplest and
> > > > easiest way.
> > >
> > > I think this is enough for the first cut of IVM. So +1.
> >
> > If there is no objection, I will add the check of aggregate functions
> > by this way. Thanks.
>
> The way I imagine things is that there's (one or more) new column in
> pg_aggregate that links to the operator(s) (or function(s)?) that
> support incremental update of the MV for that aggregate function. Is
> that what you're proposing?

The way I am proposing above is using OID to check if a aggregate can be
used in IVM. This allows only a part of built-in aggreagete functions.

This way you mentioned was proposed as one of options as following.

On Fri, 29 Nov 2019 17:33:28 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
> Third, we can add a new attribute to pg_aggregate which shows if each
> aggregate can be used in IVM. We don't need to use names or OIDs list of
> supported aggregates although we need modification of the system catalogue.
>
> Regarding pg_aggregate, now we have aggcombinefn attribute for supporting
> partial aggregation. Maybe we could use combine functions to calculate new
> aggregate values in IVM when tuples are inserted into a table. However, in
> the context of IVM, we also need other function used when tuples are deleted
> from a table, so we can not use partial aggregation for IVM in the current
> implementation. This might be another option to implement "inverse combine
> function"(?) for IVM, but I am not sure it worth.

If we add "inverse combine function" in pg_aggregate that takes two results
of aggregating over tuples in a view and tuples in a delta, and produces a
result of aggregating over tuples in the view after tuples in the delta are
deleted from this, it would allow to calculate new aggregate values in IVM
using aggcombinefn together when the aggregate function provides both
functions.

Another idea is to use support functions for moving-aggregate mode which are
already provided in pg_aggregate. However, in this case, we have to apply
tuples in the delta to the view one by one instead of applying after
aggregating tuples in the delta.

In both case, we can not use these support functions in SQL via SPI because
the type of some aggregates is internal. We have to alter the current
apply_delta implementation if we adopt a way using these support functions.
Instead, we also can add support functions for IVM independent to partial
aggregate or moving-aggregate. Maybe this is also one of options.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

05 декабря 2019 г., 01:19:51

On Wed, 4 Dec 2019 21:18:02 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Hi.
> 
> I found the problem after running "ALTER MATERIALIZED VIEW ... RENAME TO".
> If a view created with "CREATE INCREMENT MATERIALIZED VIEW" is renamed,
> subsequent INSERT operations to the base table will fail.
> 
> Error message.
> ```
> ERROR:  could not open relation with OID 0

Thank you for your pointing out this issue! This error occurs
because the view's OID is retrieved using the view name.
Considering that the name can be changed, this is obviously
wrong. We'll fix it.

Regards,
Yugo Nagata

> ```
> 
> Execution log.
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql -U postgres test -e -f
> ~/test/ivm/alter_rename_bug.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:1: NOTICE:  drop cascades
> to materialized view group_imv
> DROP TABLE
> CREATE TABLE table_x AS
>    SELECT generate_series(1, 10000) AS id,
>    ROUND(random()::numeric * 100, 2) AS data,
>    CASE (random() * 5)::integer
>      WHEN 4 THEN 'group-a'
>      WHEN 3 THEN 'group-b'
>      ELSE 'group-c'
>    END AS part_key
> ;
> SELECT 10000
>                Table "public.table_x"
>   Column  |  Type   | Collation | Nullable | Default
> ----------+---------+-----------+----------+---------
>  id       | integer |           |          |
>  data     | numeric |           |          |
>  part_key | text    |           |          |
> 
> DROP MATERIALIZED VIEW IF EXISTS group_imv;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:15: NOTICE:  materialized
> view "group_imv" does not exist, skipping
> DROP MATERIALIZED VIEW
> CREATE INCREMENTAL MATERIALIZED VIEW group_imv AS
> SELECT part_key, COUNT(*), MAX(data), MIN(data), SUM(data), AVG(data)
> FROM table_x
> GROUP BY part_key;
> SELECT 3
>                  List of relations
>  Schema |   Name    |       Type        |  Owner
> --------+-----------+-------------------+----------
>  public | group_imv | materialized view | postgres
>  public | table_x   | table             | postgres
> (2 rows)
> 
>              Materialized view "public.group_imv"
>       Column       |  Type   | Collation | Nullable | Default
> -------------------+---------+-----------+----------+---------
>  part_key          | text    |           |          |
>  count             | bigint  |           |          |
>  max               | numeric |           |          |
>  min               | numeric |           |          |
>  sum               | numeric |           |          |
>  avg               | numeric |           |          |
>  __ivm_count_max__ | bigint  |           |          |
>  __ivm_count_min__ | bigint  |           |          |
>  __ivm_count_sum__ | bigint  |           |          |
>  __ivm_count_avg__ | bigint  |           |          |
>  __ivm_sum_avg__   | numeric |           |          |
>  __ivm_count__     | bigint  |           |          |
> 
> SELECT * FROM group_imv ORDER BY part_key;
>  part_key | count |  max  | min  |    sum    |         avg
> ----------+-------+-------+------+-----------+---------------------
>  group-a  |  1966 | 99.85 | 0.05 |  98634.93 | 50.1703611393692777
>  group-b  |  2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
>  group-c  |  6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
> (3 rows)
> 
> ALTER MATERIALIZED VIEW group_imv RENAME TO group_imv2;
> ALTER MATERIALIZED VIEW
>                  List of relations
>  Schema |    Name    |       Type        |  Owner
> --------+------------+-------------------+----------
>  public | group_imv2 | materialized view | postgres
>  public | table_x    | table             | postgres
> (2 rows)
> 
>             Materialized view "public.group_imv2"
>       Column       |  Type   | Collation | Nullable | Default
> -------------------+---------+-----------+----------+---------
>  part_key          | text    |           |          |
>  count             | bigint  |           |          |
>  max               | numeric |           |          |
>  min               | numeric |           |          |
>  sum               | numeric |           |          |
>  avg               | numeric |           |          |
>  __ivm_count_max__ | bigint  |           |          |
>  __ivm_count_min__ | bigint  |           |          |
>  __ivm_count_sum__ | bigint  |           |          |
>  __ivm_count_avg__ | bigint  |           |          |
>  __ivm_sum_avg__   | numeric |           |          |
>  __ivm_count__     | bigint  |           |          |
> 
> SET client_min_messages = debug5;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:30: DEBUG:
>  CommitTransaction(1) name: unnamed; blockState: STARTED; state:
> INPROGRESS, xid/subid/cid: 0/1/0
> SET
> INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
> 'gruop_d');
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:
>  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS,
> xid/subid/cid: 0/1/0
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:  relation
> "public.group_imv" does not exist
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: DEBUG:  relation
> "public.group_imv" does not exist
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:33: ERROR:  could not
> open relation with OID 0
> RESET client_min_messages;
> psql:/home/ec2-user/test/ivm/alter_rename_bug.sql:34: DEBUG:
>  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS,
> xid/subid/cid: 0/1/0
> RESET
> SELECT * FROM group_imv2 ORDER BY part_key;
>  part_key | count |  max  | min  |    sum    |         avg
> ----------+-------+-------+------+-----------+---------------------
>  group-a  |  1966 | 99.85 | 0.05 |  98634.93 | 50.1703611393692777
>  group-b  |  2021 | 99.99 | 0.17 | 102614.02 | 50.7738842157347848
>  group-c  |  6013 | 99.99 | 0.02 | 300968.43 | 50.0529569266589057
> (3 rows)
> 
> ALTER MATERIALIZED VIEW group_imv2 RENAME TO group_imv;
> ALTER MATERIALIZED VIEW
> INSERT INTO table_x VALUES (10000001, ROUND(random()::numeric * 100, 2),
> 'gruop_d');
> INSERT 0 1
> SELECT * FROM group_imv ORDER BY part_key;
>  part_key | count |  max  |  min  |    sum    |         avg
> ----------+-------+-------+-------+-----------+---------------------
>  group-a  |  1966 | 99.85 |  0.05 |  98634.93 | 50.1703611393692777
>  group-b  |  2021 | 99.99 |  0.17 | 102614.02 | 50.7738842157347848
>  group-c  |  6013 | 99.99 |  0.02 | 300968.43 | 50.0529569266589057
>  gruop_d  |     1 | 81.43 | 81.43 |     81.43 | 81.4300000000000000
> (4 rows)
> 
> [ec2-user@ip-10-0-1-10 ivm]$
> ```
> 
> This may be because IVM internal information is not modified when the view
> name is renamed.
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

20 декабря 2019 г., 05:02:32

Hi,

Attached is the latest patch (v10) to add support for Incremental
Materialized View Maintenance (IVM).   

IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.

There are two approaches with regard to timing of view maintenance:
immediate and deferred. In immediate maintenance, views are updated in
the same transaction where its base table is modified. In deferred
maintenance, views are updated after the transaction is committed,
for example, when the view is accessed, as a response to user command
like REFRESH, or periodically in background, and so on. 

This patch implements a kind of immediate maintenance, in which
materialized views are updated immediately in AFTER triggers when a
base table is modified.

This supports views using:
 - inner and outer joins including self-join
 - some built-in aggregate functions (count, sum, agv, min, max)
 - a part of subqueries
   -- simple subqueries in FROM clause
   -- EXISTS subqueries in WHERE clause
 - DISTINCT and views with tuple duplicates

===
Here are major changes we made after the previous submitted patch:

* Aggregate functions are checked if they can be used in IVM 
  using their OID. Per comments from Alvaro Herrera.

  For this purpose, Gen_fmgrtab.pl was modified so that OIDs of
  aggregate functions are output to fmgroids.h.

* Some bug fixes including:

 - Mistake of tab-completion of psql pointed out by nuko-san
 - A bug relating rename of matview pointed out by nuko-san
 - spelling errors
 - etc.

* Add documentations for IVM

* Patch is splited into eleven parts to make review easier
  as suggested by Amit Langote:

 - 0001: Add a new syntax:
             CREATE INCREMENTAL MATERIALIZED VIEW
 - 0002: Add a new column relisivm to pg_class
 - 0003: Change trigger.c to allow to prolong life span of tupestores
         containing Transition Tables generated via AFTER trigger
 - 0004: Add the basic IVM future using counting algorithm:
         This supports inner joins, DISTINCT, and tuple duplicates.
 - 0005: Change GEN_fmgrtab.pl to output aggregate function's OIDs
 - 0006: Add aggregates support for IVM
 - 0007: Add subqueries support for IVM
 - 0008: Add outer joins support for IVM
 - 0009: Add IVM support to psql command
 - 0010: Add regression tests for IVM
 - 0011: Add documentations for IVM

===
Todo:

Currently, REFRESH and pg_dump/pg_restore is not supported, but
we are working on them.

Also, TRUNCATE is not supported. When TRUNCATE command is executed
on a base table, nothing occurs on materialized views. We are
now considering another better options, like:

- Raise an error or warning when a base table is TRUNCATEed.
- Make the view non-scannable (like WITH NO DATA)
- Update the view in some ways. It would be easy for inner joins
  or aggregate views, but there is some difficult with outer joins.


Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Вложения

IVM_patches_v10.tar.gz

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

20 декабря 2019 г., 07:12:23

Hello,


I'm starting to take a closer look at this feature.  I've just finished reading the discussion, excluding other
referencedmaterials. 

The following IVM wiki page returns an error.  Does anybody know what's wrong?

https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

[screen]
----------
MediaWiki internal error.

 Exception caught inside exception handler.

 Set $wgShowExceptionDetails = true; at the bottom of LocalSettings.php to show detailed debugging information.
----------


Could you give some concrete use cases, so that I can have a clearer image of the target data?  In the discussion,
someonereferred to master data with low update frequency, because the proposed IVM implementation adds triggers on
sourcetables, which limits the applicability to update-heavy tables. 


Regards
Takayuki Tsunakawa

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

20 декабря 2019 г., 08:10:58

> I'm starting to take a closer look at this feature.  I've just finished reading the discussion, excluding other
referencedmaterials.
 

Thank you!

> The following IVM wiki page returns an error.  Does anybody know what's wrong?
> 
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

I don't have any problem with the page. Maybe temporary error?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

22 декабря 2019 г., 11:54:41

SELECT statement that is not IMMUTABLE must not be specified when creating a view.

An expression SELECT statement that is not IMMUTABLE must not be specified when creating a view.

In the current implementation, a SELECT statement containing an expression that is not IMMUTABLE can be specified when creating a view.
If an incremental materialized view is created from a SELECT statement that contains an expression that is not IMMUTABLE, applying the SELECT statement to the view returns incorrect results.
To prevent this, we propose that the same error occur when a non-IMMUTABLE expression is specified in the "CREATE INDEX" statement.

The following is an inappropriate example.

----

----

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

22 декабря 2019 г., 20:22:50

Hello,

First of all many thanks for this Great feature 
replacing so many triggers by a so simple syntax ;o)

I was wondering about performances and add a look 
at pg_stat_statements (with track=all) with IVM_v9.patch.

For each insert into a base table there are 3 statements:
- ANALYZE pg_temp_3.pg_temp_81976
- WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
- DROP TABLE pg_temp_3.pg_temp_81976

It generates a lot of lines in pg_stat_statements with calls = 1.
Thoses statements can not be shared because the temp table is dropped each
time.

Is there a plan to change this ?

Many Thanks again

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

22 декабря 2019 г., 23:38:43

From: Tatsuo Ishii <ishii@sraoss.co.jp>
> > The following IVM wiki page returns an error.  Does anybody know what's
> wrong?
> >
> > https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
>
> I don't have any problem with the page. Maybe temporary error?

Yeah, I can see it now.  I could see it on the weekend.  The page was not available for at least an hour or so when I
askedabout this.  I thought the Pgsql-www team kindly solved the issue. 


Regards
Takayuki Tsunakawa

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

23 декабря 2019 г., 01:07:16

On Sun, 22 Dec 2019 20:54:41 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> SELECT statement that is not IMMUTABLE must not be specified when creating
> a view.
> 
> An expression SELECT statement that is not IMMUTABLE must not be specified
> when creating a view.
> 
> In the current implementation, a SELECT statement containing an expression
> that is not IMMUTABLE can be specified when creating a view.
> If an incremental materialized view is created from a SELECT statement that
> contains an expression that is not IMMUTABLE, applying the SELECT statement
> to the view returns incorrect results.
> To prevent this, we propose that the same error occur when a non-IMMUTABLE
> expression is specified in the "CREATE INDEX" statement.

Thank you for pointing out this. That makes sense.  The check of not-IMMUTABLE
epressions is missing at creating IMMV.  We'll add this.

Thanks,
Yugo Nagata

> 
> The following is an inappropriate example.
> ----
> CREATE TABLE base (id int primary key, data text, ts timestamp);
> CREATE TABLE
> CREATE VIEW base_v AS SELECT * FROM base
>   WHERE ts >= (now() - '3 second'::interval);
> CREATE VIEW
> CREATE MATERIALIZED VIEW base_mv AS SELECT * FROM base
>   WHERE ts >= (now() - '3 second'::interval);
> SELECT 0
> CREATE INCREMENTAL MATERIALIZED VIEW base_imv AS SELECT * FROM base
>   WHERE ts >= (now() - '3 second'::interval);
> SELECT 0
>                                       View "public.base_v"
>  Column |            Type             | Collation | Nullable | Default |
> Storage  | Description
> --------+-----------------------------+-----------+----------+---------+----------+-------------
>  id     | integer                     |           |          |         |
> plain    |
>  data   | text                        |           |          |         |
> extended |
>  ts     | timestamp without time zone |           |          |         |
> plain    |
> View definition:
>  SELECT base.id,
>     base.data,
>     base.ts
>    FROM base
>   WHERE base.ts >= (now() - '00:00:03'::interval);
> 
>                                       Materialized view "public.base_mv"
>  Column |            Type             | Collation | Nullable | Default |
> Storage  | Stats target | Description
> --------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
>  id     | integer                     |           |          |         |
> plain    |              |
>  data   | text                        |           |          |         |
> extended |              |
>  ts     | timestamp without time zone |           |          |         |
> plain    |              |
> View definition:
>  SELECT base.id,
>     base.data,
>     base.ts
>    FROM base
>   WHERE base.ts >= (now() - '00:00:03'::interval);
> Access method: heap
> 
>                                          Materialized view "public.base_imv"
>     Column     |            Type             | Collation | Nullable |
> Default | Storage  | Stats target | Description
>
---------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
>  id            | integer                     |           |          |
>   | plain    |              |
>  data          | text                        |           |          |
>   | extended |              |
>  ts            | timestamp without time zone |           |          |
>   | plain    |              |
>  __ivm_count__ | bigint                      |           |          |
>   | plain    |              |
> View definition:
>  SELECT base.id,
>     base.data,
>     base.ts
>    FROM base
>   WHERE base.ts >= (now() - '00:00:03'::interval);
> Access method: heap
> Incremental view maintenance: yes
> 
> INSERT INTO base VALUES (generate_series(1,3), 'dummy', clock_timestamp());
> INSERT 0 3
> SELECT * FROM base_v ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> SELECT * FROM base_mv ORDER BY id;
>  id | data | ts
> ----+------+----
> (0 rows)
> 
> REFRESH MATERIALIZED VIEW base_mv;
> REFRESH MATERIALIZED VIEW
> SELECT * FROM base_mv ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> SELECT * FROM base_imv ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> SELECT pg_sleep(3);
>  pg_sleep
> ----------
> 
> (1 row)
> 
> INSERT INTO base VALUES (generate_series(4,6), 'dummy', clock_timestamp());
> INSERT 0 3
> SELECT * FROM base_v ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   4 | dummy | 2019-12-22 11:38:29.381414
>   5 | dummy | 2019-12-22 11:38:29.381441
>   6 | dummy | 2019-12-22 11:38:29.381444
> (3 rows)
> 
> SELECT * FROM base_mv ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
> (3 rows)
> 
> REFRESH MATERIALIZED VIEW base_mv;
> REFRESH MATERIALIZED VIEW
> SELECT * FROM base_mv ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   4 | dummy | 2019-12-22 11:38:29.381414
>   5 | dummy | 2019-12-22 11:38:29.381441
>   6 | dummy | 2019-12-22 11:38:29.381444
> (3 rows)
> 
> SELECT * FROM base_imv ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
>   4 | dummy | 2019-12-22 11:38:29.381414
>   5 | dummy | 2019-12-22 11:38:29.381441
>   6 | dummy | 2019-12-22 11:38:29.381444
> (6 rows)
> 
> REFRESH MATERIALIZED VIEW base_mv;
> REFRESH MATERIALIZED VIEW
> SELECT * FROM base_imv ORDER BY id;
>  id | data  |             ts
> ----+-------+----------------------------
>   1 | dummy | 2019-12-22 11:38:26.367481
>   2 | dummy | 2019-12-22 11:38:26.367599
>   3 | dummy | 2019-12-22 11:38:26.367606
>   4 | dummy | 2019-12-22 11:38:29.381414
>   5 | dummy | 2019-12-22 11:38:29.381441
>   6 | dummy | 2019-12-22 11:38:29.381444
> (6 rows)
> ----
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

23 декабря 2019 г., 01:50:50

> Could you give some concrete use cases, so that I can have a clearer image of the target data? In the discussion,
someonereferred to master data with low update frequency, because the proposed IVM implementation adds triggers on
sourcetables, which limits the applicability to update-heavy tables.

But if you want to get always up-to-data you need to pay the cost for
REFRESH MATERIALIZED VIEW. IVM gives a choice here.

pgbench -s 100
create materialized view mv1 as select count(*) from pgbench_accounts;
create incremental materialized view mv2 as select count(*) from pgbench_accounts;

Now I delete one row from pgbench_accounts.

test=# delete from pgbench_accounts where aid = 10000000;
DELETE 1
Time: 12.387 ms

Of course this makes mv1's data obsolete:
test=# select * from mv1;
count
----------
10000000
(1 row)

To reflect the fact on mv1 that a row was deleted from
pgbench_accounts, you need to refresh mv1:

test=# refresh materialized view mv1;
REFRESH MATERIALIZED VIEW
Time: 788.757 ms

which takes 788ms. With mv2 you don't need to pay this cost to get the
latest data.

This is kind of ideal use case for IVM and I do not claim that IVM
always wins over ordinary materialized view (or non materialized
view). IVM will give benefit in that a materialized view instantly
updated whenever base tables get updated with a cost of longer update
time on base tables.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

23 декабря 2019 г., 02:26:09

From: legrand legrand <legrand_legrand@hotmail.com>
> For each insert into a base table there are 3 statements:
> - ANALYZE pg_temp_3.pg_temp_81976
> - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> - DROP TABLE pg_temp_3.pg_temp_81976

Does it also include CREATE TEMPORARY TABLE, because there's DROP?

I remember that repeated CREATE and DROP of temporary tables should be avoided in PostgreSQL.  Dropped temporary tables
leavesome unused memory in CacheMemoryContext.  If creation and deletion of temporary tables are done per row in a
singlesession, say loading of large amount of data, memory bloat could crash the OS.  That actually happened at a
user'senvironment. 

Plus, repeated create/drop may cause system catalog bloat as well even when they are performed in different sessions.
Ina fortunate case, the garbage records gather at the end of the system tables, and autovacuum will free those empty
areasby truncating data files.  However, if some valid entry persists after the long garbage area, the system tables
wouldremain bloated. 

What kind of workload and data are you targeting with IVM?


Regards
Takayuki Tsunakawa

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

23 декабря 2019 г., 06:50:58

On Mon, 23 Dec 2019 02:26:09 +0000
"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:

> From: legrand legrand <legrand_legrand@hotmail.com>
> > For each insert into a base table there are 3 statements:
> > - ANALYZE pg_temp_3.pg_temp_81976
> > - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> > - DROP TABLE pg_temp_3.pg_temp_81976
> 
> Does it also include CREATE TEMPORARY TABLE, because there's DROP?

CREATE TEMPRARY TABLE is not called because temptables are created
by make_new_heap() instead of queries via SPI.

> I remember that repeated CREATE and DROP of temporary tables should be avoided in PostgreSQL.  Dropped temporary
tablesleave some unused memory in CacheMemoryContext.  If creation and deletion of temporary tables are done per row in
asingle session, say loading of large amount of data, memory bloat could crash the OS.  That actually happened at a
user'senvironment.

> Plus, repeated create/drop may cause system catalog bloat as well even when they are performed in different sessions.
In a fortunate case, the garbage records gather at the end of the system tables, and autovacuum will free those empty
areasby truncating data files.  However, if some valid entry persists after the long garbage area, the system tables
wouldremain bloated.

Thank you for explaining the problem. I understood that creating and
dropping temprary tables is harmful more than I have thought. Although
this is not a concrete plan, there are two ideas to reduce creating
temporary tables:

1. Create a temporary table only once at the first view maintenance in
this session. This is possible if we store names or oid of temporary
tables used for each materialized view in memory. However, users may
access to these temptables whenever during the session.

2. Use tuplestores instead of temprary tables. Tuplestores can be
converted to Ephemeral Name Relation (ENR) and used in queries.
 It doesn't need updating system catalogs, but indexes can not be
used to access.

> 
> What kind of workload and data are you targeting with IVM?

IVM (with immediate maintenance approach) would be efficient
in situations where modifications on base tables are not frequent. 
In such situations, create and drop of temptalbes is not so
frequent either, but it would be still possible that the problem
you concern occurs. So, it seems worth to consider the way to
reduce use of temptable.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Julien Rouhaud

Дата:

23 декабря 2019 г., 07:22:20

On Mon, Dec 23, 2019 at 7:51 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Mon, 23 Dec 2019 02:26:09 +0000
> "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:
>
> > From: legrand legrand <legrand_legrand@hotmail.com>
> > > For each insert into a base table there are 3 statements:
> > > - ANALYZE pg_temp_3.pg_temp_81976
> > > - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> > > - DROP TABLE pg_temp_3.pg_temp_81976
> >
> > Does it also include CREATE TEMPORARY TABLE, because there's DROP?
>
> CREATE TEMPRARY TABLE is not called because temptables are created
> by make_new_heap() instead of queries via SPI.
>
> > I remember that repeated CREATE and DROP of temporary tables should be avoided in PostgreSQL.  Dropped temporary
tablesleave some unused memory in CacheMemoryContext.  If creation and deletion of temporary tables are done per row in
asingle session, say loading of large amount of data, memory bloat could crash the OS.  That actually happened at a
user'senvironment. 
>
> > Plus, repeated create/drop may cause system catalog bloat as well even when they are performed in different
sessions. In a fortunate case, the garbage records gather at the end of the system tables, and autovacuum will free
thoseempty areas by truncating data files.  However, if some valid entry persists after the long garbage area, the
systemtables would remain bloated. 
>
> Thank you for explaining the problem. I understood that creating and
> dropping temprary tables is harmful more than I have thought. Although
> this is not a concrete plan, there are two ideas to reduce creating
> temporary tables:

For the pg_stat_statements point of view, utility command support is
already quite bad as with many workloads it's rather impossible to
activate track_utility as it'd otherwise pollute the hashtable with an
infinity of queries executed only once (random prepared transaction
name, random cursor names...).  I'm wondering whether we should
normalize utility statements deparsing the utilityStmt, and also
normalizing some identifiers (maybe optionally with a GUC), eg.
"DECLARE ? AS CURSOR FOR normalized_query_here".  However commands
like vacuum or drop would be better kept as-is.

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

23 декабря 2019 г., 07:43:23

From: Tatsuo Ishii <ishii@sraoss.co.jp>
> the target data?  In the discussion, someone referred to master data with low
> update frequency, because the proposed IVM implementation adds triggers on
> source tables, which limits the applicability to update-heavy tables.
>
> But if you want to get always up-to-data you need to pay the cost for
> REFRESH MATERIALIZED VIEW. IVM gives a choice here.

Thank you, that clarified to some extent.  What kind of data do you think of as an example?

Materialized view reminds me of the use in a data warehouse.  Oracle handles the top in its Database Data Warehousing
Guide,and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly
SQLData Warehouse).  AWS also has previewed Redshift's materialized view feature in re:Invent 2019.  Are you targeting
thedata warehouse (analytics) workload? 

IIUC, to put (over) simply, the data warehouse has two kind of tables:

* Facts (transaction data): e.g. sales, user activity
Large amount.  INSERT only on a regular basis (ETL/ELT) or continuously (streaming)

* Dimensions (master/reference data): e.g. product, customer, time, country
Small amount.  Infrequently INSERTed or UPDATEd.


The proposed trigger-based approach does not seem to be suitable for the facts, because the trigger overhead imposed on
dataloading may offset or exceed the time saved by incrementally refreshing the materialized views. 

Then, does the proposed feature fit the dimension tables?  If the materialized view is only based on the dimension
data,then the full REFRESH of the materialized view wouldn't take so long.  The typical materialized view should join
thefact and dimension tables.  Then, the fact table will have to have the triggers, causing the data loading slowdown. 

I'm saying this because I'm concerned about the trigger based overhead.  As you know, Oracle uses materialized view
logsto save changes and incrementally apply them later to the materialized views (REFRESH ON STATEMENT materialized
viewsdoesn't require the materialized view log, so it might use triggers.)  Does any commercial grade database
implementmaterialized view using triggers?  I couldn't find relevant information regarding Azure Synapse and Redshift. 

If our only handy option is a trigger, can we minimize the overhead by doing the view maintenance at transaction
commit?


Regards
Takayuki Tsunakawa

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

23 декабря 2019 г., 08:08:53

From: Yugo Nagata <nagata@sraoss.co.jp>
> 1. Create a temporary table only once at the first view maintenance in
> this session. This is possible if we store names or oid of temporary
> tables used for each materialized view in memory. However, users may
> access to these temptables whenever during the session.
>
> 2. Use tuplestores instead of temprary tables. Tuplestores can be
> converted to Ephemeral Name Relation (ENR) and used in queries.
>  It doesn't need updating system catalogs, but indexes can not be
> used to access.

How about unlogged tables ?  I thought the point of using a temp table is to avoid WAL overhead.

One concern about the temp table is that it precludes the use of distributed transactions (PREPARE TRANSACTION fails if
thetransaction accessed a temp table.)  This could become a headache when FDW has supported 2PC (which Sawada-san
startedand Horicuchi-san has taken over.)  In the near future, PostgreSQL may evolve into a shared nothing database
withdistributed transactions like Postgres-XL. 


Regards
Takayuki Tsunakawa

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

23 декабря 2019 г., 08:13:19

>> But if you want to get always up-to-data you need to pay the cost for
>> REFRESH MATERIALIZED VIEW. IVM gives a choice here.
> 
> Thank you, that clarified to some extent.  What kind of data do you think of as an example?
> 
> Materialized view reminds me of the use in a data warehouse.  Oracle handles the top in its Database Data Warehousing
Guide,and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly
SQLData Warehouse).  AWS also has previewed Redshift's materialized view feature in re:Invent 2019.  Are you targeting
thedata warehouse (analytics) workload?
 

First of all, we do not think that current approach is the final
one. Instead we want to implement IVM feature one by one: i.e. we
start with "immediate update" approach, because it's simple and easier
to implement. Then we will add "deferred update" mode later on.

In fact Oracle has both "immediate update" and "deferred update" mode
of IVM (actually there are more "mode" with their implementation).

I recommend you to look into Oracle's materialized view feature
closely. For fair evaluation, probably we should compare the IVM patch
with Oracle's "immediate update" (they call it "on statement") mode.

> IIUC, to put (over) simply, the data warehouse has two kind of tables:

Probably deferred IVM mode is more suitable for DWH. However as I said
earlier, we hope to implement the immediate mode first then add the
deferred mode. Let's start with simple one then add more features.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

23 декабря 2019 г., 09:50:47

On Mon, 23 Dec 2019 08:08:53 +0000
"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:

> From: Yugo Nagata <nagata@sraoss.co.jp>
> > 1. Create a temporary table only once at the first view maintenance in
> > this session. This is possible if we store names or oid of temporary
> > tables used for each materialized view in memory. However, users may
> > access to these temptables whenever during the session.
> > 
> > 2. Use tuplestores instead of temprary tables. Tuplestores can be
> > converted to Ephemeral Name Relation (ENR) and used in queries.
> >  It doesn't need updating system catalogs, but indexes can not be
> > used to access.
> 
> How about unlogged tables ?  I thought the point of using a temp table is to avoid WAL overhead.

Hmm... this might be another option. However, if we use unlogged tables,
we will need to create them in a special schema similar to pg_toast
to split this from user tables. Otherwise, we need to create and drop
unlogged tables repeatedly for each session.

> 
> One concern about the temp table is that it precludes the use of distributed transactions (PREPARE TRANSACTION fails
ifthe transaction accessed a temp table.)  This could become a headache when FDW has supported 2PC (which Sawada-san
startedand Horicuchi-san has taken over.)  In the near future, PostgreSQL may evolve into a shared nothing database
withdistributed transactions like Postgres-XL.

This makes sense since you mean that PREPARE TRANSACTION can not be used
if any base table of incrementally maintainable materialized views is
modified in the transaction, at least in the immediate maintenance. Maybe,
this issue can be resolved if we implement the deferred maintenance planned
in future because  materialized views can be updated in other transactions
in this way.

> 
> 
> Regards
> Takayuki Tsunakawa
> 
> 
> 

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

23 декабря 2019 г., 10:41:18

Hello,
regarding my initial post:

> For each insert into a base table there are 3 statements:
> - ANALYZE pg_temp_3.pg_temp_81976
> - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> - DROP TABLE pg_temp_3.pg_temp_81976

For me there where 3 points to discuss:
- create/drop tables may bloat dictionnary tables 
- create/drop tables prevents "WITH updt ..." from being shared  (with some
plan caching)
- generates many lines in pg_stat_statements

In fact I like the idea of a table created per session, but I would even
prefer a common "table" shared between all sessions like GLOBAL TEMPORARY
TABLE (or something similar) as described here:

https://www.postgresql.org/message-id/flat/157703426606.1198.2452090605041230054.pgcf%40coridan.postgresql.org#331e8344bbae904350af161fb43a0aa6

That would remove the drop/create issue, permits to reduce planning time for
"WITH updt ..." statements
(as done today in PLpgsql triggers), and would fix the pgss "bloat" issue. 

Like that the "cost" of the immediate refresh approach would be easier to
support ;o)

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

24 декабря 2019 г., 02:01:48

On Mon, 23 Dec 2019 03:41:18 -0700 (MST)
legrand legrand <legrand_legrand@hotmail.com> wrote:

> Hello,
> regarding my initial post:
> 
> > For each insert into a base table there are 3 statements:
> > - ANALYZE pg_temp_3.pg_temp_81976
> > - WITH updt AS (  UPDATE public.mv1 AS mv SET __ivm_count__ = ...
> > - DROP TABLE pg_temp_3.pg_temp_81976
> 
> For me there where 3 points to discuss:
> - create/drop tables may bloat dictionnary tables 
> - create/drop tables prevents "WITH updt ..." from being shared  (with some
> plan caching)
> - generates many lines in pg_stat_statements
> 
> In fact I like the idea of a table created per session, but I would even
> prefer a common "table" shared between all sessions like GLOBAL TEMPORARY
> TABLE (or something similar) as described here:
>
https://www.postgresql.org/message-id/flat/157703426606.1198.2452090605041230054.pgcf%40coridan.postgresql.org#331e8344bbae904350af161fb43a0aa6

Although I have not looked into this thread, this may be help if this is
implemented. However, it would be still necessary to truncate the table
before the view maintenance because such tables always exist and can be
accessed and modified by any users.


-- 
Yugo Nagata <nagata@sraoss.co.jp>

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

24 декабря 2019 г., 06:52:35

From: Tatsuo Ishii <ishii@sraoss.co.jp>
> First of all, we do not think that current approach is the final
> one. Instead we want to implement IVM feature one by one: i.e. we
> start with "immediate update" approach, because it's simple and easier
> to implement. Then we will add "deferred update" mode later on.

I agree about incremental feature introduction.  What I'm simply asking is the concrete use case (workload and data),
sothat I can convince myself to believe that this feature is useful and focus on reviewing and testing (because the
patchseems big and difficult...) 


> In fact Oracle has both "immediate update" and "deferred update" mode
> of IVM (actually there are more "mode" with their implementation).
>
> I recommend you to look into Oracle's materialized view feature
> closely. For fair evaluation, probably we should compare the IVM patch
> with Oracle's "immediate update" (they call it "on statement") mode.
>
> Probably deferred IVM mode is more suitable for DWH. However as I said
> earlier, we hope to implement the immediate mode first then add the
> deferred mode. Let's start with simple one then add more features.

Yes, I know Oracle's ON STATEMENT refresh mode (I attached references at the end for others.)

Unfortunately, it's not clear to me which of ON STATEMENT or ON COMMIT the user should choose.  The benefit of ON
STATEMENTis that the user does not have to create and maintain the materialized view log.  But I'm not sure if and when
thebenefit defeats the performance overhead on DML statements.  It's not disclosed whether ON STATEMENT uses triggers. 

Could you give your opinion on the following to better understand the proposed feature and/or Oracle's ON STATEMENT
refreshmode? 

* What use case does the feature fit?
If the trigger makes it difficult to use in the data ware house, does the feature target OLTP?
What kind of data and query would benefit most from the feature (e.g. join of a large sales table and a small product
table,where the data volume and frequency of data loading is ...)? 
In other words, this is about what kind of example we can recommend as a typical use case of this feature.

* Do you think the benefit of ON STATEMENT (i.e. do not have to use materialized view log) outweighs the drawback of ON
STATEMENT (i.g. DML overhead)? 

* Do you think it's important to refresh the materialized view after every statement, or the per-statement refresh is
nota requirement but simply the result of implementation? 


[References]

https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/refreshing-materialized-views.html#GUID-C40C225A-8328-44D5-AE90-9078C2C773EA
--------------------------------------------------
7.1.5 About ON COMMIT Refresh for Materialized Views

A materialized view can be refreshed automatically using the ON COMMIT method. Therefore, whenever a transaction
commitswhich has updated the tables on which a materialized view is defined, those changes are automatically reflected
inthe materialized view. The advantage of using this approach is you never have to remember to refresh the materialized
view.The only disadvantage is the time required to complete the commit will be slightly longer because of the extra
processinginvolved. However, in a data warehouse, this should not be an issue because there is unlikely to be
concurrentprocesses trying to update the same table.  


7.1.6 About ON STATEMENT Refresh for Materialized Views

A materialized view that uses the ON STATEMENT refresh mode is automatically refreshed every time a DML operation is
performedon any of the materialized view’s base tables.  

With the ON STATEMENT refresh mode, any changes to the base tables are immediately reflected in the materialized view.
Thereis no need to commit the transaction or maintain materialized view logs on the base tables. If the DML statements
aresubsequently rolled back, then the corresponding changes made to the materialized view are also rolled back.  

The advantage of the ON STATEMENT refresh mode is that the materialized view is always synchronized with the data in
thebase tables, without the overhead of maintaining materialized view logs. However, this mode may increase the time
takento perform a DML operation because the materialized view is being refreshed as part of the DML operation.  
--------------------------------------------------



https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/release-changes.html#GUID-2A2D6E3B-A3FD-47A8-82A3-1EF95AEF5993
--------------------------------------------------
ON STATEMENT refresh mode for materialized views
The ON STATEMENT refresh mode refreshes materialized views every time a DML operation is performed on any base table,
withoutthe need to commit the transaction. This mode does not require you to maintain materialized view logs on the
basetables.  
--------------------------------------------------


http://www.oracle.com/us/solutions/sap/matview-refresh-db12c-2877319.pdf
--------------------------------------------------
We have introduced a new Materialized View (MV) refresh mechanism called ON STATEMENT refresh. With the ON STATEMENT
refreshmethod, an MV is automatically refreshed whenever DML happens on a base table of the MV. Therefore, whenever a
DMLhappens on any table on which a materialized view is defined, the change is automatically reflected in the
materializedview. The advantage of using this approach is that the user no long needs to create a materialized view log
oneach of the base table in order to do fast refresh. The refresh can then avoid the overhead introduced by MV logging
butstill keep the materialized view refreshed all the time. 

Specify ON STATEMENT to indicate that a fast refresh is to occur whenever DML happens on a base table of the
materializedview. This is to say, ON STATEMENT materialized view is always in sync with base table changes even before
thetransaction commits. If a transaction that made changes to the base tables rolls back, the corresponding changes in
onstatement MV are rolled back as well. This clause may increase the time taken to complete a DML, because the database
performsthe refresh operation as part of the DML execution. However, unlike other types of fast refreshable
materializedviews, ON STATEMENT MV refresh no longer requires MV log on the base tables or any extra work on MV logs in
orderto do fast refresh. 
--------------------------------------------------


Regards
Takayuki Tsunakawa

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

24 декабря 2019 г., 07:07:35

From: Yugo Nagata <nagata@sraoss.co.jp>
> On Mon, 23 Dec 2019 08:08:53 +0000
> "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:
> > How about unlogged tables ?  I thought the point of using a temp table is to
> avoid WAL overhead.
>
> Hmm... this might be another option. However, if we use unlogged tables,
> we will need to create them in a special schema similar to pg_toast
> to split this from user tables. Otherwise, we need to create and drop
> unlogged tables repeatedly for each session.

Maybe we can create the work tables in the same schema as the materialized view, following:

* Prefix the table name to indicate that the table is system-managed, thus alluding to the user that manually deleting
thetable would break something.  This is like the system attribute __imv_count you are proposing. 

* Describe the above in the manual.  Columns of serial and bigserial data type similarly create sequences behind the
scenes.

* Make the work tables depend on the materialized view by recording the dependency in pg_depend, so that Dropping the
materializedview will also drop its work tables. 

Regards
Takayuki Tsunakawa

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

24 декабря 2019 г., 08:09:09

> Unfortunately, it's not clear to me which of ON STATEMENT or ON COMMIT the user should choose.  The benefit of ON
STATEMENTis that the user does not have to create and maintain the materialized view log.  But I'm not sure if and when
thebenefit defeats the performance overhead on DML statements.  It's not disclosed whether ON STATEMENT uses triggers.
 

AFAIK benefit of ON STATEMENT is the transaction can see the result of
update to the base tables. With ON COMMIT, the transaction does not
see the result until the transaction commits.

> Could you give your opinion on the following to better understand the proposed feature and/or Oracle's ON STATEMENT
refreshmode?
 
> 
> * What use case does the feature fit?
> If the trigger makes it difficult to use in the data ware house, does the feature target OLTP?

Well, I can see use cases of IVM in both DWH and OLTP.

For example, a user create a DWH-like data using materialized
view. After the initial data is loaded, the data is seldom updated.
However one day a user wants to change just one row to see how it
affects to the whole DWH data. IVM will help here because it could be
done in shorter time than loading whole data.

Another use case is a ticket selling system. The system shows how many
tickets remain in a real time manner. For this purpose it needs to
count the number of tickets already sold from a log table. By using
IVM, it could be accomplished in simple and effective way.

> What kind of data and query would benefit most from the feature (e.g. join of a large sales table and a small product
table,where the data volume and frequency of data loading is ...)?
 
> In other words, this is about what kind of example we can recommend as a typical use case of this feature.

Here are some use cases suitable for IVM I can think of:

- Users are creating home made triggers to get data from tables. Since
  IVM could eliminates some of those triggers, we could expect less
  maintenance cost and bugs accidentally brought in when the triggers
  were created.

- Any use case in which the cost of refreshing whole result table
  (materialized view) is so expensive that it justifies the cost of
  updating of base tables. See the example of use cases above.

> * Do you think the benefit of ON STATEMENT (i.e. do not have to use materialized view log) outweighs the drawback of
ON STATEMENT (i.g. DML overhead)?
 

Outweights to what?

> * Do you think it's important to refresh the materialized view after every statement, or the per-statement refresh is
nota requirement but simply the result of implementation?
 

I think it's important to refresh the materialized view after every
statement and the benefit for users are apparent because it brings
real time data refresh to users.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

24 декабря 2019 г., 11:12:30

Yugo Nagata wrote
> On Mon, 23 Dec 2019 03:41:18 -0700 (MST)
> legrand legrand <

> legrand_legrand@

> > wrote:
> 
> [ ...]
> 
>> I would even
>> prefer a common "table" shared between all sessions like GLOBAL TEMPORARY
>> TABLE (or something similar) as described here:
>>
https://www.postgresql.org/message-id/flat/157703426606.1198.2452090605041230054.pgcf%40coridan.postgresql.org#331e8344bbae904350af161fb43a0aa6
> 
> Although I have not looked into this thread, this may be help if this is
> implemented. However, it would be still necessary to truncate the table
> before the view maintenance because such tables always exist and can be
> accessed and modified by any users.
> 
> -- 
> Yugo Nagata <

> nagata@.co

> >

For information, in this table data is PRIVATE to each session, can be
purged on the ON COMMIT event and disappear at SESSION end.
Yes, this feature could be utile only if it's implemented. And you are rigth
some data has to be deleted
on the ON STATEMENT event (not sure if TRUNCATE is Global or Session
specific in this situation).





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

24 декабря 2019 г., 23:14:41

> Materialized view reminds me of the use in a data warehouse.  Oracle handles the top in its Database Data Warehousing
Guide,and Microsoft has just started to offer the materialized view feature in its Azure Synapse Analytics (formerly
SQLData Warehouse).  AWS also has previewed Redshift's materialized view feature in re:Invent 2019.  Are you targeting
thedata warehouse (analytics) workload?
 
> 
> IIUC, to put (over) simply, the data warehouse has two kind of tables:
> 
> * Facts (transaction data): e.g. sales, user activity
> Large amount.  INSERT only on a regular basis (ETL/ELT) or continuously (streaming)
> 
> * Dimensions (master/reference data): e.g. product, customer, time, country
> Small amount.  Infrequently INSERTed or UPDATEd.
> 
> 
> The proposed trigger-based approach does not seem to be suitable for the facts, because the trigger overhead imposed
ondata loading may offset or exceed the time saved by incrementally refreshing the materialized views.
 

I think that depends on use case of the DWH. If the freshness of
materialized view tables is important for a user, then the cost of the
trigger overhead may be acceptable for the user.

> Then, does the proposed feature fit the dimension tables?  If the materialized view is only based on the dimension
data,then the full REFRESH of the materialized view wouldn't take so long.  The typical materialized view should join
thefact and dimension tables.  Then, the fact table will have to have the triggers, causing the data loading slowdown.
 
> 
> I'm saying this because I'm concerned about the trigger based overhead.  As you know, Oracle uses materialized view
logsto save changes and incrementally apply them later to the materialized views (REFRESH ON STATEMENT materialized
viewsdoesn't require the materialized view log, so it might use triggers.)  Does any commercial grade database
implementmaterialized view using triggers?  I couldn't find relevant information regarding Azure Synapse and Redshift.
 

I heard that REFRESH ON STATEMENT of Oracle has been added after ON
COMMIT materialized view. So I suspect Oracle realizes that there are
needs/use case for ON STATEMENT, but I am not sure.

> If our only handy option is a trigger, can we minimize the overhead by doing the view maintenance at transaction
commit?

I am not sure it's worth the trouble. If it involves some form of
logging, then I think it should be used for deferred IVM first because
it has more use case than on commit IVM.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

RE: Implementing Incremental View Maintenance

От

"tsunakawa.takay@fujitsu.com"

Дата:

25 декабря 2019 г., 05:27:26

From: Tatsuo Ishii <ishii@sraoss.co.jp>
> AFAIK benefit of ON STATEMENT is the transaction can see the result of
> update to the base tables. With ON COMMIT, the transaction does not
> see the result until the transaction commits.
>
> Well, I can see use cases of IVM in both DWH and OLTP.
>
> For example, a user create a DWH-like data using materialized
> view. After the initial data is loaded, the data is seldom updated.
> However one day a user wants to change just one row to see how it
> affects to the whole DWH data. IVM will help here because it could be
> done in shorter time than loading whole data.

> I heard that REFRESH ON STATEMENT of Oracle has been added after ON
> COMMIT materialized view. So I suspect Oracle realizes that there are
> needs/use case for ON STATEMENT, but I am not sure.

Yes, it was added relatively recently in Oracle Database 12.2.  As the following introduction to new features shows,
thebenefits are described as twofold: 
1) The transaction can see the refreshed view result without committing.
2) The materialized view log is not needed.

I guess from these that the ON STATEMENT refresh mode can be useful when the user wants to experiment with some changes
tosee how data change could affect the analytics result, without persisting the change.  I think that type of
experimentis done in completely or almost static data marts where the user is allowed to modify the data freely.  The
ONSTATEMENT refresh mode wouldn't be for the DWH that requires high-performance, regular and/or continuous data loading
andmaintenance based on a rigorous discipline.  But I'm still not sure if this is a real-world use case... 


https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/release-changes.html#GUID-2A2D6E3B-A3FD-47A8-82A3-1EF95AEF5993
--------------------------------------------------
ON STATEMENT refresh mode for materialized views
The ON STATEMENT refresh mode refreshes materialized views every time a DML operation is performed on any base table,
withoutthe need to commit the transaction. This mode does not require you to maintain materialized view logs on the
basetables.  
--------------------------------------------------


> Another use case is a ticket selling system. The system shows how many
> tickets remain in a real time manner. For this purpose it needs to
> count the number of tickets already sold from a log table. By using
> IVM, it could be accomplished in simple and effective way.

Wouldn't the app just have a table like ticket(id, name, quantity), decrement the quantity when the ticket is sold, and
readthe current quantity to know the remaining tickets?  If many consumers try to buy tickets for a popular event, the
materializedview refresh would limit the concurrency. 


> Here are some use cases suitable for IVM I can think of:
>
> - Users are creating home made triggers to get data from tables. Since
>   IVM could eliminates some of those triggers, we could expect less
>   maintenance cost and bugs accidentally brought in when the triggers
>   were created.
>
> - Any use case in which the cost of refreshing whole result table
>   (materialized view) is so expensive that it justifies the cost of
>   updating of base tables. See the example of use cases above.

I think we need to find a typical example of this.  That should be useful to write the manual article, because it's
betterto caution users that the IMV is a good fit for this case and not for that case.  Using real-world table names in
thesyntax example will also be good. 


> > * Do you think the benefit of ON STATEMENT (i.e. do not have to use
> materialized view log) outweighs the drawback of ON  STATEMENT (i.g. DML
> overhead)?
>
> Outweights to what?

"outweigh" means "exceed."  I meant that I'm wondering if and why users prefer ON STATEMENT's benefit despite of its
additionaloverhead on update statements. 


Bottom line: The use of triggers makes me hesitate, because I saw someone's (probably Fujii san) article that INSERTs
intoinheritance-and-trigger-based partitioned tables were 10 times slower than the declaration-based partitioned
tables. I think I will try to find a good use case. 


Regards
Takayuki Tsunakawa

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

26 декабря 2019 г., 00:26:39

>> Another use case is a ticket selling system. The system shows how many
>> tickets remain in a real time manner. For this purpose it needs to
>> count the number of tickets already sold from a log table. By using
>> IVM, it could be accomplished in simple and effective way.
> 
> Wouldn't the app just have a table like ticket(id, name, quantity), decrement the quantity when the ticket is sold,
andread the current quantity to know the remaining tickets?  If many consumers try to buy tickets for a popular event,
thematerialized view refresh would limit the concurrency.
 

Yes, as long as number of sold ticks is the only important data for
the system, it could be true. However suppose the system wants to
start sort of "campaign" and the system needs to collect statistics of
counts depending on the city that each ticket buyer belongs to so that
certain offer is limited to first 100 ticket buyers in each city. In
this case IVM will give more flexible way to handle this kind of
requirements than having adhoc city counts column in a table.

> I think we need to find a typical example of this.  That should be useful to write the manual article, because it's
betterto caution users that the IMV is a good fit for this case and not for that case.  Using real-world table names in
thesyntax example will also be good.
 

In general I agree. I'd try to collect good real-world examples by
myself but my experience is limited. I hope people in this community
come up with such that examples.

> "outweigh" means "exceed."  I meant that I'm wondering if and why users prefer ON STATEMENT's benefit despite of its
additionaloverhead on update statements.
 

I already found at least one such user in the upthread if I don't
missing something.

> Bottom line: The use of triggers makes me hesitate, because I saw someone's (probably Fujii san) article that INSERTs
intoinheritance-and-trigger-based partitioned tables were 10 times slower than the declaration-based partitioned
tables. I think I will try to find a good use case.
 

Great. In the mean time we will try to mitigate the overhead of IVM
(triggers are just one of them).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

26 декабря 2019 г., 02:03:02

Hi, 

Attached is the latest patch (v11) to add support for Incremental Materialized View Maintenance (IVM).

Differences from the previous patch (v10) include:
- Prohibit creating matviews including mutable functions

Matviews including mutable functions (for example now(),random(), ... etc) could result in inconsistent data with the
basetables.
 
This patch adds a check whether the requested matview definition includes SELECTs using mutable functions. If so, raise
anerror while creating the matview.
 

This issue is reported by nuko-san.
https://www.postgresql.org/message-id/CAF3Gu1Z950HqQJzwanbeg7PmUXLc+7uZMstfnLeZM9iqDWeW9Q@mail.gmail.com


Currently other IVM's support status is:

> IVM is a way to make materialized views up-to-date in which only
> incremental changes are computed and applied on views rather than
> recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
> does. IVM can update materialized views more efficiently
> than recomputation when only small part of the view need updates.
> 
> There are two approaches with regard to timing of view maintenance:
> immediate and deferred. In immediate maintenance, views are updated in
> the same transaction where its base table is modified. In deferred
> maintenance, views are updated after the transaction is committed,
> for example, when the view is accessed, as a response to user command
> like REFRESH, or periodically in background, and so on. 
> 
> This patch implements a kind of immediate maintenance, in which
> materialized views are updated immediately in AFTER triggers when a
> base table is modified.
> 
> This supports views using:
>  - inner and outer joins including self-join
>  - some built-in aggregate functions (count, sum, agv, min, max)
>  - a part of subqueries
>    -- simple subqueries in FROM clause
>    -- EXISTS subqueries in WHERE clause
>  - DISTINCT and views with tuple duplicates
> 
> ===
> Here are major changes we made after the previous submitted patch:
> 
> * Aggregate functions are checked if they can be used in IVM 
>   using their OID. Per comments from Alvaro Herrera.
> 
>   For this purpose, Gen_fmgrtab.pl was modified so that OIDs of
>   aggregate functions are output to fmgroids.h.
> 
> * Some bug fixes including:
> 
>  - Mistake of tab-completion of psql pointed out by nuko-san
>  - A bug relating rename of matview pointed out by nuko-san
>  - spelling errors
>  - etc.
> 
> * Add documentations for IVM
> 
> * Patch is splited into eleven parts to make review easier
>   as suggested by Amit Langote:
> 
>  - 0001: Add a new syntax:
>              CREATE INCREMENTAL MATERIALIZED VIEW
>  - 0002: Add a new column relisivm to pg_class
>  - 0003: Change trigger.c to allow to prolong life span of tupestores
>          containing Transition Tables generated via AFTER trigger
>  - 0004: Add the basic IVM future using counting algorithm:
>          This supports inner joins, DISTINCT, and tuple duplicates.
>  - 0005: Change GEN_fmgrtab.pl to output aggregate function's OIDs
>  - 0006: Add aggregates support for IVM
>  - 0007: Add subqueries support for IVM
>  - 0008: Add outer joins support for IVM
>  - 0009: Add IVM support to psql command
>  - 0010: Add regression tests for IVM
>  - 0011: Add documentations for IVM
> 
> ===
> Todo:
> 
> Currently, REFRESH and pg_dump/pg_restore is not supported, but
> we are working on them.
> 
> Also, TRUNCATE is not supported. When TRUNCATE command is executed
> on a base table, nothing occurs on materialized views. We are
> now considering another better options, like:
> 
> - Raise an error or warning when a base table is TRUNCATEed.
> - Make the view non-scannable (like WITH NO DATA)
> - Update the view in some ways. It would be easy for inner joins
>   or aggregate views, but there is some difficult with outer joins.

Best Regards,

-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Вложения

IVM_patches_v11.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo Nagata

Дата:

26 декабря 2019 г., 02:36:47

On Tue, 24 Dec 2019 07:07:35 +0000
"tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:

> From: Yugo Nagata <nagata@sraoss.co.jp>
> > On Mon, 23 Dec 2019 08:08:53 +0000
> > "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote:
> > > How about unlogged tables ?  I thought the point of using a temp table is to
> > avoid WAL overhead.
> > 
> > Hmm... this might be another option. However, if we use unlogged tables,
> > we will need to create them in a special schema similar to pg_toast
> > to split this from user tables. Otherwise, we need to create and drop
> > unlogged tables repeatedly for each session.
> 
> Maybe we can create the work tables in the same schema as the materialized view, following:
> 
> * Prefix the table name to indicate that the table is system-managed, thus alluding to the user that manually
deletingthe table would break something.  This is like the system attribute __imv_count you are proposing.
 
> 
> * Describe the above in the manual.  Columns of serial and bigserial data type similarly create sequences behind the
scenes.
> 
> * Make the work tables depend on the materialized view by recording the dependency in pg_depend, so that Dropping the
materializedview will also drop its work tables.
 

Maybe it works, but instead of using special names for work tables, we can also create
a schema whose name is special and place work tables in this. This will not annoy users
with information they are not interested in when, for example, psql meta-commands like
\d are used.

Anyway, I understood it is better to avoid creating and dropping temporary tables
during view maintenance per statement.

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

27 декабря 2019 г., 23:42:04

Hello,
Thank you for this patch.

I have tried to use an other patch with yours:
"Planning counters in pg_stat_statements (using pgss_store)"
https://www.postgresql.org/message-id/CAOBaU_Y12bn0tOdN9RMBZn29bfYYH11b2CwKO1RO7dX9fQ3aZA%40mail.gmail.com

setting
shared_preload_libraries='pg_stat_statements'
pg_stat_statements.track=all
and creating the extension


When trying following syntax:

create table b1 (id integer, x numeric(10,3));
create incremental materialized view mv1 as select id, count(*),sum(x) from
b1 group by id;
insert into b1 values (1,1)

I got an ASSERT FAILURE in pg_stat_statements.c
on
    Assert(query != NULL);

comming from matview.c
    refresh_matview_datafill(dest_old, query, queryEnv, NULL);
or
    refresh_matview_datafill(dest_new, query, queryEnv, NULL);


If this (last) NULL field was replaced by the query text, a comment or just
"n/a",
it would fix the problem.

Could this be investigated ?

Thanks in advance
Regards
PAscal




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Julien Rouhaud

Дата:

28 декабря 2019 г., 11:05:50

On Sat, Dec 28, 2019 at 12:42 AM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> Hello,
> Thank you for this patch.
>
> I have tried to use an other patch with yours:
> "Planning counters in pg_stat_statements (using pgss_store)"
> https://www.postgresql.org/message-id/CAOBaU_Y12bn0tOdN9RMBZn29bfYYH11b2CwKO1RO7dX9fQ3aZA%40mail.gmail.com
>
> setting
> shared_preload_libraries='pg_stat_statements'
> pg_stat_statements.track=all
> and creating the extension
>
>
> When trying following syntax:
>
> create table b1 (id integer, x numeric(10,3));
> create incremental materialized view mv1 as select id, count(*),sum(x) from
> b1 group by id;
> insert into b1 values (1,1)
>
> I got an ASSERT FAILURE in pg_stat_statements.c
> on
>         Assert(query != NULL);
>
> comming from matview.c
>         refresh_matview_datafill(dest_old, query, queryEnv, NULL);
> or
>         refresh_matview_datafill(dest_new, query, queryEnv, NULL);
>
>
> If this (last) NULL field was replaced by the query text, a comment or just
> "n/a",
> it would fix the problem.
>
> Could this be investigated ?

I digged deeper into this.  I found a bug in the pg_stat_statements
patch, as the new pgss_planner_hook() doesn't check for a non-zero
queryId, which I think should avoid that problem.  This however indeed
raises the question on whether the query text should be provided, and
if the behavior is otherwise correct.  If I understand correctly, for
now this specific query won't go through parse_analysis, thus won't
get a queryId and will be ignored in pgss_ExecutorEnd, so it'll be
entirely invisible, except with auto_explain which will only show an
orphan plan like this:

2019-12-28 12:03:29.334 CET [9399] LOG:  duration: 0.180 ms  plan:
HashAggregate  (cost=0.04..0.06 rows=1 width=60)
  Group Key: new_16385_0.id
  ->  Named Tuplestore Scan  (cost=0.00..0.02 rows=1 width=52)

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

11 января 2020 г., 00:27:58

LIMIT clause without ORDER BY should be prohibited when creating incremental materialized views.

In SQL, the result of a LIMIT clause without ORDER BY is undefined.

If the LIMIT clause is allowed when creating an incremental materialized view, incorrect results will be obtained when the view is updated after updating the source table.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql --version
psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql
DROP TABLE IF EXISTS test CASCADE;
psql:limit-problem.sql:1: NOTICE: drop cascades to materialized view test_imv
DROP TABLE
CREATE TABLE test (id int primary key, data text);
CREATE TABLE
INSERT INTO test VALUES (generate_series(1, 10), 'foo');
INSERT 0 10
CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1;
SELECT 1
Materialized view "public.test_imv"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
data | text | | | | extended | |
__ivm_count__ | bigint | | | | plain | |
View definition:
SELECT test.id,
test.data
FROM test
LIMIT 1;
Access method: heap
Incremental view maintenance: yes

SELECT * FROM test LIMIT 1;
id | data
----+------
1 | foo
(1 row)

TABLE test_imv;
id | data
----+------
1 | foo
(1 row)

UPDATE test SET data = 'bar' WHERE id = 1;
UPDATE 1
SELECT * FROM test LIMIT 1;
id | data
----+------
2 | foo
(1 row)

TABLE test_imv;
id | data
----+------
1 | bar
(1 row)

DELETE FROM test WHERE id = 1;
DELETE 1
SELECT * FROM test LIMIT 1;
id | data
----+------
2 | foo
(1 row)

TABLE test_imv;
id | data
----+------
(0 rows)
```

ORDER BY clause is not allowed when executing CREATE INCREMENTAL MATELIARIZED VIEW.

We propose not to allow LIMIT clauses as well.

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

14 января 2020 г., 06:37:45

On Sat, 11 Jan 2020 09:27:58 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> LIMIT clause without ORDER BY should be prohibited when creating
> incremental materialized views.
> 
> In SQL, the result of a LIMIT clause without ORDER BY is undefined.
> If the LIMIT clause is allowed when creating an incremental materialized
> view, incorrect results will be obtained when the view is updated after
> updating the source table.

Thank you for your advice. It's just as you said. 
LIMIT/OFFSET clause should is prohibited. We will add this to next patch.

Best Regards,
 Takuma Hoshiai

> 
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql --version
> psql (PostgreSQL) 13devel-ivm-3bf6953688153fa72dd48478a77e37cf3111a1ee
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f limit-problem.sql
> DROP TABLE IF EXISTS test CASCADE;
> psql:limit-problem.sql:1: NOTICE:  drop cascades to materialized view
> test_imv
> DROP TABLE
> CREATE TABLE test (id int primary key, data text);
> CREATE TABLE
> INSERT INTO test VALUES (generate_series(1, 10), 'foo');
> INSERT 0 10
> CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test LIMIT 1;
> SELECT 1
>                                Materialized view "public.test_imv"
>     Column     |  Type   | Collation | Nullable | Default | Storage  |
> Stats target | Description
> ---------------+---------+-----------+----------+---------+----------+--------------+-------------
>  id            | integer |           |          |         | plain    |
>          |
>  data          | text    |           |          |         | extended |
>          |
>  __ivm_count__ | bigint  |           |          |         | plain    |
>          |
> View definition:
>  SELECT test.id,
>     test.data
>    FROM test
>  LIMIT 1;
> Access method: heap
> Incremental view maintenance: yes
> 
> SELECT * FROM test LIMIT 1;
>  id | data
> ----+------
>   1 | foo
> (1 row)
> 
> TABLE test_imv;
>  id | data
> ----+------
>   1 | foo
> (1 row)
> 
> UPDATE test SET data = 'bar' WHERE id = 1;
> UPDATE 1
> SELECT * FROM test LIMIT 1;
>  id | data
> ----+------
>   2 | foo
> (1 row)
> 
> TABLE test_imv;
>  id | data
> ----+------
>   1 | bar
> (1 row)
> 
> DELETE FROM test WHERE id = 1;
> DELETE 1
> SELECT * FROM test LIMIT 1;
>  id | data
> ----+------
>   2 | foo
> (1 row)
> 
> TABLE test_imv;
>  id | data
> ----+------
> (0 rows)
> ```
> 
> ORDER BY clause is not allowed when executing CREATE INCREMENTAL
> MATELIARIZED VIEW.
> We propose not to allow LIMIT clauses as well.
> 
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

16 января 2020 г., 03:59:11

Aggregate operation of user-defined type cannot be specified
(commit e150d964df7e3aeb768e4bae35d15764f8abd284)

A SELECT statement using the MIN() and MAX() functions can be executed on a user-defined type column that implements the aggregate functions MIN () and MAX ().
However, if the same SELECT statement is specified in the AS clause of CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur.

```
SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
data_min | data_max
----------+----------
1/3 | 2/3
(1 row)

CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo;
psql:extension-agg.sql:14: ERROR: aggregate function min is not supported
CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo;
psql:extension-agg.sql:15: ERROR: aggregate function max is not supported
```

Does query including user-defined type aggregate operation not supported by INCREMENTAL MATERIALIZED VIEW?

An execution example is shown below.

```
[ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql
--
-- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
--
DROP EXTENSION IF EXISTS pg_fraction CASCADE;
DROP TABLE IF EXISTS foo CASCADE;

CREATE EXTENSION IF NOT EXISTS pg_fraction;
\dx
\dT+ fraction

CREATE TABLE foo (id int, data fraction);
INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data) data_min FROM foo;
CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data) data_max FROM foo;

SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
```

Best regards.

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

16 января 2020 г., 09:50:40

Error occurs when updating user-defined type columns.

Create an INCREMENTAL MATERIALIZED VIEW by specifying a query that includes user-defined type columns.
After the view is created, an error occurs when inserting into the view source table (including the user-defined type column).
```
ERROR: operator does not exist
```

An execution example is shown below.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -a -f extension-insert.sql
--
-- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
--
DROP EXTENSION IF EXISTS pg_fraction CASCADE;
psql:extension-insert.sql:4: NOTICE: drop cascades to column data of table foo
DROP EXTENSION
DROP TABLE IF EXISTS foo CASCADE;
DROP TABLE
CREATE EXTENSION IF NOT EXISTS pg_fraction;
CREATE EXTENSION
\dx
List of installed extensions
Name | Version | Schema | Description
-------------+---------+------------+------------------------------
pg_fraction | 1.0 | public | fraction data type
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)

\dT+ fraction
List of data types
Schema | Name | Internal name | Size | Elements | Owner | Access privileges | Description
--------+----------+---------------+------+----------+----------+-------------------+-------------
public | fraction | fraction | 16 | | postgres | |
(1 row)

CREATE TABLE foo (id int, data fraction);
CREATE TABLE
INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
INSERT 0 3
SELECT id, data FROM foo WHERE data >= '1/2';
id | data
----+------
1 | 2/3
3 | 1/2
(2 rows)

CREATE INCREMENTAL MATERIALIZED VIEW foo_imv AS SELECT id, data FROM foo WHERE data >= '1/2';
SELECT 2
TABLE foo_imv;
id | data
----+------
1 | 2/3
3 | 1/2
(2 rows)

INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6'); -- error
psql:extension-insert.sql:17: ERROR: operator does not exist: fraction pg_catalog.= fraction
LINE 1: ...(mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(p...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
QUERY: WITH updt AS (UPDATE public.foo_imv AS mv SET __ivm_count__ = mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__ FROM pg_temp_3.pg_temp_73900 AS diff WHERE (mv.id OPERATOR(pg_catalog.=) diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(pg_catalog.=) diff.data OR (mv.data IS NULL AND diff.data IS NULL)) RETURNING mv.id, mv.data) INSERT INTO public.foo_imv SELECT * FROM pg_temp_3.pg_temp_73900 AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.id OPERATOR(pg_catalog.=) diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(pg_catalog.=) diff.data OR (mv.data IS NULL AND diff.data IS NULL)));
TABLE foo;
id | data
----+------
1 | 2/3
2 | 1/3
3 | 1/2
(3 rows)

TABLE foo_imv;
id | data
----+------
1 | 2/3
3 | 1/2
(2 rows)

DROP MATERIALIZED VIEW foo_imv;
DROP MATERIALIZED VIEW
INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6');
INSERT 0 3
TABLE foo;
id | data
----+------
1 | 2/3
2 | 1/3
3 | 1/2
4 | 2/3
5 | 2/5
6 | 1/2
(6 rows)

```

Best regards.

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

17 января 2020 г., 08:11:53

On Thu, 16 Jan 2020 12:59:11 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Aggregate operation of user-defined type cannot be specified
> (commit e150d964df7e3aeb768e4bae35d15764f8abd284)
> 
> A SELECT statement using the MIN() and MAX() functions can be executed on a
> user-defined type column that implements the aggregate functions MIN () and
> MAX ().
> However, if the same SELECT statement is specified in the AS clause of
> CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur.
> 
> ```
> SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
>  data_min | data_max
> ----------+----------
>  1/3      | 2/3
> (1 row)
> 
> CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> data_min FROM foo;
> psql:extension-agg.sql:14: ERROR:  aggregate function min is not supported
> CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> data_max FROM foo;
> psql:extension-agg.sql:15: ERROR:  aggregate function max is not supported
> ```
> 
> Does query including user-defined type aggregate operation not supported by
> INCREMENTAL MATERIALIZED VIEW?

The current implementation supports only built-in aggregate functions, so
user-defined aggregates are not supported, although it is allowed before.
This is because we can not know how user-defined aggregates behave and if
it can work safely with IVM. Min/Max on your fraction type may work well, 
but it is possible that some user-defined aggregate functions named min
or max behave in totally different way than we expected.

In future, maybe it is possible support user-defined aggregates are supported
by extending pg_aggregate and adding support functions for IVM, but there is
not still a concrete plan for now. 

BTW, the following error message doesn't look good because built-in min is
supported, so I will improve it.

 ERROR:  aggregate function min is not supported

Regards,
Yugo Nagata

> 
> An execution example is shown below.
> 
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql
> --
> -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
> --
> DROP EXTENSION IF EXISTS pg_fraction CASCADE;
> DROP TABLE IF EXISTS foo CASCADE;
> 
> CREATE EXTENSION IF NOT EXISTS pg_fraction;
> \dx
> \dT+ fraction
> 
> CREATE TABLE foo (id int, data fraction);
> INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
> SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> data_min FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> data_max FROM foo;
> 
> SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min,
> MAX(id) id_max FROM foo;
> ```
> 
> Best regards.
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

17 января 2020 г., 08:21:18

On Thu, 16 Jan 2020 18:50:40 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Error occurs when updating user-defined type columns.
> 
> Create an INCREMENTAL MATERIALIZED VIEW by specifying a query that includes
> user-defined type columns.
> After the view is created, an error occurs when inserting into the view
> source table (including the user-defined type column).
> ```
> ERROR:  operator does not exist

Thank you for your reporting. I think this error occurs because 
pg_catalog.= is used also for user-defined types. I will fix this.

Regards,
Yugo Nagata

> ```
> 
> An execution example is shown below.
> 
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -a -f extension-insert.sql
> --
> -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
> --
> DROP EXTENSION IF EXISTS pg_fraction CASCADE;
> psql:extension-insert.sql:4: NOTICE:  drop cascades to column data of table
> foo
> DROP EXTENSION
> DROP TABLE IF EXISTS foo CASCADE;
> DROP TABLE
> CREATE EXTENSION IF NOT EXISTS pg_fraction;
> CREATE EXTENSION
> \dx
>                    List of installed extensions
>     Name     | Version |   Schema   |         Description
> -------------+---------+------------+------------------------------
>  pg_fraction | 1.0     | public     | fraction data type
>  plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
> (2 rows)
> 
> \dT+ fraction
>                                         List of data types
>  Schema |   Name   | Internal name | Size | Elements |  Owner   | Access
> privileges | Description
> --------+----------+---------------+------+----------+----------+-------------------+-------------
>  public | fraction | fraction      | 16   |          | postgres |
>         |
> (1 row)
> 
> CREATE TABLE foo (id int, data fraction);
> CREATE TABLE
> INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
> INSERT 0 3
> SELECT id, data FROM foo WHERE data >= '1/2';
>  id | data
> ----+------
>   1 | 2/3
>   3 | 1/2
> (2 rows)
> 
> CREATE INCREMENTAL MATERIALIZED VIEW foo_imv AS SELECT id, data FROM foo
> WHERE data >= '1/2';
> SELECT 2
> TABLE foo_imv;
>  id | data
> ----+------
>   1 | 2/3
>   3 | 1/2
> (2 rows)
> 
> INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6'); -- error
> psql:extension-insert.sql:17: ERROR:  operator does not exist: fraction
> pg_catalog.= fraction
> LINE 1: ...(mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(p...
>                                                              ^
> HINT:  No operator matches the given name and argument types. You might
> need to add explicit type casts.
> QUERY:  WITH updt AS (UPDATE public.foo_imv AS mv SET __ivm_count__ =
> mv.__ivm_count__ OPERATOR(pg_catalog.+) diff.__ivm_count__  FROM
> pg_temp_3.pg_temp_73900 AS diff WHERE (mv.id OPERATOR(pg_catalog.=) diff.id
> OR (mv.id IS NULL AND diff.id IS NULL)) AND (mv.data OPERATOR(pg_catalog.=)
> diff.data OR (mv.data IS NULL AND diff.data IS NULL)) RETURNING mv.id,
> mv.data) INSERT INTO public.foo_imv SELECT * FROM pg_temp_3.pg_temp_73900
> AS diff WHERE NOT EXISTS (SELECT 1 FROM updt AS mv WHERE (mv.id
> OPERATOR(pg_catalog.=) diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND
> (mv.data OPERATOR(pg_catalog.=) diff.data OR (mv.data IS NULL AND diff.data
> IS NULL)));
> TABLE foo;
>  id | data
> ----+------
>   1 | 2/3
>   2 | 1/3
>   3 | 1/2
> (3 rows)
> 
> TABLE foo_imv;
>  id | data
> ----+------
>   1 | 2/3
>   3 | 1/2
> (2 rows)
> 
> DROP MATERIALIZED VIEW foo_imv;
> DROP MATERIALIZED VIEW
> INSERT INTO foo (id, data) VALUES (4,'2/3'),(5,'2/5'),(6,'3/6');
> INSERT 0 3
> TABLE foo;
>  id | data
> ----+------
>   1 | 2/3
>   2 | 1/3
>   3 | 1/2
>   4 | 2/3
>   5 | 2/5
>   6 | 1/2
> (6 rows)
> 
> ```
> 
> Best regards.
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

17 января 2020 г., 21:10:32

Hello,

It seems that patch v11 doesn't apply any more.
Problem with "scanRTEForColumn" maybe because of change:

https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

18 января 2020 г., 05:39:35

Hi.
I understand.
Even if the function name is min, there is a possibility that it is not an aggregation operation for finding the minimum value, so it is restricted.
I understood aggregation of user-defined types is a constraint.

Also, I agree with the error message improvements.

2020年1月17日(金) 17:12 Yugo NAGATA <nagata@sraoss.co.jp>:

On Thu, 16 Jan 2020 12:59:11 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Aggregate operation of user-defined type cannot be specified
> (commit e150d964df7e3aeb768e4bae35d15764f8abd284)
>
> A SELECT statement using the MIN() and MAX() functions can be executed on a
> user-defined type column that implements the aggregate functions MIN () and
> MAX ().
> However, if the same SELECT statement is specified in the AS clause of
> CREATE INCREMENTAL MATERIALIZED VIEW, the following error will occur.
>
> ```
> SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> data_min | data_max
> ----------+----------
> 1/3 | 2/3
> (1 row)
>
> CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> data_min FROM foo;
> psql:extension-agg.sql:14: ERROR: aggregate function min is not supported
> CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> data_max FROM foo;
> psql:extension-agg.sql:15: ERROR: aggregate function max is not supported
> ```
>
> Does query including user-defined type aggregate operation not supported by
> INCREMENTAL MATERIALIZED VIEW?

The current implementation supports only built-in aggregate functions, so
user-defined aggregates are not supported, although it is allowed before.
This is because we can not know how user-defined aggregates behave and if
it can work safely with IVM. Min/Max on your fraction type may work well,
but it is possible that some user-defined aggregate functions named min
or max behave in totally different way than we expected.

In future, maybe it is possible support user-defined aggregates are supported
by extending pg_aggregate and adding support functions for IVM, but there is
not still a concrete plan for now.

BTW, the following error message doesn't look good because built-in min is
supported, so I will improve it.

ERROR: aggregate function min is not supported

Regards,
Yugo Nagata

>
> An execution example is shown below.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ cat extension-agg.sql
> --
> -- pg_fraction: https://github.com/nuko-yokohama/pg_fraction
> --
> DROP EXTENSION IF EXISTS pg_fraction CASCADE;
> DROP TABLE IF EXISTS foo CASCADE;
>
> CREATE EXTENSION IF NOT EXISTS pg_fraction;
> \dx
> \dT+ fraction
>
> CREATE TABLE foo (id int, data fraction);
> INSERT INTO foo (id, data) VALUES (1,'2/3'),(2,'1/3'),(3,'1/2');
> SELECT MIN(data) data_min, MAX(data) data_max FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_min_imv AS SELECT MIN(data)
> data_min FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_max_imv AS SELECT MAX(data)
> data_max FROM foo;
>
> SELECT MIN(id) id_min, MAX(id) id_max FROM foo;
> CREATE INCREMENTAL MATERIALIZED VIEW foo_id_imv AS SELECT MIN(id) id_min,
> MAX(id) id_max FROM foo;
> ```
>
> Best regards.
>
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
>
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance. In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> > - Using transition table in AFTER triggers
> > - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations". For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join, aggregation, union, difference, intersection, etc. If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta. A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables. In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >

--
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

20 января 2020 г., 07:57:58

On Fri, 17 Jan 2020 14:10:32 -0700 (MST)
legrand legrand <legrand_legrand@hotmail.com> wrote:

> Hello,
> 
> It seems that patch v11 doesn't apply any more.
> Problem with "scanRTEForColumn" maybe because of change:

Thank you for your reporting! We will fix this in the next update. 

Regards,
Yugo Nagata

> 
> https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308
> 
> Regards
> PAscal
> 
> 
> 
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

27 января 2020 г., 00:19:05

On Mon, 20 Jan 2020 16:57:58 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Fri, 17 Jan 2020 14:10:32 -0700 (MST)
> legrand legrand <legrand_legrand@hotmail.com> wrote:
> 
> > Hello,
> > 
> > It seems that patch v11 doesn't apply any more.
> > Problem with "scanRTEForColumn" maybe because of change:
> 
> Thank you for your reporting! We will fix this in the next update. 

Although I have been working conflict fix and merge latest master, it
takes a little longer, because it has large impact than we thought. 

Please wait a little more.

Regards
Takuma Hoshiai


> Regards,
> Yugo Nagata
> 
> > 
> > https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308
> > 
> > Regards
> > PAscal
> > 
> > 
> > 
> > --
> > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> > 
> > 
> 
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 
> 


-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

04 февраля 2020 г., 01:58:02

Hi, 

Attached is the latest patch (v12) to add support for Incremental Materialized View Maintenance (IVM).
It is possible to apply to current latest master branch.

Differences from the previous patch (v11) include:
* support executing REFRESH MATERIALIZED VIEW command with IVM.
* support unscannable state by WITH NO DATA option.
* add a check for LIMIT/OFFSET at creating an IMMV

 If REFRESH is executed for IMMV (incremental maintainable materialized view),  its contents is re-calculated as same
asusual materialized views (full REFRESH). Although IMMV is basically keeping up-to-date data, rounding errors can be
accumulatedin aggregated value in some cases, for example, if the view contains sum/avg on float type columns. Running
REFRESHcommand on IMMV will resolve this. Also, WITH NO DATA option allows to make IMMV unscannable. At that time,  IVM
triggersare dropped from IMMV because these become unneeded and useless. 

Also, we added new deptype option 'm' in pg_depend view for checking a trigger is for IVM. Please tell me, if add new
deptypeoption is unacceptable. It is also possible to perform the check by referencing pg_depend and pg_trigger,
pg_procview instead of adding a new deptype.

We update IVM restrictions. LIMIT/OFFSET clause is not supported with iVM because it is not suitable for incremental
changesto the materialized view. 

This issue is reported by nuko-san.
https://www.postgresql.org/message-id/CAF3Gu1ZK-s9GQh=70n8+21rBL8+fKW4tV3Ce-xuFXMsNFPO+zQ@mail.gmail.com

Best Regards,
Takuma Hoshiai

On Mon, 27 Jan 2020 09:19:05 +0900
Takuma Hoshiai <hoshiai@sraoss.co.jp> wrote:

> On Mon, 20 Jan 2020 16:57:58 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Fri, 17 Jan 2020 14:10:32 -0700 (MST)
> > legrand legrand <legrand_legrand@hotmail.com> wrote:
> > 
> > > Hello,
> > > 
> > > It seems that patch v11 doesn't apply any more.
> > > Problem with "scanRTEForColumn" maybe because of change:
> > 
> > Thank you for your reporting! We will fix this in the next update. 
> 
> Although I have been working conflict fix and merge latest master, it
> takes a little longer, because it has large impact than we thought. 
> 
> Please wait a little more.
> 
> Regards
> Takuma Hoshiai
> 
> 
> > Regards,
> > Yugo Nagata
> > 
> > > 
> > > https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308
> > > 
> > > Regards
> > > PAscal
> > > 
> > > 
> > > 
> > > --
> > > Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> > > 
> > > 
> > 
> > 
> > -- 
> > Yugo NAGATA <nagata@sraoss.co.jp>
> > 
> > 
> > 
> 
> 
> -- 
> Takuma Hoshiai <hoshiai@sraoss.co.jp>
> 
> 
> 
> 

-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Вложения

IVM_patches_v12.tar.gz

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

04 февраля 2020 г., 09:40:45

"ROW LEVEL SECURITY" and INCREMENTAL MATERIALIZED VIEW.

Hi.

If ROW LEVEL SECURITY is set for the source table after creating the INCREMENTAL MATELIALIZED VIEW, the search results by that are not reflected.
After setting ROW LEVEL SECURITY (similar to normal MATERIALIZED VIEW), you need to execute REFRESH MATERILALIZED VIEW and reflect the result.
(Not limited to this, but in general cases where search results change by means other than DML)

I propose to add this note to the document (rules.sgml).

execute log.

```
[ec2-user@ip-10-0-1-10 rls]$ psql testdb -e -f rls.sql
CREATE USER user_a;
CREATE ROLE
CREATE TABLE test (id int, data text);
CREATE TABLE
GRANT ALL ON TABLE test TO user_a;
GRANT
GRANT ALL ON SCHEMA public TO user_a;
GRANT
SET ROLE user_a;
SET
INSERT INTO test VALUES (1,'A'),(2,'B'),(3,'C');
INSERT 0 3
SELECT * FROM test;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)

CREATE VIEW test_v AS SELECT * FROM test;
CREATE VIEW
CREATE MATERIALIZED VIEW test_mv AS SELECT * FROM test;
SELECT 3
CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test;
SELECT 3
SELECT * FROM test_v;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)

SELECT * FROM test_mv;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)

SELECT * FROM test_imv;
id | data
----+------
3 | C
1 | A
2 | B
(3 rows)

RESET ROLE;
RESET
CREATE POLICY test_AAA ON test FOR SELECT TO user_a USING (data = 'A');
CREATE POLICY
ALTER TABLE test ENABLE ROW LEVEL SECURITY ;
ALTER TABLE
SET ROLE user_a;
SET
SELECT * FROM test_v;
id | data
----+------
1 | A
(1 row)

SELECT * FROM test_mv;
id | data
----+------
1 | A
2 | B
3 | C
(3 rows)

SELECT * FROM test_imv;
id | data
----+------
3 | C
1 | A
2 | B
(3 rows)

REFRESH MATERIALIZED VIEW test_mv;
REFRESH MATERIALIZED VIEW
REFRESH MATERIALIZED VIEW test_imv;
REFRESH MATERIALIZED VIEW
SELECT * FROM test_mv;
id | data
----+------
1 | A
(1 row)

SELECT * FROM test_imv;
id | data
----+------
1 | A
(1 row)

RESET ROLE;
RESET
REVOKE ALL ON TABLE test FROM user_a;
REVOKE
REVOKE ALL ON TABLE test_v FROM user_a;
REVOKE
REVOKE ALL ON TABLE test_mv FROM user_a;
REVOKE
REVOKE ALL ON TABLE test_imv FROM user_a;
REVOKE
REVOKE ALL ON SCHEMA public FROM user_a;
REVOKE
DROP TABLE test CASCADE;
psql:rls.sql:40: NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to view test_v
drop cascades to materialized view test_mv
drop cascades to materialized view test_imv
DROP TABLE
DROP USER user_a;
DROP ROLE
[ec2-user@ip-10-0-1-10 rls]$

```

Regard.

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

05 февраля 2020 г., 09:52:10

On Tue, 4 Feb 2020 18:40:45 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> "ROW LEVEL SECURITY" and INCREMENTAL MATERIALIZED VIEW.
> 
> Hi.
> 
> If ROW LEVEL SECURITY is set for the source table after creating the
> INCREMENTAL MATELIALIZED VIEW, the search results by that are not reflected.
> After setting ROW LEVEL SECURITY (similar to normal MATERIALIZED VIEW), you
> need to execute REFRESH MATERILALIZED VIEW and reflect the result.
> (Not limited to this, but in general cases where search results change by
> means other than DML)
> 
> I propose to add this note to the document (rules.sgml).

Thank you for your suggestion!  We'll add some description
about this to the documentation.

Regards,
Yugo Nagata

> 
> execute log.
> 
> ```
> [ec2-user@ip-10-0-1-10 rls]$ psql testdb -e -f rls.sql
> CREATE USER user_a;
> CREATE ROLE
> CREATE TABLE test (id int, data text);
> CREATE TABLE
> GRANT ALL ON TABLE test TO user_a;
> GRANT
> GRANT ALL ON SCHEMA public  TO user_a;
> GRANT
> SET ROLE user_a;
> SET
> INSERT INTO test VALUES (1,'A'),(2,'B'),(3,'C');
> INSERT 0 3
> SELECT * FROM test;
>  id | data
> ----+------
>   1 | A
>   2 | B
>   3 | C
> (3 rows)
> 
> CREATE VIEW test_v AS SELECT * FROM test;
> CREATE VIEW
> CREATE MATERIALIZED VIEW test_mv AS SELECT * FROM test;
> SELECT 3
> CREATE INCREMENTAL MATERIALIZED VIEW test_imv AS SELECT * FROM test;
> SELECT 3
> SELECT * FROM test_v;
>  id | data
> ----+------
>   1 | A
>   2 | B
>   3 | C
> (3 rows)
> 
> SELECT * FROM test_mv;
>  id | data
> ----+------
>   1 | A
>   2 | B
>   3 | C
> (3 rows)
> 
> SELECT * FROM test_imv;
>  id | data
> ----+------
>   3 | C
>   1 | A
>   2 | B
> (3 rows)
> 
> RESET ROLE;
> RESET
> CREATE POLICY test_AAA ON test FOR SELECT TO user_a USING (data = 'A');
> CREATE POLICY
> ALTER TABLE test ENABLE ROW LEVEL SECURITY ;
> ALTER TABLE
> SET ROLE user_a;
> SET
> SELECT * FROM test_v;
>  id | data
> ----+------
>   1 | A
> (1 row)
> 
> SELECT * FROM test_mv;
>  id | data
> ----+------
>   1 | A
>   2 | B
>   3 | C
> (3 rows)
> 
> SELECT * FROM test_imv;
>  id | data
> ----+------
>   3 | C
>   1 | A
>   2 | B
> (3 rows)
> 
> REFRESH MATERIALIZED VIEW test_mv;
> REFRESH MATERIALIZED VIEW
> REFRESH MATERIALIZED VIEW test_imv;
> REFRESH MATERIALIZED VIEW
> SELECT * FROM test_mv;
>  id | data
> ----+------
>   1 | A
> (1 row)
> 
> SELECT * FROM test_imv;
>  id | data
> ----+------
>   1 | A
> (1 row)
> 
> RESET ROLE;
> RESET
> REVOKE ALL ON TABLE test FROM user_a;
> REVOKE
> REVOKE ALL ON TABLE test_v FROM user_a;
> REVOKE
> REVOKE ALL ON TABLE test_mv FROM user_a;
> REVOKE
> REVOKE ALL ON TABLE test_imv FROM user_a;
> REVOKE
> REVOKE ALL ON SCHEMA public FROM user_a;
> REVOKE
> DROP TABLE test CASCADE;
> psql:rls.sql:40: NOTICE:  drop cascades to 3 other objects
> DETAIL:  drop cascades to view test_v
> drop cascades to materialized view test_mv
> drop cascades to materialized view test_imv
> DROP TABLE
> DROP USER user_a;
> DROP ROLE
> [ec2-user@ip-10-0-1-10 rls]$
> 
> ```
> 
> Regard.
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

08 февраля 2020 г., 02:15:45

Hi.

UNION query problem.(server crash)

When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.

(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)

execute log.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)

SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)

CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)

CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```
UNION query problem.(server crash)

When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.

(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)

execute log.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)

SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)

CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)

CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:

Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

10 февраля 2020 г., 01:37:50

On Sat, 8 Feb 2020 11:15:45 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Hi.
> 
> UNION query problem.(server crash)
> 
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.

Thank you for your report. As you noticed set operations including
UNION is concurrently unsupported, although this is not checked at
definition time and not documented either. Now we are thoroughly
investigating unsupported queries, and will add checks and
documentations for them.

Regards,
Yugo Nagata

> 
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> 
> execute log.
> 
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE:  drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
>  id |        data
> ----+--------------------
>   1 |  0.950724735058774
>   2 | 0.0222670808201144
>   3 |  0.391258547114841
> (3 rows)
> 
> SELECT * FROM table_y;
>  id |        data
> ----+--------------------
>   1 |  0.991717347778337
>   2 | 0.0528458947672874
>   3 |  0.965044982911163
> (3 rows)
> 
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
>   name   | id |        data
> ---------+----+--------------------
>  table_y |  2 | 0.0528458947672874
>  table_x |  2 | 0.0222670808201144
>  table_y |  3 |  0.965044982911163
>  table_x |  1 |  0.950724735058774
>  table_x |  3 |  0.391258547114841
>  table_y |  1 |  0.991717347778337
> (6 rows)
> 
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
> UNION query problem.(server crash)
> 
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.
> 
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> 
> execute log.
> 
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE:  drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
>  id |        data
> ----+--------------------
>   1 |  0.950724735058774
>   2 | 0.0222670808201144
>   3 |  0.391258547114841
> (3 rows)
> 
> SELECT * FROM table_y;
>  id |        data
> ----+--------------------
>   1 |  0.991717347778337
>   2 | 0.0528458947672874
>   3 |  0.965044982911163
> (3 rows)
> 
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
>   name   | id |        data
> ---------+----+--------------------
>  table_y |  2 | 0.0528458947672874
>  table_x |  2 | 0.0222670808201144
>  table_y |  3 |  0.965044982911163
>  table_x |  1 |  0.950724735058774
>  table_x |  3 |  0.391258547114841
>  table_y |  1 |  0.991717347778337
> (6 rows)
> 
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
> 
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> 
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance.  In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> >  - Using transition table in AFTER triggers
> >  - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations".  For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join,  aggregation, union, difference, intersection, etc.  If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta.  A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables.  In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> >
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> >
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

10 февраля 2020 г., 03:34:32

Hi.

I understod that UNION is unsupported.

I also refer to the implementation of "./src/backend/commands/createas.c" check_ivm_restriction_walker () to see if there are any other queries that may be problematic.

2020年2月10日(月) 10:38 Yugo NAGATA <nagata@sraoss.co.jp>:

On Sat, 8 Feb 2020 11:15:45 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Hi.
>
> UNION query problem.(server crash)
>
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.

Thank you for your report. As you noticed set operations including
UNION is concurrently unsupported, although this is not checked at
definition time and not documented either. Now we are thoroughly
investigating unsupported queries, and will add checks and
documentations for them.

Regards,
Yugo Nagata

>
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
>
> execute log.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
> id | data
> ----+--------------------
> 1 | 0.950724735058774
> 2 | 0.0222670808201144
> 3 | 0.391258547114841
> (3 rows)
>
> SELECT * FROM table_y;
> id | data
> ----+--------------------
> 1 | 0.991717347778337
> 2 | 0.0528458947672874
> 3 | 0.965044982911163
> (3 rows)
>
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
> name | id | data
> ---------+----+--------------------
> table_y | 2 | 0.0528458947672874
> table_x | 2 | 0.0222670808201144
> table_y | 3 | 0.965044982911163
> table_x | 1 | 0.950724735058774
> table_x | 3 | 0.391258547114841
> table_y | 1 | 0.991717347778337
> (6 rows)
>
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
> UNION query problem.(server crash)
>
> When creating an INCREMENTAL MATERIALIZED VIEW,
> the server process crashes if you specify a query with a UNION.
>
> (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
>
> execute log.
>
> ```
> [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> DROP TABLE IF EXISTS table_x CASCADE;
> psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
> DROP TABLE
> DROP TABLE IF EXISTS table_y CASCADE;
> DROP TABLE
> CREATE TABLE table_x (id int, data numeric);
> CREATE TABLE
> CREATE TABLE table_y (id int, data numeric);
> CREATE TABLE
> INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> INSERT 0 3
> SELECT * FROM table_x;
> id | data
> ----+--------------------
> 1 | 0.950724735058774
> 2 | 0.0222670808201144
> 3 | 0.391258547114841
> (3 rows)
>
> SELECT * FROM table_y;
> id | data
> ----+--------------------
> 1 | 0.991717347778337
> 2 | 0.0528458947672874
> 3 | 0.965044982911163
> (3 rows)
>
> CREATE VIEW xy_union_v AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> CREATE VIEW
> TABLE xy_union_v;
> name | id | data
> ---------+----+--------------------
> table_y | 2 | 0.0528458947672874
> table_x | 2 | 0.0222670808201144
> table_y | 3 | 0.965044982911163
> table_x | 1 | 0.950724735058774
> table_x | 3 | 0.391258547114841
> table_y | 1 | 0.991717347778337
> (6 rows)
>
> CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> SELECT 'table_x' AS name, * FROM table_x
> UNION
> SELECT 'table_y' AS name, * FROM table_y
> ;
> psql:union_query_crash.sql:28: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:union_query_crash.sql:28: fatal: connection to server was lost
> ```
>
> 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
>
> > Hi,
> >
> > I would like to implement Incremental View Maintenance (IVM) on
> > PostgreSQL.
> > IVM is a technique to maintain materialized views which computes and
> > applies
> > only the incremental changes to the materialized views rather than
> > recomputate the contents as the current REFRESH command does.
> >
> > I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> > [1].
> > Our implementation uses row OIDs to compute deltas for materialized
> > views.
> > The basic idea is that if we have information about which rows in base
> > tables
> > are contributing to generate a certain row in a matview then we can
> > identify
> > the affected rows when a base table is updated. This is based on an idea of
> > Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> > approach[3].
> >
> > In our implementation, the mapping of the row OIDs of the materialized view
> > and the base tables are stored in "OID map". When a base relation is
> > modified,
> > AFTER trigger is executed and the delta is recorded in delta tables using
> > the transition table feature. The accual udpate of the matview is triggerd
> > by REFRESH command with INCREMENTALLY option.
> >
> > However, we realize problems of our implementation. First, WITH OIDS will
> > be removed since PG12, so OIDs are no longer available. Besides this, it
> > would
> > be hard to implement this since it needs many changes of executor nodes to
> > collect base tables's OIDs during execuing a query. Also, the cost of
> > maintaining
> > OID map would be high.
> >
> > For these reasons, we started to think to implement IVM without relying on
> > OIDs
> > and made a bit more surveys.
> >
> > We also looked at Kevin Grittner's discussion [4] on incremental matview
> > maintenance. In this discussion, Kevin proposed to use counting algorithm
> > [5]
> > to handle projection views (using DISTNICT) properly. This algorithm need
> > an
> > additional system column, count_t, in materialized views and delta tables
> > of
> > base tables.
> >
> > However, the discussion about IVM is now stoped, so we would like to
> > restart and
> > progress this.
> >
> >
> > Through our PoC inplementation and surveys, I think we need to think at
> > least
> > the followings for implementing IVM.
> >
> > 1. How to extract changes on base tables
> >
> > I think there would be at least two approaches for it.
> >
> > - Using transition table in AFTER triggers
> > - Extracting changes from WAL using logical decoding
> >
> > In our PoC implementation, we used AFTER trigger and transition tables,
> > but using
> > logical decoding might be better from the point of performance of base
> > table
> > modification.
> >
> > If we can represent a change of UPDATE on a base table as query-like
> > rather than
> > OLD and NEW, it may be possible to update the materialized view directly
> > instead
> > of performing delete & insert.
> >
> >
> > 2. How to compute the delta to be applied to materialized views
> >
> > Essentially, IVM is based on relational algebra. Theorically, changes on
> > base
> > tables are represented as deltas on this, like "R <- R + dR", and the
> > delta on
> > the materialized view is computed using base table deltas based on "change
> > propagation equations". For implementation, we have to derive the
> > equation from
> > the view definition query (Query tree, or Plan tree?) and describe this as
> > SQL
> > query to compulte delta to be applied to the materialized view.
> >
> > There could be several operations for view definition: selection,
> > projection,
> > join, aggregation, union, difference, intersection, etc. If we can
> > prepare a
> > module for each operation, it makes IVM extensable, so we can start a
> > simple
> > view definition, and then support more complex views.
> >
> >
> > 3. How to identify rows to be modifed in materialized views
> >
> > When applying the delta to the materialized view, we have to identify
> > which row
> > in the matview is corresponding to a row in the delta. A naive method is
> > matching
> > by using all columns in a tuple, but clearly this is unefficient. If
> > thematerialized
> > view has unique index, we can use this. Maybe, we have to force
> > materialized views
> > to have all primary key colums in their base tables. In our PoC
> > implementation, we
> > used OID to identify rows, but this will be no longer available as said
> > above.
> >
> >
> > 4. When to maintain materialized views
> >
> > There are two candidates of the timing of maintenance, immediate (eager)
> > or deferred.
> >
> > In eager maintenance, the materialized view is updated in the same
> > transaction
> > where the base table is updated. In deferred maintenance, this is done
> > after the
> > transaction is commited, for example, when view is accessed, as a response
> > to user
> > request, etc.
> >
> > In the previous discussion[4], it is planned to start from "eager"
> > approach. In our PoC
> > implementaion, we used the other aproach, that is, using REFRESH command
> > to perform IVM.
> > I am not sure which is better as a start point, but I begin to think that
> > the eager
> > approach may be more simple since we don't have to maintain base table
> > changes in other
> > past transactions.
> >
> > In the eager maintenance approache, we have to consider a race condition
> > where two
> > different transactions change base tables simultaneously as discussed in
> > [4].
> >
> >
> > [1]
> > https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> > [2]
> > https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> > (Japanese only)
> > [3] https://dl.acm.org/citation.cfm?id=2750546
> > [4]
> > https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> > [5] https://dl.acm.org/citation.cfm?id=170066
> >
> > Regards,
> > --
> > Yugo Nagata <nagata@sraoss.co.jp>
> >
> >

--
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

10 февраля 2020 г., 04:58:54

Hi,

Attached is the latest patch (v13) to add support for Incremental
View Maintenance (IVM). Differences from the previous patch (v12)
include:

* Allow to maintain IMMVs containing user defined types

 Previously, IMMVs (Incrementally Maintainable Materialized Views)
 containing user defined types could not be maintained and an error
 was raised because such columns were compared using pg_calatog.=
 during tuple matching.  To fix this, use the column type's default
 equality operator instead of forcing to use the built-in operator.

 Pointed out by nuko-san.
 https://www.postgresql.org/message-id/CAF3Gu1YL7HWF0Veor3t8sQD%2BJnvozHe6WdUw0YsMqJGFezVhpg%40mail.gmail.com

* Improve an error message for unspoorted aggregate functions

 Currentlly only built-in aggregate functions are supported, so
 aggregates on user-defined types causes an error at view definition
 time. However, the message was unappropreate like:

  ERROR:  aggregate function max is not supported

 even though built-in max is supported. Therefore, this is improved
 to include its argument types as following:

  ERROR:  aggregate function min(xxx) is not supported
  HINT:  IVM supports only built-in aggregate functions.

 Pointed out by nuko-san.
 https://www.postgresql.org/message-id/CAF3Gu1bP0eiv%3DCqV%3D%2BxATdcmLypjjudLz_wdJgnRNULpiX9GrA%40mail.gmail.com

* Doc: fix description of support subquery

 IVM supports regular EXISTS clause not only correlated subqueries.

Regards,
Yugo Nagata

On Fri, 20 Dec 2019 14:02:32 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:

> IVM is a way to make materialized views up-to-date in which only
> incremental changes are computed and applied on views rather than
> recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
> does. IVM can update materialized views more efficiently
> than recomputation when only small part of the view need updates.
> 
> There are two approaches with regard to timing of view maintenance:
> immediate and deferred. In immediate maintenance, views are updated in
> the same transaction where its base table is modified. In deferred
> maintenance, views are updated after the transaction is committed,
> for example, when the view is accessed, as a response to user command
> like REFRESH, or periodically in background, and so on. 
> 
> This patch implements a kind of immediate maintenance, in which
> materialized views are updated immediately in AFTER triggers when a
> base table is modified.
> 
> This supports views using:
>  - inner and outer joins including self-join
>  - some built-in aggregate functions (count, sum, agv, min, max)
>  - a part of subqueries
>    -- simple subqueries in FROM clause
>    -- EXISTS subqueries in WHERE clause
>  - DISTINCT and views with tuple duplicates

Regareds,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v13.tar.gz

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

11 февраля 2020 г., 22:04:12

Takuma Hoshiai wrote
> Hi, 
> 
> Attached is the latest patch (v12) to add support for Incremental
> Materialized View Maintenance (IVM).
> It is possible to apply to current latest master branch.
> 
> Differences from the previous patch (v11) include:
> * support executing REFRESH MATERIALIZED VIEW command with IVM.
> * support unscannable state by WITH NO DATA option.
> * add a check for LIMIT/OFFSET at creating an IMMV
> 
>  If REFRESH is executed for IMMV (incremental maintainable materialized
> view),  its contents is re-calculated as same as usual materialized views
> (full REFRESH). Although IMMV is basically keeping up-to-date data,
> rounding errors can be accumulated in aggregated value in some cases, for
> example, if the view contains sum/avg on float type columns. Running
> REFRESH command on IMMV will resolve this. Also, WITH NO DATA option
> allows to make IMMV unscannable. At that time,  IVM triggers are dropped
> from IMMV because these become unneeded and useless. 
> 
> [...]

Hello,

regarding syntax REFRESH MATERIALIZED VIEW x WITH NO DATA

I understand that triggers are removed from the source tables, transforming 
the INCREMENTAL MATERIALIZED VIEW into a(n unscannable) MATERIALIZED VIEW.

postgres=# refresh materialized view imv with no data;
REFRESH MATERIALIZED VIEW
postgres=# select * from imv;
ERROR:  materialized view "imv" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

This operation seems to me more of an ALTER command than a REFRESH ONE.

Wouldn't the syntax
ALTER MATERIALIZED VIEW [ IF EXISTS ] name
    SET WITH NO DATA
or
    SET WITHOUT DATA
be better ?

Continuing into this direction, did you ever think about an other feature
like:  
ALTER MATERIALIZED VIEW [ IF EXISTS ] name
    SET { NOINCREMENTAL }
or even
    SET { NOINCREMENTAL | INCREMENTAL | INCREMENTAL CONCURRENTLY  }

that would permit to switch between those modes and would keep frozen data 
available in the materialized view during heavy operations on source tables
?

Regards
PAscal 



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

13 февраля 2020 г., 06:05:40

Hi PAscal,

On Tue, 11 Feb 2020 15:04:12 -0700 (MST)
legrand legrand <legrand_legrand@hotmail.com> wrote:
> 
> regarding syntax REFRESH MATERIALIZED VIEW x WITH NO DATA
> 
> I understand that triggers are removed from the source tables, transforming 
> the INCREMENTAL MATERIALIZED VIEW into a(n unscannable) MATERIALIZED VIEW.
> 
> postgres=# refresh materialized view imv with no data;
> REFRESH MATERIALIZED VIEW
> postgres=# select * from imv;
> ERROR:  materialized view "imv" has not been populated
> HINT:  Use the REFRESH MATERIALIZED VIEW command.
> 
> This operation seems to me more of an ALTER command than a REFRESH ONE.
> 
> Wouldn't the syntax
> ALTER MATERIALIZED VIEW [ IF EXISTS ] name
>     SET WITH NO DATA
> or
>     SET WITHOUT DATA
> be better ?

We use "REFRESH ... WITH NO DATA" because there is already the syntax
to make materialized views non-scannable. We are just following in this.

https://www.postgresql.org/docs/12/sql-refreshmaterializedview.html

> 
> Continuing into this direction, did you ever think about an other feature
> like:  
> ALTER MATERIALIZED VIEW [ IF EXISTS ] name
>     SET { NOINCREMENTAL }
> or even
>     SET { NOINCREMENTAL | INCREMENTAL | INCREMENTAL CONCURRENTLY  }
> 
> that would permit to switch between those modes and would keep frozen data 
> available in the materialized view during heavy operations on source tables
> ?

Thank you for your suggestion! I agree that the feature to switch between
normal materialized view and incrementally maintainable view is useful.
We will add this to our ToDo list. Regarding its syntax, 
I would not like to add new keyword like NONINCREMENTAL, so how about 
the following

 ALTER MATERIALIZED VIEW ... SET {WITH | WITHOUT} INCREMENTAL REFRESH

although this is just a idea and we will need discussion on it.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

13 февраля 2020 г., 19:57:11

Yugo Nagata wrote
> Thank you for your suggestion! I agree that the feature to switch between
> normal materialized view and incrementally maintainable view is useful.
> We will add this to our ToDo list. Regarding its syntax, 
> I would not like to add new keyword like NONINCREMENTAL, so how about 
> the following
> 
>  ALTER MATERIALIZED VIEW ... SET {WITH | WITHOUT} INCREMENTAL REFRESH
> 
> although this is just a idea and we will need discussion on it.

Thanks I will follow that discussion on GitHub 
https://github.com/sraoss/pgsql-ivm/issues/79

Regards
PAscal




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

nuko yokohama

Дата:

18 февраля 2020 г., 13:03:47

Hi.

SELECT statements with a TABLESAMPLE clause should be rejected.

Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements with the TABLESAMPLE clause.
However, the result of this SELECT statement is undefined and should be rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW.
(similar to handling non-immutable functions)

Regard.

2020年2月8日(土) 11:15 nuko yokohama <nuko.yokohama@gmail.com>:

Hi.

UNION query problem.(server crash)

When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.

(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)

execute log.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)

SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)

CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)

CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```
UNION query problem.(server crash)

When creating an INCREMENTAL MATERIALIZED VIEW,
the server process crashes if you specify a query with a UNION.

(commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)

execute log.

```
[ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
DROP TABLE IF EXISTS table_x CASCADE;
psql:union_query_crash.sql:6: NOTICE: drop cascades to view xy_union_v
DROP TABLE
DROP TABLE IF EXISTS table_y CASCADE;
DROP TABLE
CREATE TABLE table_x (id int, data numeric);
CREATE TABLE
CREATE TABLE table_y (id int, data numeric);
CREATE TABLE
INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
INSERT 0 3
SELECT * FROM table_x;
id | data
----+--------------------
1 | 0.950724735058774
2 | 0.0222670808201144
3 | 0.391258547114841
(3 rows)

SELECT * FROM table_y;
id | data
----+--------------------
1 | 0.991717347778337
2 | 0.0528458947672874
3 | 0.965044982911163
(3 rows)

CREATE VIEW xy_union_v AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
CREATE VIEW
TABLE xy_union_v;
name | id | data
---------+----+--------------------
table_y | 2 | 0.0528458947672874
table_x | 2 | 0.0222670808201144
table_y | 3 | 0.965044982911163
table_x | 1 | 0.950724735058774
table_x | 3 | 0.391258547114841
table_y | 1 | 0.991717347778337
(6 rows)

CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
SELECT 'table_x' AS name, * FROM table_x
UNION
SELECT 'table_y' AS name, * FROM table_y
;
psql:union_query_crash.sql:28: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:union_query_crash.sql:28: fatal: connection to server was lost
```

2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
Hi,

I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
IVM is a technique to maintain materialized views which computes and applies
only the incremental changes to the materialized views rather than
recomputate the contents as the current REFRESH command does.

I had a presentation on our PoC implementation of IVM at PGConf.eu 2018 [1].
Our implementation uses row OIDs to compute deltas for materialized views.
The basic idea is that if we have information about which rows in base tables
are contributing to generate a certain row in a matview then we can identify
the affected rows when a base table is updated. This is based on an idea of
Dr. Masunaga [2] who is a member of our group and inspired from ID-based
approach[3].

In our implementation, the mapping of the row OIDs of the materialized view
and the base tables are stored in "OID map". When a base relation is modified,
AFTER trigger is executed and the delta is recorded in delta tables using
the transition table feature. The accual udpate of the matview is triggerd
by REFRESH command with INCREMENTALLY option.

However, we realize problems of our implementation. First, WITH OIDS will
be removed since PG12, so OIDs are no longer available. Besides this, it would
be hard to implement this since it needs many changes of executor nodes to
collect base tables's OIDs during execuing a query. Also, the cost of maintaining
OID map would be high.

For these reasons, we started to think to implement IVM without relying on OIDs
and made a bit more surveys.

We also looked at Kevin Grittner's discussion [4] on incremental matview
maintenance. In this discussion, Kevin proposed to use counting algorithm [5]
to handle projection views (using DISTNICT) properly. This algorithm need an
additional system column, count_t, in materialized views and delta tables of
base tables.

However, the discussion about IVM is now stoped, so we would like to restart and
progress this.

Through our PoC inplementation and surveys, I think we need to think at least
the followings for implementing IVM.

1. How to extract changes on base tables

I think there would be at least two approaches for it.

- Using transition table in AFTER triggers
- Extracting changes from WAL using logical decoding

In our PoC implementation, we used AFTER trigger and transition tables, but using
logical decoding might be better from the point of performance of base table
modification.

If we can represent a change of UPDATE on a base table as query-like rather than
OLD and NEW, it may be possible to update the materialized view directly instead
of performing delete & insert.

2. How to compute the delta to be applied to materialized views

Essentially, IVM is based on relational algebra. Theorically, changes on base
tables are represented as deltas on this, like "R <- R + dR", and the delta on
the materialized view is computed using base table deltas based on "change
propagation equations". For implementation, we have to derive the equation from
the view definition query (Query tree, or Plan tree?) and describe this as SQL
query to compulte delta to be applied to the materialized view.

There could be several operations for view definition: selection, projection,
join, aggregation, union, difference, intersection, etc. If we can prepare a
module for each operation, it makes IVM extensable, so we can start a simple
view definition, and then support more complex views.

3. How to identify rows to be modifed in materialized views

When applying the delta to the materialized view, we have to identify which row
in the matview is corresponding to a row in the delta. A naive method is matching
by using all columns in a tuple, but clearly this is unefficient. If thematerialized
view has unique index, we can use this. Maybe, we have to force materialized views
to have all primary key colums in their base tables. In our PoC implementation, we
used OID to identify rows, but this will be no longer available as said above.

4. When to maintain materialized views

There are two candidates of the timing of maintenance, immediate (eager) or deferred.

In eager maintenance, the materialized view is updated in the same transaction
where the base table is updated. In deferred maintenance, this is done after the
transaction is commited, for example, when view is accessed, as a response to user
request, etc.

In the previous discussion[4], it is planned to start from "eager" approach. In our PoC
implementaion, we used the other aproach, that is, using REFRESH command to perform IVM.
I am not sure which is better as a start point, but I begin to think that the eager
approach may be more simple since we don't have to maintain base table changes in other
past transactions.

In the eager maintenance approache, we have to consider a race condition where two
different transactions change base tables simultaneously as discussed in [4].

[1] https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
[2] https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1 (Japanese only)
[3] https://dl.acm.org/citation.cfm?id=2750546
[4] https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
[5] https://dl.acm.org/citation.cfm?id=170066

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

19 февраля 2020 г., 00:11:35

On Tue, 18 Feb 2020 22:03:47 +0900
nuko yokohama <nuko.yokohama@gmail.com> wrote:

> Hi.
> 
> SELECT statements with a TABLESAMPLE clause should be rejected.
> 
> Currently, CREATE INCREMENTAL MATERIALIZED VIEW allows SELECT statements
> with the TABLESAMPLE clause.
> However, the result of this SELECT statement is undefined and should be
> rejected when specified in CREATE INCREMENTAL MATERIALIZED VIEW.
> (similar to handling non-immutable functions)

Thanks!  We totally agree with you.  We are now working on improvement of
query checks at creating IMMV.  TABLESAMPLE will also be checked in this.

Regards,
Yugo Nagata

> Regard.
> 
> 2020年2月8日(土) 11:15 nuko yokohama <nuko.yokohama@gmail.com>:
> 
> > Hi.
> >
> > UNION query problem.(server crash)
> >
> > When creating an INCREMENTAL MATERIALIZED VIEW,
> > the server process crashes if you specify a query with a UNION.
> >
> > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> >
> > execute log.
> >
> > ```
> > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> > DROP TABLE IF EXISTS table_x CASCADE;
> > psql:union_query_crash.sql:6: NOTICE:  drop cascades to view xy_union_v
> > DROP TABLE
> > DROP TABLE IF EXISTS table_y CASCADE;
> > DROP TABLE
> > CREATE TABLE table_x (id int, data numeric);
> > CREATE TABLE
> > CREATE TABLE table_y (id int, data numeric);
> > CREATE TABLE
> > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > SELECT * FROM table_x;
> >  id |        data
> > ----+--------------------
> >   1 |  0.950724735058774
> >   2 | 0.0222670808201144
> >   3 |  0.391258547114841
> > (3 rows)
> >
> > SELECT * FROM table_y;
> >  id |        data
> > ----+--------------------
> >   1 |  0.991717347778337
> >   2 | 0.0528458947672874
> >   3 |  0.965044982911163
> > (3 rows)
> >
> > CREATE VIEW xy_union_v AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > CREATE VIEW
> > TABLE xy_union_v;
> >   name   | id |        data
> > ---------+----+--------------------
> >  table_y |  2 | 0.0528458947672874
> >  table_x |  2 | 0.0222670808201144
> >  table_y |  3 |  0.965044982911163
> >  table_x |  1 |  0.950724735058774
> >  table_x |  3 |  0.391258547114841
> >  table_y |  1 |  0.991717347778337
> > (6 rows)
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > psql:union_query_crash.sql:28: server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> > psql:union_query_crash.sql:28: fatal: connection to server was lost
> > ```
> > UNION query problem.(server crash)
> >
> > When creating an INCREMENTAL MATERIALIZED VIEW,
> > the server process crashes if you specify a query with a UNION.
> >
> > (commit id = 23151be7be8d8f8f9c35c2d0e4e5353aedf2b31e)
> >
> > execute log.
> >
> > ```
> > [ec2-user@ip-10-0-1-10 ivm]$ psql testdb -e -f union_query_crash.sql
> > DROP TABLE IF EXISTS table_x CASCADE;
> > psql:union_query_crash.sql:6: NOTICE:  drop cascades to view xy_union_v
> > DROP TABLE
> > DROP TABLE IF EXISTS table_y CASCADE;
> > DROP TABLE
> > CREATE TABLE table_x (id int, data numeric);
> > CREATE TABLE
> > CREATE TABLE table_y (id int, data numeric);
> > CREATE TABLE
> > INSERT INTO table_x VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > INSERT INTO table_y VALUES (generate_series(1, 3), random()::numeric);
> > INSERT 0 3
> > SELECT * FROM table_x;
> >  id |        data
> > ----+--------------------
> >   1 |  0.950724735058774
> >   2 | 0.0222670808201144
> >   3 |  0.391258547114841
> > (3 rows)
> >
> > SELECT * FROM table_y;
> >  id |        data
> > ----+--------------------
> >   1 |  0.991717347778337
> >   2 | 0.0528458947672874
> >   3 |  0.965044982911163
> > (3 rows)
> >
> > CREATE VIEW xy_union_v AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > CREATE VIEW
> > TABLE xy_union_v;
> >   name   | id |        data
> > ---------+----+--------------------
> >  table_y |  2 | 0.0528458947672874
> >  table_x |  2 | 0.0222670808201144
> >  table_y |  3 |  0.965044982911163
> >  table_x |  1 |  0.950724735058774
> >  table_x |  3 |  0.391258547114841
> >  table_y |  1 |  0.991717347778337
> > (6 rows)
> >
> > CREATE INCREMENTAL MATERIALIZED VIEW xy_imv AS
> > SELECT 'table_x' AS name, * FROM table_x
> > UNION
> > SELECT 'table_y' AS name, * FROM table_y
> > ;
> > psql:union_query_crash.sql:28: server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> > psql:union_query_crash.sql:28: fatal: connection to server was lost
> > ```
> >
> > 2018年12月27日(木) 21:57 Yugo Nagata <nagata@sraoss.co.jp>:
> >
> >> Hi,
> >>
> >> I would like to implement Incremental View Maintenance (IVM) on
> >> PostgreSQL.
> >> IVM is a technique to maintain materialized views which computes and
> >> applies
> >> only the incremental changes to the materialized views rather than
> >> recomputate the contents as the current REFRESH command does.
> >>
> >> I had a presentation on our PoC implementation of IVM at PGConf.eu 2018
> >> [1].
> >> Our implementation uses row OIDs to compute deltas for materialized
> >> views.
> >> The basic idea is that if we have information about which rows in base
> >> tables
> >> are contributing to generate a certain row in a matview then we can
> >> identify
> >> the affected rows when a base table is updated. This is based on an idea
> >> of
> >> Dr. Masunaga [2] who is a member of our group and inspired from ID-based
> >> approach[3].
> >>
> >> In our implementation, the mapping of the row OIDs of the materialized
> >> view
> >> and the base tables are stored in "OID map". When a base relation is
> >> modified,
> >> AFTER trigger is executed and the delta is recorded in delta tables using
> >> the transition table feature. The accual udpate of the matview is triggerd
> >> by REFRESH command with INCREMENTALLY option.
> >>
> >> However, we realize problems of our implementation. First, WITH OIDS will
> >> be removed since PG12, so OIDs are no longer available. Besides this, it
> >> would
> >> be hard to implement this since it needs many changes of executor nodes to
> >> collect base tables's OIDs during execuing a query. Also, the cost of
> >> maintaining
> >> OID map would be high.
> >>
> >> For these reasons, we started to think to implement IVM without relying
> >> on OIDs
> >> and made a bit more surveys.
> >>
> >> We also looked at Kevin Grittner's discussion [4] on incremental matview
> >> maintenance.  In this discussion, Kevin proposed to use counting
> >> algorithm [5]
> >> to handle projection views (using DISTNICT) properly. This algorithm need
> >> an
> >> additional system column, count_t, in materialized views and delta tables
> >> of
> >> base tables.
> >>
> >> However, the discussion about IVM is now stoped, so we would like to
> >> restart and
> >> progress this.
> >>
> >>
> >> Through our PoC inplementation and surveys, I think we need to think at
> >> least
> >> the followings for implementing IVM.
> >>
> >> 1. How to extract changes on base tables
> >>
> >> I think there would be at least two approaches for it.
> >>
> >>  - Using transition table in AFTER triggers
> >>  - Extracting changes from WAL using logical decoding
> >>
> >> In our PoC implementation, we used AFTER trigger and transition tables,
> >> but using
> >> logical decoding might be better from the point of performance of base
> >> table
> >> modification.
> >>
> >> If we can represent a change of UPDATE on a base table as query-like
> >> rather than
> >> OLD and NEW, it may be possible to update the materialized view directly
> >> instead
> >> of performing delete & insert.
> >>
> >>
> >> 2. How to compute the delta to be applied to materialized views
> >>
> >> Essentially, IVM is based on relational algebra. Theorically, changes on
> >> base
> >> tables are represented as deltas on this, like "R <- R + dR", and the
> >> delta on
> >> the materialized view is computed using base table deltas based on "change
> >> propagation equations".  For implementation, we have to derive the
> >> equation from
> >> the view definition query (Query tree, or Plan tree?) and describe this
> >> as SQL
> >> query to compulte delta to be applied to the materialized view.
> >>
> >> There could be several operations for view definition: selection,
> >> projection,
> >> join,  aggregation, union, difference, intersection, etc.  If we can
> >> prepare a
> >> module for each operation, it makes IVM extensable, so we can start a
> >> simple
> >> view definition, and then support more complex views.
> >>
> >>
> >> 3. How to identify rows to be modifed in materialized views
> >>
> >> When applying the delta to the materialized view, we have to identify
> >> which row
> >> in the matview is corresponding to a row in the delta.  A naive method is
> >> matching
> >> by using all columns in a tuple, but clearly this is unefficient. If
> >> thematerialized
> >> view has unique index, we can use this. Maybe, we have to force
> >> materialized views
> >> to have all primary key colums in their base tables.  In our PoC
> >> implementation, we
> >> used OID to identify rows, but this will be no longer available as said
> >> above.
> >>
> >>
> >> 4. When to maintain materialized views
> >>
> >> There are two candidates of the timing of maintenance, immediate (eager)
> >> or deferred.
> >>
> >> In eager maintenance, the materialized view is updated in the same
> >> transaction
> >> where the base table is updated. In deferred maintenance, this is done
> >> after the
> >> transaction is commited, for example, when view is accessed, as a
> >> response to user
> >> request, etc.
> >>
> >> In the previous discussion[4], it is planned to start from "eager"
> >> approach. In our PoC
> >> implementaion, we used the other aproach, that is, using REFRESH command
> >> to perform IVM.
> >> I am not sure which is better as a start point, but I begin to think that
> >> the eager
> >> approach may be more simple since we don't have to maintain base table
> >> changes in other
> >> past transactions.
> >>
> >> In the eager maintenance approache, we have to consider a race condition
> >> where two
> >> different transactions change base tables simultaneously as discussed in
> >> [4].
> >>
> >>
> >> [1]
> >>
https://www.postgresql.eu/events/pgconfeu2018/schedule/session/2195-implementing-incremental-view-maintenance-on-postgresql/
> >> [2]
> >>
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=191254&item_no=1
> >> (Japanese only)
> >> [3] https://dl.acm.org/citation.cfm?id=2750546
> >> [4]
> >> https://www.postgresql.org/message-id/flat/1368561126.64093.YahooMailNeo%40web162904.mail.bf1.yahoo.com
> >> [5] https://dl.acm.org/citation.cfm?id=170066
> >>
> >> Regards,
> >> --
> >> Yugo Nagata <nagata@sraoss.co.jp>
> >>
> >>


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

27 февраля 2020 г., 06:06:49

Hi, 

Attached is the latest patch (v14) to add support for Incremental Materialized
View Maintenance (IVM). It is possible to apply to current latest master branch.

Differences from the previous patch (v13) include:

* Support base tables using RLS

If a table has the Row Level Security (RLS) policy, IMMV is updated based on
the view owner's policy when a base table is updated. However, when a policy
of base table is changed or created after creating IMMV, IMMV is not updated
based on the new RLS policy. In this case, REFRESH command must be executed.

* Use ENR instead of temporary tables for internal operation

Previously, IVM create and use a temporary tables to store view delta rows.
However it caused out of shared memory, and Tom Lane pointed out that 
using temp tables in IVM trigger is not good.

Currently, IVM uses tuplestores and ephemeral named relation (ENR) instead
of temporary tables. it doesn't cause previous problem like below:

testdb=# create table b1 (id integer, x numeric(10,3));
CREATE TABLE
testdb=# create incremental materialized view mv1 
testdb-# as select id, count(*),sum(x) from b1 group by id;
SELECT 0
testdb=# 
testdb=# do $$ 
testdb$# declare 
testdb$# i integer;
testdb$# begin 
testdb$# for i in 1..10000 
testdb$# loop 
testdb$# insert into b1 values (1,1); 
testdb$# end loop; 
testdb$# end;
testdb$# $$
testdb-# ;
DO
testdb=# 

This issue is reported by PAscal.
https://www.postgresql.org/message-id/1577564109604-0.post@n3.nabble.com


* Support pg_dump/pg_restore for IVM

IVM supports pg_dump/pg_restore command.

* Prohibit rename and unique index creation on IVM columns

When a user make a unique index on ivm columns such as ivm_count, IVM will fail due to
the unique constraint violation, so IVM prohibits it.
Also, rename of these columns also causes IVM fails, so IVM prohibits it too.

* Fix incorrect WHERE condition check for outer-join views

The check for non null-rejecting condition check was incorrect.

Best Regards,
Takuma Hoshiai

-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Вложения

IVM_patches_v14.tar.gz

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

27 февраля 2020 г., 21:35:55

> I have tried to use an other patch with yours:
> "Planning counters in pg_stat_statements (using pgss_store)"

> setting
> shared_preload_libraries='pg_stat_statements'
> pg_stat_statements.track=all
> restarting the cluster and creating the extension


> When trying following syntax:

> create table b1 (id integer, x numeric(10,3));
> create incremental materialized view mv1 as select id, count(*),sum(x) 
> from b1 group by id;
> insert into b1 values (1,1)
>
> I got an ASSERT FAILURE in pg_stat_statements.c
> on
>     Assert(query != NULL);
>
> comming from matview.c
>    refresh_matview_datafill(dest_old, query, queryEnv, NULL);
> or
>    refresh_matview_datafill(dest_new, query, queryEnv, NULL);
>
> If this (last) NULL field was replaced by the query text, 
> a comment or just "n/a",
> it would fix the problem.

> Could this be investigated ?

Hello,

thank you for patch v14, that fix problems inherited from temporary tables.
it seems that this ASSERT problem with pgss patch is still present ;o(

Could we have a look ?

Thanks in advance
Regards
PAscal



--
Sent from:
https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Takuma Hoshiai

Дата:

28 февраля 2020 г., 07:29:14

On Thu, 27 Feb 2020 14:35:55 -0700 (MST)
legrand legrand <legrand_legrand@hotmail.com> wrote:

> > I have tried to use an other patch with yours:
> > "Planning counters in pg_stat_statements (using pgss_store)"
> 
> > setting
> > shared_preload_libraries='pg_stat_statements'
> > pg_stat_statements.track=all
> > restarting the cluster and creating the extension
> 
> 
> > When trying following syntax:
> 
> > create table b1 (id integer, x numeric(10,3));
> > create incremental materialized view mv1 as select id, count(*),sum(x) 
> > from b1 group by id;
> > insert into b1 values (1,1)
> >
> > I got an ASSERT FAILURE in pg_stat_statements.c
> > on
> >     Assert(query != NULL);
> >
> > comming from matview.c
> >    refresh_matview_datafill(dest_old, query, queryEnv, NULL);
> > or
> >    refresh_matview_datafill(dest_new, query, queryEnv, NULL);
> >
> > If this (last) NULL field was replaced by the query text, 
> > a comment or just "n/a",
> > it would fix the problem.
> 
> > Could this be investigated ?
> 
> Hello,
> 
> thank you for patch v14, that fix problems inherited from temporary tables.
> it seems that this ASSERT problem with pgss patch is still present ;o(
> 
> Could we have a look ?

Sorry but we are busy on fixing and improving IVM patches. I think fixing
the assertion failure needs non trivial changes to other part of PosthreSQL.
So we would like to work on the issue you reported after the pgss patch
gets committed.

Best Regards,

Takuma Hoshiai

 
> Thanks in advance
> Regards
> PAscal
> 
> 
> 
> --
> Sent from:
> https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> 
> 
> 
> 
> 
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> 
> 
> 


-- 
Takuma Hoshiai <hoshiai@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

28 февраля 2020 г., 09:23:25

>> thank you for patch v14, that fix problems inherited from temporary
tables.
>> it seems that this ASSERT problem with pgss patch is still present ;o(
>> 
>
> Sorry but we are busy on fixing and improving IVM patches. I think fixing
> the assertion failure needs non trivial changes to other part of
> PosthreSQL.
> So we would like to work on the issue you reported after the pgss patch
> gets committed.

Imagine it will happen tomorrow !
You may say I'm a dreamer
But I'm not the only one
...
...





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

10 апреля 2020 г., 14:26:58

Hi, 

Attached is the latest patch (v15) to add support for Incremental Materialized
View Maintenance (IVM). It is possible to apply to current latest master branch.

Differences from the previous patch (v14) include:

* Fix to not use generate_series when views are queried
    
In the previous implementation, multiplicity of each tuple was stored
in ivm_count column in views. When SELECT was issued for views with
duplicate, the view was replaced with a subquery in which each tuple
was joined with generate_series function in order to output tuples
of the number of ivm_count.

This was problematic for following reasons:
  
- The overhead was huge. When almost of tuples in a view were selected,
  it took much longer time than the original query. This lost the meaning
  of materialized views.
    
- Optimizer could not estimate row numbers correctly because this had to
  know ivm_count values stored in tuples.
    
- System columns of materialized views like cmin, xmin, xmax could not
  be used because a view was replaced with a subquery.
    
To resolve this, the new implementation doen't store multiplicities
for views with tuple duplicates, and doesn't use generate_series
when SELECT query is issued for such views.
    
Note that we still have to use ivm_count for supporting DISTINCT and
aggregates.

*  Add query checks for IVM restrictions
    
Query checks for following restrictions are added:
    
- DISTINCT ON
- TABLESAMPLE parameter
- inheritance parent table
- window function
- some aggregate options(such as FILTER, DISTINCT, ORDER and GROUPING SETS)
- targetlist containing IVM column
- simple subquery is only supported
- FOR UPDATE/SHARE
- empty target list
- UNION/INTERSECT/EXCEPT
- GROUPING SETS clauses

* Improve error messages
    
Add error code ERRCODE_FEATURE_NOT_SUPPORTED to each IVM error message.
Also, the message format was unified.

* Support subqueries containig joins in FROM clause
    
Previously, when multi tables are updated simultaneously, incremental
view maintenance with subqueries including JOIN didn't work correctly
due to a bug. 

Best Regards,
Takuma Hoshiai

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v15.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

13 апреля 2020 г., 05:18:35

On Fri, 10 Apr 2020 23:26:58 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hi, 
> 
> Attached is the latest patch (v15) to add support for Incremental Materialized
> View Maintenance (IVM). It is possible to apply to current latest master branch.

I found a mistake of splitting patch, so I attached the fixed patch (v15a).
 
> Differences from the previous patch (v14) include:
> 
> * Fix to not use generate_series when views are queried
>     
> In the previous implementation, multiplicity of each tuple was stored
> in ivm_count column in views. When SELECT was issued for views with
> duplicate, the view was replaced with a subquery in which each tuple
> was joined with generate_series function in order to output tuples
> of the number of ivm_count.
> 
> This was problematic for following reasons:
>   
> - The overhead was huge. When almost of tuples in a view were selected,
>   it took much longer time than the original query. This lost the meaning
>   of materialized views.
>     
> - Optimizer could not estimate row numbers correctly because this had to
>   know ivm_count values stored in tuples.
>     
> - System columns of materialized views like cmin, xmin, xmax could not
>   be used because a view was replaced with a subquery.
>     
> To resolve this, the new implementation doen't store multiplicities
> for views with tuple duplicates, and doesn't use generate_series
> when SELECT query is issued for such views.
>     
> Note that we still have to use ivm_count for supporting DISTINCT and
> aggregates.

I also explain the way of updating views with tuple duplicates.

Although a view itself doesn't have ivm_count column, multiplicities
for old delta and new delta are calculated and the count value is
contained in a column named __ivm_count__ in each delta table.
    
The old delta table is applied using ctid and row_number function. 
row_number is used to numbering tuples in the view, and tuples whose
number is equal or is less than __ivm_count__ are deleted from the
view using a query like:
    
 DELETE FROM matviewname WHERE ctid IN (
    SELECT tid FROM (
       SELECT row_number() over (partition by c1, c2, ...) AS __ivm_row_number__,
              mv.ctid AS tid,
              diff.__ivm_count__
       FROM matviewname AS mv, old_delta AS diff "
       WHERE mv.c1 = diff.c1 AND mv.c2 = diff.c2 AND ... ) v
 WHERE v.__ivm_row_number__ <=  v.__ivm_count__
    
The new delta is applied using generate_seriese to insert mutiple same
tuples, using a query like:

 INSERT INTO matviewname (c1, c2, ...)
   SELECT c1,c2,... FROM (
     SELECT diff.*, generate_series(

> 
> *  Add query checks for IVM restrictions
>     
> Query checks for following restrictions are added:
>     
> - DISTINCT ON
> - TABLESAMPLE parameter
> - inheritance parent table
> - window function
> - some aggregate options(such as FILTER, DISTINCT, ORDER and GROUPING SETS)
> - targetlist containing IVM column
> - simple subquery is only supported
> - FOR UPDATE/SHARE
> - empty target list
> - UNION/INTERSECT/EXCEPT
> - GROUPING SETS clauses
> 
> * Improve error messages
>     
> Add error code ERRCODE_FEATURE_NOT_SUPPORTED to each IVM error message.
> Also, the message format was unified.
> 
> * Support subqueries containig joins in FROM clause
>     
> Previously, when multi tables are updated simultaneously, incremental
> view maintenance with subqueries including JOIN didn't work correctly
> due to a bug. 
> 
> Best Regards,
> Takuma Hoshiai
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v15a.tar.gz

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

08 мая 2020 г., 01:13:06

>> Hi, 
>> 
>> Attached is the latest patch (v15) to add support for Incremental Materialized
>> View Maintenance (IVM). It is possible to apply to current latest master branch.

I have tried to use IVM against TPC-DS (http://www.tpc.org/tpcds/)
queries.  TPC-DS models decision support systems and those queries are
modestly complex. So I thought applying IVM to those queries could
show how IVM covers real world queries.

Since IVM does not support queries including ORDER BY and LIMIT, I
removed them from the queries before the test.

Here are some facts so far learned in this attempt.

- Number of TPC-DS query files is 99.
- IVM was successfully applied to 20 queries.
- 33 queries failed because they use WITH clause (CTE) (currenly IVM does not support CTE).
- Error messages from failed queries (except those using WITH) are below:
  (the number indicates how many queries failed by the same reason)

11     aggregate functions in nested query are not supported on incrementally maintainable materialized view
8     window functions are not supported on incrementally maintainable materialized view
7     UNION/INTERSECT/EXCEPT statements are not supported on incrementally maintainable materialized view
5     WHERE clause only support subquery with EXISTS clause
3     GROUPING SETS, ROLLUP, or CUBE clauses is not supported on incrementally maintainable materialized view
3     aggregate function and EXISTS condition are not supported at the same time
2     GROUP BY expression not appeared in select list is not supported on incrementally maintainable materialized view
2     aggregate function with DISTINCT arguments is not supported on incrementally maintainable materialized view
2     aggregate is not supported with outer join
1     aggregate function stddev_samp(integer) is not supported on incrementally maintainable materialized view
1     HAVING clause is not supported on incrementally maintainable materialized view
1     subquery is not supported with outer join
1    column "avg" specified more than once

Attached are the queries IVM are successfully applied.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Вложения

IVM_sucessfull_queries.tar.gz

Re: Implementing Incremental View Maintenance

От

Andy Fan

Дата:

08 мая 2020 г., 05:08:50

On Fri, May 8, 2020 at 9:13 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

>> Hi,
>>
>> Attached is the latest patch (v15) to add support for Incremental Materialized
>> View Maintenance (IVM). It is possible to apply to current latest master branch.

I have tried to use IVM against TPC-DS (http://www.tpc.org/tpcds/)
queries. TPC-DS models decision support systems and those queries are
modestly complex. So I thought applying IVM to those queries could
show how IVM covers real world queries.

+1, This is a smart idea. How did you test it? AFAIK, we can test it with:

1. For any query like SELECT xxx, we create view like CREATE MATERIAL VIEW

mv_name as SELECT xxx; to test if the features in the query are supported.

2. Update the data and then compare the result with SELECT XXX with SELECT

* from mv_name to test if the data is correctly sync.

Best Regards

Andy Fan

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

08 мая 2020 г., 06:52:53

>> +1,  This is a smart idea.   How did you test it?  AFAIK, we can test it
> with:
> 
> 1.  For any query like SELECT xxx,  we create view like CREATE MATERIAL VIEW
> mv_name as SELECT xxx;  to test if the features in the query are supported.

No I didn't test the correctness of IVM with TPC-DS data for now.
TPC-DS comes with a data generator and we can test IVM something like:

SELECT * FROM IVM_vew EXCEPT SELECT ... (TPC-DS original query);

If this produces 0 row, then the IVM is correct for the initial data.
(of course actually we need to add appropreate ORDER BY and LIMIT
clause to the SELECT statement for IVM if neccessary).

> 2.  Update the data and then compare the result with SELECT XXX with SELECT
> * from  mv_name to test if the data is correctly sync.

I wanted to test the data updating but I am still struggling how to
extract correct updating data from TPC-DS data set.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Andy Fan

Дата:

07 июля 2020 г., 03:11:08

Thanks for the patch!

Query checks for following restrictions are added:

Are all known supported cases listed below?

- inheritance parent table
...
- targetlist containing IVM column
- simple subquery is only supported

How to understand 3 items above?

Best Regards

Andy Fan

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

07 июля 2020 г., 07:26:34

>> Query checks for following restrictions are added:
> 
> 
> Are all known supported cases listed below?

They are "restrictions" and are not supported.
> 
>> - inheritance parent table
>> ...
>> - targetlist containing IVM column
>> - simple subquery is only supported
>>
> 
> How to understand 3 items above?

The best way to understand them is looking into regression test.
src/test/regress/expected/incremental_matview.out.

>> - inheritance parent table
-- inheritance parent is not supported with IVM"
BEGIN;
CREATE TABLE parent (i int, v int);
CREATE TABLE child_a(options text) INHERITS(parent);
CREATE INCREMENTAL MATERIALIZED VIEW  mv_ivm21 AS SELECT * FROM parent;
ERROR:  inheritance parent is not supported on incrementally maintainable materialized view

>> - targetlist containing IVM column

-- tartget list cannot contain ivm clumn that start with '__ivm'
CREATE INCREMENTAL MATERIALIZED VIEW  mv_ivm28 AS SELECT i AS "__ivm_count__" FROM mv_base_a;
ERROR:  column name __ivm_count__ is not supported on incrementally maintainable materialized view

>> - simple subquery is only supported
-- subquery is not supported with outer join
CREATE INCREMENTAL MATERIALIZED VIEW mv(a,b) AS SELECT a.i, b.i FROM mv_base_a a LEFT JOIN (SELECT * FROM mv_base_b) b
ONa.i=b.i;
 
ERROR:  this query is not allowed on incrementally maintainable materialized view
HINT:  subquery is not supported with outer join

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Andy Fan

Дата:

09 июля 2020 г., 12:20:41

On Tue, Jul 7, 2020 at 3:26 PM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

>> Query checks for following restrictions are added:
>
>
> Are all known supported cases listed below?

They are "restrictions" and are not supported.

Yes, I missed the "not" word:(

>
>> - inheritance parent table
>> ...
>> - targetlist containing IVM column
>> - simple subquery is only supported
>>
>
> How to understand 3 items above?

The best way to understand them is looking into regression test.

Thanks for sharing, I will look into it.

Best Regards

Andy Fan

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

18 августа 2020 г., 09:52:36

Hi,

Attached is the rebased patch (v16) to add support for Incremental
Materialized View Maintenance (IVM). It is able to be applied to
current latest master branch.

This also includes the following small fixes:

- Add a query check for expressions containing aggregates in it
- [doc] Add description about which situations IVM is effective or not in
- Improve hint in log messages
- Reorganize include directives in codes

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v16.tar.gz

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

19 августа 2020 г., 01:02:42

I have looked into this.

> Hi,
> 
> Attached is the rebased patch (v16) to add support for Incremental
> Materialized View Maintenance (IVM). It is able to be applied to
> current latest master branch.
> 
> This also includes the following small fixes:
> 
> - Add a query check for expressions containing aggregates in it
> - [doc] Add description about which situations IVM is effective or not in
> - Improve hint in log messages
> - Reorganize include directives in codes

- make check passed.
- make check-world passed.

- 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch:
  This one needs a comment to describe what the function does etc.

  +void
  +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
  +{


- 0007-Add-aggregates-support-in-IVM.patch
  "Check if the given aggregate function is supporting" shouldn't be
  "Check if the given aggregate function is supporting IVM"?

+ * check_aggregate_supports_ivm
+ *
+ * Check if the given aggregate function is supporting

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

21 августа 2020 г., 08:23:20

On Wed, 19 Aug 2020 10:02:42 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> I have looked into this.

Thank you for your reviewing!
 
> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch:
>   This one needs a comment to describe what the function does etc.
> 
>   +void
>   +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
>   +{

I added a comment for this function and related places.

+/*
+ * SetTransitionTablePreserved
+ *
+ * Prolong lifespan of transition tables corresponding specified relid and
+ * command type to the end of the outmost query instead of each nested query.
+ * This enables to use nested AFTER trigger's transition tables from outer
+ * query's triggers.  Currently, only immediate incremental view maintenance
+ * uses this.
+ */
+void
+SetTransitionTablePreserved(Oid relid, CmdType cmdType)

Also, I removed releted unnecessary code which was left accidentally.

 
> - 0007-Add-aggregates-support-in-IVM.patch
>   "Check if the given aggregate function is supporting" shouldn't be
>   "Check if the given aggregate function is supporting IVM"?

Yes, you are right. I fixed this, too.

> 
> + * check_aggregate_supports_ivm
> + *
> + * Check if the given aggregate function is supporting


Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v17.tar.gz

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

21 августа 2020 г., 12:40:50

From: Yugo NAGATA <nagata@sraoss.co.jp>
Subject: Re: Implementing Incremental View Maintenance
Date: Fri, 21 Aug 2020 17:23:20 +0900
Message-ID: <20200821172320.a2506577d5244b6066f69331@sraoss.co.jp>

> On Wed, 19 Aug 2020 10:02:42 +0900 (JST)
> Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> 
>> I have looked into this.
> 
> Thank you for your reviewing!
>  
>> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch:
>>   This one needs a comment to describe what the function does etc.
>> 
>>   +void
>>   +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
>>   +{
> 
> I added a comment for this function and related places.
> 
> +/*
> + * SetTransitionTablePreserved
> + *
> + * Prolong lifespan of transition tables corresponding specified relid and
> + * command type to the end of the outmost query instead of each nested query.
> + * This enables to use nested AFTER trigger's transition tables from outer
> + * query's triggers.  Currently, only immediate incremental view maintenance
> + * uses this.
> + */
> +void
> +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
> 
> Also, I removed releted unnecessary code which was left accidentally.
> 
>  
>> - 0007-Add-aggregates-support-in-IVM.patch
>>   "Check if the given aggregate function is supporting" shouldn't be
>>   "Check if the given aggregate function is supporting IVM"?
> 
> Yes, you are right. I fixed this, too.
> 
>> 
>> + * check_aggregate_supports_ivm
>> + *
>> + * Check if the given aggregate function is supporting

Thanks for the fixes. I have changed the commit fest status to "Ready
for Committer".

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

31 августа 2020 г., 05:31:10

Hi,

I updated the wiki page.
https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

On Fri, 21 Aug 2020 21:40:50 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> From: Yugo NAGATA <nagata@sraoss.co.jp>
> Subject: Re: Implementing Incremental View Maintenance
> Date: Fri, 21 Aug 2020 17:23:20 +0900
> Message-ID: <20200821172320.a2506577d5244b6066f69331@sraoss.co.jp>
> 
> > On Wed, 19 Aug 2020 10:02:42 +0900 (JST)
> > Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> > 
> >> I have looked into this.
> > 
> > Thank you for your reviewing!
> >  
> >> - 0004-Allow-to-prolong-life-span-of-transition-tables-unti.patch:
> >>   This one needs a comment to describe what the function does etc.
> >> 
> >>   +void
> >>   +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
> >>   +{
> > 
> > I added a comment for this function and related places.
> > 
> > +/*
> > + * SetTransitionTablePreserved
> > + *
> > + * Prolong lifespan of transition tables corresponding specified relid and
> > + * command type to the end of the outmost query instead of each nested query.
> > + * This enables to use nested AFTER trigger's transition tables from outer
> > + * query's triggers.  Currently, only immediate incremental view maintenance
> > + * uses this.
> > + */
> > +void
> > +SetTransitionTablePreserved(Oid relid, CmdType cmdType)
> > 
> > Also, I removed releted unnecessary code which was left accidentally.
> > 
> >  
> >> - 0007-Add-aggregates-support-in-IVM.patch
> >>   "Check if the given aggregate function is supporting" shouldn't be
> >>   "Check if the given aggregate function is supporting IVM"?
> > 
> > Yes, you are right. I fixed this, too.
> > 
> >> 
> >> + * check_aggregate_supports_ivm
> >> + *
> >> + * Check if the given aggregate function is supporting
> 
> Thanks for the fixes. I have changed the commit fest status to "Ready
> for Committer".
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Thomas Munro

Дата:

05 сентября 2020 г., 05:56:18

Hi Nagata-san,

On Mon, Aug 31, 2020 at 5:32 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> https://wiki.postgresql.org/wiki/Incremental_View_Maintenance

Thanks for writing this!

+   /*
+    * Wait for concurrent transactions which update this materialized view at
+    * READ COMMITED. This is needed to see changes committed in other
+    * transactions. No wait and raise an error at REPEATABLE READ or
+    * SERIALIZABLE to prevent update anomalies of matviews.
+    * XXX: dead-lock is possible here.
+    */
+   if (!IsolationUsesXactSnapshot())
+       LockRelationOid(matviewOid, ExclusiveLock);
+   else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))

Could you please say a bit more about your plans for concurrency control?

Simple hand-crafted "rollup" triggers typically conflict only when
modifying the same output rows due to update/insert conflicts, or
perhaps some explicit row level locking if they're doing something
complex (unfortunately, they also very often have concurrency
bugs...).  In some initial reading about MV maintenance I did today in
the hope of understanding some more context for this very impressive
but rather intimidating patch set, I gained the impression that
aggregate-row locking granularity is assumed as a baseline for eager
incremental aggregate maintenance.  I understand that our
MVCC/snapshot scheme introduces extra problems, but I'm wondering if
these problems can be solved using the usual update semantics (the
EvalPlanQual mechanism), and perhaps also some UPSERT logic.  Why is
it not sufficient to have locked all the base table rows that you have
modified, captured the before-and-after values generated by those
updates, and also locked all the IMV aggregate rows you will read, and
in the process acquired a view of the latest committed state of the
IMV aggregate rows you will modify (possibly having waited first)?  In
other words, what other data do you look at, while computing the
incremental update, that might suffer from anomalies because of
snapshots and concurrency?  For one thing, I am aware that unique
indexes for groups would probably be necessary; perhaps some subtle
problems of the sort usually solved with predicate locks lurk there?

(Newer papers describe locking schemes that avoid even aggregate-row
level conflicts, by taking advantage of the associativity and
commutativity of aggregates like SUM and COUNT.  You can allow N
writers to update the aggregate concurrently, and if any transaction
has to roll back it subtracts what it added, not necessarily restoring
the original value, so that nobody conflicts with anyone else, or
something like that...  Contemplating an MVCC, no-rollbacks version of
that sort of thing leads to ideas like, I dunno, update chains
containing differential update trees to be compacted later... egad!)

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

09 сентября 2020 г., 00:27:52

Hi Thomas,

Thank you for your comment!

On Sat, 5 Sep 2020 17:56:18 +1200
Thomas Munro <thomas.munro@gmail.com> wrote:
> +   /*
> +    * Wait for concurrent transactions which update this materialized view at
> +    * READ COMMITED. This is needed to see changes committed in other
> +    * transactions. No wait and raise an error at REPEATABLE READ or
> +    * SERIALIZABLE to prevent update anomalies of matviews.
> +    * XXX: dead-lock is possible here.
> +    */
> +   if (!IsolationUsesXactSnapshot())
> +       LockRelationOid(matviewOid, ExclusiveLock);
> +   else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))
> 
> Could you please say a bit more about your plans for concurrency control?
> 
> Simple hand-crafted "rollup" triggers typically conflict only when
> modifying the same output rows due to update/insert conflicts, or
> perhaps some explicit row level locking if they're doing something
> complex (unfortunately, they also very often have concurrency
> bugs...).  In some initial reading about MV maintenance I did today in
> the hope of understanding some more context for this very impressive
> but rather intimidating patch set, I gained the impression that
> aggregate-row locking granularity is assumed as a baseline for eager
> incremental aggregate maintenance.  I understand that our
> MVCC/snapshot scheme introduces extra problems, but I'm wondering if
> these problems can be solved using the usual update semantics (the
> EvalPlanQual mechanism), and perhaps also some UPSERT logic.  Why is
> it not sufficient to have locked all the base table rows that you have
> modified, captured the before-and-after values generated by those
> updates, and also locked all the IMV aggregate rows you will read, and
> in the process acquired a view of the latest committed state of the
> IMV aggregate rows you will modify (possibly having waited first)?  In
> other words, what other data do you look at, while computing the
> incremental update, that might suffer from anomalies because of
> snapshots and concurrency?  For one thing, I am aware that unique
> indexes for groups would probably be necessary; perhaps some subtle
> problems of the sort usually solved with predicate locks lurk there?

I decided to lock a matview considering views joining tables.  
For example, let V = R*S is an incrementally maintainable materialized 
view which joins tables R and S.  Suppose there are two concurrent 
transactions T1 which changes table R to R' and T2 which changes S to S'.  
Without any lock,  in READ COMMITTED mode, V would be updated to
represent  V=R'*S in T1, and V=R*S' in T2, so it would cause inconsistency.  
By locking the view V, transactions T1, T2 are processed serially and this 
inconsistency can be avoided.

I also thought it might be resolved using tuple locks and EvalPlanQual
instead of table level lock, but there is still a unavoidable case.  For
example, suppose that tuple dR is inserted into R in T1, and dS is inserted
into S in T2. Also, suppose that dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results.  This causes inconsistency. I don't think this
could be resolved even if we use tuple locks.

As to aggregate view without join , however, we might be able to use a lock
of more low granularity as you said, because if rows belonging a group in a
table is changes, we just update (or delete) corresponding rows in the view.  
Even if there are concurrent transactions updating the same table, we would
be able to make one of them wait using tuple lock. If concurrent transactions
are trying to insert a tuple into the same table, we might need to use unique
index and UPSERT to avoid to insert multiple rows with same group key into
the view.

Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT
can be used for optimization for some classes of view, but we don't have any
other better idea than using table lock for views joining tables.  We would
appreciate it if you could suggest better solution. 

> (Newer papers describe locking schemes that avoid even aggregate-row
> level conflicts, by taking advantage of the associativity and
> commutativity of aggregates like SUM and COUNT.  You can allow N
> writers to update the aggregate concurrently, and if any transaction
> has to roll back it subtracts what it added, not necessarily restoring
> the original value, so that nobody conflicts with anyone else, or
> something like that...  Contemplating an MVCC, no-rollbacks version of
> that sort of thing leads to ideas like, I dunno, update chains
> containing differential update trees to be compacted later... egad!)

I am interested in papers you mentioned!  Are they literatures in context of
incremental view maintenance?

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Thomas Munro

Дата:

09 сентября 2020 г., 02:22:28

On Wed, Sep 9, 2020 at 12:29 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> I also thought it might be resolved using tuple locks and EvalPlanQual
> instead of table level lock, but there is still a unavoidable case.  For
> example, suppose that tuple dR is inserted into R in T1, and dS is inserted
> into S in T2. Also, suppose that dR and dS will be joined in according to
> the view definition. In this situation, without any lock, the change of V is
> computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
> be included in the results.  This causes inconsistency. I don't think this
> could be resolved even if we use tuple locks.

I see.  Thanks for the explanation!

> As to aggregate view without join , however, we might be able to use a lock
> of more low granularity as you said, because if rows belonging a group in a
> table is changes, we just update (or delete) corresponding rows in the view.
> Even if there are concurrent transactions updating the same table, we would
> be able to make one of them wait using tuple lock. If concurrent transactions
> are trying to insert a tuple into the same table, we might need to use unique
> index and UPSERT to avoid to insert multiple rows with same group key into
> the view.
>
> Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT
> can be used for optimization for some classes of view, but we don't have any
> other better idea than using table lock for views joining tables.  We would
> appreciate it if you could suggest better solution.

I have nothing, I'm just reading starter papers and trying to learn a
bit more about the concepts at this stage.  I was thinking of
reviewing some of the more mechanical parts of the patch set, though,
like perhaps the transition table lifetime management, since I have
worked on that area before.

> > (Newer papers describe locking schemes that avoid even aggregate-row
> > level conflicts, by taking advantage of the associativity and
> > commutativity of aggregates like SUM and COUNT.  You can allow N
> > writers to update the aggregate concurrently, and if any transaction
> > has to roll back it subtracts what it added, not necessarily restoring
> > the original value, so that nobody conflicts with anyone else, or
> > something like that...  Contemplating an MVCC, no-rollbacks version of
> > that sort of thing leads to ideas like, I dunno, update chains
> > containing differential update trees to be compacted later... egad!)
>
> I am interested in papers you mentioned!  Are they literatures in context of
> incremental view maintenance?

Yeah.  I was skim-reading some parts of [1] including section 2.5.1
"Concurrency Control", which opens with some comments about
aggregates, locking and pointers to "V-locking" [2] for high
concurrency aggregates.  There is also a pointer to G. Graefe and M.
J. Zwilling, "Transaction support for indexed views," which I haven't
located; apparently indexed views are Graefe's name for MVs, and
apparently this paper has a section on MVCC systems which sounds
interesting for us.

[1] https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf
[2] http://pages.cs.wisc.edu/~gangluo/latch.pdf

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

09 сентября 2020 г., 05:49:24

On Wed, 9 Sep 2020 14:22:28 +1200
Thomas Munro <thomas.munro@gmail.com> wrote:

> > Therefore, usual update semantics (tuple locks and EvalPlanQual) and UPSERT
> > can be used for optimization for some classes of view, but we don't have any
> > other better idea than using table lock for views joining tables.  We would
> > appreciate it if you could suggest better solution.
> 
> I have nothing, I'm just reading starter papers and trying to learn a
> bit more about the concepts at this stage.  I was thinking of
> reviewing some of the more mechanical parts of the patch set, though,
> like perhaps the transition table lifetime management, since I have
> worked on that area before.

Thank you for your interrest. It would be greatly appreciated if you
could review the patch.

> > > (Newer papers describe locking schemes that avoid even aggregate-row
> > > level conflicts, by taking advantage of the associativity and
> > > commutativity of aggregates like SUM and COUNT.  You can allow N
> > > writers to update the aggregate concurrently, and if any transaction
> > > has to roll back it subtracts what it added, not necessarily restoring
> > > the original value, so that nobody conflicts with anyone else, or
> > > something like that...  Contemplating an MVCC, no-rollbacks version of
> > > that sort of thing leads to ideas like, I dunno, update chains
> > > containing differential update trees to be compacted later... egad!)
> >
> > I am interested in papers you mentioned!  Are they literatures in context of
> > incremental view maintenance?
> 
> Yeah.  I was skim-reading some parts of [1] including section 2.5.1
> "Concurrency Control", which opens with some comments about
> aggregates, locking and pointers to "V-locking" [2] for high
> concurrency aggregates.  There is also a pointer to G. Graefe and M.
> J. Zwilling, "Transaction support for indexed views," which I haven't
> located; apparently indexed views are Graefe's name for MVs, and
> apparently this paper has a section on MVCC systems which sounds
> interesting for us.
> 
> [1] https://dsf.berkeley.edu/cs286/papers/mv-fntdb2012.pdf
> [2] http://pages.cs.wisc.edu/~gangluo/latch.pdf

Thanks for your information!  I will also check references
regarding with IVM and concurrency control.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

17 сентября 2020 г., 00:42:45

> I have nothing, I'm just reading starter papers and trying to learn a
> bit more about the concepts at this stage.  I was thinking of
> reviewing some of the more mechanical parts of the patch set, though,
> like perhaps the transition table lifetime management, since I have
> worked on that area before.

Do you have comments on this part?

I am asking because these patch sets are now getting closer to
committable state in my opinion, and if there's someting wrong, it
should be fixed soon so that these patches are getting into the master
branch.

I think this feature has been long awaited by users and merging the
patches should be a benefit for them.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

01 октября 2020 г., 04:03:51

> On Thu, Sep 17, 2020 at 09:42:45AM +0900, Tatsuo Ishii wrote:
>> I am asking because these patch sets are now getting closer to
>> committable state in my opinion, and if there's someting wrong, it
>> should be fixed soon so that these patches are getting into the master
>> branch.
>> 
>> I think this feature has been long awaited by users and merging the
>> patches should be a benefit for them.
> 
> I don't have much thoughts to offer about that, but this patch is
> failing to apply, so a rebase is at least necessary.

Yes. I think he is going to post a new patch (possibly with
enhancements) soon.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Fujii Masao

Дата:

01 октября 2020 г., 04:43:49


On 2020/10/01 13:03, Tatsuo Ishii wrote:
>> On Thu, Sep 17, 2020 at 09:42:45AM +0900, Tatsuo Ishii wrote:
>>> I am asking because these patch sets are now getting closer to
>>> committable state in my opinion, and if there's someting wrong, it
>>> should be fixed soon so that these patches are getting into the master
>>> branch.
>>>
>>> I think this feature has been long awaited by users and merging the
>>> patches should be a benefit for them.
>>
>> I don't have much thoughts to offer about that, but this patch is
>> failing to apply, so a rebase is at least necessary.
> 
> Yes. I think he is going to post a new patch (possibly with
> enhancements) soon.

When I glanced the doc patch (i.e., 0012), I found some typos.

+    <command>CRATE INCREMENTAL MATERIALIZED VIEW</command>, for example:

Typo: CRATE should be CREATE ?

+    with <literal>__ivm_</literal> and they contains information required

Typo: contains should be contain ?

+    For exmaple, here are two materialized views based on the same view

Typo: exmaple should be example ?

+    maintenance can be lager than <command>REFRESH MATERIALIZED VIEW</command>

Typo: lager should be larger ?

+postgres=# SELECt * FROM m; -- automatically updated

Typo: SELECt should be SELECT ?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

01 октября 2020 г., 05:06:27

On Thu, 1 Oct 2020 13:43:49 +0900
Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
 
> When I glanced the doc patch (i.e., 0012), I found some typos.

Thank you for your pointing out typos!  I'll fix it.


> 
> +    <command>CRATE INCREMENTAL MATERIALIZED VIEW</command>, for example:
> 
> Typo: CRATE should be CREATE ?
> 
> +    with <literal>__ivm_</literal> and they contains information required
> 
> Typo: contains should be contain ?
> 
> +    For exmaple, here are two materialized views based on the same view
> 
> Typo: exmaple should be example ?
> 
> +    maintenance can be lager than <command>REFRESH MATERIALIZED VIEW</command>
> 
> Typo: lager should be larger ?
> 
> +postgres=# SELECt * FROM m; -- automatically updated
> 
> Typo: SELECt should be SELECT ?
> 
> Regards,
> 
> -- 
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

05 октября 2020 г., 09:16:18

Hi,

Attached is the rebased patch (v18) to add support for Incremental
Materialized View Maintenance (IVM). It is able to be applied to
current latest master branch.

Also, this now supports simple CTEs (WITH clauses) which do not contain
aggregates or DISTINCT like simple sub-queries. This feature is provided
as an additional patch segment "0010-Add-CTE-support-in-IVM.patch".

==== Example ====

cte=# TABLE r;
 i | v  
---+----
 1 | 10
 2 | 20
(2 rows)

cte=# TABLE s;
 i |  v  
---+-----
 2 | 200
 3 | 300
(2 rows)

cte=# \d+ mv
                              Materialized view "public.mv"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 r      | integer |           |          |         | plain   |              | 
 x      | integer |           |          |         | plain   |              | 
View definition:
 WITH x AS (
         SELECT s.i,
            s.v
           FROM s
        )
 SELECT r.v AS r,
    x.v AS x
   FROM r,
    x
  WHERE r.i = x.i;
Access method: heap
Incremental view maintenance: yes

cte=# SELECT * FROM mv;
 r  |  x  
----+-----
 20 | 200
(1 row)

cte=# INSERT INTO r VALUES (3,30);
INSERT 0 1
cte=# INSERT INTO s VALUES (1,100);
INSERT 0 1
cte=# SELECT * FROM mv;
 r  |  x  
----+-----
 20 | 200
 30 | 300
 10 | 100
(3 rows)

======================

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v18.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

16 октября 2020 г., 10:30:34

Hi,

I have reviewed the past discussions in this thread on IVM implementation
of the proposed patch[1], and summarized it as following . We would appreciate
any comments or suggestions on the patch as regard of them.

* Aggregate support

The current patch supports several built-in aggregates, that is, count, sum, 
avg, min, and max. Other built-in aggregates or user-defined aggregates are
not supported.

Aggregates in a materialized view definition is checked if this is supported
using OIDs of aggregate function. For this end, Gen_fmgrtab.pl is changed to
output aggregate function's OIDs to fmgroids.h
(by 0006-Change-Gen_fmgrtab.pl-to-output-aggregate-function-s.patch). 
The logic for each aggregate function to update aggregate values in
materialized views is enbedded in a trigger function.
    
There was another option in the past discussion. That is, we could add one
or more new attribute to pg_aggregate which provides information about if
each aggregate function supports IVM and its logic[2]. If we have a mechanism
to support IVM in pg_aggregate, we may use more general aggregate functions
including user-defined aggregate in materialized views for IVM.

For example, the current pg_aggregate has aggcombinefn attribute for
supporting partial aggregation.  Maybe we could use combine functions to
calculate new aggregate values in materialized views when tuples are
inserted into a base table.  However, in the context of IVM, we also need
other function used when tuples are deleted from a base table, so we can not
use partial aggregation for IVM in the current implementation. 

Maybe, we could support the deletion case by adding a new support function,
say, "inverse combine function".  The "inverse combine function" would take 
aggregate value in a materialized view and aggregate value calculated from a
delta of view, and produces the new aggregate value which equals the result
after tuples in a base table are deleted.

However, we don't have concrete plan for the new design of pg_aggregate.  
In addition, even if make a new support function in pg_aggregate for IVM, 
we can't use this in the current IVM code because our code uses SQL via SPI
in order to update a materialized view and we can't call "internal" type
function directly in SQL.

For these reasons, in the current patch, we decided to left supporting
general aggregates to the next version for simplicity, so the current patch
supports only some built-in aggregates and checks if they can be used in IVM
by their OIDs.

* Hidden columns

For supporting aggregates, DISTINCT, and EXISTS, the current implementation
automatically create hidden columns whose name starts with "__ivm_" in
materialized views.

The columns starting with "__ivm_" are hidden, so when "SELECT * FROM ..." is
issued to a materialized view,  these are invisible for users. Users can not
use such name as a user column in materialized views with IVM support. 

As for how to make internal columns invisible to SELECT *, previously there
have been discussions about doing that using a new flag in pg_attribute[3]. 
However, the discussion is no longer active. So, we decided to use column
name for checking if this is special or not in our IVM implementation
for now.

* TRUNCATE support

Currently, TRUNCATE on base tables are not supported. When TRUNCATE command
is executed on a base table, it is ignored and nothing occurs on materialized
views. 

There are another options as followings:

- Raise an error or warning when a base table is TRUNCATEed.
- Make the view non-scannable (like REFRESH WITH NO DATA)
- Update the view in any way. It would be easy for inner joins
  or aggregate views, but there is some difficult with outer joins.

Which is the best way? Should we support TRUNCATE in the first version?
Any suggestions would be greatly appreciated.

[1] https://wiki.postgresql.org/wiki/Incremental_View_Maintenance
[2] https://www.postgresql.org/message-id/20191129173328.e5a0e9f81e369a3769c4fd0c%40sraoss.co.jp
[3]
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com


Regard,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

19 октября 2020 г., 03:24:32

> * Aggregate support
> 
> The current patch supports several built-in aggregates, that is, count, sum, 
> avg, min, and max. Other built-in aggregates or user-defined aggregates are
> not supported.
> 
> Aggregates in a materialized view definition is checked if this is supported
> using OIDs of aggregate function. For this end, Gen_fmgrtab.pl is changed to
> output aggregate function's OIDs to fmgroids.h
> (by 0006-Change-Gen_fmgrtab.pl-to-output-aggregate-function-s.patch). 
> The logic for each aggregate function to update aggregate values in
> materialized views is enbedded in a trigger function.
>     
> There was another option in the past discussion. That is, we could add one
> or more new attribute to pg_aggregate which provides information about if
> each aggregate function supports IVM and its logic[2]. If we have a mechanism
> to support IVM in pg_aggregate, we may use more general aggregate functions
> including user-defined aggregate in materialized views for IVM.
> 
> For example, the current pg_aggregate has aggcombinefn attribute for
> supporting partial aggregation.  Maybe we could use combine functions to
> calculate new aggregate values in materialized views when tuples are
> inserted into a base table.  However, in the context of IVM, we also need
> other function used when tuples are deleted from a base table, so we can not
> use partial aggregation for IVM in the current implementation. 
> 
> Maybe, we could support the deletion case by adding a new support function,
> say, "inverse combine function".  The "inverse combine function" would take 
> aggregate value in a materialized view and aggregate value calculated from a
> delta of view, and produces the new aggregate value which equals the result
> after tuples in a base table are deleted.
> 
> However, we don't have concrete plan for the new design of pg_aggregate.  
> In addition, even if make a new support function in pg_aggregate for IVM, 
> we can't use this in the current IVM code because our code uses SQL via SPI
> in order to update a materialized view and we can't call "internal" type
> function directly in SQL.
> 
> For these reasons, in the current patch, we decided to left supporting
> general aggregates to the next version for simplicity, so the current patch
> supports only some built-in aggregates and checks if they can be used in IVM
> by their OIDs.

Current patch for IVM is already large. I think implementing above
will make the patch size even larger, which makes reviewer's work
difficult. So I personally think we should commit the patch as it is,
then enhance IVM to support user defined and other aggregates in later
version of PostgreSQL.

However, if supporting user defined and other aggregates is quite
important for certain users, then we should rethink about this. It
will be nice if we could know how high such demand is.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 октября 2020 г., 03:21:26

Hi Adam Brusselback,

On Mon, 31 Dec 2018 11:20:11 -0500
Adam Brusselback <adambrusselback@gmail.com> wrote:

> Hi all, just wanted to say  I am very happy to see progress made on this,
> my codebase has multiple "materialized tables" which are maintained with
> statement triggers (transition tables) and custom functions. They are ugly
> and a pain to maintain, but they work because I have no other
> solution...for now at least.

We are want to find sutable use cases of the IVM patch being discussed in this
thread, and I remembered your post that said you used statement triggers and
custom functions. We hope the patch will help you.

The patch implements IVM of immediate, that is, eager approach. Materialized
views are updated immediately when its base tables are modified. While the view
is always up-to-date, there is a overhead on base table modification. 

We would appreciate it if you could tell us what your use cases of materialized
view is and whether our implementation suits your needs or not.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Adam Brusselback

Дата:

22 октября 2020 г., 14:07:29

Hey there Yugo,

I've asked a coworker to prepare a self contained example that encapsulates our multiple use cases.

The immediate/eager approach is exactly what we need, as within the same transaction we have statements that can cause one of those "materialized tables" to be updated, and then sometimes have the need to query that "materialized table" in a subsequent statement and need to see the changes reflected.

As soon as my coworker gets that example built up I'll send a followup with it attached.

Thank you,

Adam Brusselback

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

23 октября 2020 г., 07:57:26

Hi Adam,

On Thu, 22 Oct 2020 10:07:29 -0400
Adam Brusselback <adambrusselback@gmail.com> wrote:

> Hey there Yugo,
> I've asked a coworker to prepare a self contained example that encapsulates
> our multiple use cases.

Thank you very much!

> The immediate/eager approach is exactly what we need, as within the same
> transaction we have statements that can cause one of those "materialized
> tables" to be updated, and then sometimes have the need to query that
> "materialized table" in a subsequent statement and need to see the changes
> reflected.

The proposed patch provides the exact this feature and I think this will meet
your needs.

> As soon as my coworker gets that example built up I'll send a followup with
> it attached.

Great! We are looking forward to it. 

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Adam Brusselback

Дата:

27 октября 2020 г., 16:14:52

That was a good bit more work to get ready than I expected. It's broken into two scripts, one to create the schema, the other to load data and containing a couple check queries to ensure things are working properly (checking the materialized tables against a regular view for accuracy).

The first test case is to give us a definitive result on what "agreed pricing" is in effect at a point in time based on a product hierarchy our customers setup, and allow pricing to be set on nodes in that hierarchy, as well as specific products (with an order of precedence).

The second test case maintains some aggregated amounts / counts / boolean logic at an "invoice" level for all the detail lines which make up that invoice.

Both of these are real-world use cases which were simplified a bit to make them easier to understand. We have other use cases as well, but with how much time this took to prepare i'll keep it at this for now.

If you need anything clarified or have any issues, just let me know.

On Fri, Oct 23, 2020 at 3:58 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi Adam,

On Thu, 22 Oct 2020 10:07:29 -0400
Adam Brusselback <adambrusselback@gmail.com> wrote:

> Hey there Yugo,
> I've asked a coworker to prepare a self contained example that encapsulates
> our multiple use cases.

Thank you very much!

> The immediate/eager approach is exactly what we need, as within the same
> transaction we have statements that can cause one of those "materialized
> tables" to be updated, and then sometimes have the need to query that
> "materialized table" in a subsequent statement and need to see the changes
> reflected.

The proposed patch provides the exact this feature and I think this will meet
your needs.

> As soon as my coworker gets that example built up I'll send a followup with
> it attached.

Great! We are looking forward to it.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

ср, 28 окт. 2020 г. в 08:02, Yugo NAGATA <nagata@sraoss.co.jp>:

Hi Anastasia Lubennikova,

I am writing this to you because I would like to ask the commitfest
manager something.

The status of the patch was changed to "Waiting on Author" from
"Ready for Committer" at the beginning of this montfor the reason
that rebase was necessary. Now I updated the patch, so can I change
the status back to "Ready for Committer"?

Regards,
Yugo Nagata

Yes, go ahead. As far as I see, the patch is in a good shape and there are no unanswered questions from reviewers.
Feel free to change the status of CF entries, when it seems reasonable to you.

P.S. Please, avoid top-posting, It makes it harder to follow the discussion, in-line replies are customary in pgsql mailing lists.
See https://en.wikipedia.org/wiki/Posting_style#Top-posting for details.

Best regards,

Lubennikova Anastasia

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

28 октября 2020 г., 09:11:46

On Wed, 28 Oct 2020 12:01:58 +0300
Anastasia Lubennikova <lubennikovaav@gmail.com> wrote:

> ср, 28 окт. 2020 г. в 08:02, Yugo NAGATA <nagata@sraoss.co.jp>:
> 
> > Hi Anastasia Lubennikova,
> >
> > I am writing this to you because I would like to ask the commitfest
> > manager something.
> >
> > The status of the patch was changed to "Waiting on Author" from
> > "Ready for Committer" at the beginning of this montfor the reason
> > that rebase was necessary. Now I updated the patch, so can I change
> > the status back to "Ready for Committer"?
> >
> > Regards,
> > Yugo Nagata
> >
> >
> Yes, go ahead. As far as I see, the patch is in a good shape and there are
> no unanswered questions from reviewers.
> Feel free to change the status of CF entries, when it seems reasonable to
> you.

Thank you for your response! I get it.
 
> P.S. Please, avoid top-posting, It makes it harder to follow the
> discussion, in-line replies are customary in pgsql mailing lists.
> See https://en.wikipedia.org/wiki/Posting_style#Top-posting for details.

I understand it.

Regards,
Yugo Nagata

> -- 
> Best regards,
> Lubennikova Anastasia


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Justin Pryzby

Дата:

06 ноября 2020 г., 04:58:25

On Mon, Oct 05, 2020 at 06:16:18PM +0900, Yugo NAGATA wrote:
> Hi,
> 
> Attached is the rebased patch (v18) to add support for Incremental

This needs to be rebased again - the last version doesn't apply anymore.
http://cfbot.cputube.org/yugo-nagata.html

I looked though it a bit and attach some fixes to the user-facing docs.

There's some more typos in the source that I didn't fix:
constrains
materliazied
cluase
immediaite
clumn
Temrs
migth
recalculaetd
speified
secuirty

commit message: comletion

psql and pg_dump say 13 but should say 14 now:
pset.sversion >= 130000 

# bag union
big union?

+       <structfield>relisivm</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if materialized view enables incremental view maintenance

This isn't clear, but I think it should say "True for materialized views which
are enabled for incremental view maintenance (IVM)."

-- 
Justin

Вложения

0001-incremental-view-doc-fixes.patch

Re: Implementing Incremental View Maintenance

От

Konstantin Knizhnik

Дата:

11 ноября 2020 г., 16:10:35

On 05.10.2020 12:16, Yugo NAGATA wrote:
> Hi,
>
> Attached is the rebased patch (v18) to add support for Incremental
> Materialized View Maintenance (IVM). It is able to be applied to
> current latest master branch.
>

Thank you very much for this work.
I consider incremental materialized views as "reincarnation" of OLAP 
hypercubes.
There are two approaches of making OLAP queries faster:
1. speed up query execution (using JIT, columnar store, vector 
operations and parallel execution)
2. precalculate requested data

Incremental materialize views make it possible to implement second 
approach. But how competitive it is?
I do not know current limitations of incremental materialized views, but 
I checked that basic OLAP functionality:
JOIN+GROUP_BY+AGGREGATION is supported.

The patch is not applied to the current master because makeFuncCall 
prototype is changed,
I fixed it by adding COAERCE_CALL_EXPLICIT.
Then I did the following simple test:

1. Create pgbench database with scale 100.
pgbench speed at my desktop is about 10k TPS:

pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
tps = 10194.951827 (including connections establishing)

2. Then I created incremental materialized view:

create incremental materialized view teller_sums as select 
t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on 
a.bid=t.bid group by t.tid;
SELECT 1000
Time: 20805.230 ms (00:20.805)

20 second is reasonable time, comparable with time of database 
initialization.

Then obviously we see advantages of precalculated aggregates:

postgres=# select * from teller_sums where tid=1;
  tid |  sum
-----+--------
    1 | -96427
(1 row)

Time: 0.871 ms
postgres=# select t.tid,sum(abalance) from pgbench_accounts a join 
pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
;
  tid |  sum
-----+--------
    1 | -96427
(1 row)

Time: 915.508 ms

Amazing. Almost 1000 times difference!

3. Run pgbench once again:

Ooops! Now TPS are much lower:

tps = 141.767347 (including connections establishing)

Speed of updates is reduced more than 70 times!
Looks like we loose parallelism because almost the same result I get 
with just one connection.

4. Finally let's create one more view (it is reasonable to expect that 
analytics will run many different queries and so need multiple views).

create incremental materialized view teller_avgs as select 
t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
a.bid=t.bid group by t.tid;

It is great that not only simple aggregates like SUM are supported, but 
also AVG.
But insertion speed now is reduced twice - 72TPS.

I tried to make some profiling but didn't see something unusual:

   16.41%  postgres  postgres            [.] ExecInterpExpr
    8.78%  postgres  postgres            [.] slot_deform_heap_tuple
    3.23%  postgres  postgres            [.] ExecMaterial
    2.71%  postgres  postgres            [.] AllocSetCheck
    2.33%  postgres  postgres            [.] AllocSetAlloc
    2.29%  postgres  postgres            [.] slot_getsomeattrs_int
    2.26%  postgres  postgres            [.] ExecNestLoop
    2.11%  postgres  postgres            [.] MemoryContextReset
    1.98%  postgres  postgres            [.] tts_minimal_store_tuple
    1.87%  postgres  postgres            [.] heap_compute_data_size
    1.78%  postgres  postgres            [.] fill_val
    1.56%  postgres  postgres            [.] tuplestore_gettuple
    1.44%  postgres  postgres            [.] sentinel_ok
    1.35%  postgres  postgres            [.] heap_fill_tuple
    1.27%  postgres  postgres            [.] tuplestore_gettupleslot
    1.17%  postgres  postgres            [.] ExecQual
    1.14%  postgres  postgres            [.] tts_minimal_clear
    1.13%  postgres  postgres            [.] CheckOpSlotCompatibility
    1.10%  postgres  postgres            [.] base_yyparse
    1.10%  postgres  postgres            [.] heapgetpage
    1.04%  postgres  postgres            [.] heap_form_minimal_tuple
    1.00%  postgres  postgres            [.] slot_getsomeattrs

So good news is that incremental materialized views really work.
And bad news is that maintenance overhead is too large which 
significantly restrict applicability of this approach.
Certainly in case of dominated read-only workload such materialized 
views can significantly improve performance.
But unfortunately my dream that them allow to combine OLAP+OLPT is not 
currently realized.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Implementing Incremental View Maintenance

От

legrand legrand

Дата:

11 ноября 2020 г., 16:57:06

Hello Konstantin,
I remember testing it with pg_stat_statements (and planning counters
enabled). Maybe identifying internal queries associated with this (simple)
test case, could help dev team ?
Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

12 ноября 2020 г., 08:47:48

On Thu, 5 Nov 2020 22:58:25 -0600
Justin Pryzby <pryzby@telsasoft.com> wrote:

> On Mon, Oct 05, 2020 at 06:16:18PM +0900, Yugo NAGATA wrote:
> This needs to be rebased again - the last version doesn't apply anymore.
> http://cfbot.cputube.org/yugo-nagata.html

I attached the rebased patch (v19).
 
> I looked though it a bit and attach some fixes to the user-facing docs.

Thank you for pointing out a lot of typos and making the patch to fix it!
Your fixes are included in the latest patch.
 
> There's some more typos in the source that I didn't fix:
> constrains
> materliazied
> cluase
> immediaite
> clumn
> Temrs
> migth
> recalculaetd
> speified
> secuirty
> 
> commit message: comletion
>
> psql and pg_dump say 13 but should say 14 now:
> pset.sversion >= 130000 

These were also fixed.
 
> # bag union
> big union?

"bag union" is union operation of bag (multi-set) that does not eliminate duplicate of tuples.
 
> +       <structfield>relisivm</structfield> <type>bool</type>
> +      </para>
> +      <para>
> +       True if materialized view enables incremental view maintenance
> 
> This isn't clear, but I think it should say "True for materialized views which
> are enabled for incremental view maintenance (IVM)."

Yes, you are right. I also fixed it in this way.

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v19.tar.gz

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

12 ноября 2020 г., 11:53:55

> 1. Create pgbench database with scale 100.
> pgbench speed at my desktop is about 10k TPS:
>
> pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
> tps = 10194.951827 (including connections establishing)
>
> 2. Then I created incremental materialized view:
>
> create incremental materialized view teller_sums as select
> t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on
> a.bid=t.bid group by t.tid;
> SELECT 1000
> Time: 20805.230 ms (00:20.805)
>
> 20 second is reasonable time, comparable with time of database
> initialization.
>
> Then obviously we see advantages of precalculated aggregates:
>
> postgres=# select * from teller_sums where tid=1;
>  tid |  sum
> -----+--------
>    1 | -96427
> (1 row)
>
> Time: 0.871 ms
> postgres=# select t.tid,sum(abalance) from pgbench_accounts a join
> pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
> ;
>  tid |  sum
> -----+--------
>    1 | -96427
> (1 row)
>
> Time: 915.508 ms
>
> Amazing. Almost 1000 times difference!
>
> 3. Run pgbench once again:
>
> Ooops! Now TPS are much lower:
>
> tps = 141.767347 (including connections establishing)
>
> Speed of updates is reduced more than 70 times!
> Looks like we loose parallelism because almost the same result I get
> with just one connection.

How much TPS do you get if you execute pgbench -c 1 without
incremental materialized view defined? If it's around 141 then we
could surely confirm that the major bottle neck is locking contention.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Konstantin Knizhnik

Дата:

12 ноября 2020 г., 12:37:42

On 12.11.2020 14:53, Tatsuo Ishii wrote:
>> 1. Create pgbench database with scale 100.
>> pgbench speed at my desktop is about 10k TPS:
>>
>> pgbench -M prepared -N -c 10 -j 4 -T 30 -P 1 postgres
>> tps = 10194.951827 (including connections establishing)
>>
>> 2. Then I created incremental materialized view:
>>
>> create incremental materialized view teller_sums as select
>> t.tid,sum(abalance) from pgbench_accounts a join pgbench_tellers t on
>> a.bid=t.bid group by t.tid;
>> SELECT 1000
>> Time: 20805.230 ms (00:20.805)
>>
>> 20 second is reasonable time, comparable with time of database
>> initialization.
>>
>> Then obviously we see advantages of precalculated aggregates:
>>
>> postgres=# select * from teller_sums where tid=1;
>>   tid |  sum
>> -----+--------
>>     1 | -96427
>> (1 row)
>>
>> Time: 0.871 ms
>> postgres=# select t.tid,sum(abalance) from pgbench_accounts a join
>> pgbench_tellers t on a.bid=t.bid group by t.tid having t.tid=1
>> ;
>>   tid |  sum
>> -----+--------
>>     1 | -96427
>> (1 row)
>>
>> Time: 915.508 ms
>>
>> Amazing. Almost 1000 times difference!
>>
>> 3. Run pgbench once again:
>>
>> Ooops! Now TPS are much lower:
>>
>> tps = 141.767347 (including connections establishing)
>>
>> Speed of updates is reduced more than 70 times!
>> Looks like we loose parallelism because almost the same result I get
>> with just one connection.
> How much TPS do you get if you execute pgbench -c 1 without
> incremental materialized view defined? If it's around 141 then we
> could surely confirm that the major bottle neck is locking contention.
>

My desktop has just 4 physical cores, so performance with one connection 
is about 2k TPS:

pgbench -M prepared -N -c 1 -T 60 -P 1 postgres
tps = 1949.233532 (including connections establishing)

So there is still large gap (~14 times) between insert speed 
with/without incremental view.
I did more investigations and found out that one pf the reasons of bad 
performance in this case is lack of index on materialized view,
so update has to perform sequential scan through 1000 elements.

Well, creation of proper indexes for table is certainly responsibility 
of DBA.
But users may not consider materialized view as normal table. So the 
idea that index should
be explicitly created for materialized view seems to be not so obvious.
 From the other side, implementation of materialized view knows which 
index is needed for performing efficient incremental update.
I wonder if it can create such index itself implicitly or at least 
produce notice with proposal to create such index.

In any case, after creation of index on tid column of materialized view, 
pgbench speed is increased from 141 to 331 TPS
(more than two times). It is with single connection. But if I run 
pgbench with 10 connections, then performance is even slightly slower: 
289 TPS.

I looked throw your patch for exclusive table locks and found this 
fragment in matview.c:

     /*
      * Wait for concurrent transactions which update this materialized 
view at
      * READ COMMITED. This is needed to see changes committed in other
      * transactions. No wait and raise an error at REPEATABLE READ or
      * SERIALIZABLE to prevent update anomalies of matviews.
      * XXX: dead-lock is possible here.
      */
     if (!IsolationUsesXactSnapshot())
         LockRelationOid(matviewOid, ExclusiveLock);
     else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))

I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections.
It is still about 7 times slower than performance without incremental view.
But now the gap is not so dramatic. And it seems to be clear that this 
exclusive lock on matview is real show stopper for concurrent updates.
I do not know which race conditions and anomalies we can get if replace 
table-level lock with row-level lock here.
But I think that this problem should be addressed in any case: single 
client update mode is very rare scenario.

I attached to this mail profile of pgbench workload with defined 
incremental view (with index).
May be you will find it useful.

One more disappointing observation of materialized views (now 
non-incremental).
Time of creation of non-incremental materialized view is about 18 seconds:

postgres=# create materialized view teller_avgs as select 
t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
a.bid=t.bid group by t.tid;
SELECT 1000
Time: 17795.395 ms (00:17.795)

But refresh of such view takes 55 seconds:

postgres=# refresh materialized view teller_avgs;
REFRESH MATERIALIZED VIEW
Time: 55500.381 ms (00:55.500)

And refresh time doesn't depend on amount of updates since last refresh:
I got almost the same time when I ran pgbench for one minute before 
refresh and
when  two refreshes are performed subsequently.

Adding index doesn't help much in this case and concurrent refresh is 
even slower:

postgres=# refresh materialized view concurrently teller_avgs;
REFRESH MATERIALIZED VIEW
Time: 56981.772 ms (00:56.982)

So it seems to be more efficient to drop and recreate materialized view 
rather than refresh it. At least in this case.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Вложения

imv.svg

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

24 ноября 2020 г., 09:21:14

On Wed, 11 Nov 2020 19:10:35 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Thank you for reviewing this patch!

> 
> The patch is not applied to the current master because makeFuncCall 
> prototype is changed,
> I fixed it by adding COAERCE_CALL_EXPLICIT.

The rebased patch was submitted.

> Ooops! Now TPS are much lower:
> 
> tps = 141.767347 (including connections establishing)
> 
> Speed of updates is reduced more than 70 times!
> Looks like we loose parallelism because almost the same result I get 
> with just one connection.

As you and Ishii-san mentioned in other posts, I think the reason would be a
table lock on the materialized view that is acquired during view maintenance.
I will explain more a bit in another post.

> 4. Finally let's create one more view (it is reasonable to expect that 
> analytics will run many different queries and so need multiple views).
> 
> create incremental materialized view teller_avgs as select 
> t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
> a.bid=t.bid group by t.tid;
> 
> It is great that not only simple aggregates like SUM are supported, but 
> also AVG.
> But insertion speed now is reduced twice - 72TPS.

Yes, the current implementation takes twice time for updating a table time
when a new incrementally maintainable materialized view is defined on the
table because view maintenance is performed for each view.

> 
> So good news is that incremental materialized views really work.
> And bad news is that maintenance overhead is too large which 
> significantly restrict applicability of this approach.
> Certainly in case of dominated read-only workload such materialized 
> views can significantly improve performance.
> But unfortunately my dream that them allow to combine OLAP+OLPT is not 
> currently realized.

As you concluded, there is a large overhead on updating base tables in the
current implementation because it is immediate maintenance in which the view
is updated in the same sentence where its base table is modified. Therefore,
this is not suitable to OLTP workload where there are frequent updates of
tables. 

For suppressing maintenance overhead in such workload, we have to implement
"deferred maintenance" which collects table change logs and updates the view
in another transaction afterward.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

24 ноября 2020 г., 09:21:33

On Thu, 12 Nov 2020 15:37:42 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

> Well, creation of proper indexes for table is certainly responsibility 
> of DBA.
> But users may not consider materialized view as normal table. So the 
> idea that index should
> be explicitly created for materialized view seems to be not so obvious.
>  From the other side, implementation of materialized view knows which 
> index is needed for performing efficient incremental update.
> I wonder if it can create such index itself implicitly or at least 
> produce notice with proposal to create such index.

That makes sense. Especially for aggregate views, it is obvious that
creating an index on expressions used in GROUP BY is effective. For
other views, creating an index on columns that come from primary keys
of base tables would be effective if any.

However, if any base table doesn't have a primary or unique key or such
key column is not contained in the view's target list, it is hard to
decide an appropriate index on the view. We can create an index on all
columns in the target list, but it could cause overhead on view maintenance. 
So, just producing notice would be better for such cases. 

> I looked throw your patch for exclusive table locks and found this 
> fragment in matview.c:
> 
>      /*
>       * Wait for concurrent transactions which update this materialized 
> view at
>       * READ COMMITED. This is needed to see changes committed in other
>       * transactions. No wait and raise an error at REPEATABLE READ or
>       * SERIALIZABLE to prevent update anomalies of matviews.
>       * XXX: dead-lock is possible here.
>       */
>      if (!IsolationUsesXactSnapshot())
>          LockRelationOid(matviewOid, ExclusiveLock);
>      else if (!ConditionalLockRelationOid(matviewOid, ExclusiveLock))
> 
> 
> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections.
> It is still about 7 times slower than performance without incremental view.
> But now the gap is not so dramatic. And it seems to be clear that this 
> exclusive lock on matview is real show stopper for concurrent updates.
> I do not know which race conditions and anomalies we can get if replace 
> table-level lock with row-level lock here.

I explained it here:
https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp

For example, suppose there is a view V = R*S that joins tables R and S,
and there are two concurrent transactions T1 which changes table R to R'
and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
V would be updated to R'*S in T1, and R*S' in T2, so it would cause
inconsistency.  By locking the view V, transactions T1, T2 are processed
serially and this inconsistency can be avoided.

Especially, suppose that tuple dR is inserted into R in T1, and dS is
inserted into S in T2, where dR and dS will be joined in according to
the view definition. In this situation, without any lock, the change of V is
computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
be included in the results.  This inconsistency could not be resolved by
row-level lock.

> But I think that this problem should be addressed in any case: single 
> client update mode is very rare scenario.

This behavior is explained in rules.sgml like this:

+<sect2>
+<title>Concurrent Transactions</title>
+<para>
+    Suppose an <acronym>IMMV</acronym> is defined on two base tables and each
+    table was modified in different a concurrent transaction simultaneously.
+    In the transaction which was committed first, <acronym>IMMV</acronym> can 
+    be updated considering only the change which happened in this transaction.
+    On the other hand, in order to update the view correctly in the transaction
+    which was committed later, we need to know the changes occurred in
+    both transactions.  For this reason, <literal>ExclusiveLock</literal>
+    is held on an <acronym>IMMV</acronym> immediately after a base table is
+    modified in <literal>READ COMMITTED</literal> mode to make sure that
+    the <acronym>IMMV</acronym> is updated in the latter transaction after
+    the former transaction is committed.  In <literal>REPEATABLE READ</literal>
+    or <literal>SERIALIZABLE</literal> mode, an error is raised immediately
+    if lock acquisition fails because any changes which occurred in
+    other transactions are not be visible in these modes and 
+    <acronym>IMMV</acronym> cannot be updated correctly in such situations.
+</para>
+</sect2>

Hoever, should we describe explicitly its impact on performance here?

> I attached to this mail profile of pgbench workload with defined 
> incremental view (with index).
> May be you will find it useful.

Thank you for your profiling! Hmm, it shows that overhead of executing
query for calculating the delta (refresh_mateview_datfill) and applying
the delta (SPI_exec) is large.... I will investigate if more optimizations
to reduce the overhead is possible.

> 
> One more disappointing observation of materialized views (now 
> non-incremental).
> Time of creation of non-incremental materialized view is about 18 seconds:
> 
> postgres=# create materialized view teller_avgs as select 
> t.tid,avg(abalance) from pgbench_accounts a join pgbench_tellers t on 
> a.bid=t.bid group by t.tid;
> SELECT 1000
> Time: 17795.395 ms (00:17.795)
> 
> But refresh of such view takes 55 seconds:
> 
> postgres=# refresh materialized view teller_avgs;
> REFRESH MATERIALIZED VIEW
> Time: 55500.381 ms (00:55.500)

Hmm, interesting... I would like to investigate this issue, too.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Konstantin Knizhnik

Дата:

24 ноября 2020 г., 09:46:57


On 24.11.2020 12:21, Yugo NAGATA wrote:
>
>> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections.
>> It is still about 7 times slower than performance without incremental view.
>> But now the gap is not so dramatic. And it seems to be clear that this
>> exclusive lock on matview is real show stopper for concurrent updates.
>> I do not know which race conditions and anomalies we can get if replace
>> table-level lock with row-level lock here.
> I explained it here:
> https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
>   
> For example, suppose there is a view V = R*S that joins tables R and S,
> and there are two concurrent transactions T1 which changes table R to R'
> and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
> V would be updated to R'*S in T1, and R*S' in T2, so it would cause
> inconsistency.  By locking the view V, transactions T1, T2 are processed
> serially and this inconsistency can be avoided.
>
> Especially, suppose that tuple dR is inserted into R in T1, and dS is
> inserted into S in T2, where dR and dS will be joined in according to
> the view definition. In this situation, without any lock, the change of V is
> computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
> be included in the results.  This inconsistency could not be resolved by
> row-level lock.
>
>> But I think that this problem should be addressed in any case: single
>> client update mode is very rare scenario.
> This behavior is explained in rules.sgml like this:
>
> +<sect2>
> +<title>Concurrent Transactions</title>
> +<para>
> +    Suppose an <acronym>IMMV</acronym> is defined on two base tables and each
> +    table was modified in different a concurrent transaction simultaneously.
> +    In the transaction which was committed first, <acronym>IMMV</acronym> can
> +    be updated considering only the change which happened in this transaction.
> +    On the other hand, in order to update the view correctly in the transaction
> +    which was committed later, we need to know the changes occurred in
> +    both transactions.  For this reason, <literal>ExclusiveLock</literal>
> +    is held on an <acronym>IMMV</acronym> immediately after a base table is
> +    modified in <literal>READ COMMITTED</literal> mode to make sure that
> +    the <acronym>IMMV</acronym> is updated in the latter transaction after
> +    the former transaction is committed.  In <literal>REPEATABLE READ</literal>
> +    or <literal>SERIALIZABLE</literal> mode, an error is raised immediately
> +    if lock acquisition fails because any changes which occurred in
> +    other transactions are not be visible in these modes and
> +    <acronym>IMMV</acronym> cannot be updated correctly in such situations.
> +</para>
> +</sect2>
>
> Hoever, should we describe explicitly its impact on performance here?
>   

Sorry, I didn't think much about this problem.
But I think that it is very important to try to find some solution of 
the problem.
The most obvious optimization is not to use exclusive table lock if view 
depends just on one table (contains no joins).
Looks like there are no any anomalies in this case, are there?

Yes, most analytic queries contain joins (just two queries among 22 
TPC-H  have no joins).
So may be this optimization will not help much.

I wonder if it is possible to somehow use predicate locking mechanism of 
Postgres to avoid this anomalies without global lock?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

24 ноября 2020 г., 10:11:38

On Tue, 24 Nov 2020 12:46:57 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

> 
> 
> On 24.11.2020 12:21, Yugo NAGATA wrote:
> >
> >> I replaced it with RowExlusiveLock and ... got 1437 TPS with 10 connections.
> >> It is still about 7 times slower than performance without incremental view.
> >> But now the gap is not so dramatic. And it seems to be clear that this
> >> exclusive lock on matview is real show stopper for concurrent updates.
> >> I do not know which race conditions and anomalies we can get if replace
> >> table-level lock with row-level lock here.
> > I explained it here:
> > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
> >   
> > For example, suppose there is a view V = R*S that joins tables R and S,
> > and there are two concurrent transactions T1 which changes table R to R'
> > and T2 which changes S to S'. Without any lock,  in READ COMMITTED mode,
> > V would be updated to R'*S in T1, and R*S' in T2, so it would cause
> > inconsistency.  By locking the view V, transactions T1, T2 are processed
> > serially and this inconsistency can be avoided.
> >
> > Especially, suppose that tuple dR is inserted into R in T1, and dS is
> > inserted into S in T2, where dR and dS will be joined in according to
> > the view definition. In this situation, without any lock, the change of V is
> > computed as dV=dR*S in T1, dV=R*dS in T2, respectively, and dR*dS would not
> > be included in the results.  This inconsistency could not be resolved by
> > row-level lock.
> >
> >> But I think that this problem should be addressed in any case: single
> >> client update mode is very rare scenario.
> > This behavior is explained in rules.sgml like this:
> >
> > +<sect2>
> > +<title>Concurrent Transactions</title>
> > +<para>
> > +    Suppose an <acronym>IMMV</acronym> is defined on two base tables and each
> > +    table was modified in different a concurrent transaction simultaneously.
> > +    In the transaction which was committed first, <acronym>IMMV</acronym> can
> > +    be updated considering only the change which happened in this transaction.
> > +    On the other hand, in order to update the view correctly in the transaction
> > +    which was committed later, we need to know the changes occurred in
> > +    both transactions.  For this reason, <literal>ExclusiveLock</literal>
> > +    is held on an <acronym>IMMV</acronym> immediately after a base table is
> > +    modified in <literal>READ COMMITTED</literal> mode to make sure that
> > +    the <acronym>IMMV</acronym> is updated in the latter transaction after
> > +    the former transaction is committed.  In <literal>REPEATABLE READ</literal>
> > +    or <literal>SERIALIZABLE</literal> mode, an error is raised immediately
> > +    if lock acquisition fails because any changes which occurred in
> > +    other transactions are not be visible in these modes and
> > +    <acronym>IMMV</acronym> cannot be updated correctly in such situations.
> > +</para>
> > +</sect2>
> >
> > Hoever, should we describe explicitly its impact on performance here?
> >   
> 
> Sorry, I didn't think much about this problem.
> But I think that it is very important to try to find some solution of 
> the problem.
> The most obvious optimization is not to use exclusive table lock if view 
> depends just on one table (contains no joins).
> Looks like there are no any anomalies in this case, are there?

Thank you for your suggestion! That makes sense.
 
> Yes, most analytic queries contain joins (just two queries among 22 
> TPC-H  have no joins).
> So may be this optimization will not help much.

Yes, but if a user want to incrementally maintain only aggregate views on a large
table, like TPC-H Q1, it will be helpful. For this optimization, we have to only
check the number of RTE in the rtable list and it would be cheap.

> I wonder if it is possible to somehow use predicate locking mechanism of 
> Postgres to avoid this anomalies without global lock?

You mean that, ,instead of using any table lock, if any possibility of the
anomaly is detected using predlock mechanism then abort the transaction?

I don't have concrete idea to implement it and know if it is possible yet,
but I think it is worth to consider this. Thanks.


Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Konstantin Knizhnik

Дата:

25 ноября 2020 г., 12:16:05

On 24.11.2020 13:11, Yugo NAGATA wrote:
>
>> I wonder if it is possible to somehow use predicate locking mechanism of
>> Postgres to avoid this anomalies without global lock?
> You mean that, ,instead of using any table lock, if any possibility of the
> anomaly is detected using predlock mechanism then abort the transaction?

Yes. If both transactions are using serializable isolation level, then 
lock is not needed, isn't it?
So at least you can add yet another simple optimization: if transaction 
has serializable isolation level,
then exclusive lock is not required.

But I wonder if we can go further so that even if transaction is using 
read-committed or repeatable-read isolation level,
we still can replace exclusive table lock with predicate locks.

The main problem with this approach (from my point of view) is the 
predicate locks are able to detect conflict but not able to prevent it.
I.e. if such conflict is detected then transaction has to be aborted.
And it is not always desirable, especially because user doesn't expect 
it: how can insertion of single record with unique keys in a table cause 
transaction conflict?
And this is what will happen in your example with transactions T1 and T2 
inserting records in R and S tables.

And what do you think about backrgound update of materialized view?
On update/insert trigger will just add record to some "delta" table and 
then some background worker will update view.
Certainly in this case we loose synchronization between main table and 
materialized view (last one may contain slightly deteriorated data).
But in this case no exclusive lock is needed, isn't it?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

25 ноября 2020 г., 13:06:28

On Wed, 25 Nov 2020 15:16:05 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

> 
> 
> On 24.11.2020 13:11, Yugo NAGATA wrote:
> >
> >> I wonder if it is possible to somehow use predicate locking mechanism of
> >> Postgres to avoid this anomalies without global lock?
> > You mean that, ,instead of using any table lock, if any possibility of the
> > anomaly is detected using predlock mechanism then abort the transaction?
> 
> Yes. If both transactions are using serializable isolation level, then 
> lock is not needed, isn't it?
> So at least you can add yet another simple optimization: if transaction 
> has serializable isolation level,
> then exclusive lock is not required.

As long as we use the trigger approach, we can't handle concurrent view maintenance
in either repeatable read or serializable isolation level.  It is because one
transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS)
in such cases, and we cannot get the incremental change on the view (dV=dR*dS). 
Therefore, in the current implementation, the transaction is aborted when the
concurrent view maintenance happens in repeatable read or serializable.

> But I wonder if we can go further so that even if transaction is using 
> read-committed or repeatable-read isolation level,
> we still can replace exclusive table lock with predicate locks.
> 
> The main problem with this approach (from my point of view) is the 
> predicate locks are able to detect conflict but not able to prevent it.
> I.e. if such conflict is detected then transaction has to be aborted.
> And it is not always desirable, especially because user doesn't expect 
> it: how can insertion of single record with unique keys in a table cause 
> transaction conflict?
> And this is what will happen in your example with transactions T1 and T2 
> inserting records in R and S tables.

Yes. I wonder that either aborting transaction or waiting on locks is unavoidable
when a view is incrementally updated concurrently (at least in the immediate
maintenance where a view is update in the same transaction that updates the base
table).

> And what do you think about backrgound update of materialized view?
> On update/insert trigger will just add record to some "delta" table and 
> then some background worker will update view.
> Certainly in this case we loose synchronization between main table and 
> materialized view (last one may contain slightly deteriorated data).
> But in this case no exclusive lock is needed, isn't it?

Of course, we are considering this type of view maintenance. This is
deferred maintenance where a view is update after the transaction
that updates the base tables is committed. Views can be updated in
bacground in a appropreate timing or as a response of a user command.

To implement this, we needs a mechanism to maintain change logs which
records changes of base tables. We think that implementing this infrastructure
is not trivial work, so, in the first patch proposal, we decided to start from
immediate approach which needs less code. 

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Konstantin Knizhnik

Дата:

25 ноября 2020 г., 15:00:16


On 25.11.2020 16:06, Yugo NAGATA wrote:
> On Wed, 25 Nov 2020 15:16:05 +0300
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>
>>
>> On 24.11.2020 13:11, Yugo NAGATA wrote:
>>>> I wonder if it is possible to somehow use predicate locking mechanism of
>>>> Postgres to avoid this anomalies without global lock?
>>> You mean that, ,instead of using any table lock, if any possibility of the
>>> anomaly is detected using predlock mechanism then abort the transaction?
>> Yes. If both transactions are using serializable isolation level, then
>> lock is not needed, isn't it?
>> So at least you can add yet another simple optimization: if transaction
>> has serializable isolation level,
>> then exclusive lock is not required.
> As long as we use the trigger approach, we can't handle concurrent view maintenance
> in either repeatable read or serializable isolation level.  It is because one
> transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS)
> in such cases, and we cannot get the incremental change on the view (dV=dR*dS).
> Therefore, in the current implementation, the transaction is aborted when the
> concurrent view maintenance happens in repeatable read or serializable.

Sorry, may be I do not correctly understand you or you do not understand me.
Lets consider two serializable transactions (I do not use view or 
triggers, but perform correspondent updates manually):



create table t(pk integer, val int);
create table mat_view(gby_key integer primary key, total bigint);
insert into t values (1,0),(2,0);
insert into mat_view values (1,0),(2,0);

Session 1: Session 2:

begin isolation level serializable;
begin isolation level serializable;
insert into t values (1,200);                         insert into t 
values (1,300);
update mat_view set total=total+200  where gby_key=1;
update mat_view set total=total+300 where gby_key=1;
<blocked>
commit;
ERROR:  could not serialize access due to concurrent update

So both transactions are aborted.
It is expected behavior for serializable transactions.
But if transactions updating different records of mat_view, then them 
can be executed concurrently:

Session 1: Session 2:

begin isolation level serializable;
begin isolation level serializable;
insert into t values (1,200);                         insert into t 
values (2,300);
update mat_view set total=total+200  where gby_key=1;
update mat_view set total=total+300 where gby_key=2;
commit;                                                      commit;

So, if transactions are using serializable isolation level, then we can 
update mat view without exclusive lock
and if there is not conflict, this transaction can be executed concurrently.

Please notice, that exclusive lock doesn't prevent conflict in first case:

Session 1: Session 2:

begin isolation level serializable;
begin isolation level serializable;
insert into t values (1,200);                         insert into t 
values (1,300);
lock table mat_view;
update mat_view set total=total+200  where gby_key=1;
lock table mat_view;
<blocked>
commit;
update mat_view set total=total+300 where gby_key=1;
commit;
ERROR:  could not serialize access due to concurrent update


So do you agree that there are no reasons for using explicit lock for 
serializable transactions?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

30 ноября 2020 г., 02:52:05

On Wed, 25 Nov 2020 18:00:16 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

> 
> 
> On 25.11.2020 16:06, Yugo NAGATA wrote:
> > On Wed, 25 Nov 2020 15:16:05 +0300
> > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> >
> >>
> >> On 24.11.2020 13:11, Yugo NAGATA wrote:
> >>>> I wonder if it is possible to somehow use predicate locking mechanism of
> >>>> Postgres to avoid this anomalies without global lock?
> >>> You mean that, ,instead of using any table lock, if any possibility of the
> >>> anomaly is detected using predlock mechanism then abort the transaction?
> >> Yes. If both transactions are using serializable isolation level, then
> >> lock is not needed, isn't it?
> >> So at least you can add yet another simple optimization: if transaction
> >> has serializable isolation level,
> >> then exclusive lock is not required.
> > As long as we use the trigger approach, we can't handle concurrent view maintenance
> > in either repeatable read or serializable isolation level.  It is because one
> > transaction (R= R+dR) cannot see changes occurred in another transaction (S'= S+dS)
> > in such cases, and we cannot get the incremental change on the view (dV=dR*dS).
> > Therefore, in the current implementation, the transaction is aborted when the
> > concurrent view maintenance happens in repeatable read or serializable.
> 
> Sorry, may be I do not correctly understand you or you do not understand me.
> Lets consider two serializable transactions (I do not use view or 
> triggers, but perform correspondent updates manually):
> 
> 
> 
> create table t(pk integer, val int);
> create table mat_view(gby_key integer primary key, total bigint);
> insert into t values (1,0),(2,0);
> insert into mat_view values (1,0),(2,0);
> 
> Session 1: Session 2:
> 
> begin isolation level serializable;
> begin isolation level serializable;
> insert into t values (1,200);                         insert into t 
> values (1,300);
> update mat_view set total=total+200  where gby_key=1;
> update mat_view set total=total+300 where gby_key=1;
> <blocked>
> commit;
> ERROR:  could not serialize access due to concurrent update
> 
> So both transactions are aborted.
> It is expected behavior for serializable transactions.
> But if transactions updating different records of mat_view, then them 
> can be executed concurrently:
> 
> Session 1: Session 2:
> 
> begin isolation level serializable;
> begin isolation level serializable;
> insert into t values (1,200);                         insert into t 
> values (2,300);
> update mat_view set total=total+200  where gby_key=1;
> update mat_view set total=total+300 where gby_key=2;
> commit;                                                      commit;
> 
> So, if transactions are using serializable isolation level, then we can 
> update mat view without exclusive lock
> and if there is not conflict, this transaction can be executed concurrently.
> 
> Please notice, that exclusive lock doesn't prevent conflict in first case:
> 
> Session 1: Session 2:
> 
> begin isolation level serializable;
> begin isolation level serializable;
> insert into t values (1,200);                         insert into t 
> values (1,300);
> lock table mat_view;
> update mat_view set total=total+200  where gby_key=1;
> lock table mat_view;
> <blocked>
> commit;
> update mat_view set total=total+300 where gby_key=1;
> commit;
> ERROR:  could not serialize access due to concurrent update
> 
> 
> So do you agree that there are no reasons for using explicit lock for 
> serializable transactions?

Yes, I agree. I said an anomaly could occur in repeatable read and serializable
isolation level, but it was wrong. In serializable, the transaction will be
aborted in programmable cases due to predicate locks, and we don't need the lock.  

However, in repeatable read, the anomaly still could occurs when the table is
defined on more than one base tables even if we lock the view.  To prevent it,
the only way I found is aborting the transaction forcedly in such cases for now.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 декабря 2020 г., 12:51:36

Hi,

Attached is the revised patch (v20) to add support for Incremental
Materialized View Maintenance (IVM).

In according with Konstantin's suggestion, I made a few optimizations.

1. Creating an index on the matview automatically

When creating incremental maintainable materialized view (IMMV)s,
a unique index on IMMV is created automatically if possible.

If the view definition query has a GROUP BY clause, the index is created
on the columns of GROUP BY expressions. Otherwise, if the view contains
all primary key attributes of its base tables in the target list, the index
is created on these attributes.  Also, if the view has DISTINCT,
a unique index is created on all columns in the target list.
In other cases, no index is created.

In all cases, a NOTICE message is output to inform users that an index is
created or that an appropriate index is necessary for efficient IVM.

2. Use a weaker lock on the matview if possible

If the view has only one base table in this query, RowExclusiveLock is
held on the view instead of AccessExclusiveLock, because we don't
need to wait other concurrent transaction's result in order to
maintain the view in this case. When the same row in the view is
affected due to concurrent maintenances, a row level lock will
protect it.

On Tue, 24 Nov 2020 12:46:57 +0300
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

> The most obvious optimization is not to use exclusive table lock if view 
> depends just on one table (contains no joins).
> Looks like there are no any anomalies in this case, are there?

I confirmed the effect of this optimizations.

First, when I performed pgbench (SF=100) without any materialized views,
the results is :

 pgbench test4 -T 300 -c 8 -j 4
 latency average = 6.493 ms
 tps = 1232.146229 (including connections establishing)

Next, created a view as below, I performed the same pgbench.
 CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
        SELECT bid, count(abalance), sum(abalance), avg(abalance)
        FROM pgbench_accounts GROUP BY bid;

The result is here:

[the previous version (v19 with exclusive table lock)]
 - latency average = 77.677 ms
 - tps = 102.990159 (including connections establishing)

[In the latest version (v20 with weaker lock)]
 - latency average = 17.576 ms
 - tps = 455.159644 (including connections establishing)

There is still substantial overhead, but we can see that the effect
of the optimization.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v20.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 декабря 2020 г., 13:24:22

Hi hackers,

I heard the opinion that this patch is too big and hard to review.
So, I wander that we should downsize the patch  by eliminating some
features and leaving other basic features.

If there are more opinions this makes it easer for reviewers to look
at this patch, I would like do it. If so, we plan to support only
selection, projection, inner-join, and some aggregates in the first
release and leave sub-queries, outer-join, and CTE supports to the
next release.

Regards,
Yugo Nagata

On Tue, 22 Dec 2020 21:51:36 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> Hi,
> 
> Attached is the revised patch (v20) to add support for Incremental
> Materialized View Maintenance (IVM).
> 
> In according with Konstantin's suggestion, I made a few optimizations.
> 
> 1. Creating an index on the matview automatically
> 
> When creating incremental maintainable materialized view (IMMV)s,
> a unique index on IMMV is created automatically if possible.
> 
> If the view definition query has a GROUP BY clause, the index is created
> on the columns of GROUP BY expressions. Otherwise, if the view contains
> all primary key attributes of its base tables in the target list, the index
> is created on these attributes.  Also, if the view has DISTINCT,
> a unique index is created on all columns in the target list.
> In other cases, no index is created.
> 
> In all cases, a NOTICE message is output to inform users that an index is
> created or that an appropriate index is necessary for efficient IVM.
> 
> 2. Use a weaker lock on the matview if possible
> 
> If the view has only one base table in this query, RowExclusiveLock is
> held on the view instead of AccessExclusiveLock, because we don't
> need to wait other concurrent transaction's result in order to
> maintain the view in this case. When the same row in the view is
> affected due to concurrent maintenances, a row level lock will
> protect it.
> 
> On Tue, 24 Nov 2020 12:46:57 +0300
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> 
> > The most obvious optimization is not to use exclusive table lock if view 
> > depends just on one table (contains no joins).
> > Looks like there are no any anomalies in this case, are there?
> 
> I confirmed the effect of this optimizations.
> 
> First, when I performed pgbench (SF=100) without any materialized views,
> the results is :
>  
>  pgbench test4 -T 300 -c 8 -j 4
>  latency average = 6.493 ms
>  tps = 1232.146229 (including connections establishing)
> 
> Next, created a view as below, I performed the same pgbench.
>  CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
>         SELECT bid, count(abalance), sum(abalance), avg(abalance)
>         FROM pgbench_accounts GROUP BY bid;
> 
> The result is here:
> 
> [the previous version (v19 with exclusive table lock)]
>  - latency average = 77.677 ms
>  - tps = 102.990159 (including connections establishing)
> 
> [In the latest version (v20 with weaker lock)]
>  - latency average = 17.576 ms
>  - tps = 455.159644 (including connections establishing)
> 
> There is still substantial overhead, but we can see that the effect
> of the optimization.
> 
> Regards,
> Yugo Nagata
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Tatsuo Ishii

Дата:

23 декабря 2020 г., 21:54:17

Hi Yugo,

> 1. Creating an index on the matview automatically

Nice.

> 2. Use a weaker lock on the matview if possible
> 
> If the view has only one base table in this query, RowExclusiveLock is
> held on the view instead of AccessExclusiveLock, because we don't
> need to wait other concurrent transaction's result in order to
> maintain the view in this case. When the same row in the view is
> affected due to concurrent maintenances, a row level lock will
> protect it.
> 
> On Tue, 24 Nov 2020 12:46:57 +0300
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> 
>> The most obvious optimization is not to use exclusive table lock if view 
>> depends just on one table (contains no joins).
>> Looks like there are no any anomalies in this case, are there?
> 
> I confirmed the effect of this optimizations.
> 
> First, when I performed pgbench (SF=100) without any materialized views,
> the results is :
>  
>  pgbench test4 -T 300 -c 8 -j 4
>  latency average = 6.493 ms
>  tps = 1232.146229 (including connections establishing)
> 
> Next, created a view as below, I performed the same pgbench.
>  CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
>         SELECT bid, count(abalance), sum(abalance), avg(abalance)
>         FROM pgbench_accounts GROUP BY bid;
> 
> The result is here:
> 
> [the previous version (v19 with exclusive table lock)]
>  - latency average = 77.677 ms
>  - tps = 102.990159 (including connections establishing)
> 
> [In the latest version (v20 with weaker lock)]
>  - latency average = 17.576 ms
>  - tps = 455.159644 (including connections establishing)
> 
> There is still substantial overhead, but we can see that the effect
> of the optimization.

Great improvement. Now with this patch more than 4x faster than
previous one!

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

12 января 2021 г., 10:03:08

Hi,

Attached is the revised patch (v21) to add support for Incremental
Materialized View Maintenance (IVM).

In addition to some typos in the previous enhancement, I fixed a check to
prevent a view from containing an expression including aggregates like
sum(x)/sum(y) in this revision.

Regards,
Yugo Nagata

On Tue, 22 Dec 2020 22:24:22 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hi hackers,
> 
> I heard the opinion that this patch is too big and hard to review.
> So, I wander that we should downsize the patch  by eliminating some
> features and leaving other basic features.
> 
> If there are more opinions this makes it easer for reviewers to look
> at this patch, I would like do it. If so, we plan to support only
> selection, projection, inner-join, and some aggregates in the first
> release and leave sub-queries, outer-join, and CTE supports to the
> next release.
> 
> Regards,
> Yugo Nagata
> 
> On Tue, 22 Dec 2020 21:51:36 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > Hi,
> > 
> > Attached is the revised patch (v20) to add support for Incremental
> > Materialized View Maintenance (IVM).
> > 
> > In according with Konstantin's suggestion, I made a few optimizations.
> > 
> > 1. Creating an index on the matview automatically
> > 
> > When creating incremental maintainable materialized view (IMMV)s,
> > a unique index on IMMV is created automatically if possible.
> > 
> > If the view definition query has a GROUP BY clause, the index is created
> > on the columns of GROUP BY expressions. Otherwise, if the view contains
> > all primary key attributes of its base tables in the target list, the index
> > is created on these attributes.  Also, if the view has DISTINCT,
> > a unique index is created on all columns in the target list.
> > In other cases, no index is created.
> > 
> > In all cases, a NOTICE message is output to inform users that an index is
> > created or that an appropriate index is necessary for efficient IVM.
> > 
> > 2. Use a weaker lock on the matview if possible
> > 
> > If the view has only one base table in this query, RowExclusiveLock is
> > held on the view instead of AccessExclusiveLock, because we don't
> > need to wait other concurrent transaction's result in order to
> > maintain the view in this case. When the same row in the view is
> > affected due to concurrent maintenances, a row level lock will
> > protect it.
> > 
> > On Tue, 24 Nov 2020 12:46:57 +0300
> > Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> > 
> > > The most obvious optimization is not to use exclusive table lock if view 
> > > depends just on one table (contains no joins).
> > > Looks like there are no any anomalies in this case, are there?
> > 
> > I confirmed the effect of this optimizations.
> > 
> > First, when I performed pgbench (SF=100) without any materialized views,
> > the results is :
> >  
> >  pgbench test4 -T 300 -c 8 -j 4
> >  latency average = 6.493 ms
> >  tps = 1232.146229 (including connections establishing)
> > 
> > Next, created a view as below, I performed the same pgbench.
> >  CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
> >         SELECT bid, count(abalance), sum(abalance), avg(abalance)
> >         FROM pgbench_accounts GROUP BY bid;
> > 
> > The result is here:
> > 
> > [the previous version (v19 with exclusive table lock)]
> >  - latency average = 77.677 ms
> >  - tps = 102.990159 (including connections establishing)
> > 
> > [In the latest version (v20 with weaker lock)]
> >  - latency average = 17.576 ms
> >  - tps = 455.159644 (including connections establishing)
> > 
> > There is still substantial overhead, but we can see that the effect
> > of the optimization.
> > 
> > Regards,
> > Yugo Nagata
> > 
> > -- 
> > Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v21.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 января 2021 г., 07:59:54

Hi,

Attached is a revised patch (v22) rebased for the latest master head.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

16 февраля 2021 г., 01:31:55

Hi,

Attached is a rebased patch (v22a).

Ragards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22a.tar.gz

Re: Implementing Incremental View Maintenance

От

Andy Fan

Дата:

18 февраля 2021 г., 11:38:44

On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

Attached is a rebased patch (v22a).

Thanks for the patch. Will you think posting a patch with the latest commit at that

time is helpful? If so, when others want to review it, they know which commit to

apply the patch without asking for a new rebase usually.

Best Regards

Andy Fan (https://www.aliyun.com/)

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

19 февраля 2021 г., 02:01:26

On Thu, 18 Feb 2021 19:38:44 +0800
Andy Fan <zhihui.fan1213@gmail.com> wrote:

> On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > Hi,
> >
> > Attached is a rebased patch (v22a).
> >
> 
> Thanks for the patch. Will you think posting a patch with the latest commit
> at that
> time is helpful? If so, when others want to review it,  they know which
> commit to
> apply the patch without asking for a new rebase usually.

I rebased the patch because cfbot failed.
http://cfbot.cputube.org/

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

07 апреля 2021 г., 09:25:37

Hi, 

I rebased the patch because the cfbot failed.

Regards,
Yugo Nagata

On Tue, 9 Mar 2021 17:27:50 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Tue, 9 Mar 2021 09:20:49 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Mon, 8 Mar 2021 15:42:00 -0500
> > Andrew Dunstan <andrew@dunslane.net> wrote:
> > 
> > > 
> > > On 2/18/21 9:01 PM, Yugo NAGATA wrote:
> > > > On Thu, 18 Feb 2021 19:38:44 +0800
> > > > Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > > >
> > > >> On Tue, Feb 16, 2021 at 9:33 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> Attached is a rebased patch (v22a).
> > > >>>
> > > >> Thanks for the patch. Will you think posting a patch with the latest commit
> > > >> at that
> > > >> time is helpful? If so, when others want to review it,  they know which
> > > >> commit to
> > > >> apply the patch without asking for a new rebase usually.
> > > > I rebased the patch because cfbot failed.
> > > > http://cfbot.cputube.org/
> > > >
> > > 
> > > It's bitrotted a bit more dues to commits bb437f995d and 25936fd46c
> > 
> > Thank you for letting me konw.  I'll rebase it soon.
> 
> Done. Attached is a rebased patch set.
> 
> Regards,
> Yugo Nagata
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22c.tar.gz

Re: Implementing Incremental View Maintenance

От

Andrew Dunstan

Дата:

19 апреля 2021 г., 20:47:38

On 4/7/21 5:25 AM, Yugo NAGATA wrote:
> Hi, 
>
> I rebased the patch because the cfbot failed.
>
> Regards,
> Yugo Nagata



This patch (v22c) just crashed for me with an assertion failure on
Fedora 31. Here's the stack trace:


[New LWP 333090]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: andrew regression [local]
INSERT                                    '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f8981caa625 in raise () from /lib64/libc.so.6
#0  0x00007f8981caa625 in raise () from /lib64/libc.so.6
#1  0x00007f8981c938d9 in abort () from /lib64/libc.so.6
#2  0x000000000094a54a in ExceptionalCondition
(conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
fileName=fileName@entry=0xa91468
"/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
lineNumber=lineNumber@entry=199) at
/home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
#3  0x00000000006c0e17 in standard_ExecutorStart (queryDesc=0x226af98,
eflags=0) at /home/andrew/pgl/pg_head/src/backend/executor/execMain.c:199
#4  0x00000000006737b2 in refresh_matview_datafill (dest=0x21cf428,
query=<optimized out>, queryEnv=0x2245fd0,
resultTupleDesc=0x7ffd5e764888, queryString=0x0) at
/home/andrew/pgl/pg_head/src/backend/commands/matview.c:719
#5  0x0000000000678042 in calc_delta (queryEnv=0x2245fd0,
tupdesc_new=0x7ffd5e764888, tupdesc_old=0x7ffd5e764880,
dest_new=0x21cf428, dest_old=0x0, query=0x2246108, rte_path=0x2228a60,
table=<optimized out>) at
/home/andrew/pgl/pg_head/src/backend/commands/matview.c:2907
#6  IVM_immediate_maintenance (fcinfo=<optimized out>) at
/home/andrew/pgl/pg_head/src/backend/commands/matview.c:1683
#7  0x000000000069e483 in ExecCallTriggerFunc (trigdata=0x7ffd5e764bb0,
tgindx=2, finfo=0x22345f8, instr=0x0, per_tuple_context=0x2245eb0) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:2142
#8  0x000000000069fc4c in AfterTriggerExecute (trigdesc=0x2233db8,
trigdesc=0x2233db8, trig_tuple_slot2=0x0, trig_tuple_slot1=0x0,
per_tuple_context=0x2245eb0, instr=0x0, finfo=0x2234598,
relInfo=0x2233ba0, event=0x222d380, estate=0x2233710) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4041
#9  afterTriggerInvokeEvents (events=0x21cece8, firing_id=1,
estate=0x2233710, delete_ok=false) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4255
#10 0x00000000006a4173 in AfterTriggerEndQuery
(estate=estate@entry=0x2233710) at
/home/andrew/pgl/pg_head/src/backend/commands/trigger.c:4632
#11 0x00000000006c04c8 in standard_ExecutorFinish (queryDesc=0x2237300)
at /home/andrew/pgl/pg_head/src/backend/executor/execMain.c:436
#12 0x00000000008415d8 in ProcessQuery (plan=<optimized out>,
sourceText=0x21490a0 "INSERT INTO mv_base_b VALUES(5,105);", params=0x0,
queryEnv=0x0, dest=0x2221010, qc=0x7ffd5e764f00) at
/home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:190
#13 0x00000000008417f2 in PortalRunMulti (portal=portal@entry=0x21ac3c0,
isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x2221010,
altdest=altdest@entry=0x2221010, qc=qc@entry=0x7ffd5e764f00) at
/home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:1267
#14 0x0000000000842415 in PortalRun (portal=portal@entry=0x21ac3c0,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x2221010,
altdest=altdest@entry=0x2221010, qc=0x7ffd5e764f00) at
/home/andrew/pgl/pg_head/src/backend/tcop/pquery.c:779
#15 0x000000000083e3ca in exec_simple_query (query_string=0x21490a0
"INSERT INTO mv_base_b VALUES(5,105);") at
/home/andrew/pgl/pg_head/src/backend/tcop/postgres.c:1196
#16 0x0000000000840075 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffd5e765450, dbname=<optimized out>,
username=<optimized out>) at
/home/andrew/pgl/pg_head/src/backend/tcop/postgres.c:4458
#17 0x00000000007b8054 in BackendRun (port=<optimized out>,
port=<optimized out>) at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:4488
#18 BackendStartup (port=<optimized out>) at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:4210
#19 ServerLoop () at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:1742
#20 0x00000000007b8ebf in PostmasterMain (argc=argc@entry=8,
argv=argv@entry=0x21435c0) at
/home/andrew/pgl/pg_head/src/backend/postmaster/postmaster.c:1414
#21 0x000000000050e030 in main (argc=8, argv=0x21435c0) at
/home/andrew/pgl/pg_head/src/backend/main/main.c:209
$1 = {si_signo = 6, si_errno = 0, si_code = -6, _sifields = {_pad =
{333090, 500, 0 <repeats 26 times>}, _kill = {si_pid = 333090, si_uid =
500}, _timer = {si_tid = 333090, si_overrun = 500, si_sigval =
{sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 333090, si_uid = 500,
si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid =
333090, si_uid = 500, si_status = 0, si_utime = 0, si_stime = 0},
_sigfault = {si_addr = 0x1f400051522, _addr_lsb = 0, _addr_bnd = {_lower
= 0x0, _upper = 0x0}}, _sigpoll = {si_band = 2147483981090, si_fd = 0}}}


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Implementing Incremental View Maintenance

От

Tom Lane

Дата:

19 апреля 2021 г., 21:40:31

Andrew Dunstan <andrew@dunslane.net> writes:
> This patch (v22c) just crashed for me with an assertion failure on
> Fedora 31. Here's the stack trace:

> #2  0x000000000094a54a in ExceptionalCondition
> (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> fileName=fileName@entry=0xa91468
> "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> lineNumber=lineNumber@entry=199) at
> /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69

That assert just got added a few days ago, so that's why the patch
seemed OK before.

            regards, tom lane

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

20 апреля 2021 г., 00:51:34

On Mon, 19 Apr 2021 17:40:31 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
> > This patch (v22c) just crashed for me with an assertion failure on
> > Fedora 31. Here's the stack trace:
> 
> > #2  0x000000000094a54a in ExceptionalCondition
> > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > fileName=fileName@entry=0xa91468
> > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > lineNumber=lineNumber@entry=199) at
> > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> 
> That assert just got added a few days ago, so that's why the patch
> seemed OK before.

Thank you for letting me know. I'll fix it.



-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

26 апреля 2021 г., 06:46:21

On Tue, 20 Apr 2021 09:51:34 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Mon, 19 Apr 2021 17:40:31 -0400
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > Andrew Dunstan <andrew@dunslane.net> writes:
> > > This patch (v22c) just crashed for me with an assertion failure on
> > > Fedora 31. Here's the stack trace:
> > 
> > > #2  0x000000000094a54a in ExceptionalCondition
> > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > fileName=fileName@entry=0xa91468
> > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > lineNumber=lineNumber@entry=199) at
> > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > 
> > That assert just got added a few days ago, so that's why the patch
> > seemed OK before.
> 
> Thank you for letting me know. I'll fix it.

Attached is the fixed patch.

queryDesc->sourceText cannot be NULL after commit 1111b2668d8, 
so now we pass an empty string "" for refresh_matview_datafill() instead NULL
when maintaining views incrementally.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22d.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

26 апреля 2021 г., 07:03:48

On Mon, 26 Apr 2021 15:46:21 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Tue, 20 Apr 2021 09:51:34 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Mon, 19 Apr 2021 17:40:31 -0400
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > > Andrew Dunstan <andrew@dunslane.net> writes:
> > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > Fedora 31. Here's the stack trace:
> > > 
> > > > #2  0x000000000094a54a in ExceptionalCondition
> > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > fileName=fileName@entry=0xa91468
> > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > lineNumber=lineNumber@entry=199) at
> > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > 
> > > That assert just got added a few days ago, so that's why the patch
> > > seemed OK before.
> > 
> > Thank you for letting me know. I'll fix it.
> 
> Attached is the fixed patch.
> 
> queryDesc->sourceText cannot be NULL after commit 1111b2668d8, 
> so now we pass an empty string "" for refresh_matview_datafill() instead NULL
> when maintaining views incrementally.

I am sorry, I forgot to include a fix for 8aba9322511.
Attached is the fixed version.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22e.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

07 мая 2021 г., 05:14:16

On Mon, 26 Apr 2021 16:03:48 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Mon, 26 Apr 2021 15:46:21 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Tue, 20 Apr 2021 09:51:34 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > 
> > > On Mon, 19 Apr 2021 17:40:31 -0400
> > > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > 
> > > > Andrew Dunstan <andrew@dunslane.net> writes:
> > > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > > Fedora 31. Here's the stack trace:
> > > > 
> > > > > #2  0x000000000094a54a in ExceptionalCondition
> > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > > fileName=fileName@entry=0xa91468
> > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > > lineNumber=lineNumber@entry=199) at
> > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > > 
> > > > That assert just got added a few days ago, so that's why the patch
> > > > seemed OK before.
> > > 
> > > Thank you for letting me know. I'll fix it.
> > 
> > Attached is the fixed patch.
> > 
> > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, 
> > so now we pass an empty string "" for refresh_matview_datafill() instead NULL
> > when maintaining views incrementally.
> 
> I am sorry, I forgot to include a fix for 8aba9322511.
> Attached is the fixed version.

Attached is the rebased patch (for 6b8d29419d).


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22f.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

17 мая 2021 г., 04:36:46

On Fri, 7 May 2021 14:14:16 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Mon, 26 Apr 2021 16:03:48 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > On Mon, 26 Apr 2021 15:46:21 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > 
> > > On Tue, 20 Apr 2021 09:51:34 +0900
> > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > 
> > > > On Mon, 19 Apr 2021 17:40:31 -0400
> > > > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > 
> > > > > Andrew Dunstan <andrew@dunslane.net> writes:
> > > > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > > > Fedora 31. Here's the stack trace:
> > > > > 
> > > > > > #2  0x000000000094a54a in ExceptionalCondition
> > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > > > fileName=fileName@entry=0xa91468
> > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > > > lineNumber=lineNumber@entry=199) at
> > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > > > 
> > > > > That assert just got added a few days ago, so that's why the patch
> > > > > seemed OK before.
> > > > 
> > > > Thank you for letting me know. I'll fix it.
> > > 
> > > Attached is the fixed patch.
> > > 
> > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8, 
> > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL
> > > when maintaining views incrementally.
> > 
> > I am sorry, I forgot to include a fix for 8aba9322511.
> > Attached is the fixed version.
> 
> Attached is the rebased patch (for 6b8d29419d).

I attached a rebased patch.

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v22g.tar.gz

Re: Implementing Incremental View Maintenance

От

vignesh C

Дата:

14 июля 2021 г., 15:52:37

On Mon, May 17, 2021 at 10:08 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> On Fri, 7 May 2021 14:14:16 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> > On Mon, 26 Apr 2021 16:03:48 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > > On Mon, 26 Apr 2021 15:46:21 +0900
> > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > >
> > > > On Tue, 20 Apr 2021 09:51:34 +0900
> > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > >
> > > > > On Mon, 19 Apr 2021 17:40:31 -0400
> > > > > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > >
> > > > > > Andrew Dunstan <andrew@dunslane.net> writes:
> > > > > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > > > > Fedora 31. Here's the stack trace:
> > > > > >
> > > > > > > #2  0x000000000094a54a in ExceptionalCondition
> > > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > > > > fileName=fileName@entry=0xa91468
> > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > > > > lineNumber=lineNumber@entry=199) at
> > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > > > >
> > > > > > That assert just got added a few days ago, so that's why the patch
> > > > > > seemed OK before.
> > > > >
> > > > > Thank you for letting me know. I'll fix it.
> > > >
> > > > Attached is the fixed patch.
> > > >
> > > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8,
> > > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL
> > > > when maintaining views incrementally.
> > >
> > > I am sorry, I forgot to include a fix for 8aba9322511.
> > > Attached is the fixed version.
> >
> > Attached is the rebased patch (for 6b8d29419d).
>
> I attached a rebased patch.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

19 июля 2021 г., 00:24:30

On Wed, 14 Jul 2021 21:22:37 +0530
vignesh C <vignesh21@gmail.com> wrote:

> On Mon, May 17, 2021 at 10:08 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > On Fri, 7 May 2021 14:14:16 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > > On Mon, 26 Apr 2021 16:03:48 +0900
> > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > >
> > > > On Mon, 26 Apr 2021 15:46:21 +0900
> > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > >
> > > > > On Tue, 20 Apr 2021 09:51:34 +0900
> > > > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > > >
> > > > > > On Mon, 19 Apr 2021 17:40:31 -0400
> > > > > > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > > >
> > > > > > > Andrew Dunstan <andrew@dunslane.net> writes:
> > > > > > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > > > > > Fedora 31. Here's the stack trace:
> > > > > > >
> > > > > > > > #2  0x000000000094a54a in ExceptionalCondition
> > > > > > > > (conditionName=conditionName@entry=0xa91dae "queryDesc->sourceText !=
> > > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > > > > > fileName=fileName@entry=0xa91468
> > > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > > > > > lineNumber=lineNumber@entry=199) at
> > > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > > > > >
> > > > > > > That assert just got added a few days ago, so that's why the patch
> > > > > > > seemed OK before.
> > > > > >
> > > > > > Thank you for letting me know. I'll fix it.
> > > > >
> > > > > Attached is the fixed patch.
> > > > >
> > > > > queryDesc->sourceText cannot be NULL after commit 1111b2668d8,
> > > > > so now we pass an empty string "" for refresh_matview_datafill() instead NULL
> > > > > when maintaining views incrementally.
> > > >
> > > > I am sorry, I forgot to include a fix for 8aba9322511.
> > > > Attached is the fixed version.
> > >
> > > Attached is the rebased patch (for 6b8d29419d).
> >
> > I attached a rebased patch.
> 
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Ok. I'll update the patch in a few days.

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

02 августа 2021 г., 06:28:34

Hi hackers,

On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:

> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
> 
> Ok. I'll update the patch in a few days.

Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)

The patches are rebased to the master and also revised with some
code cleaning. 

IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.

The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.

Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:

 - inner and outer joins including self-join
 - DISTINCT
 - some built-in aggregate functions (count, sum, agv, min, and max)
 - a part of subqueries
   -- simple subqueries in FROM clause
   -- EXISTS subqueries in WHERE clause
 - CTEs

We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore, 
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.

 - inner joins including self-join
 - DISTINCT and views with tuple duplicates
 - some built-in aggregate functions (count, sum, agv, min, and max)

By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.

Here is a list of separated  patches.

- 0001: Add a new syntax:
          CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
        containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
        This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM

We could split the patch furthermore if this would make reviews much easer. 
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.

I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information. 

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Implementing Incremental View Maintenance

От

Zhihong Yu

Дата:

02 августа 2021 г., 21:33:46

On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi hackers,

On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:

> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Ok. I'll update the patch in a few days.

Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)

The patches are rebased to the master and also revised with some
code cleaning.

IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.

The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.

Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:

- inner and outer joins including self-join
- DISTINCT
- some built-in aggregate functions (count, sum, agv, min, and max)
- a part of subqueries
-- simple subqueries in FROM clause
-- EXISTS subqueries in WHERE clause
- CTEs

We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore,
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.

- inner joins including self-join
- DISTINCT and views with tuple duplicates
- some built-in aggregate functions (count, sum, agv, min, and max)

By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.

Here is a list of separated patches.

- 0001: Add a new syntax:
CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM

We could split the patch furthermore if this would make reviews much easer.
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.

I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

--
Yugo NAGATA <nagata@sraoss.co.jp>

Hi,

For v23-0008-Add-aggregates-support-in-IVM.patch :

As a restriction, expressions specified in GROUP BY must appear in
the target list because tuples to be updated in IMMV are identified

by using this group keys.

IMMV -> IMVM (Incremental Materialized View Maintenance, as said above)

Or maybe it means 'incrementally maintainable materialized view'. It would be better to use the same abbreviation.

this group keys -> this group key

+ errmsg("GROUP BY expression not appeared in select list is not supported on incrementally maintainable materialized view")));

expression not appeared in select list -> expression not appearing in select list

+ * For aggregate functions except to count

except to count -> except count

Cheers

RE: Implementing Incremental View Maintenance

От

"r.takahashi_2@fujitsu.com"

Дата:

03 августа 2021 г., 10:15:42

Hi Nagata-san,


I am interested in this patch since it is good feature.

I run some simple tests.
I found the following problems.


(1)
Failed to "make world".
I think there are extra "<lineitem>" in doc/src/sgml/ref/create_materialized_view.sgml
(line 110 and 117)


(2)
In the case of partition, it seems that IVM does not work well.
I run as follows.

postgres=# create table parent (c int) partition by range (c);
CREATE TABLE
postgres=# create table child partition of parent for values from (1) to (100);
CREATE TABLE
postgres=# create incremental materialized view ivm_parent as select c from parent;
NOTICE:  could not create an index on materialized view "ivm_parent" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
SELECT 0
postgres=# create incremental materialized view ivm_child as select c from child;
NOTICE:  could not create an index on materialized view "ivm_child" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
SELECT 0
postgres=# insert into parent values (1);
INSERT 0 1
postgres=# insert into child values (2);
INSERT 0 1
postgres=# select * from parent;
 c
---
 1
 2
(2 rows)

postgres=# select * from child;
 c
---
 1
 2
(2 rows)

postgres=# select * from ivm_parent;
 c
---
 1
(1 row)

postgres=# select * from ivm_child;
 c
---
 2
(1 row)


I think ivm_parent and ivm_child should return 2 rows.


(3)
I think IVM does not support foreign table, but try to make IVM.

postgres=# create incremental materialized view ivm_foreign as select c from foreign_table;
NOTICE:  could not create an index on materialized view "ivm_foreign" automatically
HINT:  Create an index on the materialized view for efficient incremental maintenance.
ERROR:  "foreign_table" is a foreign table
DETAIL:  Triggers on foreign tables cannot have transition tables.

It finally failed to make IVM, but I think it should be checked more early.


Regards,
Ryohei Takahashi

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

05 августа 2021 г., 03:29:59

Hello Zhihong Yu,

On Mon, 2 Aug 2021 14:33:46 -0700
Zhihong Yu <zyu@yugabyte.com> wrote:

> On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > Hi hackers,
> >
> > On Mon, 19 Jul 2021 09:24:30 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > > On Wed, 14 Jul 2021 21:22:37 +0530
> > > vignesh C <vignesh21@gmail.com> wrote:
> >
> > > > The patch does not apply on Head anymore, could you rebase and post a
> > > > patch. I'm changing the status to "Waiting for Author".
> > >
> > > Ok. I'll update the patch in a few days.
> >
> > Attached is the latest patch set to add support for Incremental
> > Materialized View Maintenance (IVM)
> >
> > The patches are rebased to the master and also revised with some
> > code cleaning.
> >
> > IVM is a way to make materialized views up-to-date in which only
> > incremental changes are computed and applied on views rather than
> > recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
> > does. IVM can update materialized views more efficiently
> > than recomputation when only small part of the view need updates.
> >
> > The patch set implements a feature so that materialized views could be
> > updated automatically and immediately when a base table is modified.
> >
> > Currently, our IVM implementation supports views which could contain
> > tuple duplicates whose definition includes:
> >
> >  - inner and outer joins including self-join
> >  - DISTINCT
> >  - some built-in aggregate functions (count, sum, agv, min, and max)
> >  - a part of subqueries
> >    -- simple subqueries in FROM clause
> >    -- EXISTS subqueries in WHERE clause
> >  - CTEs
> >
> > We hope the IVM feature would be adopted into pg15. However, the size of
> > patch set has grown too large through supporting above features.
> > Therefore,
> > I think it is better to consider only a part of these features for the
> > first
> > release. Especially, I would like propose the following features for pg15.
> >
> >  - inner joins including self-join
> >  - DISTINCT and views with tuple duplicates
> >  - some built-in aggregate functions (count, sum, agv, min, and max)
> >
> > By omitting outer-join, sub-queries, and CTE features, the patch size
> > becomes
> > less than half. I hope this will make a bit easer to review the IVM patch
> > set.
> >
> > Here is a list of separated  patches.
> >
> > - 0001: Add a new syntax:
> >           CREATE INCREMENTAL MATERIALIZED VIEW
> > - 0002: Add a new column relisivm to pg_class
> > - 0003: Add new deptype option 'm' in pg_depend
> > - 0004: Change trigger.c to allow to prolong life span of tupestores
> >         containing Transition Tables generated via AFTER trigger
> > - 0005: Add IVM supprot for pg_dump
> > - 0006: Add IVM support for psql
> > - 0007: Add the basic IVM future:
> >         This supports inner joins, DISTINCT, and tuple duplicates.
> > - 0008: Add aggregates (count, sum, avg, min, max) support for IVM
> > - 0009: Add regression tests for IVM
> > - 0010: Add documentation for IVM
> >
> > We could split the patch furthermore if this would make reviews much
> > easer.
> > For example, I think 0007 could be split into the more basic part and the
> > part
> > for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
> > and other aggregates because handling min/max is a bit more complicated
> > than
> > others.
> >
> > I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
> > outer-join, CTE support, just for your information.
> >
> > Regards,
> > Yugo Nagata
> >
> > --
> > Yugo NAGATA <nagata@sraoss.co.jp>
> >
> >
> > --
> > Yugo NAGATA <nagata@sraoss.co.jp>
> >
> Hi,
> For v23-0008-Add-aggregates-support-in-IVM.patch :

Thank you for looking into this!

> As a restriction, expressions specified in GROUP BY must appear in
> the target list because tuples to be updated in IMMV are identified
> by using this group keys.
> 
> IMMV ->  IMVM (Incremental Materialized View Maintenance, as said above)
> Or maybe it means 'incrementally maintainable materialized view'. It would
> be better to use the same abbreviation.

IMMV is correct in the commit message of this patch. Rather, IMVM used
in  v23-0003-Add-new-deptype-option-m-in-pg_depend-system-cat.patch 
should be corrected to IMMV. 

> this group keys -> this group key
> 
> +                        errmsg("GROUP BY expression not appeared in select
> list is not supported on incrementally maintainable materialized view")));
> 
> expression not appeared in select list -> expression not appearing in
> select list
> 
> +    * For aggregate functions except to count
> 
> except to count -> except count

Thank you for pointing out them. I'll fix.

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

05 августа 2021 г., 03:41:09

Hello Takahashi-san,

On Tue, 3 Aug 2021 10:15:42 +0000
"r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:

> Hi Nagata-san,
> 
> 
> I am interested in this patch since it is good feature.
> 
> I run some simple tests.
> I found the following problems.

Thank you for your interest for this patch!

> (1) 
> Failed to "make world".
> I think there are extra "<lineitem>" in doc/src/sgml/ref/create_materialized_view.sgml
> (line 110 and 117)

Oops. I'll fix it. 

> (2)
> In the case of partition, it seems that IVM does not work well.
> I run as follows.
> 
> postgres=# create table parent (c int) partition by range (c);
> CREATE TABLE
> postgres=# create table child partition of parent for values from (1) to (100);
> CREATE TABLE
> postgres=# create incremental materialized view ivm_parent as select c from parent;
> NOTICE:  could not create an index on materialized view "ivm_parent" automatically
> HINT:  Create an index on the materialized view for efficient incremental maintenance.
> SELECT 0
> postgres=# create incremental materialized view ivm_child as select c from child;
> NOTICE:  could not create an index on materialized view "ivm_child" automatically
> HINT:  Create an index on the materialized view for efficient incremental maintenance.
> SELECT 0
> postgres=# insert into parent values (1);
> INSERT 0 1
> postgres=# insert into child values (2);
> INSERT 0 1
> postgres=# select * from parent;
>  c
> ---
>  1
>  2
> (2 rows)
> 
> postgres=# select * from child;
>  c
> ---
>  1
>  2
> (2 rows)
> 
> postgres=# select * from ivm_parent;
>  c
> ---
>  1
> (1 row)
> 
> postgres=# select * from ivm_child;
>  c
> ---
>  2
> (1 row)
> 
> 
> I think ivm_parent and ivm_child should return 2 rows.

Good point!
I'll investigate this more, but we may have to prohibit views on partitioned
table and partitions.

> (3)
> I think IVM does not support foreign table, but try to make IVM.
> 
> postgres=# create incremental materialized view ivm_foreign as select c from foreign_table;
> NOTICE:  could not create an index on materialized view "ivm_foreign" automatically
> HINT:  Create an index on the materialized view for efficient incremental maintenance.
> ERROR:  "foreign_table" is a foreign table
> DETAIL:  Triggers on foreign tables cannot have transition tables.
> 
> It finally failed to make IVM, but I think it should be checked more early.

You are right. We don't support foreign tables as long as we use triggers.

 I'll fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

RE: Implementing Incremental View Maintenance

От

"r.takahashi_2@fujitsu.com"

Дата:

05 августа 2021 г., 08:53:47

Hi Nagata-san,


Thank you for your reply.

> I'll investigate this more, but we may have to prohibit views on partitioned
> table and partitions.

I think this restriction is strict.
This feature is useful when the base table is large and partitioning is also useful in such case.


I have several additional comments on the patch.


(1)
The following features are added to transition table.
- Prolong lifespan of transition table
- If table has row security policies, set them to the transition table
- Calculate pre-state of the table

Are these features only for IVM?
If there are other useful case, they should be separated from IVM patch and
should be independent patch for transition table.


(2)
DEPENDENCY_IMMV (m) is added to deptype of pg_depend.
What is the difference compared with existing deptype such as DEPENDENCY_INTERNAL (i)?


(3)
Converting from normal materialized view to IVM or from IVM to normal materialized view is not implemented yet.
Is it difficult?

I think create/drop triggers and __ivm_ columns can achieve this feature.


Regards,
Ryohei Takahashi

Re: Implementing Incremental View Maintenance

От

Zhihong Yu

Дата:

07 августа 2021 г., 07:00:44

On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi hackers,

On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:

> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Ok. I'll update the patch in a few days.

Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)

The patches are rebased to the master and also revised with some
code cleaning.

IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.

The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.

Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:

- inner and outer joins including self-join
- DISTINCT
- some built-in aggregate functions (count, sum, agv, min, and max)
- a part of subqueries
-- simple subqueries in FROM clause
-- EXISTS subqueries in WHERE clause
- CTEs

We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore,
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.

- inner joins including self-join
- DISTINCT and views with tuple duplicates
- some built-in aggregate functions (count, sum, agv, min, and max)

By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.

Here is a list of separated patches.

- 0001: Add a new syntax:
CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM

We could split the patch furthermore if this would make reviews much easer.
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.

I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

--
Yugo NAGATA <nagata@sraoss.co.jp>

Hi,

For v23-0007-Add-Incremental-View-Maintenance-support.patch :

bq. In this implementation, AFTER triggers are used to collecting tuplestores

'to collecting' -> to collect

bq. are contained in a old transition table.

'a old' -> an old

bq. updates of more than one base tables

one base tables -> one base table

bq. DISTINCT and tuple duplicates in views are supported

Since distinct and duplicate have opposite meanings, it would be better to rephrase the above sentence.

bq. The value in__ivm_count__ is updated

I searched the patch for in__ivm_count__ - there was no (second) match. I think there should be a space between in and underscore.

+static void CreateIvmTriggersOnBaseTables_recurse(Query *qry, Node *node, Oid matviewOid, Relids *relids, bool ex_lock);

nit: long line. please wrap.

+ if (rewritten->distinctClause)
+ rewritten->groupClause = transformDistinctClause(NULL, &rewritten->targetList, rewritten->sortClause, false);
+
+ /* Add count(*) for counting distinct tuples in views */
+ if (rewritten->distinctClause)

It seems the body of the two if statements can be combined into one.

More to follow for this patch.

Cheers

Re: Implementing Incremental View Maintenance

От

Zhihong Yu

Дата:

07 августа 2021 г., 07:52:24

On Sat, Aug 7, 2021 at 12:00 AM Zhihong Yu <zyu@yugabyte.com> wrote:

On Sun, Aug 1, 2021 at 11:30 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
Hi hackers,

On Mon, 19 Jul 2021 09:24:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Wed, 14 Jul 2021 21:22:37 +0530
> vignesh C <vignesh21@gmail.com> wrote:

> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
>
> Ok. I'll update the patch in a few days.

Attached is the latest patch set to add support for Incremental
Materialized View Maintenance (IVM)

The patches are rebased to the master and also revised with some
code cleaning.

IVM is a way to make materialized views up-to-date in which only
incremental changes are computed and applied on views rather than
recomputing the contents from scratch as REFRESH MATERIALIZED VIEW
does. IVM can update materialized views more efficiently
than recomputation when only small part of the view need updates.

The patch set implements a feature so that materialized views could be
updated automatically and immediately when a base table is modified.

Currently, our IVM implementation supports views which could contain
tuple duplicates whose definition includes:

- inner and outer joins including self-join
- DISTINCT
- some built-in aggregate functions (count, sum, agv, min, and max)
- a part of subqueries
-- simple subqueries in FROM clause
-- EXISTS subqueries in WHERE clause
- CTEs

We hope the IVM feature would be adopted into pg15. However, the size of
patch set has grown too large through supporting above features. Therefore,
I think it is better to consider only a part of these features for the first
release. Especially, I would like propose the following features for pg15.

- inner joins including self-join
- DISTINCT and views with tuple duplicates
- some built-in aggregate functions (count, sum, agv, min, and max)

By omitting outer-join, sub-queries, and CTE features, the patch size becomes
less than half. I hope this will make a bit easer to review the IVM patch set.

Here is a list of separated patches.

- 0001: Add a new syntax:
CREATE INCREMENTAL MATERIALIZED VIEW
- 0002: Add a new column relisivm to pg_class
- 0003: Add new deptype option 'm' in pg_depend
- 0004: Change trigger.c to allow to prolong life span of tupestores
containing Transition Tables generated via AFTER trigger
- 0005: Add IVM supprot for pg_dump
- 0006: Add IVM support for psql
- 0007: Add the basic IVM future:
This supports inner joins, DISTINCT, and tuple duplicates.
- 0008: Add aggregates (count, sum, avg, min, max) support for IVM
- 0009: Add regression tests for IVM
- 0010: Add documentation for IVM

We could split the patch furthermore if this would make reviews much easer.
For example, I think 0007 could be split into the more basic part and the part
for handling tuple duplicates. Moreover, 0008 could be split into "min/max"
and other aggregates because handling min/max is a bit more complicated than
others.

I also attached IVM_extra.tar.gz that contains patches for sub-quereis,
outer-join, CTE support, just for your information.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>

--
Yugo NAGATA <nagata@sraoss.co.jp>

Hi,
For v23-0007-Add-Incremental-View-Maintenance-support.patch :

bq. In this implementation, AFTER triggers are used to collecting tuplestores

'to collecting' -> to collect

bq. are contained in a old transition table.

'a old' -> an old

bq. updates of more than one base tables

one base tables -> one base table

bq. DISTINCT and tuple duplicates in views are supported

Since distinct and duplicate have opposite meanings, it would be better to rephrase the above sentence.

bq. The value in__ivm_count__ is updated

I searched the patch for in__ivm_count__ - there was no (second) match. I think there should be a space between in and underscore.

+static void CreateIvmTriggersOnBaseTables_recurse(Query *qry, Node *node, Oid matviewOid, Relids *relids, bool ex_lock);

nit: long line. please wrap.

+ if (rewritten->distinctClause)
+ rewritten->groupClause = transformDistinctClause(NULL, &rewritten->targetList, rewritten->sortClause, false);
+
+ /* Add count(*) for counting distinct tuples in views */
+ if (rewritten->distinctClause)

It seems the body of the two if statements can be combined into one.

More to follow for this patch.

Cheers

Hi,

+ CreateIvmTriggersOnBaseTables_recurse(qry, (Node *)qry, matviewOid, &relids, ex_lock);

Looking at existing recursive functions, e.g.

src/backend/executor/execPartition.c:find_matching_subplans_recurse(PartitionPruningData *prunedata,

the letters in the function name are all lower case. I think following the convention would be nice.

+ if (rte->rtekind == RTE_RELATION)
+ {
+ if (!bms_is_member(rte->relid, *relids))

The conditions for the two if statements can be combined (saving some indentation).

+ check_stack_depth();
+
+ if (node == NULL)
+ return false;

It seems the node check can be placed ahead of the stack depth check.

+ * CreateindexOnIMMV

CreateindexOnIMMV -> CreateIndexOnIMMV

+ (errmsg("could not create an index on materialized view \"%s\" automatically",

It would be nice to mention the reason is the lack of primary key.

+ /* create no index, just notice that an appropriate index is necessary for efficient, IVM */

for efficient -> for efficiency.

Cheers

RE: Implementing Incremental View Maintenance

От

"r.takahashi_2@fujitsu.com"

Дата:

06 сентября 2021 г., 10:06:37

Hi Nagata-san,


I'm still reading the patch.
I have additional comments.


(1)
In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct.
I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions.


(2)
By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch,
the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped.
This cause error during recovery as follows.

ivm=# create table t (c1 int, c2 int);
CREATE TABLE
ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
SELECT 0

Then I executed pg_dump.

In the dump, the following SQLs appear.

CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
 SELECT DISTINCT t.c1
   FROM public.t
  WITH NO DATA;

ALTER TABLE ONLY public.ivm_t
    ADD CONSTRAINT ivm_t_index UNIQUE (c1);

If I execute psql with the result of pg_dump, following error occurs.

ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
DETAIL:  This operation is not supported for materialized views.


Regards,
Ryohei Takahashi

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 сентября 2021 г., 09:23:21

Hello Zhihong Yu,

Thank you for your suggestion!

I am sorry for late replay. I'll fix them and submit the
updated patch soon.

On Sat, 7 Aug 2021 00:52:24 -0700
Zhihong Yu <zyu@yugabyte.com> wrote:

> > Hi,
> > For v23-0007-Add-Incremental-View-Maintenance-support.patch :
> >
> > bq. In this implementation, AFTER triggers are used to collecting
> > tuplestores
> >
> > 'to collecting' -> to collect
> >
> > bq. are contained in a old transition table.
> >
> > 'a old' -> an old
> >
> > bq. updates of more than one base tables
> >
> > one base tables -> one base table

I'll fix them.

> > bq. DISTINCT and tuple duplicates in views are supported
> >
> > Since distinct and duplicate have opposite meanings, it would be better to
> > rephrase the above sentence.

I'll rewrite it to 
"Incrementally Maintainable Materialized Views (IMMV) can contain
duplicated tuples. Also, DISTINCT clause is supported. "

> > bq. The value in__ivm_count__ is updated
> >
> > I searched the patch for in__ivm_count__ - there was no (second) match. I
> > think there should be a space between in and underscore.

Yes, the space was missing.

> > +static void CreateIvmTriggersOnBaseTables_recurse(Query *qry, Node *node,
> > Oid matviewOid, Relids *relids, bool ex_lock);
> >
> > nit: long line. please wrap.

OK.

> >
> > +   if (rewritten->distinctClause)
> > +       rewritten->groupClause = transformDistinctClause(NULL,
> > &rewritten->targetList, rewritten->sortClause, false);
> > +
> > +   /* Add count(*) for counting distinct tuples in views */
> > +   if (rewritten->distinctClause)
> >
> > It seems the body of the two if statements can be combined into one.

Ok.

> 
> +   CreateIvmTriggersOnBaseTables_recurse(qry, (Node *)qry, matviewOid,
> &relids, ex_lock);
> 
> Looking at existing recursive functions, e.g.
> 
> src/backend/executor/execPartition.c:find_matching_subplans_recurse(PartitionPruningData
> *prunedata,
> 
> the letters in the function name are all lower case. I think following the
> convention would be nice.

Ok. I'll rename this to CreateIvmTriggersOnBaseTablesRecurse since I found
DeadLockCheckRecurse, transformExprRecurse, and so on.

> 
> +               if (rte->rtekind == RTE_RELATION)
> +               {
> +                   if (!bms_is_member(rte->relid, *relids))
> 
> The conditions for the two if statements can be combined (saving some
> indentation).

Yes. I'll fix.

> +   check_stack_depth();
> +
> +   if (node == NULL)
> +       return false;
> 
> It seems the node check can be placed ahead of the stack depth check.

OK.

> + * CreateindexOnIMMV
> 
> CreateindexOnIMMV -> CreateIndexOnIMMV
> 
> +                   (errmsg("could not create an index on materialized view
> \"%s\" automatically",
> 
> It would be nice to mention the reason is the lack of primary key.
> 
> +           /* create no index, just notice that an appropriate index is
> necessary for efficient, IVM */
> 
> for efficient -> for efficiency.

I'll fix them. Thanks.


Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 сентября 2021 г., 09:53:43

Hello Takahashi-san,

On Thu, 5 Aug 2021 08:53:47 +0000
"r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:

> Hi Nagata-san,
> 
> 
> Thank you for your reply.
> 
> > I'll investigate this more, but we may have to prohibit views on partitioned
> > table and partitions.
> 
> I think this restriction is strict.
> This feature is useful when the base table is large and partitioning is also useful in such case.

One reason of this issue is the lack of triggers on partitioned tables or partitions that
are not specified in the view definition. 

However, even if we create triggers recursively on the parents or children, we would still
need more consideration. This is because we will have to convert the format of tuple of
modified table to the format of the table specified in the view for cases that the parent
and some children have different format. 

I think supporting partitioned tables can be left for the next release.

> 
> I have several additional comments on the patch.
> 
> 
> (1)
> The following features are added to transition table.
> - Prolong lifespan of transition table
> - If table has row security policies, set them to the transition table
> - Calculate pre-state of the table
> 
> Are these features only for IVM?
> If there are other useful case, they should be separated from IVM patch and
> should be independent patch for transition table.

Maybe. However, we don't have good idea about use cases other than IVM of
them for now...

> 
> (2)
> DEPENDENCY_IMMV (m) is added to deptype of pg_depend.
> What is the difference compared with existing deptype such as DEPENDENCY_INTERNAL (i)?

DEPENDENCY_IMMV was added to clear that a certain trigger is related to IMMV.
We dropped the IVM trigger and its dependencies from IMMV when REFRESH ... WITH NO DATA
is executed. Without the new deptype, we may accidentally delete a dependency created
with an intention other than the IVM trigger.

> (3)
> Converting from normal materialized view to IVM or from IVM to normal materialized view is not implemented yet.
> Is it difficult?
> 
> I think create/drop triggers and __ivm_ columns can achieve this feature.

I think it is harder than you expected. When an IMMV is switched to a normal
materialized view, we needs to drop hidden columns (__ivm_count__ etc.), and in
the opposite case, we need to create them again. The former (IMMV->IVM) might be
easer, but for the latter (IVM->IMMV) I wonder we would need to re-create IMMV.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 сентября 2021 г., 10:12:27

Hello Takahashi-san,

On Mon, 6 Sep 2021 10:06:37 +0000
"r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:

> Hi Nagata-san,
> 
> 
> I'm still reading the patch.
> I have additional comments.

Thank you for your comments!

> 
> (1)
> In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct.
> I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions.

Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause().
 
> (2)
> By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch,
> the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped.
> This cause error during recovery as follows.
> 
> ivm=# create table t (c1 int, c2 int);
> CREATE TABLE
> ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
> NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
> SELECT 0
> 
> Then I executed pg_dump.
> 
> In the dump, the following SQLs appear.
> 
> CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
>  SELECT DISTINCT t.c1
>    FROM public.t
>   WITH NO DATA;
> 
> ALTER TABLE ONLY public.ivm_t
>     ADD CONSTRAINT ivm_t_index UNIQUE (c1);
> 
> If I execute psql with the result of pg_dump, following error occurs.
> 
> ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
> DETAIL:  This operation is not supported for materialized views.

Good catch! It was my mistake creating unique constraints on IMMV in spite of
we cannot defined them via SQL. I'll fix it to use unique indexes instead of
constraints.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 сентября 2021 г., 10:17:12

Hi hackers,

I attached the updated patch including fixes reported by 
Zhihong Yu and Ryohei Takahashi.

Regards,
Yugo Nagata

On Wed, 22 Sep 2021 19:12:27 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hello Takahashi-san,
> 
> On Mon, 6 Sep 2021 10:06:37 +0000
> "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> 
> > Hi Nagata-san,
> > 
> > 
> > I'm still reading the patch.
> > I have additional comments.
> 
> Thank you for your comments!
> 
> > 
> > (1)
> > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct.
> > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions.
> 
> Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause().
>  
> > (2)
> > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch,
> > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped.
> > This cause error during recovery as follows.
> > 
> > ivm=# create table t (c1 int, c2 int);
> > CREATE TABLE
> > ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
> > NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
> > SELECT 0
> > 
> > Then I executed pg_dump.
> > 
> > In the dump, the following SQLs appear.
> > 
> > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
> >  SELECT DISTINCT t.c1
> >    FROM public.t
> >   WITH NO DATA;
> > 
> > ALTER TABLE ONLY public.ivm_t
> >     ADD CONSTRAINT ivm_t_index UNIQUE (c1);
> > 
> > If I execute psql with the result of pg_dump, following error occurs.
> > 
> > ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
> > DETAIL:  This operation is not supported for materialized views.
> 
> Good catch! It was my mistake creating unique constraints on IMMV in spite of
> we cannot defined them via SQL. I'll fix it to use unique indexes instead of
> constraints.
> 
> Regards,
> Yugo Nagata
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

IVM_patches_v24.tar.gz

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 сентября 2021 г., 19:57:30

On Wed, 22 Sep 2021 19:17:12 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hi hackers,
> 
> I attached the updated patch including fixes reported by 
> Zhihong Yu and Ryohei Takahashi.

Cfbot seems to fail to open the tar file, so I attached
patch files instead of tar ball.

Regards,
Yugo Nagata

 
> On Wed, 22 Sep 2021 19:12:27 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > Hello Takahashi-san,
> > 
> > On Mon, 6 Sep 2021 10:06:37 +0000
> > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> > 
> > > Hi Nagata-san,
> > > 
> > > 
> > > I'm still reading the patch.
> > > I have additional comments.
> > 
> > Thank you for your comments!
> > 
> > > 
> > > (1)
> > > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct.
> > > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions.
> > 
> > Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause().
> >  
> > > (2)
> > > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch,
> > > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped.
> > > This cause error during recovery as follows.
> > > 
> > > ivm=# create table t (c1 int, c2 int);
> > > CREATE TABLE
> > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
> > > NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
> > > SELECT 0
> > > 
> > > Then I executed pg_dump.
> > > 
> > > In the dump, the following SQLs appear.
> > > 
> > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
> > >  SELECT DISTINCT t.c1
> > >    FROM public.t
> > >   WITH NO DATA;
> > > 
> > > ALTER TABLE ONLY public.ivm_t
> > >     ADD CONSTRAINT ivm_t_index UNIQUE (c1);
> > > 
> > > If I execute psql with the result of pg_dump, following error occurs.
> > > 
> > > ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
> > > DETAIL:  This operation is not supported for materialized views.
> > 
> > Good catch! It was my mistake creating unique constraints on IMMV in spite of
> > we cannot defined them via SQL. I'll fix it to use unique indexes instead of
> > constraints.
> > 
> > Regards,
> > Yugo Nagata
> > 
> > -- 
> > Yugo NAGATA <nagata@sraoss.co.jp>
> > 
> > 
> 
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

30 сентября 2021 г., 06:37:55

Hello Takahashi-san,

On Wed, 22 Sep 2021 18:53:43 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hello Takahashi-san,
> 
> On Thu, 5 Aug 2021 08:53:47 +0000
> "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> 
> > Hi Nagata-san,
> > 
> > 
> > Thank you for your reply.
> > 
> > > I'll investigate this more, but we may have to prohibit views on partitioned
> > > table and partitions.
> > 
> > I think this restriction is strict.
> > This feature is useful when the base table is large and partitioning is also useful in such case.
> 
> One reason of this issue is the lack of triggers on partitioned tables or partitions that
> are not specified in the view definition. 
> 
> However, even if we create triggers recursively on the parents or children, we would still
> need more consideration. This is because we will have to convert the format of tuple of
> modified table to the format of the table specified in the view for cases that the parent
> and some children have different format. 
> 
> I think supporting partitioned tables can be left for the next release.
> 
> > 
> > I have several additional comments on the patch.
> > 
> > 
> > (1)
> > The following features are added to transition table.
> > - Prolong lifespan of transition table
> > - If table has row security policies, set them to the transition table
> > - Calculate pre-state of the table
> > 
> > Are these features only for IVM?
> > If there are other useful case, they should be separated from IVM patch and
> > should be independent patch for transition table.
> 
> Maybe. However, we don't have good idea about use cases other than IVM of
> them for now...
> 
> > 
> > (2)
> > DEPENDENCY_IMMV (m) is added to deptype of pg_depend.
> > What is the difference compared with existing deptype such as DEPENDENCY_INTERNAL (i)?
> 
> DEPENDENCY_IMMV was added to clear that a certain trigger is related to IMMV.
> We dropped the IVM trigger and its dependencies from IMMV when REFRESH ... WITH NO DATA
> is executed. Without the new deptype, we may accidentally delete a dependency created
> with an intention other than the IVM trigger.
> 
> > (3)
> > Converting from normal materialized view to IVM or from IVM to normal materialized view is not implemented yet.
> > Is it difficult?
> > 
> > I think create/drop triggers and __ivm_ columns can achieve this feature.
> 
> I think it is harder than you expected. When an IMMV is switched to a normal
> materialized view, we needs to drop hidden columns (__ivm_count__ etc.), and in
> the opposite case, we need to create them again. The former (IMMV->IVM) might be
> easer, but for the latter (IVM->IMMV) I wonder we would need to re-create IMMV.

I am sorry but I found a mistake in the above description. 
"IMMV->IVM" and "IVM->IMMV" were wrong. I've should use "IMMV->MV" and "MV->IMMV"
where MV means normal materialized view.w.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

29 октября 2021 г., 09:16:28

Hi hackers,

I attached the rebased patch set.

Regards,
Yugo Nagata

On Thu, 23 Sep 2021 04:57:30 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Wed, 22 Sep 2021 19:17:12 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > Hi hackers,
> > 
> > I attached the updated patch including fixes reported by 
> > Zhihong Yu and Ryohei Takahashi.
> 
> Cfbot seems to fail to open the tar file, so I attached
> patch files instead of tar ball.
> 
> Regards,
> Yugo Nagata
> 
>  
> > On Wed, 22 Sep 2021 19:12:27 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > 
> > > Hello Takahashi-san,
> > > 
> > > On Mon, 6 Sep 2021 10:06:37 +0000
> > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> > > 
> > > > Hi Nagata-san,
> > > > 
> > > > 
> > > > I'm still reading the patch.
> > > > I have additional comments.
> > > 
> > > Thank you for your comments!
> > > 
> > > > 
> > > > (1)
> > > > In v23-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch, ivm member is added to IntoClause struct.
> > > > I think it is necessary to modify _copyIntoClause() and _equalIntoClause() functions.
> > > 
> > > Ok. I'll fix _copyIntoClause() and _equalIntoClause() as well as _readIntoClause() and _outIntoClause().
> > >  
> > > > (2)
> > > > By executing pg_dump with v23-0005-Add-Incremental-View-Maintenance-support-to-pg_d.patch,
> > > > the constraint which is automatically created during "CREATE INCREMENTAL MATERIALIZED VIEW" is also dumped.
> > > > This cause error during recovery as follows.
> > > > 
> > > > ivm=# create table t (c1 int, c2 int);
> > > > CREATE TABLE
> > > > ivm=# create incremental materialized view ivm_t as select distinct c1 from t;
> > > > NOTICE:  created index "ivm_t_index" on materialized view "ivm_t"
> > > > SELECT 0
> > > > 
> > > > Then I executed pg_dump.
> > > > 
> > > > In the dump, the following SQLs appear.
> > > > 
> > > > CREATE INCREMENTAL MATERIALIZED VIEW public.ivm_t AS
> > > >  SELECT DISTINCT t.c1
> > > >    FROM public.t
> > > >   WITH NO DATA;
> > > > 
> > > > ALTER TABLE ONLY public.ivm_t
> > > >     ADD CONSTRAINT ivm_t_index UNIQUE (c1);
> > > > 
> > > > If I execute psql with the result of pg_dump, following error occurs.
> > > > 
> > > > ERROR:  ALTER action ADD CONSTRAINT cannot be performed on relation "ivm_t"
> > > > DETAIL:  This operation is not supported for materialized views.
> > > 
> > > Good catch! It was my mistake creating unique constraints on IMMV in spite of
> > > we cannot defined them via SQL. I'll fix it to use unique indexes instead of
> > > constraints.
> > > 
> > > Regards,
> > > Yugo Nagata
> > > 
> > > -- 
> > > Yugo NAGATA <nagata@sraoss.co.jp>
> > > 
> > > 
> > 
> > 
> > -- 
> > Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

On Thu, Feb 3, 2022 at 8:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

On Thu, 13 Jan 2022 18:23:42 +0800
Julien Rouhaud <rjuju123@gmail.com> wrote:

> Hi,
>
> On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote:
> > On Wed, 24 Nov 2021 04:31:25 +0000
> > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> >
> > >
> > > I checked the same procedure on v24 patch.
> > > But following error occurs instead of the original error.
> > >
> > > ERROR: relation "ivm_t_index" already exists
> >
> > Thank you for pointing out it!
> >
> > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called
> > after this would fail... Maybe, we should not create any index automatically
> > if IMMV is created WITH NO DATA.
> >
> > I'll fix it after some investigation.
>
> Are you still investigating on that problem? Also, the patchset doesn't apply
> anymore:

I attached the updated and rebased patch set.

I fixed to not create a unique index when an IMMV is created WITH NO DATA.
Instead, the index is created by REFRESH WITH DATA only when the same one
is not created yet.

Also, I fixed the documentation to describe that foreign tables and partitioned
tables are not supported according with Takahashi-san's suggestion.

> There isn't any answer to your following email summarizing the feature yet, so
> I'm not sure what should be the status of this patch, as there's no ideal
> category for that. For now I'll change the patch to Waiting on Author on the
> cf app, feel free to switch it back to Needs Review if you think it's more
> suitable, at least for the design discussion need.

I changed the status to Needs Review.

Hi,

Did you intend to attach updated patch ?

I don't seem to find any.

FYI

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

03 февраля 2022 г., 16:48:06

On Thu, 3 Feb 2022 08:48:00 -0800
Zhihong Yu <zyu@yugabyte.com> wrote:

> On Thu, Feb 3, 2022 at 8:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > Hi,
> >
> > On Thu, 13 Jan 2022 18:23:42 +0800
> > Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote:
> > > > On Wed, 24 Nov 2021 04:31:25 +0000
> > > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> > > >
> > > > >
> > > > > I checked the same procedure on v24 patch.
> > > > > But following error occurs instead of the original error.
> > > > >
> > > > > ERROR:  relation "ivm_t_index" already exists
> > > >
> > > > Thank you for pointing out it!
> > > >
> > > > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called
> > > > after this would fail... Maybe, we should not create any index
> > automatically
> > > > if IMMV is created WITH NO DATA.
> > > >
> > > > I'll fix it after some investigation.
> > >
> > > Are you still investigating on that problem?  Also, the patchset doesn't
> > apply
> > > anymore:
> >
> > I attached the updated and rebased patch set.
> >
> > I fixed to not create a unique index when an IMMV is created WITH NO DATA.
> > Instead, the index is created by REFRESH WITH DATA only when the same one
> > is not created yet.
> >
> > Also, I fixed the documentation to describe that foreign tables and
> > partitioned
> > tables are not supported according with Takahashi-san's suggestion.
> >
> > > There isn't any answer to your following email summarizing the feature
> > yet, so
> > > I'm not sure what should be the status of this patch, as there's no ideal
> > > category for that.  For now I'll change the patch to Waiting on Author
> > on the
> > > cf app, feel free to switch it back to Needs Review if you think it's
> > more
> > > suitable, at least for the design discussion need.
> >
> > I changed the status to Needs Review.
> >
> >
> > Hi,
> Did you intend to attach updated patch ?
> 
> I don't seem to find any.

Oops, I attached. Thanks!


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Implementing Incremental View Maintenance

От

Zhihong Yu

Дата:

03 февраля 2022 г., 17:51:52

On Thu, Feb 3, 2022 at 8:50 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

On Thu, 3 Feb 2022 08:48:00 -0800
Zhihong Yu <zyu@yugabyte.com> wrote:

> On Thu, Feb 3, 2022 at 8:28 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> > Hi,
> >
> > On Thu, 13 Jan 2022 18:23:42 +0800
> > Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > On Thu, Nov 25, 2021 at 04:37:10PM +0900, Yugo NAGATA wrote:
> > > > On Wed, 24 Nov 2021 04:31:25 +0000
> > > > "r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com> wrote:
> > > >
> > > > >
> > > > > I checked the same procedure on v24 patch.
> > > > > But following error occurs instead of the original error.
> > > > >
> > > > > ERROR: relation "ivm_t_index" already exists
> > > >
> > > > Thank you for pointing out it!
> > > >
> > > > Hmmm, an index is created when IMMV is defined, so CREAE INDEX called
> > > > after this would fail... Maybe, we should not create any index
> > automatically
> > > > if IMMV is created WITH NO DATA.
> > > >
> > > > I'll fix it after some investigation.
> > >
> > > Are you still investigating on that problem? Also, the patchset doesn't
> > apply
> > > anymore:
> >
> > I attached the updated and rebased patch set.
> >
> > I fixed to not create a unique index when an IMMV is created WITH NO DATA.
> > Instead, the index is created by REFRESH WITH DATA only when the same one
> > is not created yet.
> >
> > Also, I fixed the documentation to describe that foreign tables and
> > partitioned
> > tables are not supported according with Takahashi-san's suggestion.
> >
> > > There isn't any answer to your following email summarizing the feature
> > yet, so
> > > I'm not sure what should be the status of this patch, as there's no ideal
> > > category for that. For now I'll change the patch to Waiting on Author
> > on the
> > > cf app, feel free to switch it back to Needs Review if you think it's
> > more
> > > suitable, at least for the design discussion need.
> >
> > I changed the status to Needs Review.
> >
> >
> > Hi,
> Did you intend to attach updated patch ?
>
> I don't seem to find any.

Oops, I attached. Thanks!

Hi,

For CreateIndexOnIMMV():

+ ereport(NOTICE,

+ (errmsg("could not create an index on materialized view \"%s\" automatically",

...

+ return;

+ }

Should the return type be changed to bool so that the caller knows whether the index creation succeeds ?

If index creation is unsuccessful, should the call to CreateIvmTriggersOnBaseTables() be skipped ?

For check_ivm_restriction_walker():

+ break;
+ expression_tree_walker(node, check_ivm_restriction_walker, NULL);
+ break;

Something is missing between the break and expression_tree_walker().

Cheers

Re: Implementing Incremental View Maintenance

От

huyajun

Дата:

16 февраля 2022 г., 14:34:18

Hi, Nagata-san

I am very interested in IMMV and read your patch but have some comments in v25-0007-Add-Incremental-View-Maintenance-support.patch and want to discuss with you.

+ /* For IMMV, we need to rewrite matview query */ + query = rewriteQueryForIMMV(query, into->colNames); + query_immv = copyObject(query);

/* Create triggers on incremental maintainable materialized view */ + Assert(query_immv != NULL); + CreateIvmTriggersOnBaseTables(query_immv, matviewOid, true);

1. Do we need copy query?Is it okay that CreateIvmTriggersOnBaseTables directly use (Query *) into->viewQuery instead of query_immv like CreateIndexOnIMMV? It seems only planner may change query, but it shouldn't affect us finding the correct base table in CreateIvmTriggersOnBaseTables .

+void

+CreateIndexOnIMMV(Query *query, Relation matviewRel)

+ Query *qry = (Query *) copyObject(query);

2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only read query in CreateIndexOnIMMV.

Regards,

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

01 марта 2022 г., 18:55:01

On Wed, 16 Feb 2022 22:34:18 +0800
huyajun <hu_yajun@qq.com> wrote:

> Hi, Nagata-san
> I am very interested in IMMV and read your patch but have some comments in
v25-0007-Add-Incremental-View-Maintenance-support.patchand want to discuss with you.
 

Thank you for your review!

> 
> +        /* For IMMV, we need to rewrite matview query */
> +        query = rewriteQueryForIMMV(query, into->colNames);
> +        query_immv = copyObject(query);
> 
> /* Create triggers on incremental maintainable materialized view */
> +                Assert(query_immv != NULL);
> +                CreateIvmTriggersOnBaseTables(query_immv, matviewOid, true);
> 1. Do we need copy query?Is it okay  that CreateIvmTriggersOnBaseTables directly use (Query *) into->viewQuery
insteadof query_immv like CreateIndexOnIMMV?  It seems only planner may change query, but it shouldn't affect us
findingthe correct base table in CreateIvmTriggersOnBaseTables .
 

The copy to query_immv was necessary for supporting sub-queries in the view
definition. However, we excluded the fueature from the current patch to reduce
the patch size, so it would be unnecessary. I'll fix it. 

> 
> +void
> +CreateIndexOnIMMV(Query *query, Relation matviewRel)
> +{
> +    Query *qry = (Query *) copyObject(query);
> 2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only read query in CreateIndexOnIMMV.

This was also necessary for supporting CTEs, but unnecessary in the current
patch, so I'll fix it, too.

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

14 марта 2022 г., 10:12:17

Hi,

I attached the updated patch-set (v26).

> On Wed, 16 Feb 2022 22:34:18 +0800
> huyajun <hu_yajun@qq.com> wrote:
> 
> > Hi, Nagata-san
> > I am very interested in IMMV and read your patch but have some comments in
v25-0007-Add-Incremental-View-Maintenance-support.patchand want to discuss with you.
 
> 
> Thank you for your review!
> 
> > 
> > +        /* For IMMV, we need to rewrite matview query */
> > +        query = rewriteQueryForIMMV(query, into->colNames);
> > +        query_immv = copyObject(query);
> > 
> > /* Create triggers on incremental maintainable materialized view */
> > +                Assert(query_immv != NULL);
> > +                CreateIvmTriggersOnBaseTables(query_immv, matviewOid, true);
> > 1. Do we need copy query?Is it okay  that CreateIvmTriggersOnBaseTables directly use (Query *) into->viewQuery
insteadof query_immv like CreateIndexOnIMMV?  It seems only planner may change query, but it shouldn't affect us
findingthe correct base table in CreateIvmTriggersOnBaseTables .
 
> 
> The copy to query_immv was necessary for supporting sub-queries in the view
> definition. However, we excluded the fueature from the current patch to reduce
> the patch size, so it would be unnecessary. I'll fix it. 
> 
> > 
> > +void
> > +CreateIndexOnIMMV(Query *query, Relation matviewRel)
> > +{
> > +    Query *qry = (Query *) copyObject(query);
> > 2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only read query in CreateIndexOnIMMV.
> 
> This was also necessary for supporting CTEs, but unnecessary in the current
> patch, so I'll fix it, too.

I removed unnecessary copies of Query in according with the suggestions
from huyajun, and fix wrong codes in a "switch" statement pointed out
by Zhihong Yu.

In addition, I made the following fixes:
- Fix psql tab-completion code according with master branch
- Fix auto-index-creation that didn't work well in REFRESH command
- Add documentation description about the automatic index creation

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Hi,

On Fri, 1 Apr 2022 11:09:16 -0400
Greg Stark <stark@mit.edu> wrote:

> This patch has bitrotted due to some other patch affecting trigger.c.
> 
> Could you post a rebase?
> 
> This is the last week of the CF before feature freeze so time is of the essence.

I attached a rebased patch-set.

Also, I made the folowing changes from the previous.

1. Fix to not use a new deptye

In the previous patch, we introduced a new deptye 'm' into pg_depend.
This deptype was used for looking for IVM triggers to be removed at
REFRESH WITH NO DATA. However, we decided to not use it for reducing
unnecessary change in the core code. Currently, the trigger name and
dependent objclass are used at that time instead of it.

As a result, the number of patches are reduced to nine from ten.

2. Bump the version numbers in psql and pg_dump

This feature's target is PG 16 now.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

22 апреля 2022 г., 05:58:01

On Fri, 22 Apr 2022 11:29:39 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hi,
> 
> On Fri, 1 Apr 2022 11:09:16 -0400
> Greg Stark <stark@mit.edu> wrote:
> 
> > This patch has bitrotted due to some other patch affecting trigger.c.
> > 
> > Could you post a rebase?
> > 
> > This is the last week of the CF before feature freeze so time is of the essence.
> 
> I attached a rebased patch-set.
> 
> Also, I made the folowing changes from the previous.
> 
> 1. Fix to not use a new deptye
> 
> In the previous patch, we introduced a new deptye 'm' into pg_depend.
> This deptype was used for looking for IVM triggers to be removed at
> REFRESH WITH NO DATA. However, we decided to not use it for reducing
> unnecessary change in the core code. Currently, the trigger name and
> dependent objclass are used at that time instead of it.
> 
> As a result, the number of patches are reduced to nine from ten.


> 2. Bump the version numbers in psql and pg_dump
> 
> This feature's target is PG 16 now.

Sorry, I revert this change. It was too early to bump up the
version number.

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

Re: Implementing Incremental View Maintenance

От

Greg Stark

Дата:

23 апреля 2022 г., 06:18:01

I'm trying to figure out how to get this feature more attention. Everyone agrees it would be a huge help but it's a scary patch to review.

I wonder if it would be helpful to have a kind of "readers guide" explanation of the patches to help a reviewer understand what the point of each patch is and how the whole system works? I think Andres and Robert have both taken that approach before with big patches and it really helped imho.

On Fri., Apr. 22, 2022, 08:01 Yugo NAGATA, <nagata@sraoss.co.jp> wrote:

On Fri, 22 Apr 2022 11:29:39 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hi,
>
> On Fri, 1 Apr 2022 11:09:16 -0400
> Greg Stark <stark@mit.edu> wrote:
>
> > This patch has bitrotted due to some other patch affecting trigger.c.
> >
> > Could you post a rebase?
> >
> > This is the last week of the CF before feature freeze so time is of the essence.
>
> I attached a rebased patch-set.
>
> Also, I made the folowing changes from the previous.
>
> 1. Fix to not use a new deptye
>
> In the previous patch, we introduced a new deptye 'm' into pg_depend.
> This deptype was used for looking for IVM triggers to be removed at
> REFRESH WITH NO DATA. However, we decided to not use it for reducing
> unnecessary change in the core code. Currently, the trigger name and
> dependent objclass are used at that time instead of it.
>
> As a result, the number of patches are reduced to nine from ten.

> 2. Bump the version numbers in psql and pg_dump
>
> This feature's target is PG 16 now.

Sorry, I revert this change. It was too early to bump up the
version number.

--
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

28 апреля 2022 г., 06:40:11

Hello Greg,

On Sat, 23 Apr 2022 08:18:01 +0200
Greg Stark <stark@mit.edu> wrote:

> I'm trying to figure out how to get this feature more attention. Everyone
> agrees it would be a huge help but it's a scary patch to review.
> 
> I wonder if it would be helpful to have a kind of "readers guide"
> explanation of the patches to help a reviewer understand what the point of
> each patch is and how the whole system works? I think Andres and Robert
> have both taken that approach before with big patches and it really helped
> imho.

Thank you very much for your suggestion!

Following your advice, I am going to write a readers guide referring to the past
posts of Andres and Rebert. 


Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

huyajun

Дата:

29 июня 2022 г., 09:56:39

2022年4月22日下午1:58，Yugo NAGATA <nagata@sraoss.co.jp> 写道：

On Fri, 22 Apr 2022 11:29:39 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hi,

On Fri, 1 Apr 2022 11:09:16 -0400
Greg Stark <stark@mit.edu> wrote:

This patch has bitrotted due to some other patch affecting trigger.c.

Could you post a rebase?

This is the last week of the CF before feature freeze so time is of the essence.

I attached a rebased patch-set.

Also, I made the folowing changes from the previous.

1. Fix to not use a new deptye

In the previous patch, we introduced a new deptye 'm' into pg_depend.
This deptype was used for looking for IVM triggers to be removed at
REFRESH WITH NO DATA. However, we decided to not use it for reducing
unnecessary change in the core code. Currently, the trigger name and
dependent objclass are used at that time instead of it.

As a result, the number of patches are reduced to nine from ten.

2. Bump the version numbers in psql and pg_dump

This feature's target is PG 16 now.

Sorry, I revert this change. It was too early to bump up the
version number.

--
Yugo NAGATA <nagata@sraoss.co.jp>

<v27-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch><v27-0002-Add-relisivm-column-to-pg_class-system-catalog.patch><v27-0003-Allow-to-prolong-life-span-of-transition-tables-.patch><v27-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch><v27-0005-Add-Incremental-View-Maintenance-support-to-psql.patch><v27-0006-Add-Incremental-View-Maintenance-support.patch><v27-0007-Add-aggregates-support-in-IVM.patch><v27-0008-Add-regression-tests-for-Incremental-View-Mainte.patch><v27-0009-Add-documentations-about-Incremental-View-Mainte.patch>

Hi, Nagata-san

I read your patch with v27 version and has some new comments,I want to discuss with you.

1. How about use DEPENDENCY_INTERNAL instead of DEPENDENCY_AUTO

when record dependence on trigger created by IMV.( related code is in the end of CreateIvmTrigger)

Otherwise, User can use sql to drop trigger and corrupt IVM, DEPENDENCY_INTERNAL is also semantically more correct

Crash case like:

create table t( a int);

create incremental materialized view s as select * from t;

drop trigger "IVM_trigger_XXXX”;

Insert into t values(1);

2. In get_matching_condition_string, Considering NULL values, we can not use simple = operator.

But how about 'record = record', record_eq treat NULL = NULL

it should fast than current implementation for only one comparation

Below is my simple implementation with this, Variables are named arbitrarily..

I test some cases it’s ok

static char *

get_matching_condition_string(List *keys)

{

StringInfoData match_cond;

ListCell *lc;

/* If there is no key columns, the condition is always true. */

if (keys == NIL)

return "true";

else

{

StringInfoData s1;

StringInfoData s2;

initStringInfo(&match_cond);

initStringInfo(&s1);

initStringInfo(&s2);

/* Considering NULL values, we can not use simple = operator. */

appendStringInfo(&s1, "ROW(");

appendStringInfo(&s2, "ROW(");

foreach (lc, keys)

{

Form_pg_attribute attr = (Form_pg_attribute) lfirst(lc);

char *resname = NameStr(attr->attname);

char *mv_resname = quote_qualified_identifier("mv", resname);

char *diff_resname = quote_qualified_identifier("diff", resname);

appendStringInfo(&s1, "%s", mv_resname);

appendStringInfo(&s2, "%s", diff_resname);

if (lnext(lc))

{

appendStringInfo(&s1, ", ");

appendStringInfo(&s2, ", ");

}

}

appendStringInfo(&s1, ")::record");

appendStringInfo(&s2, ")::record");

appendStringInfo(&match_cond, "%s operator(pg_catalog.=) %s", s1.data, s2.data);

return match_cond.data;

}

}

3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better

4. In IVM_immediate_before,I know Lock base table with ExclusiveLock is

for concurrent updates to the IVM correctly, But how about to Lock it when actually

need to maintain MV which in IVM_immediate_maintenance

In this way you don't have to lock multiple times.

5. Why we need CreateIndexOnIMMV, is it a optimize?

It seems like when maintenance MV,

the index may not be used because of our match conditions can’t use simple = operator

Looking forward to your early reply to answer my above doubts, thank you a lot!

Regards,

Yajun Hu

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

08 июля 2022 г., 10:22:11

Hi huyajun, 

Thank you for your comments!

On Wed, 29 Jun 2022 17:56:39 +0800
huyajun <hu_yajun@qq.com> wrote:


> Hi, Nagata-san
> I read your patch with v27 version and has some new comments,I want to discuss with you.
> 
> 1. How about use DEPENDENCY_INTERNAL instead of DEPENDENCY_AUTO
>   when record dependence on trigger created by IMV.( related code is in the end of CreateIvmTrigger)
> Otherwise, User can use sql to drop trigger and corrupt IVM, DEPENDENCY_INTERNAL is also semantically more correct
> Crash case like:
>    create table t( a int);
>    create incremental  materialized view s as select * from t;
>    drop trigger "IVM_trigger_XXXX”;
>    Insert into t values(1);

We use DEPENDENCY_AUTO because we want to delete the triggers when
REFRESH ... WITH NO DATA is performed on the materialized view in order
to disable IVM. Triggers created with DEPENDENCY_INTERNAL cannot be dropped.
Such triggers are re-created when REFRESH ... [WITH DATA] is performed.

We can use DEPENDENCY_INTERNAL if we disable/enable such triggers instead of
dropping/re-creating them, although users also can disable triggers using
ALTER TRIGGER.

> 2. In get_matching_condition_string, Considering NULL values, we can not use simple = operator.
> But how about 'record = record', record_eq treat NULL = NULL
> it should fast than current implementation for only one comparation
> Below is my simple implementation with this, Variables are named arbitrarily..
> I test some cases it’s ok
> 
> static char *
> get_matching_condition_string(List *keys)
> {
>     StringInfoData match_cond;
>     ListCell    *lc;
> 
>     /* If there is no key columns, the condition is always true. */
>     if (keys == NIL)
>         return "true";
>     else
>     {
>         StringInfoData s1;
>         StringInfoData s2;
>         initStringInfo(&match_cond);
>         initStringInfo(&s1);
>         initStringInfo(&s2);
>         /* Considering NULL values, we can not use simple = operator. */
>         appendStringInfo(&s1, "ROW(");
>         appendStringInfo(&s2, "ROW(");
>         foreach (lc, keys)
>         {
>             Form_pg_attribute attr = (Form_pg_attribute) lfirst(lc);
>             char   *resname = NameStr(attr->attname);
>             char   *mv_resname = quote_qualified_identifier("mv", resname);
>             char   *diff_resname = quote_qualified_identifier("diff", resname);
>             
>             appendStringInfo(&s1, "%s", mv_resname);
>             appendStringInfo(&s2, "%s", diff_resname);
> 
>             if (lnext(lc))
>             {
>                 appendStringInfo(&s1, ", ");
>                 appendStringInfo(&s2, ", ");
>             }
>         }
>         appendStringInfo(&s1, ")::record");
>         appendStringInfo(&s2, ")::record");
>         appendStringInfo(&match_cond, "%s operator(pg_catalog.=) %s", s1.data, s2.data);
>         return match_cond.data;
>     }
> }

As you say, we don't have to use IS NULL if we use ROW(...)::record, but we
cannot use an index in this case and it makes IVM ineffecient. As showed
bellow (#5), an index works even when we use simple = operations together
with together "IS NULL" operations.
 
> 3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better

I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f

When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated. 

Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.

The next version of the patch-set will include this change. 
 
> 4. In IVM_immediate_before,I know Lock base table with ExclusiveLock is 
>   for concurrent updates to the IVM correctly, But how about to Lock it when actually 
>   need  to maintain MV which in IVM_immediate_maintenance
>     In this way you don't have to lock multiple times.

Yes, as you say, we don't have to lock the view multiple times.
I'll investigate better locking ways including the way that you suggest.
 
> 5. Why we need CreateIndexOnIMMV, is it a optimize? 
>    It seems like when maintenance MV,
>    the index may not be used because of our match conditions  can’t use simple = operator

No, the index works even when we use simple = operator together with "IS NULL".
For example:

postgres=# \d mv
           Materialized view "public.mv"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           |          | 
 v1     | integer |           |          | 
 v2     | integer |           |          | 
Indexes:
    "mv_index" UNIQUE, btree (id) NULLS NOT DISTINCT

postgres=# EXPLAIN ANALYZE
     WITH diff(id, v1, v2) AS MATERIALIZED ((VALUES(42, 420, NULL::int)))
     SELECT mv.* FROM mv, diff 
     WHERE (mv.id = diff.id OR (mv.id IS NULL AND diff.id IS NULL)) AND 
           (mv.v1 = diff.v1 OR (mv.v1 IS NULL AND diff.v1 IS NULL)) AND 
           (mv.v2 = diff.v2 OR (mv.v2 IS NULL AND diff.v2 IS NULL));

                                                                                              QUERY PLAN
                                                                
 
             

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------
 Nested Loop  (cost=133.87..137.92 rows=1 width=12) (actual time=0.180..0.191 rows=1 loops=1)
   CTE diff
     ->  Result  (cost=0.00..0.01 rows=1 width=12) (actual time=0.027..0.028 rows=1 loops=1)
   ->  CTE Scan on diff  (cost=0.00..0.02 rows=1 width=12) (actual time=0.037..0.040 rows=1 loops=1)
   ->  Bitmap Heap Scan on mv  (cost=133.86..137.88 rows=1 width=12) (actual time=0.127..0.132 rows=1 loops=1)
         Recheck Cond: ((id = diff.id) OR (id IS NULL))
         Filter: (((id = diff.id) OR ((id IS NULL) AND (diff.id IS NULL))) AND ((v1 = diff.v1) OR ((v1 IS NULL) AND
(diff.v1IS NULL))) AND ((v2 = diff.v2) OR ((v2 IS NULL) AND (diff.v2
 
 IS NULL))))
         Heap Blocks: exact=1
         ->  BitmapOr  (cost=133.86..133.86 rows=1 width=0) (actual time=0.091..0.093 rows=0 loops=1)
               ->  Bitmap Index Scan on mv_index  (cost=0.00..4.43 rows=1 width=0) (actual time=0.065..0.065 rows=1
loops=1)
                     Index Cond: (id = diff.id)
               ->  Bitmap Index Scan on mv_index  (cost=0.00..4.43 rows=1 width=0) (actual time=0.021..0.021 rows=0
loops=1)
                     Index Cond: (id IS NULL)
 Planning Time: 0.666 ms
 Execution Time: 0.399 ms
(15 rows)



Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

huyajun

Дата:

26 июля 2022 г., 04:00:26

Hi, Nagata-san

Thank you for your answer, I agree with your opinion, and found some new problems to discuss with you

3. Consider truncate base tables, IVM will not refresh, maybe raise an error will be better

I fixed to support TRUNCATE on base tables in our repository.
https://github.com/sraoss/pgsql-ivm/commit/a1365ed69f34e1adbd160f2ce8fd1e80e032392f

When a base table is truncated, the view content will be empty if the
view definition query does not contain an aggregate without a GROUP clause.
Therefore, such views can be truncated.

Aggregate views without a GROUP clause always have one row. Therefore,
if a base table is truncated, the view will not be empty and will contain
a row with NULL value (or 0 for count()). So, in this case, we refresh the
view instead of truncating it.

The next version of the patch-set will include this change.

I read your patch and think this processing is greet, but there is a risk of deadlock.

Although I have not thought of a suitable processing method for the time being,

it is also acceptable for truncate scenarios.The deadlock scene is as follows:

Mv define is: select * from base_a,base_b;

S1: truncate base_a; — only AccessExclusiveLock base_a and not run into after trigger

S2: insert into base_b; — The update has been completed and the incremental refresh is started in the after trigger，RowExclusive on base_b and ExclusiveLock on mv

S1: continue truncate mv, wait for AccessExclusiveLock on mv, wait for S2

S2: continue refresh mv, wait for AccessShardLock on base_a, wait for S1

So deadlock occurred

I also found some new issues that I would like to discuss with you

1. Concurrent DML causes imv data error, case like below

Setup:

Create table t( a int);

Insert into t select 1 from generate_series(1,3);

create incremental materialized view s as select count(*) from t;

S1: begin;delete from t where ctid in (select ctid from t limit 1);

S2: begin;delete from t where ctid in (select ctid from t limit 1 offset 1);

S1: commit;

S2: commit;

After this, The count data of s becomes 2 but correct data is 1.

I found out that the problem is probably because to our use of ctid update

Consider user behavior unrelated to imv：

Create table t( a int);

Insert into t select 1;

s1: BEGIN

s1: update t set a = 2 where ctid in (select ctid from t); -- UPDATE 1

s2: BEGIN

s2: update t set a = 3 where ctid in (select ctid from t); -- wait row lock

s1: COMMIT

s2: -- UPDATE 0 -- ctid change so can't UPDATE one rows

So we lost the s2 update

2. Sometimes it will crash when the columns of the created materialized view do not match

Create table t( a int);

create incremental materialized view s(z) as select sum(1) as a, sum(1) as b from t;

The problem should be that colNames in rewriteQueryForIMMV does not consider this situation

3. Sometimes no error when the columns of the created materialized view do not match

Create table t( a int);

create incremental materialized view s(y,z) as select count(1) as b from t;

But the hidden column of IMV is overwritten to z which will cause refresh failed.

The problem should be that checkRuleResultList we should only skip imv hidden columns check

4. A unique index should not be created in the case of a Cartesian product

create table base_a (i int primary key, j varchar);

create table base_b (i int primary key, k varchar);

INSERT INTO base_a VALUES

(1,10),

(2,20),

(3,30),

(4,40),

(5,50);

INSERT INTO base_b VALUES

(1,101),

(2,102),

(3,103),

(4,104);

CREATE incremental MATERIALIZED VIEW s as

select base_a.i,base_a.j from base_a,base_b; — create error because of unique index

5. Besides, I would like to ask you if you have considered implementing an IMV with delayed refresh?

The advantage of delayed refresh is that it will not have much impact on write performance

I probably have some ideas about it now, do you think it works?

1. After the base table is updated, the delayed IMV's after trigger is used to record the delta

information in another table similar to the incremental log of the base table

2. When incremental refresh, use the data in the log instead of the data in the trasient table

of the after trigger

3. We need to merge the incremental information in advance to ensure that the base_table

after transaction filtering UNION ALL old_delta is the state before the base table is updated

Case like below:

Create table t( a int);

—begin to record log

Insert into t select 1; — newlog: 1 oldlog: empty

Delete from t; —newlog:1, oldlog：1

— begin to incremental refresh

Select * from t where xmin < xid or (xmin = xid and cmin < cid); — empty

So this union all oldlog is not equal to before the base table is updated

We need merge the incremental log in advance to make newlog: empty, oldlog: empty

If implemented, incremental refresh must still be serialized, but the DML of the base table

can not be blocked, that is to say, the base table can still record logs during incremental refresh,

as long as we use same snapshot when incrementally updating.

do you think there will be any problems with this solution?

Looking forward to your reply to answer my above doubts, thank you a lot!

Regards,

Yajun Hu

Re: Implementing Incremental View Maintenance

От

Yugo NAGATA

Дата:

09 сентября 2022 г., 11:10:32

Hello huyajun,

I'm sorry for delay in my response.

On Tue, 26 Jul 2022 12:00:26 +0800
huyajun <hu_yajun@qq.com> wrote:

> I read your patch and think this processing is greet, but there is a risk of deadlock. 
> Although I have not thought of a suitable processing method for the time being, 
> it is also acceptable for truncate scenarios.The deadlock scene is as follows:
> 
> Mv define is: select * from base_a,base_b;
> S1: truncate base_a; ― only AccessExclusiveLock base_a and not run into after trigger
> S2: insert into base_b; ― The update has been completed and the incremental refresh is started in the after
trigger，RowExclusiveon base_b and ExclusiveLock on mv
 
> S1: continue truncate mv, wait for AccessExclusiveLock on mv, wait for S2
> S2: continue refresh mv, wait for AccessShardLock on base_a, wait for S1
> So deadlock occurred

Hmm, this deadlock scenario is possible, indeed.

One idea to resolve it is to acquire RowExclusive locks on all base tables
in the BEFORE trigger. If so, S2 can not progress its process because it
waits for a RowExclusive lock on base_b, and it can not  acquire ExeclusiveLock
on mv before S1 finishes.

> I also found some new issues that I would like to discuss with you

Thank you so much for your massive bug reports!

> 1. Concurrent DML causes imv data error, case like below
> Setup:
> Create table t( a int);
> Insert into t select 1 from generate_series(1,3);
> create incremental materialized view s  as select count(*) from t;
> 
> S1: begin;delete from t where ctid in (select ctid from t limit 1);
> S2: begin;delete from t where ctid in (select ctid from t limit 1 offset 1);
> S1: commit;
> S2: commit;
> 
> After this, The count data of s becomes 2 but correct data is 1.
> I found out that the problem is probably because to our use of ctid update
> Consider user behavior unrelated to imv：
> 
> Create table t( a int);
> Insert into t select 1;
> s1: BEGIN
> s1: update t set a = 2 where ctid in (select ctid from t); -- UPDATE 1
> s2: BEGIN
> s2: update t set a = 3 where ctid in (select ctid from t); -- wait row lock
> s1: COMMIT
> s2: -- UPDATE 0 -- ctid change so can't UPDATE one rows
> So we lost the s2 update
> 
> 2. Sometimes it will crash when the columns of the created materialized view do not match
> Create table t( a int);
> create incremental materialized view s(z)  as select sum(1) as a, sum(1) as b from t;
> 
> The problem should be that colNames in rewriteQueryForIMMV does not consider this situation
> 
> 3. Sometimes no error when the columns of the created materialized view do not match
>  Create table t( a int);
>  create incremental materialized view s(y,z)  as select  count(1) as b from t;
> 
> But the hidden column of IMV is overwritten to z which will cause refresh failed.
> 
> The problem should be that checkRuleResultList we should only skip imv hidden columns check
> 
> 4. A unique index should not be created in the case of a Cartesian product
> 
> create table base_a (i int primary key, j varchar);
> create table base_b (i int primary key, k varchar);
> INSERT INTO base_a VALUES
> (1,10),
> (2,20),
> (3,30),
> (4,40),
> (5,50);
> INSERT INTO base_b VALUES
> (1,101),
> (2,102),
> (3,103),
> (4,104);
> CREATE incremental MATERIALIZED VIEW s as
> select base_a.i,base_a.j from base_a,base_b; ― create error because of unique index

I am working on above issues (#1-#4) now, and I'll respond on each later.

> 5. Besides, I would like to ask you if you have considered implementing an IMV with delayed refresh?
> The advantage of delayed refresh is that it will not have much impact on write performance

Yes, I've been thinking to implement deferred maintenance since the beginning of
this IVM project. However, we've decided to start from immediate maintenance, and
will plan to propose deferred maintenance to the core after the current patch is
accepted. (I plan to implement this feature in pg_ivm extension module first,
though.)

> I probably have some ideas about it now, do you think it works?
> 1. After the base table is updated, the delayed IMV's after trigger is used to record the delta
>  information in another table similar to the incremental log of the base table
> 2. When incremental refresh, use the data in the log instead of the data in the trasient table
> of the after trigger
> 3. We need to merge the incremental information in advance to ensure that the base_table
> after transaction filtering UNION ALL old_delta is the state before the base table is updated
> Case like below:
> Create table t( a int);
> ―begin to record log
> Insert into t select 1; ― newlog: 1 oldlog: empty
> Delete from t; ―newlog:1, oldlog：1
> ― begin to incremental refresh
> Select * from t where xmin < xid or (xmin = xid and cmin < cid); ― empty
> So this union all oldlog is not equal to  before the base table is updated
> We need  merge the incremental log in advance to make newlog: empty, oldlog: empty
> 
> If implemented, incremental refresh must still be serialized, but the DML of the base table 
> can not be blocked, that is to say, the base table can still record logs during incremental refresh,
> as long as we use same snapshot when incrementally updating.
> 
> do you think there will be any problems with this solution?

I guess the deferred maintenance process would be basically what similar
to above. Especially, as you say, we need to merge incremental information
in some way before calculating deltas on the view.  I investigated some
research papers, but I'll review again before working on deferred approach
design.


Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: Implementing Incremental View Maintenance

От

Michael Paquier

Дата:

12 октября 2022 г., 07:53:45

On Fri, Sep 09, 2022 at 08:10:32PM +0900, Yugo NAGATA wrote:
> I am working on above issues (#1-#4) now, and I'll respond on each later.

Okay, well.  There has been some feedback sent lately and no update
for one month, so I am marking it as RwF for now.  As a whole the
patch has been around for three years and it does not seem that a lot
has happened in terms of design discussion (now the thread is long so
I would easily miss something)..
--
Michael

Вложения

signature.asc

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Implementing Incremental View Maintenance

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения