Обсуждение: Feature: triggers on materialized views

Поиск
Список
Период
Сортировка

Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

Based on discussion about observing changes on an open query in a
reactive manner (to support reactive web applications) [1], I
identified that one critical feature is missing to fully implement
discussed design of having reactive queries be represented as
materialized views, and changes to these materialized views would then
be observed and pushed to the client through LISTEN/NOTIFY.

This is my first time contributing to PostgreSQL, so I hope I am
starting this process well.

I would like to propose that support for AFTER triggers are added to
materialized views. I experimented a bit and it seems this is mostly
just a question of enabling/exposing them. See attached patch. This
enabled me to add trigger to a material view which mostly worked. Here
are my findings.

Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
per statement and per row. There are few improvements which could be
done:

- Currently only insert and remove operations are done on the
materialized view. This is because the current logic just removes
changed rows and inserts new rows.
- In current concurrently refresh logic those insert and remove
operations are made even if there are no changes to be done. Which
triggers a statement trigger unnecessary. A small improvement could be
to skip the statement in that case, but looking at the code this seems
maybe tricky because both each of inserts and deletions are done
inside one query each.
- Current concurrently refresh logic does never do updates on existing
rows. It would be nicer to have that so that triggers are more aligned
with real changes to the data. So current two queries could be changed
to three, each doing one of the insert, update, and delete.

Non-concurrent refresh does not trigger any trigger. But it seems all
data to do so is there (previous table, new table), at least for the
statement-level trigger. Row-level triggers could also be simulated
probably (with TRUNCATE and INSERT triggers).

[1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Вложения

Re: Feature: triggers on materialized views

От
David Fetter
Дата:
On Mon, Dec 24, 2018 at 12:59:43PM -0800, Mitar wrote:
> Hi!
> 
> Based on discussion about observing changes on an open query in a
> reactive manner (to support reactive web applications) [1], I
> identified that one critical feature is missing to fully implement
> discussed design of having reactive queries be represented as
> materialized views, and changes to these materialized views would then
> be observed and pushed to the client through LISTEN/NOTIFY.
> 
> This is my first time contributing to PostgreSQL, so I hope I am
> starting this process well.

You've got the right mailing list, a description of what you want, and
a PoC patch. You also got the patch in during the time between
Commitfests. You're doing great!

> I would like to propose that support for AFTER triggers are added to
> materialized views. I experimented a bit and it seems this is mostly
> just a question of enabling/exposing them. See attached patch.

About that. When there's a change (or possible change) in user-visible
behavior, it should come with regression tests, which it would make
sense to add to src/tests/regress/matview.sql along with the
corresponding changes to src/tests/regress/expected/matview.out

> This enabled me to add trigger to a material view which mostly
> worked. Here are my findings.
> 
> Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> per statement and per row.

You'd want at least one test for each of those new features.

> There are few improvements which could be
> done:
> 
> - Currently only insert and remove operations are done on the
> materialized view. This is because the current logic just removes
> changed rows and inserts new rows.

What other operations might you want to support?

> - In current concurrently refresh logic those insert and remove
> operations are made even if there are no changes to be done. Which
> triggers a statement trigger unnecessary. A small improvement could be
> to skip the statement in that case, but looking at the code this seems
> maybe tricky because both each of inserts and deletions are done
> inside one query each.

As far as you can tell, is this just an efficiency optimization, or
might it go to correctness of the behavior?

> - Current concurrently refresh logic does never do updates on existing
> rows. It would be nicer to have that so that triggers are more aligned
> with real changes to the data. So current two queries could be changed
> to three, each doing one of the insert, update, and delete.

I'm not sure I understand the problem being described here. Do you see
these as useful to separate for some reason?

> Non-concurrent refresh does not trigger any trigger. But it seems
> all data to do so is there (previous table, new table), at least for
> the statement-level trigger. Row-level triggers could also be
> simulated probably (with TRUNCATE and INSERT triggers).

Would it make more sense to fill in the missing implementations of NEW
and OLD for per-row triggers instead of adding another hack?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

Thanks for reply!

On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote:
> You've got the right mailing list, a description of what you want, and
> a PoC patch. You also got the patch in during the time between
> Commitfests. You're doing great!

Great!

One thing I am unclear about is how it is determined if this is a
viable feature to be eventually included. You gave me some suggestions
to improve in my patch (adding tests and so on). Does this mean that
the patch should be fully done before a decision is made?

Also, the workflow is that I improve things, and resubmit a patch to
the mailing list, for now?

> > - Currently only insert and remove operations are done on the
> > materialized view. This is because the current logic just removes
> > changed rows and inserts new rows.
>
> What other operations might you want to support?

Update. So if a row is changing, instead of doing a remove and insert,
what currently is being done, I would prefer an update. Then UPDATE
trigger operation would happen as well. Maybe the INSERT query could
be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
one does UPDATE trigger operation on conflict), and REMOVE changed to
remove just rows which were really removed, but not only updated.

> As far as you can tell, is this just an efficiency optimization, or
> might it go to correctness of the behavior?

It is just an optimization. Or maybe even just a surprise. Maybe a
documentation addition could help here. In my use case I would loop
over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
be done. But it is just surprising that DELETE trigger is called even
when no rows are being deleted in the materialized view.

> I'm not sure I understand the problem being described here. Do you see
> these as useful to separate for some reason?

So rows which are just updated currently get first DELETE trigger
called and then INSERT. The issue is that if I am observing this
behavior from outside, it makes it unclear when I see DELETE if this
means really that a row has been deleted or it just means that later
on an INSERT would happen. Now I have to wait for an eventual INSERT
to determine that. But how long should I wait? It makes consuming
these notifications tricky.

If I just blindly respond to those notifications, this could introduce
other problems. For example, if I have a reactive web application it
could mean a visible flicker to the user. Instead of updating rendered
row, I would first delete it and then later on re-insert it.

> > Non-concurrent refresh does not trigger any trigger. But it seems
> > all data to do so is there (previous table, new table), at least for
> > the statement-level trigger. Row-level triggers could also be
> > simulated probably (with TRUNCATE and INSERT triggers).
>
> Would it make more sense to fill in the missing implementations of NEW
> and OLD for per-row triggers instead of adding another hack?

You lost me here. But I agree, we should implement this fully, without
hacks. I just do not know how exactly.

Are you saying that we should support only row-level triggers, or that
we should support both statement-level and row-level triggers, but
just make sure we implement this properly? I think that my suggestion
of using TRUNCATE and INSERT triggers is reasonable in the case of
full refresh. This is what happens. If we would want to have
DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
concurrent version has to do, which would defeat the difference
between the two. But yes, all INSERT trigger calls should have NEW
provided.

So per-statement trigger would have TRUNCATE and INSERT called. And
per-row trigger would have TRUNCATE and per-row INSERTs called.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

I made another version of the patch. This one does UPDATEs for changed
row instead of DELETE/INSERT.

All existing regression tests are still passing (make check).


Mitar

On Mon, Dec 24, 2018 at 4:13 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> Thanks for reply!
>
> On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote:
> > You've got the right mailing list, a description of what you want, and
> > a PoC patch. You also got the patch in during the time between
> > Commitfests. You're doing great!
>
> Great!
>
> One thing I am unclear about is how it is determined if this is a
> viable feature to be eventually included. You gave me some suggestions
> to improve in my patch (adding tests and so on). Does this mean that
> the patch should be fully done before a decision is made?
>
> Also, the workflow is that I improve things, and resubmit a patch to
> the mailing list, for now?
>
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> >
> > What other operations might you want to support?
>
> Update. So if a row is changing, instead of doing a remove and insert,
> what currently is being done, I would prefer an update. Then UPDATE
> trigger operation would happen as well. Maybe the INSERT query could
> be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> one does UPDATE trigger operation on conflict), and REMOVE changed to
> remove just rows which were really removed, but not only updated.
>
> > As far as you can tell, is this just an efficiency optimization, or
> > might it go to correctness of the behavior?
>
> It is just an optimization. Or maybe even just a surprise. Maybe a
> documentation addition could help here. In my use case I would loop
> over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> be done. But it is just surprising that DELETE trigger is called even
> when no rows are being deleted in the materialized view.
>
> > I'm not sure I understand the problem being described here. Do you see
> > these as useful to separate for some reason?
>
> So rows which are just updated currently get first DELETE trigger
> called and then INSERT. The issue is that if I am observing this
> behavior from outside, it makes it unclear when I see DELETE if this
> means really that a row has been deleted or it just means that later
> on an INSERT would happen. Now I have to wait for an eventual INSERT
> to determine that. But how long should I wait? It makes consuming
> these notifications tricky.
>
> If I just blindly respond to those notifications, this could introduce
> other problems. For example, if I have a reactive web application it
> could mean a visible flicker to the user. Instead of updating rendered
> row, I would first delete it and then later on re-insert it.
>
> > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > all data to do so is there (previous table, new table), at least for
> > > the statement-level trigger. Row-level triggers could also be
> > > simulated probably (with TRUNCATE and INSERT triggers).
> >
> > Would it make more sense to fill in the missing implementations of NEW
> > and OLD for per-row triggers instead of adding another hack?
>
> You lost me here. But I agree, we should implement this fully, without
> hacks. I just do not know how exactly.
>
> Are you saying that we should support only row-level triggers, or that
> we should support both statement-level and row-level triggers, but
> just make sure we implement this properly? I think that my suggestion
> of using TRUNCATE and INSERT triggers is reasonable in the case of
> full refresh. This is what happens. If we would want to have
> DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> concurrent version has to do, which would defeat the difference
> between the two. But yes, all INSERT trigger calls should have NEW
> provided.
>
> So per-statement trigger would have TRUNCATE and INSERT called. And
> per-row trigger would have TRUNCATE and per-row INSERTs called.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Вложения

Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

So I think this makes it work great for REFRESH MATERIALIZED VIEW
CONCURRENTLY. I think we can leave empty statement triggers as they
are. I have not found a nice way to not do them.

For adding triggers to REFRESH MATERIALIZED VIEW I would need some
help and pointers. I am not sure how to write calling triggers there.
Any reference to an existing code which does something similar would
be great. So I think after swapping heaps we should call TRUNCATE
trigger and then INSERT for all new rows.


Mitar

On Mon, Dec 24, 2018 at 6:17 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 4:13 PM Mitar <mmitar@gmail.com> wrote:
> >
> > Hi!
> >
> > Thanks for reply!
> >
> > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote:
> > > You've got the right mailing list, a description of what you want, and
> > > a PoC patch. You also got the patch in during the time between
> > > Commitfests. You're doing great!
> >
> > Great!
> >
> > One thing I am unclear about is how it is determined if this is a
> > viable feature to be eventually included. You gave me some suggestions
> > to improve in my patch (adding tests and so on). Does this mean that
> > the patch should be fully done before a decision is made?
> >
> > Also, the workflow is that I improve things, and resubmit a patch to
> > the mailing list, for now?
> >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > >
> > > What other operations might you want to support?
> >
> > Update. So if a row is changing, instead of doing a remove and insert,
> > what currently is being done, I would prefer an update. Then UPDATE
> > trigger operation would happen as well. Maybe the INSERT query could
> > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> > one does UPDATE trigger operation on conflict), and REMOVE changed to
> > remove just rows which were really removed, but not only updated.
> >
> > > As far as you can tell, is this just an efficiency optimization, or
> > > might it go to correctness of the behavior?
> >
> > It is just an optimization. Or maybe even just a surprise. Maybe a
> > documentation addition could help here. In my use case I would loop
> > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> > be done. But it is just surprising that DELETE trigger is called even
> > when no rows are being deleted in the materialized view.
> >
> > > I'm not sure I understand the problem being described here. Do you see
> > > these as useful to separate for some reason?
> >
> > So rows which are just updated currently get first DELETE trigger
> > called and then INSERT. The issue is that if I am observing this
> > behavior from outside, it makes it unclear when I see DELETE if this
> > means really that a row has been deleted or it just means that later
> > on an INSERT would happen. Now I have to wait for an eventual INSERT
> > to determine that. But how long should I wait? It makes consuming
> > these notifications tricky.
> >
> > If I just blindly respond to those notifications, this could introduce
> > other problems. For example, if I have a reactive web application it
> > could mean a visible flicker to the user. Instead of updating rendered
> > row, I would first delete it and then later on re-insert it.
> >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > > all data to do so is there (previous table, new table), at least for
> > > > the statement-level trigger. Row-level triggers could also be
> > > > simulated probably (with TRUNCATE and INSERT triggers).
> > >
> > > Would it make more sense to fill in the missing implementations of NEW
> > > and OLD for per-row triggers instead of adding another hack?
> >
> > You lost me here. But I agree, we should implement this fully, without
> > hacks. I just do not know how exactly.
> >
> > Are you saying that we should support only row-level triggers, or that
> > we should support both statement-level and row-level triggers, but
> > just make sure we implement this properly? I think that my suggestion
> > of using TRUNCATE and INSERT triggers is reasonable in the case of
> > full refresh. This is what happens. If we would want to have
> > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> > concurrent version has to do, which would defeat the difference
> > between the two. But yes, all INSERT trigger calls should have NEW
> > provided.
> >
> > So per-statement trigger would have TRUNCATE and INSERT called. And
> > per-row trigger would have TRUNCATE and per-row INSERTs called.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
David Fetter
Дата:
On Mon, Dec 24, 2018 at 04:13:44PM -0800, Mitar wrote:
> Hi!
> 
> Thanks for reply!
> 
> On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote:
> > You've got the right mailing list, a description of what you want, and
> > a PoC patch. You also got the patch in during the time between
> > Commitfests. You're doing great!
> 
> Great!
> 
> One thing I am unclear about is how it is determined if this is a
> viable feature to be eventually included. You gave me some suggestions
> to improve in my patch (adding tests and so on). Does this mean that
> the patch should be fully done before a decision is made?
> 
> Also, the workflow is that I improve things, and resubmit a patch to
> the mailing list, for now?
> 
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> >
> > What other operations might you want to support?
> 
> Update. So if a row is changing, instead of doing a remove and insert,
> what currently is being done, I would prefer an update. Then UPDATE
> trigger operation would happen as well. Maybe the INSERT query could
> be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> one does UPDATE trigger operation on conflict), and REMOVE changed to
> remove just rows which were really removed, but not only updated.

There might be a reason it's the way it is. Looking at the commits
that introduced this might shed some light.

> > I'm not sure I understand the problem being described here. Do you see
> > these as useful to separate for some reason?
> 
> So rows which are just updated currently get first DELETE trigger
> called and then INSERT. The issue is that if I am observing this
> behavior from outside, it makes it unclear when I see DELETE if this
> means really that a row has been deleted or it just means that later
> on an INSERT would happen. Now I have to wait for an eventual INSERT
> to determine that. But how long should I wait? It makes consuming
> these notifications tricky.

If it helps you think about it better, all NOTIFICATIONs are sent on
COMMIT, i.e. you don't need to worry as much about what things should
or shouldn't have arrived. The down side, such as it is, is that they
don't convey premature knowledge about a state that may never arrive.

> If I just blindly respond to those notifications, this could introduce
> other problems. For example, if I have a reactive web application it
> could mean a visible flicker to the user. Instead of updating rendered
> row, I would first delete it and then later on re-insert it.

This is at what I hope is a level quite distinct from database
operations. Separation of concerns via the model-view-controller (or
similar) architecture and all that.

> > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > all data to do so is there (previous table, new table), at least for
> > > the statement-level trigger. Row-level triggers could also be
> > > simulated probably (with TRUNCATE and INSERT triggers).
> >
> > Would it make more sense to fill in the missing implementations of NEW
> > and OLD for per-row triggers instead of adding another hack?
> 
> You lost me here. But I agree, we should implement this fully, without
> hacks. I just do not know how exactly.

Sorry I was unclear.  The SQL standard defines both transition tables,
which we have, for per-statement triggers, and transition variables,
which we don't, for per-row triggers. Here's the relevant part of the
syntax:

<trigger definition> ::=
    CREATE TRIGGER <trigger name> <trigger action time> <trigger event>
    ON <table name> [ REFERENCING <transition table or variable list> ]
    <triggered action>

<transition table or variable list> ::=
    <transition table or variable>...
<transition table or
    OLD [ ROW ] [ AS
    | NEW [ ROW ] [ AS
    | OLD TABLE [ AS ]
    | NEW TABLE [ AS ]
variable> ::=
] <old transition variable name>
] <new transition variable name>
<old transition table name>
<new transition table name>

> Are you saying that we should support only row-level triggers, or that
> we should support both statement-level and row-level triggers, but
> just make sure we implement this properly?

The latter, although we might need to defer the row-level triggers
until we support transition variables.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Tue, Dec 25, 2018 at 10:03 AM David Fetter <david@fetter.org> wrote:
> If it helps you think about it better, all NOTIFICATIONs are sent on
> COMMIT, i.e. you don't need to worry as much about what things should
> or shouldn't have arrived. The down side, such as it is, is that they
> don't convey premature knowledge about a state that may never arrive.

This fact does not really help me. My client code listening to
NOTIFICATIONs does not know when some other client made COMMIT. There
is no NOTIFICATION saying "this is the end of the commit for which I
just sent you notifications".

> This is at what I hope is a level quite distinct from database
> operations. Separation of concerns via the model-view-controller (or
> similar) architecture and all that.

Of course, but garbage in, garbage out. If notifications are
superfluous then another abstraction layer has to fix them. I would
prefer if this would not have to be the case.

But it seems it is relatively easy to fix this logic and have both
INSERTs, DELETEs and UPDATEs. The patch I updated and attached
previously does that.

> Sorry I was unclear.  The SQL standard defines both transition tables,
> which we have, for per-statement triggers, and transition variables,
> which we don't, for per-row triggers.

I thought that PostgreSQL has transition variables per-row triggers,
only that it is not possible to (re)name them (are depend on the
trigger function language)? But there are OLD and NEW variables
available in per-row triggers, or equivalent?

> The latter, although we might need to defer the row-level triggers
> until we support transition variables.

Not sure how transition variables are implemented currently for
regular tables, but we could probably do the same?

Anyway, I do not know how to proceed here to implement or
statement-level or row-level triggers here. It could be just a matter
a calling some function to call them, but I am not familiar with the
codebase enough to know what. So any pointers to existing code which
does something similar would be great. So, what to call once material
views' heaps are swapped to call triggers?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Alvaro Herrera
Дата:
On 2018-Dec-24, Mitar wrote:


> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
> 
> All existing regression tests are still passing (make check).

Okay, but you don't add any for your new feature, which is not good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > I made another version of the patch. This one does UPDATEs for changed
> > row instead of DELETE/INSERT.
> >
> > All existing regression tests are still passing (make check).
>
> Okay, but you don't add any for your new feature, which is not good.

Yes, I have not yet done that. I want first to also add calling
triggers for non-concurrent refresh, but I would need a bit help there
(what to call, example of maybe code which does something similar
already).


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Alvaro Herrera
Дата:
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > I made another version of the patch. This one does UPDATEs for changed
> > > row instead of DELETE/INSERT.
> > >
> > > All existing regression tests are still passing (make check).
> >
> > Okay, but you don't add any for your new feature, which is not good.
> 
> Yes, I have not yet done that. I want first to also add calling
> triggers for non-concurrent refresh, but I would need a bit help there
> (what to call, example of maybe code which does something similar
> already).

Well, REFRESH CONCURRENTLY runs completely different than non-concurrent
REFRESH.  The former updates the existing data by observing the
differences with the previous data; the latter simply re-runs the query
and overwrites everything.  So if you simply enabled triggers on
non-concurrent refresh, you'd just see a bunch of inserts into a
throwaway data area (a transient relfilenode, we call it), then a swap
of the MV's relfilenode with the throwaway one.  I doubt it'd be useful.
But then I'm not clear *why* you would like to do a non-concurrent
refresh.  Maybe your situation would be best served by forbidding non-
concurrent refresh when the MV contains any triggers.

Alternatively, maybe reimplement non-concurrent refresh so that it works
identically to concurrent refresh (except with a stronger lock).  Not
sure if this implies any performance penalties.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> But then I'm not clear *why* you would like to do a non-concurrent
> refresh.

I mostly wanted to support if for two reasons:

- completeness: maybe we cannot imagine the use case yet, but somebody
might in the future
- getting trigger calls for initial inserts: you can then create
materialized view without data, attach triggers, and then run a
regular refresh; this allows you to have only one code path to process
any (including initial) changes to the view through notifications,
instead of handling initial data differently

> Maybe your situation would be best served by forbidding non-
> concurrent refresh when the MV contains any triggers.

If this would be acceptable by the community, I could do it. I worry
though that one could probably get themselves into a situation where
materialized view losses all data through some WITH NO DATA operation
and concurrent refresh is not possible. Currently concurrent refresh
works only with data. We could make concurrent refresh also work when
materialized view has no data easily (it would just insert data and
not compute diff).

> Alternatively, maybe reimplement non-concurrent refresh so that it works
> identically to concurrent refresh (except with a stronger lock).  Not
> sure if this implies any performance penalties.

Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
That would then generate reasonable trigger calls.

Are there any existing benchmarks for such operations I could use to
see if there are any performance changes if I change implementation
here? Any guidelines how to evaluate this?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

I did a bit of benchmarking. It seems my version with UPDATE takes
even slightly less time (~5%).


Mitar

On Mon, Dec 24, 2018 at 6:17 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
>
> All existing regression tests are still passing (make check).
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 4:13 PM Mitar <mmitar@gmail.com> wrote:
> >
> > Hi!
> >
> > Thanks for reply!
> >
> > On Mon, Dec 24, 2018 at 2:20 PM David Fetter <david@fetter.org> wrote:
> > > You've got the right mailing list, a description of what you want, and
> > > a PoC patch. You also got the patch in during the time between
> > > Commitfests. You're doing great!
> >
> > Great!
> >
> > One thing I am unclear about is how it is determined if this is a
> > viable feature to be eventually included. You gave me some suggestions
> > to improve in my patch (adding tests and so on). Does this mean that
> > the patch should be fully done before a decision is made?
> >
> > Also, the workflow is that I improve things, and resubmit a patch to
> > the mailing list, for now?
> >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > >
> > > What other operations might you want to support?
> >
> > Update. So if a row is changing, instead of doing a remove and insert,
> > what currently is being done, I would prefer an update. Then UPDATE
> > trigger operation would happen as well. Maybe the INSERT query could
> > be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> > one does UPDATE trigger operation on conflict), and REMOVE changed to
> > remove just rows which were really removed, but not only updated.
> >
> > > As far as you can tell, is this just an efficiency optimization, or
> > > might it go to correctness of the behavior?
> >
> > It is just an optimization. Or maybe even just a surprise. Maybe a
> > documentation addition could help here. In my use case I would loop
> > over OLD and NEW REFERENCING TABLE so if they are empty, nothing would
> > be done. But it is just surprising that DELETE trigger is called even
> > when no rows are being deleted in the materialized view.
> >
> > > I'm not sure I understand the problem being described here. Do you see
> > > these as useful to separate for some reason?
> >
> > So rows which are just updated currently get first DELETE trigger
> > called and then INSERT. The issue is that if I am observing this
> > behavior from outside, it makes it unclear when I see DELETE if this
> > means really that a row has been deleted or it just means that later
> > on an INSERT would happen. Now I have to wait for an eventual INSERT
> > to determine that. But how long should I wait? It makes consuming
> > these notifications tricky.
> >
> > If I just blindly respond to those notifications, this could introduce
> > other problems. For example, if I have a reactive web application it
> > could mean a visible flicker to the user. Instead of updating rendered
> > row, I would first delete it and then later on re-insert it.
> >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > > all data to do so is there (previous table, new table), at least for
> > > > the statement-level trigger. Row-level triggers could also be
> > > > simulated probably (with TRUNCATE and INSERT triggers).
> > >
> > > Would it make more sense to fill in the missing implementations of NEW
> > > and OLD for per-row triggers instead of adding another hack?
> >
> > You lost me here. But I agree, we should implement this fully, without
> > hacks. I just do not know how exactly.
> >
> > Are you saying that we should support only row-level triggers, or that
> > we should support both statement-level and row-level triggers, but
> > just make sure we implement this properly? I think that my suggestion
> > of using TRUNCATE and INSERT triggers is reasonable in the case of
> > full refresh. This is what happens. If we would want to have
> > DELETE/UPDATE/INSERT triggers, we would have to compute the diff like
> > concurrent version has to do, which would defeat the difference
> > between the two. But yes, all INSERT trigger calls should have NEW
> > provided.
> >
> > So per-statement trigger would have TRUNCATE and INSERT called. And
> > per-row trigger would have TRUNCATE and per-row INSERTs called.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Alvaro Herrera
Дата:
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > But then I'm not clear *why* you would like to do a non-concurrent
> > refresh.
> 
> I mostly wanted to support if for two reasons:
> 
> - completeness: maybe we cannot imagine the use case yet, but somebody
> might in the future

Understood.  We don't like features that fail to work in conjunction
with other features, so this is a good goal to keep in mind.

> - getting trigger calls for initial inserts: you can then create
> materialized view without data, attach triggers, and then run a
> regular refresh; this allows you to have only one code path to process
> any (including initial) changes to the view through notifications,
> instead of handling initial data differently

Sounds like you could do this by fixing concurrent refresh to also work
when the MV is WITH NO DATA.

> > Maybe your situation would be best served by forbidding non-
> > concurrent refresh when the MV contains any triggers.
> 
> If this would be acceptable by the community, I could do it.

I think your chances are 50%/50% that this would appear acceptable.

> > Alternatively, maybe reimplement non-concurrent refresh so that it works
> > identically to concurrent refresh (except with a stronger lock).  Not
> > sure if this implies any performance penalties.
> 
> Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> That would then generate reasonable trigger calls.

Right.

> Are there any existing benchmarks for such operations I could use to
> see if there are any performance changes if I change implementation
> here? Any guidelines how to evaluate this?

Not that I know of.  Typically the developer of a feature comes up with
appropriate performance tests also, targetting average and worst cases.

If the performance worsens with the different implementation, one idea
is to keep both and only use the slow one when triggers are present.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Wed, Dec 26, 2018 at 4:38 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Sounds like you could do this by fixing concurrent refresh to also work
> when the MV is WITH NO DATA.

Yes, I do not think this would be too hard to fix. I could do this nevertheless.

> > Ah, yes. I could just do TRUNCATE and INSERT, instead of heap swap.
> > That would then generate reasonable trigger calls.
>
> Right.

I have tested this yesterday and performance is 2x worse that heap
swap on the benchmark I made. So I do not think this is a viable
approach.

I am now looking into simply triggering TRUNCATE and INSERT triggers
after heap swap simulating the above. I made AFTER STATEMENT triggers
and it looks like it is working, only NEW table is not populated for
some reason. Any suggestions? See attached patch.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Вложения

Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

I have made an updated version of the patch, added tests and
documentation changes. This is my view now a complete patch. Please
provide any feedback or comments you might have for me to improve the
patch. I will also add it to commitfest.

A summary of the patch: This patch enables adding AFTER triggers (both
ROW and STATEMENT) on materialized views. They are fired when doing
REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
Triggers are not fired if you call REFRESH without CONCURRENTLY. This
is based on some discussion on the mailing list because implementing
it for without CONCURRENTLY would require us to add logic for firing
triggers where there was none before (and is just an efficient heap
swap).

To be able to create a materialized view without data, specify
triggers, and REFRESH CONCURRENTLY so that those triggers would be
called also for initial data, I have tested and determined that there
is no reason why REFRESH CONCURRENTLY could not be run on
uninitialized materialized view. So I removed that check and things
seem to just work. Including triggers being called for initial data. I
think this makes REFRESH CONCURRENTLY have one less special case which
is in general nicer.

I have run tests and all old tests still succeed. I have added more
tests for the new feature.

I have run benchmark to evaluate the impact of me changing
refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
rows which were just updated. In fact it seems this improves
performance slightly (4% in my benchmark, mean over 10 runs). I guess
that this is because it is cheaper to just change one column's values
(what benchmark is doing when changing rows) instead of removing and
inserting the whole row. Because REFRESH MATERIALIZED VIEW
CONCURRENTLY is meant to be used when not a lot of data has been
changed anyway, I find this a pleasantly surprising additional
improvement in this patch. I am attaching the benchmark script I have
used. I compared the time of the final refresh query in the script. (I
would love if pgbench could take a custom init script to run before
the timed part of the script.)


Mitar

On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> Based on discussion about observing changes on an open query in a
> reactive manner (to support reactive web applications) [1], I
> identified that one critical feature is missing to fully implement
> discussed design of having reactive queries be represented as
> materialized views, and changes to these materialized views would then
> be observed and pushed to the client through LISTEN/NOTIFY.
>
> This is my first time contributing to PostgreSQL, so I hope I am
> starting this process well.
>
> I would like to propose that support for AFTER triggers are added to
> materialized views. I experimented a bit and it seems this is mostly
> just a question of enabling/exposing them. See attached patch. This
> enabled me to add trigger to a material view which mostly worked. Here
> are my findings.
>
> Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> per statement and per row. There are few improvements which could be
> done:
>
> - Currently only insert and remove operations are done on the
> materialized view. This is because the current logic just removes
> changed rows and inserts new rows.
> - In current concurrently refresh logic those insert and remove
> operations are made even if there are no changes to be done. Which
> triggers a statement trigger unnecessary. A small improvement could be
> to skip the statement in that case, but looking at the code this seems
> maybe tricky because both each of inserts and deletions are done
> inside one query each.
> - Current concurrently refresh logic does never do updates on existing
> rows. It would be nicer to have that so that triggers are more aligned
> with real changes to the data. So current two queries could be changed
> to three, each doing one of the insert, update, and delete.
>
> Non-concurrent refresh does not trigger any trigger. But it seems all
> data to do so is there (previous table, new table), at least for the
> statement-level trigger. Row-level triggers could also be simulated
> probably (with TRUNCATE and INSERT triggers).
>
> [1]
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Вложения

Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

One more version of the patch with slightly more deterministic tests.


Mitar

On Thu, Dec 27, 2018 at 11:43 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> I have made an updated version of the patch, added tests and
> documentation changes. This is my view now a complete patch. Please
> provide any feedback or comments you might have for me to improve the
> patch. I will also add it to commitfest.
>
> A summary of the patch: This patch enables adding AFTER triggers (both
> ROW and STATEMENT) on materialized views. They are fired when doing
> REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
> Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> is based on some discussion on the mailing list because implementing
> it for without CONCURRENTLY would require us to add logic for firing
> triggers where there was none before (and is just an efficient heap
> swap).
>
> To be able to create a materialized view without data, specify
> triggers, and REFRESH CONCURRENTLY so that those triggers would be
> called also for initial data, I have tested and determined that there
> is no reason why REFRESH CONCURRENTLY could not be run on
> uninitialized materialized view. So I removed that check and things
> seem to just work. Including triggers being called for initial data. I
> think this makes REFRESH CONCURRENTLY have one less special case which
> is in general nicer.
>
> I have run tests and all old tests still succeed. I have added more
> tests for the new feature.
>
> I have run benchmark to evaluate the impact of me changing
> refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
> rows which were just updated. In fact it seems this improves
> performance slightly (4% in my benchmark, mean over 10 runs). I guess
> that this is because it is cheaper to just change one column's values
> (what benchmark is doing when changing rows) instead of removing and
> inserting the whole row. Because REFRESH MATERIALIZED VIEW
> CONCURRENTLY is meant to be used when not a lot of data has been
> changed anyway, I find this a pleasantly surprising additional
> improvement in this patch. I am attaching the benchmark script I have
> used. I compared the time of the final refresh query in the script. (I
> would love if pgbench could take a custom init script to run before
> the timed part of the script.)
>
>
> Mitar
>
> On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote:
> >
> > Hi!
> >
> > Based on discussion about observing changes on an open query in a
> > reactive manner (to support reactive web applications) [1], I
> > identified that one critical feature is missing to fully implement
> > discussed design of having reactive queries be represented as
> > materialized views, and changes to these materialized views would then
> > be observed and pushed to the client through LISTEN/NOTIFY.
> >
> > This is my first time contributing to PostgreSQL, so I hope I am
> > starting this process well.
> >
> > I would like to propose that support for AFTER triggers are added to
> > materialized views. I experimented a bit and it seems this is mostly
> > just a question of enabling/exposing them. See attached patch. This
> > enabled me to add trigger to a material view which mostly worked. Here
> > are my findings.
> >
> > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> > per statement and per row. There are few improvements which could be
> > done:
> >
> > - Currently only insert and remove operations are done on the
> > materialized view. This is because the current logic just removes
> > changed rows and inserts new rows.
> > - In current concurrently refresh logic those insert and remove
> > operations are made even if there are no changes to be done. Which
> > triggers a statement trigger unnecessary. A small improvement could be
> > to skip the statement in that case, but looking at the code this seems
> > maybe tricky because both each of inserts and deletions are done
> > inside one query each.
> > - Current concurrently refresh logic does never do updates on existing
> > rows. It would be nicer to have that so that triggers are more aligned
> > with real changes to the data. So current two queries could be changed
> > to three, each doing one of the insert, update, and delete.
> >
> > Non-concurrent refresh does not trigger any trigger. But it seems all
> > data to do so is there (previous table, new table), at least for the
> > statement-level trigger. Row-level triggers could also be simulated
> > probably (with TRUNCATE and INSERT triggers).
> >
> > [1]
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m

Вложения

Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

Hm, why in commitfest it does not display the latest patch?

https://commitfest.postgresql.org/21/1953/

It does display correctly the latest e-mail, but not the link to the patch. :-(


Mitar

On Thu, Dec 27, 2018 at 11:51 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> One more version of the patch with slightly more deterministic tests.
>
>
> Mitar
>
> On Thu, Dec 27, 2018 at 11:43 PM Mitar <mmitar@gmail.com> wrote:
> >
> > Hi!
> >
> > I have made an updated version of the patch, added tests and
> > documentation changes. This is my view now a complete patch. Please
> > provide any feedback or comments you might have for me to improve the
> > patch. I will also add it to commitfest.
> >
> > A summary of the patch: This patch enables adding AFTER triggers (both
> > ROW and STATEMENT) on materialized views. They are fired when doing
> > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
> > Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> > is based on some discussion on the mailing list because implementing
> > it for without CONCURRENTLY would require us to add logic for firing
> > triggers where there was none before (and is just an efficient heap
> > swap).
> >
> > To be able to create a materialized view without data, specify
> > triggers, and REFRESH CONCURRENTLY so that those triggers would be
> > called also for initial data, I have tested and determined that there
> > is no reason why REFRESH CONCURRENTLY could not be run on
> > uninitialized materialized view. So I removed that check and things
> > seem to just work. Including triggers being called for initial data. I
> > think this makes REFRESH CONCURRENTLY have one less special case which
> > is in general nicer.
> >
> > I have run tests and all old tests still succeed. I have added more
> > tests for the new feature.
> >
> > I have run benchmark to evaluate the impact of me changing
> > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
> > rows which were just updated. In fact it seems this improves
> > performance slightly (4% in my benchmark, mean over 10 runs). I guess
> > that this is because it is cheaper to just change one column's values
> > (what benchmark is doing when changing rows) instead of removing and
> > inserting the whole row. Because REFRESH MATERIALIZED VIEW
> > CONCURRENTLY is meant to be used when not a lot of data has been
> > changed anyway, I find this a pleasantly surprising additional
> > improvement in this patch. I am attaching the benchmark script I have
> > used. I compared the time of the final refresh query in the script. (I
> > would love if pgbench could take a custom init script to run before
> > the timed part of the script.)
> >
> >
> > Mitar
> >
> > On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote:
> > >
> > > Hi!
> > >
> > > Based on discussion about observing changes on an open query in a
> > > reactive manner (to support reactive web applications) [1], I
> > > identified that one critical feature is missing to fully implement
> > > discussed design of having reactive queries be represented as
> > > materialized views, and changes to these materialized views would then
> > > be observed and pushed to the client through LISTEN/NOTIFY.
> > >
> > > This is my first time contributing to PostgreSQL, so I hope I am
> > > starting this process well.
> > >
> > > I would like to propose that support for AFTER triggers are added to
> > > materialized views. I experimented a bit and it seems this is mostly
> > > just a question of enabling/exposing them. See attached patch. This
> > > enabled me to add trigger to a material view which mostly worked. Here
> > > are my findings.
> > >
> > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> > > per statement and per row. There are few improvements which could be
> > > done:
> > >
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> > > - In current concurrently refresh logic those insert and remove
> > > operations are made even if there are no changes to be done. Which
> > > triggers a statement trigger unnecessary. A small improvement could be
> > > to skip the statement in that case, but looking at the code this seems
> > > maybe tricky because both each of inserts and deletions are done
> > > inside one query each.
> > > - Current concurrently refresh logic does never do updates on existing
> > > rows. It would be nicer to have that so that triggers are more aligned
> > > with real changes to the data. So current two queries could be changed
> > > to three, each doing one of the insert, update, and delete.
> > >
> > > Non-concurrent refresh does not trigger any trigger. But it seems all
> > > data to do so is there (previous table, new table), at least for the
> > > statement-level trigger. Row-level triggers could also be simulated
> > > probably (with TRUNCATE and INSERT triggers).
> > >
> > > [1]
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
> > >
> > >
> > > Mitar
> > >
> > > --
> > > http://mitar.tnode.com/
> > > https://twitter.com/mitar_m
> >
> >
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

False alarm. It just looks like updating patches takes longer than
updating e-mails.


Mitar

On Fri, Dec 28, 2018 at 12:11 AM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> Hm, why in commitfest it does not display the latest patch?
>
> https://commitfest.postgresql.org/21/1953/
>
> It does display correctly the latest e-mail, but not the link to the patch. :-(
>
>
> Mitar
>
> On Thu, Dec 27, 2018 at 11:51 PM Mitar <mmitar@gmail.com> wrote:
> >
> > Hi!
> >
> > One more version of the patch with slightly more deterministic tests.
> >
> >
> > Mitar
> >
> > On Thu, Dec 27, 2018 at 11:43 PM Mitar <mmitar@gmail.com> wrote:
> > >
> > > Hi!
> > >
> > > I have made an updated version of the patch, added tests and
> > > documentation changes. This is my view now a complete patch. Please
> > > provide any feedback or comments you might have for me to improve the
> > > patch. I will also add it to commitfest.
> > >
> > > A summary of the patch: This patch enables adding AFTER triggers (both
> > > ROW and STATEMENT) on materialized views. They are fired when doing
> > > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
> > > Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> > > is based on some discussion on the mailing list because implementing
> > > it for without CONCURRENTLY would require us to add logic for firing
> > > triggers where there was none before (and is just an efficient heap
> > > swap).
> > >
> > > To be able to create a materialized view without data, specify
> > > triggers, and REFRESH CONCURRENTLY so that those triggers would be
> > > called also for initial data, I have tested and determined that there
> > > is no reason why REFRESH CONCURRENTLY could not be run on
> > > uninitialized materialized view. So I removed that check and things
> > > seem to just work. Including triggers being called for initial data. I
> > > think this makes REFRESH CONCURRENTLY have one less special case which
> > > is in general nicer.
> > >
> > > I have run tests and all old tests still succeed. I have added more
> > > tests for the new feature.
> > >
> > > I have run benchmark to evaluate the impact of me changing
> > > refresh_by_match_merge to do UPDATE instead of DELETE and INSERT for
> > > rows which were just updated. In fact it seems this improves
> > > performance slightly (4% in my benchmark, mean over 10 runs). I guess
> > > that this is because it is cheaper to just change one column's values
> > > (what benchmark is doing when changing rows) instead of removing and
> > > inserting the whole row. Because REFRESH MATERIALIZED VIEW
> > > CONCURRENTLY is meant to be used when not a lot of data has been
> > > changed anyway, I find this a pleasantly surprising additional
> > > improvement in this patch. I am attaching the benchmark script I have
> > > used. I compared the time of the final refresh query in the script. (I
> > > would love if pgbench could take a custom init script to run before
> > > the timed part of the script.)
> > >
> > >
> > > Mitar
> > >
> > > On Mon, Dec 24, 2018 at 12:59 PM Mitar <mmitar@gmail.com> wrote:
> > > >
> > > > Hi!
> > > >
> > > > Based on discussion about observing changes on an open query in a
> > > > reactive manner (to support reactive web applications) [1], I
> > > > identified that one critical feature is missing to fully implement
> > > > discussed design of having reactive queries be represented as
> > > > materialized views, and changes to these materialized views would then
> > > > be observed and pushed to the client through LISTEN/NOTIFY.
> > > >
> > > > This is my first time contributing to PostgreSQL, so I hope I am
> > > > starting this process well.
> > > >
> > > > I would like to propose that support for AFTER triggers are added to
> > > > materialized views. I experimented a bit and it seems this is mostly
> > > > just a question of enabling/exposing them. See attached patch. This
> > > > enabled me to add trigger to a material view which mostly worked. Here
> > > > are my findings.
> > > >
> > > > Running REFRESH MATERIALIZED VIEW CONCURRENTLY calls triggers. Both
> > > > per statement and per row. There are few improvements which could be
> > > > done:
> > > >
> > > > - Currently only insert and remove operations are done on the
> > > > materialized view. This is because the current logic just removes
> > > > changed rows and inserts new rows.
> > > > - In current concurrently refresh logic those insert and remove
> > > > operations are made even if there are no changes to be done. Which
> > > > triggers a statement trigger unnecessary. A small improvement could be
> > > > to skip the statement in that case, but looking at the code this seems
> > > > maybe tricky because both each of inserts and deletions are done
> > > > inside one query each.
> > > > - Current concurrently refresh logic does never do updates on existing
> > > > rows. It would be nicer to have that so that triggers are more aligned
> > > > with real changes to the data. So current two queries could be changed
> > > > to three, each doing one of the insert, update, and delete.
> > > >
> > > > Non-concurrent refresh does not trigger any trigger. But it seems all
> > > > data to do so is there (previous table, new table), at least for the
> > > > statement-level trigger. Row-level triggers could also be simulated
> > > > probably (with TRUNCATE and INSERT triggers).
> > > >
> > > > [1]
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
> > > >
> > > >
> > > > Mitar
> > > >
> > > > --
> > > > http://mitar.tnode.com/
> > > > https://twitter.com/mitar_m
> > >
> > >
> > >
> > > --
> > > http://mitar.tnode.com/
> > > https://twitter.com/mitar_m
> >
> >
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Peter Eisentraut
Дата:
On 28/12/2018 08:43, Mitar wrote:
> A summary of the patch: This patch enables adding AFTER triggers (both
> ROW and STATEMENT) on materialized views. They are fired when doing
> REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.

What bothers me about this patch is that it subtly changes what a
trigger means.  It currently means, say, INSERT was executed on this
table.  You are expanding that to mean, a row was inserted into this
table -- somehow.

Triggers should generally refer to user-facing commands.  Could you not
make a trigger on REFRESH itself?

> Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> is based on some discussion on the mailing list because implementing
> it for without CONCURRENTLY would require us to add logic for firing
> triggers where there was none before (and is just an efficient heap
> swap).

This is also a problem, because it would allow bypassing the trigger
accidentally.

Moreover, consider that there could be updatable materialized views,
just like there are updatable normal views.  And there could be triggers
on those updatable materialized views.  Those would look similar but
work quite differently from what you are proposing here.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

I am new to contributing to PostgreSQL and this is my first time
having patches in commit fest, so I am not sure about details of the
process here, but I assume that replying and discuss the patch during
this period is one of the actives, so I am replying to the comment. If
I should wait or something like that, please advise.

On Fri, Jan 4, 2019 at 3:23 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> > A summary of the patch: This patch enables adding AFTER triggers (both
> > ROW and STATEMENT) on materialized views. They are fired when doing
> > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
>
> What bothers me about this patch is that it subtly changes what a
> trigger means.  It currently means, say, INSERT was executed on this
> table.  You are expanding that to mean, a row was inserted into this
> table -- somehow.

Aren't almost all statements these days generated by some sort of
automatic logic? Which generates those INSERTs and then you get
triggers on them? I am not sure where is this big difference in your
view coming from? Triggers are not defined as "user-made INSERT was
executed on this table". I think it has always been defined as "INSERT
modified this table", no matter where this insert came from (from
user, from some other trigger, by backup process). I mean, this is the
beauty of declarative programming. You define it once and you do not
care who triggers it.

Materialized views are anyway just built-in implementation of tables +
triggers to rerun the query. You could reconstruct them manually. And
why would not triggers be called if tables is being modified through
INSERTs? So if PostgreSQL has such a feature, why make it limited and
artificially make it less powerful? It is literally not possible to
have triggers only because there is "if trigger on a materialized
view, throw an exception".

> Triggers should generally refer to user-facing commands

So triggers on table A are not run when some other trigger from table
B decides to insert data into table A? Not true. I think triggers have
never cared who and where an INSERT came from. They just trigger. From
user, from another trigger, or from some built-in PostgreSQL procedure
called REFRESH.

> Could you not make a trigger on REFRESH itself?

If you mean if I could simulate this somehow before or after I call
REFRESH, then not really. I would not have access to previous state of
the table to compute the diff anymore. Moreover, I would have to
recompute the diff again, when REFRESH already did it once.

I could implement materialized views myself using regular tables and
triggers. And then have triggers after change on that table. But this
sounds very sad.

Or, are you saying that we should introduce a whole new type of of
trigger, REFRESH trigger, which would be valid only on materialized
views, and get OLD and NEW relations for previous and old state? I
think this could be an option, but it would require much more work,
and more changes to API. Is this what community would prefer?

> This is also a problem, because it would allow bypassing the trigger
> accidentally.

Sure, this is why it is useful to explain that CONCURRENT REFRESH uses
INSERT/UPDATE/DELETE and this is why you get triggers, and REFRESH
does not (but it is faster). I explained this in documentation.

But yes, this is downside. I checked the idea of calling row-level
triggers after regular REFRESH, but it seems it will introduce a lot
of overhead and special handling. I tried implementing it as TRUNCATE
+ INSERTS instead of heap swap and it is 2x slower.

> Moreover, consider that there could be updatable materialized views,
> just like there are updatable normal views.  And there could be triggers
> on those updatable materialized views.  Those would look similar but
> work quite differently from what you are proposing here.

Hm, not really. I would claim they would behave exactly the same.
AFTER trigger on INSERT on a materialized view would trigger for rows
which have changed through user updating materialized view directly,
or by calling CONCURRENT REFRESH which inserted a row. In both cases
the same trigger would run because materialized view had a row
inserted. Pretty nice.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
Nguyễn Trần Quốc Vinh
Дата:
Dear,

You can try https://github.com/ntqvinh/PgMvIncrementalUpdate to generate triggers in C for incremental updates of matviews.

For asynchronous updates, the tool does generate the triggers for collecting updated/inserted/deleted rows and then the codes for doing incremental updating as well.

Tks and best regards,

Vinh



On Sat, Jan 5, 2019 at 5:10 AM Mitar <mmitar@gmail.com> wrote:
Hi!

I am new to contributing to PostgreSQL and this is my first time
having patches in commit fest, so I am not sure about details of the
process here, but I assume that replying and discuss the patch during
this period is one of the actives, so I am replying to the comment. If
I should wait or something like that, please advise.

On Fri, Jan 4, 2019 at 3:23 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> > A summary of the patch: This patch enables adding AFTER triggers (both
> > ROW and STATEMENT) on materialized views. They are fired when doing
> > REFRESH MATERIALIZED VIEW CONCURRENTLY for rows which have changed.
>
> What bothers me about this patch is that it subtly changes what a
> trigger means.  It currently means, say, INSERT was executed on this
> table.  You are expanding that to mean, a row was inserted into this
> table -- somehow.

Aren't almost all statements these days generated by some sort of
automatic logic? Which generates those INSERTs and then you get
triggers on them? I am not sure where is this big difference in your
view coming from? Triggers are not defined as "user-made INSERT was
executed on this table". I think it has always been defined as "INSERT
modified this table", no matter where this insert came from (from
user, from some other trigger, by backup process). I mean, this is the
beauty of declarative programming. You define it once and you do not
care who triggers it.

Materialized views are anyway just built-in implementation of tables +
triggers to rerun the query. You could reconstruct them manually. And
why would not triggers be called if tables is being modified through
INSERTs? So if PostgreSQL has such a feature, why make it limited and
artificially make it less powerful? It is literally not possible to
have triggers only because there is "if trigger on a materialized
view, throw an exception".

> Triggers should generally refer to user-facing commands

So triggers on table A are not run when some other trigger from table
B decides to insert data into table A? Not true. I think triggers have
never cared who and where an INSERT came from. They just trigger. From
user, from another trigger, or from some built-in PostgreSQL procedure
called REFRESH.

> Could you not make a trigger on REFRESH itself?

If you mean if I could simulate this somehow before or after I call
REFRESH, then not really. I would not have access to previous state of
the table to compute the diff anymore. Moreover, I would have to
recompute the diff again, when REFRESH already did it once.

I could implement materialized views myself using regular tables and
triggers. And then have triggers after change on that table. But this
sounds very sad.

Or, are you saying that we should introduce a whole new type of of
trigger, REFRESH trigger, which would be valid only on materialized
views, and get OLD and NEW relations for previous and old state? I
think this could be an option, but it would require much more work,
and more changes to API. Is this what community would prefer?

> This is also a problem, because it would allow bypassing the trigger
> accidentally.

Sure, this is why it is useful to explain that CONCURRENT REFRESH uses
INSERT/UPDATE/DELETE and this is why you get triggers, and REFRESH
does not (but it is faster). I explained this in documentation.

But yes, this is downside. I checked the idea of calling row-level
triggers after regular REFRESH, but it seems it will introduce a lot
of overhead and special handling. I tried implementing it as TRUNCATE
+ INSERTS instead of heap swap and it is 2x slower.

> Moreover, consider that there could be updatable materialized views,
> just like there are updatable normal views.  And there could be triggers
> on those updatable materialized views.  Those would look similar but
> work quite differently from what you are proposing here.

Hm, not really. I would claim they would behave exactly the same.
AFTER trigger on INSERT on a materialized view would trigger for rows
which have changed through user updating materialized view directly,
or by calling CONCURRENT REFRESH which inserted a row. In both cases
the same trigger would run because materialized view had a row
inserted. Pretty nice.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Sat, Jan 5, 2019 at 2:53 AM Nguyễn Trần Quốc Vinh
<ntquocvinh@gmail.com> wrote:
> You can try https://github.com/ntqvinh/PgMvIncrementalUpdate to generate triggers in C for incremental updates of
matviews.
>
> For asynchronous updates, the tool does generate the triggers for collecting updated/inserted/deleted rows and then
thecodes for doing incremental updating as well. 

Thank you for sharing this. This looks interesting, but I could not
test it myself (not using Windows), so I just read through the code.

Having better updating of materialized views using incremental
approach would really benefit my use case as well. Then triggers being
added through my patch here on materialized view itself could
communicate those changes which were done to the client. If I
understand things correctly, this IVM would benefit the speed of how
quickly we can do refreshes, and also if would allow that we call
refresh on a materialized view for every change on the source tables,
knowing exactly what we have to update in the materialized view.
Really cool. I also see that there was recently more discussion about
IVM on the mailing list. [1]

[1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp
[2] https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30@QQdd.eu


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Re: Feature: triggers on materialized views

От
David Steele
Дата:
On 1/5/19 11:57 PM, Mitar wrote:
> 
> Having better updating of materialized views using incremental
> approach would really benefit my use case as well. Then triggers being
> added through my patch here on materialized view itself could
> communicate those changes which were done to the client. If I
> understand things correctly, this IVM would benefit the speed of how
> quickly we can do refreshes, and also if would allow that we call
> refresh on a materialized view for every change on the source tables,
> knowing exactly what we have to update in the materialized view.
> Really cool. I also see that there was recently more discussion about
> IVM on the mailing list. [1]

There doesn't seem to be consensus on whether or not we want this patch. 
  Peter has issues with the way it works and Andres [1] thinks it should 
be pushed to PG13 or possibly rejected.

I'll push this to PG13 for now.

Regards,
-- 
-David
david@pgmasters.net

[1] 
https://www.postgresql.org/message-id/20190216054526.zss2cufdxfeudr4i%40alap3.anarazel.de


Re: Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Thu, Mar 7, 2019 at 12:13 AM David Steele <david@pgmasters.net> wrote:
> There doesn't seem to be consensus on whether or not we want this patch.
> Peter has issues with the way it works and Andres [1] thinks it should
> be pushed to PG13 or possibly rejected.
>
> I'll push this to PG13 for now.

Sorry, I am new to PostgreSQL development process. So this has been
pushed for maybe (if at all) release planned for 2020 and is not
anymore in consideration for PG12 to be released this year? From my
very inexperienced eye this looks like a very far push. What is
expected to happen in the year which would make it clearer if this is
something which has a chance of going and/or what should be improved,
if improving is even an option? I worry that nothing will happen for a
year and we will all forget about this and then we will be all to
square zero.

I must say that i do not really see a reason why this would not be
included. I mean, materialized views are really just a sugar on top of
having a table you refresh with a stored query, and if that table can
have triggers, why not also a materialized view.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
David Steele
Дата:
On 3/14/19 12:05 PM, Mitar wrote:
> Hi!
> 
> On Thu, Mar 7, 2019 at 12:13 AM David Steele <david@pgmasters.net> wrote:
>> There doesn't seem to be consensus on whether or not we want this patch.
>> Peter has issues with the way it works and Andres [1] thinks it should
>> be pushed to PG13 or possibly rejected.
>>
>> I'll push this to PG13 for now.
> 
> Sorry, I am new to PostgreSQL development process. So this has been
> pushed for maybe (if at all) release planned for 2020 and is not
> anymore in consideration for PG12 to be released this year? From my
> very inexperienced eye this looks like a very far push. What is
> expected to happen in the year which would make it clearer if this is
> something which has a chance of going and/or what should be improved,
> if improving is even an option? I worry that nothing will happen for a
> year and we will all forget about this and then we will be all to
> square zero.
> 
> I must say that i do not really see a reason why this would not be
> included. I mean, materialized views are really just a sugar on top of
> having a table you refresh with a stored query, and if that table can
> have triggers, why not also a materialized view.

The reason is that you have not gotten any committer support for this 
patch or attracted significant review, that I can see.  On the contrary, 
three committers have expressed doubts about all or some of the patch 
and it doesn't seem to me that their issues have been addressed.

This is also a relatively new patch which makes large changes -- we 
generally like to get those in earlier than the second-to-last CF.

I can only spend so much time looking at each patch, so Peter, Álvaro, 
or Andres are welcome to jump in and let me know if I have it wrong.

What you need to be doing for PG13 is very specifically addressing 
committer concerns and gathering a consensus that the behavior of this 
patch is the way to go.

Regards,
-- 
-David
david@pgmasters.net


Re: Feature: triggers on materialized views

От
Mitar
Дата:
Hi!

On Fri, Mar 15, 2019 at 2:46 AM David Steele <david@pgmasters.net> wrote:
> The reason is that you have not gotten any committer support for this
> patch or attracted significant review, that I can see.  On the contrary,
> three committers have expressed doubts about all or some of the patch
> and it doesn't seem to me that their issues have been addressed.

To my understanding many comments were to early version of the patch
which were or addressed or explained why proposed changes not work
(use TRUNCATE/INSERT instead of swapping heads is slower). If I missed
anything and have not addressed it, please point this out.

The only pending/unaddressed comment is about the philosophical
question of what it means to be a trigger. There it seems we simply
disagree with the reviewer and I do not know how to address that. I
just see this as a very pragmatical feature which provides features
you would have if you would not use PostgreSQL abstraction. If you can
use features then, why not also with the abstraction?

So in my view this looks more like a lack of review feedback on the
latest version of the patch. I really do not know how to ask for more
feedback or to move the philosophical discussion further. I thought
that commit fest is in fact exactly a place to motivate and collect
such feedback instead of waiting for it in the limbo.

> What you need to be doing for PG13 is very specifically addressing
> committer concerns and gathering a consensus that the behavior of this
> patch is the way to go.

To my understanding the current patch addresses all concerns made by
reviewers on older versions of the patch, or explains why proposals
cannot work out, modulo the question of "does this change what trigger
is".

Moreover, it improves performance of CONCURRENT REFRESH for about 5%
based on my tests because of the split to INSERT/UPDATE/DELETE instead
of TRUNCATE/INSERT, when measuring across a mixed set of queries which
include just UPDATEs to source tables.

Thank you everyone who is involved in this process for your time, I do
appreciate. I am just trying to explain that I am a bit at loss on
concrete next steps I could take here.

Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Feature: triggers on materialized views

От
David Steele
Дата:
On 3/15/19 8:15 PM, Mitar wrote:
> 
> The only pending/unaddressed comment is about the philosophical
> question of what it means to be a trigger. There it seems we simply
> disagree with the reviewer and I do not know how to address that. I
> just see this as a very pragmatical feature which provides features
> you would have if you would not use PostgreSQL abstraction. If you can
> use features then, why not also with the abstraction?

This seems to be a pretty big deal to me.  When the reviewer is also a 
committer I think you need to give serious thought to their objections.

> So in my view this looks more like a lack of review feedback on the
> latest version of the patch. I really do not know how to ask for more
> feedback or to move the philosophical discussion further. I thought
> that commit fest is in fact exactly a place to motivate and collect
> such feedback instead of waiting for it in the limbo.

Yes, but sometimes these things take time.

>> What you need to be doing for PG13 is very specifically addressing
>> committer concerns and gathering a consensus that the behavior of this
>> patch is the way to go.
> 
> To my understanding the current patch addresses all concerns made by
> reviewers on older versions of the patch, or explains why proposals
> cannot work out, modulo the question of "does this change what trigger
> is".

Still a pretty important question...

> Thank you everyone who is involved in this process for your time, I do
> appreciate. I am just trying to explain that I am a bit at loss on
> concrete next steps I could take here.

This is the last commitfest, so committers and reviewers are focused on 
what is most likely to make it into PG12.  Your patch does not seem to 
be attracting the attention it needs to make it into this release.

Regards,
-- 
-David
david@pgmasters.net


Re: Feature: triggers on materialized views

От
Robert Haas
Дата:
On Tue, Dec 25, 2018 at 10:05 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Well, REFRESH CONCURRENTLY runs completely different than non-concurrent
> REFRESH.  The former updates the existing data by observing the
> differences with the previous data; the latter simply re-runs the query
> and overwrites everything.  So if you simply enabled triggers on
> non-concurrent refresh, you'd just see a bunch of inserts into a
> throwaway data area (a transient relfilenode, we call it), then a swap
> of the MV's relfilenode with the throwaway one.  I doubt it'd be useful.
> But then I'm not clear *why* you would like to do a non-concurrent
> refresh.  Maybe your situation would be best served by forbidding non-
> concurrent refresh when the MV contains any triggers.
>
> Alternatively, maybe reimplement non-concurrent refresh so that it works
> identically to concurrent refresh (except with a stronger lock).  Not
> sure if this implies any performance penalties.

Sorry to jump in late, but all of this sounds very strange to me.
It's possible for either concurrent or non-concurrent refresh to be
faster, depending on the circumstances; for example, if a concurrent
refresh would end up deleting all the rows and inserting them again, I
think that could be slower than just blowing all the data away and
starting over.  So disabling non-concurrent refresh sounds like a bad
idea.  For the same reason, reimplementing it to work like a
concurrent refresh also sounds like a bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Feature: triggers on materialized views

От
Robert Haas
Дата:
On Fri, Jan 4, 2019 at 6:23 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> What bothers me about this patch is that it subtly changes what a
> trigger means.  It currently means, say, INSERT was executed on this
> table.  You are expanding that to mean, a row was inserted into this
> table -- somehow.

Yeah.  The fact that a concurrent refresh currently does DELETE+INSERT
rather than UPDATE is currently an implementation detail.  If you
allow users to hook up triggers to the inserts, then suddenly it's no
longer an implementation detail: it is a user-visible behavior that
can't be changed in the future without breaking backward
compatibility.

> Triggers should generally refer to user-facing commands.  Could you not
> make a trigger on REFRESH itself?

I'm not sure that would help with the use case... but that seems like
something to think about, especially if it could use the transition
table machinery somehow.

> > Triggers are not fired if you call REFRESH without CONCURRENTLY. This
> > is based on some discussion on the mailing list because implementing
> > it for without CONCURRENTLY would require us to add logic for firing
> > triggers where there was none before (and is just an efficient heap
> > swap).
>
> This is also a problem, because it would allow bypassing the trigger
> accidentally.
>
> Moreover, consider that there could be updatable materialized views,
> just like there are updatable normal views.  And there could be triggers
> on those updatable materialized views.  Those would look similar but
> work quite differently from what you are proposing here.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Feature: triggers on materialized views

От
Michael Paquier
Дата:
On Thu, Mar 21, 2019 at 03:41:08PM -0400, Robert Haas wrote:
> Yeah.  The fact that a concurrent refresh currently does DELETE+INSERT
> rather than UPDATE is currently an implementation detail.  If you
> allow users to hook up triggers to the inserts, then suddenly it's no
> longer an implementation detail: it is a user-visible behavior that
> can't be changed in the future without breaking backward
> compatibility.

We are visibly going nowhere for this thread for v12, so I have marked
the proposal as returned with feedback.
--
Michael

Вложения