Обсуждение: Re: [HACKERS] transition table behavior with inheritance appearsbroken (was: Declarative partitioning - another take)

Поиск
Список
Период
Сортировка
On Mon, May 1, 2017 at 12:51 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> What we could document now is that partitioned tables don't allow
> specifying triggers that reference transition tables.  Although, I am
> wondering where this note really belongs - the partitioning chapter, the
> triggers chapter or the CREATE TRIGGER reference page?  Maybe, Kevin and
> Thomas have something to say about that.  If it turns out that the
> partitioning chapter is a good place, here is a documentation patch.

I think that before we document this behavior, we'd better make sure
we understand exactly what the behavior is, and we'd better make sure
it's correct.  Currently, triggers that involve transition tables are
altogether prohibited when the root relation is partitioned, but are
allowed in inheritance cases.  However, the actual behavior appears to
be buggy.  Here's what happens when I define a parent and a child and
update all the rows:

rhaas=# CREATE FUNCTION t() RETURNS trigger
rhaas-#     LANGUAGE plpgsql
rhaas-#     AS $$declare q record; begin raise notice 'table %',
tg_table_name; for q in select * from old loop raise notice 'table %
got value %', tg_table_name, q.a; end loop; return null; end;$$;
CREATE FUNCTION
rhaas=# CREATE TABLE p (a int, b text);
CREATE TABLE
rhaas=# CREATE TABLE p1 () INHERITS (p);
CREATE TABLE
rhaas=# CREATE TRIGGER x AFTER UPDATE ON p REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE t();
CREATE TRIGGER
rhaas=# INSERT INTO p VALUES (0, 'zero');
INSERT 0 1
rhaas=# INSERT INTO p1 VALUES (1, 'one');
INSERT 0 1
rhaas=# INSERT INTO p1 VALUES (2, 'two');
INSERT 0 1
rhaas=# UPDATE p SET b = 'whatever';
NOTICE:  table p
NOTICE:  table p got value 0
UPDATE 3

Only the rows in the parent show up in the transition table.  But now
look what happens if I add an unrelated trigger that also uses
transition tables to the children:

rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS
$$begin null; end$$;
CREATE FUNCTION
rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS
old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u();
CREATE TRIGGER
rhaas=# UPDATE p SET b = 'whatever';
NOTICE:  table p
NOTICE:  table p got value 0
NOTICE:  table p got value 1
NOTICE:  table p got value 2
UPDATE 3

It seems pretty clear to me that this is busted.  The existence of
trigger x1 on p1 shouldn't affect whether trigger x on p sees changes
to p1's rows in its transition tables.  Either all changes to any
descendants of p should be captured by the transition tables, or only
changes to the root table should be captured.  If we do the former,
the restriction against using transition tables in triggers on
partitioned tables should be removed, I would think.  If we do the
latter, then what we should document is not that partitioned tables
have a restriction that doesn't apply to inheritance but rather that
the restriction on the partitioned case flows from the fact that only
the parent's changes are captured, and the parent is always empty in
the partitioning case.  In deciding between these two cases, we should
consider the case where the inheritance children have extra columns
and/or different column orderings.

Adding this as an open item.  Kevin?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> It seems pretty clear to me that this is busted.

I don't think you actually tested anything that is dependent on any
of my patches there.

> Adding this as an open item.  Kevin?

It will take some time to establish what legacy behavior is and how
the new transition tables are impacted.  My first reaction is that a
trigger on the parent should fire for any related action on a child
(unless maybe the trigger is defined with an ONLY keyword???) using
the TupleDesc of the parent.  Note that the SQL spec mandates that
even in a AFTER EACH ROW trigger the transition tables must
represent all rows affected by the STATEMENT.  I think that this
should be independent of triggers fired at the row level.  I think
the rules should be similar for updateable views.

This will take some time to investigate, discuss and produce a
patch.  I think best case is Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It seems pretty clear to me that this is busted.
>
> I don't think you actually tested anything that is dependent on any
> of my patches there.

I was testing which rows show up in a transition table, so I assumed
that was related to the transition tables patch.  Note that this is
not about which triggers are fired, just about how inheritance
interacts with transition tables.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, May 1, 2017 at 11:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, May 1, 2017 at 12:10 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> It seems pretty clear to me that this is busted.
>>
>> I don't think you actually tested anything that is dependent on any
>> of my patches there.
>
> I was testing which rows show up in a transition table, so I assumed
> that was related to the transition tables patch.  Note that this is
> not about which triggers are fired, just about how inheritance
> interacts with transition tables.

Yeah, I got confused a bit there, comparing to the updateable views case.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Tue, May 2, 2017 at 3:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> [...]
> Only the rows in the parent show up in the transition table.  But now
> look what happens if I add an unrelated trigger that also uses
> transition tables to the children:
>
> rhaas=# CREATE FUNCTION u() RETURNS trigger LANGUAGE plpgsql AS
> $$begin null; end$$;
> CREATE FUNCTION
> rhaas=# CREATE TRIGGER x1 AFTER UPDATE ON p1 REFERENCING OLD TABLE AS
> old NEW TABLE AS new FOR EACH STATEMENT EXECUTE PROCEDURE u();
> CREATE TRIGGER
> rhaas=# UPDATE p SET b = 'whatever';
> NOTICE:  table p
> NOTICE:  table p got value 0
> NOTICE:  table p got value 1
> NOTICE:  table p got value 2
> UPDATE 3
>
> It seems pretty clear to me that this is busted.  The existence of
> trigger x1 on p1 shouldn't affect whether trigger x on p sees changes
> to p1's rows in its transition tables.

Yikes -- agreed.  See analysis and draft patch for discussion below.

> Either all changes to any
> descendants of p should be captured by the transition tables, or only
> changes to the root table should be captured.  If we do the former,
> the restriction against using transition tables in triggers on
> partitioned tables should be removed, I would think.  If we do the
> latter, then what we should document is not that partitioned tables
> have a restriction that doesn't apply to inheritance but rather that
> the restriction on the partitioned case flows from the fact that only
> the parent's changes are captured, and the parent is always empty in
> the partitioning case.  In deciding between these two cases, we should
> consider the case where the inheritance children have extra columns
> and/or different column orderings.

I think that we should only capture transition tuples captured from
the explicitly named relation, since we only fire AFTER STATEMENT
triggers on that relation.  I see no inconsistency with the policy of
rejecting transition tables on partitioned tables (as I proposed and
Kevin accepted[1]), because partitioned tables can't have any data so
there would be no point.  In contrast, every parent table in an
inheritance hierarchy is also a regular table and can hold data, so I
think we should allow transition tables on them, and capture
transition tuples from that table only when you modify it directly.

The transition table infrastructure only supports exactly one relation
being modified at each query level, and it's a bug that this example
captures tuples from p1 into the tuplestore intended for p's tuples
even though it is not even going to fire the after statement trigger
on p1.  It's only a coincidence that the tuples have compatible
TupleDescs.

The pre-existing behaviour for triggers with inheritance is that
STATEMENT triggers fire only for the directly named relation, but ROW
triggers fire for all affected relations.  The transition table patch
didn't change that, but it added code to AfterTriggerSaveEvent, which
is called by ExecAR(Insert|Update|Delete)Triggers, to capture
transitions.  That gets called for every updated relation (ie
including partitions and inheritance sub-tables) to implement the ROW
policy.  It needs to be taught not to capture transition tuples from
relations except the one directly named.

One solution to this problem is for nodeModifyTable.c to tell the
ExecAR* functions explicitly whether to capture transition tuples.  It
knows when it has modified the explicitly named relation in a
hierarchy (mt_whichplan == 0) without rerouting via a partitioned
table (mt_partition_dispatch_info == NULL).  See attached patch for
discussion (it lacks tests and needs better comments).  Does this make
sense?  Do you see a better way?

[1] https://www.postgresql.org/message-id/CACjxUsNhdm4ZCgaVreLK5kAwHTZUkqJAVXiywwi-HNVsuTLMnA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Tue, May 2, 2017 at 9:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I think that we should only capture transition tuples captured from
> the explicitly named relation, since we only fire AFTER STATEMENT
> triggers on that relation.  I see no inconsistency with the policy of
> rejecting transition tables on partitioned tables (as I proposed and
> Kevin accepted[1]), because partitioned tables can't have any data so
> there would be no point.  In contrast, every parent table in an
> inheritance hierarchy is also a regular table and can hold data, so I
> think we should allow transition tables on them, and capture
> transition tuples from that table only when you modify it directly.

I suspect that most users would find it more useful to capture all of
the rows that the statement actually touched, regardless of whether
they hit the named table or an inheritance child.  I just don't know
if it's practical to make that work.  (And, of course, I don't know if
other people agree with my assessment of what is useful ... but
generally there seems to be support for making partitioned tables, at
least, look more like a single table that happens to have partitions
and less like a bunch of separate tables attached to each other with
duct tape.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas wrote:

> I suspect that most users would find it more useful to capture all of
> the rows that the statement actually touched, regardless of whether
> they hit the named table or an inheritance child.

Yes, agreed.  For the plain inheritance cases each row would need to
have an indicator of which relation it comes from (tableoid); I'm not
sure if such a thing would be useful in the partitioning case.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, May 3, 2017 at 12:02 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I suspect that most users would find it more useful to capture all of
>> the rows that the statement actually touched, regardless of whether
>> they hit the named table or an inheritance child.
>
> Yes, agreed.  For the plain inheritance cases each row would need to
> have an indicator of which relation it comes from (tableoid); I'm not
> sure if such a thing would be useful in the partitioning case.

I think it would be about equally useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 03, 2017 at 11:47:04AM -0400, Robert Haas wrote:
> On Tue, May 2, 2017 at 9:44 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > I think that we should only capture transition tuples captured from
> > the explicitly named relation, since we only fire AFTER STATEMENT
> > triggers on that relation.  I see no inconsistency with the policy of
> > rejecting transition tables on partitioned tables (as I proposed and
> > Kevin accepted[1]), because partitioned tables can't have any data so
> > there would be no point.  In contrast, every parent table in an
> > inheritance hierarchy is also a regular table and can hold data, so I
> > think we should allow transition tables on them, and capture
> > transition tuples from that table only when you modify it directly.
> 
> I suspect that most users would find it more useful to capture all of
> the rows that the statement actually touched, regardless of whether
> they hit the named table or an inheritance child.  I just don't know
> if it's practical to make that work.  (And, of course, I don't know if
> other people agree with my assessment of what is useful ... but
> generally there seems to be support for making partitioned tables, at
> least, look more like a single table that happens to have partitions
> and less like a bunch of separate tables attached to each other with
> duct tape.)

+1 on the not-duct-tape view of partitioned tables.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I suspect that most users would find it more useful to capture all of
>> the rows that the statement actually touched, regardless of whether
>> they hit the named table or an inheritance child.
>
> Yes, agreed.  For the plain inheritance cases each row would need to
> have an indicator of which relation it comes from (tableoid); I'm not
> sure if such a thing would be useful in the partitioning case.

On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
> +1 on the not-duct-tape view of partitioned tables.

Hmm.  Ok.  Are we talking about PG10 or PG11 here?  Does this approach
makes sense?

1.  Remove the prohibition on creating transition-capturing triggers
on a partitioned table.

2.  Make sure that the ExecAR* functions call AfterTriggerSaveEvent
when modifying partition tables if the explicitly named parent
partitioned table has after triggers with transition tables.  Not sure
how exactly how but doesn't seem difficult.

3.  Convert tuples to the TupleDesc of the relation that owns the
statement trigger (ie the partitioned table) when inserting them into
the tuplestore.  One way to do that might be to build an array of
TupleConversionMap objects that does the opposite of the conversions
done by tup_conv_maps.  While tup_conv_maps is for converting tuples
to the layout needed for a partition, tup_unconv_maps (or better name)
would be for converting the old and new tuples to the TupleDesc of the
partitioned table.  Then the appropriate TupleConversionMap could be
passed into the ExecAR* functions as a new argument 'transition_map'.
AfterTriggerSaveEvent would use 'oldtup' and 'newtup' directly for ROW
triggers, but convert using the passed in map if it needs to insert
them into the transition tuplestores.

The same thing could work for inheritance, if tupconvert.c had a new
kind of conversion that allows slicing of tuples (converting a wider
child table's tuples to the parent's subset of columns) rather the
just conversion between logically equivalent TupleDescs.

To avoid the whiff of duct tape, we'd probably also want to make ROW
triggers created on the partitioned table(s) containing partition to
fire too, with appropriate TypeConversionMap treatment.  Not sure what
exactly is involved there.

On the other hand, doing that with inheritance hierarchies would be an
incompatible behavioural change, which I guess people don't want -- am
I right?

-- 
Thomas Munro
http://www.enterprisedb.com



On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Robert Haas wrote:
>>> I suspect that most users would find it more useful to capture all of
>>> the rows that the statement actually touched, regardless of whether
>>> they hit the named table or an inheritance child.
>>
>> Yes, agreed.  For the plain inheritance cases each row would need to
>> have an indicator of which relation it comes from (tableoid); I'm not
>> sure if such a thing would be useful in the partitioning case.
>
> On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
>> +1 on the not-duct-tape view of partitioned tables.
>
> Hmm.  Ok.  Are we talking about PG10 or PG11 here?  Does this approach
> makes sense?

I was thinking PG10 if it can be done straightforwardly.

> 1.  Remove the prohibition on creating transition-capturing triggers
> on a partitioned table.
>
> 2.  Make sure that the ExecAR* functions call AfterTriggerSaveEvent
> when modifying partition tables if the explicitly named parent
> partitioned table has after triggers with transition tables.  Not sure
> how exactly how but doesn't seem difficult.
>
> 3.  Convert tuples to the TupleDesc of the relation that owns the
> statement trigger (ie the partitioned table) when inserting them into
> the tuplestore.  One way to do that might be to build an array of
> TupleConversionMap objects that does the opposite of the conversions
> done by tup_conv_maps.  While tup_conv_maps is for converting tuples
> to the layout needed for a partition, tup_unconv_maps (or better name)
> would be for converting the old and new tuples to the TupleDesc of the
> partitioned table.  Then the appropriate TupleConversionMap could be
> passed into the ExecAR* functions as a new argument 'transition_map'.
> AfterTriggerSaveEvent would use 'oldtup' and 'newtup' directly for ROW
> triggers, but convert using the passed in map if it needs to insert
> them into the transition tuplestores.
>
> The same thing could work for inheritance, if tupconvert.c had a new
> kind of conversion that allows slicing of tuples (converting a wider
> child table's tuples to the parent's subset of columns) rather the
> just conversion between logically equivalent TupleDescs.
>
> To avoid the whiff of duct tape, we'd probably also want to make ROW
> triggers created on the partitioned table(s) containing partition to
> fire too, with appropriate TypeConversionMap treatment.  Not sure what
> exactly is involved there.
>
> On the other hand, doing that with inheritance hierarchies would be an
> incompatible behavioural change, which I guess people don't want -- am
> I right?

Incompatible with what?  Transition tables haven't been released yet.
If we're going to fix the definition of what they do, now's the time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, May 5, 2017 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> Robert Haas wrote:
>>>> I suspect that most users would find it more useful to capture all of
>>>> the rows that the statement actually touched, regardless of whether
>>>> they hit the named table or an inheritance child.
>>>
>>> Yes, agreed.  For the plain inheritance cases each row would need to
>>> have an indicator of which relation it comes from (tableoid); I'm not
>>> sure if such a thing would be useful in the partitioning case.
>>
>> On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
>>> +1 on the not-duct-tape view of partitioned tables.
>>
>> Hmm.  Ok.  Are we talking about PG10 or PG11 here?  Does this approach
>> makes sense?
>
> I was thinking PG10 if it can be done straightforwardly.

Ok, I will draft a patch to do it the way I described and see what people think.

>> To avoid the whiff of duct tape, we'd probably also want to make ROW
>> triggers created on the partitioned table(s) containing partition to
>> fire too, with appropriate TypeConversionMap treatment.  Not sure what
>> exactly is involved there.
>>
>> On the other hand, doing that with inheritance hierarchies would be an
>> incompatible behavioural change, which I guess people don't want -- am
>> I right?
>
> Incompatible with what?  Transition tables haven't been released yet.
> If we're going to fix the definition of what they do, now's the time.

The last two paragraphs of my email were about ROW triggers created on
partitioned tables and tables with inheritance children, not the new
transition table stuff.  I will forget that for now and look only at
making the transition tables duct-tape-free.

-- 
Thomas Munro
http://www.enterprisedb.com



On Mon, May 01, 2017 at 11:10:52AM -0500, Kevin Grittner wrote:
> On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
> > It seems pretty clear to me that this is busted.
> 
> I don't think you actually tested anything that is dependent on any
> of my patches there.
> 
> > Adding this as an open item.  Kevin?
> 
> It will take some time to establish what legacy behavior is and how
> the new transition tables are impacted.  My first reaction is that a
> trigger on the parent should fire for any related action on a child
> (unless maybe the trigger is defined with an ONLY keyword???) using
> the TupleDesc of the parent.  Note that the SQL spec mandates that
> even in a AFTER EACH ROW trigger the transition tables must
> represent all rows affected by the STATEMENT.  I think that this
> should be independent of triggers fired at the row level.  I think
> the rules should be similar for updateable views.
> 
> This will take some time to investigate, discuss and produce a
> patch.  I think best case is Friday.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Kevin,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Fri, May 5, 2017 at 8:29 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, May 5, 2017 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>>> Robert Haas wrote:
>>>>> I suspect that most users would find it more useful to capture all of
>>>>> the rows that the statement actually touched, regardless of whether
>>>>> they hit the named table or an inheritance child.
>>>>
>>>> Yes, agreed.  For the plain inheritance cases each row would need to
>>>> have an indicator of which relation it comes from (tableoid); I'm not
>>>> sure if such a thing would be useful in the partitioning case.
>>>
>>> On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
>>>> +1 on the not-duct-tape view of partitioned tables.
>>>
>>> Hmm.  Ok.  Are we talking about PG10 or PG11 here?  Does this approach
>>> makes sense?
>>
>> I was thinking PG10 if it can be done straightforwardly.
>
> Ok, I will draft a patch to do it the way I described and see what people think.

FYI I am still working on this and will post a draft patch to do this
(that is: make transition tables capture changes from children with
appropriate tuple conversion) in the next 24 hours.

-- 
Thomas Munro
http://www.enterprisedb.com



On Mon, May 8, 2017 at 7:09 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, May 5, 2017 at 8:29 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Fri, May 5, 2017 at 2:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Thu, May 4, 2017 at 4:46 AM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>> On Thu, May 4, 2017 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>>>> Robert Haas wrote:
>>>>>> I suspect that most users would find it more useful to capture all of
>>>>>> the rows that the statement actually touched, regardless of whether
>>>>>> they hit the named table or an inheritance child.
>>>>>
>>>>> Yes, agreed.  For the plain inheritance cases each row would need to
>>>>> have an indicator of which relation it comes from (tableoid); I'm not
>>>>> sure if such a thing would be useful in the partitioning case.
>>>>
>>>> On Thu, May 4, 2017 at 4:26 AM, David Fetter <david@fetter.org> wrote:
>>>>> +1 on the not-duct-tape view of partitioned tables.
>>>>
>>>> Hmm.  Ok.  Are we talking about PG10 or PG11 here?  Does this approach
>>>> makes sense?
>>>
>>> I was thinking PG10 if it can be done straightforwardly.
>>
>> Ok, I will draft a patch to do it the way I described and see what people think.
>
> FYI I am still working on this and will post a draft patch to do this
> (that is: make transition tables capture changes from children with
> appropriate tuple conversion) in the next 24 hours.

Ok, here is a first swing at it, for discussion.

In master, the decision to populate transition tables happens in
AfterTriggerSaveEvent (called by the ExecAR* functions) in trigger.c.
It does that if it sees that there are triggers on the
relation-being-modified that have transition tables.

With this patch, nodeModifyTuple.c makes a 'TriggerTransitionFilter'
object to override that behaviour, if there are child tables of either
kind.  That is passed to the ExecAR* functions and thence
AfterTriggerSaveEvent, overriding its usual behaviour, so that it can
know that it needs collect tuples from child nodes and how to convert
them to the layout needed for the tuplestores if necessary.

Thoughts?  I'm not yet sure about the locking situation.  Generally
needs some more testing.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Tue, May 9, 2017 at 10:29 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> In master, the decision to populate transition tables happens in
> AfterTriggerSaveEvent (called by the ExecAR* functions) in trigger.c.
> It does that if it sees that there are triggers on the
> relation-being-modified that have transition tables.
>
> With this patch, nodeModifyTuple.c makes a 'TriggerTransitionFilter'
> object to override that behaviour, if there are child tables of either
> kind.  That is passed to the ExecAR* functions and thence
> AfterTriggerSaveEvent, overriding its usual behaviour, so that it can
> know that it needs collect tuples from child nodes and how to convert
> them to the layout needed for the tuplestores if necessary.
>
> Thoughts?  I'm not yet sure about the locking situation.  Generally
> needs some more testing.

Here is a new version with tidied up tests and a couple of small bug
fixes.  However, I've realised that there is a surprising behaviour
with this approach, and I'm not sure what to do about it.

Recall that transition tables can be specified for statement-level
triggers AND row-level triggers.  If you specify them for row-level
triggers, then they can see all rows changed so far each time they
fire.  Now our policy of firing the statement level triggers only for
the named relation but firing row-level triggers for all modified
relations leads to a tricky problem for the inheritance case: what
type of transition tuples should the child table's row-level triggers
see?

Suppose you have an inheritance hierarchy like this:

 animal
  -> mammal
      -> cat

You define a statement-level trigger on "animal" and another
statement-level trigger on "mammal".  You define a row-level trigger
on "cat".  When you update either "animal" or "mammal", the row
triggers on "cat" might run.  Row-level triggers on "cat" see OLD and
NEW as "cat" tuples, of course, but if they are configured to see
transition tables, should they see "cat", "mammal" or "animal" tuples
in the transition tables?  With my patch as it is, that depends on
which level of the hierarchy you explicitly updated!

No such problem exists for partition hierarchies since the tables all
appear as the same type to user code (though conversions may be
happening for technical reasons).

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
Thomas Munro wrote:

> Recall that transition tables can be specified for statement-level
> triggers AND row-level triggers.  If you specify them for row-level
> triggers, then they can see all rows changed so far each time they
> fire.

Uhmm ... why do we do this?  It seems like a great way to cause much
confusion.  Shouldn't we see the transition table containing the whole
set for statement-level triggers only, and give row-level triggers just
the individual affected row each time?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, May 10, 2017 at 9:57 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Thomas Munro wrote:
>
>> Recall that transition tables can be specified for statement-level
>> triggers AND row-level triggers.  If you specify them for row-level
>> triggers, then they can see all rows changed so far each time they
>> fire.
>
> Uhmm ... why do we do this?  It seems like a great way to cause much
> confusion.  Shouldn't we see the transition table containing the whole
> set for statement-level triggers only, and give row-level triggers just
> the individual affected row each time?

I assumed that had come from the standard.  I don't have a published
standard, but I have just taken a quick look at one of the publicly
available drafts dated 2006.  I think its model is that the transition
tables are always conceptually there, and NEW and OLD are just range
variables over those tables.  That may explain why transition tables
are mentioned in the context of row-level triggers, and it may be that
the spec's authors never intended row-level triggers to be able to see
the (partial) transition table other than through the range variables
that access exactly one row, but I don't see any wording that
explicitly says so in the spec.  Do you?  Thoughts, Kevin?

After thinking about this some more, it's not only the conversion to
some arbitrary parent tuple type that would be surprising to a user of
inheritance + triggers + transition tables, it's the fact that a
row-level trigger on a given child table will also see tuples
collected from other tables in the hierarchy.  I think I didn't quite
get that right in -v2: it should probably build
TriggerTransitionFilter from union of all child tables' transition
table flags, not just the named table, so that if any child table has
a row trigger we start collecting transition tuples from all others
for it to see...  That sounds pretty crazy to me.

So, assuming we want to proceed with this plan of collecting
transition tuples from children, I see approximately 3 choices:

1.  Reject transition tables on row-level triggers.
2.  Reject transition tables on row-level triggers on tables that
inherit from other tables.
3.  Continue to allow transition tables on row-level triggers, but
document that they must be prepared to see transition tuples as they
would when querying arbitrary parent tables, and see tuples from other
tables in the hierarchy (!)

Another possibility which I haven't seriously considered is per-table
transition tables so you'd collect each child's tuples separately.

I take Alvaro's comment as a vote for 1.  I vote for 1 too.

-- 
Thomas Munro
http://www.enterprisedb.com



On 2017/05/10 6:51, Thomas Munro wrote:
> No such problem exists for partition hierarchies since the tables all
> appear as the same type to user code (though conversions may be
> happening for technical reasons).

To clarify a bit, there may exist differences in the ordering of columns,
either between the parent and its partitions or between different
partitions.  For example, while parent's rowtype is (a int, b char, c
float), a partition's may be (b char, a int, c float), and yet another
partition may have (c float, a int, b char).  If some user code happens to
depend on the ordering of columns, selecting from the parent and selecting
from a partition directly may return the same result but in different formats.

Thanks,
Amit




On Wed, May 10, 2017 at 2:31 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/10 6:51, Thomas Munro wrote:
>> No such problem exists for partition hierarchies since the tables all
>> appear as the same type to user code (though conversions may be
>> happening for technical reasons).
>
> To clarify a bit, there may exist differences in the ordering of columns,
> either between the parent and its partitions or between different
> partitions.  For example, while parent's rowtype is (a int, b char, c
> float), a partition's may be (b char, a int, c float), and yet another
> partition may have (c float, a int, b char).  If some user code happens to
> depend on the ordering of columns, selecting from the parent and selecting
> from a partition directly may return the same result but in different formats.

Right.  And the patch I posted converts all transition tuples it
collects from child tables to match the TupleDescriptor of the
relation you named, which it gets from
estate->es_root_result_relations[0].  Is that right?  I suppose it
will be very common for partitions to have matching TupleDescriptors,
so the TupleConversionMap will usually be NULL meaning no conversion
is ever done.  But in the inheritance case they might be different on
purpose, and in both inheritance and partitioning cases they might be
different in physical ways that aren't logically important as you said
(column order, dropped columns).

-- 
Thomas Munro
http://www.enterprisedb.com



On Tue, May 9, 2017 at 11:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 10, 2017 at 2:31 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/05/10 6:51, Thomas Munro wrote:
>>> No such problem exists for partition hierarchies since the tables all
>>> appear as the same type to user code (though conversions may be
>>> happening for technical reasons).
>>
>> To clarify a bit, there may exist differences in the ordering of columns,
>> either between the parent and its partitions or between different
>> partitions.  For example, while parent's rowtype is (a int, b char, c
>> float), a partition's may be (b char, a int, c float), and yet another
>> partition may have (c float, a int, b char).  If some user code happens to
>> depend on the ordering of columns, selecting from the parent and selecting
>> from a partition directly may return the same result but in different formats.
>
> Right.  And the patch I posted converts all transition tuples it
> collects from child tables to match the TupleDescriptor of the
> relation you named, which it gets from
> estate->es_root_result_relations[0].  Is that right?  I suppose it
> will be very common for partitions to have matching TupleDescriptors,
> so the TupleConversionMap will usually be NULL meaning no conversion
> is ever done.  But in the inheritance case they might be different on
> purpose, and in both inheritance and partitioning cases they might be
> different in physical ways that aren't logically important as you said
> (column order, dropped columns).

Hmm.  What if the partitioning hierarchy contains foreign tables?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 10, 2017 at 11:22 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 10, 2017 at 9:57 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> Thomas Munro wrote:
>>
>>> Recall that transition tables can be specified for statement-level
>>> triggers AND row-level triggers.  If you specify them for row-level
>>> triggers, then they can see all rows changed so far each time they
>>> fire.
>>
>> Uhmm ... why do we do this?  It seems like a great way to cause much
>> confusion.  Shouldn't we see the transition table containing the whole
>> set for statement-level triggers only, and give row-level triggers just
>> the individual affected row each time?
>
> I assumed that had come from the standard.  I don't have a published
> standard, but I have just taken a quick look at one of the publicly
> available drafts dated 2006.  I think its model is that the transition
> tables are always conceptually there, and NEW and OLD are just range
> variables over those tables.  That may explain why transition tables
> are mentioned in the context of row-level triggers, and it may be that
> the spec's authors never intended row-level triggers to be able to see
> the (partial) transition table other than through the range variables
> that access exactly one row, but I don't see any wording that
> explicitly says so in the spec.  Do you?  Thoughts, Kevin?

Hmm.  DB2 has transition tables (invented them maybe?) and it allows
OLD/NEW TABLE on row-level triggers:

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0020236.html

-- 
Thomas Munro
http://www.enterprisedb.com



On Tue, May 9, 2017 at 11:48 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hmm.  DB2 has transition tables (invented them maybe?) and it allows
> OLD/NEW TABLE on row-level triggers:
>
> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0020236.html

Yeah, my impression is that Kevin was pretty keen on supporting that
case.  I couldn't say exactly why, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, May 10, 2017 at 3:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 9, 2017 at 11:48 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Hmm.  DB2 has transition tables (invented them maybe?) and it allows
>> OLD/NEW TABLE on row-level triggers:
>>
>> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/t0020236.html
>
> Yeah, my impression is that Kevin was pretty keen on supporting that
> case.  I couldn't say exactly why, though.

Ok, here's a new version that handles row-level triggers with
transition tables on any child table.  The regression tests show
partition and inheritance examples of that.  To be clear about what
this does:

1.  If you attach a row-level trigger with transition tables to any
partition, it will see transition tuples from all partitions that were
modified by the same statement.

2.  If you attach a row-level trigger with transition tables to any
inheritance child, it will see transition tuples from all tables in
the inheritance hierarchy at or below the directly named table that
were modified by the same statement, sliced so that they appear as
tuples from the directly named table.

On Wed, May 10, 2017 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm.  What if the partitioning hierarchy contains foreign tables?

Arghalalkjhsdflg.  Looking into that...

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Wed, May 10, 2017 at 11:10 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> 2.  If you attach a row-level trigger with transition tables to any
> inheritance child, it will see transition tuples from all tables in
> the inheritance hierarchy at or below the directly named table that
> were modified by the same statement, sliced so that they appear as
> tuples from the directly named table.

Of course that's a bit crazy, not only for trigger authors to
understand and deal with, but also for plan caching: it just doesn't
really make sense to have a database object, even an ephemeral one,
whose type changes depending on how the trigger was invoked, because
the plans stick around.  Perhaps you could modify NamedTuplestorescan
to convert on the fly to the TupleDesc of the table that the row-level
trigger is attached to, using NULL for missing columns, but that'd be
a slightly strange too, depending on how you did it.

Perhaps we should reject row-level triggers with transition tables on
tables that are part of an inheritance hierarchy, but allow them for
partitions.

-- 
Thomas Munro
http://www.enterprisedb.com



On Wed, May 10, 2017 at 8:02 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> 2.  If you attach a row-level trigger with transition tables to any
>> inheritance child, it will see transition tuples from all tables in
>> the inheritance hierarchy at or below the directly named table that
>> were modified by the same statement, sliced so that they appear as
>> tuples from the directly named table.
>
> Of course that's a bit crazy, not only for trigger authors to
> understand and deal with, but also for plan caching: it just doesn't
> really make sense to have a database object, even an ephemeral one,
> whose type changes depending on how the trigger was invoked, because
> the plans stick around.  Perhaps you could modify NamedTuplestorescan
> to convert on the fly to the TupleDesc of the table that the row-level
> trigger is attached to, using NULL for missing columns, but that'd be
> a slightly strange too, depending on how you did it.

I don't think it's crazy from a user perspective, but the plan caching
thing sounds like a problem.

> Perhaps we should reject row-level triggers with transition tables on
> tables that are part of an inheritance hierarchy, but allow them for
> partitions.

Sounds like a sensible solution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Sat, May 06, 2017 at 06:54:37PM +0000, Noah Misch wrote:
> On Mon, May 01, 2017 at 11:10:52AM -0500, Kevin Grittner wrote:
> > On Mon, May 1, 2017 at 10:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > 
> > > It seems pretty clear to me that this is busted.
> > 
> > I don't think you actually tested anything that is dependent on any
> > of my patches there.
> > 
> > > Adding this as an open item.  Kevin?
> > 
> > It will take some time to establish what legacy behavior is and how
> > the new transition tables are impacted.  My first reaction is that a
> > trigger on the parent should fire for any related action on a child
> > (unless maybe the trigger is defined with an ONLY keyword???) using
> > the TupleDesc of the parent.  Note that the SQL spec mandates that
> > even in a AFTER EACH ROW trigger the transition tables must
> > represent all rows affected by the STATEMENT.  I think that this
> > should be independent of triggers fired at the row level.  I think
> > the rules should be similar for updateable views.
> > 
> > This will take some time to investigate, discuss and produce a
> > patch.  I think best case is Friday.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Kevin,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Thu, May 11, 2017 at 3:38 AM, Noah Misch <noah@leadboat.com> wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Kevin has not posted to this mailing list since May 3rd.  I don't know
whether he's gone on vacation or been without email access for some
other reason, but I think we'd better assume that he's not likely to
respond to emails demanding immediate action regardless of how many of
them we send.

I'm not prepared to write a patch for this issue, but it seems like
Thomas is on top of that.  If nobody else steps up to the plate I
guess I'm willing to take responsibility for reviewing and committing
that patch once it's in final form, but at this point I don't think
it's going to be possible to get that done before Monday's planned
wrap.

In formal terms, if Kevin forfeits ownership of this item and nobody
else volunteers to adopt it, put me down as owner with a next-update
date of Friday, May 19th, the day after beta1 is expected to ship.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[Apologies to all for my recent absence from community lists, and
special thanks to Thomas and Robert for picking up the slack.]

On Tue, May 9, 2017 at 4:51 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, May 9, 2017 at 10:29 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> Recall that transition tables can be specified for statement-level
> triggers AND row-level triggers.  If you specify them for row-level
> triggers, then they can see all rows changed so far each time they
> fire.

No, they see all rows from the statement, each time.

test=# create table t (c int not null);
CREATE TABLE
test=# create function t_func()
test-#   returns trigger
test-#   language plpgsql
test-# as $$
test$# begin
test$#   raise notice '% / % = %',
test$#                new.c,
test$#                (select sum(c) from n),
test$#                (select new.c::float / sum(n.c) from n);
test$#   return null;
test$# end;
test$# $$;
CREATE FUNCTION
test=# create trigger t_trig
test-#   after insert or update on t
test-#   referencing new table as n
test-#   for each row
test-#   execute procedure t_func();
CREATE TRIGGER
test=# insert into t select generate_series(1,5);
NOTICE:  1 / 15 = 0.0666666666666667
NOTICE:  2 / 15 = 0.133333333333333
NOTICE:  3 / 15 = 0.2
NOTICE:  4 / 15 = 0.266666666666667
NOTICE:  5 / 15 = 0.333333333333333
INSERT 0 5

This behavior is required for this feature by the SQL standard.

> Now our policy of firing the statement level triggers only for
> the named relation but firing row-level triggers for all modified
> relations leads to a tricky problem for the inheritance case: what
> type of transition tuples should the child table's row-level triggers
> see?

The record format for the object on which the trigger was declared, IMO.

> Suppose you have an inheritance hierarchy like this:
>
>  animal
>   -> mammal
>       -> cat
>
> You define a statement-level trigger on "animal" and another
> statement-level trigger on "mammal".  You define a row-level trigger
> on "cat".  When you update either "animal" or "mammal", the row
> triggers on "cat" might run.  Row-level triggers on "cat" see OLD and
> NEW as "cat" tuples, of course, but if they are configured to see
> transition tables, should they see "cat", "mammal" or "animal" tuples
> in the transition tables?  With my patch as it is, that depends on
> which level of the hierarchy you explicitly updated!

I think that the ideal behavior would be that if you define a
trigger on "cat", you see rows in the "cat" format; if you define a
trigger on rows for "mammal", you see rows in the "mammal" format;
if you define a trigger on rows for "animal", you see rows in the
"animal" format.  Also, the ideal would be that we support an ONLY
option for trigger declaration.  If your statement is ONLY on the a
given level in the hierarchy, the row triggers for only that level
would fire.  If you don't use ONLY, a row trigger at that level
would fire for operations at that level or any child level, but with
a record format matching the level of the trigger.

Now, that may be too ambitious for this release.  If so, I suggest
we not implement anything that would be broken by the above, and
throw a "not implemented" error when necessary.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Wed, May 10, 2017 at 7:02 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> 2.  If you attach a row-level trigger with transition tables to any
>> inheritance child, it will see transition tuples from all tables in
>> the inheritance hierarchy at or below the directly named table that
>> were modified by the same statement, sliced so that they appear as
>> tuples from the directly named table.
>
> Of course that's a bit crazy, not only for trigger authors to
> understand and deal with, but also for plan caching: it just doesn't
> really make sense to have a database object, even an ephemeral one,
> whose type changes depending on how the trigger was invoked, because
> the plans stick around.

The patch to add transition tables changed caching of a trigger
function to key on the combination of the function and the target
relation, rather than having one cache entry regardless of the
target table.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>>> 2.  If you attach a row-level trigger with transition tables to any
>>> inheritance child, it will see transition tuples from all tables in
>>> the inheritance hierarchy at or below the directly named table that
>>> were modified by the same statement, sliced so that they appear as
>>> tuples from the directly named table.
>>
>> Of course that's a bit crazy, not only for trigger authors to
>> understand and deal with, but also for plan caching: it just doesn't
>> really make sense to have a database object, even an ephemeral one,
>> whose type changes depending on how the trigger was invoked, because
>> the plans stick around.
>
> The patch to add transition tables changed caching of a trigger
> function to key on the combination of the function and the target
> relation, rather than having one cache entry regardless of the
> target table.

Right.  That works as long as triggers always see tuples in the format
of the relation that they're attached to, and I think that's the only
sensible choice.  The problem I was thinking about was this:  We have
only one pair of tuplestores and in the current design it holds tuples
of a uniform format, yet it may (eventually) need to be scanned by a
statement trigger attached to a the named relation AND any number of
row triggers attached to children with potentially different formats.
That implies some extra conversions are required at scan time.  I had
a more ambitious patch that would deal with sone of that by tracking
storage format and scan format separately (next time your row trigger
is invoked the scan format will be the same but the storage format may
be different depending on which relation you named in a query), but
I'm putting that to one side for this release.  That was a bit of a
rabbit hole, and there are some tricky design questions about tuple
conversions (to behave like DB2 with subtables may even require
tuplestore with per-tuple type information) and also the subset of
rows that each row trigger should see (which may require extra tuple
origin metadata or separate tuplestores).

I'm about to post a much simpler patch that collects uniform tuples
from children, addressing the reported bug, and simply rejects
transition tables on row-triggers on tables that are in either kind of
inheritance hierarchy.  More soon...

-- 
Thomas Munro
http://www.enterprisedb.com



On Tue, May 16, 2017 at 4:50 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 17, 2017 at 9:20 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Wed, May 10, 2017 at 7:02 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Wed, May 10, 2017 at 11:10 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>>>> 2.  If you attach a row-level trigger with transition tables to any
>>>> inheritance child, it will see transition tuples from all tables in
>>>> the inheritance hierarchy at or below the directly named table that
>>>> were modified by the same statement, sliced so that they appear as
>>>> tuples from the directly named table.
>>>
>>> Of course that's a bit crazy, not only for trigger authors to
>>> understand and deal with, but also for plan caching: it just doesn't
>>> really make sense to have a database object, even an ephemeral one,
>>> whose type changes depending on how the trigger was invoked, because
>>> the plans stick around.
>>
>> The patch to add transition tables changed caching of a trigger
>> function to key on the combination of the function and the target
>> relation, rather than having one cache entry regardless of the
>> target table.
>
> Right.  That works as long as triggers always see tuples in the format
> of the relation that they're attached to, and I think that's the only
> sensible choice.  The problem I was thinking about was this:  We have
> only one pair of tuplestores and in the current design it holds tuples
> of a uniform format, yet it may (eventually) need to be scanned by a
> statement trigger attached to a the named relation AND any number of
> row triggers attached to children with potentially different formats.
> That implies some extra conversions are required at scan time.  I had
> a more ambitious patch that would deal with sone of that by tracking
> storage format and scan format separately (next time your row trigger
> is invoked the scan format will be the same but the storage format may
> be different depending on which relation you named in a query), but
> I'm putting that to one side for this release.  That was a bit of a
> rabbit hole, and there are some tricky design questions about tuple
> conversions (to behave like DB2 with subtables may even require
> tuplestore with per-tuple type information) and also the subset of
> rows that each row trigger should see (which may require extra tuple
> origin metadata or separate tuplestores).

Yeah, I wish this had surfaced far earlier, but I missed it and none
of the reviews prior to commit caught it, either.  :-(  It seems too
big to squeeze into v10 now.  I just want to have that general
direction in mind and not paint ourselves into any corner that makes
it hard to get there in the next release.

> I'm about to post a much simpler patch that collects uniform tuples
> from children, addressing the reported bug, and simply rejects
> transition tables on row-triggers on tables that are in either kind of
> inheritance hierarchy.  More soon...

I agree that we can safely go that far, but not farther.  Thanks!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Wed, May 17, 2017 at 10:13 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Tue, May 16, 2017 at 4:50 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>
>> I'm about to post a much simpler patch that collects uniform tuples
>> from children, addressing the reported bug, and simply rejects
>> transition tables on row-triggers on tables that are in either kind of
>> inheritance hierarchy.  More soon...
>
> I agree that we can safely go that far, but not farther.  Thanks!

Here is that patch.  Thoughts?

On Wed, May 10, 2017 at 3:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm.  What if the partitioning hierarchy contains foreign tables?

No tuples are collected from partitions that are foreign tables.  See
the attached demonstration.  I wasn't sure whether and if so where to
include that in the regression tests because it needs a contrib
module.

Robert and I discussed this off-list and came up with some options:
(1) document that as the current behaviour (where?), (2) figure out
how to prevent that situation from arising, (3) raise some kind of
runtime error if foreign transition tuples need to be collected.

Option 2 would seem to require us to lock the whole chain of ancestors
to check for statement-level triggers with transition tables, which
seems unpleasant, and option 3 is conceptually similar to the
execution time insertion failure.  It's debatable wither 3 or 1 is
more surprising or inconvenient to users.  I vote for option 1 as a
stop-gap measure (and I hope that someone will soon fix transition
tuple capture for foreign tables generally).  However, it's a bit
inconsistent that we explicitly reject triggers with transition tables
on foreign tables directly, but let them silently fail to capture
anything when they're indirectly accessed via a parent relation.
Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On 2017/05/17 11:22, Thomas Munro wrote:
> On Wed, May 17, 2017 at 10:13 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Tue, May 16, 2017 at 4:50 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>>
>>> I'm about to post a much simpler patch that collects uniform tuples
>>> from children, addressing the reported bug, and simply rejects
>>> transition tables on row-triggers on tables that are in either kind of
>>> inheritance hierarchy.  More soon...
>>
>> I agree that we can safely go that far, but not farther.  Thanks!
> 
> Here is that patch.  Thoughts?

I looked at the patch and noticed that there might be some confusion about
what's in the EState.es_root_result_relations array.

If you see the following block of code in ExecInitModifyTable():
   /* If modifying a partitioned table, initialize the root table info */   if (node->rootResultRelIndex >= 0)
mtstate->rootResultRelInfo= estate->es_root_result_relations +
node->rootResultRelIndex;

You might be able to see that node->rootResultRelIndex is used as offset
into es_root_result_relations, which means the partitioned table root
being modified by this node.  The entries in es_root_result_relations
correspond to the partitioned table roots referenced as targets in
different parts of the query (for example, a WITH query might have its own
target partitioned tables).

So, in ExecSetupTriggerTransitionState() added by the patch, the following
code needs to be changed:
   /*    * Find the ResultRelInfo corresponding to the relation that was    * explicitly named in the statement.    */
if (estate->es_num_root_result_relations > 0)   {       /* Partitioned table.  The named relation is the first root. */
     targetRelInfo = &estate->es_root_result_relations[0];   }
 

targetRelInfo should instead be set to mtstate->rootResultRelInfo that was
set in ExecInitModifyTable() as described above, IOW, as follows:
   /* Partitioned table. */   if (mtstate->rootResultRelInfo != NULL)       targetRelInfo =
mtstate->rootResultRelInfo;

I guess that's what you intend to do here.

Sorry if the comments that I wrote about es_root_result_relations in what
got committed as e180c8aa8c were not clear enough.

Thanks,
Amit




On Wed, May 17, 2017 at 6:04 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/17 11:22, Thomas Munro wrote:
>> Here is that patch.  Thoughts?
>
> I looked at the patch and noticed that there might be some confusion about
> what's in the EState.es_root_result_relations array.

Thanks for looking at this!

> ...
>
> targetRelInfo should instead be set to mtstate->rootResultRelInfo that was
> set in ExecInitModifyTable() as described above, IOW, as follows:
>
>     /* Partitioned table. */
>     if (mtstate->rootResultRelInfo != NULL)
>         targetRelInfo = mtstate->rootResultRelInfo;

Ah, I see.  Thank you.  Fixed in the attached.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Wed, May 17, 2017 at 7:42 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 17, 2017 at 6:04 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> targetRelInfo should instead be set to mtstate->rootResultRelInfo that was
>> set in ExecInitModifyTable() as described above, IOW, as follows:
>>
>>     /* Partitioned table. */
>>     if (mtstate->rootResultRelInfo != NULL)
>>         targetRelInfo = mtstate->rootResultRelInfo;
>
> Ah, I see.  Thank you.  Fixed in the attached.

Here's a post-pgindent rebase.

Also, I discovered a preexisting bug that is independent of all this
inheritance stuff.  COPY in the batch optimisation case was failing to
capture transition tuples.  I thought about sending a separate patch
but this patch already has a regression test that covers it so I've
included it here.  It's this hunk:

@@ -2872,7 +2872,8 @@ CopyFromInsertBatch(CopyState cstate, EState
*estate, CommandId mycid,
         * anyway.
         */
        else if (resultRelInfo->ri_TrigDesc != NULL &&
-                        resultRelInfo->ri_TrigDesc->trig_insert_after_row)
+                        (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+                         resultRelInfo->ri_TrigDesc->trig_insert_new_table))
        {
                for (i = 0; i < nBufferedTuples; i++)
                {

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On 2017/05/18 7:13, Thomas Munro wrote:
> On Wed, May 17, 2017 at 7:42 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Wed, May 17, 2017 at 6:04 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> targetRelInfo should instead be set to mtstate->rootResultRelInfo that was
>>> set in ExecInitModifyTable() as described above, IOW, as follows:
>>>
>>>     /* Partitioned table. */
>>>     if (mtstate->rootResultRelInfo != NULL)
>>>         targetRelInfo = mtstate->rootResultRelInfo;
>>
>> Ah, I see.  Thank you.  Fixed in the attached.
> 
> Here's a post-pgindent rebase.

I read through the latest patch.  Some comments:

Do we need to update documentation?  Perhaps, some clarification on the
inheritance/partitioning behavior somewhere.

+typedef struct TriggerTransitionState
+{
...
+    bool        ttf_delete_old_table;

Just curious: why ttf_?  TriggerTransition field?

-    Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
+    Assert((enrmd->reliddesc == InvalidOid) !=
+           (enrmd->tupdesc == NULL));

Perhaps, unintentional change?

+        original_tuple = tuple;        map = mtstate->mt_partition_tupconv_maps[leaf_part_index];        if (map)
 {
 
@@ -570,8 +572,17 @@ ExecInsert(ModifyTableState *mtstate,        setLastTid(&(tuple->t_self));    }

+    /*
+     * If we inserted into a partitioned table, then insert routing logic may
+     * have converted the tuple to a partition's format.  Make the original
+     * unconverted tuple available for transition tables.
+     */
+    if (mtstate->mt_transition_state != NULL)
+        mtstate->mt_transition_state->original_insert_tuple = original_tuple;

I'm not sure if it's significant for transition tables, but what if a
partition's BR trigger modified the tuple?  Would we want to include the
modified version of the tuple in the transition table or the original as
the patch does?  Same for the code in CopyFrom().
 * 'tup_conv_maps' receives an array of TupleConversionMap objects with one *      entry for every leaf partition
(requiredto convert input tuple based *      on the root table's rowtype to a leaf partition's rowtype after tuple
 
- *      routing is done
+ *      routing is done)

Oh, thanks! :)

Other than the above minor nitpicks, patch looks good to me.

Thanks,
Amit




On Thu, May 18, 2017 at 5:16 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

> Do we need to update documentation?  Perhaps, some clarification on the
> inheritance/partitioning behavior somewhere.

Yeah, I think so.

> -    Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
> +    Assert((enrmd->reliddesc == InvalidOid) !=
> +           (enrmd->tupdesc == NULL));
>
> Perhaps, unintentional change?

Agreed; line is not long enough to need to wrap.

> I'm not sure if it's significant for transition tables, but what if a
> partition's BR trigger modified the tuple?  Would we want to include the
> modified version of the tuple in the transition table or the original as
> the patch does?  Same for the code in CopyFrom().

Good spot!  If the BR trigger on the child table modifies or
suppresses the action, I strongly feel that must be reflected in the
transition table.  This needs to be fixed.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Thu, May 18, 2017 at 5:16 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
>> Do we need to update documentation?  Perhaps, some clarification on the
>> inheritance/partitioning behavior somewhere.
>
> Yeah, I think so.

Here is an attempt at documenting the situation in the CREATE TRIGGER
notes section.

>> -    Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
>> +    Assert((enrmd->reliddesc == InvalidOid) !=
>> +           (enrmd->tupdesc == NULL));
>>
>> Perhaps, unintentional change?
>
> Agreed; line is not long enough to need to wrap.

Fixed.

>> I'm not sure if it's significant for transition tables, but what if a
>> partition's BR trigger modified the tuple?  Would we want to include the
>> modified version of the tuple in the transition table or the original as
>> the patch does?  Same for the code in CopyFrom().
>
> Good spot!  If the BR trigger on the child table modifies or
> suppresses the action, I strongly feel that must be reflected in the
> transition table.  This needs to be fixed.

Gah.  Right.  In the attached version, there is a still an 'original
tuple' optimisation for insertions (avoiding parent -> child -> parent
conversion), but it's disabled if there are any BEFORE INSERT or
INSTEAD OF INSERT row-level triggers.

That's demonstrated by this part of the regression test, which
modifies the value inserted into the 'CCC' partition (and similar case
for COPY):

insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66);
NOTICE:  trigger = parent_stmt_trig, old table = <NULL>, new table =
(AAA,42), (BBB,42), (CCC,1066)

On Thu, May 18, 2017 at 10:16 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> +typedef struct TriggerTransitionState
> +{
> ...
> +    bool        ttf_delete_old_table;
>
> Just curious: why ttf_?  TriggerTransition field?

Oops.  Changed to "tts_".  I had renamed this struct but not the members.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On 2017/05/19 14:01, Thomas Munro wrote:
> On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Thu, May 18, 2017 at 5:16 AM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>
>>> Do we need to update documentation?  Perhaps, some clarification on the
>>> inheritance/partitioning behavior somewhere.
>>
>> Yeah, I think so.
> 
> Here is an attempt at documenting the situation in the CREATE TRIGGER
> notes section.

Looks good, thanks.

>>> I'm not sure if it's significant for transition tables, but what if a
>>> partition's BR trigger modified the tuple?  Would we want to include the
>>> modified version of the tuple in the transition table or the original as
>>> the patch does?  Same for the code in CopyFrom().
>>
>> Good spot!  If the BR trigger on the child table modifies or
>> suppresses the action, I strongly feel that must be reflected in the
>> transition table.  This needs to be fixed.
> 
> Gah.  Right.  In the attached version, there is a still an 'original
> tuple' optimisation for insertions (avoiding parent -> child -> parent
> conversion), but it's disabled if there are any BEFORE INSERT or
> INSTEAD OF INSERT row-level triggers.
> 
> That's demonstrated by this part of the regression test, which
> modifies the value inserted into the 'CCC' partition (and similar case
> for COPY):
> 
> insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66);
> NOTICE:  trigger = parent_stmt_trig, old table = <NULL>, new table =
> (AAA,42), (BBB,42), (CCC,1066)

Seems to work correctly.

I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
at mtstate->mt_partition_dispatch_info when setting up the transition
conversion map.  In the case where it's non-NULL, you may have realized
that mt_transition_tupconv_map will be an exact copy of
mt_partition_tupconv_maps that's already built.  Would it perhaps be a
good idea to either share the same or make a copy using memcpy() instead
of doing the convert_tuples_by_name() calls again?

> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> +typedef struct TriggerTransitionState
>> +{
>> ...
>> +    bool        ttf_delete_old_table;
>>
>> Just curious: why ttf_?  TriggerTransition field?
> 
> Oops.  Changed to "tts_".  I had renamed this struct but not the members.

Ah.  BTW, maybe it's not a problem, but the existing TupleTableSlot's
member names are prefixed with tts_, too. :)

Thanks,
Amit




On Fri, May 19, 2017 at 5:51 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
> at mtstate->mt_partition_dispatch_info when setting up the transition
> conversion map.  In the case where it's non-NULL, you may have realized
> that mt_transition_tupconv_map will be an exact copy of
> mt_partition_tupconv_maps that's already built.  Would it perhaps be a
> good idea to either share the same or make a copy using memcpy() instead
> of doing the convert_tuples_by_name() calls again?

Isn't it the opposite?  mt_partition_tupconv_maps holds maps that
convert the parent format to the partition format.
mt_transition_tupconv_maps holds maps that convert the partition
format to the parent format (= transition tuplestore format).

>> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> +typedef struct TriggerTransitionState
>>> +{
>>> ...
>>> +    bool        ttf_delete_old_table;
>>>
>>> Just curious: why ttf_?  TriggerTransition field?
>>
>> Oops.  Changed to "tts_".  I had renamed this struct but not the members.
>
> Ah.  BTW, maybe it's not a problem, but the existing TupleTableSlot's
> member names are prefixed with tts_, too. :)

Would TransitionCaptureState be a better name for this struct?

-- 
Thomas Munro
http://www.enterprisedb.com



On 2017/05/19 15:16, Thomas Munro wrote:
> On Fri, May 19, 2017 at 5:51 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
>> at mtstate->mt_partition_dispatch_info when setting up the transition
>> conversion map.  In the case where it's non-NULL, you may have realized
>> that mt_transition_tupconv_map will be an exact copy of
>> mt_partition_tupconv_maps that's already built.  Would it perhaps be a
>> good idea to either share the same or make a copy using memcpy() instead
>> of doing the convert_tuples_by_name() calls again?
> 
> Isn't it the opposite?  mt_partition_tupconv_maps holds maps that
> convert the parent format to the partition format.
> mt_transition_tupconv_maps holds maps that convert the partition
> format to the parent format (= transition tuplestore format).

You're right, never mind.

>>> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
>>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>> +typedef struct TriggerTransitionState
>>>> +{
>>>> ...
>>>> +    bool        ttf_delete_old_table;
>>>>
>>>> Just curious: why ttf_?  TriggerTransition field?
>>>
>>> Oops.  Changed to "tts_".  I had renamed this struct but not the members.
>>
>> Ah.  BTW, maybe it's not a problem, but the existing TupleTableSlot's
>> member names are prefixed with tts_, too. :)
> 
> Would TransitionCaptureState be a better name for this struct?

Yes.  Although, losing the Trigger prefix might make it sound a bit
ambiguous though.  Right above its definition, we have TriggerData.  So,
maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or
TriggerTransitionData may be worth considering.

Thanks,
Amit




On Fri, May 19, 2017 at 6:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/19 15:16, Thomas Munro wrote:
>> Would TransitionCaptureState be a better name for this struct?
>
> Yes.  Although, losing the Trigger prefix might make it sound a bit
> ambiguous though.  Right above its definition, we have TriggerData.  So,
> maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or
> TriggerTransitionData may be worth considering.

Ok, here's a version using TransitionCaptureState.  Those other names
seem too long, and "TriggerTransition" is already in use so
"TriggerTransitionData" seems off the table.  Having the word
"capture" in there seems good, since this is an object that controls
what we capture when we process a modify a set of tables.  I hope
that's clear.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Sat, May 20, 2017 at 10:43 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, May 19, 2017 at 6:35 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2017/05/19 15:16, Thomas Munro wrote:
>>> Would TransitionCaptureState be a better name for this struct?
>>
>> Yes.  Although, losing the Trigger prefix might make it sound a bit
>> ambiguous though.  Right above its definition, we have TriggerData.  So,
>> maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or
>> TriggerTransitionData may be worth considering.
>
> Ok, here's a version using TransitionCaptureState.  Those other names
> seem too long, and "TriggerTransition" is already in use so
> "TriggerTransitionData" seems off the table.  Having the word
> "capture" in there seems good, since this is an object that controls
> what we capture when we process a modify a set of tables.  I hope
> that's clear.

Sent too soon.  Several variables should also be renamed to make clear
they refer to the transition capture state in effect, instead of vague
names like 'transitions'.  Sorry for the version churn.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On 2017/05/20 9:01, Thomas Munro wrote:
> On Sat, May 20, 2017 at 10:43 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Fri, May 19, 2017 at 6:35 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> On 2017/05/19 15:16, Thomas Munro wrote:
>>>> Would TransitionCaptureState be a better name for this struct?
>>>
>>> Yes.  Although, losing the Trigger prefix might make it sound a bit
>>> ambiguous though.  Right above its definition, we have TriggerData.  So,
>>> maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or
>>> TriggerTransitionData may be worth considering.
>>
>> Ok, here's a version using TransitionCaptureState.  Those other names
>> seem too long, and "TriggerTransition" is already in use so
>> "TriggerTransitionData" seems off the table.  Having the word
>> "capture" in there seems good, since this is an object that controls
>> what we capture when we process a modify a set of tables.  I hope
>> that's clear.

I agree.  TransitionCaptureState sounds good.

> Sent too soon.  Several variables should also be renamed to make clear
> they refer to the transition capture state in effect, instead of vague
> names like 'transitions'.  Sorry for the version churn.

Ah, I was kind of getting distracted by it earlier too; thanks.

Anyway, the patch looks good to me.

Thanks,
Amit




This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Mon, May 29, 2017 at 9:34 PM, Noah Misch <noah@leadboat.com> wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I spoke with Kevin about this at PGCon and asked him to have a look at
it.  He agreed to do so, but did not specify a time frame, which seems
important.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Tue, May 30, 2017 at 01:34:33AM +0000, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-06-01 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Wed, May 31, 2017 at 1:44 AM, Noah Misch <noah@leadboat.com> wrote:

> IMMEDIATE ATTENTION REQUIRED.

I should be able to complete review and testing by Friday.  If there
are problems I might not take action until Monday; otherwise I
should be able to do so on Friday.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



On Wed, May 31, 2017 at 09:14:17AM -0500, Kevin Grittner wrote:
> On Wed, May 31, 2017 at 1:44 AM, Noah Misch <noah@leadboat.com> wrote:
> 
> > IMMEDIATE ATTENTION REQUIRED.
> 
> I should be able to complete review and testing by Friday.  If there
> are problems I might not take action until Monday; otherwise I
> should be able to do so on Friday.

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is again long past
due for your status update.  Please reacquaint yourself with the policy on
open item ownership[1] and then reply immediately.  If I do not hear from you
by 2017-06-08 08:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On Mon, May 22, 2017 at 5:51 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/20 9:01, Thomas Munro wrote:
>> Sent too soon.  Several variables should also be renamed to make clear
>> they refer to the transition capture state in effect, instead of vague
>> names like 'transitions'.  Sorry for the version churn.
>
> Ah, I was kind of getting distracted by it earlier too; thanks.
>
> Anyway, the patch looks good to me.

[Adding Andrew Gierth]

Here is a rebased version of the patch to fix transition tables with
inheritance.  Fixes a typo in an error message ("not support on
partitions" -> "... supported ..."), and changes regression test
triggers to be single-event (only one of INSERT, UPDATE or DELETE),
because a later patch will not allow multi-event triggers with TTs.

This is patch 1 of a stack of 3 patches addressing currently known
problems with transition tables.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Thu, Jun 8, 2017 at 11:56 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> [Adding Andrew Gierth]
>
> Here is a rebased version of the patch to fix transition tables with
> inheritance.  Fixes a typo in an error message ("not support on
> partitions" -> "... supported ..."), and changes regression test
> triggers to be single-event (only one of INSERT, UPDATE or DELETE),
> because a later patch will not allow multi-event triggers with TTs.
>
> This is patch 1 of a stack of 3 patches addressing currently known
> problems with transition tables.

So, Andrew, are you running with this, or should I keep looking into it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, Jun 9, 2017 at 12:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 8, 2017 at 11:56 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> [Adding Andrew Gierth]
>>
>> Here is a rebased version of the patch to fix transition tables with
>> inheritance.  Fixes a typo in an error message ("not support on
>> partitions" -> "... supported ..."), and changes regression test
>> triggers to be single-event (only one of INSERT, UPDATE or DELETE),
>> because a later patch will not allow multi-event triggers with TTs.
>>
>> This is patch 1 of a stack of 3 patches addressing currently known
>> problems with transition tables.
>
> So, Andrew, are you running with this, or should I keep looking into it?

I have spent some time now studying this patch.  I might be missing
something, but to me this appears to be in great shape.  A few minor
nitpicks:

-        if ((event == TRIGGER_EVENT_DELETE &&
!trigdesc->trig_delete_after_row) ||
-        (event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) ||
-         (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row))
+        if (trigdesc == NULL ||
+            (event == TRIGGER_EVENT_DELETE &&
!trigdesc->trig_delete_after_row) ||
+            (event == TRIGGER_EVENT_INSERT &&
!trigdesc->trig_insert_after_row) ||
+            (event == TRIGGER_EVENT_UPDATE &&
!trigdesc->trig_update_after_row))

I suspect the whitespace changes will get reverted by pgindent, making
them pointless.  But that's a trivial issue.

+        if (mtstate->mt_transition_capture != NULL)
+        {
+            if (resultRelInfo->ri_TrigDesc &&
+                (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
+                 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
+            {
+                /*
+                 * If there are any BEFORE or INSTEAD triggers on the
+                 * partition, we'll have to be ready to convert their result
+                 * back to tuplestore format.
+                 */
+
mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
+                mtstate->mt_transition_capture->tcs_map =
+                    mtstate->mt_transition_tupconv_maps[leaf_part_index];
+            }
+            else
+            {
+                /*
+                 * Otherwise, just remember the original unconverted tuple, to
+                 * avoid a needless round trip conversion.
+                 */
+
mtstate->mt_transition_capture->tcs_original_insert_tuple = tuple;
+                mtstate->mt_transition_capture->tcs_map = NULL;
+            }
+        }

This chunk of code gets inserted between a comment and the code to
which that comment refers.  Maybe that's not the best idea.

Some other things I thought about:

* Does has_superclass have concurrency issues?  After some
consideration, I decided that it's probably fine as long as you hold a
lock on the target relation sufficient to prevent it from concurrently
becoming an inheritance child - i.e. any lock at all.  The caller is
CREATE TRIGGER, which certainly does.

* In AfterTriggerSaveEvent, is it a problem that the large new hunk of
code ignores trigdesc if transition_capture != NULL?  If I understand
correctly, the trigdesc will be coming from the leaf relation actually
being updated, while the transition_capture will be coming from the
relation named in the query.  Is the transition_capture object
guaranteed to have all the flags set, or do we also need to include
the ones from the trigdesc?  This also seems to be fine, because of
the restriction that row-level triggers with tuplestores can't
participate in inheritance hierarchies.  We can only need to capture
the tuples for the relation named in the query, not the leaf
partitions.

* The regression tests for this function are fairly lengthy.  Given
the complexity of the behavior being tested, though, it seems like a
really good idea to have these.  Otherwise, it's easy to imagine some
future patch breaking this again.

I also like the documentation update.

So, in short, +1 from me.

Regards,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
Robert> So, Andrew, are you running with this, or should I keep lookingRobert> into it?

I have it; I will post a status update before 23:59 BST on 11 Jun.

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> I have it; I will post a status update before 23:59 BST on 11Andrew> Jun.

This is that status update.  I am still studying Thomas' latest patch
set; as I mentioned in another message, I've confirmed a memory leak,
and I expect further work may be needed in some other areas as well, but
I think we're still making progress towards fixing it and I will work
with Thomas on it.

I will post a further status update before 23:59 BST on 14th Jun.

-- 
Andrew (irc:RhodiumToad)



On Sat, Jun 10, 2017 at 6:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I have spent some time now studying this patch.  I might be missing
> something, but to me this appears to be in great shape.  A few minor
> nitpicks:
>
> -        if ((event == TRIGGER_EVENT_DELETE &&
> !trigdesc->trig_delete_after_row) ||
> -        (event == TRIGGER_EVENT_INSERT && !trigdesc->trig_insert_after_row) ||
> -         (event == TRIGGER_EVENT_UPDATE && !trigdesc->trig_update_after_row))
> +        if (trigdesc == NULL ||
> +            (event == TRIGGER_EVENT_DELETE &&
> !trigdesc->trig_delete_after_row) ||
> +            (event == TRIGGER_EVENT_INSERT &&
> !trigdesc->trig_insert_after_row) ||
> +            (event == TRIGGER_EVENT_UPDATE &&
> !trigdesc->trig_update_after_row))
>
> I suspect the whitespace changes will get reverted by pgindent, making
> them pointless.  But that's a trivial issue.

Not just a whitespace change: added "trigdesc == NULL" and put the
existing stuff into parens, necessarily changing indentation level.

> +        if (mtstate->mt_transition_capture != NULL)
> +        {
> +            if (resultRelInfo->ri_TrigDesc &&
> +                (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
> +                 resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
> +            {
> +                /*
> +                 * If there are any BEFORE or INSTEAD triggers on the
> +                 * partition, we'll have to be ready to convert their result
> +                 * back to tuplestore format.
> +                 */
> +
> mtstate->mt_transition_capture->tcs_original_insert_tuple = NULL;
> +                mtstate->mt_transition_capture->tcs_map =
> +                    mtstate->mt_transition_tupconv_maps[leaf_part_index];
> +            }
> +            else
> +            {
> +                /*
> +                 * Otherwise, just remember the original unconverted tuple, to
> +                 * avoid a needless round trip conversion.
> +                 */
> +
> mtstate->mt_transition_capture->tcs_original_insert_tuple = tuple;
> +                mtstate->mt_transition_capture->tcs_map = NULL;
> +            }
> +        }
>
> This chunk of code gets inserted between a comment and the code to
> which that comment refers.  Maybe that's not the best idea.

Fixed.  Similar code existed in copy.c, so fixed there too.

> * Does has_superclass have concurrency issues?  After some
> consideration, I decided that it's probably fine as long as you hold a
> lock on the target relation sufficient to prevent it from concurrently
> becoming an inheritance child - i.e. any lock at all.  The caller is
> CREATE TRIGGER, which certainly does.

I added a comment to make that clear.

> * In AfterTriggerSaveEvent, is it a problem that the large new hunk of
> code ignores trigdesc if transition_capture != NULL?  If I understand
> correctly, the trigdesc will be coming from the leaf relation actually
> being updated, while the transition_capture will be coming from the
> relation named in the query.  Is the transition_capture object
> guaranteed to have all the flags set, or do we also need to include
> the ones from the trigdesc?  This also seems to be fine, because of
> the restriction that row-level triggers with tuplestores can't
> participate in inheritance hierarchies.  We can only need to capture
> the tuples for the relation named in the query, not the leaf
> partitions.

Yeah.  I think that's right.

Note that in this patch I was trying to cater to execReplication.c so
that it wouldn't have to construct a TransitionCaptureState.  It
doesn't actually support hierarchies (it doesn't operate on
partitioned tables, and doesn't have the smarts to direct updates to
traditional inheritance children), and doesn't fire statement
triggers.

In the #2 patch in the other thread about wCTEs, I changed this to
make a TransitionCaptureState object the *only* way to cause
transition tuple capture, because in that patch it owns the
tuplestores without which you can't capture anything.  So in that
patch trigdesc is no longer relevant at all for the tuple capture.
Someone extending execReplication.c to support partitions and
statement triggers etc will need to think about creating a
TransitionCaptureState but I decided that was out of scope for the
transition table rescue project.  In the new version of the #2 patch
that I'm about to post on the other there there is now a comment in
execReplication.c to explain.

> So, in short, +1 from me.

Thanks for the review.  New version of patch #1 attached.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> I will post a further status update before 23:59 BST on 14thAndrew> Jun.

Unfortunately I've been delayed over the past couple of days, but I have
Thomas' latest patchset in hand and will be working on it over the rest
of the week. Status update by 23:59 BST on Sun 18th, by which time I
hope to have everything finalized (all three issues, not just the
inheritance one).

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> Unfortunately I've been delayed over the past couple of days,Andrew> but I have Thomas' latest patchset in hand
andwill be workingAndrew> on it over the rest of the week. Status update by 23:59 BST onAndrew> Sun 18th, by which time
Ihope to have everything finalizedAndrew> (all three issues, not just the inheritance one).
 

I have, I believe, completed my review of the patchset. My conclusion is
that the fix appears to be sound and I haven't been able to find any
further issues with it; so I think Thomas's patches should be committed
as-is. Unless anyone objects I will do this within the next few days.

(Any preferences for whether it should be one commit or 3 separate ones?)

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] transition table behavior with inheritance appearsbroken

От
Amit Langote
Дата:
On 2017/06/19 7:59, Andrew Gierth wrote:
>>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > (Any preferences for whether it should be one commit or 3 separate ones?)

For my 2c, it would be nice to keep at least the inheritance one (or all
of them actually) separate.

Thanks,
Amit




Re: [HACKERS] transition table behavior with inheritance appears broken

От
Robert Haas
Дата:
On Sun, Jun 18, 2017 at 6:59 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> (Any preferences for whether it should be one commit or 3 separate ones?)

If I were doing it, I would commit them separately.

But I'm not doing it, so I won't complain about what you decide to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Mon, Jun 12, 2017 at 2:03 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Thanks for the review.  New version of patch #1 attached.

Here's a version rebased on top of the recently reindented master branch.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] transition table behavior with inheritance appears broken

От
Noah Misch
Дата:
On Sun, Jun 18, 2017 at 11:59:43PM +0100, Andrew Gierth wrote:
> >>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> 
>  Andrew> Unfortunately I've been delayed over the past couple of days,
>  Andrew> but I have Thomas' latest patchset in hand and will be working
>  Andrew> on it over the rest of the week. Status update by 23:59 BST on
>  Andrew> Sun 18th, by which time I hope to have everything finalized
>  Andrew> (all three issues, not just the inheritance one).
> 
> I have, I believe, completed my review of the patchset. My conclusion is
> that the fix appears to be sound and I haven't been able to find any
> further issues with it; so I think Thomas's patches should be committed
> as-is. Unless anyone objects I will do this within the next few days.
> 
> (Any preferences for whether it should be one commit or 3 separate ones?)

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
>>>>> "Noah" == Noah Misch <noah@leadboat.com> writes:
Noah> This PostgreSQL 10 open item is past due for your status update.Noah> Kindly send a status update within 24
hours,

oops, sorry! I forgot to include a date in the last one, and in fact a
personal matter delayed things anyway. I expect to have this wrapped up
by 23:59 BST on the 24th.

-- 
Andrew.



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Noah Misch
Дата:
On Fri, Jun 23, 2017 at 02:39:48PM +0100, Andrew Gierth wrote:
> >>>>> "Noah" == Noah Misch <noah@leadboat.com> writes:
> 
>  Noah> This PostgreSQL 10 open item is past due for your status update.
>  Noah> Kindly send a status update within 24 hours,
> 
> oops, sorry! I forgot to include a date in the last one, and in fact a
> personal matter delayed things anyway. I expect to have this wrapped up
> by 23:59 BST on the 24th.

This PostgreSQL 10 open item is again past due for your status update.  Kindly
send a status update within 24 hours, and include a date for your subsequent
status update.



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Noah Misch
Дата:
On Sat, Jun 24, 2017 at 10:57:49PM -0700, Noah Misch wrote:
> On Fri, Jun 23, 2017 at 02:39:48PM +0100, Andrew Gierth wrote:
> > >>>>> "Noah" == Noah Misch <noah@leadboat.com> writes:
> > 
> >  Noah> This PostgreSQL 10 open item is past due for your status update.
> >  Noah> Kindly send a status update within 24 hours,
> > 
> > oops, sorry! I forgot to include a date in the last one, and in fact a
> > personal matter delayed things anyway. I expect to have this wrapped up
> > by 23:59 BST on the 24th.
> 
> This PostgreSQL 10 open item is again past due for your status update.  Kindly
> send a status update within 24 hours, and include a date for your subsequent
> status update.

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-06-28 06:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
>>>>> "Noah" == Noah Misch <noah@leadboat.com> writes:
Noah> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item isNoah> long past due for your status update.  Please
reacquaintyourselfNoah> with the policy on open item ownership[1] and then replyNoah> immediately.  If I do not hear
fromyou by 2017-06-28 06:00 UTC,Noah> I will transfer this item to release management team ownershipNoah> without
furthernotice.
 

Sorry for the lack of updates. I need to sleep now, but I will send a
proper status update by 1800 UTC (1900 BST) today (28th).

-- 
Andrew.



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Andrew Gierth
Дата:
Commits pushed.

Unless I broke the buildfarm again (which I'll check up on later), or
some new issue arises with the fixes, this should close all 3 related
items for transition tables.

-- 
Andrew.



Re: [HACKERS] transition table behavior with inheritance appears broken

От
Thomas Munro
Дата:
On Thu, Jun 29, 2017 at 6:21 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Commits pushed.

Great news.  Thanks for stepping up to get this committed.  Thanks a
lot also to Marko, Amit L, Kevin, Robert, Noah and Peter G for the
problem reports, reviews and issue chasing.

-- 
Thomas Munro
http://www.enterprisedb.com