Обсуждение: partition routing layering in nodeModifyTable.c
Hi, While discussing partition related code with David in [1], I again was confused by the layering of partition related code in nodeModifyTable.c. 1) How come partition routing is done outside of ExecInsert()? case CMD_INSERT: /* Prepare for tuple routing if needed. */ if (proute) slot = ExecPrepareTupleRouting(node, estate, proute, resultRelInfo, slot); slot = ExecInsert(node, slot, planSlot, estate, node->canSetTag); /* Revert ExecPrepareTupleRouting's state change. */ if (proute) estate->es_result_relation_info = resultRelInfo; break; That already seems like a layering violation, but it's made worse by ExecUpdate() having its partition handling solely inside - including another call to ExecInsert(), including the surrounding partition setup code. And even worse, after all that, ExecInsert() still contains partitioning code. It seems to me that if we just moved the ExecPrepareTupleRouting() into ExecInsert(), we could remove the duplication. 2) The contents of the /* * If a partition check failed, try to move the row into the right * partition. */ if (partition_constraint_failed) block ought to be moved to a separate function (maybe ExecCrossPartitionUpdate or ExecMove). ExecUpdate() is already complicated enough without dealing with the partition move. 3) How come we reset estate->es_result_relation_info after partition routing, but not the mtstate wide changes by ExecPrepareTupleRouting()? Note that its comment says: * Caller must revert the estate changes after executing the insertion! * In mtstate, transition capture changes may also need to be reverted. ExecUpdate() contains /* * Updates set the transition capture map only when a new subplan * is chosen. But for inserts, it is set for each row. So after * INSERT, we need to revert back to the map created for UPDATE; * otherwise the next UPDATE will incorrectly use the one created * for INSERT. So first save the one created for UPDATE. */ if (mtstate->mt_transition_capture) saved_tcs_map = mtstate->mt_transition_capture->tcs_map; but as I read the code, that's not really true? It's ExecPrepareTupleRouting() that does so, and that's called directly in ExecUpdate(). 4) /* * If this insert is the result of a partition key update that moved the * tuple to a new partition, put this row into the transition NEW TABLE, * if there is one. We need to do this separately for DELETE and INSERT * because they happen on different tables. */ ar_insert_trig_tcs = mtstate->mt_transition_capture; if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture && mtstate->mt_transition_capture->tcs_update_new_table) { ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot, NULL, mtstate->mt_transition_capture); /* * We've already captured the NEW TABLE row, so make sure any AR * INSERT trigger fired below doesn't capture it again. */ ar_insert_trig_tcs = NULL; } /* AFTER ROW INSERT Triggers */ ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, ar_insert_trig_tcs); Besides not using the just defined ar_insert_trig_tcs and instead repeatedly referring to mtstate->mt_transition_capture, wouldn't this be a easier to understand if the were an if/else, instead of resetting ar_insert_trig_tcs? If the block were /* * triggers behave differently depending on this being a delete as * part of a partion move, or a deletion proper. if (mtstate->operation == CMD_UPDATE) { /* * If this insert is the result of a partition key update that moved the * tuple to a new partition, put this row into the transition NEW TABLE, * if there is one. We need to do this separately for DELETE and INSERT * because they happen on different tables. */ ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot, NULL, mtstate->mt_transition_capture); /* * But we do want to fire plain per-row INSERT triggers on the * new table. By not passing in transition_capture we prevent * .... */ ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, NULL); } else { /* AFTER ROW INSERT Triggers */ ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, ar_insert_trig_tcs); } it seems like it'd be quite a bit clearer (although I do think the comments also need a fair bit of polishing independent of this proposed change). Greetings, Andres Freund [1] https://www.postgresql.org/message-id/CAKJS1f-YObQJTbncGJGRZ6gSFiS%2Bgw_Y5kvrpR%3DvEnFKH17AVA%40mail.gmail.com
Hi Andres, On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote: > 1) How come partition routing is done outside of ExecInsert()? > > case CMD_INSERT: > /* Prepare for tuple routing if needed. */ > if (proute) > slot = ExecPrepareTupleRouting(node, estate, proute, > resultRelInfo, slot); > slot = ExecInsert(node, slot, planSlot, > estate, node->canSetTag); > /* Revert ExecPrepareTupleRouting's state change. */ > if (proute) > estate->es_result_relation_info = resultRelInfo; > break; > > That already seems like a layering violation, The decision to move partition routing out of ExecInsert() came about when we encountered a bug [1] whereby ExecInsert() would fail to reset estate->es_result_relation_info back to the root table if it had to take an abnormal path out (early return), of which there are quite a few instances. The first solution I came up with was to add a goto label for the code to reset estate->es_result_relation_info and jump to it from the various places that do an early return, which was complained about as reducing readability. So, the solution we eventually settled on in 6666ee49f was to perform ResultRelInfos switching at a higher level. > but it's made worse by > ExecUpdate() having its partition handling solely inside - including another > call to ExecInsert(), including the surrounding partition setup code. > > And even worse, after all that, ExecInsert() still contains partitioning > code. AFAIK, it's only to check the partition constraint when necessary. Partition routing complexity is totally outside, but based on what you write in point 4 below there's bit more... > It seems to me that if we just moved the ExecPrepareTupleRouting() into > ExecInsert(), we could remove the duplication. I agree that there's duplication here. Given what I wrote above, I can think of doing this: move all of ExecInsert()'s code into ExecInsertInternal() and make the former instead look like this: static TupleTableSlot * ExecInsert(ModifyTableState *mtstate, TupleTableSlot *slot, TupleTableSlot *planSlot, EState *estate, bool canSetTag) { PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; ResultRelInfo *resultRelInfo = estate->es_result_relation_info; /* Prepare for tuple routing if needed. */ if (proute) slot = ExecPrepareTupleRouting(mtstate, estate, proute, resultRelInfo, slot); slot = ExecInsertInternal(mtstate, slot, planSlot, estate, mtstate->canSetTag); /* Revert ExecPrepareTupleRouting's state change. */ if (proute) estate->es_result_relation_info = resultRelInfo; return slot; } > 2) The contents of the > /* > * If a partition check failed, try to move the row into the right > * partition. > */ > if (partition_constraint_failed) > > block ought to be moved to a separate function (maybe > ExecCrossPartitionUpdate or ExecMove). ExecUpdate() is already > complicated enough without dealing with the partition move. I tend to agree with this. Adding Amit Khandekar in case he wants to chime in about this. > 3) How come we reset estate->es_result_relation_info after partition > routing, but not the mtstate wide changes by > ExecPrepareTupleRouting()? Note that its comment says: > > * Caller must revert the estate changes after executing the insertion! > * In mtstate, transition capture changes may also need to be reverted. > > ExecUpdate() contains > > /* > * Updates set the transition capture map only when a new subplan > * is chosen. But for inserts, it is set for each row. So after > * INSERT, we need to revert back to the map created for UPDATE; > * otherwise the next UPDATE will incorrectly use the one created > * for INSERT. So first save the one created for UPDATE. > */ > if (mtstate->mt_transition_capture) > saved_tcs_map = mtstate->mt_transition_capture->tcs_map; > > but as I read the code, that's not really true? It's > ExecPrepareTupleRouting() that does so, and that's called directly in ExecUpdate(). Calling ExecPrepareTupleRouting() is considered a part of a given INSERT operation, so anything it does is to facilitate the INSERT. In this case, which map to assign to tcs_map can only be determined after a partition is chosen and determining the partition (routing) is a job of ExecPrepareTupleRouting(). Perhaps, we need to update the comment here a bit. > 4) > /* > * If this insert is the result of a partition key update that moved the > * tuple to a new partition, put this row into the transition NEW TABLE, > * if there is one. We need to do this separately for DELETE and INSERT > * because they happen on different tables. > */ > ar_insert_trig_tcs = mtstate->mt_transition_capture; > if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture > && mtstate->mt_transition_capture->tcs_update_new_table) > { > ExecARUpdateTriggers(estate, resultRelInfo, NULL, > NULL, > slot, > NULL, > mtstate->mt_transition_capture); > > /* > * We've already captured the NEW TABLE row, so make sure any AR > * INSERT trigger fired below doesn't capture it again. > */ > ar_insert_trig_tcs = NULL; > } > > /* AFTER ROW INSERT Triggers */ > ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > ar_insert_trig_tcs); > > Besides not using the just defined ar_insert_trig_tcs and instead > repeatedly referring to mtstate->mt_transition_capture, wouldn't this be > a easier to understand if the were an if/else, instead of resetting > ar_insert_trig_tcs? If the block were > > /* > * triggers behave differently depending on this being a delete as > * part of a partion move, or a deletion proper. > if (mtstate->operation == CMD_UPDATE) > { > /* > * If this insert is the result of a partition key update that moved the > * tuple to a new partition, put this row into the transition NEW TABLE, > * if there is one. We need to do this separately for DELETE and INSERT > * because they happen on different tables. > */ > ExecARUpdateTriggers(estate, resultRelInfo, NULL, > NULL, > slot, > NULL, > mtstate->mt_transition_capture); > > /* > * But we do want to fire plain per-row INSERT triggers on the > * new table. By not passing in transition_capture we prevent > * .... > */ > ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > NULL); > } > else > { > /* AFTER ROW INSERT Triggers */ > ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > ar_insert_trig_tcs); > } Maybe you meant to use mtstate->mt_transition_capture instead of ar_insert_trig_tcs in the else block. We don't need ar_insert_trig_tcs at all. > it seems like it'd be quite a bit clearer (although I do think the > comments also need a fair bit of polishing independent of this proposed > change). Fwiw, I agree with your proposed restructuring, although I'd let Amit Kh chime in as he'd be more familiar with this code. I wasn't aware of this partitioning-related bit being present in ExecInsert(). Would you like me to write a patch for some or all items? Thanks, Amit [1] https://www.postgresql.org/message-id/flat/0473bf5c-57b1-f1f7-3d58-455c2230bc5f%40lab.ntt.co.jp
Hi, On 2019-07-18 14:24:29 +0900, Amit Langote wrote: > On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote: > > 1) How come partition routing is done outside of ExecInsert()? > > > > case CMD_INSERT: > > /* Prepare for tuple routing if needed. */ > > if (proute) > > slot = ExecPrepareTupleRouting(node, estate, proute, > > resultRelInfo, slot); > > slot = ExecInsert(node, slot, planSlot, > > estate, node->canSetTag); > > /* Revert ExecPrepareTupleRouting's state change. */ > > if (proute) > > estate->es_result_relation_info = resultRelInfo; > > break; > > > > That already seems like a layering violation, > > The decision to move partition routing out of ExecInsert() came about > when we encountered a bug [1] whereby ExecInsert() would fail to reset > estate->es_result_relation_info back to the root table if it had to > take an abnormal path out (early return), of which there are quite a > few instances. The first solution I came up with was to add a goto > label for the code to reset estate->es_result_relation_info and jump > to it from the various places that do an early return, which was > complained about as reducing readability. So, the solution we > eventually settled on in 6666ee49f was to perform ResultRelInfos > switching at a higher level. I think that was the wrong path, given that the code now lives in multiple places. Without even a comment explaining that if one has to be changed, the other has to be changed too. > > It seems to me that if we just moved the ExecPrepareTupleRouting() into > > ExecInsert(), we could remove the duplication. > > I agree that there's duplication here. Given what I wrote above, I > can think of doing this: move all of ExecInsert()'s code into > ExecInsertInternal() and make the former instead look like this: For me just having the gotos is cleaner than that here. But perhaps the right fix would be to not have ExecPrepareTupleRouting() change global state at all, and instead change it much more locally inside ExecInsert(), around the calls that need it to be set differently. Or perhaps the actually correct fix is to remove es_result_relation_info alltogether, and just pass it down the places that need it - we've a lot more code setting it than using the value. And it'd not be hard to actually pass it to the places that read it. Given all the setting/resetting of it it's pretty obvious that a query-global resource isn't the right place for it. > > /* > > * triggers behave differently depending on this being a delete as > > * part of a partion move, or a deletion proper. > > if (mtstate->operation == CMD_UPDATE) > > { > > /* > > * If this insert is the result of a partition key update that moved the > > * tuple to a new partition, put this row into the transition NEW TABLE, > > * if there is one. We need to do this separately for DELETE and INSERT > > * because they happen on different tables. > > */ > > ExecARUpdateTriggers(estate, resultRelInfo, NULL, > > NULL, > > slot, > > NULL, > > mtstate->mt_transition_capture); > > > > /* > > * But we do want to fire plain per-row INSERT triggers on the > > * new table. By not passing in transition_capture we prevent > > * .... > > */ > > ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > > NULL); > > } > > else > > { > > /* AFTER ROW INSERT Triggers */ > > ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > > ar_insert_trig_tcs); > > } > > Maybe you meant to use mtstate->mt_transition_capture instead of > ar_insert_trig_tcs in the else block. We don't need > ar_insert_trig_tcs at all. Yes, it was just a untested example of how the code could be made clearer. > > it seems like it'd be quite a bit clearer (although I do think the > > comments also need a fair bit of polishing independent of this proposed > > change). > > Fwiw, I agree with your proposed restructuring, although I'd let Amit > Kh chime in as he'd be more familiar with this code. I wasn't aware > of this partitioning-related bit being present in ExecInsert(). > > Would you like me to write a patch for some or all items? Yes, that would be awesome. Greetings, Andres Freund
On Thu, Jul 18, 2019 at 2:53 PM Andres Freund <andres@anarazel.de> wrote: > On 2019-07-18 14:24:29 +0900, Amit Langote wrote: > > On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote: > > > 1) How come partition routing is done outside of ExecInsert()? > > > > > > case CMD_INSERT: > > > /* Prepare for tuple routing if needed. */ > > > if (proute) > > > slot = ExecPrepareTupleRouting(node, estate, proute, > > > resultRelInfo, slot); > > > slot = ExecInsert(node, slot, planSlot, > > > estate, node->canSetTag); > > > /* Revert ExecPrepareTupleRouting's state change. */ > > > if (proute) > > > estate->es_result_relation_info = resultRelInfo; > > > break; > > > > > > That already seems like a layering violation, > > > > The decision to move partition routing out of ExecInsert() came about > > when we encountered a bug [1] whereby ExecInsert() would fail to reset > > estate->es_result_relation_info back to the root table if it had to > > take an abnormal path out (early return), of which there are quite a > > few instances. The first solution I came up with was to add a goto > > label for the code to reset estate->es_result_relation_info and jump > > to it from the various places that do an early return, which was > > complained about as reducing readability. So, the solution we > > eventually settled on in 6666ee49f was to perform ResultRelInfos > > switching at a higher level. > > I think that was the wrong path, given that the code now lives in > multiple places. Without even a comment explaining that if one has to be > changed, the other has to be changed too. > > > > > It seems to me that if we just moved the ExecPrepareTupleRouting() into > > > ExecInsert(), we could remove the duplication. > > > > I agree that there's duplication here. Given what I wrote above, I > > can think of doing this: move all of ExecInsert()'s code into > > ExecInsertInternal() and make the former instead look like this: > > For me just having the gotos is cleaner than that here. > > But perhaps the right fix would be to not have ExecPrepareTupleRouting() > change global state at all, and instead change it much more locally > inside ExecInsert(), around the calls that need it to be set > differently. > > Or perhaps the actually correct fix is to remove es_result_relation_info > alltogether, and just pass it down the places that need it - we've a lot > more code setting it than using the value. And it'd not be hard to > actually pass it to the places that read it. Given all the > setting/resetting of it it's pretty obvious that a query-global resource > isn't the right place for it. I tend to agree that managing state through es_result_relation_info across various operations on a result relation has turned a bit messy at this point. That said, while most of the places that access the currently active result relation from es_result_relation_info can be easily modified to receive it directly, the FDW API BeginDirectModify poses bit of a challenge. BeginDirectlyModify() is called via ExecInitForeignScan() that in turn can't be changed to add a result relation (Index or ResultRelInfo *) argument, so the only way left for BeginDirectlyModify() is to access it via es_result_relation_info. Maybe we can do to ExecPrepareTupleRouting() what you say -- remove all code in it that changes ModifyTable-global and EState-global states. Also, maybe call ExecPrepareTupleRouting() inside ExecInsert() at the beginning instead of outside of it. I agree that setting and reverting global states around the exact piece of code that need that to be done is better for clarity. All of that assuming you're not saying that we scrap ExecPrepareTupleRouting altogether. Thoughts? Other opinions? > > Would you like me to write a patch for some or all items? > > Yes, that would be awesome. OK, I will try to post a patch soon. Thanks, Amit
On Thu, Jul 18, 2019 at 4:51 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Thu, Jul 18, 2019 at 2:53 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-07-18 14:24:29 +0900, Amit Langote wrote: > > > On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote: > > > > 1) How come partition routing is done outside of ExecInsert()? > > > > > > > > case CMD_INSERT: > > > > /* Prepare for tuple routing if needed. */ > > > > if (proute) > > > > slot = ExecPrepareTupleRouting(node, estate, proute, > > > > resultRelInfo, slot); > > > > slot = ExecInsert(node, slot, planSlot, > > > > estate, node->canSetTag); > > > > /* Revert ExecPrepareTupleRouting's state change. */ > > > > if (proute) > > > > estate->es_result_relation_info = resultRelInfo; > > > > break; > > > > > > > > That already seems like a layering violation, > > > > > > The decision to move partition routing out of ExecInsert() came about > > > when we encountered a bug [1] whereby ExecInsert() would fail to reset > > > estate->es_result_relation_info back to the root table if it had to > > > take an abnormal path out (early return), of which there are quite a > > > few instances. The first solution I came up with was to add a goto > > > label for the code to reset estate->es_result_relation_info and jump > > > to it from the various places that do an early return, which was > > > complained about as reducing readability. So, the solution we > > > eventually settled on in 6666ee49f was to perform ResultRelInfos > > > switching at a higher level. > > > > I think that was the wrong path, given that the code now lives in > > multiple places. Without even a comment explaining that if one has to be > > changed, the other has to be changed too. I thought this would be OK because we have the ExecPrepareTupleRouting() call in just two places in a single source file, at least currently. > > Or perhaps the actually correct fix is to remove es_result_relation_info > > alltogether, and just pass it down the places that need it - we've a lot > > more code setting it than using the value. And it'd not be hard to > > actually pass it to the places that read it. Given all the > > setting/resetting of it it's pretty obvious that a query-global resource > > isn't the right place for it. > > I tend to agree that managing state through es_result_relation_info > across various operations on a result relation has turned a bit messy > at this point. That said, while most of the places that access the > currently active result relation from es_result_relation_info can be > easily modified to receive it directly, the FDW API BeginDirectModify > poses bit of a challenge. BeginDirectlyModify() is called via > ExecInitForeignScan() that in turn can't be changed to add a result > relation (Index or ResultRelInfo *) argument, so the only way left for > BeginDirectlyModify() is to access it via es_result_relation_info. That's right. I'm not sure that's a good idea, because I think other extensions also might look at es_result_relation_info, and if so, removing es_result_relation_info altogether would require the extension authors to update their extensions without any benefit, which I think isn't a good thing. Best regards, Etsuro Fujita
On Thu, Jul 18, 2019 at 4:50 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Thu, Jul 18, 2019 at 2:53 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-07-18 14:24:29 +0900, Amit Langote wrote: > > > On Thu, Jul 18, 2019 at 10:09 AM Andres Freund <andres@anarazel.de> wrote: > > Or perhaps the actually correct fix is to remove es_result_relation_info > > alltogether, and just pass it down the places that need it - we've a lot > > more code setting it than using the value. And it'd not be hard to > > actually pass it to the places that read it. Given all the > > setting/resetting of it it's pretty obvious that a query-global resource > > isn't the right place for it. >> > > > Would you like me to write a patch for some or all items? > > > > Yes, that would be awesome. > > OK, I will try to post a patch soon. Attached are two patches. The first one (0001) deals with reducing the core executor's reliance on es_result_relation_info to access the currently active result relation, in favor of receiving it from the caller as a function argument. So no piece of core code relies on it being correctly set anymore. It still needs to be set correctly for the third-party code such as FDWs. Also, because the partition routing related suggestions upthread are closely tied into this, especially those around ExecInsert(), I've included them in the same patch. I chose to keep the function ExecPrepareTupleRouting, even though it's now only called from ExecInsert(), to preserve the readability of the latter. The second patch (0002) implements some rearrangement of the UPDATE tuple movement code, addressing the point 2 of in the first email of this thread. Mainly the block of code in ExecUpdate() that implements row movement proper has been moved in a function called ExecMove(). It also contains the cosmetic improvements suggested in point 4. Thanks, Amit
Вложения
Hi, On 2019-07-19 17:52:20 +0900, Amit Langote wrote: > Attached are two patches. Awesome. > The first one (0001) deals with reducing the core executor's reliance > on es_result_relation_info to access the currently active result > relation, in favor of receiving it from the caller as a function > argument. So no piece of core code relies on it being correctly set > anymore. It still needs to be set correctly for the third-party code > such as FDWs. I'm inclined to just remove it. There's not much code out there relying on it, as far as I can tell. Most FDWs don't support the direct modify API, and that's afaict the case where we one needs to use es_result_relation_info? In fact, I searched through alllFDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers that are on github and in first few categories (up and including to "file wrappers"), and there was only one reference to es_result_relation_info, and that just in comments in a test: https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type= which I think was just copied from our source code. IOW, we should just change the direct modify calls to get the relevant ResultRelationInfo or something in that vein (perhaps just the relevant RT index?). pglogical also references it, but just because it creates its own EState afaict. > @@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot) > * ExecInsert > * > * For INSERT, we have to insert the tuple into the target relation > - * and insert appropriate tuples into the index relations. > + * (or partition thereof) and insert appropriate tuples into the index > + * relations. > * > * Returns RETURNING result if any, otherwise NULL. > + * > + * This may change the currently active tuple conversion map in > + * mtstate->mt_transition_capture, so the callers must take care to > + * save the previous value to avoid losing track of it. > * ---------------------------------------------------------------- > */ > static TupleTableSlot * > ExecInsert(ModifyTableState *mtstate, > + ResultRelInfo *resultRelInfo, > TupleTableSlot *slot, > TupleTableSlot *planSlot, > EState *estate, > bool canSetTag) > { > - ResultRelInfo *resultRelInfo; > Relation resultRelationDesc; > List *recheckIndexes = NIL; > TupleTableSlot *result = NULL; > TransitionCaptureState *ar_insert_trig_tcs; > ModifyTable *node = (ModifyTable *) mtstate->ps.plan; > OnConflictAction onconflict = node->onConflictAction; > + PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; > + > + /* > + * If the input result relation is a partitioned table, find the leaf > + * partition to insert the tuple into. > + */ > + if (proute) > + { > + ResultRelInfo *partRelInfo; > + > + slot = ExecPrepareTupleRouting(mtstate, estate, proute, > + resultRelInfo, slot, > + &partRelInfo); > + resultRelInfo = partRelInfo; > + /* Result relation has changed, so update EState reference too. */ > + estate->es_result_relation_info = resultRelInfo; > + } I think by removing es_result_relation entirely, this would look cleaner. > @@ -1271,18 +1274,18 @@ lreplace:; > mtstate->mt_root_tuple_slot); > > /* > - * Prepare for tuple routing, making it look like we're inserting > - * into the root. > + * ExecInsert() may scribble on mtstate->mt_transition_capture, > + * so save the currently active map. > */ > + if (mtstate->mt_transition_capture) > + saved_tcs_map = mtstate->mt_transition_capture->tcs_map; Wonder if we could remove the need for this somehow, it's still pretty darn ugly. Thomas, perhaps you have some insights? To me the need to modify these ModifyTable wide state on a per-subplan and even per-partition basis indicates that the datastructures are in the wrong place. > @@ -2212,23 +2207,17 @@ ExecModifyTable(PlanState *pstate) > switch (operation) > { > case CMD_INSERT: > - /* Prepare for tuple routing if needed. */ > - if (proute) > - slot = ExecPrepareTupleRouting(node, estate, proute, > - resultRelInfo, slot); > - slot = ExecInsert(node, slot, planSlot, > + slot = ExecInsert(node, resultRelInfo, slot, planSlot, > estate, node->canSetTag); > - /* Revert ExecPrepareTupleRouting's state change. */ > - if (proute) > - estate->es_result_relation_info = resultRelInfo; > break; > case CMD_UPDATE: > - slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot, > - &node->mt_epqstate, estate, node->canSetTag); > + slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot, > + planSlot, &node->mt_epqstate, estate, > + node->canSetTag); > break; > case CMD_DELETE: > - slot = ExecDelete(node, tupleid, oldtuple, planSlot, > - &node->mt_epqstate, estate, > + slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple, > + planSlot, &node->mt_epqstate, estate, > true, node->canSetTag, > false /* changingPart */ , NULL, NULL); > break; This reminds me of another complaint: ExecDelete and ExecInsert() have gotten more boolean parameters for partition moving, but only one of them is explained with a comment (/* changingPart */) - think we should do that for all. > diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c > index 43edfef089..7df3e78b22 100644 > --- a/src/backend/replication/logical/worker.c > +++ b/src/backend/replication/logical/worker.c > @@ -173,10 +173,10 @@ ensure_transaction(void) > * This is based on similar code in copy.c > */ > static EState * > -create_estate_for_relation(LogicalRepRelMapEntry *rel) > +create_estate_for_relation(LogicalRepRelMapEntry *rel, > + ResultRelInfo **resultRelInfo) > { > EState *estate; > - ResultRelInfo *resultRelInfo; > RangeTblEntry *rte; > > estate = CreateExecutorState(); > @@ -188,12 +188,11 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel) > rte->rellockmode = AccessShareLock; > ExecInitRangeTable(estate, list_make1(rte)); > > - resultRelInfo = makeNode(ResultRelInfo); > - InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0); > + *resultRelInfo = makeNode(ResultRelInfo); > + InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0); > > - estate->es_result_relations = resultRelInfo; > + estate->es_result_relations = *resultRelInfo; > estate->es_num_result_relations = 1; > - estate->es_result_relation_info = resultRelInfo; > > estate->es_output_cid = GetCurrentCommandId(true); > > @@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel) > static void > apply_handle_insert(StringInfo s) > { > + ResultRelInfo *resultRelInfo; > LogicalRepRelMapEntry *rel; > LogicalRepTupleData newtup; > LogicalRepRelId relid; > @@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s) > } > > /* Initialize the executor state. */ > - estate = create_estate_for_relation(rel); > + estate = create_estate_for_relation(rel, &resultRelInfo); Hm. It kinda seems cleaner if we were to instead return the relevant index, rather than the entire ResultRelInfo, as an output from create_estate_for_relation(). Makes it clearer that it's still in the EState. Or perhaps we ought to compute it in a separate step? Then that'd be more amenable to support replcating into partition roots. > /* > - * If this insert is the result of a partition key update that moved the > - * tuple to a new partition, put this row into the transition NEW TABLE, > - * if there is one. We need to do this separately for DELETE and INSERT > - * because they happen on different tables. > + * If this delete is a part of a partition key update, put this row into > + * the UPDATE trigger's NEW TABLE instead of that of an INSERT trigger. > */ > - ar_insert_trig_tcs = mtstate->mt_transition_capture; > - if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture > - && mtstate->mt_transition_capture->tcs_update_new_table) > + if (mtstate->operation == CMD_UPDATE && > + mtstate->mt_transition_capture && > + mtstate->mt_transition_capture->tcs_update_new_table) > { > - ExecARUpdateTriggers(estate, resultRelInfo, NULL, > - NULL, > - slot, > - NULL, > - mtstate->mt_transition_capture); > + ExecARUpdateTriggers(estate, resultRelInfo, NULL, NULL, slot, > + NIL, mtstate->mt_transition_capture); > > /* > - * We've already captured the NEW TABLE row, so make sure any AR > - * INSERT trigger fired below doesn't capture it again. > + * Execute AFTER ROW INSERT Triggers, but such that the row is not > + * captured again in the transition table if any. > */ > - ar_insert_trig_tcs = NULL; > + ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > + NULL); > + } > + else > + { > + /* AFTER ROW INSERT Triggers */ > + ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > + mtstate->mt_transition_capture); > } > - > - /* AFTER ROW INSERT Triggers */ > - ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes, > - ar_insert_trig_tcs); While a tiny bit more code, perhaps, this is considerably clearer imo. Thanks. > +/* > + * ExecMove > + * Move an updated tuple from the input result relation to the > + * new partition of its root parent table > + * > + * This works by first deleting the tuple from the input result relation > + * followed by inserting it into the root parent table, that is, > + * mtstate->rootResultRelInfo. > + * > + * Returns true if it's detected that the tuple we're trying to move has > + * been concurrently updated. > + */ > +static bool > +ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > + ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot, > + EPQState *epqstate, bool canSetTag, TupleTableSlot **slot, > + TupleTableSlot **inserted_tuple) > +{ I know that it was one of the names I proposed, but now that I'm thinking about it again, it sounds too generic. Perhaps ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since there's only one reference the longer name wouldn't be painful. > + /* > + * Row movement, part 1. Delete the tuple, but skip RETURNING > + * processing. We want to return rows from INSERT. > + */ > + ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot, > + epqstate, estate, false, false /* canSetTag */ , > + true /* changingPart */ , &tuple_deleted, &epqslot); Here again it'd be nice if all the booleans would be explained with a comment. Greetings, Andres Freund
On 2019-Jul-19, Andres Freund wrote: > On 2019-07-19 17:52:20 +0900, Amit Langote wrote: > > The first one (0001) deals with reducing the core executor's reliance > > on es_result_relation_info to access the currently active result > > relation, in favor of receiving it from the caller as a function > > argument. So no piece of core code relies on it being correctly set > > anymore. It still needs to be set correctly for the third-party code > > such as FDWs. > > I'm inclined to just remove it. There's not much code out there relying > on it, as far as I can tell. Most FDWs don't support the direct modify > API, and that's afaict the case where we one needs to use > es_result_relation_info? Yeah, I too agree with removing it; since our code doesn't use it, it seems very likely that it will become slightly out of sync with reality and we'd not notice until some FDW misbehaves weirdly. > > - slot = ExecDelete(node, tupleid, oldtuple, planSlot, > > - &node->mt_epqstate, estate, > > + slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple, > > + planSlot, &node->mt_epqstate, estate, > > true, node->canSetTag, > > false /* changingPart */ , NULL, NULL); > > break; > > This reminds me of another complaint: ExecDelete and ExecInsert() have > gotten more boolean parameters for partition moving, but only one of > them is explained with a comment (/* changingPart */) - think we should > do that for all. Maybe change the API to use a flags bitmask? (IMO the placement of the comment inside the function call, making the comma appear preceded with a space, looks ugly. If we want to add comments, let's put each param on its own line with the comment beyond the comma. That's what we do in other places where this pattern is used.) > > /* Initialize the executor state. */ > > - estate = create_estate_for_relation(rel); > > + estate = create_estate_for_relation(rel, &resultRelInfo); > > Hm. It kinda seems cleaner if we were to instead return the relevant > index, rather than the entire ResultRelInfo, as an output from > create_estate_for_relation(). Makes it clearer that it's still in the > EState. Yeah. > Or perhaps we ought to compute it in a separate step? Then that'd be > more amenable to support replcating into partition roots. I'm not quite seeing the shape that you're imagining this would take. I vote not to mess with that for this patch; I bet that we'll have to change a few other things in this code when we add better support for partitioning in logical replication. > > +/* > > + * ExecMove > > + * Move an updated tuple from the input result relation to the > > + * new partition of its root parent table > > + * > > + * This works by first deleting the tuple from the input result relation > > + * followed by inserting it into the root parent table, that is, > > + * mtstate->rootResultRelInfo. > > + * > > + * Returns true if it's detected that the tuple we're trying to move has > > + * been concurrently updated. > > + */ > > +static bool > > +ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > > + ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot, > > + EPQState *epqstate, bool canSetTag, TupleTableSlot **slot, > > + TupleTableSlot **inserted_tuple) > > +{ > > I know that it was one of the names I proposed, but now that I'm > thinking about it again, it sounds too generic. Perhaps > ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since > there's only one reference the longer name wouldn't be painful. That name sounds good. Isn't the return convention backwards? Sounds like "true" should mean that it succeeded. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-07-19 17:11:10 -0400, Alvaro Herrera wrote: > On 2019-Jul-19, Andres Freund wrote: > > > - slot = ExecDelete(node, tupleid, oldtuple, planSlot, > > > - &node->mt_epqstate, estate, > > > + slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple, > > > + planSlot, &node->mt_epqstate, estate, > > > true, node->canSetTag, > > > false /* changingPart */ , NULL, NULL); > > > break; > > > > This reminds me of another complaint: ExecDelete and ExecInsert() have > > gotten more boolean parameters for partition moving, but only one of > > them is explained with a comment (/* changingPart */) - think we should > > do that for all. > > Maybe change the API to use a flags bitmask? > > (IMO the placement of the comment inside the function call, making the > comma appear preceded with a space, looks ugly. If we want to add > comments, let's put each param on its own line with the comment beyond > the comma. That's what we do in other places where this pattern is > used.) Well, that's the pre-existing style, so I'd just have gone with that. I'm not sure I buy there's much point in going for a bitmask, as this is file-private code, not code where changing the signature requires modifying multiple files. > > > /* Initialize the executor state. */ > > > - estate = create_estate_for_relation(rel); > > > + estate = create_estate_for_relation(rel, &resultRelInfo); > > > > Hm. It kinda seems cleaner if we were to instead return the relevant > > index, rather than the entire ResultRelInfo, as an output from > > create_estate_for_relation(). Makes it clearer that it's still in the > > EState. > > Yeah. > > > Or perhaps we ought to compute it in a separate step? Then that'd be > > more amenable to support replcating into partition roots. > > I'm not quite seeing the shape that you're imagining this would take. > I vote not to mess with that for this patch; I bet that we'll have to > change a few other things in this code when we add better support for > partitioning in logical replication. Yea, I think it's fine to do that separately. If we wanted to support replication roots as replication targets, we'd obviously need to do something pretty similar to what ExecInsert()/ExecUpdate() already do. And there we can't just reference an index in EState, as partition children aren't in there. I kind of was wondering if we were to have a separate function for getting the ResultRelInfo targeted, we'd be able to just extend that function to support replication. But now that I think about it a bit more, that's so much just scratching the surface... We really ought to have the replication "sink" code share more code with nodeModifyTable.c. Greetings, Andres Freund
Hi Andres, Sorry about the delay in replying as I was on vacation for the last few days. On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote: > > The first one (0001) deals with reducing the core executor's reliance > > on es_result_relation_info to access the currently active result > > relation, in favor of receiving it from the caller as a function > > argument. So no piece of core code relies on it being correctly set > > anymore. It still needs to be set correctly for the third-party code > > such as FDWs. > > I'm inclined to just remove it. There's not much code out there relying > on it, as far as I can tell. Most FDWs don't support the direct modify > API, and that's afaict the case where we one needs to use > es_result_relation_info? Right, only the directly modify API uses it. > In fact, I searched through alllFDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers > that are on github and in first few categories (up and including to > "file wrappers"), and there was only one reference to > es_result_relation_info, and that just in comments in a test: > https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type= > which I think was just copied from our source code. > > IOW, we should just change the direct modify calls to get the relevant > ResultRelationInfo or something in that vein (perhaps just the relevant > RT index?). It seems easy to make one of the two functions that constitute the direct modify API, IterateDirectModify(), access the result relation from ForeignScanState by saving either the result relation RT index or ResultRelInfo pointer itself into the ForeignScanState's FDW-private area. For example, for postgres_fdw, one would simply add a new member to PgFdwDirectModifyState struct. Doing that for the other function BeginDirectModify() seems a bit more involved. We could add a new field to ForeignScan, say resultRelation, that's set by either PlanDirectModify() (the FDW code) or make_modifytable() (the core code) if the ForeignScan node contains the command for direct modification. BeginDirectModify() can then use that value instead of relying on es_result_relation_info being set. Thoughts? Fujita-san, do you have any opinion on whether that would be a good idea? > pglogical also references it, but just because it creates its own > EState afaict. That sounds easily manageable. > > @@ -334,32 +335,50 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot) > > * ExecInsert > > * > > * For INSERT, we have to insert the tuple into the target relation > > - * and insert appropriate tuples into the index relations. > > + * (or partition thereof) and insert appropriate tuples into the index > > + * relations. > > * > > * Returns RETURNING result if any, otherwise NULL. > > + * > > + * This may change the currently active tuple conversion map in > > + * mtstate->mt_transition_capture, so the callers must take care to > > + * save the previous value to avoid losing track of it. > > * ---------------------------------------------------------------- > > */ > > static TupleTableSlot * > > ExecInsert(ModifyTableState *mtstate, > > + ResultRelInfo *resultRelInfo, > > TupleTableSlot *slot, > > TupleTableSlot *planSlot, > > EState *estate, > > bool canSetTag) > > { > > - ResultRelInfo *resultRelInfo; > > Relation resultRelationDesc; > > List *recheckIndexes = NIL; > > TupleTableSlot *result = NULL; > > TransitionCaptureState *ar_insert_trig_tcs; > > ModifyTable *node = (ModifyTable *) mtstate->ps.plan; > > OnConflictAction onconflict = node->onConflictAction; > > + PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; > > + > > + /* > > + * If the input result relation is a partitioned table, find the leaf > > + * partition to insert the tuple into. > > + */ > > + if (proute) > > + { > > + ResultRelInfo *partRelInfo; > > + > > + slot = ExecPrepareTupleRouting(mtstate, estate, proute, > > + resultRelInfo, slot, > > + &partRelInfo); > > + resultRelInfo = partRelInfo; > > + /* Result relation has changed, so update EState reference too. */ > > + estate->es_result_relation_info = resultRelInfo; > > + } > > I think by removing es_result_relation entirely, this would look > cleaner. I agree. Maybe, setting es_result_relation_info here isn't really needed, because the ResultRelInfo is directly passed through ExecForeignInsert. Still, some FDWs may be relying on es_result_relation_info being correctly set despite the aforementioned. Again, the only way to get them to stop doing so may be to remove it. > > @@ -1271,18 +1274,18 @@ lreplace:; > > mtstate->mt_root_tuple_slot); > > > > /* > > - * Prepare for tuple routing, making it look like we're inserting > > - * into the root. > > + * ExecInsert() may scribble on mtstate->mt_transition_capture, > > + * so save the currently active map. > > */ > > + if (mtstate->mt_transition_capture) > > + saved_tcs_map = mtstate->mt_transition_capture->tcs_map; > > Wonder if we could remove the need for this somehow, it's still pretty > darn ugly. Thomas, perhaps you have some insights? > > To me the need to modify these ModifyTable wide state on a per-subplan > and even per-partition basis indicates that the datastructures are in > the wrong place. I agree that having to ensure tcs_map is set correctly is cumbersome, because it has to be reset every time the currently active result relation changes. I think a better place for the map to be is ResultRelInfo itself. The trigger code can just get the correct map from the ResultRelInfo of the result relation it's processing. Regarding that idea, the necessary map is already present in the tuple-routing state struct that's embedded in the partition's ResultRelInfo. But the UPDATE result relations that are never processed as tuple routing targets don't have routing info initialized (also think non-partition inheritance children), so we could add another TupleConversionMap * field in ResultRelInfo. Attached patch 0003 implements that. With this change, we no longer need to track the map in a global variable, that is, TransitionCaptureState no longer needs tcs_map. We still have tcs_original_insert_tuple though, which must be set during ExecInsert and reset after it's read by AfterTriggerSaveEvent. I have moved the resetting of its value to right after where the originally set value is read to make it clear that the value must be read only once. > > @@ -2212,23 +2207,17 @@ ExecModifyTable(PlanState *pstate) > > switch (operation) > > { > > case CMD_INSERT: > > - /* Prepare for tuple routing if needed. */ > > - if (proute) > > - slot = ExecPrepareTupleRouting(node, estate, proute, > > - resultRelInfo, slot); > > - slot = ExecInsert(node, slot, planSlot, > > + slot = ExecInsert(node, resultRelInfo, slot, planSlot, > > estate, node->canSetTag); > > - /* Revert ExecPrepareTupleRouting's state change. */ > > - if (proute) > > - estate->es_result_relation_info = resultRelInfo; > > break; > > case CMD_UPDATE: > > - slot = ExecUpdate(node, tupleid, oldtuple, slot, planSlot, > > - &node->mt_epqstate, estate, node->canSetTag); > > + slot = ExecUpdate(node, resultRelInfo, tupleid, oldtuple, slot, > > + planSlot, &node->mt_epqstate, estate, > > + node->canSetTag); > > break; > > case CMD_DELETE: > > - slot = ExecDelete(node, tupleid, oldtuple, planSlot, > > - &node->mt_epqstate, estate, > > + slot = ExecDelete(node, resultRelInfo, tupleid, oldtuple, > > + planSlot, &node->mt_epqstate, estate, > > true, node->canSetTag, > > false /* changingPart */ , NULL, NULL); > > break; > > This reminds me of another complaint: ExecDelete and ExecInsert() have > gotten more boolean parameters for partition moving, but only one of > them is explained with a comment (/* changingPart */) - think we should > do that for all. Agree about the confusing state of ExecDelete call sites. I've reformatted the calls to properly label the arguments (the changes are contained in the revised 0001). I don't see many partitioning-specific boolean parameters in ExecInsert though. > > diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c > > index 43edfef089..7df3e78b22 100644 > > --- a/src/backend/replication/logical/worker.c > > +++ b/src/backend/replication/logical/worker.c > > @@ -173,10 +173,10 @@ ensure_transaction(void) > > * This is based on similar code in copy.c > > */ > > static EState * > > -create_estate_for_relation(LogicalRepRelMapEntry *rel) > > +create_estate_for_relation(LogicalRepRelMapEntry *rel, > > + ResultRelInfo **resultRelInfo) > > { > > EState *estate; > > - ResultRelInfo *resultRelInfo; > > RangeTblEntry *rte; > > > > estate = CreateExecutorState(); > > @@ -188,12 +188,11 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel) > > rte->rellockmode = AccessShareLock; > > ExecInitRangeTable(estate, list_make1(rte)); > > > > - resultRelInfo = makeNode(ResultRelInfo); > > - InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0); > > + *resultRelInfo = makeNode(ResultRelInfo); > > + InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0); > > > > - estate->es_result_relations = resultRelInfo; > > + estate->es_result_relations = *resultRelInfo; > > estate->es_num_result_relations = 1; > > - estate->es_result_relation_info = resultRelInfo; > > > > estate->es_output_cid = GetCurrentCommandId(true); > > > > @@ -567,6 +566,7 @@ GetRelationIdentityOrPK(Relation rel) > > static void > > apply_handle_insert(StringInfo s) > > { > > + ResultRelInfo *resultRelInfo; > > LogicalRepRelMapEntry *rel; > > LogicalRepTupleData newtup; > > LogicalRepRelId relid; > > @@ -589,7 +589,7 @@ apply_handle_insert(StringInfo s) > > } > > > > /* Initialize the executor state. */ > > - estate = create_estate_for_relation(rel); > > + estate = create_estate_for_relation(rel, &resultRelInfo); > > Hm. It kinda seems cleaner if we were to instead return the relevant > index, rather than the entire ResultRelInfo, as an output from > create_estate_for_relation(). Makes it clearer that it's still in the > EState. For now, I've reverted these changes in favor of just doing this: /* Initialize the executor state. */ estate = create_estate_for_relation(rel); + resultRelInfo = &estate->es_result_relations[0]; This seems OK as we know for sure that there is only one target relation. > Or perhaps we ought to compute it in a separate step? Then that'd be > more amenable to support replcating into partition roots. If we think of create_estate_for_relation() being like InitPlan(), then perhaps it makes sense to leave it as is. Any setup needed for replicating into partition roots will have to be in a separate function anyway. > > +/* > > + * ExecMove > > + * Move an updated tuple from the input result relation to the > > + * new partition of its root parent table > > + * > > + * This works by first deleting the tuple from the input result relation > > + * followed by inserting it into the root parent table, that is, > > + * mtstate->rootResultRelInfo. > > + * > > + * Returns true if it's detected that the tuple we're trying to move has > > + * been concurrently updated. > > + */ > > +static bool > > +ExecMove(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > > + ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *planSlot, > > + EPQState *epqstate, bool canSetTag, TupleTableSlot **slot, > > + TupleTableSlot **inserted_tuple) > > +{ > > I know that it was one of the names I proposed, but now that I'm > thinking about it again, it sounds too generic. Perhaps > ExecCrossPartitionUpdate() wouldn't be a quite so generic name? Since > there's only one reference the longer name wouldn't be painful. OK, I've renamed ExecMove to ExecCrossPartitionUpdate. > > + /* > > + * Row movement, part 1. Delete the tuple, but skip RETURNING > > + * processing. We want to return rows from INSERT. > > + */ > > + ExecDelete(mtstate, resultRelInfo, tupleid, oldtuple, planSlot, > > + epqstate, estate, false, false /* canSetTag */ , > > + true /* changingPart */ , &tuple_deleted, &epqslot); > > Here again it'd be nice if all the booleans would be explained with a > comment. Done too. Attached updated 0001, 0002, and the new 0003 for transition tuple conversion map related refactoring as explained above. Thanks, Amit
Вложения
On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote: > > > The first one (0001) deals with reducing the core executor's reliance > > > on es_result_relation_info to access the currently active result > > > relation, in favor of receiving it from the caller as a function > > > argument. So no piece of core code relies on it being correctly set > > > anymore. It still needs to be set correctly for the third-party code > > > such as FDWs. > > > > I'm inclined to just remove it. There's not much code out there relying > > on it, as far as I can tell. Most FDWs don't support the direct modify > > API, and that's afaict the case where we one needs to use > > es_result_relation_info? > > Right, only the directly modify API uses it. > > > In fact, I searched through alllFDWs listed on https://wiki.postgresql.org/wiki/Foreign_data_wrappers > > that are on github and in first few categories (up and including to > > "file wrappers"), and there was only one reference to > > es_result_relation_info, and that just in comments in a test: > > https://github.com/pgspider/griddb_fdw/search?utf8=%E2%9C%93&q=es_result_relation_info&type= > > which I think was just copied from our source code. > > > > IOW, we should just change the direct modify calls to get the relevant > > ResultRelationInfo or something in that vein (perhaps just the relevant > > RT index?). > > It seems easy to make one of the two functions that constitute the > direct modify API, IterateDirectModify(), access the result relation > from ForeignScanState by saving either the result relation RT index or > ResultRelInfo pointer itself into the ForeignScanState's FDW-private > area. For example, for postgres_fdw, one would simply add a new > member to PgFdwDirectModifyState struct. > > Doing that for the other function BeginDirectModify() seems a bit more > involved. We could add a new field to ForeignScan, say > resultRelation, that's set by either PlanDirectModify() (the FDW code) > or make_modifytable() (the core code) if the ForeignScan node contains > the command for direct modification. BeginDirectModify() can then use > that value instead of relying on es_result_relation_info being set. > > Thoughts? Fujita-san, do you have any opinion on whether that would > be a good idea? I looked into trying to do the things I mentioned above and it seems to me that revising BeginDirectModify()'s API to receive the ResultRelInfo directly as Andres suggested might be the best way forward. I've implemented that in the attached 0001. Patches that were previously 0001, 0002, and 0003 are now 0002, 003, and 0004, respectively. 0002 is now a patch to "remove" es_result_relation_info. Thanks, Amit
Вложения
On Wed, Jul 31, 2019 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote: > > > IOW, we should just change the direct modify calls to get the relevant > > > ResultRelationInfo or something in that vein (perhaps just the relevant > > > RT index?). > > > > It seems easy to make one of the two functions that constitute the > > direct modify API, IterateDirectModify(), access the result relation > > from ForeignScanState by saving either the result relation RT index or > > ResultRelInfo pointer itself into the ForeignScanState's FDW-private > > area. For example, for postgres_fdw, one would simply add a new > > member to PgFdwDirectModifyState struct. > > > > Doing that for the other function BeginDirectModify() seems a bit more > > involved. We could add a new field to ForeignScan, say > > resultRelation, that's set by either PlanDirectModify() (the FDW code) > > or make_modifytable() (the core code) if the ForeignScan node contains > > the command for direct modification. BeginDirectModify() can then use > > that value instead of relying on es_result_relation_info being set. > > > > Thoughts? Fujita-san, do you have any opinion on whether that would > > be a good idea? I'm still not sure that it's a good idea to remove es_result_relation_info, but if I had to say then I think the latter would probably be better. I'm planning to rework on direct modification to base it on upper planner pathification so we can perform direct modification without the ModifyTable node. (I'm not sure we can really do this for inherited UPDATE/DELETE, though.) For that rewrite, I'm thinking to call BeginDirectModify() from the ForeignScan node (ie, ExecInitForeignScan()) as-is. The latter approach would allow that without any changes and avoid changing that API many times. That's the reason why I think the latter would probably be better. > I looked into trying to do the things I mentioned above and it seems > to me that revising BeginDirectModify()'s API to receive the > ResultRelInfo directly as Andres suggested might be the best way > forward. I've implemented that in the attached 0001. Patches that > were previously 0001, 0002, and 0003 are now 0002, 003, and 0004, > respectively. 0002 is now a patch to "remove" > es_result_relation_info. Sorry for speaking this late. Best regards, Etsuro Fujita
Hi, On 2019-07-31 21:03:58 +0900, Etsuro Fujita wrote: > I'm still not sure that it's a good idea to remove > es_result_relation_info, but if I had to say then I think the latter > would probably be better. I'm planning to rework on direct > modification to base it on upper planner pathification so we can > perform direct modification without the ModifyTable node. (I'm not > sure we can really do this for inherited UPDATE/DELETE, though.) For > that rewrite, I'm thinking to call BeginDirectModify() from the > ForeignScan node (ie, ExecInitForeignScan()) as-is. The latter > approach would allow that without any changes and avoid changing that > API many times. That's the reason why I think the latter would > probably be better. I think if we did that, it'd become *more* urgent to remove es_result_relation. Having more and more plan nodes change global resources is a recipse for disaster. Greetings, Andres Freund
Fujita-san, Thanks for the reply and sorry I didn't wait a bit more before posting the patch. On Wed, Jul 31, 2019 at 9:04 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Wed, Jul 31, 2019 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote: > > > > IOW, we should just change the direct modify calls to get the relevant > > > > ResultRelationInfo or something in that vein (perhaps just the relevant > > > > RT index?). > > > > > > It seems easy to make one of the two functions that constitute the > > > direct modify API, IterateDirectModify(), access the result relation > > > from ForeignScanState by saving either the result relation RT index or > > > ResultRelInfo pointer itself into the ForeignScanState's FDW-private > > > area. For example, for postgres_fdw, one would simply add a new > > > member to PgFdwDirectModifyState struct. > > > > > > Doing that for the other function BeginDirectModify() seems a bit more > > > involved. We could add a new field to ForeignScan, say > > > resultRelation, that's set by either PlanDirectModify() (the FDW code) > > > or make_modifytable() (the core code) if the ForeignScan node contains > > > the command for direct modification. BeginDirectModify() can then use > > > that value instead of relying on es_result_relation_info being set. > > > > > > Thoughts? Fujita-san, do you have any opinion on whether that would > > > be a good idea? > > I'm still not sure that it's a good idea to remove > es_result_relation_info, but if I had to say then I think the latter > would probably be better. Could you please clarify what you meant by the "latter"? If it's the approach of adding a resultRelation Index field to ForeignScan node, I tried and had to give up, realizing that we don't maintain ResultRelInfos in an array that is indexable by RT indexes. It would've worked if es_result_relations had mirrored es_range_table, although that probably complicates how the individual ModifyTable nodes attach to that array. In any case, given this discussion, further hacking on a global variable like es_result_relations may be a course we might not want to pursue. > I'm planning to rework on direct > modification to base it on upper planner pathification so we can > perform direct modification without the ModifyTable node. (I'm not > sure we can really do this for inherited UPDATE/DELETE, though.) For > that rewrite, I'm thinking to call BeginDirectModify() from the > ForeignScan node (ie, ExecInitForeignScan()) as-is. The latter > approach would allow that without any changes and avoid changing that > API many times. That's the reason why I think the latter would > probably be better. Will the new planning approach you're thinking of get rid of needing any result relations at all (and so the ResultRelInfos in the executor)? Thanks, Amit
Amit-san, On Thu, Aug 1, 2019 at 10:33 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Jul 31, 2019 at 9:04 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Jul 31, 2019 at 5:05 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Tue, Jul 30, 2019 at 4:20 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > > On Sat, Jul 20, 2019 at 1:52 AM Andres Freund <andres@anarazel.de> wrote: > > > > > IOW, we should just change the direct modify calls to get the relevant > > > > > ResultRelationInfo or something in that vein (perhaps just the relevant > > > > > RT index?). > > > > > > > > It seems easy to make one of the two functions that constitute the > > > > direct modify API, IterateDirectModify(), access the result relation > > > > from ForeignScanState by saving either the result relation RT index or > > > > ResultRelInfo pointer itself into the ForeignScanState's FDW-private > > > > area. For example, for postgres_fdw, one would simply add a new > > > > member to PgFdwDirectModifyState struct. > > > > > > > > Doing that for the other function BeginDirectModify() seems a bit more > > > > involved. We could add a new field to ForeignScan, say > > > > resultRelation, that's set by either PlanDirectModify() (the FDW code) > > > > or make_modifytable() (the core code) if the ForeignScan node contains > > > > the command for direct modification. BeginDirectModify() can then use > > > > that value instead of relying on es_result_relation_info being set. > > > > > > > > Thoughts? Fujita-san, do you have any opinion on whether that would > > > > be a good idea? > > > > I'm still not sure that it's a good idea to remove > > es_result_relation_info, but if I had to say then I think the latter > > would probably be better. > > Could you please clarify what you meant by the "latter"? > > If it's the approach of adding a resultRelation Index field to > ForeignScan node, I tried and had to give up, realizing that we don't > maintain ResultRelInfos in an array that is indexable by RT indexes. > It would've worked if es_result_relations had mirrored es_range_table, > although that probably complicates how the individual ModifyTable > nodes attach to that array. In any case, given this discussion, > further hacking on a global variable like es_result_relations may be a > course we might not want to pursue. Yeah, I mean that approach. To get the ResultRelInfo, I think searching through the es_result_relations for the ResultRelInfo for the resultRelation added to the ForeignScan in BeginDirectModify() like the attached, which is created along your proposal. ExecFindResultRelInfo() added by the patch wouldn't work efficiently for inherited UPDATE/DELETE where there are many children that are foreign tables, but I think that would probably be OK because in most use-cases, including sharding, the number of such children would be at most < 100 or so. For improving the efficiency for the cases where there are a lot more such children, however, I think it would be an option to do something about global variables so that we can access the ResultRelInfos by RT indexes more efficiently, because IMO I don't think that would be against the point here ie, removing the dependency on es_result_relation_info. Maybe I'm missing something, though. > > I'm planning to rework on direct > > modification to base it on upper planner pathification so we can > > perform direct modification without the ModifyTable node. (I'm not > > sure we can really do this for inherited UPDATE/DELETE, though.) For > > that rewrite, I'm thinking to call BeginDirectModify() from the > > ForeignScan node (ie, ExecInitForeignScan()) as-is. The latter > > approach would allow that without any changes and avoid changing that > > API many times. That's the reason why I think the latter would > > probably be better. > > Will the new planning approach you're thinking of get rid of needing > any result relations at all (and so the ResultRelInfos in the > executor)? I think the new planning approach would still need result relations and ResultRelInfos in the executor as-is; and the FDW would probably use the ResultRelInfo for the foreign table created by the core. Some of the ResultRelInfo data would prpbably need to be initialized by the FDW itesef, though (eg, WCO constraints and/or RETURNING if any). Best regards, Etsuro Fujita
Вложения
Hi, On 2019-07-31 17:04:38 +0900, Amit Langote wrote: > I looked into trying to do the things I mentioned above and it seems > to me that revising BeginDirectModify()'s API to receive the > ResultRelInfo directly as Andres suggested might be the best way > forward. I've implemented that in the attached 0001. Patches that > were previously 0001, 0002, and 0003 are now 0002, 003, and 0004, > respectively. 0002 is now a patch to "remove" > es_result_relation_info. Thanks! Some minor quibbles aside, the non FDW patches look good to me. Fujita-san, do you have any comments on the FDW API change? Or anybody else? I'm a bit woried about the move of BeginDirectModify() into nodeModifyTable.c - it just seems like an odd control flow to me. Not allowing any intermittent nodes between ForeignScan and ModifyTable also seems like an undesirable restriction for the future. I realize that we already do that for BeginForeignModify() (just btw, that already accepts resultRelInfo as a parameter, so being symmetrical for BeginDirectModify makes sense), but it still seems like the wrong direction to me. The need for that move, I assume, comes from needing knowing the correct ResultRelInfo, correct? I wonder if we shouldn't instead determine the at plan time (in setrefs.c), somewhat similar to how we determine ModifyTable.resultRelIndex. Doesn't look like that'd be too hard? Then we could just have BeginForeignModify, BeginDirectModify, BeginForeignScan all be called from ExecInitForeignScan(). Path 04 is such a nice improvement. Besides getting rid of a substantial amount of code, it also makes the control flow a lot easier to read. > @@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType) > * If there are no triggers in 'trigdesc' that request relevant transition > * tables, then return NULL. > * > - * The resulting object can be passed to the ExecAR* functions. The caller > - * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing > - * with child tables. > + * The resulting object can be passed to the ExecAR* functions. > * > * Note that we copy the flags from a parent table into this struct (rather > * than subsequently using the relation's TriggerDesc directly) so that we can > @@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo, > */ > if (row_trigger && transition_capture != NULL) > { > - TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple; > - TupleConversionMap *map = transition_capture->tcs_map; > + TupleTableSlot *original_insert_tuple; > + PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo; > + TupleConversionMap *map = pinfo ? > + pinfo->pi_PartitionToRootMap : > + relinfo->ri_ChildToRootMap; > bool delete_old_table = transition_capture->tcs_delete_old_table; > bool update_old_table = transition_capture->tcs_update_old_table; > bool update_new_table = transition_capture->tcs_update_new_table; > bool insert_new_table = transition_capture->tcs_insert_new_table; > > /* > + * Get the originally inserted tuple from the global variable and set > + * the latter to NULL because any given tuple must be read only once. > + * Note that the TransitionCaptureState is shared across many calls > + * to this function. > + */ > + original_insert_tuple = transition_capture->tcs_original_insert_tuple; > + transition_capture->tcs_original_insert_tuple = NULL; Maybe I'm missing something, but original_insert_tuple is not a global variable? > @@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, > PartitionTupleRouting *proute, > PartitionDispatch dispatch, > ResultRelInfo *partRelInfo, > - int partidx) > + int partidx, > + bool is_update_result_rel) > { > MemoryContext oldcxt; > PartitionRoutingInfo *partrouteinfo; > @@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, > if (mtstate && > (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture)) > { > - partrouteinfo->pi_PartitionToRootMap = > - convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc), > - RelationGetDescr(partRelInfo->ri_PartitionRoot), > - gettext_noop("could not convert row type")); > + /* If partition is an update target, then we already got the map. */ > + if (is_update_result_rel) > + partrouteinfo->pi_PartitionToRootMap = > + partRelInfo->ri_ChildToRootMap; > + else > + partrouteinfo->pi_PartitionToRootMap = > + convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc), > + RelationGetDescr(partRelInfo->ri_PartitionRoot), > + gettext_noop("could not convert row type")); > } Hm, isn't is_update_result_rel just ModifyTable->operation == CMD_UPDATE? Greetings, Andres Freund
Hi, On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-07-31 17:04:38 +0900, Amit Langote wrote: > > I looked into trying to do the things I mentioned above and it seems > > to me that revising BeginDirectModify()'s API to receive the > > ResultRelInfo directly as Andres suggested might be the best way > > forward. I've implemented that in the attached 0001. > Fujita-san, do you have any comments on the FDW API change? Or anybody > else? > > I'm a bit woried about the move of BeginDirectModify() into > nodeModifyTable.c - it just seems like an odd control flow to me. Not > allowing any intermittent nodes between ForeignScan and ModifyTable also > seems like an undesirable restriction for the future. I realize that we > already do that for BeginForeignModify() (just btw, that already accepts > resultRelInfo as a parameter, so being symmetrical for BeginDirectModify > makes sense), but it still seems like the wrong direction to me. > > The need for that move, I assume, comes from needing knowing the correct > ResultRelInfo, correct? I wonder if we shouldn't instead determine the > at plan time (in setrefs.c), somewhat similar to how we determine > ModifyTable.resultRelIndex. Doesn't look like that'd be too hard? I'd vote for that; I created a patch for that [1]. > Then we could just have BeginForeignModify, BeginDirectModify, > BeginForeignScan all be called from ExecInitForeignScan(). I think so too. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/CAPmGK15%3DoFHmWNND5vopfokSGfn6jMXVvnHa7K7P49F7k1hWPQ%40mail.gmail.com
Hi, On 2019-08-03 05:20:35 +0900, Etsuro Fujita wrote: > On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-07-31 17:04:38 +0900, Amit Langote wrote: > > > I looked into trying to do the things I mentioned above and it seems > > > to me that revising BeginDirectModify()'s API to receive the > > > ResultRelInfo directly as Andres suggested might be the best way > > > forward. I've implemented that in the attached 0001. > > > Fujita-san, do you have any comments on the FDW API change? Or anybody > > else? > > > > I'm a bit woried about the move of BeginDirectModify() into > > nodeModifyTable.c - it just seems like an odd control flow to me. Not > > allowing any intermittent nodes between ForeignScan and ModifyTable also > > seems like an undesirable restriction for the future. I realize that we > > already do that for BeginForeignModify() (just btw, that already accepts > > resultRelInfo as a parameter, so being symmetrical for BeginDirectModify > > makes sense), but it still seems like the wrong direction to me. > > > > The need for that move, I assume, comes from needing knowing the correct > > ResultRelInfo, correct? I wonder if we shouldn't instead determine the > > at plan time (in setrefs.c), somewhat similar to how we determine > > ModifyTable.resultRelIndex. Doesn't look like that'd be too hard? > > I'd vote for that; I created a patch for that [1]. > > [1] https://www.postgresql.org/message-id/CAPmGK15%3DoFHmWNND5vopfokSGfn6jMXVvnHa7K7P49F7k1hWPQ%40mail.gmail.com Oh, missed that. But that's not quite what I'm proposing. I don't like ExecFindResultRelInfo at all. What's the point of it? It's introduction is still an API break - I don't understand why that break is better than just passing the ResultRelInfo directly to BeginDirectModify()? I want to again remark that BeginForeignModify() does get the ResultRelInfo - it should have been done the same when adding direct modify. Even if you need the loop - which I don't think is right - it should live somewhere that individual FDWs don't have to care about. - Andres
Hi, On 2019-08-01 18:38:09 +0900, Etsuro Fujita wrote: > On Thu, Aug 1, 2019 at 10:33 AM Amit Langote <amitlangote09@gmail.com> wrote: > > If it's the approach of adding a resultRelation Index field to > > ForeignScan node, I tried and had to give up, realizing that we don't > > maintain ResultRelInfos in an array that is indexable by RT indexes. > > It would've worked if es_result_relations had mirrored es_range_table, > > although that probably complicates how the individual ModifyTable > > nodes attach to that array. We know at plan time what the the resultRelation offset for a ModifyTable node is. We just need to transport that to the respective foreign scan node, and update it properly in setrefs? Then we can index es_result_relations without any additional mapping? Maybe I'm missing something? I think all we need to do is to have setrefs.c:set_plan_refs() iterate over ->fdwDirectModifyPlans or such, and set the respective node's result_relation_offset or whatever we're naming it to splan->resultRelIndex + offset from fdwDirectModifyPlans? > > In any case, given this discussion, further hacking on a global > > variable like es_result_relations may be a course we might not want > > to pursue. I don't think es_result_relations really is problem - it doesn't have to change while processing individual subplans / partitions / whatnot. If we needed a mapping between rtis and result indexes, I'd not see a problem. Doubtful it's needed though. There's a fundamental difference between EState->es_result_relations and EState->es_result_relation_info. The former stays static during the whole query once initialized, whereas es_result_relation_info changes depending on which relation we're processing. The latter is what makes the code more complicated, because we cannot ever return early etc. Similarly, ModifyTableState->mt_per_subplan_tupconv_maps is not a problem, it stays static, but e.g. mtstate->mt_transition_capture is a problem, because we have to change for each subplan / routing / partition movement. Greetings, Andres Freund
Hi, On Sat, Aug 3, 2019 at 5:31 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-08-03 05:20:35 +0900, Etsuro Fujita wrote: > > On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote: > > > On 2019-07-31 17:04:38 +0900, Amit Langote wrote: > > > > I looked into trying to do the things I mentioned above and it seems > > > > to me that revising BeginDirectModify()'s API to receive the > > > > ResultRelInfo directly as Andres suggested might be the best way > > > > forward. I've implemented that in the attached 0001. > > > > > Fujita-san, do you have any comments on the FDW API change? Or anybody > > > else? > > > > > > I'm a bit woried about the move of BeginDirectModify() into > > > nodeModifyTable.c - it just seems like an odd control flow to me. Not > > > allowing any intermittent nodes between ForeignScan and ModifyTable also > > > seems like an undesirable restriction for the future. I realize that we > > > already do that for BeginForeignModify() (just btw, that already accepts > > > resultRelInfo as a parameter, so being symmetrical for BeginDirectModify > > > makes sense), but it still seems like the wrong direction to me. > > > > > > The need for that move, I assume, comes from needing knowing the correct > > > ResultRelInfo, correct? I wonder if we shouldn't instead determine the > > > at plan time (in setrefs.c), somewhat similar to how we determine > > > ModifyTable.resultRelIndex. Doesn't look like that'd be too hard? > > > > I'd vote for that; I created a patch for that [1]. > > > > [1] https://www.postgresql.org/message-id/CAPmGK15%3DoFHmWNND5vopfokSGfn6jMXVvnHa7K7P49F7k1hWPQ%40mail.gmail.com > > Oh, missed that. But that's not quite what I'm proposing. Sorry, I misread your message. I think I was too tired. > I don't like > ExecFindResultRelInfo at all. What's the point of it? It's introduction > is still an API break - I don't understand why that break is better than > just passing the ResultRelInfo directly to BeginDirectModify()? What API does that function break? The point of that function was to keep the direct modify layering/API as-is, because 1) I too felt the same way about the move of BeginDirectModify() to nodeModifyTable.c, and 2) I was thinking that when rewriting direct modify with upper planner pathification so that we can perform it without ModifyTable, we could still use the existing layering/API as-is, leading to smaller changes to the core for that. > I want > to again remark that BeginForeignModify() does get the ResultRelInfo - > it should have been done the same when adding direct modify. Might have been so. > Even if you need the loop - which I don't think is right - it should > live somewhere that individual FDWs don't have to care about. I was thinking to use hash lookup in ExecFindResultRelInfo() when es_result_relations is very long, but I think the setters.c approach you mentioned above might be much better. Will consider that. Best regards, Etsuro Fujita
Hi, On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote: > > I don't like > > ExecFindResultRelInfo at all. What's the point of it? It's introduction > > is still an API break - I don't understand why that break is better than > > just passing the ResultRelInfo directly to BeginDirectModify()? > > What API does that function break? You need to call it, whereas previously you did not need to call it. The effort to change an FDW to get one more parameter, or to call that function is about the same. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote: >> What API does that function break? > You need to call it, whereas previously you did not need to call it. The > effort to change an FDW to get one more parameter, or to call that > function is about the same. If those are the choices, adding a parameter is clearly the preferable solution, because it makes the API breakage obvious at compile. Adding a function would make sense, perhaps, if only a minority of FDWs need to do so. It'd still be risky if the need to do so could be missed in light testing. regards, tom lane
Hi, On 2019-08-03 13:48:01 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote: > >> What API does that function break? > > > You need to call it, whereas previously you did not need to call it. The > > effort to change an FDW to get one more parameter, or to call that > > function is about the same. > > If those are the choices, adding a parameter is clearly the preferable > solution, because it makes the API breakage obvious at compile. Right. I think it's a *bit* less clear in this case because we'd also remove the field that such FDWs with direct modify support would use now (EState.es_result_relation_info). But I think it's also just plainly a better API to use the parameter. Even if, in contrast to the BeginDirectModify at hand, BeginForeignModify didn't already accept it. Requiring a function call to gather information that just about every realistic implementation is going to need doesn't make sense. Greetings, Andres Freund
Hi, On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-08-03 13:48:01 -0400, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > On 2019-08-03 19:41:55 +0900, Etsuro Fujita wrote: > > >> What API does that function break? > > > > > You need to call it, whereas previously you did not need to call it. The > > > effort to change an FDW to get one more parameter, or to call that > > > function is about the same. I got the point. > > If those are the choices, adding a parameter is clearly the preferable > > solution, because it makes the API breakage obvious at compile. > > Right. I think it's a *bit* less clear in this case because we'd also > remove the field that such FDWs with direct modify support would use > now (EState.es_result_relation_info). > > But I think it's also just plainly a better API to use the > parameter. Even if, in contrast to the BeginDirectModify at hand, > BeginForeignModify didn't already accept it. Requiring a function call to > gather information that just about every realistic implementation is > going to need doesn't make sense. Agreed. Best regards, Etsuro Fujita
Hi, On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-08-03 13:48:01 -0400, Tom Lane wrote: > > > If those are the choices, adding a parameter is clearly the preferable > > > solution, because it makes the API breakage obvious at compile. > > > > Right. I think it's a *bit* less clear in this case because we'd also > > remove the field that such FDWs with direct modify support would use > > now (EState.es_result_relation_info). > > > > But I think it's also just plainly a better API to use the > > parameter. Even if, in contrast to the BeginDirectModify at hand, > > BeginForeignModify didn't already accept it. Requiring a function call to > > gather information that just about every realistic implementation is > > going to need doesn't make sense. > > Agreed. So, is it correct to think that the consensus is to add a parameter to BeginDirectModify()? Also, avoid changing where BeginDirectModify() is called from, like my patch did, only to have easy access to the ResultRelInfo to pass. We can do that by by augmenting ForeignScan node to add the information needed to fetch the ResultRelInfo efficiently from ExecInitForeignScan() itself. That information is the ordinal position of a given result relation in PlannedStmt.resultRelations, not the RT index as we were discussing. Thanks, Amit
Amit-san, On Mon, Aug 5, 2019 at 1:31 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote: > > > On 2019-08-03 13:48:01 -0400, Tom Lane wrote: > > > > If those are the choices, adding a parameter is clearly the preferable > > > > solution, because it makes the API breakage obvious at compile. > > > > > > Right. I think it's a *bit* less clear in this case because we'd also > > > remove the field that such FDWs with direct modify support would use > > > now (EState.es_result_relation_info). > > > > > > But I think it's also just plainly a better API to use the > > > parameter. Even if, in contrast to the BeginDirectModify at hand, > > > BeginForeignModify didn't already accept it. Requiring a function call to > > > gather information that just about every realistic implementation is > > > going to need doesn't make sense. > > > > Agreed. > > So, is it correct to think that the consensus is to add a parameter to > BeginDirectModify()? I think so. > Also, avoid changing where BeginDirectModify() is called from, like my > patch did, only to have easy access to the ResultRelInfo to pass. We > can do that by by augmenting ForeignScan node to add the information > needed to fetch the ResultRelInfo efficiently from > ExecInitForeignScan() itself. I think so. > That information is the ordinal > position of a given result relation in PlannedStmt.resultRelations, > not the RT index as we were discussing. Yeah, that would be what Andres is proposing, which I think is much better than what I proposed using the RT index. Could you update your patch? Best regards, Etsuro Fujita
Fujita-san, Thanks for the quick follow up. On Mon, Aug 5, 2019 at 2:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Mon, Aug 5, 2019 at 1:31 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > > On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote: > > > > On 2019-08-03 13:48:01 -0400, Tom Lane wrote: > > > > > If those are the choices, adding a parameter is clearly the preferable > > > > > solution, because it makes the API breakage obvious at compile. > > > > > > > > Right. I think it's a *bit* less clear in this case because we'd also > > > > remove the field that such FDWs with direct modify support would use > > > > now (EState.es_result_relation_info). > > > > > > > > But I think it's also just plainly a better API to use the > > > > parameter. Even if, in contrast to the BeginDirectModify at hand, > > > > BeginForeignModify didn't already accept it. Requiring a function call to > > > > gather information that just about every realistic implementation is > > > > going to need doesn't make sense. > > > > > > Agreed. > > > > So, is it correct to think that the consensus is to add a parameter to > > BeginDirectModify()? > > I think so. > > > Also, avoid changing where BeginDirectModify() is called from, like my > > patch did, only to have easy access to the ResultRelInfo to pass. We > > can do that by by augmenting ForeignScan node to add the information > > needed to fetch the ResultRelInfo efficiently from > > ExecInitForeignScan() itself. > > I think so. > > > That information is the ordinal > > position of a given result relation in PlannedStmt.resultRelations, > > not the RT index as we were discussing. > > Yeah, that would be what Andres is proposing, which I think is much > better than what I proposed using the RT index. > > Could you update your patch? OK, I will do that. I'll reply with the updated patches to an upthread email of Andres' [1], where he also comments on the other patches. Thanks, Amit [1] https://www.postgresql.org/message-id/20190802180138.64zcircokw2upaho%40alap3.anarazel.de
Amit-san, On Mon, Aug 5, 2019 at 2:36 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, Aug 5, 2019 at 2:31 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Mon, Aug 5, 2019 at 1:31 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > On Sun, Aug 4, 2019 at 4:45 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > > > On Sun, Aug 4, 2019 at 3:03 AM Andres Freund <andres@anarazel.de> wrote: > > > > > On 2019-08-03 13:48:01 -0400, Tom Lane wrote: > > > > > > If those are the choices, adding a parameter is clearly the preferable > > > > > > solution, because it makes the API breakage obvious at compile. > > > > > > > > > > Right. I think it's a *bit* less clear in this case because we'd also > > > > > remove the field that such FDWs with direct modify support would use > > > > > now (EState.es_result_relation_info). > > > > > > > > > > But I think it's also just plainly a better API to use the > > > > > parameter. Even if, in contrast to the BeginDirectModify at hand, > > > > > BeginForeignModify didn't already accept it. Requiring a function call to > > > > > gather information that just about every realistic implementation is > > > > > going to need doesn't make sense. > > > > > > > > Agreed. > > > > > > So, is it correct to think that the consensus is to add a parameter to > > > BeginDirectModify()? > > > > I think so. > > > > > Also, avoid changing where BeginDirectModify() is called from, like my > > > patch did, only to have easy access to the ResultRelInfo to pass. We > > > can do that by by augmenting ForeignScan node to add the information > > > needed to fetch the ResultRelInfo efficiently from > > > ExecInitForeignScan() itself. > > > > I think so. > > > > > That information is the ordinal > > > position of a given result relation in PlannedStmt.resultRelations, > > > not the RT index as we were discussing. > > > > Yeah, that would be what Andres is proposing, which I think is much > > better than what I proposed using the RT index. > > > > Could you update your patch? > > OK, I will do that. I'll reply with the updated patches to an > upthread email of Andres' [1], where he also comments on the other > patches. Thanks! Will review the updated version of the FDW patch, at least. Best regards, Etsuro Fujita
Hi Andres, Fujita-san, On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote: > On 2019-07-31 17:04:38 +0900, Amit Langote wrote: > > I looked into trying to do the things I mentioned above and it seems > > to me that revising BeginDirectModify()'s API to receive the > > ResultRelInfo directly as Andres suggested might be the best way > > forward. I've implemented that in the attached 0001. Patches that > > were previously 0001, 0002, and 0003 are now 0002, 003, and 0004, > > respectively. 0002 is now a patch to "remove" > > es_result_relation_info. > > Thanks! Some minor quibbles aside, the non FDW patches look good to me. > > Fujita-san, do you have any comments on the FDW API change? Or anybody > else? Based on the discussion, I have updated the patch. > I'm a bit woried about the move of BeginDirectModify() into > nodeModifyTable.c - it just seems like an odd control flow to me. Not > allowing any intermittent nodes between ForeignScan and ModifyTable also > seems like an undesirable restriction for the future. I realize that we > already do that for BeginForeignModify() (just btw, that already accepts > resultRelInfo as a parameter, so being symmetrical for BeginDirectModify > makes sense), but it still seems like the wrong direction to me. > > The need for that move, I assume, comes from needing knowing the correct > ResultRelInfo, correct? I wonder if we shouldn't instead determine the > at plan time (in setrefs.c), somewhat similar to how we determine > ModifyTable.resultRelIndex. Doesn't look like that'd be too hard? The patch adds a resultRelIndex field to ForeignScan node, which is set to >= 0 value for non-SELECT queries. I first thought to set it only if direct modification is being used, but maybe it'd be simpler to set it even if direct modification is not used. To set it, the patch teaches set_plan_refs() to initialize resultRelIndex of ForeignScan plans that appear under ModifyTable. Fujita-san said he plans to revise the planning of direct-modification style queries to not require a ModifyTable node anymore, but maybe he'll just need to add similar code elsewhere but not outside setrefs.c. > Then we could just have BeginForeignModify, BeginDirectModify, > BeginForeignScan all be called from ExecInitForeignScan(). I too think that it would've been great if we could call both BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but the former's API seems to be designed to be called from ExecInitModifyTable from the get-go. Maybe we should leave that as-is? > Path 04 is such a nice improvement. Besides getting rid of a substantial > amount of code, it also makes the control flow a lot easier to read. Thanks. > > @@ -4644,9 +4645,7 @@ GetAfterTriggersTableData(Oid relid, CmdType cmdType) > > * If there are no triggers in 'trigdesc' that request relevant transition > > * tables, then return NULL. > > * > > - * The resulting object can be passed to the ExecAR* functions. The caller > > - * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing > > - * with child tables. > > + * The resulting object can be passed to the ExecAR* functions. > > * > > * Note that we copy the flags from a parent table into this struct (rather > > * than subsequently using the relation's TriggerDesc directly) so that we can > > @@ -5750,14 +5749,26 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo, > > */ > > if (row_trigger && transition_capture != NULL) > > { > > - TupleTableSlot *original_insert_tuple = transition_capture->tcs_original_insert_tuple; > > - TupleConversionMap *map = transition_capture->tcs_map; > > + TupleTableSlot *original_insert_tuple; > > + PartitionRoutingInfo *pinfo = relinfo->ri_PartitionInfo; > > + TupleConversionMap *map = pinfo ? > > + pinfo->pi_PartitionToRootMap : > > + relinfo->ri_ChildToRootMap; > > bool delete_old_table = transition_capture->tcs_delete_old_table; > > bool update_old_table = transition_capture->tcs_update_old_table; > > bool update_new_table = transition_capture->tcs_update_new_table; > > bool insert_new_table = transition_capture->tcs_insert_new_table; > > > > /* > > + * Get the originally inserted tuple from the global variable and set > > + * the latter to NULL because any given tuple must be read only once. > > + * Note that the TransitionCaptureState is shared across many calls > > + * to this function. > > + */ > > + original_insert_tuple = transition_capture->tcs_original_insert_tuple; > > + transition_capture->tcs_original_insert_tuple = NULL; > > Maybe I'm missing something, but original_insert_tuple is not a global > variable? I really meant to refer to the fact that it's maintained in a ModifyTable-global struct. I've updated this comment a bit. > > @@ -888,7 +889,8 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, > > PartitionTupleRouting *proute, > > PartitionDispatch dispatch, > > ResultRelInfo *partRelInfo, > > - int partidx) > > + int partidx, > > + bool is_update_result_rel) > > { > > MemoryContext oldcxt; > > PartitionRoutingInfo *partrouteinfo; > > @@ -935,10 +937,15 @@ ExecInitRoutingInfo(ModifyTableState *mtstate, > > if (mtstate && > > (mtstate->mt_transition_capture || mtstate->mt_oc_transition_capture)) > > { > > - partrouteinfo->pi_PartitionToRootMap = > > - convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc), > > - RelationGetDescr(partRelInfo->ri_PartitionRoot), > > - gettext_noop("could not convert row type")); > > + /* If partition is an update target, then we already got the map. */ > > + if (is_update_result_rel) > > + partrouteinfo->pi_PartitionToRootMap = > > + partRelInfo->ri_ChildToRootMap; > > + else > > + partrouteinfo->pi_PartitionToRootMap = > > + convert_tuples_by_name(RelationGetDescr(partRelInfo->ri_RelationDesc), > > + RelationGetDescr(partRelInfo->ri_PartitionRoot), > > + gettext_noop("could not convert row type")); > > } > > Hm, isn't is_update_result_rel just ModifyTable->operation == CMD_UPDATE? No. The operation being CMD_UPDATE doesn't mean that the ResultRelInfo that is passed to ExecInitRoutingInfo() is an UPDATE result rel. It could be a ResultRelInfo built by ExecFindPartition() when a row needed to be moved into a partition that is not present in the UPDATE result rels contained in ModifyTableState. Though I realized that we don't really need to add a new parameter to figure that out. Looking at ri_RangeTableIndex property of the passed-in ResultRelInfo is enough to distinguish the two types of ResultRelInfos. I've updated the patch that way. I found more dead code related to transition capture setup, which I've removed in the latest 0004. For example, the mt_per_subplan_tupconv_maps array and the code in nodeModifyTable.c that was used to initialize it. Attached updated patches. Thanks, Amit
Вложения
Amit-san, On Mon, Aug 5, 2019 at 6:16 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Sat, Aug 3, 2019 at 3:01 AM Andres Freund <andres@anarazel.de> wrote: > Based on the discussion, I have updated the patch. > > > I'm a bit woried about the move of BeginDirectModify() into > > nodeModifyTable.c - it just seems like an odd control flow to me. Not > > allowing any intermittent nodes between ForeignScan and ModifyTable also > > seems like an undesirable restriction for the future. I realize that we > > already do that for BeginForeignModify() (just btw, that already accepts > > resultRelInfo as a parameter, so being symmetrical for BeginDirectModify > > makes sense), but it still seems like the wrong direction to me. > > > > The need for that move, I assume, comes from needing knowing the correct > > ResultRelInfo, correct? I wonder if we shouldn't instead determine the > > at plan time (in setrefs.c), somewhat similar to how we determine > > ModifyTable.resultRelIndex. Doesn't look like that'd be too hard? > > The patch adds a resultRelIndex field to ForeignScan node, which is > set to >= 0 value for non-SELECT queries. Thanks for the updated patch! > I first thought to set it > only if direct modification is being used, but maybe it'd be simpler > to set it even if direct modification is not used. To set it, the > patch teaches set_plan_refs() to initialize resultRelIndex of > ForeignScan plans that appear under ModifyTable. Fujita-san said he > plans to revise the planning of direct-modification style queries to > not require a ModifyTable node anymore, but maybe he'll just need to > add similar code elsewhere but not outside setrefs.c. Yeah, but I'm not sure this is a good idea: @ -877,12 +878,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) rc->rti += rtoffset; rc->prti += rtoffset; } - foreach(l, splan->plans) - { - lfirst(l) = set_plan_refs(root, - (Plan *) lfirst(l), - rtoffset); - } /* * Append this ModifyTable node's final result relation RT @@ -908,6 +903,27 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) lappend_int(root->glob->rootResultRelations, splan->rootRelation); } + + resultRelIndex = splan->resultRelIndex; + foreach(l, splan->plans) + { + lfirst(l) = set_plan_refs(root, + (Plan *) lfirst(l), + rtoffset); + + /* + * For foreign table result relations, save their index + * in the global list of result relations into the + * corresponding ForeignScan nodes. + */ + if (IsA(lfirst(l), ForeignScan)) + { + ForeignScan *fscan = (ForeignScan *) lfirst(l); + + fscan->resultRelIndex = resultRelIndex; + } + resultRelIndex++; + } } because I still feel the same way as mentioned above by Andres. What I'm thinking for the setrefs.c change is to modify ForeignScan (ie, set_foreignscan_references) rather than ModifyTable, like the attached. Maybe I'm missing something, but for direct modification without ModifyTable, I think we would probably only have to modify that function further so that it not only adjusts resultRelIndex but does some extra work such as appending the result relation RT index to root->glob->resultRelations as done for ModifyTable. > > Then we could just have BeginForeignModify, BeginDirectModify, > > BeginForeignScan all be called from ExecInitForeignScan(). Sorry, previously, I mistakenly agreed with that. As I said before, I think I was too tired. > I too think that it would've been great if we could call both > BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but > the former's API seems to be designed to be called from > ExecInitModifyTable from the get-go. Maybe we should leave that > as-is? +1 for leaving that as-is; it seems reasonable to me to call BeginForeignModify in ExecInitModifyTable, because the ForeignModify API is designed based on an analogy with local table modifications, in which case the initialization needed for performing ExecInsert/ExecUpdate/ExecDelete is done in ModifyTable, not in the underlying scan/join node. @@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node, for <function>ExplainDirectModify</function> and <function>EndDirectModif\ y</function>. </para> + <note> + Also note that it's a good idea to store the <literal>rinfo</literal> + in the <structfield>fdw_state</structfield> for + <function>IterateDirectModify</function> to use. + </node> Actually, if the FDW only supports direct modifications for queries without RETURNING, it wouldn't need the rinfo in IterateDirectModify, so I think we would probably need to update this as such. Having said that, it seems too detailed to me to describe such a thing in the FDW documentation. To avoid making the documentation verbose, it would be better to not add such kind of thing at all? Note: other change in the attached patch is that I modified _readForeignScan accordingly. Best regards, Etsuro Fujita
Вложения
Hi, On 2019-08-05 18:16:10 +0900, Amit Langote wrote: > The patch adds a resultRelIndex field to ForeignScan node, which is > set to >= 0 value for non-SELECT queries. I first thought to set it > only if direct modification is being used, but maybe it'd be simpler > to set it even if direct modification is not used. Yea, I think we should just always set it. > To set it, the > patch teaches set_plan_refs() to initialize resultRelIndex of > ForeignScan plans that appear under ModifyTable. Fujita-san said he > plans to revise the planning of direct-modification style queries to > not require a ModifyTable node anymore, but maybe he'll just need to > add similar code elsewhere but not outside setrefs.c. I think I prefer the approach in Fujita-san's email. While not extremely pretty either, it would allow for having nodes between the foreign scan and the modify node. > > Then we could just have BeginForeignModify, BeginDirectModify, > > BeginForeignScan all be called from ExecInitForeignScan(). > > I too think that it would've been great if we could call both > BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but > the former's API seems to be designed to be called from > ExecInitModifyTable from the get-go. Maybe we should leave that > as-is? Yea, we should leave it where it is. I think the API here is fairly ugly, but it's probably not worth changing. And if we were to change it, it'd need a lot bigger hammer. Greetings, Andres Freund
Fujita-san, Thanks a lot the review. On Tue, Aug 6, 2019 at 9:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Mon, Aug 5, 2019 at 6:16 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I first thought to set it > > only if direct modification is being used, but maybe it'd be simpler > > to set it even if direct modification is not used. To set it, the > > patch teaches set_plan_refs() to initialize resultRelIndex of > > ForeignScan plans that appear under ModifyTable. Fujita-san said he > > plans to revise the planning of direct-modification style queries to > > not require a ModifyTable node anymore, but maybe he'll just need to > > add similar code elsewhere but not outside setrefs.c. > > Yeah, but I'm not sure this is a good idea: > > @ -877,12 +878,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) > rc->rti += rtoffset; > rc->prti += rtoffset; > } > - foreach(l, splan->plans) > - { > - lfirst(l) = set_plan_refs(root, > - (Plan *) lfirst(l), > - rtoffset); > - } > > /* > * Append this ModifyTable node's final result relation RT > @@ -908,6 +903,27 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) > lappend_int(root->glob->rootResultRelations, > splan->rootRelation); > } > + > + resultRelIndex = splan->resultRelIndex; > + foreach(l, splan->plans) > + { > + lfirst(l) = set_plan_refs(root, > + (Plan *) lfirst(l), > + rtoffset); > + > + /* > + * For foreign table result relations, save their index > + * in the global list of result relations into the > + * corresponding ForeignScan nodes. > + */ > + if (IsA(lfirst(l), ForeignScan)) > + { > + ForeignScan *fscan = (ForeignScan *) lfirst(l); > + > + fscan->resultRelIndex = resultRelIndex; > + } > + resultRelIndex++; > + } > } > > because I still feel the same way as mentioned above by Andres. Reading Andres' emails again, I now understand why we shouldn't set ForeignScan's resultRelIndex the way my patches did. > What > I'm thinking for the setrefs.c change is to modify ForeignScan (ie, > set_foreignscan_references) rather than ModifyTable, like the > attached. Thanks for the patch. I have couple of comments: * I'm afraid that we've implicitly created an ordering constraint on some code in set_plan_refs(). That is, a ModifyTable's plans now must always be processed before adding its result relations to the global list, which for good measure, should be written down somewhere; I propose this comment in the ModifyTable's case block in set_plan_refs: @@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) rc->rti += rtoffset; rc->prti += rtoffset; } + /* + * Caution: Do not change the relative ordering of this loop + * and the statement below that adds the result relations to + * root->glob->resultRelations, because we need to use the + * current value of list_length(root->glob->resultRelations) + * in some plans. + */ foreach(l, splan->plans) { lfirst(l) = set_plan_refs(root, * Regarding setting ForeignScan.resultRelIndex even for non-direct modifications, maybe that's not a good idea anymore. A foreign table result relation might be involved in a local join, which prevents it from being directly-modifiable and also hides the ForeignScan node from being easily modifiable in PlanForeignModify. Maybe, we should just interpret resultRelIndex as being set only when direct-modification is feasible. Should we rename the field accordingly to be self-documenting? Please let me know your thoughts, so that I can modify the patch. > Maybe I'm missing something, but for direct modification > without ModifyTable, I think we would probably only have to modify > that function further so that it not only adjusts resultRelIndex but > does some extra work such as appending the result relation RT index to > root->glob->resultRelations as done for ModifyTable. Yeah, that seems reasonable. > > > Then we could just have BeginForeignModify, BeginDirectModify, > > > BeginForeignScan all be called from ExecInitForeignScan(). > > Sorry, previously, I mistakenly agreed with that. As I said before, I > think I was too tired. > > > I too think that it would've been great if we could call both > > BeginForeignModify and BeginDirectModify from ExecInitForeignScan, but > > the former's API seems to be designed to be called from > > ExecInitModifyTable from the get-go. Maybe we should leave that > > as-is? > > +1 for leaving that as-is; it seems reasonable to me to call > BeginForeignModify in ExecInitModifyTable, because the ForeignModify > API is designed based on an analogy with local table modifications, in > which case the initialization needed for performing > ExecInsert/ExecUpdate/ExecDelete is done in ModifyTable, not in the > underlying scan/join node. Thanks for the explanation. > @@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node, > for <function>ExplainDirectModify</function> and <function>EndDirectModif\ > y</function>. > </para> > > + <note> > + Also note that it's a good idea to store the <literal>rinfo</literal> > + in the <structfield>fdw_state</structfield> for > + <function>IterateDirectModify</function> to use. > + </node> > > Actually, if the FDW only supports direct modifications for queries > without RETURNING, it wouldn't need the rinfo in IterateDirectModify, > so I think we would probably need to update this as such. Having said > that, it seems too detailed to me to describe such a thing in the FDW > documentation. To avoid making the documentation verbose, it would be > better to not add such kind of thing at all? Hmm OK. Perhaps, others who want to implement the direct modification API can work that out by looking at postgres_fdw implementation. > Note: other change in the attached patch is that I modified > _readForeignScan accordingly. Thanks. Regards, Amit
Amit-san, On Wed, Aug 7, 2019 at 10:24 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Aug 6, 2019 at 9:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > What > > I'm thinking for the setrefs.c change is to modify ForeignScan (ie, > > set_foreignscan_references) rather than ModifyTable, like the > > attached. > > Thanks for the patch. I have couple of comments: > > * I'm afraid that we've implicitly created an ordering constraint on > some code in set_plan_refs(). That is, a ModifyTable's plans now must > always be processed before adding its result relations to the global > list, which for good measure, should be written down somewhere; I > propose this comment in the ModifyTable's case block in set_plan_refs: > > @@ -877,6 +877,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) > rc->rti += rtoffset; > rc->prti += rtoffset; > } > + /* > + * Caution: Do not change the relative ordering of this loop > + * and the statement below that adds the result relations to > + * root->glob->resultRelations, because we need to use the > + * current value of list_length(root->glob->resultRelations) > + * in some plans. > + */ > foreach(l, splan->plans) > { > lfirst(l) = set_plan_refs(root, +1 > * Regarding setting ForeignScan.resultRelIndex even for non-direct > modifications, maybe that's not a good idea anymore. A foreign table > result relation might be involved in a local join, which prevents it > from being directly-modifiable and also hides the ForeignScan node > from being easily modifiable in PlanForeignModify. Maybe, we should > just interpret resultRelIndex as being set only when > direct-modification is feasible. Yeah, I think so; when using PlanForeignModify because for example, the foreign table result relation is involved in a local join, as you mentioned, ForeignScan.operation would be left unchanged (ie, CMD_SELECT), so to me it's more understandable to not set ForeignScan.resultRelIndex. > Should we rename the field > accordingly to be self-documenting? IMO I like the name resultRelIndex, but do you have any better idea? > > @@ -895,6 +898,12 @@ BeginDirectModify(ForeignScanState *node, > > for <function>ExplainDirectModify</function> and <function>EndDirectModif\ > > y</function>. > > </para> > > > > + <note> > > + Also note that it's a good idea to store the <literal>rinfo</literal> > > + in the <structfield>fdw_state</structfield> for > > + <function>IterateDirectModify</function> to use. > > + </node> > > > > Actually, if the FDW only supports direct modifications for queries > > without RETURNING, it wouldn't need the rinfo in IterateDirectModify, > > so I think we would probably need to update this as such. Having said > > that, it seems too detailed to me to describe such a thing in the FDW > > documentation. To avoid making the documentation verbose, it would be > > better to not add such kind of thing at all? > > Hmm OK. Perhaps, others who want to implement the direct modification > API can work that out by looking at postgres_fdw implementation. Yeah, I think so. Best regards, Etsuro Fujita
Fujita-san, Thanks for the quick follow up. On Wed, Aug 7, 2019 at 11:30 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Wed, Aug 7, 2019 at 10:24 AM Amit Langote <amitlangote09@gmail.com> wrote: > > * Regarding setting ForeignScan.resultRelIndex even for non-direct > > modifications, maybe that's not a good idea anymore. A foreign table > > result relation might be involved in a local join, which prevents it > > from being directly-modifiable and also hides the ForeignScan node > > from being easily modifiable in PlanForeignModify. Maybe, we should > > just interpret resultRelIndex as being set only when > > direct-modification is feasible. > > Yeah, I think so; when using PlanForeignModify because for example, > the foreign table result relation is involved in a local join, as you > mentioned, ForeignScan.operation would be left unchanged (ie, > CMD_SELECT), so to me it's more understandable to not set > ForeignScan.resultRelIndex. OK. > > Should we rename the field > > accordingly to be self-documenting? > > IMO I like the name resultRelIndex, but do you have any better idea? On second thought, I'm fine with sticking to resultRelIndex. Trying to make it self documenting might make the name very long. Here are the updated patches. Thanks, Amit
Вложения
Hi, On Wed, Aug 7, 2019 at 11:47 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Aug 7, 2019 at 11:30 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Aug 7, 2019 at 10:24 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > * Regarding setting ForeignScan.resultRelIndex even for non-direct > > > modifications, maybe that's not a good idea anymore. A foreign table > > > result relation might be involved in a local join, which prevents it > > > from being directly-modifiable and also hides the ForeignScan node > > > from being easily modifiable in PlanForeignModify. Maybe, we should > > > just interpret resultRelIndex as being set only when > > > direct-modification is feasible. > > > > Yeah, I think so; when using PlanForeignModify because for example, > > the foreign table result relation is involved in a local join, as you > > mentioned, ForeignScan.operation would be left unchanged (ie, > > CMD_SELECT), so to me it's more understandable to not set > > ForeignScan.resultRelIndex. > > OK. > > > > Should we rename the field > > > accordingly to be self-documenting? > > > > IMO I like the name resultRelIndex, but do you have any better idea? > > On second thought, I'm fine with sticking to resultRelIndex. Trying > to make it self documenting might make the name very long. OK > Here are the updated patches. IIUC, I think we reached a consensus at least on the 0001 patch. Andres, would you mind if I commit that patch? Best regards, Etsuro Fujita
On Wed, Aug 7, 2019 at 12:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > IIUC, I think we reached a consensus at least on the 0001 patch. > Andres, would you mind if I commit that patch? I just noticed obsolete references to es_result_relation_info that 0002 failed to remove. One of them is in fdwhandler.sgml: <programlisting> TupleTableSlot * IterateDirectModify(ForeignScanState *node); </programlisting> ... The data that was actually inserted, updated or deleted must be stored in the <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> of the node's <structname>EState</structname>. We will need to rewrite this without mentioning es_result_relation_info. How about as follows: - <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> - of the node's <structname>EState</structname>. + <literal>ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> + of the result relation's<structname>ResultRelInfo</structname> that has + been made available via node. I've updated 0001 with the above change. Also, I updated 0002 to remove other references. Thanks, Amit
Вложения
Amit-san, On Wed, Aug 7, 2019 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Aug 7, 2019 at 12:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > IIUC, I think we reached a consensus at least on the 0001 patch. > > Andres, would you mind if I commit that patch? > > I just noticed obsolete references to es_result_relation_info that > 0002 failed to remove. One of them is in fdwhandler.sgml: > > <programlisting> > TupleTableSlot * > IterateDirectModify(ForeignScanState *node); > </programlisting> > > ... The data that was actually inserted, updated > or deleted must be stored in the > <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > of the node's <structname>EState</structname>. > > We will need to rewrite this without mentioning > es_result_relation_info. How about as follows: > > - <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > - of the node's <structname>EState</structname>. > + <literal>ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > + of the result relation's<structname>ResultRelInfo</structname> that has > + been made available via node. > > I've updated 0001 with the above change. Good catch! This would be nitpicking, but: * IIUC, we don't use the term "result relation" in fdwhandler.sgml. For consistency with your change to the doc for BeginDirectModify, how about using the term "target foreign table" instead of "result relation"? * ISTM that "<structname>ResultRelInfo</structname> that has been made available via node" would be a bit fuzzy to FDW authors. To be more specific, how about changing it to "<structname>ResultRelInfo</structname> passed to <function>BeginDirectModify</function>" or something like that? Best regards, Etsuro Fujita
Fujita-san, On Wed, Aug 7, 2019 at 6:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Wed, Aug 7, 2019 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote: > > I just noticed obsolete references to es_result_relation_info that > > 0002 failed to remove. One of them is in fdwhandler.sgml: > > > > <programlisting> > > TupleTableSlot * > > IterateDirectModify(ForeignScanState *node); > > </programlisting> > > > > ... The data that was actually inserted, updated > > or deleted must be stored in the > > <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > > of the node's <structname>EState</structname>. > > > > We will need to rewrite this without mentioning > > es_result_relation_info. How about as follows: > > > > - <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > > - of the node's <structname>EState</structname>. > > + <literal>ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > > + of the result relation's<structname>ResultRelInfo</structname> that has > > + been made available via node. > > > > I've updated 0001 with the above change. > > Good catch! Thanks for the review. > This would be nitpicking, but: > > * IIUC, we don't use the term "result relation" in fdwhandler.sgml. > For consistency with your change to the doc for BeginDirectModify, how > about using the term "target foreign table" instead of "result > relation"? Agreed, done. > * ISTM that "<structname>ResultRelInfo</structname> that has been made > available via node" would be a bit fuzzy to FDW authors. To be more > specific, how about changing it to > "<structname>ResultRelInfo</structname> passed to > <function>BeginDirectModify</function>" or something like that? That works for me, although an FDW author reading this still has got to make the connection. Attached updated patches; only 0001 changed in this version. Thanks, Amit
Вложения
Hi, On Thu, Aug 8, 2019 at 10:10 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Aug 7, 2019 at 6:00 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Wed, Aug 7, 2019 at 4:28 PM Amit Langote <amitlangote09@gmail.com> wrote: > > > I just noticed obsolete references to es_result_relation_info that > > > 0002 failed to remove. One of them is in fdwhandler.sgml: > > > > > > <programlisting> > > > TupleTableSlot * > > > IterateDirectModify(ForeignScanState *node); > > > </programlisting> > > > > > > ... The data that was actually inserted, updated > > > or deleted must be stored in the > > > <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > > > of the node's <structname>EState</structname>. > > > > > > We will need to rewrite this without mentioning > > > es_result_relation_info. How about as follows: > > > > > > - <literal>es_result_relation_info->ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > > > - of the node's <structname>EState</structname>. > > > + <literal>ri_projectReturning->pi_exprContext->ecxt_scantuple</literal> > > > + of the result relation's<structname>ResultRelInfo</structname> that has > > > + been made available via node. > > > > > > I've updated 0001 with the above change. > > This would be nitpicking, but: > > > > * IIUC, we don't use the term "result relation" in fdwhandler.sgml. > > For consistency with your change to the doc for BeginDirectModify, how > > about using the term "target foreign table" instead of "result > > relation"? > > Agreed, done. > > > * ISTM that "<structname>ResultRelInfo</structname> that has been made > > available via node" would be a bit fuzzy to FDW authors. To be more > > specific, how about changing it to > > "<structname>ResultRelInfo</structname> passed to > > <function>BeginDirectModify</function>" or something like that? > > That works for me, although an FDW author reading this still has got > to make the connection. > > Attached updated patches; only 0001 changed in this version. Thanks for the updated version, Amit-san! I updated the 0001 patch a bit further: * Tweaked comments in plannodes.h, createplan.c, and nodeForeignscan.c. * Made cosmetic changes to postgres_fdw.c. * Adjusted doc changes a bit, mainly not to produce unnecessary diff. * Modified the commit message. Attached is an updated version of the 0001 patch. Does that make sense? Best regards, Etsuro Fujita
Вложения
Fujita-san, On Thu, Aug 8, 2019 at 9:49 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Thu, Aug 8, 2019 at 10:10 AM Amit Langote <amitlangote09@gmail.com> wrote: > > Attached updated patches; only 0001 changed in this version. > > Thanks for the updated version, Amit-san! I updated the 0001 patch a > bit further: > > * Tweaked comments in plannodes.h, createplan.c, and nodeForeignscan.c. > * Made cosmetic changes to postgres_fdw.c. > * Adjusted doc changes a bit, mainly not to produce unnecessary diff. > * Modified the commit message. > > Attached is an updated version of the 0001 patch. Does that make sense? Looks perfect, thank you. Regards, Amit
On Fri, Aug 9, 2019 at 10:51 AM Amit Langote <amitlangote09@gmail.com> wrote: > > Fujita-san, > > On Thu, Aug 8, 2019 at 9:49 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > On Thu, Aug 8, 2019 at 10:10 AM Amit Langote <amitlangote09@gmail.com> wrote: > > > Attached updated patches; only 0001 changed in this version. > > > > Thanks for the updated version, Amit-san! I updated the 0001 patch a > > bit further: > > > > * Tweaked comments in plannodes.h, createplan.c, and nodeForeignscan.c. > > * Made cosmetic changes to postgres_fdw.c. > > * Adjusted doc changes a bit, mainly not to produce unnecessary diff. > > * Modified the commit message. > > > > Attached is an updated version of the 0001 patch. Does that make sense? > > Looks perfect, thank you. To avoid losing track of this, I've added this to November CF. https://commitfest.postgresql.org/25/2277/ Struggled a bit to give a title to the entry though. Thanks, Amit
On Wed, Sep 4, 2019 at 10:45 AM Amit Langote <amitlangote09@gmail.com> wrote: > On Fri, Aug 9, 2019 at 10:51 AM Amit Langote <amitlangote09@gmail.com> wrote: > To avoid losing track of this, I've added this to November CF. > > https://commitfest.postgresql.org/25/2277/ > > Struggled a bit to give a title to the entry though. Noticed that one of the patches needed a rebase. Attached updated patches. Note that v8-0001 is v7-0001 unchanged that Fujita-san posted on Aug 8. Thanks, Amit
Вложения
On Thu, Sep 26, 2019 at 1:56 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Sep 4, 2019 at 10:45 AM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Aug 9, 2019 at 10:51 AM Amit Langote <amitlangote09@gmail.com> wrote: > > To avoid losing track of this, I've added this to November CF. > > > > https://commitfest.postgresql.org/25/2277/ > > > > Struggled a bit to give a title to the entry though. > > Noticed that one of the patches needed a rebase. > > Attached updated patches. Note that v8-0001 is v7-0001 unchanged that > Fujita-san posted on Aug 8. Rebased again. Thanks, Amit
Вложения
Amit Langote <amitlangote09@gmail.com> writes: > Rebased again. Seems to need that again, according to cfbot :-( regards, tom lane
On Mon, Mar 2, 2020 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Langote <amitlangote09@gmail.com> writes: > > Rebased again. > > Seems to need that again, according to cfbot :-( Thank you, done. Regards, Amit
Вложения
> On 2 Mar 2020, at 06:08, Amit Langote <amitlangote09@gmail.com> wrote: > > On Mon, Mar 2, 2020 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Amit Langote <amitlangote09@gmail.com> writes: >>> Rebased again. >> >> Seems to need that again, according to cfbot :-( > > Thank you, done. ..and another one is needed as it no longer applies, please submit a rebased version. cheers ./daniel
On Wed, Jul 1, 2020 at 6:56 PM Daniel Gustafsson <daniel@yesql.se> wrote: > > > On 2 Mar 2020, at 06:08, Amit Langote <amitlangote09@gmail.com> wrote: > > > > On Mon, Mar 2, 2020 at 4:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Amit Langote <amitlangote09@gmail.com> writes: > >>> Rebased again. > >> > >> Seems to need that again, according to cfbot :-( > > > > Thank you, done. > > ..and another one is needed as it no longer applies, please submit a rebased > version. Sorry, it took me a while to get to this. It's been over 11 months since there was any significant commentary on the contents of the patches themselves, so perhaps I should reiterate what the patches are about and why it might still be a good idea to consider them. The thread started with some very valid criticism of the way executor's partition tuple routing logic looks randomly sprinkled over in nodeModifyTable.c, execPartition.c. In the process of making it look less random, we decided to get rid of the global variable es_result_relation_info to avoid complex maneuvers of setting/resetting it correctly when performing partition tuple routing, causing some other churn beside the partitioning code. Same with another global variable TransitionCaptureState.tcs_map. So, the patches neither add any new capabilities, nor improve performance, but they do make the code in this area a bit easier to follow. Actually, there is a problem that some of the changes here conflict with patches being discussed on other threads ([1], [2]), so much so that I decided to absorb some changes here into another "refactoring" patch that I have posted at [2]. Attached rebased patches. 0001 contains preparatory FDW API changes to stop relying on es_result_relation_info being set correctly. 0002 removes es_result_relation_info in favor passing the active result relation around as a parameter in the various functions that need it 0003 Moves UPDATE tuple-routing logic into a new function 0004 removes the global variable TransitionCaptureState.tcs_map which needed to be set/reset whenever the active result relation relation changes in favor of a new field in ResultRelInfo to store the same map -- Amit Langote EnterpriseDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/28/2575/ [2] https://commitfest.postgresql.org/28/2621/
Вложения
On 13/07/2020 08:47, Amit Langote wrote: > It's been over 11 months since there was any significant commentary on > the contents of the patches themselves, so perhaps I should reiterate > what the patches are about and why it might still be a good idea to > consider them. > > The thread started with some very valid criticism of the way > executor's partition tuple routing logic looks randomly sprinkled over > in nodeModifyTable.c, execPartition.c. In the process of making it > look less random, we decided to get rid of the global variable > es_result_relation_info to avoid complex maneuvers of > setting/resetting it correctly when performing partition tuple > routing, causing some other churn beside the partitioning code. Same > with another global variable TransitionCaptureState.tcs_map. So, the > patches neither add any new capabilities, nor improve performance, but > they do make the code in this area a bit easier to follow. > > Actually, there is a problem that some of the changes here conflict > with patches being discussed on other threads ([1], [2]), so much so > that I decided to absorb some changes here into another "refactoring" > patch that I have posted at [2]. Thanks for the summary. It's been a bit hard to follow what depends on what across these threads, and how they work together. It seems that this patch set is the best place to start. > Attached rebased patches. > > 0001 contains preparatory FDW API changes to stop relying on > es_result_relation_info being set correctly. Makes sense. The only thing I don't like about this is the way the ForeignScan->resultRelIndex field is set. make_foreignscan() initializes it to -1, and the FDW's PlanDirectModify() function is expected to set it, like you did in postgres_fdw: > @@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root, > rebuild_fdw_scan_tlist(fscan, returningList); > } > > + /* > + * Set the index of the subplan result rel. > + */ > + fscan->resultRelIndex = subplan_index; > + > table_close(rel, NoLock); > return true; > } It has to be set to that value (subplan_index is an argument to PlanDirectModify()), the FDW doesn't have any choice there, so this is just additional boilerplate code that has to be copied to every FDW that implements direct modify. Furthermore, if the FDW doesn't set it correctly, you could have some very interesting results, like updating wrong table. It would be better to set it in make_modifytable(), just after calling PlanDirectModify(). I'm also a bit unhappy with the way it's updated in set_plan_refs(): > --- a/src/backend/optimizer/plan/setrefs.c > +++ b/src/backend/optimizer/plan/setrefs.c > @@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) > rc->rti += rtoffset; > rc->prti += rtoffset; > } > + /* > + * Caution: Do not change the relative ordering of this loop > + * and the statement below that adds the result relations to > + * root->glob->resultRelations, because we need to use the > + * current value of list_length(root->glob->resultRelations) > + * in some plans. > + */ > foreach(l, splan->plans) > { > lfirst(l) = set_plan_refs(root, > @@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root, > } > > fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset); > + > + /* > + * Adjust resultRelIndex if it's valid (note that we are called before > + * adding the RT indexes of ModifyTable result relations to the global > + * list) > + */ > + if (fscan->resultRelIndex >= 0) > + fscan->resultRelIndex += list_length(root->glob->resultRelations); > } > > /* That "Caution" comment is well deserved, but could we make this more robust to begin with? The most straightforward solution would be to pass down the "current resultRelIndex" as an extra parameter to set_plan_refs(), similar to rtoffset. If we did that, we wouldn't actually need to set it before setrefs.c processing at all. I'm a bit wary of adding another argument to set_plan_refs() because that's a lot of code churn, but it does seem like the most natural solution to me. Maybe create a new context struct to hold the PlannerInfo, rtoffset, and the new "currentResultRelIndex" value, similar to fix_scan_expr_context, to avoid passing through so many arguments. Another idea is to merge "resultRelIndex" and a "range table index" into one value. Range table entries that are updated would have a ResultRelInfo, others would not. I'm not sure if that would end up being cleaner or messier than what we have now, but might be worth trying. > 0002 removes es_result_relation_info in favor passing the active > result relation around as a parameter in the various functions that > need it Looks good. > 0003 Moves UPDATE tuple-routing logic into a new function > > 0004 removes the global variable TransitionCaptureState.tcs_map which > needed to be set/reset whenever the active result relation relation > changes in favor of a new field in ResultRelInfo to store the same map I didn't look closely, but these make sense at a quick glance. - Heikki
Hekki, Thanks a lot for the review! On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 13/07/2020 08:47, Amit Langote wrote: > > It's been over 11 months since there was any significant commentary on > > the contents of the patches themselves, so perhaps I should reiterate > > what the patches are about and why it might still be a good idea to > > consider them. > > > > The thread started with some very valid criticism of the way > > executor's partition tuple routing logic looks randomly sprinkled over > > in nodeModifyTable.c, execPartition.c. In the process of making it > > look less random, we decided to get rid of the global variable > > es_result_relation_info to avoid complex maneuvers of > > setting/resetting it correctly when performing partition tuple > > routing, causing some other churn beside the partitioning code. Same > > with another global variable TransitionCaptureState.tcs_map. So, the > > patches neither add any new capabilities, nor improve performance, but > > they do make the code in this area a bit easier to follow. > > > > Actually, there is a problem that some of the changes here conflict > > with patches being discussed on other threads ([1], [2]), so much so > > that I decided to absorb some changes here into another "refactoring" > > patch that I have posted at [2]. > > Thanks for the summary. It's been a bit hard to follow what depends on > what across these threads, and how they work together. It seems that > this patch set is the best place to start. Great. I'd be happy if I will have one less set of patches to keep at home. :-) > > Attached rebased patches. > > > > 0001 contains preparatory FDW API changes to stop relying on > > es_result_relation_info being set correctly. > > Makes sense. The only thing I don't like about this is the way the > ForeignScan->resultRelIndex field is set. make_foreignscan() initializes > it to -1, and the FDW's PlanDirectModify() function is expected to set > it, like you did in postgres_fdw: > > > @@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root, > > rebuild_fdw_scan_tlist(fscan, returningList); > > } > > > > + /* > > + * Set the index of the subplan result rel. > > + */ > > + fscan->resultRelIndex = subplan_index; > > + > > table_close(rel, NoLock); > > return true; > > } > > It has to be set to that value (subplan_index is an argument to > PlanDirectModify()), the FDW doesn't have any choice there, so this is > just additional boilerplate code that has to be copied to every FDW that > implements direct modify. Furthermore, if the FDW doesn't set it > correctly, you could have some very interesting results, like updating > wrong table. It would be better to set it in make_modifytable(), just > after calling PlanDirectModify(). Actually, that's how it was done in earlier iterations but I think I decided to move that into the FDW's functions due to some concern of one of the other patches that depended on this patch. Maybe it makes sense to bring that back into make_modifytable() and worry about the other patch later. > I'm also a bit unhappy with the way it's updated in set_plan_refs(): > > > --- a/src/backend/optimizer/plan/setrefs.c > > +++ b/src/backend/optimizer/plan/setrefs.c > > @@ -904,6 +904,13 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) > > rc->rti += rtoffset; > > rc->prti += rtoffset; > > } > > + /* > > + * Caution: Do not change the relative ordering of this loop > > + * and the statement below that adds the result relations to > > + * root->glob->resultRelations, because we need to use the > > + * current value of list_length(root->glob->resultRelations) > > + * in some plans. > > + */ > > foreach(l, splan->plans) > > { > > lfirst(l) = set_plan_refs(root, > > @@ -1243,6 +1250,14 @@ set_foreignscan_references(PlannerInfo *root, > > } > > > > fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset); > > + > > + /* > > + * Adjust resultRelIndex if it's valid (note that we are called before > > + * adding the RT indexes of ModifyTable result relations to the global > > + * list) > > + */ > > + if (fscan->resultRelIndex >= 0) > > + fscan->resultRelIndex += list_length(root->glob->resultRelations); > > } > > > > /* > > That "Caution" comment is well deserved, but could we make this more > robust to begin with? The most straightforward solution would be to pass > down the "current resultRelIndex" as an extra parameter to > set_plan_refs(), similar to rtoffset. If we did that, we wouldn't > actually need to set it before setrefs.c processing at all. Hmm, I don't think I understand the last sentence. A given ForeignScan node's resultRelIndex will have to be set before getting to set_plan_refs(). I mean we shouldn't be making it a job of setrefs.c to figure out which ForeignScan nodes need to have its resultRelIndex set to a valid value. > I'm a bit wary of adding another argument to set_plan_refs() because > that's a lot of code churn, but it does seem like the most natural > solution to me. Maybe create a new context struct to hold the > PlannerInfo, rtoffset, and the new "currentResultRelIndex" value, > similar to fix_scan_expr_context, to avoid passing through so many > arguments. I like the idea of a context struct. I've implemented it as a separate refactoring patch (0001) and 0002 (what was before 0001) extends it for "current ResultRelIndex", although I used the name rroffset for "current ResultRelIndex" to go along with rtoffset. > Another idea is to merge "resultRelIndex" and a "range table index" into > one value. Range table entries that are updated would have a > ResultRelInfo, others would not. I'm not sure if that would end up being > cleaner or messier than what we have now, but might be worth trying. I have thought about something like this before. An idea I had is to make es_result_relations array indexable by plain RT indexes, then we don't need to maintain separate indexes that we do today for result relations. > > 0002 removes es_result_relation_info in favor passing the active > > result relation around as a parameter in the various functions that > > need it > > Looks good. > > > 0003 Moves UPDATE tuple-routing logic into a new function > > > > 0004 removes the global variable TransitionCaptureState.tcs_map which > > needed to be set/reset whenever the active result relation relation > > changes in favor of a new field in ResultRelInfo to store the same map > > I didn't look closely, but these make sense at a quick glance. Updated patches attached. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 07/10/2020 12:50, Amit Langote wrote: > On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> It would be better to set it in make_modifytable(), just >> after calling PlanDirectModify(). > > Actually, that's how it was done in earlier iterations but I think I > decided to move that into the FDW's functions due to some concern of > one of the other patches that depended on this patch. Maybe it makes > sense to bring that back into make_modifytable() and worry about the > other patch later. On second thoughts, I take back my earlier comment. Setting it in make_modifytable() relies on the assumption that the subplan is a single ForeignScan node, on the target relation. The documentation for PlanDirectModify says: > To execute the direct modification on the remote server, this > function must rewrite the target subplan with a ForeignScan plan node > that executes the direct modification on the remote server. So I guess that assumption is safe. But I'd like to have some wiggle room here. Wouldn't it be OK to have a Result node on top of the ForeignScan, for example? If it really must be a simple ForeignScan node, the PlanDirectModify API seems pretty strange. I'm not entirely sure what I would like to do with this now. I could live with either version, but I'm not totally happy with either. (I like your suggestion below) Looking at this block in postgresBeginDirectModify: > /* > * Identify which user to do the remote access as. This should match what > * ExecCheckRTEPerms() does. > */ > Assert(fsplan->resultRelIndex >= 0); > dmstate->resultRelIndex = fsplan->resultRelIndex; > rtindex = list_nth_int(resultRelations, fsplan->resultRelIndex); > rte = exec_rt_fetch(rtindex, estate); > userid = rte->checkAsUser ? rte->checkAsUser : GetUserId(); That's a complicated way of finding out the target table's RTI. We should probably store the result RTI in the ForeignScan in the first place. >> Another idea is to merge "resultRelIndex" and a "range table index" into >> one value. Range table entries that are updated would have a >> ResultRelInfo, others would not. I'm not sure if that would end up being >> cleaner or messier than what we have now, but might be worth trying. > > I have thought about something like this before. An idea I had is to > make es_result_relations array indexable by plain RT indexes, then we > don't need to maintain separate indexes that we do today for result > relations. That sounds like a good idea. es_result_relations is currently an array of ResultRelInfos, so that would leave a lot of unfilled structs in the array. But in on of your other threads, you proposed turning es_result_relations into an array of pointers anyway (https://www.postgresql.org/message-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com). - Heikki
On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 07/10/2020 12:50, Amit Langote wrote: > > On Tue, Oct 6, 2020 at 12:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> It would be better to set it in make_modifytable(), just > >> after calling PlanDirectModify(). > > > > Actually, that's how it was done in earlier iterations but I think I > > decided to move that into the FDW's functions due to some concern of > > one of the other patches that depended on this patch. Maybe it makes > > sense to bring that back into make_modifytable() and worry about the > > other patch later. > > On second thoughts, I take back my earlier comment. Setting it in > make_modifytable() relies on the assumption that the subplan is a single > ForeignScan node, on the target relation. The documentation for > PlanDirectModify says: > > > To execute the direct modification on the remote server, this > > function must rewrite the target subplan with a ForeignScan plan node > > that executes the direct modification on the remote server. >> > So I guess that assumption is safe. But I'd like to have some wiggle > room here. Wouldn't it be OK to have a Result node on top of the > ForeignScan, for example? If it really must be a simple ForeignScan > node, the PlanDirectModify API seems pretty strange. > > I'm not entirely sure what I would like to do with this now. I could > live with either version, but I'm not totally happy with either. (I like > your suggestion below) Assuming you mean the idea of using RT index to access ResultRelInfos in es_result_relations, we would still need to store the index in the ForeignScan node, so the question of whether to do it in make_modifytable() or in PlanDirectModify() must still be answered. > Looking at this block in postgresBeginDirectModify: > > > /* > > * Identify which user to do the remote access as. This should match what > > * ExecCheckRTEPerms() does. > > */ > > Assert(fsplan->resultRelIndex >= 0); > > dmstate->resultRelIndex = fsplan->resultRelIndex; > > rtindex = list_nth_int(resultRelations, fsplan->resultRelIndex); > > rte = exec_rt_fetch(rtindex, estate); > > userid = rte->checkAsUser ? rte->checkAsUser : GetUserId(); > > That's a complicated way of finding out the target table's RTI. We > should probably store the result RTI in the ForeignScan in the first place. > > >> Another idea is to merge "resultRelIndex" and a "range table index" into > >> one value. Range table entries that are updated would have a > >> ResultRelInfo, others would not. I'm not sure if that would end up being > >> cleaner or messier than what we have now, but might be worth trying. > > > > I have thought about something like this before. An idea I had is to > > make es_result_relations array indexable by plain RT indexes, then we > > don't need to maintain separate indexes that we do today for result > > relations. > > That sounds like a good idea. es_result_relations is currently an array > of ResultRelInfos, so that would leave a lot of unfilled structs in the > array. But in on of your other threads, you proposed turning > es_result_relations into an array of pointers anyway > (https://www.postgresql.org/message-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com). Okay, I am reorganizing the patches around that idea and will post an update soon. -- Amit Langote EDB: http://www.enterprisedb.com
On Thu, Oct 8, 2020 at 9:35 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 07/10/2020 12:50, Amit Langote wrote: > > > I have thought about something like this before. An idea I had is to > > > make es_result_relations array indexable by plain RT indexes, then we > > > don't need to maintain separate indexes that we do today for result > > > relations. > > > > That sounds like a good idea. es_result_relations is currently an array > > of ResultRelInfos, so that would leave a lot of unfilled structs in the > > array. But in on of your other threads, you proposed turning > > es_result_relations into an array of pointers anyway > > (https://www.postgresql.org/message-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com). > > Okay, I am reorganizing the patches around that idea and will post an > update soon. Attached updated patches. 0001 makes es_result_relations an RTI-indexable array, which allows to get rid of all "result relation index" fields across the code. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
- v13-0001-Make-es_result_relations-array-indexable-by-RT-i.patch
- v13-0002-Include-result-relation-index-if-any-in-ForeignS.patch
- v13-0005-Revise-child-to-root-tuple-conversion-map-manage.patch
- v13-0004-Rearrange-partition-update-row-movement-code-a-b.patch
- v13-0003-Remove-es_result_relation_info.patch
On 09/10/2020 11:01, Amit Langote wrote: > On Thu, Oct 8, 2020 at 9:35 PM Amit Langote <amitlangote09@gmail.com> wrote: >> On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> On 07/10/2020 12:50, Amit Langote wrote: >>>> I have thought about something like this before. An idea I had is to >>>> make es_result_relations array indexable by plain RT indexes, then we >>>> don't need to maintain separate indexes that we do today for result >>>> relations. >>> >>> That sounds like a good idea. es_result_relations is currently an array >>> of ResultRelInfos, so that would leave a lot of unfilled structs in the >>> array. But in on of your other threads, you proposed turning >>> es_result_relations into an array of pointers anyway >>> (https://www.postgresql.org/message-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com). >> >> Okay, I am reorganizing the patches around that idea and will post an >> update soon. > > Attached updated patches. > > 0001 makes es_result_relations an RTI-indexable array, which allows to > get rid of all "result relation index" fields across the code. Thanks! A couple small things I wanted to check with you before committing: 1. We have many different cleanup/close routines now: ExecCloseResultRelations, ExecCloseRangeTableRelations, and ExecCleanUpTriggerState. Do we need them all? It seems to me that we could merge ExecCloseRangeTableRelations() and ExecCleanUpTriggerState(), they seem to do roughly the same thing: close relations that were opened for ResultRelInfos. They are always called together, except in afterTriggerInvokeEvents(). And in afterTriggerInvokeEvents() too, there would be no harm in doing both, even though we know there aren't any entries in the es_result_relations array at that point. 2. The way this is handled in worker.c is a bit funny. In create_estate_for_relation(), you create a ResultRelInfo, but you *don't* put it in the es_opened_result_relations list. That's surprising, but I'm also surprised there are no ExecCloseResultRelations() calls before the FreeExecutorState() calls in worker.c. It's not needed because the apply_handle_insert/update/delete_internal() functions call ExecCloseIndices() directly, so they don't rely on the ExecCloseResultRelations() function for cleanup. That works too, but it's a bit surprising because it's different from how it's done in copy.c and nodeModifyTable.c. It would feel natural to rely on ExecCloseResultRelations() in worker.c as well, but on the other hand, it also calls ExecOpenIndices() in a more lazy fashion, and it makes sense to call ExecCloseIndices() in the same functions that ExecOpenIndices() is called. So I'm not sure if changing that would be an improvement overall. What do you think? Did you consider doing that? Attached is your original patch v13, and a patch on top of it that merges ExecCloseResultRelations() and ExecCleanUpTriggerState(), and makes some minor comment changes. I didn't do anything about the worker.c business, aside from adding a comment about it. - Heikki
Вложения
On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 09/10/2020 11:01, Amit Langote wrote: > > On Thu, Oct 8, 2020 at 9:35 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> On Wed, Oct 7, 2020 at 9:07 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >>> On 07/10/2020 12:50, Amit Langote wrote: > >>>> I have thought about something like this before. An idea I had is to > >>>> make es_result_relations array indexable by plain RT indexes, then we > >>>> don't need to maintain separate indexes that we do today for result > >>>> relations. > >>> > >>> That sounds like a good idea. es_result_relations is currently an array > >>> of ResultRelInfos, so that would leave a lot of unfilled structs in the > >>> array. But in on of your other threads, you proposed turning > >>> es_result_relations into an array of pointers anyway > >>> (https://www.postgresql.org/message-id/CA+HiwqE4k1Q2TLmCAvekw+8_NXepbnfUOamOeX=KpHRDTfSKxA@mail.gmail.com). > >> > >> Okay, I am reorganizing the patches around that idea and will post an > >> update soon. > > > > Attached updated patches. > > > > 0001 makes es_result_relations an RTI-indexable array, which allows to > > get rid of all "result relation index" fields across the code. > > Thanks! A couple small things I wanted to check with you before committing: Thanks for checking. > 1. We have many different cleanup/close routines now: > ExecCloseResultRelations, ExecCloseRangeTableRelations, and > ExecCleanUpTriggerState. Do we need them all? It seems to me that we > could merge ExecCloseRangeTableRelations() and > ExecCleanUpTriggerState(), they seem to do roughly the same thing: close > relations that were opened for ResultRelInfos. They are always called > together, except in afterTriggerInvokeEvents(). And in > afterTriggerInvokeEvents() too, there would be no harm in doing both, > even though we know there aren't any entries in the es_result_relations > array at that point. Hmm, I find trigger result relations to behave differently enough to deserve a separate function. For example, unlike plan-specified result relations, they don't point to range table relations and don't open indices. Maybe the name could be revisited, say, ExecCloseTriggerResultRelations(). Also, maybe call the other functions: ExecInitPlanResultRelationsArray() ExecInitPlanResultRelation() ExecClosePlanResultRelations() Thoughts? > 2. The way this is handled in worker.c is a bit funny. In > create_estate_for_relation(), you create a ResultRelInfo, but you > *don't* put it in the es_opened_result_relations list. That's > surprising, but I'm also surprised there are no > ExecCloseResultRelations() calls before the FreeExecutorState() calls in > worker.c. It's not needed because the > apply_handle_insert/update/delete_internal() functions call > ExecCloseIndices() directly, so they don't rely on the > ExecCloseResultRelations() function for cleanup. That works too, but > it's a bit surprising because it's different from how it's done in > copy.c and nodeModifyTable.c. It would feel natural to rely on > ExecCloseResultRelations() in worker.c as well, but on the other hand, > it also calls ExecOpenIndices() in a more lazy fashion, and it makes > sense to call ExecCloseIndices() in the same functions that > ExecOpenIndices() is called. So I'm not sure if changing that would be > an improvement overall. What do you think? Did you consider doing that? Yeah, that did bother me too a bit. I'm okay either way but it does look a bit inconsistent. Actually, maybe we don't need to be so paranoid about setting up es_result_relations in worker.c, because none of the downstream functionality invoked seems to rely on it, that is, no need to call ExecInitResultRelationsArray() and ExecInitResultRelation(). ExecSimpleRelation* and downstream functionality assume a single-relation operation and the ResultRelInfo is explicitly passed. > Attached is your original patch v13, and a patch on top of it that > merges ExecCloseResultRelations() and ExecCleanUpTriggerState(), and > makes some minor comment changes. I didn't do anything about the > worker.c business, aside from adding a comment about it. Thanks for the cleanup. I had noticed there was some funny capitalization in my patch: + ResultRelInfo **es_result_relations; /* Array of Per-range-table-entry s/Per-/per- Also, I think a comma may be needed in the parenthetical below: + * can index it by the RT index (minus 1 to be accurate). ...(minus 1, to be accurate) -- Amit Langote EDB: http://www.enterprisedb.com
On 12/10/2020 16:47, Amit Langote wrote: > On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> 1. We have many different cleanup/close routines now: >> ExecCloseResultRelations, ExecCloseRangeTableRelations, and >> ExecCleanUpTriggerState. Do we need them all? It seems to me that we >> could merge ExecCloseRangeTableRelations() and >> ExecCleanUpTriggerState(), they seem to do roughly the same thing: close >> relations that were opened for ResultRelInfos. They are always called >> together, except in afterTriggerInvokeEvents(). And in >> afterTriggerInvokeEvents() too, there would be no harm in doing both, >> even though we know there aren't any entries in the es_result_relations >> array at that point. > > Hmm, I find trigger result relations to behave differently enough to > deserve a separate function. For example, unlike plan-specified > result relations, they don't point to range table relations and don't > open indices. Maybe the name could be revisited, say, > ExecCloseTriggerResultRelations(). Matter of perception I guess. I still prefer to club them together into one Close call. It's true that they're slightly different, but they're also pretty similar. And IMHO they're more similar than different. > Also, maybe call the other functions: > > ExecInitPlanResultRelationsArray() > ExecInitPlanResultRelation() > ExecClosePlanResultRelations() > > Thoughts? Hmm. How about initializing the array lazily, on the first ExecInitPlanResultRelation() call? It's not performance critical, and that way there's one fewer initialization function that you need to remember to call. It occurred to me that if we do that (initialize the array lazily), there's very little need for the PlannedStmt->resultRelations list anymore. It's only used in ExecRelationIsTargetRelation(), but if we assume that ExecRelationIsTargetRelation() is only called after InitPlan has initialized the result relation for the relation, it can easily check es_result_relations instead. I think that's a safe assumption. ExecRelationIsTargetRelation() is only used in FDWs, and I believe the FDWs initialization routine can only be called after ExecInitModifyTable has been called on the relation. The PlannedStmt->rootResultRelations field is even more useless. > Actually, maybe we don't need to be so paranoid about setting up > es_result_relations in worker.c, because none of the downstream > functionality invoked seems to rely on it, that is, no need to call > ExecInitResultRelationsArray() and ExecInitResultRelation(). > ExecSimpleRelation* and downstream functionality assume a > single-relation operation and the ResultRelInfo is explicitly passed. Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't actually any need to put the ResultRelInfos in the es_result_relations array. Putting all this together, I ended up with the attached. It doesn't include the subsequent commits in this patch set yet, for removal of es_result_relation_info et al. - Heikki
Вложения
On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 12/10/2020 16:47, Amit Langote wrote: > > On Mon, Oct 12, 2020 at 8:12 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> 1. We have many different cleanup/close routines now: > >> ExecCloseResultRelations, ExecCloseRangeTableRelations, and > >> ExecCleanUpTriggerState. Do we need them all? It seems to me that we > >> could merge ExecCloseRangeTableRelations() and > >> ExecCleanUpTriggerState(), they seem to do roughly the same thing: close > >> relations that were opened for ResultRelInfos. They are always called > >> together, except in afterTriggerInvokeEvents(). And in > >> afterTriggerInvokeEvents() too, there would be no harm in doing both, > >> even though we know there aren't any entries in the es_result_relations > >> array at that point. > > > > Hmm, I find trigger result relations to behave differently enough to > > deserve a separate function. For example, unlike plan-specified > > result relations, they don't point to range table relations and don't > > open indices. Maybe the name could be revisited, say, > > ExecCloseTriggerResultRelations(). > > Matter of perception I guess. I still prefer to club them together into > one Close call. It's true that they're slightly different, but they're > also pretty similar. And IMHO they're more similar than different. Okay, fine with me. > > Also, maybe call the other functions: > > > > ExecInitPlanResultRelationsArray() > > ExecInitPlanResultRelation() > > ExecClosePlanResultRelations() > > > > Thoughts? > > Hmm. How about initializing the array lazily, on the first > ExecInitPlanResultRelation() call? It's not performance critical, and > that way there's one fewer initialization function that you need to > remember to call. Agree that's better. > It occurred to me that if we do that (initialize the array lazily), > there's very little need for the PlannedStmt->resultRelations list > anymore. It's only used in ExecRelationIsTargetRelation(), but if we > assume that ExecRelationIsTargetRelation() is only called after InitPlan > has initialized the result relation for the relation, it can easily > check es_result_relations instead. I think that's a safe assumption. > ExecRelationIsTargetRelation() is only used in FDWs, and I believe the > FDWs initialization routine can only be called after ExecInitModifyTable > has been called on the relation. > > The PlannedStmt->rootResultRelations field is even more useless. I am very much tempted to remove those fields from PlannedStmt, although I am concerned that the following now assumes that *all* result relations are initialized in the executor initialization phase: bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid) { if (!estate->es_result_relations) return false; return estate->es_result_relations[scanrelid - 1] != NULL; } In the other thread [1], I am proposing that we initialize result relations lazily, but the above will be a blocker to that. > > Actually, maybe we don't need to be so paranoid about setting up > > es_result_relations in worker.c, because none of the downstream > > functionality invoked seems to rely on it, that is, no need to call > > ExecInitResultRelationsArray() and ExecInitResultRelation(). > > ExecSimpleRelation* and downstream functionality assume a > > single-relation operation and the ResultRelInfo is explicitly passed. > > Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't > actually any need to put the ResultRelInfos in the es_result_relations > array. > > Putting all this together, I ended up with the attached. It doesn't > include the subsequent commits in this patch set yet, for removal of > es_result_relation_info et al. Thanks. + * We put the ResultRelInfos in the es_opened_result_relations list, even + * though we don't have a range table and don't populate the + * es_result_relations array. That's a big bogus, but it's enough to make + * ExecGetTriggerResultRel() find them. */ estate = CreateExecutorState(); resultRelInfos = (ResultRelInfo *) palloc(list_length(rels) * sizeof(ResultRelInfo)); resultRelInfo = resultRelInfos; + estate->es_result_relations = (ResultRelInfo **) + palloc(list_length(rels) * sizeof(ResultRelInfo *)); Maybe don't allocate es_result_relations here? +/* + * Close all relations opened by ExecGetRangeTableRelation() + */ +void +ExecCloseRangeTableRelations(EState *estate) +{ + int i; + + for (i = 0; i < estate->es_range_table_size; i++) { if (estate->es_relations[i]) table_close(estate->es_relations[i], NoLock); } I think we have an optimization opportunity here (maybe as a separate patch). Why don't we introduce es_opened_relations? That way, if only a single or few of potentially 1000s relations in the range table is/are opened, we don't needlessly loop over *all* relations here. That can happen, for example, with a query where no partitions could be pruned at planning time, so the range table contains all partitions, but only one or few are accessed during execution and the rest run-time pruned. Although, in the workloads where it would matter, other overheads easily mask the overhead of this loop; see the first message at the linked thread [1], so it is hard to show an immediate benefit from this. Anyway, other than my concern about ExecRelationIsTargetRelation() mentioned above, I think the patch looks good. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/30/2621/
On 13/10/2020 07:32, Amit Langote wrote: > On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> It occurred to me that if we do that (initialize the array lazily), >> there's very little need for the PlannedStmt->resultRelations list >> anymore. It's only used in ExecRelationIsTargetRelation(), but if we >> assume that ExecRelationIsTargetRelation() is only called after InitPlan >> has initialized the result relation for the relation, it can easily >> check es_result_relations instead. I think that's a safe assumption. >> ExecRelationIsTargetRelation() is only used in FDWs, and I believe the >> FDWs initialization routine can only be called after ExecInitModifyTable >> has been called on the relation. >> >> The PlannedStmt->rootResultRelations field is even more useless. > > I am very much tempted to remove those fields from PlannedStmt, > although I am concerned that the following now assumes that *all* > result relations are initialized in the executor initialization phase: > > bool > ExecRelationIsTargetRelation(EState *estate, Index scanrelid) > { > if (!estate->es_result_relations) > return false; > > return estate->es_result_relations[scanrelid - 1] != NULL; > } > > In the other thread [1], I am proposing that we initialize result > relations lazily, but the above will be a blocker to that. Ok, I'll leave it alone then. But I'll still merge resultRelations and rootResultRelations into one list. I don't see any point in keeping them separate. I'm tempted to remove ExecRelationIsTargetRelation() altogether, but keeping the resultRelations list isn't really a big deal, so I'll leave that for another discussion. >>> Actually, maybe we don't need to be so paranoid about setting up >>> es_result_relations in worker.c, because none of the downstream >>> functionality invoked seems to rely on it, that is, no need to call >>> ExecInitResultRelationsArray() and ExecInitResultRelation(). >>> ExecSimpleRelation* and downstream functionality assume a >>> single-relation operation and the ResultRelInfo is explicitly passed. >> >> Hmm, yeah, I like that. Similarly in ExecuteTruncateGuts(), there isn't >> actually any need to put the ResultRelInfos in the es_result_relations >> array. >> >> Putting all this together, I ended up with the attached. It doesn't >> include the subsequent commits in this patch set yet, for removal of >> es_result_relation_info et al. > > Thanks. > > + * We put the ResultRelInfos in the es_opened_result_relations list, even > + * though we don't have a range table and don't populate the > + * es_result_relations array. That's a big bogus, but it's enough to make > + * ExecGetTriggerResultRel() find them. > */ > estate = CreateExecutorState(); > resultRelInfos = (ResultRelInfo *) > palloc(list_length(rels) * sizeof(ResultRelInfo)); > resultRelInfo = resultRelInfos; > + estate->es_result_relations = (ResultRelInfo **) > + palloc(list_length(rels) * sizeof(ResultRelInfo *)); > > Maybe don't allocate es_result_relations here? Fixed. > Anyway, other than my concern about ExecRelationIsTargetRelation() > mentioned above, I think the patch looks good. Ok, committed. I'll continue to look at the rest of the patches in this patch series now. - Heikki
On Tue, Oct 13, 2020 at 7:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 13/10/2020 07:32, Amit Langote wrote: > > On Tue, Oct 13, 2020 at 1:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> It occurred to me that if we do that (initialize the array lazily), > >> there's very little need for the PlannedStmt->resultRelations list > >> anymore. It's only used in ExecRelationIsTargetRelation(), but if we > >> assume that ExecRelationIsTargetRelation() is only called after InitPlan > >> has initialized the result relation for the relation, it can easily > >> check es_result_relations instead. I think that's a safe assumption. > >> ExecRelationIsTargetRelation() is only used in FDWs, and I believe the > >> FDWs initialization routine can only be called after ExecInitModifyTable > >> has been called on the relation. > >> > >> The PlannedStmt->rootResultRelations field is even more useless. > > > > I am very much tempted to remove those fields from PlannedStmt, > > although I am concerned that the following now assumes that *all* > > result relations are initialized in the executor initialization phase: > > > > bool > > ExecRelationIsTargetRelation(EState *estate, Index scanrelid) > > { > > if (!estate->es_result_relations) > > return false; > > > > return estate->es_result_relations[scanrelid - 1] != NULL; > > } > > > > In the other thread [1], I am proposing that we initialize result > > relations lazily, but the above will be a blocker to that. > > Ok, I'll leave it alone then. But I'll still merge resultRelations and > rootResultRelations into one list. I don't see any point in keeping them > separate. Should be fine. As you said in the commit message, it should probably have been that way to begin with, but I don't recall why I didn't make it so. > I'm tempted to remove ExecRelationIsTargetRelation() altogether, but > keeping the resultRelations list isn't really a big deal, so I'll leave > that for another discussion. Yeah, makes sense. > > Anyway, other than my concern about ExecRelationIsTargetRelation() > > mentioned above, I think the patch looks good. > > Ok, committed. I'll continue to look at the rest of the patches in this > patch series now. Thanks. BTW, you mentioned the lazy ResultRelInfo optimization bit in the commit message, so does that mean you intend to take a look at the other thread [1] too? Or should I post a rebased version of the lazy ResultRelInfo initialization patch here in this thread? That patch is just a bunch of refactoring too. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/30/2621/
On 13/10/2020 15:03, Amit Langote wrote: > On Tue, Oct 13, 2020 at 7:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> Ok, committed. I'll continue to look at the rest of the patches in this >> patch series now. I've reviewed the next two patches in the series, they are pretty much ready for commit now. I made just a few minor changes, notably: - I moved the responsibility to set ForeignTable->resultRelation to the FDWs, like you had in the original patch version. Sorry for flip-flopping on that. - In postgres_fdw.c, I changed it to store the ResultRelInfo pointer in PgFdwDirectModifyState, instead of storing the RT index and looking it up in the BeginDirectModify and IterateDirectModify. I think you did it that way in the earlier patch versions, too. - Some minor comment and docs kibitzing. One little idea I had: I think all FDWs that support direct modify will have to carry the resultRelaton index or the ResultRelInfo pointer from BeginDirectModify to IterateDirectModify in the FDW's private struct. It's not complicated, but should we make life easier for FDWs by storing the ResultRelInfo pointer in the ForeignScanState struct in the core code? The doc now says: > The data that was actually inserted, updated or deleted must be > stored in the ri_projectReturning->pi_exprContext->ecxt_scantuple of > the target foreign table's ResultRelInfo obtained using the > information passed to BeginDirectModify. Return NULL if no more rows > are available. That "ResultRelInfo obtained using the information passed to BeginDirectModify" part is a pretty vague. We could expand it, but if we stored the ResultRelInfo in the ForeignScanState, we could explain it succinctly. > BTW, you mentioned the lazy ResultRelInfo optimization bit in the > commit message, so does that mean you intend to take a look at the > other thread [1] too? Or should I post a rebased version of the lazy > ResultRelInfo initialization patch here in this thread? That patch is > just a bunch of refactoring too. No promises, but yeah, now that I'm knee-deep in this ResultRelInfo business, I'll try to take a look at that too :-). - Heikki
Вложения
On 13/10/2020 19:09, Heikki Linnakangas wrote: > One little idea I had: > > I think all FDWs that support direct modify will have to carry the > resultRelaton index or the ResultRelInfo pointer from BeginDirectModify > to IterateDirectModify in the FDW's private struct. It's not > complicated, but should we make life easier for FDWs by storing the > ResultRelInfo pointer in the ForeignScanState struct in the core code? > The doc now says: > >> The data that was actually inserted, updated or deleted must be >> stored in the ri_projectReturning->pi_exprContext->ecxt_scantuple of >> the target foreign table's ResultRelInfo obtained using the >> information passed to BeginDirectModify. Return NULL if no more rows >> are available. > > That "ResultRelInfo obtained using the information passed to > BeginDirectModify" part is a pretty vague. We could expand it, but if we > stored the ResultRelInfo in the ForeignScanState, we could explain it > succinctly. I tried that approach, see attached. Yeah, this feels better to me. - Heikki
Вложения
On Wed, Oct 14, 2020 at 1:30 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 13/10/2020 19:09, Heikki Linnakangas wrote: > > One little idea I had: > > > > I think all FDWs that support direct modify will have to carry the > > resultRelaton index or the ResultRelInfo pointer from BeginDirectModify > > to IterateDirectModify in the FDW's private struct. It's not > > complicated, but should we make life easier for FDWs by storing the > > ResultRelInfo pointer in the ForeignScanState struct in the core code? > > The doc now says: > > > >> The data that was actually inserted, updated or deleted must be > >> stored in the ri_projectReturning->pi_exprContext->ecxt_scantuple of > >> the target foreign table's ResultRelInfo obtained using the > >> information passed to BeginDirectModify. Return NULL if no more rows > >> are available. > > > > That "ResultRelInfo obtained using the information passed to > > BeginDirectModify" part is a pretty vague. We could expand it, but if we > > stored the ResultRelInfo in the ForeignScanState, we could explain it > > succinctly. > > I tried that approach, see attached. Yeah, this feels better to me. I like the idea of storing the ResultRelInfo in ForeignScanState, but it would be better if we can document the fact that an FDW may not reliably access until IterateDirectModify(). That's because, setting it in ExecInitForeignScan() will mean *all* result relations must be initialized during ExecInitModifyTable(), which defies my lazy-ResultRelInfo-initiailization proposal. As to why why I'm pushing that proposal, consider that when we'll get the ability to use run-time pruning for UPDATE/DELETE with [1], initializing all result relations before initializing the plan tree will mean most of those ResultRelInfos will be unused, because run-time pruning that occurs when the plan tree is initialized (and/or when it is executed) may eliminate most but a few result relations. I've attached a diff to v17-0001 to show one way of delaying setting ForeignScanState.resultRelInfo. -- Amit Langote EDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/30/2575/
Вложения
On 14/10/2020 09:44, Amit Langote wrote: > I like the idea of storing the ResultRelInfo in ForeignScanState, but > it would be better if we can document the fact that an FDW may not > reliably access until IterateDirectModify(). That's because, setting > it in ExecInitForeignScan() will mean *all* result relations must be > initialized during ExecInitModifyTable(), which defies my > lazy-ResultRelInfo-initiailization proposal. As to why why I'm > pushing that proposal, consider that when we'll get the ability to use > run-time pruning for UPDATE/DELETE with [1], initializing all result > relations before initializing the plan tree will mean most of those > ResultRelInfos will be unused, because run-time pruning that occurs > when the plan tree is initialized (and/or when it is executed) may > eliminate most but a few result relations. > > I've attached a diff to v17-0001 to show one way of delaying setting > ForeignScanState.resultRelInfo. The BeginDirectModify function does a lot of expensive things, like opening a connection to the remote server. If we want to optimize run-time pruning, I think we need to avoid calling BeginDirectModify for pruned partitions altogether. I pushed this without those delay-setting-resultRelInfo changes. But we can revisit those changes with the run-time pruning optimization patch. I'll continue with the last couple of patches in this thread. - Heikki
On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 14/10/2020 09:44, Amit Langote wrote: > > I like the idea of storing the ResultRelInfo in ForeignScanState, but > > it would be better if we can document the fact that an FDW may not > > reliably access until IterateDirectModify(). That's because, setting > > it in ExecInitForeignScan() will mean *all* result relations must be > > initialized during ExecInitModifyTable(), which defies my > > lazy-ResultRelInfo-initiailization proposal. As to why why I'm > > pushing that proposal, consider that when we'll get the ability to use > > run-time pruning for UPDATE/DELETE with [1], initializing all result > > relations before initializing the plan tree will mean most of those > > ResultRelInfos will be unused, because run-time pruning that occurs > > when the plan tree is initialized (and/or when it is executed) may > > eliminate most but a few result relations. > > > > I've attached a diff to v17-0001 to show one way of delaying setting > > ForeignScanState.resultRelInfo. > > The BeginDirectModify function does a lot of expensive things, like > opening a connection to the remote server. If we want to optimize > run-time pruning, I think we need to avoid calling BeginDirectModify for > pruned partitions altogether. Note that if foreign partitions get pruned during the so called "init" run-time pruning (that is, in the ExecInitNode() phase), BeginDirectModify() won't get called on them. Although, your concern does apply if there is only going to be "exec" run-time pruning and no "initial" pruning. For me, the former case is a bit more interesting, as it occurs with prepared statements using a generic plan (update parted_table set ... where partkey = $1). > I pushed this without those delay-setting-resultRelInfo changes. But we > can revisit those changes with the run-time pruning optimization patch. Sure, that makes sense. > I'll continue with the last couple of patches in this thread. Okay, thanks. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I'll continue with the last couple of patches in this thread. I committed the move of the cross-partition logic to new ExecCrossPartitionUpdate() function, with just minor comment editing and pgindent. I left out the refactoring around the calls to AFTER ROW INSERT/DELETE triggers. I stared at the change for a while, and wasn't sure if I liked the patched or the unpatched new version better, so I left it alone. Looking at the last patch, "Revise child-to-root tuple conversion map management", that's certainly an improvement. However, I find it confusing that sometimes the mapping from child to root is in relinfo->ri_ChildToRootMap, and sometimes in relinfo->ri_PartitionInfo->pi_PartitionToRootMap. When is each of those filled in? If both are set, is it well defined which one is initialized first? In general, I'm pretty confused by the initialization of ri_PartitionInfo. Where is initialized, and when? In execnodes.h, the definition of ResultRelInfo says: > /* info for partition tuple routing (NULL if not set up yet) */ > struct PartitionRoutingInfo *ri_PartitionInfo; That implies that the field is initialized lazily. But in ExecFindPartition, we have this: > if (partidx == partdesc->boundinfo->default_index) > { > PartitionRoutingInfo *partrouteinfo = rri->ri_PartitionInfo; > > /* > * The tuple must match the partition's layout for the constraint > * expression to be evaluated successfully. If the partition is > * sub-partitioned, that would already be the case due to the code > * above, but for a leaf partition the tuple still matches the > * parent's layout. > * > * Note that we have a map to convert from root to current > * partition, but not from immediate parent to current partition. > * So if we have to convert, do it from the root slot; if not, use > * the root slot as-is. > */ > if (partrouteinfo) > { > TupleConversionMap *map = partrouteinfo->pi_RootToPartitionMap; > > if (map) > slot = execute_attr_map_slot(map->attrMap, rootslot, > partrouteinfo->pi_PartitionTupleSlot); > else > slot = rootslot; > } > > ExecPartitionCheck(rri, slot, estate, true); > } That check implies that it's not just lazily initialized, the code will work differently if ri_PartitionInfo is set or not. I think all this would be more clear if ri_PartitionInfo and ri_ChildToRootMap were both truly lazily initialized, the first time they're needed. And if we removed ri_PartitionInfo->pi_PartitionToRootMap, and always used ri_ChildToRootMap for it. Maybe remove PartitionRoutingInfo struct altogether, and just move its fields directly to ResultRelInfo. - Heikki
On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > I'll continue with the last couple of patches in this thread. > > I committed the move of the cross-partition logic to new > ExecCrossPartitionUpdate() function, with just minor comment editing and > pgindent. I left out the refactoring around the calls to AFTER ROW > INSERT/DELETE triggers. I stared at the change for a while, and wasn't > sure if I liked the patched or the unpatched new version better, so I > left it alone. Okay, thanks for committing that. > Looking at the last patch, "Revise child-to-root tuple conversion map > management", that's certainly an improvement. However, I find it > confusing that sometimes the mapping from child to root is in > relinfo->ri_ChildToRootMap, and sometimes in > relinfo->ri_PartitionInfo->pi_PartitionToRootMap. When is each of those > filled in? If both are set, is it well defined which one is initialized > first? It is ri_ChildToRootMap that is set first, because it's only set in child UPDATE target relations which are initialized in ExecInitModifyTable(), that is way before partition tuple routing comes into picture. ri_PartitionInfo and hence pi_PartitionToRootMap is set in tuple routing target partition's ResultRelInfos, which are lazily initialized when tuples land into them. If a tuple routing target partition happens to be an UPDATE target relation and we need to initialize the partition-to-root map, which for a tuple routing target partition is to be saved in pi_PartitionToRootMap, with the patch, we will try to reuse ri_ChildToRootMap because it would already be initialized. But as you say below, maybe we don't need to have two fields for the same thing, which I agree with. Having only ri_ChildToRootMap as you suggest below suffices. > In general, I'm pretty confused by the initialization of > ri_PartitionInfo. Where is initialized, and when? In execnodes.h, the > definition of ResultRelInfo says: > > > /* info for partition tuple routing (NULL if not set up yet) */ > > struct PartitionRoutingInfo *ri_PartitionInfo; > > That implies that the field is initialized lazily. But in > ExecFindPartition, we have this: > > > if (partidx == partdesc->boundinfo->default_index) > > { > > PartitionRoutingInfo *partrouteinfo = rri->ri_PartitionInfo; > > > > /* > > * The tuple must match the partition's layout for the constraint > > * expression to be evaluated successfully. If the partition is > > * sub-partitioned, that would already be the case due to the code > > * above, but for a leaf partition the tuple still matches the > > * parent's layout. > > * > > * Note that we have a map to convert from root to current > > * partition, but not from immediate parent to current partition. > > * So if we have to convert, do it from the root slot; if not, use > > * the root slot as-is. > > */ > > if (partrouteinfo) > > { > > TupleConversionMap *map = partrouteinfo->pi_RootToPartitionMap; > > > > if (map) > > slot = execute_attr_map_slot(map->attrMap, rootslot, > > partrouteinfo->pi_PartitionTupleSlot); > > else > > slot = rootslot; > > } > > > > ExecPartitionCheck(rri, slot, estate, true); > > } > > That check implies that it's not just lazily initialized, the code will > work differently if ri_PartitionInfo is set or not. > > I think all this would be more clear if ri_PartitionInfo and > ri_ChildToRootMap were both truly lazily initialized, the first time > they're needed. So, we initialize these maps when we initialize a partition's ResultRelInfo. I mean if the partition has a different tuple descriptor than root, we know we are going to need to convert tuples between them (in either direction), so we might as well initialize the maps when the ResultRelInfo is built, which we do lazily for tuple routing target relations at least. In that sense, at least root-to-partition maps are initialized lazily, that is only when a partition receives a tuple via routing. Partition-to-root maps' initialization though is not always lazy, because they are also needed by UPDATE target relations, whose ResultRelInfo are initialized in ExecInitModifyTable(), which is not lazy enough. That will change with my other patch though. :) > And if we removed > ri_PartitionInfo->pi_PartitionToRootMap, and always used > ri_ChildToRootMap for it. Done in the attached. > Maybe remove PartitionRoutingInfo struct altogether, and just move its > fields directly to ResultRelInfo. If we do that, we'll end up with 3 notations for the same thing across releases: In v10 and v11, PartitionRoutingInfos members are saved in arrays in ModifyTableState, totally detached from the partition ResultRelInfos. In v12 (3f2393edef), we moved them into ResultRelInfo but chose to add them into a sub-struct (PartitionRoutingInfo), which in retrospect was not a great decision. Now if we pull them into ResultRelInfo, we'll have invented the 3rd notation. Maybe that makes things hard when back-patching bug-fixes? Attached updated patch. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 2020-Oct-16, Amit Langote wrote: > On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > And if we removed > > ri_PartitionInfo->pi_PartitionToRootMap, and always used > > ri_ChildToRootMap for it. > > Done in the attached. Hmm... Overall I like the simplification. > > Maybe remove PartitionRoutingInfo struct altogether, and just move its > > fields directly to ResultRelInfo. > > If we do that, we'll end up with 3 notations for the same thing across > releases: In v10 and v11, PartitionRoutingInfos members are saved in > arrays in ModifyTableState, totally detached from the partition > ResultRelInfos. In v12 (3f2393edef), we moved them into ResultRelInfo > but chose to add them into a sub-struct (PartitionRoutingInfo), which > in retrospect was not a great decision. Now if we pull them into > ResultRelInfo, we'll have invented the 3rd notation. Maybe that makes > things hard when back-patching bug-fixes? I don't necessarily agree that PartitionRoutingInfo was such a bad idea. In fact I wonder if we shouldn't move *more* stuff into it (ri_PartitionCheckExpr), and keep struct ResultRelInfo clean of partitioning-related stuff (other than ri_PartitionInfo and ri_PartitionRoot); there are plenty of ResultRelInfos that are not partitions, so I think it makes sense to keep the split. I'm thinking that the ChildToRootMap should continue to be in PartitionRoutingInfo. Maybe what we need in order to keep the initialization "lazy enough" is some inline functions that act as getters, initializing members of PartitionRoutingInfo when first needed. (This would probably need boolean flags, to distinguish "hasn't been set up yet" from "it is not needed for this partition" for each member that requires it). BTW it is curious that ExecInitRoutingInfo is called both in ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo for the partition is not found) *and* from ExecFindPartition again, when the ResultRelInfo for the partition *is* found. Doesn't this mean that ri_PartitionInfo is set up twice for the same partition?
On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2020-Oct-16, Amit Langote wrote: > > On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > > And if we removed > > > ri_PartitionInfo->pi_PartitionToRootMap, and always used > > > ri_ChildToRootMap for it. > > > > Done in the attached. > > Hmm... Overall I like the simplification. Thank you for looking it over. > > > Maybe remove PartitionRoutingInfo struct altogether, and just move its > > > fields directly to ResultRelInfo. > > > > If we do that, we'll end up with 3 notations for the same thing across > > releases: In v10 and v11, PartitionRoutingInfos members are saved in > > arrays in ModifyTableState, totally detached from the partition > > ResultRelInfos. In v12 (3f2393edef), we moved them into ResultRelInfo > > but chose to add them into a sub-struct (PartitionRoutingInfo), which > > in retrospect was not a great decision. Now if we pull them into > > ResultRelInfo, we'll have invented the 3rd notation. Maybe that makes > > things hard when back-patching bug-fixes? > > I don't necessarily agree that PartitionRoutingInfo was such a bad idea. > In fact I wonder if we shouldn't move *more* stuff into it > (ri_PartitionCheckExpr), and keep struct ResultRelInfo clean of > partitioning-related stuff (other than ri_PartitionInfo and > ri_PartitionRoot); there are plenty of ResultRelInfos that are not > partitions, so I think it makes sense to keep the split. I'm thinking > that the ChildToRootMap should continue to be in PartitionRoutingInfo. Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing information, because it's primarily meant to be used when inserting *directly* into a partition, although it's true we do initialize it in routing target partitions too in some cases. Also, ChildToRootMap was introduced by the trigger transition table project, not tuple routing. I think we misjudged this when we added PartitionToRootMap to PartitionRoutingInfo, because it doesn't really belong there. This patch fixes that by removing PartitionToRootMap. RootToPartitionMap and the associated partition slot is the only piece of extra information that is needed by tuple routing target relations. > Maybe what we need in order to keep the initialization "lazy enough" is > some inline functions that act as getters, initializing members of > PartitionRoutingInfo when first needed. (This would probably need > boolean flags, to distinguish "hasn't been set up yet" from "it is not > needed for this partition" for each member that requires it). As I said in my previous email, I don't see how we can make initializing the map any lazier than it already is. If a partition has a different tuple descriptor than the root table, then we know for sure that any tuples that are routed to it will need to be converted from the root tuple format to its tuple format, so we might as well build the map when the ResultRelInfo is built. If no tuple lands into a partition, we would neither build its ResultRelInfo nor the map. With the current arrangement, if the map field is NULL, it simply means that the partition has the same tuple format as the root table. > BTW it is curious that ExecInitRoutingInfo is called both in > ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo > for the partition is not found) *and* from ExecFindPartition again, when > the ResultRelInfo for the partition *is* found. Doesn't this mean that > ri_PartitionInfo is set up twice for the same partition? No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for reused update result relations, that too, only the first time a tuple lands into such a partition. For the subsequent tuples that land into the same partition, ExecFindPartition() will be able to find that ResultRelInfo in the proute->partitions[] array. All ResultRelInfos in that array are assumed to have been processed by ExecInitRoutingInfo(). -- Amit Langote EDB: http://www.enterprisedb.com
On 2020-Oct-17, Amit Langote wrote: > Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing > information, because it's primarily meant to be used when inserting > *directly* into a partition, although it's true we do initialize it in > routing target partitions too in some cases. > > Also, ChildToRootMap was introduced by the trigger transition table > project, not tuple routing. I think we misjudged this when we added > PartitionToRootMap to PartitionRoutingInfo, because it doesn't really > belong there. This patch fixes that by removing PartitionToRootMap. > > RootToPartitionMap and the associated partition slot is the only piece > of extra information that is needed by tuple routing target relations. Well, I was thinking on making the ri_PartitionInfo be about partitioning in general, not just specifically for partition tuple routing. Maybe Heikki is right that it may end up being simpler to remove ri_PartitionInfo altogether. It'd just be a couple of additional pointers in ResultRelInfo after all. (Remember that we wanted to get rid of fields specific to only certain kinds of RTEs in RangeTblEntry for example, to keep things cleanly separated, although that project eventually found its demise for other reasons.) > As I said in my previous email, I don't see how we can make > initializing the map any lazier than it already is. If a partition > has a different tuple descriptor than the root table, then we know for > sure that any tuples that are routed to it will need to be converted > from the root tuple format to its tuple format, so we might as well > build the map when the ResultRelInfo is built. If no tuple lands into > a partition, we would neither build its ResultRelInfo nor the map. > With the current arrangement, if the map field is NULL, it simply > means that the partition has the same tuple format as the root table. I see -- makes sense. > On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > BTW it is curious that ExecInitRoutingInfo is called both in > > ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo > > for the partition is not found) *and* from ExecFindPartition again, when > > the ResultRelInfo for the partition *is* found. Doesn't this mean that > > ri_PartitionInfo is set up twice for the same partition? > > No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for > reused update result relations, that too, only the first time a tuple > lands into such a partition. For the subsequent tuples that land into > the same partition, ExecFindPartition() will be able to find that > ResultRelInfo in the proute->partitions[] array. All ResultRelInfos > in that array are assumed to have been processed by > ExecInitRoutingInfo(). Doh, right, sorry, I was misreading the if/else maze there.
On Sun, Oct 18, 2020 at 12:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > On 2020-Oct-17, Amit Langote wrote: > > Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing > > information, because it's primarily meant to be used when inserting > > *directly* into a partition, although it's true we do initialize it in > > routing target partitions too in some cases. > > > > Also, ChildToRootMap was introduced by the trigger transition table > > project, not tuple routing. I think we misjudged this when we added > > PartitionToRootMap to PartitionRoutingInfo, because it doesn't really > > belong there. This patch fixes that by removing PartitionToRootMap. > > > > RootToPartitionMap and the associated partition slot is the only piece > > of extra information that is needed by tuple routing target relations. > > Well, I was thinking on making the ri_PartitionInfo be about > partitioning in general, not just specifically for partition tuple > routing. Maybe Heikki is right that it may end up being simpler to > remove ri_PartitionInfo altogether. It'd just be a couple of additional > pointers in ResultRelInfo after all. So that's 2 votes for removing PartitionRoutingInfo from the tree. Okay, I have tried that in the attached 0002 patch. Also, I fixed some comments in 0001 that still referenced PartitionToRootMap. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 19/10/2020 07:54, Amit Langote wrote: > On Sun, Oct 18, 2020 at 12:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> On 2020-Oct-17, Amit Langote wrote: >>> Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing >>> information, because it's primarily meant to be used when inserting >>> *directly* into a partition, although it's true we do initialize it in >>> routing target partitions too in some cases. >>> >>> Also, ChildToRootMap was introduced by the trigger transition table >>> project, not tuple routing. I think we misjudged this when we added >>> PartitionToRootMap to PartitionRoutingInfo, because it doesn't really >>> belong there. This patch fixes that by removing PartitionToRootMap. >>> >>> RootToPartitionMap and the associated partition slot is the only piece >>> of extra information that is needed by tuple routing target relations. >> >> Well, I was thinking on making the ri_PartitionInfo be about >> partitioning in general, not just specifically for partition tuple >> routing. Maybe Heikki is right that it may end up being simpler to >> remove ri_PartitionInfo altogether. It'd just be a couple of additional >> pointers in ResultRelInfo after all. > > So that's 2 votes for removing PartitionRoutingInfo from the tree. > Okay, I have tried that in the attached 0002 patch. Also, I fixed > some comments in 0001 that still referenced PartitionToRootMap. Pushed, with minor comment changes. I also noticed that the way the getTargetResultRelInfo() helper function was used, was a bit messy. It was used when firing AFTER STATEMENT triggers, but for some reason the code to fire BEFORE STATEMENT triggers didn't use it but duplicated the logic instead. I made that a bit simpler, by always setting the rootResultRelInfo field in ExecInitModifyTable(), making the getTargetResultRelInfo() function unnecessary. Thanks! - Heikki
On 17/10/2020 18:54, Alvaro Herrera wrote: > On 2020-Oct-17, Amit Langote wrote: >> As I said in my previous email, I don't see how we can make >> initializing the map any lazier than it already is. If a partition >> has a different tuple descriptor than the root table, then we know for >> sure that any tuples that are routed to it will need to be converted >> from the root tuple format to its tuple format, so we might as well >> build the map when the ResultRelInfo is built. If no tuple lands into >> a partition, we would neither build its ResultRelInfo nor the map. >> With the current arrangement, if the map field is NULL, it simply >> means that the partition has the same tuple format as the root table. > > I see -- makes sense. It's probably true that there's no performance gain from initializing them more lazily. But the reasoning and logic around the initialization is complicated. After tracing through various path through the code, I'm convinced enough that it's correct, or at least these patches didn't break it, but I still think some sort of lazy initialization on first use would make it more readable. Or perhaps there's some other refactoring we could do. Perhaps we should have a magic TupleConversionMap value to mean "no conversion required". NULL could then mean "not initialized yet". >> On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >>> BTW it is curious that ExecInitRoutingInfo is called both in >>> ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo >>> for the partition is not found) *and* from ExecFindPartition again, when >>> the ResultRelInfo for the partition *is* found. Doesn't this mean that >>> ri_PartitionInfo is set up twice for the same partition? >> >> No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for >> reused update result relations, that too, only the first time a tuple >> lands into such a partition. For the subsequent tuples that land into >> the same partition, ExecFindPartition() will be able to find that >> ResultRelInfo in the proute->partitions[] array. All ResultRelInfos >> in that array are assumed to have been processed by >> ExecInitRoutingInfo(). > > Doh, right, sorry, I was misreading the if/else maze there. I think that demonstrates my point that the logic is hard to follow :-). - Heikki
On Mon, Oct 19, 2020 at 8:48 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 19/10/2020 07:54, Amit Langote wrote: > > On Sun, Oct 18, 2020 at 12:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> Well, I was thinking on making the ri_PartitionInfo be about > >> partitioning in general, not just specifically for partition tuple > >> routing. Maybe Heikki is right that it may end up being simpler to > >> remove ri_PartitionInfo altogether. It'd just be a couple of additional > >> pointers in ResultRelInfo after all. > > > > So that's 2 votes for removing PartitionRoutingInfo from the tree. > > Okay, I have tried that in the attached 0002 patch. Also, I fixed > > some comments in 0001 that still referenced PartitionToRootMap. > > Pushed, with minor comment changes. Thank you. > I also noticed that the way the getTargetResultRelInfo() helper function > was used, was a bit messy. It was used when firing AFTER STATEMENT > triggers, but for some reason the code to fire BEFORE STATEMENT triggers > didn't use it but duplicated the logic instead. I made that a bit > simpler, by always setting the rootResultRelInfo field in > ExecInitModifyTable(), making the getTargetResultRelInfo() function > unnecessary. Good, I was mildly annoyed by that function too. -- Amit Langote EDB: http://www.enterprisedb.com
On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 17/10/2020 18:54, Alvaro Herrera wrote: > > On 2020-Oct-17, Amit Langote wrote: > >> As I said in my previous email, I don't see how we can make > >> initializing the map any lazier than it already is. If a partition > >> has a different tuple descriptor than the root table, then we know for > >> sure that any tuples that are routed to it will need to be converted > >> from the root tuple format to its tuple format, so we might as well > >> build the map when the ResultRelInfo is built. If no tuple lands into > >> a partition, we would neither build its ResultRelInfo nor the map. > >> With the current arrangement, if the map field is NULL, it simply > >> means that the partition has the same tuple format as the root table. > > > > I see -- makes sense. > > It's probably true that there's no performance gain from initializing > them more lazily. But the reasoning and logic around the initialization > is complicated. After tracing through various path through the code, I'm > convinced enough that it's correct, or at least these patches didn't > break it, but I still think some sort of lazy initialization on first > use would make it more readable. Or perhaps there's some other > refactoring we could do. So the other patch I have mentioned is about lazy initialization of the ResultRelInfo itself, not the individual fields, but maybe with enough refactoring we can get the latter too. Currently, ExecInitModifyTable() performs ExecInitResultRelation() for all relations in ModifyTable.resultRelations, which sets most but not all ResultRelInfo fields (whatever InitResultRelInfo() can set), followed by initializing some other fields based on the contents of the ModifyTable plan. My patch moves those two steps into a function ExecBuildResultRelation() which is called lazily during ExecModifyTable() for a given result relation on the first tuple produced by that relation's plan. Actually, there's a "getter" named ExecGetResultRelation() which first consults es_result_relations[rti - 1] for the requested relation and if it's NULL then calls ExecBuildResultRelation(). Would you mind taking a look at that as a starting point? I am thinking there's enough relevant discussion here that I should post the rebased version of that patch here. > Perhaps we should have a magic TupleConversionMap value to mean "no > conversion required". NULL could then mean "not initialized yet". Perhaps, a TupleConversionMap with its attrMap set to NULL means "no conversion required". -- Amit Langote EDB: http://www.enterprisedb.com
On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > It's probably true that there's no performance gain from initializing > > them more lazily. But the reasoning and logic around the initialization > > is complicated. After tracing through various path through the code, I'm > > convinced enough that it's correct, or at least these patches didn't > > break it, but I still think some sort of lazy initialization on first > > use would make it more readable. Or perhaps there's some other > > refactoring we could do. > > So the other patch I have mentioned is about lazy initialization of > the ResultRelInfo itself, not the individual fields, but maybe with > enough refactoring we can get the latter too. So, I tried implementing a lazy-initialization-on-first-access approach for both the ResultRelInfos themselves and some of the individual fields of ResultRelInfo that don't need to be set right away. You can see the end result in the attached 0003 patch. This slims down ExecInitModifyTable() significantly, both in terms of code footprint and the amount of work that it does. 0001 fixes a thinko of the recent commit 1375422c782 that I discovered when debugging a problem with 0003. 0002 is for something I have mentioned upthread. ForeignScanState.resultRelInfo cannot be set in ExecInit* stage as it's done now, because with 0003, child ResultRelInfos will not have been added to es_result_relations during that stage. -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On 2020-Oct-22, Amit Langote wrote: > 0001 fixes a thinko of the recent commit 1375422c782 that I discovered > when debugging a problem with 0003. Hmm, how hard is it to produce a test case that fails because of this problem?
On Thu, Oct 22, 2020 at 11:25 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2020-Oct-22, Amit Langote wrote: > > > 0001 fixes a thinko of the recent commit 1375422c782 that I discovered > > when debugging a problem with 0003. > > Hmm, how hard is it to produce a test case that fails because of this > problem? I checked and don't think there's any live bug here. You will notice if you take a look at 1375422c7 that we've made es_result_relations an array of pointers while the individual ModifyTableState nodes own the actual ResultRelInfos. So, EvalPlanQualStart() setting the parent EState's es_result_relations array to NULL implies that those pointers become inaccessible to the parent query's execution after EvalPlanQual() returns. However, nothing in the tree today accesses ResulRelInfos through es_result_relations array, except during ExecInit* stage (see ExecInitForeignScan()) but it would still be intact at that stage. With the lazy-initialization patch though, we do check es_result_relations when trying to open a result relation to see if it has already been initialized (a non-NULL pointer in that array means yes), so resetting it in the middle of the execution can't be safe. For one example, we will end up initializing the same relation many times after not finding it in es_result_relations and also add it *duplicatively* to es_opened_result_relations list, breaking the invariant that that list contains distinct relations. -- Amit Langote EDB: http://www.enterprisedb.com
On 23/10/2020 05:56, Amit Langote wrote: > On Thu, Oct 22, 2020 at 11:25 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> >> On 2020-Oct-22, Amit Langote wrote: >> >>> 0001 fixes a thinko of the recent commit 1375422c782 that I discovered >>> when debugging a problem with 0003. >> >> Hmm, how hard is it to produce a test case that fails because of this >> problem? > > I checked and don't think there's any live bug here. You will notice > if you take a look at 1375422c7 that we've made es_result_relations an > array of pointers while the individual ModifyTableState nodes own the > actual ResultRelInfos. So, EvalPlanQualStart() setting the parent > EState's es_result_relations array to NULL implies that those pointers > become inaccessible to the parent query's execution after > EvalPlanQual() returns. However, nothing in the tree today accesses > ResulRelInfos through es_result_relations array, except during > ExecInit* stage (see ExecInitForeignScan()) but it would still be > intact at that stage. > > With the lazy-initialization patch though, we do check > es_result_relations when trying to open a result relation to see if it > has already been initialized (a non-NULL pointer in that array means > yes), so resetting it in the middle of the execution can't be safe. > For one example, we will end up initializing the same relation many > times after not finding it in es_result_relations and also add it > *duplicatively* to es_opened_result_relations list, breaking the > invariant that that list contains distinct relations. Pushed that thinko-fix, thanks! - Heikki
On 22/10/2020 16:49, Amit Langote wrote: > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote: >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> It's probably true that there's no performance gain from initializing >>> them more lazily. But the reasoning and logic around the initialization >>> is complicated. After tracing through various path through the code, I'm >>> convinced enough that it's correct, or at least these patches didn't >>> break it, but I still think some sort of lazy initialization on first >>> use would make it more readable. Or perhaps there's some other >>> refactoring we could do. >> >> So the other patch I have mentioned is about lazy initialization of >> the ResultRelInfo itself, not the individual fields, but maybe with >> enough refactoring we can get the latter too. > > So, I tried implementing a lazy-initialization-on-first-access > approach for both the ResultRelInfos themselves and some of the > individual fields of ResultRelInfo that don't need to be set right > away. You can see the end result in the attached 0003 patch. This > slims down ExecInitModifyTable() significantly, both in terms of code > footprint and the amount of work that it does. Have you done any performance testing? I'd like to know how much of a difference this makes in practice. Another alternative is to continue to create the ResultRelInfos in ExecInitModify(), but initialize the individual fields in them lazily. Does this patch become moot if we do the "Overhaul UPDATE/DELETE processing"? (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com)? - Heikki
On Fri, Oct 23, 2020 at 4:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 22/10/2020 16:49, Amit Langote wrote: > > On Tue, Oct 20, 2020 at 9:57 PM Amit Langote <amitlangote09@gmail.com> wrote: > >> On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >>> It's probably true that there's no performance gain from initializing > >>> them more lazily. But the reasoning and logic around the initialization > >>> is complicated. After tracing through various path through the code, I'm > >>> convinced enough that it's correct, or at least these patches didn't > >>> break it, but I still think some sort of lazy initialization on first > >>> use would make it more readable. Or perhaps there's some other > >>> refactoring we could do. > >> > >> So the other patch I have mentioned is about lazy initialization of > >> the ResultRelInfo itself, not the individual fields, but maybe with > >> enough refactoring we can get the latter too. > > > > So, I tried implementing a lazy-initialization-on-first-access > > approach for both the ResultRelInfos themselves and some of the > > individual fields of ResultRelInfo that don't need to be set right > > away. You can see the end result in the attached 0003 patch. This > > slims down ExecInitModifyTable() significantly, both in terms of code > > footprint and the amount of work that it does. > > Have you done any performance testing? I'd like to know how much of a > difference this makes in practice. I have shown some numbers here: https://www.postgresql.org/message-id/CA+HiwqG7ZruBmmih3wPsBZ4s0H2EhywrnXEduckY5Hr3fWzPWA@mail.gmail.com To reiterate, if you apply the following patch: > Does this patch become moot if we do the "Overhaul UPDATE/DELETE > processing"? > (https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com)? ...and run this benchmark with plan_cache_mode = force_generic_plan pgbench -i -s 10 --partitions={0, 10, 100, 1000} pgbench -T120 -f test.sql -M prepared test.sql: \set aid random(1, 1000000) update pgbench_accounts set abalance = abalance + 1 where aid = :aid; you may see roughly the following results: HEAD: 0 tps = 13045.485121 (excluding connections establishing) 10 tps = 9358.157433 (excluding connections establishing) 100 tps = 1878.274500 (excluding connections establishing) 1000 tps = 84.684695 (excluding connections establishing) Patched (overhaul update/delete processing): 0 tps = 12743.487196 (excluding connections establishing) 10 tps = 12644.240748 (excluding connections establishing) 100 tps = 4158.123345 (excluding connections establishing) 1000 tps = 391.248067 (excluding connections establishing) And if you apply the patch being discussed here, TPS shoots up a bit, especially for higher partition counts: Patched (lazy-ResultRelInfo-initialization) 0 tps = 13419.283168 (excluding connections establishing) 10 tps = 12588.016095 (excluding connections establishing) 100 tps = 8560.824225 (excluding connections establishing) 1000 tps = 1926.553901 (excluding connections establishing) To explain these numbers a bit, "overheaul update/delete processing" patch improves the performance of that benchmark by allowing the updates to use run-time pruning when executing generic plans, which they can't today. However without "lazy-ResultRelInfo-initialization" patch, ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can be seen to be spending time initializing all of those result relations, whereas only one of those will actually be used. As mentioned further in that email, it's really the locking of all relations by AcquireExecutorLocks() that occurs even before we enter the executor that's a much thornier bottleneck for this benchmark. But the ResultRelInfo initialization bottleneck sounded like it could get alleviated in a relatively straightforward manner. The patches that were developed for attacking the locking bottleneck would require further reflection on whether they are correct. (Note: I've just copy pasted the numbers I reported in that email. To reproduce, I'll have to rebase the "overhaul update/delete processing" patch on this one, which I haven't yet done.) > Another alternative is to continue to create the ResultRelInfos in > ExecInitModify(), but initialize the individual fields in them lazily. If you consider the above, maybe you can see how that will not really eliminate the bottleneck I'm aiming to fix here. -- Amit Langote EDB: http://www.enterprisedb.com
On 23/10/2020 12:37, Amit Langote wrote: > To explain these numbers a bit, "overheaul update/delete processing" > patch improves the performance of that benchmark by allowing the > updates to use run-time pruning when executing generic plans, which > they can't today. > > However without "lazy-ResultRelInfo-initialization" patch, > ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can > be seen to be spending time initializing all of those result > relations, whereas only one of those will actually be used. > > As mentioned further in that email, it's really the locking of all > relations by AcquireExecutorLocks() that occurs even before we enter > the executor that's a much thornier bottleneck for this benchmark. > But the ResultRelInfo initialization bottleneck sounded like it could > get alleviated in a relatively straightforward manner. The patches > that were developed for attacking the locking bottleneck would require > further reflection on whether they are correct. > > (Note: I've just copy pasted the numbers I reported in that email. To > reproduce, I'll have to rebase the "overhaul update/delete processing" > patch on this one, which I haven't yet done.) Ok, thanks for the explanation, now I understand. This patch looks reasonable to me at a quick glance. I'm a bit worried or unhappy about the impact on FDWs, though. It doesn't seem nice that the ResultRelInfo is not available in the BeginDirectModify call. It's not too bad, the FDW can call ExecGetResultRelation() if it needs it, but still. Perhaps it would be better to delay calling BeginDirectModify() until the first modification is performed, to avoid any initialization overhead there, like establishing the connection in postgres_fdw. But since this applies on top of the "overhaul update/delete processing" patch, let's tackle that patch set next. Could you rebase that, please? - Heikki
On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 23/10/2020 12:37, Amit Langote wrote: > > To explain these numbers a bit, "overheaul update/delete processing" > > patch improves the performance of that benchmark by allowing the > > updates to use run-time pruning when executing generic plans, which > > they can't today. > > > > However without "lazy-ResultRelInfo-initialization" patch, > > ExecInitModifyTable() (or InitPlan() when I ran those benchmarks) can > > be seen to be spending time initializing all of those result > > relations, whereas only one of those will actually be used. > > > > As mentioned further in that email, it's really the locking of all > > relations by AcquireExecutorLocks() that occurs even before we enter > > the executor that's a much thornier bottleneck for this benchmark. > > But the ResultRelInfo initialization bottleneck sounded like it could > > get alleviated in a relatively straightforward manner. The patches > > that were developed for attacking the locking bottleneck would require > > further reflection on whether they are correct. > > > > (Note: I've just copy pasted the numbers I reported in that email. To > > reproduce, I'll have to rebase the "overhaul update/delete processing" > > patch on this one, which I haven't yet done.) > > Ok, thanks for the explanation, now I understand. > > But since this applies on top of the "overhaul update/delete processing" > patch, let's tackle that patch set next. Could you rebase that, please? Actually, I made lazy-ResultRelInfo-initialization apply on HEAD directly at one point because of its separate CF entry, that is, to appease the CF app's automatic patch tester that wouldn't know to apply the other patch first. Because both of these patch sets want to change thow ModifyTable works, there are conflicts. The "overhaul update/delete processing" patch is somewhat complex and I expect some amount of back and forth on its design points. OTOH, the lazy-ResultRelInfo-initialization patch is straightforward enough that I hoped it would be easier to bring it into a committable state than the other. But I can see why one may find it hard to justify committing the latter without the former already in, because the bottleneck it purports to alleviate (that of eager ResultRelInfo initialization) is not apparent until update/delete can use run-time pruning. Anyway, I will post the rebased patch on the "overhaul update/delete processing" thread. -- Amit Langote EDB: http://www.enterprisedb.com
On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > This patch looks reasonable to me at a quick glance. I'm a bit worried > or unhappy about the impact on FDWs, though. It doesn't seem nice that > the ResultRelInfo is not available in the BeginDirectModify call. It's > not too bad, the FDW can call ExecGetResultRelation() if it needs it, > but still. Perhaps it would be better to delay calling > BeginDirectModify() until the first modification is performed, to avoid > any initialization overhead there, like establishing the connection in > postgres_fdw. Ah, calling BeginDirectModify() itself lazily sounds like a good idea; see attached updated 0001 to see how that looks. While updating that patch, I realized that the ForeignScan.resultRelation that we introduced in 178f2d560d will now be totally useless. :-( -- Amit Langote EDB: http://www.enterprisedb.com
Вложения
On Wed, Oct 28, 2020 at 12:02 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > But since this applies on top of the "overhaul update/delete processing" > > patch, let's tackle that patch set next. Could you rebase that, please? > > > Anyway, I will post the rebased patch on the "overhaul update/delete > processing" thread. Done. -- Amit Langote EDB: http://www.enterprisedb.com
On Wed, Oct 28, 2020 at 4:46 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > This patch looks reasonable to me at a quick glance. I'm a bit worried > > or unhappy about the impact on FDWs, though. It doesn't seem nice that > > the ResultRelInfo is not available in the BeginDirectModify call. It's > > not too bad, the FDW can call ExecGetResultRelation() if it needs it, > > but still. Perhaps it would be better to delay calling > > BeginDirectModify() until the first modification is performed, to avoid > > any initialization overhead there, like establishing the connection in > > postgres_fdw. > > Ah, calling BeginDirectModify() itself lazily sounds like a good idea; > see attached updated 0001 to see how that looks. While updating that > patch, I realized that the ForeignScan.resultRelation that we > introduced in 178f2d560d will now be totally useless. :-( Given that we've closed the CF entry for this thread and given that there seems to be enough context to these patches, I will move these patches back to their original thread, that is: * ModifyTable overheads in generic plans * https://www.postgresql.org/message-id/flat/CA%2BHiwqE4k1Q2TLmCAvekw%2B8_NXepbnfUOamOeX%3DKpHRDTfSKxA%40mail.gmail.com That will also make the CF-bot happy. -- Amit Langote EDB: http://www.enterprisedb.com