Обсуждение: MERGE SQL statement for PG12

Поиск
Список
Период
Сортировка

MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:
Hello,

I would like to resubmit the MERGE patch for PG12. The past discussions about the patch can be found here [1] [2].

The patch is rebased on the current master. But otherwise I haven't done any further work on it since it was punted from PG11. Couple of hackers had expressed desire to review the patch much more carefully and possibly also help in reworking some bits of it. So the idea is to get those reviews going. Once that happens, I can address the review comments in a more meaningful way.

Thanks,
Pavan


 
--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Andrew Dunstan
Дата:

On 06/19/2018 07:06 AM, Pavan Deolasee wrote:
> Hello,
>
> I would like to resubmit the MERGE patch for PG12. The past 
> discussions about the patch can be found here [1] [2].
>
> The patch is rebased on the current master. But otherwise I haven't 
> done any further work on it since it was punted from PG11. Couple of 
> hackers had expressed desire to review the patch much more carefully 
> and possibly also help in reworking some bits of it. So the idea is to 
> get those reviews going. Once that happens, I can address the review 
> comments in a more meaningful way.
>
> Thanks,
> Pavan
>
> [1] 
> https://www.postgresql.org/message-id/CANP8%2BjKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc%2BXrz8QB0nXA%40mail.gmail.com
>
> [2] 
> https://www.postgresql.org/message-id/CAH2-WzkJdBuxj9PO%3D2QaO9-3h3xGbQPZ34kJH%3DHukRekwM-GZg%40mail.gmail.com
>
> -- 
>  Pavan Deolasee http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services


It's already in the commitfest, although I think it's almost certain to 
be pushed out to the September CF. I'll add this email to the existing item.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:


On Tue, Jun 19, 2018 at 4:41 PM, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:


It's already in the commitfest, although I think it's almost certain to be pushed out to the September CF. I'll add this email to the existing item.


Thanks Andrew; I was gonna do that once the email gets archives but thanks for taking care of it. I changed the status to Needs Review, though I understand this patch might be pushed to Sep CF. There were suggestions in the last release cycle that this feature should get early into PG12, if at all, given the complexity of the patch. So I thought it makes sense to submit it early.
 
Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: MERGE SQL statement for PG12

От
Peter Geoghegan
Дата:
On Tue, Jun 19, 2018 at 4:06 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> I would like to resubmit the MERGE patch for PG12. The past discussions
> about the patch can be found here [1] [2].

FWIW, I'm really glad that you're going to work on this for v12.

-- 
Peter Geoghegan


Re: MERGE SQL statement for PG12

От
Alvaro Herrera
Дата:
On 2018-Jun-19, Pavan Deolasee wrote:

> Hello,
> 
> I would like to resubmit the MERGE patch for PG12. The past discussions
> about the patch can be found here [1] [2].

Hello.  A very minor thing, please see commit 15a8f8caad14 and the
discussion that led to it.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:

I've rebased the patch against the current master. The patch hasn't changed much since the last time.

I hope that the patch gets some review this CF so that we're left with enough time to address those reviews and get the patch committed early in the cycle. Copying Tom and Andres since they had expressed willingness to review the patch in detail when the new development cycle opens. But fresh reviews from Peter G, others is welcome too.

On Mon, Jun 25, 2018 at 11:13 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hello.  A very minor thing, please see commit 15a8f8caad14 and the
discussion that led to it.


Alvaro, thanks for the reminder. I've fixed that in the rebased patch. While working on it, I thought we should just consolidate these different counters in an array. Something like this:

+typedef enum TupleCounterType
+{
+ TUPLE_COUNTER_PROCESSED = 0,
+ TUPLE_COUNTER_CONFLICTING,
+ TUPLE_COUNTER_IOS_HEAP_FETCHES,
+ TUPLE_COUNTER_MERGE_INSERTED,
+ TUPLE_COUNTER_MERGE_UPDATED,
+ TUPLE_COUNTER_MERGE_DELETED,
+ TUPLE_COUNTER_FILTERED_BY_JOIN_QUALS,
+ TUPLE_COUNTER_FILTERED_BY_OTHER_QUALS,
+ TUPLE_COUNTER_MAX_COUNT /* should be last */
+} TupleCounterType;
+
 typedef struct Instrumentation
 {
  /* Parameters set at node creation: */
@@ -56,14 +69,9 @@ typedef struct Instrumentation
  /* Accumulated statistics across all completed cycles: */
  double startup; /* Total startup time (in seconds) */
  double total; /* Total total time (in seconds) */
- double ntuples; /* Total tuples produced */
- /* Additional node-specific tuple counters */
- double node_ntuples1;
- double node_ntuples2;
- double node_ntuples3;
+ /* Tuple counters */
+ double ntuples[TUPLE_COUNTER_MAX_COUNT];
  double nloops; /* # of run cycles for this node */
- double nfiltered1; /* # tuples removed by scanqual or joinqual */
- double nfiltered2; /* # tuples removed by "other" quals */
  BufferUsage bufusage; /* Total buffer usage */
 } Instrumentation;
 
And then have matching macros to get/set/manage those counters per type. Do you see a value in doing so?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Jaime Casanova
Дата:
On Tue, 4 Sep 2018 at 00:01, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> I've rebased the patch against the current master. The patch hasn't changed much since the last time.
>

Hi Pavan,

I had this crash when running sqlsmith against the previous version of
this patch and just confirmed it still crash with this version (which
makes sense because you said patch hasn't changed much)

To reproduce run this query against regression database:

"""
MERGE INTO public.pagg_tab_ml_p3 as target_0
USING public.prt2_l_p3_p2 as ref_0 ON target_0.a = ref_0.a
WHEN MATCHED AND cast(null as tid) <= cast(null as tid) THEN DELETE;
"""

attached is a backtrace

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:


On Tue, Sep 4, 2018 at 11:51 AM Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote:
On Tue, 4 Sep 2018 at 00:01, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> I've rebased the patch against the current master. The patch hasn't changed much since the last time.
>

Hi Pavan,

I had this crash when running sqlsmith against the previous version of
this patch and just confirmed it still crash with this version (which
makes sense because you said patch hasn't changed much)

Hi Jaime,

Thanks for taking efforts to run sqlsmith. I've fixed the problem in the attached patch. Please confirm and let me know if sqlsmith throws more errors with the revised version.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:

A new version rebased against the current master is attached.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Jaime Casanova
Дата:
On Mon, 24 Sep 2018 at 05:15, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> A new version rebased against the current master is attached.
>

Hi Pavan,

A day after you posted this patch commit
29c94e03c7d05d2b29afa1de32795ce178531246 removed ExecStoreTuple.
I'm right in believe that the change in
src/backend/executor/execMerge.c should be for ExecStoreHeapTuples?

-       ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
+       ExecStoreHeapTuple(&tuple, mtstate->mt_existing, false);


-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:


On Sun, Sep 30, 2018 at 2:55 AM Jaime Casanova <jaime.casanova@2ndquadrant.com> wrote:
On Mon, 24 Sep 2018 at 05:15, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> A new version rebased against the current master is attached.
>

Hi Pavan,

A day after you posted this patch commit
29c94e03c7d05d2b29afa1de32795ce178531246 removed ExecStoreTuple.
I'm right in believe that the change in
src/backend/executor/execMerge.c should be for ExecStoreHeapTuples?

-       ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);
+       ExecStoreHeapTuple(&tuple, mtstate->mt_existing, false);


Hi Jaime,

Thanks for keeping an eye on the patch. I've rebased the patch against the current master. A new version is attached.

Thanks,
Pavan 

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Tomas Vondra
Дата:
Hi Pavan,

On 10/29/2018 10:23 AM, Pavan Deolasee wrote:
> 
> ...
> 
> Thanks for keeping an eye on the patch. I've rebased the patch
> against the current master. A new version is attached.
> 
> Thanks,
> Pavan
> 

It's good to see the patch moving forward. What are your thoughts 
regarding things that need to be improved/fixed to make it committable?

I briefly discussed the patch with a couple of people at pgconf.eu last 
week, and IIRC the main concern was how the merge is represented in 
parser/executor.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:

Hi Tomas,

Sorry for a delayed response. 

On Mon, Oct 29, 2018 at 4:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi Pavan,

On 10/29/2018 10:23 AM, Pavan Deolasee wrote:
>
> ...
>
> Thanks for keeping an eye on the patch. I've rebased the patch
> against the current master. A new version is attached.
>
> Thanks,
> Pavan
>

It's good to see the patch moving forward. What are your thoughts
regarding things that need to be improved/fixed to make it committable?

I briefly discussed the patch with a couple of people at pgconf.eu last
week, and IIRC the main concern was how the merge is represented in
parser/executor.


Yes, Andres and to some extent Peter G had expressed concerns about that. Andres suggested that we should rework the parser and executor side. Here are some excerpts from his review comments.

<review>
"I don't think the parser / executor implementation
of MERGE is architecturally sound.  I think creating hidden joins during
parse-analysis to implement MERGE is a seriously bad idea and it needs
to be replaced by a different executor structure."

+    * As top-level statements INSERT, UPDATE and DELETE have a Query, whereas
+    * with MERGE the individual actions do not require separate planning,
+    * only different handling in the executor. See nodeModifyTable handling
+    * of commandType CMD_MERGE.
+    *
+    * A sub-query can include the Target, but otherwise the sub-query cannot
+    * reference the outermost Target table at all.
+    */
+   qry->querySource = QSRC_PARSER;

Why is this, and not building a proper executor node for merge that
knows how to get the tuples, the right approach?  We did a rough
equivalent for matview updates, and I think it turned out to be a pretty
bad plan.
</review>

<review>
On Fri, Apr 6, 2018 at 1:30 AM, Andres Freund <andres@anarazel.de> wrote:


My impression is that this simply shouldn't be going through
nodeModifyTuple, but be it's own nodeMerge node. The trigger handling
would need to be abstraced into execTrigger.c or such to avoid
duplication.  We're now going from nodeModifyTable.c:ExecModifyTable()
into execMerge.c:ExecMerge(), back to
nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
things that aren't appropriate for merge, we then pass an actionState,
which neuters part of ExecUpdate/Insert(). This is just bad.

</review> 

To be honest, I need more directions on how to address these major architectural concerns. I'd tried to rework the executor part and had commented on that. But I know that was probably done in a hurry since we were trying to save a revert. Having said that, I am still not very sure how exactly the executor side should look like without causing too much duplication of handling nodeModifyTable() and proposed nodeMerge(). If Andres can give me further inputs, I will be more than happy to figure out the details and improve the patch.

As far as the parser side goes, do I understand correctly that instead of building a special join in transformMergeStmt() as the patch does today, we should follow what transformUpdateStmt() does for a potential join? i.e. just have a single RTE for the source relation and present it as a left hand side for the implicit join? If I get a little more direction on this, I can try to address the issues.

It looks very likely that the patch won't get much review in the current state. But if I get inputs, I can work out the details and try to get the patch in committable state.

BTW I am aware that the patch is bit-rotten because of the partitioning related changes. I will address those and post a revised patch soon.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: MERGE SQL statement for PG12

От
Tomas Vondra
Дата:
On 11/22/18 7:44 AM, Pavan Deolasee wrote:
> 
> Hi Tomas,
> 
> Sorry for a delayed response.
> 
> On Mon, Oct 29, 2018 at 4:59 PM Tomas Vondra 
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     Hi Pavan,
> 
>     On 10/29/2018 10:23 AM, Pavan Deolasee wrote:
>      >
>      > ...
>      >
>      > Thanks for keeping an eye on the patch. I've rebased the patch
>      > against the current master. A new version is attached.
>      >
>      > Thanks,
>      > Pavan
>      >
> 
>     It's good to see the patch moving forward. What are your thoughts
>     regarding things that need to be improved/fixed to make it committable?
> 
>     I briefly discussed the patch with a couple of people at pgconf.eu
>     <http://pgconf.eu> last
>     week, and IIRC the main concern was how the merge is represented in
>     parser/executor.
> 
> 
> Yes, Andres and to some extent Peter G had expressed concerns about 
> that. Andres suggested that we should rework the parser and executor 
> side. Here are some excerpts from his review comments.
> 
> <review>
> "I don't think the parser / executor implementation
> ofMERGEis architecturally sound.  I think creating hidden joins during
> parse-analysis to implementMERGEis a seriously bad idea and it needs
> to be replaced by a different executor structure."
> 
> +    * As top-level statements INSERT, UPDATE and DELETE have a Query, 
> whereas
> +    * with MERGE the individual actions do not require separate planning,
> +    * only different handling in the executor. See nodeModifyTable handling
> +    * of commandType CMD_MERGE.
> +    *
> +    * A sub-query can include the Target, but otherwise the sub-query 
> cannot
> +    * reference the outermost Target table at all.
> +    */
> +   qry->querySource = QSRC_PARSER;
> 
> Why is this, and not building a proper executor node for merge that
> knows how to get the tuples, the right approach?  We did a rough
> equivalent for matview updates, and I think it turned out to be a pretty
> bad plan.
> </review>
> 
> <review>
> On Fri, Apr 6, 2018 at 1:30 AM, Andres Freund<andres@anarazel.de 
> <mailto:andres@anarazel.de>>wrote:
> 
> 
> 
>     My impression is that this simply shouldn't be going through
>     nodeModifyTuple, but be it's own nodeMerge node. The trigger handling
>     would need to be abstraced into execTrigger.c or such to avoid
>     duplication.  We're now going from nodeModifyTable.c:ExecModifyTable()
>     into execMerge.c:ExecMerge(), back to
>     nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
>     things that aren't appropriate formerge, we then pass an actionState,
>     which neuters part of ExecUpdate/Insert(). This is just bad.
> 
> 
> </review>
> 
> To be honest, I need more directions on how to address these major 
> architectural concerns. I'd tried to rework the executor part and had 
> commented on that. But I know that was probably done in a hurry since we 
> were trying to save a revert. Having said that, I am still not very sure 
> how exactly the executor side should look like without causing too much 
> duplication of handling nodeModifyTable() and proposed nodeMerge(). If 
> Andres can give me further inputs, I will be more than happy to figure 
> out the details and improve the patch.
> 

I think not going through nodeModifyTable and having a separate 
nodeMerge.c makes sense. It might result in some code being duplicated, 
but I suppose that code can be shared (wrapped as a function, moved to 
some file shared by the two nodes). I wouldn't expect the result to be 
particularly ugly, at least not compared to how nodeModifyTable is 
twisted with the current patch.

> As far as the parser side goes, do I understand correctly that instead 
> of building a special join in transformMergeStmt() as the patch does 
> today, we should follow what transformUpdateStmt() does for a potential 
> join? i.e. just have a single RTE for the source relation and present it 
> as a left hand side for the implicit join? If I get a little more 
> direction on this, I can try to address the issues.
> 

Not sure.

> It looks very likely that the patch won't get much review in the current 
> state. But if I get inputs, I can work out the details and try to get 
> the patch in committable state.
> 
> BTW I am aware that the patch is bit-rotten because of the partitioning 
> related changes. I will address those and post a revised patch soon.
> 

thanks

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:

Hi Tomas,

On Thu, Nov 22, 2018 at 10:03 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I think not going through nodeModifyTable and having a separate
nodeMerge.c makes sense. It might result in some code being duplicated,
but I suppose that code can be shared (wrapped as a function, moved to
some file shared by the two nodes). I wouldn't expect the result to be
particularly ugly, at least not compared to how nodeModifyTable is
twisted with the current patch.

Ok. I will try that approach again. In the meanwhile, I am posting a rebased version. There had been quite a lot changes on partitioning side and that caused non-trivial conflicts. I noticed a couple of problems during the rebase, but I haven't attempted to address them fully yet, since I think a detailed review at some point might require us to change that anyways.

- Noticed that partColsUpdated is not set correctly in case of MERGE since we never get to call expand_partitioned_rtentry() for the partition table in case of MERGE. This right now does not cause significant problem, since we initialise the required states by other means. But we should fix this.

- I am not entirely sure if the tuple-conversion business is bug-free for MERGE, post this rebase. Since this code has changed quite a bit in the master,  I will have another look and check. The tests do not show any issues, but that could also be because of lack of tests in this area with respect to MERGE.

- I am not sure if I adopted the slot related changes introduced by 1a0586de3657cd35581f0639c87d5050c6197bb7.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Amit Langote
Дата:
Hi Pavan,

Thanks for continuing to work on this.

On 2018/11/27 20:18, Pavan Deolasee wrote:
> Ok. I will try that approach again. In the meanwhile, I am posting a
> rebased version. There had been quite a lot changes on partitioning side
> and that caused non-trivial conflicts. I noticed a couple of problems
> during the rebase, but I haven't attempted to address them fully yet, since
> I think a detailed review at some point might require us to change that
> anyways.
> 
> - Noticed that partColsUpdated is not set correctly in case of MERGE since
> we never get to call expand_partitioned_rtentry() for the partition table
> in case of MERGE. This right now does not cause significant problem, since
> we initialise the required states by other means. But we should fix this.

Regarding this, you may want to take a look at the following patch that
refactors things in this area.

https://commitfest.postgresql.org/20/1778/

Among other changes (such as completely redesigned inheritance_planner),
expand_partitioned_rtentry is now called after entering make_one_rel()
with the patch, which is much later than currently.  That allows us to
initialize RTEs and RelOptInfos for only the partitions that are left
after pruning.  As of now, it's called at a point in subquery_planner
where it's too soon to prune, so  expand_partitioned_rtentry ends up
building RTEs for *all* partitions.  I think that is something we're going
to have change in the long run anyway, so perhaps something you should
consider when designing related parts of MERGE.  I will try to look at
that portion of your patch maybe next month.

Thanks,
Amit



Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:


On Tue, Nov 27, 2018 at 4:48 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

 In the meanwhile, I am posting a rebased version. 

Another rebase on the current master.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: MERGE SQL statement for PG12

От
Mi Tar
Дата:
Hi!

Looking at the commitfest as a novice contributor I was searching for patches to review without any reviewers set. And
becauseI just spend some time and made a patch improving how REFRESH MATERIALIZED VIEW CONCURRENTLY works (does
INSERTs/UPDATEs/DELETEsinstead of just DELETEs/INSERTs) when I saw this patch I said to myself, great, MERGE is exactly
whatwould be needed there. Because we already have a merge implementation there (requiring unique columns). I didn't
knowthat I will discover such a long and beautiful thread.
 

So I will just add my 2c based on experience from REFRESH MATERIALIZED VIEW CONCURRENTLY work. I think that we would
needan additional statement-level trigger for MERGE, instead of it being exposed as INSERT, UPDATE, and DELETE
triggers.Because it is really tricky to make triggers work if you want to know how exactly the table as changed through
MERGEif this is split into three separate triggers and transient relations. If we do not have a new statement-level
triggerfor MERGE, then this is really just a syntactic sugar on top of INSERTs, UPDATEs, and DELETEs.
 


Mitar

Re: MERGE SQL statement for PG12

От
Robert Haas
Дата:
On Thu, Jan 3, 2019 at 2:11 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> On Tue, Nov 27, 2018 at 4:48 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>  In the meanwhile, I am posting a rebased version.
>
> Another rebase on the current master.

I feel like there has been some other thread where this was discussed,
but I can't find it right now.  I think that the "query construction"
logic in transformMergeStmt is fundamentally the wrong way to do this.
I think several others have said the same.  And I don't think this
should be considered for commit until that is rewritten.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: MERGE SQL statement for PG12

От
Peter Geoghegan
Дата:
On Thu, Jan 10, 2019 at 1:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I feel like there has been some other thread where this was discussed,
> but I can't find it right now.  I think that the "query construction"
> logic in transformMergeStmt is fundamentally the wrong way to do this.
> I think several others have said the same.  And I don't think this
> should be considered for commit until that is rewritten.

I agree with that.

I think that it's worth acknowledging that Pavan is in a difficult
position with this patch. As things stand, Pavan really needs input
from a couple of people to put the query construction stuff on the
right path, and that input has not been forthcoming. I'm not
suggesting that anybody has failed to meet an obligation to Pavan to
put time in here, or that anybody has suggested that this is down to a
failure on Pavan's part. I'm merely pointing out that Pavan is in an
unenviable position with this patch, through no fault of his own, and
despite a worthy effort.

I hope that he sticks it out, because that seems to be what it takes
to see something like this through. I don't know how to do better at
that.

-- 
Peter Geoghegan


Re: MERGE SQL statement for PG12

От
Pavan Deolasee
Дата:
Hi Robert,

Thanks for the comments.

On Fri, Jan 11, 2019 at 2:45 AM Robert Haas <robertmhaas@gmail.com> wrote:

I feel like there has been some other thread where this was discussed,
but I can't find it right now.  I think that the "query construction"
logic in transformMergeStmt is fundamentally the wrong way to do this.
I think several others have said the same. 

Can you please help me understand what's fundamentally wrong with the approach and more importantly, can you please explain what would the the architecturally sound way to do this? The same also applies to the executor side where the current approach is deemed wrong, but very little is said on what's the correct way. 
 
And I don't think this
should be considered for commit until that is rewritten.
 
I understand. I'm not suggesting that the patch is in committable shape, if it wasn't last April, because nothing has changed since then. The purpose of keeping it up-to-date is to solicit feedback and directions and to show that my interest in the patch is still intact.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: MERGE SQL statement for PG12

От
Tomas Vondra
Дата:

On 1/11/19 12:26 AM, Peter Geoghegan wrote:
> On Thu, Jan 10, 2019 at 1:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> I feel like there has been some other thread where this was discussed,
>> but I can't find it right now.  I think that the "query construction"
>> logic in transformMergeStmt is fundamentally the wrong way to do this.
>> I think several others have said the same.  And I don't think this
>> should be considered for commit until that is rewritten.
> 
> I agree with that.
> 
> I think that it's worth acknowledging that Pavan is in a difficult
> position with this patch. As things stand, Pavan really needs input
> from a couple of people to put the query construction stuff on the
> right path, and that input has not been forthcoming. I'm not
> suggesting that anybody has failed to meet an obligation to Pavan to
> put time in here, or that anybody has suggested that this is down to a
> failure on Pavan's part. I'm merely pointing out that Pavan is in an
> unenviable position with this patch, through no fault of his own, and
> despite a worthy effort.
> 

I 100% agree with this. I guess this is the phase where you have a patch
that generally does the thing you want it to do, but others are saying
the approach is not the right one. But you're under the spell of your
approach and can't really see why would it be wrong ...

> I hope that he sticks it out, because that seems to be what it takes
> to see something like this through. I don't know how to do better at
> that.

In my experience shepherding a patch like this is quite exhausting, and
while I've always advocated for pushing the MERGE patch forward, I'd not
blame Pavan one iota for just abandoning it.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: MERGE SQL statement for PG12

От
Simon Riggs
Дата:
On Mon, 14 Jan 2019 at 15:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 1/11/19 12:26 AM, Peter Geoghegan wrote:
> On Thu, Jan 10, 2019 at 1:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> I feel like there has been some other thread where this was discussed,
>> but I can't find it right now.  I think that the "query construction"
>> logic in transformMergeStmt is fundamentally the wrong way to do this.
>> I think several others have said the same.  And I don't think this
>> should be considered for commit until that is rewritten.
>
> I agree with that.
>
> I think that it's worth acknowledging that Pavan is in a difficult
> position with this patch. As things stand, Pavan really needs input
> from a couple of people to put the query construction stuff on the
> right path, and that input has not been forthcoming. I'm not
> suggesting that anybody has failed to meet an obligation to Pavan to
> put time in here, or that anybody has suggested that this is down to a
> failure on Pavan's part. I'm merely pointing out that Pavan is in an
> unenviable position with this patch, through no fault of his own, and
> despite a worthy effort.
>

I 100% agree with this. I guess this is the phase where you have a patch
that generally does the thing you want it to do, but others are saying
the approach is not the right one. But you're under the spell of your
approach and can't really see why would it be wrong ...

There has been *no* discussion on this point, just assertions that it is wrong. I have no clue how 3 people can all agree something is wrong without any discussion and nobody even questions it. So I think there is a spell here, either Darkness or Cone of Silence.

Where is the further detail? Why is it wrong? What bad effects happen if you do it this way? What is a better way? Is there even a different way to do it? Nothing has been said, at all, ever.

The mechanism is exactly the one that Heikki recommended for the MERGE patch some years ago, so he obviously thought it would work. That was not my invention, nor Pavan's. We already use SQL assembly in multiple other places without problem, namely Mat Views and RI Triggers; the approach used here is better than that and doesn't rely on string handling at all.

Most importantly, it works.

BUT if there really is something wrong, Pavan would love to know and is willing to make any changes suggested. Review requested.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: MERGE SQL statement for PG12

От
Robert Haas
Дата:
On Mon, Jan 14, 2019 at 1:37 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Can you please help me understand what's fundamentally wrong with the approach and more importantly, can you please
explainwhat would the the architecturally sound way to do this? The same also applies to the executor side where the
currentapproach is deemed wrong, but very little is said on what's the correct way. 

I think the basic problem is that optimization should happen in the
optimizer, not the parser.  Deciding that MERGE looks like a RIGHT
JOIN is not a parsing task and therefore shouldn't happen in the
parser.  The guys who were (are?) proposing support for ALIGN and
NORMALIZE for temporal joins got the same complaint. Here's Tom
talking about that:

http://postgr.es/m/32265.1510681378@sss.pgh.pa.us

I realize that you may have worked around some of the specific issues
that Tom mentions in that email, but I believe the general point
remains valid.  What parse analysis should do is construct a
representation of the query the user entered, not some other query
derived from it.  And this code clearly does the latter. You can see
by the fact that it uses QSRC_PARSER, for example, a constant that is
currently unused (and for good reason).  And you can see that it
results in hacks like the cross-check to make sure that
resultRelRTE->relid == mergeRelRTE->relid.  You need that because you
are manipulating and transforming the query too early in the pipeline.
If you do the transformation in the optimizer, you won't need stuff
like this.  There are other hints in this function that you're really
doing planning tasks:

+ * The MERGE query's join can be tuned in some cases, see below for these
+ * special case tweaks.

Tuning the query is otherwise known as optimization.  Do it in the optimizer.

+ * Simplify the MERGE query as much as possible
+ *
+ * These seem like things that could go into Optimizer, but they are
+ * semantic simplifications rather than optimizations, per se.

Actually, they are optimizations.  The fact that an outer join may be
more expensive than an inner join is a costing consideration, not
something that the parser needs to know about.

+ /*
+ * This query should just provide the source relation columns. Later, in
+ * preprocess_targetlist(), we shall also add "ctid" attribute of the
+ * target relation to ensure that the target tuple can be fetched
+ * correctly.
+ */

This is not a problem unique to MERGE.  It comes up for UPDATE and
DELETE as well.  The handling is not all one place, but it's all in
the optimizer.  The place which actually adds the CTID column is in
preprocess_targetlist(), but note that it's much smarter than the code
in the MERGE patch, because it can do different things depending on
which type of rowmark is requested.  The code as it exists in the
MERGE patch can't possibly work for anything that doesn't have a CTID
table, which means it won't work for FDWs and, in the future, any new
heap types added via pluggable storage unless they happen to support
CTID.  Correctly written, MERGE will leverage what the optimizer
already knows about rowmarks rather than reinventing them in the parse
analysis phase.

You should redesign this whole representation so that you just pass
the whole query through to the optimizer without any structural
change.  Just as we do for other statements, you need to do the basic
transformation stuff like looking up relation OIDs: that has to do
with what the query MEANS and the external SQL objects on which it
DEPENDS; those things have to be figured out before planning so that
things like the plan-cache work correctly.  But also just as we do for
other statements, you should avoid having any code in this function
that has to do with the way in which the MERGE should ultimately be
executed, and you should not have any code here that builds a new
query or does heavy restructuring of the parse tree.

Since it looks like we will initially have only one strategy for
executing MERGE, it might be OK to move the transformation that you're
currently doing in transformMergeStmt() to some early phase of the
optimizer.  Here's me making approximately the same point about ALIGN
and NORMALIZE:

https://www.postgresql.org/message-id/CA+TgmoZ=UkRzpisqK5Qox_ekLG+SQP=xBeFiDkXTgLF_=1FH+Q@mail.gmail.com

However, I'm not sure that's actually the right way to go.  Something,
possibly this transformation, is causing us to end up with two copies
of the target relation. This comment is a pretty good clue that this
is nothing but an ugly kludge:

+ * While executing MERGE, the target relation is processed twice; once
+ * as a target relation and once to run a join between the target and the
+ * source. We generate two different RTEs for these two purposes, one with
+ * rte->inh set to false and other with rte->inh set to true.
+ *
+ * Since the plan re-evaluated by EvalPlanQual uses the join RTE, we must
+ * install the updated tuple in the scan corresponding to that RTE. The
+ * following member tracks the index of the second RTE for EvalPlanQual
+ * purposes. ri_mergeTargetRTI is non-zero only when MERGE is in-progress.
+ * We use ri_mergeTargetRTI to run EvalPlanQual for MERGE and
+ * ri_RangeTableIndex elsewhere.

First of all, overloading the rte->inh flag to have anything to do
with this seems dead wrong.  It's not a good idea to reuse random
Boolean variables for unrelated purposes.  Moreover, I can't see how
this could ever be made to work with partitioned tables if that flag
has already been taken over for something else.  Second, the fact that
you need logic to take the EPQ tuple from one scan and stick it into
the other scan strongly suggests to me that there shouldn't be two
scans or two RTEs in the first place.

I want to point out that it is not as if nobody has reviewed this
patch previously.  Here is Andres making basically the same point
about parse analysis that I'm making here -- FWIW, I didn't find his
reply until after I'd written the above:

https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanobiup%40alap3.anarazel.de

Here he is explaining these points some more:

https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32gqpe%40alap3.anarazel.de

And here's Peter Geoghegan not only explaining the same problem but
having a go at fixing it:

https://www.postgresql.org/message-id/CAH2-Wz%3DZwNQvp11XjeHo-dBLHr9GDRi1vao8w1j7LQ8mOsUkzw%40mail.gmail.com

Actually, I see that Peter Geoghegan not just the emails above but a
blizzard of other emails explaining the structural problems with the
patch, which I now see include not only the parse analysis concerns
but also the business of multiple RTEs which I mentioned above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: MERGE SQL statement for PG12

От
Andres Freund
Дата:
Hi,

On 2019-01-15 14:05:25 -0500, Robert Haas wrote:
> On Mon, Jan 14, 2019 at 1:37 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> > Can you please help me understand what's fundamentally wrong with
> > the approach and more importantly, can you please explain what would
> > the the architecturally sound way to do this? The same also applies
> > to the executor side where the current approach is deemed wrong, but
> > very little is said on what's the correct way.
> 
> [ Long and good explanation by Robert ]

> I want to point out that it is not as if nobody has reviewed this
> patch previously.  Here is Andres making basically the same point
> about parse analysis that I'm making here -- FWIW, I didn't find his
> reply until after I'd written the above:
> 
> https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanobiup%40alap3.anarazel.de
> 
> Here he is explaining these points some more:
> 
> https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32gqpe%40alap3.anarazel.de
> 
> And here's Peter Geoghegan not only explaining the same problem but
> having a go at fixing it:
> 
> https://www.postgresql.org/message-id/CAH2-Wz%3DZwNQvp11XjeHo-dBLHr9GDRi1vao8w1j7LQ8mOsUkzw%40mail.gmail.com
> 
> Actually, I see that Peter Geoghegan not just the emails above but a
> blizzard of other emails explaining the structural problems with the
> patch, which I now see include not only the parse analysis concerns
> but also the business of multiple RTEs which I mentioned above.

+ many.

Pavan, I think the reason you're not getting much further feedback is
that you and Simon have gotten a lot and only incorporated feedback only
very grudgingly, if at all.  You've not even attempted to sketch out a
move of the main merge handling from parse-analysis to planner, as far
as I can tell, despite this being one of the main criticisms for about a
year. Given that not much besides rebasing has happened since v11, I
don't find it surprising that people don't want to invest more energy in
this patch.

Greetings,

Andres Freund


Re: MERGE SQL statement for PG12

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> You should redesign this whole representation so that you just pass
> the whole query through to the optimizer without any structural
> change.  Just as we do for other statements, you need to do the basic
> transformation stuff like looking up relation OIDs: that has to do
> with what the query MEANS and the external SQL objects on which it
> DEPENDS; those things have to be figured out before planning so that
> things like the plan-cache work correctly.  But also just as we do for
> other statements, you should avoid having any code in this function
> that has to do with the way in which the MERGE should ultimately be
> executed, and you should not have any code here that builds a new
> query or does heavy restructuring of the parse tree.

Just to comment on that: IMV, the primary task of parse analysis is
exactly to figure out which database objects the query references or
depends on.  We must know that much so that we can store proper
dependency information if the query goes into a view or rule.  But as
soon as you add more processing than that, you are going to run into
problems of at least two kinds:

* The further the parsetree diverges from just representing what was
entered, the harder the task of ruleutils.c to reconstruct something
that looks like what was entered.

* The more information you absorb about the referenced objects, the
more likely it is that a stored view/rule will be broken by subsequent
ALTER commands.  This connects to Robert's complaint about the parse
transformation depending on CTID, though I don't think it's quite the
same thing.  We want to minimize the dependency surface of stored
rules -- for example, it's okay for a stored view to depend on the
signature (argument and return data types) of a function, but not
for it to depend on, say, the provolatile property.  If we allowed
that then we'd have to forbid ALTER FUNCTION from changing the
volatility.  I've not looked at this patch closely enough to know
whether it moves the goalposts in terms of what a dependency-on-a-
table means, but judging from Robert's and Andres' reviews, I'm
quite worried that it does.

Delaying transformations to the rewrite or plan phases avoids these
problems because now you are past the point where the query could
be stored to or fetched from a rule.

            regards, tom lane


Re: MERGE SQL statement for PG12

От
Alvaro Herrera
Дата:
I'm going to close this patch as returned with feedback, since it's had
plenty and it's pretty clear that it will take some time to address it.
Pavan is welcome to resubmit when he has a new version.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services