Обсуждение: UPDATE using sub selects

Поиск
Список
Период
Сортировка

UPDATE using sub selects

От
NikhilS
Дата:
Hi,

As per discussion on -hackers, a patch which allows updates to use subselects is attached with this mail.

As per discussion with Tom, I have adopted the following approach:

* Introduce ROWEXPR_SUBLINK type for subqueries that allows multiple column outputs.
* Populate the targetList with PARAM_SUBLINK entries dependent on the subselects.
* Modify the targets in-place into PARAM_EXEC entries in the make_subplan phase.

The above does not require any kluges in the targetList processing code path at all.

UPDATEs seem to work fine using subselects with this patch. I have modified the update.sql regression test to include possible variations .
 
No documentation changes are present in this patch.
Feedback, comments appreciated.

Regards,
Nikhils
--
EnterpriseDB               http://www.enterprisedb.com
Вложения

Re: UPDATE using sub selects

От
Bruce Momjian
Дата:
This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

NikhilS wrote:
> Hi,
>
> As per discussion on -hackers, a patch which allows updates to use
> subselects is attached with this mail.
>
> As per discussion with Tom, I have adopted the following approach:
>
> * Introduce ROWEXPR_SUBLINK type for subqueries that allows multiple column
> outputs.
> * Populate the targetList with PARAM_SUBLINK entries dependent on the
> subselects.
> * Modify the targets in-place into PARAM_EXEC entries in the make_subplan
> phase.
>
> The above does not require any kluges in the targetList processing code path
> at all.
>
> UPDATEs seem to work fine using subselects with this patch. I have modified
> the update.sql regression test to include possible variations .
>
> No documentation changes are present in this patch.
> Feedback, comments appreciated.
>
> Regards,
> Nikhils
> --
> EnterpriseDB               http://www.enterprisedb.com

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: UPDATE using sub selects

От
Tom Lane
Дата:
NikhilS <nikkhils@gmail.com> writes:
> As per discussion on -hackers, a patch which allows updates to use
> subselects is attached with this mail.

I looked over this patch briefly, and I'm afraid it still needs a lot of
work.

The way you altered transformUpdateStmt is a mess; it's duplicating
logic and also using inapplicable tests.  (In particular, invoking
checkInsertTargets on a subset of the complete target list is just not
sensible --- it's supposed to be catching conflicts.  I think you really
don't need it at all anyway, because the needed tests are done
downstream in the rewriter.)  I think the right way is to expand out the
row-assignment portions of the SET target list and then process the
whole SET list the same as before.

The real problem though is that the patch creates a new storage method
and processing path for ROWEXPR_SUBLINK sublinks that's got nothing to
do with the other kinds of sublinks.  This is going to be a mess and a
fertile source of bugs.  (You already have quite a few stemming from
having overlooked all the places that have to be touched to add a field
to Query, eg backend/nodes/, optimizer/util/clauses.c, ruleutils.c...)

The basic idea is not bad, but I think if we're going to make a list of
subqueries to attach to Query, we should try to do all the forms of
SubLink that way, not just one.  From a logical point of view all the
single-result-row forms of SubLink are about the same, and we should
strive to unify them not make them even more different.  Also there are
a lot of kluges that could be got rid of if subqueries were pulled out
of expression trees leaving only Param references behind.  We'd need
something a bit more general than Param, perhaps, depending on how we
wanted to distinguish output values from different subqueries.  Right
now we don't have to worry about that at parse time, but can leave it to
the planner to assign globally unique Param IDs; but that work would
have to be pushed further upstream.  (BTW, I'm fairly certain your patch
won't work for multiple subselects in one UPDATE, because it fails to
deal with this point.  The test cases in the patch do NOT exercise the
case because you don't have multiple multiple-output subselects in any
of them.)

I thought about trying to fix this stuff but soon concluded that it
was more work than I could justify spending on one patch during
commit fest.  It has to be bounced back for now.

            regards, tom lane

Re: UPDATE using sub selects

От
Tom Lane
Дата:
I wrote:
> ... eg backend/nodes/, optimizer/util/clauses.c, ruleutils.c...)

Actually, on further thought ruleutils.c represents one of the biggest
stumbling blocks here.  We have to be able to reverse-compile the parse
tree into something that's at least semantically equivalent to the
original query.  The existing kluge for UPDATE SET (columns) = ... can
ignore this because it rearranges the query into a sematically
equivalent update with independent SET targets, but the whole problem
with sub-select updates is that there *is* no semantically equivalent
command with a flat targetlist.  So one way or another there will have
to be more information in the parse tree than there is now.

If we use Params that can be traced back to specific SubLink list
entries, it might be okay to finesse this by having ruleutils.c check
for successive targetlist entries that reference outputs of the same
SubLink.  That's pretty ugly but it might be better than changing the
representation of targetlists, which is something that's understood
by one heck of a lot of code.

            regards, tom lane

Re: UPDATE using sub selects

От
NikhilS
Дата:
Hi Tom,

> ... eg backend/nodes/, optimizer/util/clauses.c, ruleutils.c...)

Actually, on further thought ruleutils.c represents one of the biggest
stumbling blocks here.  We have to be able to reverse-compile the parse
tree into something that's at least semantically equivalent to the
original query.  The existing kluge for UPDATE SET (columns) = ... can
ignore this because it rearranges the query into a sematically
equivalent update with independent SET targets, but the whole problem
with sub-select updates is that there *is* no semantically equivalent
command with a flat targetlist.  So one way or another there will have
to be more information in the parse tree than there is now.

If we use Params that can be traced back to specific SubLink list
entries, it might be okay to finesse this by having ruleutils.c check
for successive targetlist entries that reference outputs of the same
SubLink.  That's pretty ugly but it might be better than changing the
representation of targetlists, which is something that's understood
by one heck of a lot of code.

Thanks a lot for taking a look at this patch. As mentioned and detailed out by you, this patch *does* need a lot more amount of work and is certainly not a simpler effort that I had envisioned it to be earlier. Another issue with the patch as it stands today is that it does not work with correlated subqueries. This will require some more work too. So for example, something of this sort does not work yet:
 
UPDATE update_tbl set (a, b) = (select a, b from other_tbl where c = update_tbl.c) where a = 10;

I will try to have another shot at it if I can, before the next commit fest.

Thanks and Regards,
Nikhils

--
EnterpriseDB http://www.enterprisedb.com

Re: UPDATE using sub selects

От
Bruce Momjian
Дата:
Patch removed from patch queue --- NikhilS, please resubmit when you are
ready.  Thanks.

---------------------------------------------------------------------------

NikhilS wrote:
> Hi,
>
> As per discussion on -hackers, a patch which allows updates to use
> subselects is attached with this mail.
>
> As per discussion with Tom, I have adopted the following approach:
>
> * Introduce ROWEXPR_SUBLINK type for subqueries that allows multiple column
> outputs.
> * Populate the targetList with PARAM_SUBLINK entries dependent on the
> subselects.
> * Modify the targets in-place into PARAM_EXEC entries in the make_subplan
> phase.
>
> The above does not require any kluges in the targetList processing code path
> at all.
>
> UPDATEs seem to work fine using subselects with this patch. I have modified
> the update.sql regression test to include possible variations .
>
> No documentation changes are present in this patch.
> Feedback, comments appreciated.
>
> Regards,
> Nikhils
> --
> EnterpriseDB               http://www.enterprisedb.com

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +