Обсуждение: Writeable CTEs and empty relations

Поиск

Список

Период

Сортировка

Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

08 февраля 2010 г., 15:30:32

Hi,

While playing around with another issue with the patch, I came across
the following:

=> create table foo(a int);
CREATE TABLE
=> with t as (insert into foo values(0)) select * from foo;a
---
(0 rows)

I traced this down to heapam.c, which has this:

/** return null immediately if relation is empty*/if (scan->rs_nblocks == 0){    Assert(!BufferIsValid(scan->rs_cbuf));
  tuple->t_data = NULL;    return;}
 

and

/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot.  However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned.      To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/
scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);


This doesn't exactly work anymore since we modify the snapshot after
calling ExecInitScan().  I'm not really familiar with this part of the
code, so I'm asking: is there a simple enough way around this?  Would
updating scan->rs_nblocks before scanning the first tuple be OK?


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

08 февраля 2010 г., 19:09:34

Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> I traced this down to heapam.c, which has this:
> ...
> This doesn't exactly work anymore since we modify the snapshot after
> calling ExecInitScan().

So don't do that.  The executor is generally entitled to assume that
parameters given to ExecutorStart are correct.  In particular, changing
the snapshot after the query has started to run seems to me to ensure
all sorts of inconsistent and undesirable behavior.
        regards, tom lane

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

09 февраля 2010 г., 01:33:04

On 2010-02-09 01:09 +0200, Tom Lane wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> I traced this down to heapam.c, which has this:
>> ...
>> This doesn't exactly work anymore since we modify the snapshot after
>> calling ExecInitScan().
> 
> So don't do that.  The executor is generally entitled to assume that
> parameters given to ExecutorStart are correct.  In particular, changing
> the snapshot after the query has started to run seems to me to ensure
> all sorts of inconsistent and undesirable behavior.

You do remember that the whole patch in its current form depends on
modifying the snapshot?


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

09 февраля 2010 г., 18:49:18

On 2010-02-08 21:30 +0200, I wrote:
> This doesn't exactly work anymore since we modify the snapshot after
> calling ExecInitScan().  I'm not really familiar with this part of the
> code, so I'm asking: is there a simple enough way around this?  Would
> updating scan->rs_nblocks before scanning the first tuple be OK?

I've looked at this some more, and the problem is a lot bigger than I
originally thought.  We'd basically be forced to do another initscan()
before starting a new scan after the snapshot changed.  One way to
accomplish this would be that ExecutePlan() would leave a flag in EState
whenever the scan nodes need to reinit.

Does this sound completely unacceptable?

Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

09 февраля 2010 г., 20:20:11

Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2010-02-08 21:30 +0200, I wrote:
>> This doesn't exactly work anymore since we modify the snapshot after
>> calling ExecInitScan().  I'm not really familiar with this part of the
>> code, so I'm asking: is there a simple enough way around this?  Would
>> updating scan->rs_nblocks before scanning the first tuple be OK?

> I've looked at this some more, and the problem is a lot bigger than I
> originally thought.  We'd basically be forced to do another initscan()
> before starting a new scan after the snapshot changed.  One way to
> accomplish this would be that ExecutePlan() would leave a flag in EState
> whenever the scan nodes need to reinit.

> Does this sound completely unacceptable?

You still haven't explained why it's a good idea to change the snapshot
after the executor has started.  Right at the moment I'm prepared to
reject the patch on that ground alone.
        regards, tom lane

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

09 февраля 2010 г., 20:25:30

On 2010-02-10 02:19 +0200, Tom Lane wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> Does this sound completely unacceptable?
> 
> You still haven't explained why it's a good idea to change the snapshot
> after the executor has started.  Right at the moment I'm prepared to
> reject the patch on that ground alone.

The patch only touches the snapshot's curcid.  That's needed to allow
the queries see the tuples of the DML WITHs ran before that particular
query.


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

09 февраля 2010 г., 21:21:06

Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>> You still haven't explained why it's a good idea to change the snapshot
>> after the executor has started.  Right at the moment I'm prepared to
>> reject the patch on that ground alone.

> The patch only touches the snapshot's curcid.  That's needed to allow
> the queries see the tuples of the DML WITHs ran before that particular
> query.

... and they will also see, for example, tuples inserted by triggers
fired by those queries.  I thought the plan for this feature was to
store and re-read the RETURNING data, not to go back and re-read the
underlying tables.
        regards, tom lane

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

10 февраля 2010 г., 06:06:03

On 2010-02-10 03:20 +0200, Tom Lane wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>>> You still haven't explained why it's a good idea to change the snapshot
>>> after the executor has started.  Right at the moment I'm prepared to
>>> reject the patch on that ground alone.
> 
>> The patch only touches the snapshot's curcid.  That's needed to allow
>> the queries see the tuples of the DML WITHs ran before that particular
>> query.
> 
> ... and they will also see, for example, tuples inserted by triggers
> fired by those queries.  I thought the plan for this feature was to
> store and re-read the RETURNING data, not to go back and re-read the
> underlying tables.

I originally suggested this approach about four months ago.  During this
whole time you haven't expressed any concerns about this, but suddenly
it's a reason to reject the patch?

Anyway, if we want to avoid modifying the snapshot, we can't bump the
CID between queries.  I haven't thought this through, but it seems like
this could work.  However, none of the WITH queries can see the previous
statement's tuples, which means that someone may be suprised when they
try to modify the same tuples twice just to discover that only the first
modification took place.  I don't see this as a huge problem though.

This will also solve the problem I started this thread for and makes the
patch smaller, simpler and even a bit prettier.  Unless someone sees a
problem with this approach, I'm going to make the change.

Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

10 февраля 2010 г., 15:59:30

On Wed, Feb 10, 2010 at 5:05 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-10 03:20 +0200, Tom Lane wrote:
>> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>>> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>>>> You still haven't explained why it's a good idea to change the snapshot
>>>> after the executor has started.  Right at the moment I'm prepared to
>>>> reject the patch on that ground alone.
>>
>>> The patch only touches the snapshot's curcid.  That's needed to allow
>>> the queries see the tuples of the DML WITHs ran before that particular
>>> query.
>>
>> ... and they will also see, for example, tuples inserted by triggers
>> fired by those queries.  I thought the plan for this feature was to
>> store and re-read the RETURNING data, not to go back and re-read the
>> underlying tables.
>
> I originally suggested this approach about four months ago.  During this
> whole time you haven't expressed any concerns about this, but suddenly
> it's a reason to reject the patch?
>
> Anyway, if we want to avoid modifying the snapshot, we can't bump the
> CID between queries.  I haven't thought this through, but it seems like
> this could work.  However, none of the WITH queries can see the previous
> statement's tuples, which means that someone may be suprised when they
> try to modify the same tuples twice just to discover that only the first
> modification took place.  I don't see this as a huge problem though.

I think that we agreed previously that running all the DML queries to
completion before the main query, bumping the CID after each one, and
saving the outputs, was the only workable approach.

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00614.php

If the executor has buried in it the assumption that the snapshot
can't change after startup, then does that mean that we need to start
up and shut down the executor for each subquery?

http://archives.postgresql.org/pgsql-hackers/2009-11/msg01964.php

...Robert

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

10 февраля 2010 г., 16:24:11

On 2010-02-10 21:59 +0200, Robert Haas wrote:
> On Wed, Feb 10, 2010 at 5:05 AM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> On 2010-02-10 03:20 +0200, Tom Lane wrote:
>>> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>>>> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>>>>> You still haven't explained why it's a good idea to change the snapshot
>>>>> after the executor has started.  Right at the moment I'm prepared to
>>>>> reject the patch on that ground alone.
>>>
>>>> The patch only touches the snapshot's curcid.  That's needed to allow
>>>> the queries see the tuples of the DML WITHs ran before that particular
>>>> query.
>>>
>>> ... and they will also see, for example, tuples inserted by triggers
>>> fired by those queries.  I thought the plan for this feature was to
>>> store and re-read the RETURNING data, not to go back and re-read the
>>> underlying tables.
>>
>> I originally suggested this approach about four months ago.  During this
>> whole time you haven't expressed any concerns about this, but suddenly
>> it's a reason to reject the patch?
>>
>> Anyway, if we want to avoid modifying the snapshot, we can't bump the
>> CID between queries.  I haven't thought this through, but it seems like
>> this could work.  However, none of the WITH queries can see the previous
>> statement's tuples, which means that someone may be suprised when they
>> try to modify the same tuples twice just to discover that only the first
>> modification took place.  I don't see this as a huge problem though.
> 
> I think that we agreed previously that running all the DML queries to
> completion before the main query, bumping the CID after each one, and
> saving the outputs, was the only workable approach.
> 
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php

Hmm.  Yeah.  Didn't think about that :-(

> If the executor has buried in it the assumption that the snapshot
> can't change after startup, then does that mean that we need to start
> up and shut down the executor for each subquery?

Then I guess that's pretty much the only option.


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

10 февраля 2010 г., 17:59:38

Robert Haas <robertmhaas@gmail.com> writes:
> If the executor has buried in it the assumption that the snapshot
> can't change after startup, then does that mean that we need to start
> up and shut down the executor for each subquery?

Yes, I think so.  That's the way it's always worked in the past;
see for example PortalRunMulti() and ProcessQuery().  I think trying
to change that is a high-risk, low-reward activity.

This probably means that the planner output for queries involving
writeable CTEs has to be a separate PlannedStmt per such CTE.
        regards, tom lane

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

10 февраля 2010 г., 18:50:58

On 2010-02-10 23:57 +0200, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If the executor has buried in it the assumption that the snapshot
>> can't change after startup, then does that mean that we need to start
>> up and shut down the executor for each subquery?
> 
> Yes, I think so.  That's the way it's always worked in the past;
> see for example PortalRunMulti() and ProcessQuery().  I think trying
> to change that is a high-risk, low-reward activity.
> 
> This probably means that the planner output for queries involving
> writeable CTEs has to be a separate PlannedStmt per such CTE.

I'm looking at this, but I can't think of any good way of associating
the tuplestores from PortalRunMulti() with the correct CTEs.  Any ideas?


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

10 февраля 2010 г., 19:25:34

On 2010-02-11 00:50 +0200, Marko Tiikkaja wrote:
> On 2010-02-10 23:57 +0200, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> If the executor has buried in it the assumption that the snapshot
>>> can't change after startup, then does that mean that we need to start
>>> up and shut down the executor for each subquery?
>>
>> Yes, I think so.  That's the way it's always worked in the past;
>> see for example PortalRunMulti() and ProcessQuery().  I think trying
>> to change that is a high-risk, low-reward activity.
>>
>> This probably means that the planner output for queries involving
>> writeable CTEs has to be a separate PlannedStmt per such CTE.
> 
> I'm looking at this, but I can't think of any good way of associating
> the tuplestores from PortalRunMulti() with the correct CTEs.  Any ideas?

Ok, what about the following: - after planning the original query, standard_planner() goes through   the list of
top-levelCTEs and assigns a running number for each of   the DML WITHs, in the order they will be executed and returns
a  list of PlannedStmts.  all necessary statements wi have a flag   signaling that the result should be stored in a
tuplestore.- the portal keeps the list of tuplestores around and passes that   list to every query through PlannedStmt.
it keeps on executing   the statements until it finds a PlannedStmt that doesn't want its   results stored anywhere and
thenfrees the list of tuplestores

Does this sound reasonable?  And more importantly, am I going to be
wasting my time implementing this?  The 15th isn't that far away..

Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

10 февраля 2010 г., 19:58:18

On Wed, Feb 10, 2010 at 6:25 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-11 00:50 +0200, Marko Tiikkaja wrote:
>> On 2010-02-10 23:57 +0200, Tom Lane wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> If the executor has buried in it the assumption that the snapshot
>>>> can't change after startup, then does that mean that we need to start
>>>> up and shut down the executor for each subquery?
>>>
>>> Yes, I think so.  That's the way it's always worked in the past;
>>> see for example PortalRunMulti() and ProcessQuery().  I think trying
>>> to change that is a high-risk, low-reward activity.
>>>
>>> This probably means that the planner output for queries involving
>>> writeable CTEs has to be a separate PlannedStmt per such CTE.
>>
>> I'm looking at this, but I can't think of any good way of associating
>> the tuplestores from PortalRunMulti() with the correct CTEs.  Any ideas?
>
> Ok, what about the following:
>  - after planning the original query, standard_planner() goes through
>    the list of top-level CTEs and assigns a running number for each of
>    the DML WITHs, in the order they will be executed and returns a
>    list of PlannedStmts.  all necessary statements wi have a flag
>    signaling that the result should be stored in a tuplestore.
>  - the portal keeps the list of tuplestores around and passes that
>    list to every query through PlannedStmt.  it keeps on executing
>    the statements until it finds a PlannedStmt that doesn't want its
>    results stored anywhere and then frees the list of tuplestores

Wouldn't you'd want to stop when you ran out of PlannedStmts, not when
you hit one that didn't need its results stored?  What happens if
there's an unreferenced CTE in the middle of the list?

> Does this sound reasonable?  And more importantly, am I going to be
> wasting my time implementing this?  The 15th isn't that far away..

I have to admit I've been starting to have a feeling over the last
couple of days that this patch isn't going to make it for 9.0...  but
obviously I'm in no position to guarantee anything one way or the
other.  Please do keep us up to date on your plans, though.

Thanks,

...Robert

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

10 февраля 2010 г., 20:13:30

On 2010-02-11 01:58 +0200, Robert Haas wrote:
> On Wed, Feb 10, 2010 at 6:25 PM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> On 2010-02-11 00:50 +0200, Marko Tiikkaja wrote:
>> Ok, what about the following:
>>  - after planning the original query, standard_planner() goes through
>>    the list of top-level CTEs and assigns a running number for each of
>>    the DML WITHs, in the order they will be executed and returns a
>>    list of PlannedStmts.  all necessary statements wi have a flag
>>    signaling that the result should be stored in a tuplestore.
>>  - the portal keeps the list of tuplestores around and passes that
>>    list to every query through PlannedStmt.  it keeps on executing
>>    the statements until it finds a PlannedStmt that doesn't want its
>>    results stored anywhere and then frees the list of tuplestores
> 
> Wouldn't you'd want to stop when you ran out of PlannedStmts, not when
> you hit one that didn't need its results stored?  What happens if
> there's an unreferenced CTE in the middle of the list?

Right, of course.

>> Does this sound reasonable?  And more importantly, am I going to be
>> wasting my time implementing this?  The 15th isn't that far away..
> 
> I have to admit I've been starting to have a feeling over the last
> couple of days that this patch isn't going to make it for 9.0...  but
> obviously I'm in no position to guarantee anything one way or the
> other.  Please do keep us up to date on your plans, though.

Unfortunately, so have I.  I'll take a shot at implementing this, but if
I come across any problems, I have to give up.


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

10 февраля 2010 г., 21:44:34

On 2010-02-11 02:13 +0200, I wrote:
> On 2010-02-11 01:58 +0200, Robert Haas wrote:
>> I have to admit I've been starting to have a feeling over the last
>> couple of days that this patch isn't going to make it for 9.0...  but
>> obviously I'm in no position to guarantee anything one way or the
>> other.  Please do keep us up to date on your plans, though.
> 
> Unfortunately, so have I.  I'll take a shot at implementing this, but if
> I come across any problems, I have to give up.

I'm going to have to disappoint a bunch of people and give up. :-(

At this point, I've already used a huge amount of my time and energy to
work on this patch during this commit fest, and I simply can't continue
like this.  There's only 4 days left anyway, and I don't feel confident
enough to spend a lot of time during the next couple of days just to get
one more shot.  I don't even have any kind of certainty that there
would've been a shot, even if I finished the patch on time.

Thanks everyone for helping out and reviewing this, especially Robert
and Tom.

Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

11 февраля 2010 г., 09:47:26

On 2010-02-11 03:44 +0200, I wrote:
> I'm going to have to disappoint a bunch of people and give up. :-(

Btw. would it make sense to apply the WITH-on-top-of-DML part of this
patch?  At least to me, this seems useful because you can write a
RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that CTE.


Regards,
Marko Tiikkaja

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

11 февраля 2010 г., 11:53:55

On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On 2010-02-11 03:44 +0200, I wrote:
>> I'm going to have to disappoint a bunch of people and give up. :-(
>
> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
> patch?  At least to me, this seems useful because you can write a
> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that CTE.

Hmm, that's a thought.  Can you split out just that part?

...Robert

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

11 февраля 2010 г., 13:32:50

On Thu, 11 Feb 2010 10:53:22 -0500, Robert Haas <robertmhaas@gmail.com>
wrote:
> On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> On 2010-02-11 03:44 +0200, I wrote:
>>> I'm going to have to disappoint a bunch of people and give up. :-(
>>
>> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
>> patch?  At least to me, this seems useful because you can write a
>> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that
>> CTE.
>
> Hmm, that's a thought.  Can you split out just that part?

Here's the patch.  It's the same as the stuff in writeable CTE patches, but
I added regression tests.


Regards,
Marko Tiikkaja

Вложения

with_on_dml.patch

Re: Writeable CTEs and empty relations

От

Marko Tiikkaja

Дата:

11 февраля 2010 г., 13:39:05

On Thu, 11 Feb 2010 19:28:28 +0200, I wrote:
> On Thu, 11 Feb 2010 10:53:22 -0500, Robert Haas <robertmhaas@gmail.com>
> wrote:
>> On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
>> <marko.tiikkaja@cs.helsinki.fi> wrote:
>>> On 2010-02-11 03:44 +0200, I wrote:
>>>> I'm going to have to disappoint a bunch of people and give up. :-(
>>>
>>> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
>>> patch?  At least to me, this seems useful because you can write a
>>> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that
>>> CTE.
>>
>> Hmm, that's a thought.  Can you split out just that part?
>
> Here's the patch.  It's the same as the stuff in writeable CTE patches,
but
> I added regression tests.

Whoops.  The reference section in docs still had some traces of writeable
CTEs.  Updated patch attached.


Regards,
Marko Tiikkaja

Вложения

with_on_dml2.patch

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

12 февраля 2010 г., 00:26:37

On Thu, Feb 11, 2010 at 12:35 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> On Thu, 11 Feb 2010 19:28:28 +0200, I wrote:
>> On Thu, 11 Feb 2010 10:53:22 -0500, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>>> On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
>>> <marko.tiikkaja@cs.helsinki.fi> wrote:
>>>> On 2010-02-11 03:44 +0200, I wrote:
>>>>> I'm going to have to disappoint a bunch of people and give up. :-(
>>>>
>>>> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
>>>> patch?  At least to me, this seems useful because you can write a
>>>> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that
>>>> CTE.
>>>
>>> Hmm, that's a thought.  Can you split out just that part?
>>
>> Here's the patch.  It's the same as the stuff in writeable CTE patches,
> but
>> I added regression tests.
>
> Whoops.  The reference section in docs still had some traces of writeable
> CTEs.  Updated patch attached.

This looks simple and useful.  I haven't tested it, but if it's really
this easy, we should definitely do it.

...Robert

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

12 февраля 2010 г., 01:12:03

Robert Haas <robertmhaas@gmail.com> writes:
> This looks simple and useful.  I haven't tested it, but if it's really
> this easy, we should definitely do it.

I should be out from under the window functions patch tomorrow,
will look at this one then.
        regards, tom lane

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

12 февраля 2010 г., 01:19:52

On Fri, Feb 12, 2010 at 12:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This looks simple and useful.  I haven't tested it, but if it's really
>> this easy, we should definitely do it.
>
> I should be out from under the window functions patch tomorrow,
> will look at this one then.

Cool, thanks.

...Robert

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

12 февраля 2010 г., 21:07:00

Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> Here's the patch.  It's the same as the stuff in writeable CTE patches,
>> but I added regression tests.

> Whoops.  The reference section in docs still had some traces of writeable
> CTEs.  Updated patch attached.

I spent some time playing with this but concluded that it's not
committable.  I ran into two significant problems:

1. In an INSERT statement, it's already possible to attach a WITH to
the contained statement, ie
    INSERT INTO foo WITH ... SELECT ...
    INSERT INTO foo WITH ... VALUES (...)
and the patch wasn't doing anything nice with the case where one tries
to put WITH at both places:
    WITH ... INSERT INTO foo WITH ... VALUES (...)
(The SELECT case actually works, mostly, but the VALUES one doesn't.)
I thought about just concat'ing the two WITH lists but this introduces
various strange corner cases; in particular when one list is marked
RECURSIVE and the other isn't there's no way to avoid surprising
behavior.  However, since the option for an inner WITH already does
everything you would want, we could just forget about adding outer WITH
for INSERT.  The attached modified patch does that.

2. None of the cases play nicely with NEW or OLD references in a rule.
For example,

regression=#  create temp table x(f1 int);
CREATE TABLE
regression=#  create temp table y(f2 int);
CREATE TABLE
regression=# create rule r2 as on update to x do instead
with t as (select old.*) update y set f2 = t.f1 from t;
CREATE RULE
regression=# update x set f1 = f1+1;
ERROR:  bogus local parameter passed to WITH query
regression=#

I don't see any very nice way to fix this.  The problem is that the
NEW or OLD reference is treated as though it were a relation of the
main query (the UPDATE in this case), which is something that's not
valid to reference in a WITH query.  I'm afraid that it might not
be possible to fix it without significant changes in the way rules
work (and consequent compatibility issues).

We could possibly put in some hack to disallow OLD/NEW references in
the WITH queries, but that got past my threshold of ugliness, so
I'm not going to commit it without further discussion.

Attached is the patch as I had it before giving up (sans documentation
since I'd not gotten to that yet).  The main other change from what
you submitted was adding ruleutils.c support.

            regards, tom lane

Index: src/backend/nodes/copyfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/nodes/copyfuncs.c,v
retrieving revision 1.461
diff -c -r1.461 copyfuncs.c
*** src/backend/nodes/copyfuncs.c    12 Feb 2010 17:33:20 -0000    1.461
--- src/backend/nodes/copyfuncs.c    13 Feb 2010 00:49:41 -0000
***************
*** 2279,2284 ****
--- 2279,2285 ----
      COPY_NODE_FIELD(usingClause);
      COPY_NODE_FIELD(whereClause);
      COPY_NODE_FIELD(returningList);
+     COPY_NODE_FIELD(withClause);

      return newnode;
  }
***************
*** 2293,2298 ****
--- 2294,2300 ----
      COPY_NODE_FIELD(whereClause);
      COPY_NODE_FIELD(fromClause);
      COPY_NODE_FIELD(returningList);
+     COPY_NODE_FIELD(withClause);

      return newnode;
  }
Index: src/backend/nodes/equalfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/nodes/equalfuncs.c,v
retrieving revision 1.382
diff -c -r1.382 equalfuncs.c
*** src/backend/nodes/equalfuncs.c    12 Feb 2010 17:33:20 -0000    1.382
--- src/backend/nodes/equalfuncs.c    13 Feb 2010 00:49:41 -0000
***************
*** 899,904 ****
--- 899,905 ----
      COMPARE_NODE_FIELD(usingClause);
      COMPARE_NODE_FIELD(whereClause);
      COMPARE_NODE_FIELD(returningList);
+     COMPARE_NODE_FIELD(withClause);

      return true;
  }
***************
*** 911,916 ****
--- 912,918 ----
      COMPARE_NODE_FIELD(whereClause);
      COMPARE_NODE_FIELD(fromClause);
      COMPARE_NODE_FIELD(returningList);
+     COMPARE_NODE_FIELD(withClause);

      return true;
  }
Index: src/backend/parser/analyze.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/analyze.c,v
retrieving revision 1.401
diff -c -r1.401 analyze.c
*** src/backend/parser/analyze.c    12 Feb 2010 22:48:56 -0000    1.401
--- src/backend/parser/analyze.c    13 Feb 2010 00:49:41 -0000
***************
*** 282,287 ****
--- 282,294 ----

      qry->commandType = CMD_DELETE;

+     /* process the WITH clause independently of all else */
+     if (stmt->withClause)
+     {
+         qry->hasRecursive = stmt->withClause->recursive;
+         qry->cteList = transformWithClause(pstate, stmt->withClause);
+     }
+
      /* set up range table with just the result rel */
      qry->resultRelation = setTargetTable(pstate, stmt->relation,
                                    interpretInhOption(stmt->relation->inhOpt),
***************
*** 380,386 ****
       * Must get write lock on INSERT target table before scanning SELECT, else
       * we will grab the wrong kind of initial lock if the target table is also
       * mentioned in the SELECT part.  Note that the target table is not added
!      * to the joinlist or namespace.
       */
      qry->resultRelation = setTargetTable(pstate, stmt->relation,
                                           false, false, ACL_INSERT);
--- 387,394 ----
       * Must get write lock on INSERT target table before scanning SELECT, else
       * we will grab the wrong kind of initial lock if the target table is also
       * mentioned in the SELECT part.  Note that the target table is not added
!      * to the joinlist or namespace (and hence it won't affect processing
!      * of the contained statement).
       */
      qry->resultRelation = setTargetTable(pstate, stmt->relation,
                                           false, false, ACL_INSERT);
***************
*** 1730,1735 ****
--- 1738,1750 ----
      qry->commandType = CMD_UPDATE;
      pstate->p_is_update = true;

+     /* process the WITH clause independently of all else */
+     if (stmt->withClause)
+     {
+         qry->hasRecursive = stmt->withClause->recursive;
+         qry->cteList = transformWithClause(pstate, stmt->withClause);
+     }
+
      qry->resultRelation = setTargetTable(pstate, stmt->relation,
                                    interpretInhOption(stmt->relation->inhOpt),
                                           true,
Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.708
diff -c -r2.708 gram.y
*** src/backend/parser/gram.y    12 Feb 2010 17:33:20 -0000    2.708
--- src/backend/parser/gram.y    13 Feb 2010 00:49:41 -0000
***************
*** 429,435 ****
  %type <boolean> xml_whitespace_option

  %type <node>     common_table_expr
! %type <with>     with_clause
  %type <list>    cte_list

  %type <list>    window_clause window_definition_list opt_partition_clause
--- 429,435 ----
  %type <boolean> xml_whitespace_option

  %type <node>     common_table_expr
! %type <with>     with_clause opt_with_clause
  %type <list>    cte_list

  %type <list>    window_clause window_definition_list opt_partition_clause
***************
*** 7141,7154 ****
   *
   *****************************************************************************/

! DeleteStmt: DELETE_P FROM relation_expr_opt_alias
              using_clause where_or_current_clause returning_clause
                  {
                      DeleteStmt *n = makeNode(DeleteStmt);
!                     n->relation = $3;
!                     n->usingClause = $4;
!                     n->whereClause = $5;
!                     n->returningList = $6;
                      $$ = (Node *)n;
                  }
          ;
--- 7141,7155 ----
   *
   *****************************************************************************/

! DeleteStmt: opt_with_clause DELETE_P FROM relation_expr_opt_alias
              using_clause where_or_current_clause returning_clause
                  {
                      DeleteStmt *n = makeNode(DeleteStmt);
!                     n->relation = $4;
!                     n->usingClause = $5;
!                     n->whereClause = $6;
!                     n->returningList = $7;
!                     n->withClause = $1;
                      $$ = (Node *)n;
                  }
          ;
***************
*** 7203,7220 ****
   *
   *****************************************************************************/

! UpdateStmt: UPDATE relation_expr_opt_alias
              SET set_clause_list
              from_clause
              where_or_current_clause
              returning_clause
                  {
                      UpdateStmt *n = makeNode(UpdateStmt);
!                     n->relation = $2;
!                     n->targetList = $4;
!                     n->fromClause = $5;
!                     n->whereClause = $6;
!                     n->returningList = $7;
                      $$ = (Node *)n;
                  }
          ;
--- 7204,7222 ----
   *
   *****************************************************************************/

! UpdateStmt: opt_with_clause UPDATE relation_expr_opt_alias
              SET set_clause_list
              from_clause
              where_or_current_clause
              returning_clause
                  {
                      UpdateStmt *n = makeNode(UpdateStmt);
!                     n->relation = $3;
!                     n->targetList = $5;
!                     n->fromClause = $6;
!                     n->whereClause = $7;
!                     n->returningList = $8;
!                     n->withClause = $1;
                      $$ = (Node *)n;
                  }
          ;
***************
*** 7556,7561 ****
--- 7558,7569 ----
              }
          ;

+ opt_with_clause:
+         with_clause                                { $$ = $1; }
+         | /*EMPTY*/                                { $$ = NULL; }
+         ;
+
+
  into_clause:
              INTO OptTempTableName
                  {
Index: src/backend/utils/adt/ruleutils.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/ruleutils.c,v
retrieving revision 1.321
diff -c -r1.321 ruleutils.c
*** src/backend/utils/adt/ruleutils.c    12 Feb 2010 17:33:20 -0000    1.321
--- src/backend/utils/adt/ruleutils.c    13 Feb 2010 00:49:42 -0000
***************
*** 3351,3356 ****
--- 3351,3359 ----
      RangeTblEntry *rte;
      ListCell   *l;

+     /* Insert the WITH clause if given */
+     get_with_clause(query, context);
+
      /*
       * Start the query with UPDATE relname SET
       */
***************
*** 3432,3437 ****
--- 3435,3443 ----
      StringInfo    buf = context->buf;
      RangeTblEntry *rte;

+     /* Insert the WITH clause if given */
+     get_with_clause(query, context);
+
      /*
       * Start the query with DELETE FROM relname
       */
Index: src/include/nodes/parsenodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.429
diff -c -r1.429 parsenodes.h
*** src/include/nodes/parsenodes.h    12 Feb 2010 17:33:20 -0000    1.429
--- src/include/nodes/parsenodes.h    13 Feb 2010 00:49:42 -0000
***************
*** 906,911 ****
--- 906,912 ----
      List       *usingClause;    /* optional using clause for more tables */
      Node       *whereClause;    /* qualifications */
      List       *returningList;    /* list of expressions to return */
+     WithClause *withClause;        /* WITH clause */
  } DeleteStmt;

  /* ----------------------
***************
*** 920,925 ****
--- 921,927 ----
      Node       *whereClause;    /* qualifications */
      List       *fromClause;        /* optional from clause for more tables */
      List       *returningList;    /* list of expressions to return */
+     WithClause *withClause;        /* WITH clause */
  } UpdateStmt;

  /* ----------------------
Index: src/test/regress/expected/with.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/with.out,v
retrieving revision 1.13
diff -c -r1.13 with.out
*** src/test/regress/expected/with.out    22 Nov 2009 05:20:41 -0000    1.13
--- src/test/regress/expected/with.out    13 Feb 2010 00:49:42 -0000
***************
*** 738,743 ****
--- 738,820 ----
  (54 rows)

  --
+ -- WITH on top of a DML statement
+ --
+ CREATE TEMPORARY TABLE y (a INTEGER);
+ INSERT INTO y SELECT generate_series(1, 10);
+ INSERT INTO y
+ WITH t AS (
+     SELECT a FROM y
+ )
+ SELECT a+20 FROM t RETURNING *;
+  a
+ ----
+  21
+  22
+  23
+  24
+  25
+  26
+  27
+  28
+  29
+  30
+ (10 rows)
+
+ WITH t AS (
+     SELECT a FROM y
+ )
+ UPDATE y SET a = y.a-10 FROM t WHERE y.a > 20 AND t.a = y.a RETURNING y.a;
+  a
+ ----
+  11
+  12
+  13
+  14
+  15
+  16
+  17
+  18
+  19
+  20
+ (10 rows)
+
+ WITH RECURSIVE t(a) AS (
+     SELECT 11
+     UNION ALL
+     SELECT a+1 FROM t WHERE a < 50
+ )
+ DELETE FROM y USING t WHERE t.a = y.a RETURNING y.a;
+  a
+ ----
+  11
+  12
+  13
+  14
+  15
+  16
+  17
+  18
+  19
+  20
+ (10 rows)
+
+ SELECT * FROM y;
+  a
+ ----
+   1
+   2
+   3
+   4
+   5
+   6
+   7
+   8
+   9
+  10
+ (10 rows)
+
+ --
  -- error cases
  --
  -- INTERSECT
***************
*** 774,781 ****
  ERROR:  recursive reference to query "x" must not appear within its non-recursive term
  LINE 1: WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1)
                                                ^
- CREATE TEMPORARY TABLE y (a INTEGER);
- INSERT INTO y SELECT generate_series(1, 10);
  -- LEFT JOIN
  WITH RECURSIVE x(n) AS (SELECT a FROM y WHERE a = 1
      UNION ALL
--- 851,856 ----
Index: src/test/regress/sql/with.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/with.sql,v
retrieving revision 1.11
diff -c -r1.11 with.sql
*** src/test/regress/sql/with.sql    9 Sep 2009 03:32:52 -0000    1.11
--- src/test/regress/sql/with.sql    13 Feb 2010 00:49:42 -0000
***************
*** 339,344 ****
--- 339,371 ----
   SELECT * FROM z;

  --
+ -- WITH on top of a DML statement
+ --
+
+ CREATE TEMPORARY TABLE y (a INTEGER);
+ INSERT INTO y SELECT generate_series(1, 10);
+
+ INSERT INTO y
+ WITH t AS (
+     SELECT a FROM y
+ )
+ SELECT a+20 FROM t RETURNING *;
+
+ WITH t AS (
+     SELECT a FROM y
+ )
+ UPDATE y SET a = y.a-10 FROM t WHERE y.a > 20 AND t.a = y.a RETURNING y.a;
+
+ WITH RECURSIVE t(a) AS (
+     SELECT 11
+     UNION ALL
+     SELECT a+1 FROM t WHERE a < 50
+ )
+ DELETE FROM y USING t WHERE t.a = y.a RETURNING y.a;
+
+ SELECT * FROM y;
+
+ --
  -- error cases
  --

***************
*** 364,372 ****
  WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1)
      SELECT * FROM x;

- CREATE TEMPORARY TABLE y (a INTEGER);
- INSERT INTO y SELECT generate_series(1, 10);
-
  -- LEFT JOIN

  WITH RECURSIVE x(n) AS (SELECT a FROM y WHERE a = 1
--- 391,396 ----

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

12 февраля 2010 г., 22:12:57

On Fri, Feb 12, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>>> Here's the patch.  It's the same as the stuff in writeable CTE patches,
>>> but I added regression tests.
>
>> Whoops.  The reference section in docs still had some traces of writeable
>> CTEs.  Updated patch attached.
>
> I spent some time playing with this but concluded that it's not
> committable.  I ran into two significant problems:
>
> 1. In an INSERT statement, it's already possible to attach a WITH to
> the contained statement, ie
>        INSERT INTO foo WITH ... SELECT ...
>        INSERT INTO foo WITH ... VALUES (...)
> and the patch wasn't doing anything nice with the case where one tries
> to put WITH at both places:
>        WITH ... INSERT INTO foo WITH ... VALUES (...)
> (The SELECT case actually works, mostly, but the VALUES one doesn't.)
> I thought about just concat'ing the two WITH lists but this introduces
> various strange corner cases; in particular when one list is marked
> RECURSIVE and the other isn't there's no way to avoid surprising
> behavior.  However, since the option for an inner WITH already does
> everything you would want, we could just forget about adding outer WITH
> for INSERT.  The attached modified patch does that.

That seems reasonable, though we might want to document that somehow
for posterity.

By the way, are we going to support WITH ... TABLE?

> 2. None of the cases play nicely with NEW or OLD references in a rule.
> For example,
>
> regression=#  create temp table x(f1 int);
> CREATE TABLE
> regression=#  create temp table y(f2 int);
> CREATE TABLE
> regression=# create rule r2 as on update to x do instead
> with t as (select old.*) update y set f2 = t.f1 from t;
> CREATE RULE
> regression=# update x set f1 = f1+1;
> ERROR:  bogus local parameter passed to WITH query
> regression=#
>
> I don't see any very nice way to fix this.  The problem is that the
> NEW or OLD reference is treated as though it were a relation of the
> main query (the UPDATE in this case), which is something that's not
> valid to reference in a WITH query.  I'm afraid that it might not
> be possible to fix it without significant changes in the way rules
> work (and consequent compatibility issues).
>
> We could possibly put in some hack to disallow OLD/NEW references in
> the WITH queries, but that got past my threshold of ugliness, so
> I'm not going to commit it without further discussion.

On the face of it it's not obvious to me why you wouldn't just do
that.  If it's not valid to reference them there, then just don't let
it happen (now comes the part where you tell me why it's not that
simple).

...Robert

Re: Writeable CTEs and empty relations

От

Tom Lane

Дата:

12 февраля 2010 г., 22:32:13

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Feb 12, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could possibly put in some hack to disallow OLD/NEW references in
>> the WITH queries, but that got past my threshold of ugliness, so
>> I'm not going to commit it without further discussion.

> On the face of it it's not obvious to me why you wouldn't just do
> that.  If it's not valid to reference them there, then just don't let
> it happen (now comes the part where you tell me why it's not that
> simple).

Well, there's no obvious-to-the-user reason why it shouldn't work.
If we hack it, then an example like I gave will give a "no such
table" error, and I'll bet long odds we'll get bug reports about it.

(Now maybe we could suppress the bug reports if we could get it to
print something more like "NEW can't be referenced in WITH", but
doing that seems significantly harder --- the way that I can see
to do it would be to not have NEW/OLD in the namespace while parsing
WITH, and that would lead to a pretty stupid error message.)
        regards, tom lane

Re: Writeable CTEs and empty relations

От

Robert Haas

Дата:

14 февраля 2010 г., 00:48:07

On Fri, Feb 12, 2010 at 9:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Feb 12, 2010 at 8:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> We could possibly put in some hack to disallow OLD/NEW references in
>>> the WITH queries, but that got past my threshold of ugliness, so
>>> I'm not going to commit it without further discussion.
>
>> On the face of it it's not obvious to me why you wouldn't just do
>> that.  If it's not valid to reference them there, then just don't let
>> it happen (now comes the part where you tell me why it's not that
>> simple).
>
> Well, there's no obvious-to-the-user reason why it shouldn't work.
> If we hack it, then an example like I gave will give a "no such
> table" error, and I'll bet long odds we'll get bug reports about it.
>
> (Now maybe we could suppress the bug reports if we could get it to
> print something more like "NEW can't be referenced in WITH", but
> doing that seems significantly harder --- the way that I can see
> to do it would be to not have NEW/OLD in the namespace while parsing
> WITH, and that would lead to a pretty stupid error message.)

Oh, I get it.  I thought you were saying that it was semantically
nonsensical, but now I think you're saying that there's some
implementation artifact that prevents us from supporting it.
Personally, I don't use NEW and OLD in UPDATE queries, so it wouldn't
bother me to just disallow them, but (1) I might be in the minority
and (2) even if I'm not, the ugliness factor is something to consider.So I'm not sure what the right thing to do is,
butwe need to make up 
our mind...

If we do decide to hold off on this, then we should probably have a
discussion of what the right way to fix the rule issues is for 9.1.
If there's no non-ugly way to do this, then we might as well go ahead
and do something ugly.  If there is a non-ugly way to fix it, we
should figure out what it is so that those who may be interested in
having this feature can see about coding it up.

...Robert

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Writeable CTEs and empty relations

Вложения

Вложения