Обсуждение: Joins of data-modifying CTE with the target table

Поиск
Список
Период
Сортировка

Joins of data-modifying CTE with the target table

От
Alex Bolenok
Дата:
Hi list,

This popped up yesterday during a discussion at the Boston PostgreSQL group meetup, and Jesper Pedersen had advised that I post it here.

Imagine this setup:

CREATE TABLE IF NOT EXISTS mytable (id BIGSERIAL PRIMARY KEY, value TEXT NOT NULL);

WITH    insert_cte AS
        (
        INSERT
        INTO    mytable (value)
        VALUES  ('test')
        RETURNING
                *
        )
SELECT  mytable.*
FROM    insert_cte
JOIN    mytable
USING   (id)
;

This query will return nothing, even though people would expect it to return the newly inserted record.

This is just a minimally reproducible example, in which you can easily work around the problem just by getting rid of the join to mytable. But during my consulting career, I've seen people try putting together more complex queries using the same pattern, and this always comes as a surprise.

I get why it's not working (because the statement is not allowed to see the tuples with its own cmin), but I was wondering if it was worth it at least to spell it out explicitly in the documentation.

Right now the documentation says:


RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query

which I don't think is covering the JOIN issue (after all, I am using the RETURNING clause to communicate with the main query).

Can we please add this example to the documentation? I can do the wording if that's something worth adding.

Thank you! 

Re: Joins of data-modifying CTE with the target table

От
Tom Lane
Дата:
Alex Bolenok <quassnoi@gmail.com> writes:
> I get why it's not working (because the statement is not allowed to see the
> tuples with its own cmin), but I was wondering if it was worth it at least
> to spell it out explicitly in the documentation.

What's not explicit about this?

    The sub-statements in WITH are executed concurrently with each other
    and with the main query. Therefore, when using data-modifying
    statements in WITH, the order in which the specified updates actually
    happen is unpredictable. All the statements are executed with the same
    snapshot (see Chapter 13), so they cannot “see” one another's effects
    on the target tables.

            regards, tom lane



Re: Joins of data-modifying CTE with the target table

От
Adrian Klaver
Дата:
On 4/19/23 08:37, Alex Bolenok wrote:
> Hi list,
> 
> This popped up yesterday during a discussion at the Boston PostgreSQL 
> group meetup, and Jesper Pedersen had advised that I post it here.
> 
> Imagine this setup:
> 
> CREATE TABLE IF NOT EXISTS mytable (id BIGSERIAL PRIMARY KEY, value TEXT 
> NOT NULL);
> 
> WITH    insert_cte AS
>          (
>          INSERT
>          INTO    mytable (value)
>          VALUES  ('test')
>          RETURNING
>                  *
>          )
> SELECT  mytable.*
> FROM    insert_cte
> JOIN    mytable
> USING   (id);
> 
> This query will return nothing, even though people would expect it to 
> return the newly inserted record.
> 
> This is just a minimally reproducible example, in which you can easily 
> work around the problem just by getting rid of the join to mytable. But 
> during my consulting career, I've seen people try putting together more 
> complex queries using the same pattern, and this always comes as a surprise.
> 
> I get why it's not working (because the statement is not allowed to see 
> the tuples with its own cmin), but I was wondering if it was worth it at 
> least to spell it out explicitly in the documentation.
> 
> Right now the documentation says:
> 
> https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING
<https://www.postgresql.org/docs/15/queries-with.html#QUERIES-WITH-MODIFYING>
> 
>     RETURNING data is the only way to communicate changes between
>     different WITH sub-statements and the main query
> 
> 
> which I don't think is covering the JOIN issue (after all, I am using 
> the RETURNING clause to communicate with the main query).
> 
> Can we please add this example to the documentation? I can do the 
> wording if that's something worth adding.

To add to Tom's post.

"
Data-modifying statements in WITH usually have RETURNING clauses (see 
Section 6.4), as shown in the example above. It is the output of the 
RETURNING clause, not the target table of the data-modifying statement, 
that forms the temporary table that can be referred to by the rest of 
the query. If a data-modifying statement in WITH lacks a RETURNING 
clause, then it forms no temporary table and cannot be referred to in 
the rest of the query. Such a statement will be executed nonetheless. A 
not-particularly-useful example is:

...

The sub-statements in WITH are executed concurrently with each other and 
with the main query. Therefore, when using data-modifying statements in 
WITH, the order in which the specified updates actually happen is 
unpredictable. All the statements are executed with the same snapshot 
(see Chapter 13), so they cannot “see” one another's effects on the 
target tables. This alleviates the effects of the unpredictability of 
the actual order of row updates, and means that RETURNING data is the 
only way to communicate changes between different WITH sub-statements 
and the main query. An example of this is that in

WITH t AS (
     UPDATE products SET price = price * 1.05
     RETURNING *
)
SELECT * FROM products;

the outer SELECT would return the original prices before the action of 
the UPDATE, while in

WITH t AS (
     UPDATE products SET price = price * 1.05
     RETURNING *
)
SELECT * FROM t;

the outer SELECT would return the updated data.
"

So the RETURNING temp table is the only thing you have to work on.

> 
> Thank you!

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Joins of data-modifying CTE with the target table

От
Alex Bolenok
Дата:
To reiterate, I (I personally) get why it's not working, but the documentation is something that is intended for people who don't.

Technically, anyone can deduce it from reading the documentation thoroughly enough, but do we need the documentation to be as terse as possible?

To answer your question, by "explicitly" I mean mentioning this very pattern, where you join the returned id with the target table. People often try to use this pattern for queries like "add an item to the order in a CTE, select the order total in the main query", and often don't notice that the order total doesn't include the new item until it hits production.

ср, 19 апр. 2023 г. в 11:46, Tom Lane <tgl@sss.pgh.pa.us>:
Alex Bolenok <quassnoi@gmail.com> writes:
> I get why it's not working (because the statement is not allowed to see the
> tuples with its own cmin), but I was wondering if it was worth it at least
> to spell it out explicitly in the documentation.

What's not explicit about this?

    The sub-statements in WITH are executed concurrently with each other
    and with the main query. Therefore, when using data-modifying
    statements in WITH, the order in which the specified updates actually
    happen is unpredictable. All the statements are executed with the same
    snapshot (see Chapter 13), so they cannot “see” one another's effects
    on the target tables.

                        regards, tom lane

Re: Joins of data-modifying CTE with the target table

От
Adrian Klaver
Дата:
On 4/19/23 09:09, Alex Bolenok wrote:
> To reiterate, I (I personally) get why it's not working, but the 
> documentation is something that is intended for people who don't.
> 
> Technically, anyone can deduce it from reading the documentation 
> thoroughly enough, but do we need the documentation to be as terse 
> as possible?

Not so much terse as possible as not as verbose as possible. The 
difference being that documenting every possible usage would turn the 
page into an eye glazing experience. At that point folks would really 
just move on. The documentation expectations(as I see it) are that: 1) 
People can work from basic principles to specific usage. 2) They also 
actually test/verify what they run.

What you are after is more on the lines of a separate enhanced tutorial. 
That has been discussed on this list and elsewhere. So far it has not 
progressed beyond discussion.


> 
> To answer your question, by "explicitly" I mean mentioning this very 
> pattern, where you join the returned id with the target table. People 
> often try to use this pattern for queries like "add an item to the order 
> in a CTE, select the order total in the main query", and often don't 
> notice that the order total doesn't include the new item until it hits 
> production.
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Joins of data-modifying CTE with the target table

От
Tom Lane
Дата:
Alex Bolenok <quassnoi@gmail.com> writes:
> To answer your question, by "explicitly" I mean mentioning this very
> pattern, where you join the returned id with the target table.

I do not see why this existing example isn't sufficient:

    ... An example of this is that in

    WITH t AS (
        UPDATE products SET price = price * 1.05
        RETURNING *
    )
    SELECT * FROM products;

    the outer SELECT would return the original prices before the action of
    the UPDATE, ...

Yeah, it lacks a JOIN and it's an UPDATE not an INSERT.  But we can't
provide an example to precisely match every possible mistake.  As
Adrian said, brevity in the docs does have value.

            regards, tom lane