Обсуждение: Confusing results with lateral references

Поиск
Список
Период
Сортировка

Confusing results with lateral references

От
Ashutosh Bapat
Дата:
Hi,

I am seeing different results with two queries which AFAIU have same semantics and hence are expected to give same results.

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | integer |

postgres=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | integer |

There's no data in the table to start with.

postgres=# insert into t1 values (1, 1);
postgres=# insert into t2 values (1, 1), (2, 2);

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where t1.val = t2.val for update of t1;

query waits here because of FOR UPDATE clause

Session 1
postgres=# commit;
COMMIT

Session 2 gives no rows
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where t1.val = t2.val for update of t1;
val | val2 | val | val2
-----+------+-----+------
(0 rows)


Reset values of t1
postgres=# update t1 set val = 1 where val2 = 1;
UPDATE 1

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2 for update of t1;

query waits here

Session 1
postgres=# commit;
COMMIT

Session 2 gives results of the query
postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2 for update of t1;
 val | val2 | val | val2
-----+------+-----+------
   2 |    1 |   1 |    1
(1 row)

AFAIU, both the queries

select * from t1, (select distinct val, val2 from t2) t2 where t1.val = t2.val for update of t1;

AND

select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2 for update of t1;

have same semantic and should give same results.

Is seeing different results expected behaviour?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Confusing results with lateral references

От
Ashutosh Bapat
Дата:
There's another seemingly wrong result, not with lateral, but with FOR UPDATE.

postgres=# select * from t1;
 val | val2
-----+------
   1 |    1
(1 row)

postgres=# select * from t2;
 val | val2
-----+------
   1 |    1
   2 |    2
   1 |    1
(3 rows)

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1 left join t2 on (t1.val = t2.val) for update of t1;

query waits

Session 1
postgres=# commit;
COMMIT


Session 2 query returns two rows
select * from t1 left join t2 on (t1.val = t2.val) for update of t1;
 val | val2 | val | val2
-----+------+-----+------
   2 |    1 |     |    
   2 |    1 |     |    
(2 rows)

It's confusing to see two rows from left join result when the table really has only a single row. Is this behaviour expected?

On Thu, Dec 3, 2015 at 3:49 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Hi,

I am seeing different results with two queries which AFAIU have same semantics and hence are expected to give same results.

postgres=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | integer |

postgres=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |
 val2   | integer |

There's no data in the table to start with.

postgres=# insert into t1 values (1, 1);
postgres=# insert into t2 values (1, 1), (2, 2);

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where t1.val = t2.val for update of t1;

query waits here because of FOR UPDATE clause

Session 1
postgres=# commit;
COMMIT

Session 2 gives no rows
postgres=# select * from t1, (select distinct val, val2 from t2) t2 where t1.val = t2.val for update of t1;
val | val2 | val | val2
-----+------+-----+------
(0 rows)


Reset values of t1
postgres=# update t1 set val = 1 where val2 = 1;
UPDATE 1

Session 1
postgres=# begin;
BEGIN
postgres=# update t1 set val = 2 where val2 = 1;
UPDATE 1

Session 2
postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2 for update of t1;

query waits here

Session 1
postgres=# commit;
COMMIT

Session 2 gives results of the query
postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2 for update of t1;
 val | val2 | val | val2
-----+------+-----+------
   2 |    1 |   1 |    1
(1 row)

AFAIU, both the queries

select * from t1, (select distinct val, val2 from t2) t2 where t1.val = t2.val for update of t1;

AND

select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2 for update of t1;

have same semantic and should give same results.

Is seeing different results expected behaviour?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Confusing results with lateral references

От
Amit Langote
Дата:
On 2015/12/03 21:26, Ashutosh Bapat wrote:
> Session 1
> postgres=# begin;
> BEGIN
> postgres=# update t1 set val = 2 where val2 = 1;
> UPDATE 1
> 
> Session 2
> postgres=# select * from t1 left join t2 on (t1.val = t2.val) for update of
> t1;
> 
> query waits
> 
> Session 1
> postgres=# commit;
> COMMIT
> 
> 
> Session 2 query returns two rows
> select * from t1 left join t2 on (t1.val = t2.val) for update of t1;
>  val | val2 | val | val2
> -----+------+-----+------
>    2 |    1 |     |
>    2 |    1 |     |
> (2 rows)
> 
> It's confusing to see two rows from left join result when the table really
> has only a single row. Is this behaviour expected?

Maybe it is. Because the other table still has two (1, 1) rows, LockRows's
subplan would still produce two rows in result, no?

Thanks,
Amit





Re: Confusing results with lateral references

От
Ashutosh Bapat
Дата:


On Fri, Dec 4, 2015 at 10:58 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2015/12/03 21:26, Ashutosh Bapat wrote:
> Session 1
> postgres=# begin;
> BEGIN
> postgres=# update t1 set val = 2 where val2 = 1;
> UPDATE 1
>
> Session 2
> postgres=# select * from t1 left join t2 on (t1.val = t2.val) for update of
> t1;
>
> query waits
>
> Session 1
> postgres=# commit;
> COMMIT
>
>
> Session 2 query returns two rows
> select * from t1 left join t2 on (t1.val = t2.val) for update of t1;
>  val | val2 | val | val2
> -----+------+-----+------
>    2 |    1 |     |
>    2 |    1 |     |
> (2 rows)
>
> It's confusing to see two rows from left join result when the table really
> has only a single row. Is this behaviour expected?

Maybe it is. Because the other table still has two (1, 1) rows, LockRows's
subplan would still produce two rows in result, no?


Documentation at http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html says
(T1) LEFT OUTER JOIN (T2)

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

So there should be only one row for each row of outer table that didn't join with the inner table. IOW a join with no joining rows should have same number of rows as outer table.
 
Thanks,
Amit





--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Confusing results with lateral references

От
Tom Lane
Дата:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> I am seeing different results with two queries which AFAIU have same
> semantics and hence are expected to give same results.

> postgres=# select * from t1, (select distinct val, val2 from t2) t2ss where t1.val = t2ss.val for update of t1;

> postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2ss for update of
t1;

(I renamed your inline sub-selects to avoid confusion between them and the
table t2.)

I'm skeptical that those should be claimed to have identical semantics.

In the first example, after we've found the join row (1,1,1,1), we block
to see if the pending update on t1 will commit.  After it does, we recheck
the join condition using the updated row from t1 (and the original row
from t2ss).  The condition fails, so the updated row is not output.

The same thing happens in the second example, ie, we consider the updated
row from t1 and the non-updated row from t2ss (NOT t2).  There are no join
conditions to recheck (in the outer query level), so the row passes, and
we output it.

If you'd allowed the FOR UPDATE to propagate into the sub-select, then the
sub-select's conditions would be considered as needing rechecks ... of
course, that would require removing the DISTINCT.

This example does show that a lateral reference to a FOR UPDATE table from
a non-FOR-UPDATE subselect has confusing behavior.  Maybe we ought to
forbid that.
        regards, tom lane



Re: Confusing results with lateral references

От
Tom Lane
Дата:
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> There's another seemingly wrong result, not with lateral, but with FOR
> UPDATE.

[ shrug... ]  You're getting the post-update images of the two join
rows that would have been reported without FOR UPDATE.  This one is
definitely not a bug.
        regards, tom lane



Re: Confusing results with lateral references

От
Robert Haas
Дата:
On Fri, Dec 4, 2015 at 10:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>> I am seeing different results with two queries which AFAIU have same
>> semantics and hence are expected to give same results.
>
>> postgres=# select * from t1, (select distinct val, val2 from t2) t2ss where t1.val = t2ss.val for update of t1;
>
>> postgres=# select * from t1, lateral (select distinct val, val2 from t2 where t2.val = t1.val) t2ss for update of
t1;
>
> (I renamed your inline sub-selects to avoid confusion between them and the
> table t2.)
>
> I'm skeptical that those should be claimed to have identical semantics.
>
> In the first example, after we've found the join row (1,1,1,1), we block
> to see if the pending update on t1 will commit.  After it does, we recheck
> the join condition using the updated row from t1 (and the original row
> from t2ss).  The condition fails, so the updated row is not output.

Check.

> The same thing happens in the second example, ie, we consider the updated
> row from t1 and the non-updated row from t2ss (NOT t2).  There are no join
> conditions to recheck (in the outer query level), so the row passes, and
> we output it.

What's surprising is that t2.val = t1.val isn't rechecked here.  I
think that's not really possible, because of the DISTINCT operation,
which prevents us from identifying a single row from t2 that accounts
for the subquery's output row.  Not sure whether it would work without
the DISTINCT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company