Re: Reference to parent query from ANY sublink

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Reference to parent query from ANY sublink
Дата
Msg-id 10795.1386796541@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Reference to parent query from ANY sublink  (Kevin Grittner <kgrittn@ymail.com>)
Ответы Re: Reference to parent query from ANY sublink
Re: Reference to parent query from ANY sublink
Список pgsql-hackers
Kevin Grittner <kgrittn@ymail.com> writes:
> Kevin Grittner <kgrittn@ymail.com> wrote:
>> I applied it to master and ran the regression tests, and one of
>> the subselect tests failed.
>> 
>> This query:
>> 
>> SELECT '' AS six, f1 AS "Correlated Field", f3 AS "Second
>> Field"
>> �� FROM SUBSELECT_TBL upper
>> �� WHERE f1 IN
>> ���� (SELECT f2 FROM SUBSELECT_TBL WHERE CAST(upper.f2 AS float) = f3);

>> [ ... ] during the `make check` or `make install-check` [ ... ]
>> is missing the last two rows.� Oddly, if I go into the database
>> later and try it, the rows show up.� It's not immediately
>> apparent to me what's wrong.

> Using the v2 patch, with the default statistics from table
> creation, the query modified with an alias of "lower" for the
> second reference, just for clarity, yields a plan which generates
> incorrect results:

> �Hash Join� (cost=37.12..80.40 rows=442 width=12) (actual time=0.059..0.064 rows=3 loops=1)
> �� Hash Cond: (((upper.f2)::double precision = lower.f3) AND (upper.f1 = lower.f2))
> �� ->� Seq Scan on subselect_tbl upper� (cost=0.00..27.70 rows=1770 width=16) (actual time=0.006..0.007 rows=8
loops=1)
> �� ->� Hash� (cost=34.12..34.12 rows=200 width=12) (actual time=0.020..0.020 rows=5 loops=1)
> �������� Buckets: 1024� Batches: 1� Memory Usage: 1kB
> �������� ->� HashAggregate� (cost=32.12..34.12 rows=200 width=12) (actual time=0.014..0.018 rows=6 loops=1)
> �������������� ->� Seq Scan on subselect_tbl lower� (cost=0.00..27.70 rows=1770 width=12) (actual time=0.002..0.004
rows=8loops=1)
 
> �Total runtime: 0.111 ms

FWIW, that plan isn't obviously wrong; if it is broken, most likely the
reason is that the HashAggregate is incorrectly unique-ifying the lower
table.  (Unfortunately, EXPLAIN doesn't show enough about the HashAgg
to know what it's doing exactly.)  The given query is, I think, in
principle equivalent to
SELECT ... FROM SUBSELECT_TBL upper WHERE (f1, f2::float) IN   (SELECT f2, f3 FROM SUBSELECT_TBL);

and if you ask unmodified HEAD to plan that you get
Hash Join  (cost=41.55..84.83 rows=442 width=16)  Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double
precision= subselect_tbl.f3))  ->  Seq Scan on subselect_tbl upper  (cost=0.00..27.70 rows=1770 width=16)  ->  Hash
(cost=38.55..38.55rows=200 width=12)        ->  HashAggregate  (cost=36.55..38.55 rows=200 width=12)              ->
SeqScan on subselect_tbl  (cost=0.00..27.70 rows=1770 width=12)
 

which is the same thing at the visible level of detail ... but this
version computes the correct result.  The cost of the HashAggregate is
estimated higher, though, which suggests that maybe it's distinct'ing on
two columns where the bogus plan only does one.

Not sure about where Antonin's patch is going off the rails.  I suspect
it's too simple somehow, but it's also possible that it's OK and the
real issue is some previously undetected bug in LATERAL processing.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: autovacuum_work_mem
Следующее
От: Gavin Flower
Дата:
Сообщение: Re: ANALYZE sampling is too good