Обсуждение: exists

Поиск
Список
Период
Сортировка

exists

От
Joseph Shraibman
Дата:
I want to select a boolean if there exists a row in another table that matches this one. 
So I did select ..., (select count(*) from table2 where ...) > 0 ...
but that count(*) was taking forever.  I know there is a better way to do it, but whenever 
I try to use EXISTS I get a syntax error.  What is the proper way?

-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com



Re: exists

От
Stephan Szabo
Дата:
On Mon, 20 Aug 2001, Joseph Shraibman wrote:

> I want to select a boolean if there exists a row in another table that matches this one. 
> So I did select ..., (select count(*) from table2 where ...) > 0 ...
> but that count(*) was taking forever.  I know there is a better way to do it, but whenever 
> I try to use EXISTS I get a syntax error.  What is the proper way?

Hmm, on current sources I can do:
select ..., exists (select * from table2 where ...) from table1;

I don't know if that's new though...



Re: exists

От
Joseph Shraibman
Дата:
Thank you, I was missing the parens.

If I do an explain I see:

->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22 rows=1363 width=12)


even if I put a limit 1 on the select.  Why is that?

Stephan Szabo wrote:
> On Mon, 20 Aug 2001, Joseph Shraibman wrote:
> 
> 
>>I want to select a boolean if there exists a row in another table that matches this one. 
>>So I did select ..., (select count(*) from table2 where ...) > 0 ...
>>but that count(*) was taking forever.  I know there is a better way to do it, but whenever 
>>I try to use EXISTS I get a syntax error.  What is the proper way?
>>
> 
> Hmm, on current sources I can do:
> select ..., exists (select * from table2 where ...) from table1;
> 
> I don't know if that's new though...
> 


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com



Re: exists

От
Stephan Szabo
Дата:
On Tue, 21 Aug 2001, Joseph Shraibman wrote:

> Thank you, I was missing the parens.
> 
> If I do an explain I see:
> 
> ->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22 rows=1363 width=12)
> 
> 
> even if I put a limit 1 on the select.  Why is that?

Is that the inner query (on the exists) or the entire explain?

I guess it'd be useful to see the whole query and explain and maybe
schema.

> Stephan Szabo wrote:
> > On Mon, 20 Aug 2001, Joseph Shraibman wrote:
> > 
> > 
> >>I want to select a boolean if there exists a row in another table that matches this one. 
> >>So I did select ..., (select count(*) from table2 where ...) > 0 ...
> >>but that count(*) was taking forever.  I know there is a better way to do it, but whenever 
> >>I try to use EXISTS I get a syntax error.  What is the proper way?
> >>
> > 
> > Hmm, on current sources I can do:
> > select ..., exists (select * from table2 where ...) from table1;
> > 
> > I don't know if that's new though...



Re: exists

От
Joseph Shraibman
Дата:
Stephan Szabo wrote:

>>Limit  (cost=48.39..48.39 rows=1 width=70)
>>   ->  Sort  (cost=48.39..48.39 rows=2 width=70)
>>         ->  Hash Join  (cost=18.46..48.38 rows=2 width=70)
>>               ->  Index Scan using u_p_key on u  (cost=0.00..27.66 rows=48 width=28)
>>               ->  Hash  (cost=18.39..18.39 rows=28 width=42)
>>                     ->  Seq Scan on d  (cost=0.00..18.39 rows=28 width=42)
>>               SubPlan
>>                 ->  Nested Loop  (cost=0.00..4.04 rows=1 width=20)
>>                       ->  Index Scan using a_pkey on a  (cost=0.00..2.01 rows=1 width=4)
>>                       ->  Index Scan using p_pkey on pu  (cost=0.00..2.02 rows=1 width=16)
>>                 ->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22 rows=1363 
>>width=44)
>>
> 
> At least, what was the query that generated this and is it running
> slowly or otherwise giving problems?  The total explain doesn't seem
> unreasonable to my relatively untrained eyes in the absense of knowing the
> query :)
> 

Well the total cost should be at least as big as the sub-costs, no?  Doesn't that seem 
strange?


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com



Re: exists

От
Joseph Shraibman
Дата:

Stephan Szabo wrote:
> On Tue, 21 Aug 2001, Joseph Shraibman wrote:
> 
> 
>>Thank you, I was missing the parens.
>>
>>If I do an explain I see:
>>
>>->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22 rows=1363 width=12)
>>
>>
>>even if I put a limit 1 on the select.  Why is that?
>>
> 
> Is that the inner query (on the exists) or the entire explain?

Just the inner query

> 
> I guess it'd be useful to see the whole query and explain and maybe
> schema.

That's big and complicated.  Can you reproduce this somewhere else?

Here is the whole explain:

Limit  (cost=48.39..48.39 rows=1 width=70)  ->  Sort  (cost=48.39..48.39 rows=2 width=70)        ->  Hash Join
(cost=18.46..48.38rows=2 width=70)              ->  Index Scan using u_p_key on u  (cost=0.00..27.66 rows=48 width=28)
           ->  Hash  (cost=18.39..18.39 rows=28 width=42)                    ->  Seq Scan on d  (cost=0.00..18.39
rows=28width=42)              SubPlan                ->  Nested Loop  (cost=0.00..4.04 rows=1 width=20)
    ->  Index Scan using a_pkey on a  (cost=0.00..2.01 rows=1 width=4)                      ->  Index Scan using p_pkey
onpu  (cost=0.00..2.02 rows=1 width=16)                ->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22
rows=1363
 
width=44)


I tried to make a simple test case:

create table ta (a int);
create table tb (c int, b int);
insert into ta values (2);
insert into ta values (4);
insert into ta values (6);
insert into ta values (8);
insert into ta values (10);
insert into tb values (1,1);
insert into tb values (2,2);
insert into tb values (3,3);
insert into tb values (4,4);
vacuum analyze;

select c, b , exists(select a from ta where a = c) from tb;
explain select c, b , exists(select a from ta where a = c) from tb;

drop table ta;
drop table tb;


... but the data is so small it uses a seq scan:
Seq Scan on tb  (cost=0.00..1.04 rows=4 width=8)  SubPlan    ->  Seq Scan on ta  (cost=0.00..1.06 rows=1 width=4)

> 
> 
>>Stephan Szabo wrote:
>>
>>>On Mon, 20 Aug 2001, Joseph Shraibman wrote:
>>>
>>>
>>>
>>>>I want to select a boolean if there exists a row in another table that matches this one. 
>>>>So I did select ..., (select count(*) from table2 where ...) > 0 ...
>>>>but that count(*) was taking forever.  I know there is a better way to do it, but whenever 
>>>>I try to use EXISTS I get a syntax error.  What is the proper way?
>>>>
>>>>
>>>Hmm, on current sources I can do:
>>>select ..., exists (select * from table2 where ...) from table1;
>>>
>>>I don't know if that's new though...
>>>


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com



Re: exists

От
Stephan Szabo
Дата:
> Stephan Szabo wrote:
> > On Tue, 21 Aug 2001, Joseph Shraibman wrote:
> > 
> > 
> >>Thank you, I was missing the parens.
> >>
> >>If I do an explain I see:
> >>
> >>->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22 rows=1363 width=12)
> >>
> >>
> >>even if I put a limit 1 on the select.  Why is that?
> >>
> > 
> > Is that the inner query (on the exists) or the entire explain?
> 
> Just the inner query
> 
> > 
> > I guess it'd be useful to see the whole query and explain and maybe
> > schema.
> 
> That's big and complicated.  Can you reproduce this somewhere else?

I've only tried small data sets so nothing terribly meaningful.
> 
> Here is the whole explain:
> 
> Limit  (cost=48.39..48.39 rows=1 width=70)
>    ->  Sort  (cost=48.39..48.39 rows=2 width=70)
>          ->  Hash Join  (cost=18.46..48.38 rows=2 width=70)
>                ->  Index Scan using u_p_key on u  (cost=0.00..27.66 rows=48 width=28)
>                ->  Hash  (cost=18.39..18.39 rows=28 width=42)
>                      ->  Seq Scan on d  (cost=0.00..18.39 rows=28 width=42)
>                SubPlan
>                  ->  Nested Loop  (cost=0.00..4.04 rows=1 width=20)
>                        ->  Index Scan using a_pkey on a  (cost=0.00..2.01 rows=1 width=4)
>                        ->  Index Scan using p_pkey on pu  (cost=0.00..2.02 rows=1 width=16)
>                  ->  Index Scan using m_u_and_p_key on m  (cost=0.00..3035.22 rows=1363 
> width=44)

At least, what was the query that generated this and is it running
slowly or otherwise giving problems?  The total explain doesn't seem
unreasonable to my relatively untrained eyes in the absense of knowing the
query :)



Re: exists

От
Tom Lane
Дата:
Joseph Shraibman <jks@selectacast.net> writes:
> Well the total cost should be at least as big as the sub-costs, no?

Not if the sub-plan in question is for an EXISTS.  The sub-plan cost
is stated in terms of cost to retrieve all rows --- but the outer level
EXISTS isn't going to retrieve all rows, it's going to stop as soon as
it gets even one.  So the cost estimate that propagates up is
3035.22/1363.

BTW, this sort of consideration is why 7.0 and later state plan costs
in terms of startup and total cost: if a plan has a nontrivial startup
cost, just dividing total cost by number of tuples isn't a good way to
estimate the costs of partial retrieval.  Really the cost estimate is
figured as
startup_cost + (total_cost-startup_cost) * tuples_retrieved/total_tuples.
This is important for EXISTS, LIMIT, and maybe a couple other things.
Without this, we'd not be bright enough to choose fast-startup plans
over least-total-cost plans in cases where fast-startup is what you want.
        regards, tom lane


Re: exists

От
Joseph Shraibman
Дата:
Then why does the explain say rows=1363 ?

I don't mean to nitpick here, but maybe this is the symptom of a larger problem.

Tom Lane wrote:
> Joseph Shraibman <jks@selectacast.net> writes:
> 
>>Well the total cost should be at least as big as the sub-costs, no?
>>
> 
> Not if the sub-plan in question is for an EXISTS.  The sub-plan cost
> is stated in terms of cost to retrieve all rows --- but the outer level
> EXISTS isn't going to retrieve all rows, it's going to stop as soon as
> it gets even one.  So the cost estimate that propagates up is
> 3035.22/1363.
> 
> BTW, this sort of consideration is why 7.0 and later state plan costs
> in terms of startup and total cost: if a plan has a nontrivial startup
> cost, just dividing total cost by number of tuples isn't a good way to
> estimate the costs of partial retrieval.  Really the cost estimate is
> figured as
> startup_cost + (total_cost-startup_cost) * tuples_retrieved/total_tuples.
> This is important for EXISTS, LIMIT, and maybe a couple other things.
> Without this, we'd not be bright enough to choose fast-startup plans
> over least-total-cost plans in cases where fast-startup is what you want.
> 
>             regards, tom lane
> 


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com



Re: exists

От
Tom Lane
Дата:
Joseph Shraibman <jks@selectacast.net> writes:
> Then why does the explain say rows=1363 ?

That's the estimate of how many rows the inner SELECT would return, if
left free to return them all.  You should get the same row count
estimate (though quite possibly a different plan) if you just do
an EXPLAIN of the sub-select that you have inside EXISTS.  Of course you
can't easily do that if the sub-select has outer references, but
consider this simplified example from the regression database:

regression=# explain select * from tenk1;
NOTICE:  QUERY PLAN:

Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=148)

EXPLAIN
regression=# explain select * from tenk1 where unique1 > 9000;
NOTICE:  QUERY PLAN:

Seq Scan on tenk1  (cost=0.00..358.00 rows=1045 width=148)

EXPLAIN
regression=# explain select unique2,
regression-# exists(select * from tenk1 where unique1 > 9000),
regression-# unique1 from tenk1;
NOTICE:  QUERY PLAN:

Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=8) InitPlan   ->  Seq Scan on tenk1  (cost=0.00..358.00
rows=1045width=148)
 

EXPLAIN
regression=# explain select unique2,
regression-# exists(select * from tenk1 where unique1 > out1.unique2),
regression-# unique1 from tenk1 out1;
NOTICE:  QUERY PLAN:

Seq Scan on tenk1 out1  (cost=0.00..333.00 rows=10000 width=8) SubPlan   ->  Seq Scan on tenk1  (cost=0.00..358.00
rows=3333width=148)
 

EXPLAIN

In the last case the rows count has changed from a statistics-driven
estimate to a default estimate because the planner doesn't have any
idea what out1.unique2 will be on any given execution of the subplan.

> I don't mean to nitpick here, but maybe this is the symptom of a
> larger problem.

The only "larger problem" I see here is that the planner doesn't bother
to estimate the costs of expensive expressions in the SELECT target
list; you can see by comparing the above examples that the total cost
at the top level doesn't include the obviously-nontrivial cost of
evaluating the EXIST expressions.  This however is not a bug, but a
deliberate simplification to save planning time.  The planner cannot
alter the number of times the SELECT output expressions are evaluated
(at least not if it's delivering the right answer) so there's no point
in worrying whether they are expensive or cheap.  But it would include
the costs of an EXISTS appearing in WHERE.
        regards, tom lane


Re: exists

От
Joseph Shraibman
Дата:
Why does explain show more than one row, even if there is a LIMIT = 1?



-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com



Re: exists

От
Tom Lane
Дата:
Joseph Shraibman <jks@selectacast.net> writes:
> Why does explain show more than one row, even if there is a LIMIT = 1?

What version are you running?  I get results like

regression=# explain select * from tenk1 limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..0.03 rows=1 width=148) ->  Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=148)

EXPLAIN

which seems at least moderately self-explanatory.
        regards, tom lane


Re: exists

От
Joseph Shraibman
Дата:
I'm running 7.1.3.  What does 'rows=10000' mean?   The number of rows returned or the 
number postgres has to look through?

Tom Lane wrote:
> Joseph Shraibman <jks@selectacast.net> writes:
> 
>>Why does explain show more than one row, even if there is a LIMIT = 1?
>>
> 
> What version are you running?  I get results like
> 
> regression=# explain select * from tenk1 limit 1;
> NOTICE:  QUERY PLAN:
> 
> Limit  (cost=0.00..0.03 rows=1 width=148)
>   ->  Seq Scan on tenk1  (cost=0.00..333.00 rows=10000 width=148)
> 
> EXPLAIN
> 
> which seems at least moderately self-explanatory.
> 
>             regards, tom lane
> 


-- 
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com