Обсуждение: What does explain show ?

Поиск
Список
Период
Сортировка

What does explain show ?

От
"Hiroshi Inoue"
Дата:
Hi all,

I have a question about "explain" output.
Could someone teach me ?

Let a and b tables such that
 create table a ( int4    pkey  primary key, .... );
 create table b ( int4    key1, int2    key2, ...., primary key (key1,key2) );

Table a has 15905 rows and table b has 25905 rows.


For the following query
 select a.pkey, b.key2 from a, b where b.key1 = 1369 and     a.pkey = b.key1;

"explain" shows

NOTICE:  QUERY PLAN:

Nested Loop  (cost=6.19 rows=3 width=10) ->  Index Scan using b_pkey on b on b  (cost=2.09 rows=2 width=6) ->  Index
Scanusing a_pkey on a on a  (cost=2.05 rows=15905 width=4)
 

What does "rows=15905" of InnerPlan mean ?
Is "rows=3" of Nested Loop irrelevant to "rows=15905" ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] What does explain show ?

От
Bruce Momjian
Дата:
> Table a has 15905 rows and table b has 25905 rows.
> 
> 
> For the following query
> 
>   select a.pkey, b.key2 from a, b
>   where b.key1 = 1369
>   and     a.pkey = b.key1;
> 
> "explain" shows
> 
> NOTICE:  QUERY PLAN:
> 
> Nested Loop  (cost=6.19 rows=3 width=10)
>   ->  Index Scan using b_pkey on b on b  (cost=2.09 rows=2 width=6)
>   ->  Index Scan using a_pkey on a on a  (cost=2.05 rows=15905 width=4)
> 
> What does "rows=15905" of InnerPlan mean ?
> Is "rows=3" of Nested Loop irrelevant to "rows=15905" ?

It means it thinks it is going to access X rows in that pass, but end
up with 3 joined rows as a result of the nested loop.  It is only an
estimate, based on table size and column uniqueness from vacuum analyze.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: [HACKERS] What does explain show ?

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Wednesday, July 14, 1999 11:57 AM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] What does explain show ?
>
>
> > Table a has 15905 rows and table b has 25905 rows.
> >
> >
> > For the following query
> >
> >   select a.pkey, b.key2 from a, b
> >   where b.key1 = 1369
> >   and     a.pkey = b.key1;
> >
> > "explain" shows
> >
> > NOTICE:  QUERY PLAN:
> >
> > Nested Loop  (cost=6.19 rows=3 width=10)
> >   ->  Index Scan using b_pkey on b on b  (cost=2.09 rows=2 width=6)
> >   ->  Index Scan using a_pkey on a on a  (cost=2.05 rows=15905 width=4)
> >
> > What does "rows=15905" of InnerPlan mean ?
> > Is "rows=3" of Nested Loop irrelevant to "rows=15905" ?
>
> It means it thinks it is going to access X rows in that pass, but end
> up with 3 joined rows as a result of the nested loop.  It is only an
> estimate, based on table size and column uniqueness from vacuum analyze.
>

Hmmm,I couldn't understand where does "rows=15905" come from.
Shouldn't "rows" of InnerPlan be 1 ?
Is the caluculation "rows of Nested loop = rows of OuterPlan * rows of
InnerPlan"
wrong ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp




Re: [HACKERS] What does explain show ?

От
Tom Lane
Дата:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>>>> select a.pkey, b.key2 from a, b
>>>> where b.key1 = 1369
>>>> and     a.pkey = b.key1;
>>>> 
>>>> NOTICE:  QUERY PLAN:
>>>> 
>>>> Nested Loop  (cost=6.19 rows=3 width=10)
>>>>   -> Index Scan using b_pkey on b on b  (cost=2.09 rows=2 width=6)
>>>>   -> Index Scan using a_pkey on a on a  (cost=2.05 rows=15905 width=4)

> Hmmm,I couldn't understand where does "rows=15905" come from.
> Shouldn't "rows" of InnerPlan be 1 ?

No, because that number is formed by considering just the available
restriction clauses on table A, and there aren't any --- so the system
uses the whole size of A as the rows count.  The fact that we are
joining against another table should be taken into account at the
next level up, ie the nested loop.

Actually the number that looks fishy to me for the innerplan is the cost
--- if the system thinks it will be visiting all 15905 rows each time,
it should be estimating a cost of more than 2.05 to do it.

> Is the caluculation "rows of Nested loop = rows of OuterPlan * rows of
> InnerPlan" wrong ?

Careful --- rows produced and cost are quite different things.  The
cost estimate for a nestloop is "cost of outerplan + rows of outerplan *
cost of innerplan", but we don't necessarily expect to get as many rows
out as the product of the row counts.  Typically, it'd be lower due to
join selectivity.  Above you see only 3 rows out, which is not too bad
a guess, certainly better than 2*15905 would be.

You raise a good point though.  That cost estimate is reasonable if
the inner plan is a sequential scan, since then the system will actually
have to visit each inner tuple on each iteration.  But if the inner plan
is an index scan then the outer tuple's key value could be used as an
index constraint, reducing the number of tuples visited by a lot.
I am not sure whether the executor is smart enough to do that --- there
are comments in nodeNestloop suggesting that it is, but I haven't traced
through it for sure.  I am fairly sure that the optimizer isn't figuring
the costs correctly, if that is how it's done :-(
        regards, tom lane


RE: [HACKERS] What does explain show ?

От
"Hiroshi Inoue"
Дата:
>
> > Is the caluculation "rows of Nested loop = rows of OuterPlan * rows of
> > InnerPlan" wrong ?
>
> Careful --- rows produced and cost are quite different things.  The
> cost estimate for a nestloop is "cost of outerplan + rows of outerplan *
> cost of innerplan", but we don't necessarily expect to get as many rows
> out as the product of the row counts.  Typically, it'd be lower due to
> join selectivity.  Above you see only 3 rows out, which is not too bad
> a guess, certainly better than 2*15905 would be.
>

I see. rows of Join = rows of outerplan * rows of innerplan * "join
selectity".
and "join selectivity" is calcutated by eqjoinsel() etc.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] What does explain show ?

От
Tom Lane
Дата:
Quite some time ago, "Hiroshi Inoue" <Inoue@tpf.co.jp> wrote:
> I have a question about "explain" output.
> Table a has 15905 rows and table b has 25905 rows.
> For the following query
>   select a.pkey, b.key2 from a, b
>   where b.key1 = 1369
>   and     a.pkey = b.key1;
> "explain" shows

> NOTICE:  QUERY PLAN:
> Nested Loop  (cost=6.19 rows=3 width=10)
>   -> Index Scan using b_pkey on b on b  (cost=2.09 rows=2 width=6)
>   -> Index Scan using a_pkey on a on a  (cost=2.05 rows=15905 width=4)

> What does "rows=15905" of InnerPlan mean ?

I have finally traced through enough of the optimizer logic that I
understand where these numbers are coming from.  A nestloop with an
inner index scan is a slightly unusual beast, because the cost of the
inner scan can often be reduced by using the join conditions as index
restrictions.  For example, if we have "outer.a = inner.b" and the
inner scan is an indexscan on b, then during the inner scan that's
done for an outer tuple with a = 42 we'd use "b = 42" as an indexqual.
This makes the inner scan much cheaper than it would be if we had to
scan the whole table.

Now the problem is that the "rows=" numbers come from the RelOptInfo
nodes for each relation, and they are set independently of the context
that the relation is used in.  For any context except an inner
indexscan, we would indeed have to scan all 15905 rows of a, because
we have no pure-restriction WHERE clauses that apply to a.  So that's
why rows says 15905.  The cost is being estimated correctly for the
context, though --- an indexscan across 15905 rows would take a lot more
than 2 disk accesses.

This is just a cosmetic bug since it doesn't affect the planner's cost
estimate; still, it makes the EXPLAIN output confusing.  I think the
output for a nestloop should probably show the estimated number of rows
that will be scanned during each pass of the inner indexscan, which
would be about 1 in the above example.  This could be done by saving the
estimated row count (or just the selectivity) in IndexScan path nodes.

Comments?  Does anyone think we should show some other number?
        regards, tom lane


RE: [HACKERS] What does explain show ?

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, January 05, 2000 9:31 AM
> 
> Quite some time ago, "Hiroshi Inoue" <Inoue@tpf.co.jp> wrote:
> > I have a question about "explain" output.
> > Table a has 15905 rows and table b has 25905 rows.
> > For the following query
> >   select a.pkey, b.key2 from a, b
> >   where b.key1 = 1369
> >   and     a.pkey = b.key1;
> > "explain" shows
> 
> > NOTICE:  QUERY PLAN:
> > Nested Loop  (cost=6.19 rows=3 width=10)
> >   -> Index Scan using b_pkey on b on b  (cost=2.09 rows=2 width=6)
> >   -> Index Scan using a_pkey on a on a  (cost=2.05 rows=15905 width=4)
> 
> > What does "rows=15905" of InnerPlan mean ?
> 
> I have finally traced through enough of the optimizer logic that I
> understand where these numbers are coming from.  A nestloop with an
> inner index scan is a slightly unusual beast, because the cost of the
> inner scan can often be reduced by using the join conditions as index
> restrictions.  For example, if we have "outer.a = inner.b" and the
> inner scan is an indexscan on b, then during the inner scan that's
> done for an outer tuple with a = 42 we'd use "b = 42" as an indexqual.
> This makes the inner scan much cheaper than it would be if we had to
> scan the whole table.
> 
> Now the problem is that the "rows=" numbers come from the RelOptInfo
> nodes for each relation, and they are set independently of the context
> that the relation is used in.  For any context except an inner
> indexscan, we would indeed have to scan all 15905 rows of a, because
> we have no pure-restriction WHERE clauses that apply to a.  So that's
> why rows says 15905.  The cost is being estimated correctly for the
> context, though --- an indexscan across 15905 rows would take a lot more
> than 2 disk accesses.
> 
> This is just a cosmetic bug since it doesn't affect the planner's cost
> estimate; still, it makes the EXPLAIN output confusing.  I think the
> output for a nestloop should probably show the estimated number of rows
> that will be scanned during each pass of the inner indexscan, which
> would be about 1 in the above example.  This could be done by saving the
> estimated row count (or just the selectivity) in IndexScan path nodes.
> 
> Comments?  Does anyone think we should show some other number?
>

I agree with you.
The rows should show some kind of average number of rows,because
the cost of innerplan seems to mean average cost.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp