Thread: Negative cost is seen for plan node

Negative cost is seen for plan node

From: Richard Guo
Date: Mon, 23 Mar 2020 21:13:48 +0800
Hi all,

With the following statements on latest master (c81bd3b9), I find
negative cost for plan nodes.

create table a (i int, j int);
insert into a select i%100000, i from generate_series(1,1000000)i;
analyze a;

# explain select i from a group by i;
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=1300.00..-1585.82 rows=102043 width=4)
   Group Key: i
   Planned Partitions: 4
   ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4)
(4 rows)

In function cost_agg, when we add the disk costs of hash aggregation
that spills to disk, nbatches is calculated as 1.18 in this case. Since
it is greater than 1, the aggregation is costed as spilling to disk.
With num_partitions being 4, depth is then calculated as -1, and I
think this is where things go wrong.
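
For reference, here is that arithmetic in isolation. This is only a
standalone sketch (not the actual costsize.c code); it plugs the
reported values into the depth expression quoted later in this thread:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  nbatches = 1.18;        /* value computed for this query */
    int     num_partitions = 4;     /* "Planned Partitions: 4" */

    /* depth expression as currently written in cost_agg */
    double  depth = ceil(log(nbatches - 1) / log(num_partitions));

    /* log(0.18) is negative, so depth comes out as -1 here */
    printf("depth = %g\n", depth);

    return 0;
}

Built with -lm, this prints depth = -1; a negative depth makes the
added disk cost negative, which is how the total cost ends up below
the startup cost in the plan above.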

Thanks
Richard

Re: Negative cost is seen for plan node

From: Kyotaro Horiguchi
Date: Tue, 24 Mar 2020
At Mon, 23 Mar 2020 21:13:48 +0800, Richard Guo <guofenglinux@gmail.com> wrote in 
> Hi all,
> 
> With the following statements on latest master (c81bd3b9), I find
> negative cost for plan nodes.
> 
> create table a (i int, j int);
> insert into a select i%100000, i from generate_series(1,1000000)i;
> analyze a;
> 
> # explain select i from a group by i;
>                            QUERY PLAN
> -----------------------------------------------------------------
>  HashAggregate  (cost=1300.00..-1585.82 rows=102043 width=4)

Good catch!

>    Group Key: i
>    Planned Partitions: 4
>    ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4)
> (4 rows)
> 
> In function cost_agg, when we add the disk costs of hash aggregation
> that spills to disk, nbatches is calculated as 1.18 in this case. Since
> it is greater than 1, the aggregation is costed as spilling to disk.
> With num_partitions being 4, depth is then calculated as -1, and I
> think this is where things go wrong.

The depth is the expected number of iterations of reading the relation.

>    depth = ceil( log(nbatches - 1) / log(num_partitions) );

I'm not sure what the expression is based on, but it is apparently
wrong for nbatches <= 2.0. It looks like a thinko of something like
this:

    depth = ceil( log(nbatches) / log(num_partitions + 1) );
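
As a quick sanity check (again just a standalone sketch with the values
from the report, not the real cost_agg code), the proposed expression
gives a sane depth for this case:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  nbatches = 1.18;
    int     num_partitions = 4;

    /* ceil(log(1.18) / log(5)) = ceil(0.103) = 1 */
    double  depth = ceil(log(nbatches) / log(num_partitions + 1));

    printf("depth = %g\n", depth);

    return 0;
}

Since log(nbatches) is positive whenever nbatches > 1, this form never
drops below 1, so the added disk cost cannot go negative.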

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Negative cost is seen for plan node

From: Richard Guo

On Tue, Mar 24, 2020 at 12:01 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Mon, 23 Mar 2020 21:13:48 +0800, Richard Guo <guofenglinux@gmail.com> wrote in
> Hi all,
>
> With the following statements on latest master (c81bd3b9), I find
> negative cost for plan nodes.
>
> create table a (i int, j int);
> insert into a select i%100000, i from generate_series(1,1000000)i;
> analyze a;
>
> # explain select i from a group by i;
>                            QUERY PLAN
> -----------------------------------------------------------------
>  HashAggregate  (cost=1300.00..-1585.82 rows=102043 width=4)

Good catch!

>    Group Key: i
>    Planned Partitions: 4
>    ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=4)
> (4 rows)
>
> In function cost_agg, when we add the disk costs of hash aggregation
> that spills to disk, nbatches is calculated as 1.18 in this case. Since
> it is greater than 1, the aggregation is costed as spilling to disk.
> With num_partitions being 4, depth is then calculated as -1, and I
> think this is where things go wrong.

The depth is the expected number of iterations of reading the relation.

>       depth = ceil( log(nbatches - 1) / log(num_partitions) );

Yes, correct.

I'm not sure what the expression is based on, but it is apparently
wrong for nbatches <= 2.0. It looks like a thinko of something like
this:

        depth = ceil( log(nbatches) / log(num_partitions + 1) );

It seems to me we should use '(nbatches - 1)', without the log function.
Maybe I'm wrong.
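
Reading '(nbatches - 1)' as depth = ceil(nbatches - 1), which is only
one possible interpretation of that suggestion, it also comes out
non-negative for the reported case (a standalone sketch once more):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  nbatches = 1.18;

    /* ceil(1.18 - 1) = ceil(0.18) = 1 */
    double  depth = ceil(nbatches - 1);

    printf("depth = %g\n", depth);

    return 0;
}

For larger nbatches the two candidates behave very differently (linear
vs. logarithmic growth), so which is right depends on how many passes
over the spilled data are actually expected.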

I have sent this issue to the 'Memory-Bounded Hash Aggregation' thread.

Thanks
Richard