[HACKERS] 200 = 199 + 1?

Поиск
Список
Период
Сортировка
От Marko Tiikkaja
Тема [HACKERS] 200 = 199 + 1?
Дата
Msg-id CAL9smLC_ZfUO1YxEWMoiuxhEA_VriRhk5U1BQj7wXJdWLYXWKA@mail.gmail.com
обсуждение исходный текст
Ответы Re: [HACKERS] 200 = 199 + 1?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

I just came across this very peculiar behavior:

=# create table foo(id int primary key);
CREATE TABLE
=# insert into foo select generate_series(1, 1000000);
INSERT 0 1000000
=# set enable_hashjoin to false; set enable_mergejoin to false;
SET
SET
=# explain select * from foo where id in (select i from generate_series(1, 200) i limit 199);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.90..1648.52 rows=199 width=4)
   ->  HashAggregate  (cost=4.48..6.47 rows=199 width=4)
         Group Key: i.i
         ->  Limit  (cost=0.00..1.99 rows=199 width=4)
               ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Index Only Scan using foo_pkey on foo  (cost=0.42..8.24 rows=1 width=4)
         Index Cond: (id = i.i)
(7 rows)

=# explain select * from foo where id in (select i from generate_series(1, 200) i limit 200);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.93..1653.00 rows=500000 width=4)
   ->  HashAggregate  (cost=4.50..6.50 rows=200 width=4)
         Group Key: i.i
         ->  Limit  (cost=0.00..2.00 rows=200 width=4)
               ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Index Only Scan using foo_pkey on foo  (cost=0.42..8.22 rows=1 width=4)
         Index Cond: (id = i.i)
(7 rows)

So it seems that once the HashAggregate estimates to return 200 rows or more, something extremely weird happens and the Nested Loop's estimate goes wild.  I've recently seen numerous instances of this kind of a problem where the row estimates from a nested loop's child nodes are very reasonable but the loop itself goes absolutely nuts.  I can't see how this can possibly be justified.

I wonder if the nested loop shouldn't have some kind of a cap on its own estimate if it's wildly off of what you'd get by multiplying the child nodes' estimates with each other?


.m

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Enhancements to passwordcheck
Следующее
От: Kohei KaiGai
Дата:
Сообщение: Re: [HACKERS] Float value 'Infinity' is cast to numeric 1 on Windows