Bogus nestloop rows estimate in 8.4.7

Поиск
Список
Период
Сортировка
От Marti Raudsepp
Тема Bogus nestloop rows estimate in 8.4.7
Дата
Msg-id CABRT9RARmcPq-2_Q84xXcp7JCND0SdkGBwRB4t1oX5rdXTZNKA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Bogus nestloop rows estimate in 8.4.7  (Marti Raudsepp <marti@juffo.org>)
Re: Bogus nestloop rows estimate in 8.4.7  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi list,

This bug isn't causing me any immediate problems -- the plan works out
well regardless -- but PostgreSQL 8.4.7 is somehow overestimating the
number of rows coming from a nestloop join, when joining 2 large
partitioned tables. Maybe it's been fixed in more recent versions,
sadly it's an EOL Linux distro and I have no immediate plans to
upgrade.

It's estimating to join 135957 x 281 rows, but the product is somehow
2946151270877

In reality, it's joining 132577 x ~1 rows to get 133116 results

QUERY PLAN
 GroupAggregate  (cost=852067259163.57..977278688175.85
rows=2946151270877 width=36)
   ->  Sort  (cost=852067259163.57..859432637340.77 rows=2946151270877 width=36)
         Sort Key: b.banner_id, b.client_body_id,
(COALESCE(b.partner_body_id, a.partner_body_id)), b.space_id,
b.campaign_id, a.evt_type_id
         ->  Nested Loop  (cost=0.00..213859871.55 rows=2946151270877 width=36)
               Join Filter: (a.request_id = b.request_id)
               ->  Append  (cost=0.00..5905.69 rows=135957 width=20)
                     ->  Index Scan using "XIF01request" on request a
(cost=0.00..8.27 rows=1 width=20)
                           Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
                     ->  Index Scan using "XIF01request_1222" on
request_1222 a  (cost=0.00..5897.42 rows=135956 width=20)
                           Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
               ->  Append  (cost=0.00..1569.44 rows=281 width=32)
                     ->  Seq Scan on request_data b  (cost=0.00..11.30
rows=130 width=32)
                     ->  Index Scan using
"IX_relationship64_request_d_c_1150" on request_d_c_1150 b
(cost=0.00..9.56 rows=2 width=32)
                           Index Cond: (b.request_id = a.request_id)
*snip lots of partition index scans*

Query:
SELECT '2012-05-28T09:00:00', count(*),
uniq(sort(array_agg(visitor_id))), banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id FROM stats_request
WHERE stats_request.request_time >= '2012-05-28T09:00:00' AND
stats_request.request_time < (timestamp '2012-05-28T09:00:00' +
interval E'1 hour')::timestamp
GROUP BY banner_id, client_body_id, partner_body_id, space_id,
campaign_id, evt_type_id ORDER BY banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id;

Full EXPLAIN ANALYZE is attached.

Regards,
Marti

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sandro Santilli
Дата:
Сообщение: Re: Interrupting long external library calls
Следующее
От: Marti Raudsepp
Дата:
Сообщение: Re: Bogus nestloop rows estimate in 8.4.7