[PERFORM] self join estimate and constraint exclusion

Поиск
Список
Период
Сортировка
От	Justin Pryzby
Тема	[PERFORM] self join estimate and constraint exclusion
Дата	15 апреля 2017 г. 03:23:22
Msg-id	20170415002322.GA24216@telsasoft.com обсуждение исходный текст
Ответы	[PERFORM] join estimate of subqueries with range conditions and constraintexclusion
Список	pgsql-performance
Дерево обсуждения
I mailed last month [0] but didn't see any reponse .. (if I'm being naive,
daft, or missing something simple, please just say so).

[0] https://www.postgresql.org/message-id/20170326193344.GS31628%40telsasoft.com

It seems when self (inner/equi) joining there's two bad alternatives: either
specify a where clause for each self-joined table and incur poor estimate and
plan, due to incorrect perceived independence of clauses, even though joined
column(s) could/ought to be known equal; or, specify where clause only once,
and incur cost of joining across all partitions, due to no contraint exclusion
on one (or more) self-joined table heirarchy/s.

-- Specify WHERE for each table causes bad underestimate:
|ts=# explain analyze SELECT * FROM eric_enodeb_metrics a JOIN eric_enodeb_metrics b USING (start_time, site_id) WHERE
a.start_time>='2017-03-19'AND a.start_time<'2017-03-20' AND b.start_time>='2017-03-19' AND b.start_time<'2017-03-20'; 
| Hash Join  (cost=7310.80..14680.86 rows=14 width=1436) (actual time=33.053..73.180 rows=7869 loops=1)
|   Hash Cond: ((a.start_time = b.start_time) AND (a.site_id = b.site_id))
|   ->  Append  (cost=0.00..7192.56 rows=7883 width=723) (actual time=1.394..19.414 rows=7869 loops=1)
|         ->  Seq Scan on eric_enodeb_metrics a  (cost=0.00..0.00 rows=1 width=718) (actual time=0.003..0.003 rows=0
loops=1)
|               Filter: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND (start_time <
'2017-03-2000:00:00-04'::timestamp with time zone)) 
|         ->  Bitmap Heap Scan on eric_enodeb_201703 a_1  (cost=605.34..7192.56 rows=7882 width=723) (actual
time=1.390..14.536rows=7869 loops=1) 
|               Recheck Cond: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND (start_time <
'2017-03-2000:00:00-04'::timestamp with time zone)) 
|               Heap Blocks: exact=247
|               ->  Bitmap Index Scan on eric_enodeb_201703_unique_idx  (cost=0.00..603.37 rows=7882 width=0) (actual
time=1.351..1.351rows=7869 loops=1) 
|                     Index Cond: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND (start_time <
'2017-03-2000:00:00-04'::timestamp with time zone)) 
|   ->  Hash  (cost=7192.56..7192.56 rows=7883 width=723) (actual time=31.620..31.620 rows=7869 loops=1)
|         Buckets: 8192  Batches: 1  Memory Usage: 1986kB
|         ->  Append  (cost=0.00..7192.56 rows=7883 width=723) (actual time=0.902..19.543 rows=7869 loops=1)
|               ->  Seq Scan on eric_enodeb_metrics b  (cost=0.00..0.00 rows=1 width=718) (actual time=0.002..0.002
rows=0loops=1) 
|                     Filter: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND (start_time <
'2017-03-2000:00:00-04'::timestamp with time zone)) 
|               ->  Bitmap Heap Scan on eric_enodeb_201703 b_1  (cost=605.34..7192.56 rows=7882 width=723) (actual
time=0.899..14.353rows=7869 loops=1) 
|                     Recheck Cond: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND (start_time
<'2017-03-20 00:00:00-04'::timestamp with time zone)) 
|                     Heap Blocks: exact=247
|                     ->  Bitmap Index Scan on eric_enodeb_201703_unique_idx  (cost=0.00..603.37 rows=7882 width=0)
(actualtime=0.867..0.867 rows=7869 loops=1) 
|                           Index Cond: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND
(start_time< '2017-03-20 00:00:00-04'::timestamp with time zone)) 



-- Specify WHERE once gets good estimate, but with unnecessary scan of all child partitions:
|ts=# explain analyze SELECT * FROM eric_enodeb_metrics a JOIN eric_enodeb_metrics b USING (start_time, site_id) WHERE
start_time>='2017-03-19'AND start_time<'2017-03-20'; 
| Gather  (cost=8310.80..316545.60 rows=9591 width=1427) (actual time=9012.967..9073.539 rows=7869 loops=1)
|   Workers Planned: 3
|   Workers Launched: 3
|   ->  Hash Join  (cost=7310.80..314586.50 rows=3094 width=1427) (actual time=8892.121..8937.245 rows=1967 loops=4)
|         Hash Cond: ((b.start_time = a.start_time) AND (b.site_id = a.site_id))
|         ->  Append  (cost=0.00..261886.54 rows=2015655 width=714) (actual time=11.464..8214.063 rows=1308903 loops=4)
|               ->  Parallel Seq Scan on eric_enodeb_metrics b  (cost=0.00..0.00 rows=1 width=718) (actual
time=0.001..0.001rows=0 loops=4) 
|               ->  Parallel Seq Scan on eric_enodeb_201510 b_1  (cost=0.00..10954.43 rows=60343 width=707) (actual
time=11.460..258.852rows=46766 loops=4) 
|               ->  Parallel Seq Scan on eric_enodeb_201511 b_2  (cost=0.00..10310.91 rows=56891 width=707) (actual
time=18.395..237.841rows=44091 loops=4) 
|[...]
|               ->  Parallel Seq Scan on eric_enodeb_201703 b_29  (cost=0.00..6959.75 rows=81875 width=723) (actual
time=0.017..101.969rows=49127 loops=4) 
|         ->  Hash  (cost=7192.56..7192.56 rows=7883 width=723) (actual time=51.843..51.843 rows=7869 loops=4)
|               Buckets: 8192  Batches: 1  Memory Usage: 1970kB
|               ->  Append  (cost=0.00..7192.56 rows=7883 width=723) (actual time=2.558..27.829 rows=7869 loops=4)
|                     ->  Seq Scan on eric_enodeb_metrics a  (cost=0.00..0.00 rows=1 width=718) (actual
time=0.014..0.014rows=0 loops=4) 
|                           Filter: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND (start_time
<'2017-03-20 00:00:00-04'::timestamp with time zone)) 
|                     ->  Bitmap Heap Scan on eric_enodeb_201703 a_1  (cost=605.34..7192.56 rows=7882 width=723)
(actualtime=2.542..17.305 rows=7869 loops=4) 
|                           Recheck Cond: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND
(start_time< '2017-03-20 00:00:00-04'::timestamp with time zone)) 
|                           Heap Blocks: exact=247
|                           ->  Bitmap Index Scan on eric_enodeb_201703_unique_idx  (cost=0.00..603.37 rows=7882
width=0)(actual time=2.494..2.494 rows=7869 loops=4) 
|                                 Index Cond: ((start_time >= '2017-03-19 00:00:00-04'::timestamp with time zone) AND
(start_time< '2017-03-20 00:00:00-04'::timestamp with time zone)) 


Minor variations have same problems;
-- Scans all partitions:
ts=# explain analyze SELECT * FROM (SELECT * FROM eric_enodeb_metrics a) t1 JOIN (SELECT * FROM eric_enodeb_metrics b
WHEREstart_time>='2017-03-19 23:00:00' AND start_time<'2017-03-20') t2 USING (start_time, site_id); 

-- Underestimtes due to perceived independence of clause:
|ts=# explain analyze SELECT * FROM (SELECT * FROM eric_enodeb_metrics a WHERE start_time>='2017-03-19' AND
start_time<'2017-03-20')t1 JOIN (SELECT * FROM eric_enodeb_metrics b WHERE start_time>='2017-03-19' AND
start_time<'2017-03-20')t2 USING (start_time, site_id); 
| Hash Join  (cost=7308.59..14676.41 rows=14 width=1436) (actual time=30.352..64.004 rows=7869 loops=1)

Thank you in advance for your any response.

Justin
В списке pgsql-performance по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[PERFORM] self join estimate and constraint exclusion