Re: plan_rows confusion with parallel queries
От | Tomas Vondra |
---|---|
Тема | Re: plan_rows confusion with parallel queries |
Дата | |
Msg-id | 558a40b8-40c1-87aa-39d2-93b9fa2a3209@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: plan_rows confusion with parallel queries (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: plan_rows confusion with parallel queries
|
Список | pgsql-hackers |
On 11/02/2016 09:00 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> while eye-balling some explain plans for parallel queries, I got a bit >> confused by the row count estimates. I wonder whether I'm alone. > > I got confused by that a minute ago, so no you're not alone. The problem > is even worse in join cases. For example: > > Gather (cost=34332.00..53265.35 rows=100 width=8) > Workers Planned: 2 > -> Hash Join (cost=33332.00..52255.35 rows=100 width=8) > Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2)) > -> Append (cost=0.00..8614.96 rows=417996 width=8) > -> Parallel Seq Scan on pp (cost=0.00..8591.67 rows=416667 widt > h=8) > -> Parallel Seq Scan on pp1 (cost=0.00..23.29 rows=1329 width=8 > ) > -> Hash (cost=14425.00..14425.00 rows=1000000 width=8) > -> Seq Scan on cc (cost=0.00..14425.00 rows=1000000 width=8) > > There are actually 1000000 rows in pp, and none in pp1. I'm not bothered > particularly by the nonzero estimate for pp1, because I know where that > came from, but I'm not very happy that nowhere here does it look like > it's estimating a million-plus rows going into the join. > Yeah. I wonder how tools visualizing explain plans are going to compute time spent in a given node (i.e. excluding the time spent in child nodes), or expected cost of that node. So far we could do something like self_time = total_time - child_node_time * nloops and example in this plan it's pretty clear we spend ~130ms in Aggregate: QUERY PLAN ---------------------------------------------------------------------------- Aggregate (cost=17140.50..17140.51 rows=1 width=8) (actual time=306.675..306.675 rows=1 loops=1) -> Seq Scan on tables (cost=0.00..16347.60 rows=317160width=0) (actual time=0.188..170.993 rows=317160 loops=1) Planning time: 0.201 ms Executiontime: 306.860 ms (4 rows) But in parallel plans it can easily happen that child_node_time * nloops > total_time Consider for example this parallel plan: QUERY PLAN ---------------------------------------------------------------------------- Finalize Aggregate (cost=15455.19..15455.20rows=1 width=8) (actual time=107.636..107.636 rows=1 loops=1) -> Gather (cost=15454.87..15455.18rows=3 width=8) (actual time=107.579..107.629 rows=4 loops=1) Workers Planned:3 Workers Launched: 3 -> Partial Aggregate (cost=14454.87..14454.88 rows=1 ...) (actual time=103.895..103.895 rows=1 loops=4) -> Parallel Seq Scan on tables (cost=0.00..14199.10 rows=102310 width=0) (actual time=0.059..59.217 rows=79290 loops=4) Planning time:0.052 ms Execution time: 109.250 ms (8 rows) Reading explains for parallel plans will always be complicated, but perhaps overloading the nloops like this makes it more complicated? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: