Re: Parallel Seq Scan

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Parallel Seq Scan
Дата
Msg-id CAA4eK1LFR8sR9viUpLPMKRqUVcRhEFDjSz1019rpwgjYftrXeQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel Seq Scan  (Thom Brown <thom@linux.com>)
Ответы Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Mar 25, 2015 at 7:09 PM, Thom Brown <thom@linux.com> wrote:
>
> On 25 March 2015 at 11:46, Thom Brown <thom@linux.com> wrote:
>>
>>
>> Still not sure why 8 workers are needed for each partial scan.  I would expect 8 workers to be used for 8 separate scans.  Perhaps this is just my misunderstanding of how this feature works.
>
>
> Another issue:
>
> SELECT * FROM pgb<tab>
>
> *crash*
>

The reason of this problem is that above tab-completion is executing
query [1] which contains subplan for the funnel node and currently
we don't have capability (enough infrastructure) to support execution
of subplans by parallel workers.  Here one might wonder why we
have choosen Parallel Plan (Funnel node) for such a case and the
reason for same is that subplans are attached after Plan generation
(SS_finalize_plan()) and if want to discard such a plan, it will be
much more costly, tedious and not worth the effort as we have to
eventually make such a plan work.

Here we have two choices to proceed, first one is to support execution
of subplans by parallel workers and second is execute/scan locally for
Funnel node having subplan (don't launch workers).

I have tried to evaluate what it would take us to support execution
of subplans by parallel workers.  We need to pass the sub plans
stored in Funnel Node (initPlan) and corresponding subplans stored
in planned statement (subplans)  as subplan's stored in Funnel node
has reference to subplans in planned statement.  Next currently
readfuncs.c (functions to read different type of nodes) doesn't support
reading any type of plan node, so we need to add support for reading all kind
of plan nodes (as subplan can have any type of plan node) and similarly
to execute any type of Plan node, we might need more work (infrastructure).

Currently I have updated the patch to use second approach which
is to execute/scan locally for Funnel node having subplan.

I understand that it is quite interesting if we can have support for
execution of subplans (un-correlated expression subselects) by
parallel workers, but I feel it is better done as a separate patch. 



[1] - 
SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE c.relkind IN ('r', 'S', 'v', 'm', 
'f') AND substring(pg_catalog.quote_ident(c.relname),1,3)='pgb' AND pg_catalog.pg_table_is_visible(c.oid) AND 
c.relnamespace <> (SELECT oid FROM pg_catalog.pg_namespace WHERE nspname = 'pg_catalog') UNION SELECT 
pg_catalog.quote_ident(n.nspname) || '.' FROM pg_catalog.pg_namespace n WHERE substring
(pg_catalog.quote_ident(n.nspname) || '.',1,3)='pgb' AND (SELECT pg_catalog.count(*) FROM 
pg_catalog.pg_namespace WHERE substring(pg_catalog.quote_ident(nspname) || '.',1,3) = substring
('pgb',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) > 1 UNION SELECT pg_catalog.quote_ident
(n.nspname) || '.' || pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n 
WHERE c.relnamespace = n.oid AND c.relkind IN ('r', 'S', 'v', 'm', 'f') AND substring(pg_catalog.quote_ident
(n.nspname) || '.' || pg_catalog.quote_ident(c.relname),1,3)='pgb' AND substring(pg_catalog.quote_ident
(n.nspname) || '.',1,3) = substring('pgb',1,pg_catalog.length(pg_catalog.quote_ident(n.nspname))+1) AND 
(SELECT pg_catalog.count(*) FROM pg_catalog.pg_namespace WHERE substring(pg_catalog.quote_ident(nspname) || 
'.',1,3) = substring('pgb',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) = 1 LIMIT 1000;

Query Plan
--------------------------
                  QUERY PLAN

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------------
 Limit  (cost=10715.89..10715.92 rows=3 width=85)
   ->  HashAggregate  (cost=10715.89..10715.92 rows=3 width=85)
         Group Key: (quote_ident((c.relname)::text))
         ->  Append  (cost=8.15..10715.88 rows=3 width=85)
               ->  Funnel on pg_class c  (cost=8.15..9610.67 rows=1 width=64)
                     Filter: ((relnamespace <> $4) AND (relkind = ANY ('{r,S,v,m
,f}'::"char"[])) AND ("substring"(quote_ident((relname)::text), 1, 3) = 'pgb'::t
ext) AND pg_table_is_visible(oid))
                     Number of Workers: 1
                     InitPlan 3 (returns $4)
                       ->  Index Scan using pg_namespace_nspname_index on pg_nam
espace pg_namespace_2  (cost=0.13..8.15 rows=1 width=4)
                             Index Cond: (nspname = 'pg_catalog'::name)
                     ->  Partial Seq Scan on pg_class c  (cost=0.00..19043.43 ro
ws=1 width=64)
                           Filter: ((relnamespace <> $4) AND (relkind = ANY ('{r
,S,v,m,f}'::"char"[])) AND ("substring"(quote_ident((relname)::text), 1, 3) = 'p
gb'::text) AND pg_table_is_visible(oid))
               ->  Result  (cost=8.52..16.69 rows=1 width=64)
                     One-Time Filter: ($3 > 1)
                     InitPlan 2 (returns $3)
                       ->  Aggregate  (cost=8.37..8.38 rows=1 width=0)
                             ->  Index Only Scan using pg_namespace_nspname_inde
x on pg_namespace pg_namespace_1  (cost=0.13..8.37 rows=1 width=0)
                                   Filter: ("substring"((quote_ident((nspname)::
text) || '.'::text), 1, 3) = "substring"('pgb'::text, 1, (length(quote_ident((ns
pname)::text)) + 1)))
                     ->  Index Only Scan using pg_namespace_nspname_index on pg_
namespace n  (cost=0.13..8.30 rows=1 width=64)
                           Filter: ("substring"((quote_ident((nspname)::text) ||
 '.'::text), 1, 3) = 'pgb'::text)
               ->  Result  (cost=8.79..1088.49 rows=1 width=128)
                     One-Time Filter: ($0 = 1)
                     InitPlan 1 (returns $0)
                       ->  Aggregate  (cost=8.37..8.38 rows=1 width=0)
                             ->  Index Only Scan using pg_namespace_nspname_inde
x on pg_namespace  (cost=0.13..8.37 rows=1 width=0)
                                   Filter: ("substring"((quote_ident((nspname)::
text) || '.'::text), 1, 3) = "substring"('pgb'::text, 1, (length(quote_ident((ns
pname)::text)) + 1)))
                     ->  Nested Loop  (cost=0.41..1080.09 rows=1 width=128)
                           ->  Index Scan using pg_namespace_oid_index on pg_nam
espace n_1  (cost=0.13..12.37 rows=1 width=68)
                                 Filter: ("substring"((quote_ident((nspname)::te
xt) || '.'::text), 1, 3) = "substring"('pgb'::text, 1, (length(quote_ident((nspn
ame)::text)) + 1)))
                           ->  Index Scan using pg_class_relname_nsp_index on pg
_class c_1  (cost=0.28..1067.71 rows=1 width=68)
                                 Index Cond: (relnamespace = n_1.oid)
                                 Filter: ((relkind = ANY ('{r,S,v,m,f}'::"char"[
])) AND ("substring"(((quote_ident((n_1.nspname)::text) || '.'::text) || quote_i
dent((relname)::text)), 1, 3) = 'pgb'::text))
(32 rows)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ashutosh Bapat
Дата:
Сообщение: Re: trying to study how sorting works
Следующее
От: Ravi Kiran
Дата:
Сообщение: Implementing a join algorithm in Postgres