Parallel Query - Can it be used within functions?

Поиск
Список
Период
Сортировка
От Michael Krüger
Тема Parallel Query - Can it be used within functions?
Дата
Msg-id CACSnzzVDSP3WvoxByFNiEL2RkCdKfqhmoaLDeYdMNXH5izW1XQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Parallel Query - Can it be used within functions?  (Andreas Kretschmer <andreas@a-kretschmer.de>)
Список pgsql-general
Dear community,

I need a bit of advice on parallel query execution.
I have a table with roughly 2 million rows in it. These rows are distributed over a set of IDs. Therefore I can group my rows based on the ID set.

If I query the table directly like this:

select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1;
 mediatrunkid | count
--------------+-------
       409612 | 11862
       409613 | 11862
      2101260 | 11860
      2101261 | 11882
      2101263 | 11809
      2109452 | 11751
      2117644 | 11833
      2125836 | 11832
      2125839 | 11852
      2125841 | 11882
      2129932 | 11882
      2129933 | 11854
      2134028 | 11718
      2138124 | 11850
      2142220 | 11861
      2154508 | 11825
      2187276 | 11826
(17 rows)

Time: 44,056 ms

I see from the query plan that it executed nicely in parallel using two workers:
2018-02-06 08:18:47.381 CET [18898] LOG:  duration: 43.072 ms  plan:
Query Text: select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1;
Finalize GroupAggregate  (cost=57784.41..57792.66 rows=300 width=40) (actual time=40.583..40.596 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=3076
  ->  Sort  (cost=57784.41..57785.91 rows=600 width=16) (actual time=40.578..40.582 rows=51 loops=1)
        Sort Key: mediatrunkid
        Sort Method: quicksort  Memory: 27kB
        Buffers: shared hit=3076
        ->  Gather  (cost=57693.72..57756.72 rows=600 width=16) (actual time=40.534..40.561 rows=51 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              Buffers: shared hit=3076
              ->  Partial HashAggregate  (cost=56693.72..56696.72 rows=300 width=16) (actual time=33.901..33.909 rows=17 loops=3)
                    Group Key: mediatrunkid
                    Buffers: shared hit=6006
                    ->  Parallel Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..56279.28 rows=82889 width=8) (actual time=4.575..23.145 rows=67080 loops=3)
                          Recheck Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
                          Heap Blocks: exact=2253
                          Buffers: shared hit=6006
                          ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk  (cost=0.00..4475.27 rows=198933 width=0) (actual time=10.057..10.057 rows=201241 loops=1)
                                Index Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
                                Buffers: shared hit=823

But when I do the very same within a function:

create or replace function reports.generic_query(_sql text)
  RETURNS SETOF record
  LANGUAGE 'plpgsql'
  PARALLEL SAFE
  COST 100
  SET "TimeZone"='utc'
  STABLE
  ROWS 10000
AS $BODY$
DECLARE
  --
BEGIN
  RETURN QUERY EXECUTE _sql;
END
$BODY$;

select * from reports.generic_query($$ select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1 $$) as foo (trunkid bigint, count numeric);
 trunkid | count
---------+-------
 2109452 | 11751
  409613 | 11862
 2125839 | 11852
 2125841 | 11882
 2142220 | 11861
 2117644 | 11833
 2154508 | 11825
 2134028 | 11718
 2101263 | 11809
 2101261 | 11882
 2129933 | 11854
 2129932 | 11882
 2125836 | 11832
 2138124 | 11850
  409612 | 11862
 2187276 | 11826
 2101260 | 11860
(17 rows)

Time: 86,275 ms

the execution time almost doubles, and the query no longer is considered for parallel execution:

2018-02-06 08:20:30.553 CET [18898] LOG:  duration: 85.358 ms  plan:
Query Text:  select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1
HashAggregate  (cost=60900.33..60904.08 rows=300 width=40) (actual time=85.341..85.348 rows=17 loops=1)
  Group Key: mediatrunkid
  Buffers: shared hit=6006
  ->  Bitmap Heap Scan on mediatrunkkpi  (cost=4525.01..59905.66 rows=198933 width=8) (actual time=9.084..39.655 rows=201241 loops=1)
        Recheck Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
        Heap Blocks: exact=5183
        Buffers: shared hit=6006
        ->  Bitmap Index Scan on idx_mediatrunkkpi_trunk  (cost=0.00..4475.27 rows=198933 width=0) (actual time=8.386..8.386 rows=201241 loops=1)
              Index Cond: (mediatrunkid = ANY ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
              Buffers: shared hit=823
2018-02-06 08:20:30.553 CET [18898] CONTEXT:  PL/pgSQL function reports.generic_query(text) line 5 at RETURN QUERY
2018-02-06 08:20:30.553 CET [18898] LOG:  duration: 85.986 ms  plan:
Query Text: select * from reports.generic_query($$ select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where mediatrunkid in (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028) group by 1 $$) as foo (trunkid bigint, count numeric);
Function Scan on generic_query foo  (cost=0.25..100.25 rows=10000 width=40) (actual time=85.968..85.970 rows=17 loops=1)
  Buffers: shared hit=6006

This whole topic around parallel execution is fairly new, but from the documentation I would assume that functions declared as parallel safe, will really be used in parallel. And even if the function itself is not invoked in parallel (maybe does not even make sense here), the function body for sure should run in parallel if I'm not mistaken.

So what do I obviously do wrong here?

Regards,
Michael

В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Alter table set logged hanging after writing out all WAL
Следующее
От: Andreas Kretschmer
Дата:
Сообщение: Re: Parallel Query - Can it be used within functions?