[GENERAL] Cannot spot the difference in two queries

Поиск
Список
Период
Сортировка
От Rob Sargent
Тема [GENERAL] Cannot spot the difference in two queries
Дата
Msg-id 4563a265-502b-7664-5647-ab83fd75885d@gmail.com
обсуждение исходный текст
Список pgsql-general
I have three very similar functions, two of which are fast and one is slow and I cannot explain the slowness of the third. All of which, with the correct arguments should return the same "optimal set". I present one of the two fast versions and the slow one, in whole and in part. I'm using postgres 9.6 on ubuntu 16.4 desktop (16G memory).

I'm confused because the bulk of the difference in the sql is in the "with" clause and those are equally quick and return the same dataset, and to my eye the processing after that is the same as sql but clearly not in the server.  I'm missing something obvious I'm sure but whatever it is is causing a nasty Cartesian then sorting that out.

(And I get that mm.markerset_id is not needed in the where clause)

Any insights appreciated.
Thanks,
rjs

I have not included table defs, thinking that since they're the same for both queries ...

Fast function
create or replace function seg.optimal_pvalue_set(people_name text, markers_name text, chr int)
returns table (segment_id uuid, optval numeric, firstbase int) as
$$
declare
  mkset uuid;
  rcount int;
begin
  select id into mkset from seg.markerset where name = markers_name and chrom = chr;
--
  create temp table optmarkers on commit drop as
  with segset as (
      select s.id
             , s.chrom
             , s.markerset_id
             , s.startbase
             , s.endbase
             , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval
      from seg.segment s
           join seg.probandset i on s.probandset_id = i.id
           join (select people_id, array_agg(person_id) as persons
                 from seg.people_member
                 group by people_id) as pa on i.probands <@ pa.persons
           join seg.people o on pa.people_id = o.id
      where
           s.markerset_id = mkset
           and o.name = people_name
  )
  select m.id as mkrid
         , min(ss.pval) as optval
  from segset ss
       join seg.markerset_member mm on ss.markerset_id = mm.markerset_id
       join seg.marker m on mm.member_id = m.id
  where
       m.basepos between ss.startbase and ss.endbase
       and m.chrom = ss.chrom
       and mm.markerset_id = mkset   -- 'b474655c-80d2-47e7-bcb5-c65245195888'
  group by m.id;
--
  get diagnostics rcount = ROW_COUNT;
  raise notice '% segments to optimize', rcount;
--
  return query
  select s.id as segment_id, o.optval, min(m.basepos) as firstbase
  from optmarkers o
       join seg.marker m on o.mkrid = m.id
       join seg.markerset_member mm on m.id = mm.member_id
       join seg.segment s on mm.markerset_id = s.markerset_id
  where mm.markerset_id = mkset
        and m.basepos between s.startbase and s.endbase
        and ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) = o.optval
  group by s.id, o.optval
  order by firstbase;
end;
$$ language plpgsql;

-- timing the CTE base
--with, the first
      select count(s.id)
             -- s.id
             -- , s.chrom
             -- , s.markerset_id
             -- , s.startbase
             -- , s.endbase
             -- , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval
      from seg.segment s
           join seg.probandset i on s.probandset_id = i.id
           join (select people_id, array_agg(person_id) as persons
                 from seg.people_member
                 group by people_id) as pa on i.probands <@ pa.persons
           join seg.people o on pa.people_id = o.id
      where
           s.markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea' --mkset
           and o.name = '1151704'; -- people_name
 count
-------
 30762
(1 row)

--explain analyze of Full CTE portion (that which constructs the temp table in prep for final query)
--manually supplying the needed ids and such.
                                                                                                                                                               QUERY PLAN                                                                                                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=15412.11..15413.24 rows=65 width=48) (actual time=5462.143..6042.901 rows=52196 loops=1)
   Group Key: m.id
   CTE segset
     ->  Nested Loop  (cost=4820.07..6024.99 rows=310 width=76) (actual time=11.854..449.869 rows=30762 loops=1)
           ->  Seq Scan on probandset i  (cost=0.00..21.27 rows=1 width=16) (actual time=0.007..0.286 rows=57 loops=1)
                 Filter: ((probands <@ '{1bfa3037-bf65-483b-9ac5-f7a57f9e48a6,69f339e3-8ef4-440d-82ee-76627caf8e8b,7a2150d9-cd9c-454a-92d4-4b5cda85258d,7b42b108-8c17-4261-a4d1-3ed1f5da892d,b56886d3-981e-4872-847f-ce019fe70a87,c7f46c7d-0223-4b89-8093-771b5d6964d8}'::uuid[]) AND (people_id = '10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'::uuid))
                 Rows Removed by Filter: 494
           ->  Bitmap Heap Scan on segment s  (cost=4820.07..5995.20 rows=310 width=84) (actual time=7.438..7.493 rows=540 loops=57)
                 Recheck Cond: ((probandset_id = i.id) AND (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid))
                 Heap Blocks: exact=646
                 ->  BitmapAnd  (cost=4820.07..4820.07 rows=310 width=0) (actual time=7.434..7.434 rows=0 loops=57)
                       ->  Bitmap Index Scan on useg  (cost=0.00..283.82 rows=5235 width=0) (actual time=0.357..0.357 rows=6470 loops=57)
                             Index Cond: (probandset_id = i.id)
                       ->  Bitmap Index Scan on segment_markerset_id_idx  (cost=0.00..4493.24 rows=170774 width=0) (actual time=7.064..7.064 rows=150550 loops=57)
                             Index Cond: (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
   ->  Sort  (cost=9387.11..9387.28 rows=65 width=48) (actual time=5462.122..5818.158 rows=1780618 loops=1)
         Sort Key: m.id
         Sort Method: external merge  Disk: 64168kB
         ->  Nested Loop  (cost=105.76..9385.16 rows=65 width=48) (actual time=11.884..4648.098 rows=1780618 loops=1)
               ->  Nested Loop  (cost=105.34..9013.49 rows=706 width=64) (actual time=11.875..944.708 rows=1796463 loops=1)
                     ->  CTE Scan on segset ss  (cost=0.00..6.97 rows=2 width=60) (actual time=11.857..461.254 rows=30762 loops=1)
                           Filter: (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
                     ->  Bitmap Heap Scan on marker m  (cost=105.34..4471.17 rows=3209 width=24) (actual time=0.007..0.011 rows=58 loops=30762)
                           Recheck Cond: ((chrom = ss.chrom) AND (basepos >= ss.startbase) AND (basepos <= ss.endbase))
                           Heap Blocks: exact=54458
                           ->  Bitmap Index Scan on marker_chrom_basepos_idx  (cost=0.00..104.54 rows=3209 width=0) (actual time=0.005..0.005 rows=58 loops=30762)
                                 Index Cond: ((chrom = ss.chrom) AND (basepos >= ss.startbase) AND (basepos <= ss.endbase))
               ->  Index Scan using markerset_member_member_id_idx on markerset_member mm  (cost=0.42..0.52 rows=1 width=32) (actual time=0.002..0.002 rows=1 loops=1796463)
                     Index Cond: (member_id = m.id)
                     Filter: (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
                     Rows Removed by Filter: 0
 Planning time: 0.478 ms
 Execution time: 6052.463 ms
(33 rows)

Time: 6053.983 ms

Slow Function
-- For an (potentially) arbitrary of grouping of probandsets
create or replace function seg.optimal_pvalue_set_group(pbgroup_id uuid, markers_name text, people_name text, chr int)
returns table (segment_id uuid, optval numeric, firstbase int) as
$$
declare
  mkset uuid;
  rcount int;
  peop_id uuid;
begin
  select id into mkset from seg.markerset where name = markers_name and chrom = chr;
  select id into peop_id from seg.people where name = people_name;
  raise notice 'marker set id is %; people id is % (%)', mkset, peop_id, clock_timestamp();
--
  create temp table optmarkers on commit drop as
  with segset as (
      select s.id
             , s.chrom
             , s.markerset_id
             , s.startbase
             , s.endbase
             , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval
      from seg.segment s
           join seg.probandset i on s.probandset_id = i.id
           join seg.probandset_group_member pgm on s.probandset_id = pgm.member_id
      where
           s.markerset_id = mkset
           and i.people_id = peop_id
           and pgm.group_id = pbgroup_id
  )
  select m.id as mkrid
         , min(ss.pval) as optval
  from segset ss
       join seg.markerset_member mm on ss.markerset_id = mm.markerset_id
       join seg.marker m on mm.member_id = m.id
  where
       m.basepos between ss.startbase and ss.endbase
       and m.chrom = ss.chrom
       and mm.markerset_id = mkset
  group by m.id;
  get diagnostics rcount = ROW_COUNT;
  raise notice 'found % segments to optimize by probandset group id % (%)', rcount, pbgroup_id, clock_timestamp();
--
  return query
  select s.id as segment_id, o.optval, min(m.basepos) as firstbase
  from optmarkers o
       join seg.marker m on o.mkrid = m.id
       join seg.markerset_member mm on m.id = mm.member_id
       join seg.segment s on mm.markerset_id = s.markerset_id
  where mm.markerset_id = mkset
        and m.basepos between s.startbase and s.endbase
        and ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) = o.optval
  group by s.id, o.optval
  order by firstbase;
end;
$$ language plpgsql;

--timing the CTE base
--with, the third
      select count(s.id)
             -- s.id
             -- , s.chrom
             -- , s.markerset_id
             -- , s.startbase
             -- , s.endbase
             -- , ((s.events_equal + s.events_greater)/(1.0 * (s.events_less + s.events_equal + s.events_greater))) as pval
      from seg.segment s
           join seg.probandset i on s.probandset_id = i.id
           join seg.probandset_group_member pgm on s.probandset_id = pgm.member_id
      where
           s.markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea' -- mkset
           and i.people_id = '10c8d51d-c413-4943-90f7-9e0d2dd8c8c4' -- peop_id
           and pgm.group_id = 'e7c01f89-8301-4221-9980-343b119711c5' -- pbgroup_id
;
 count
-------
 30392
(1 row)

Time: 146.692 ms

--explain analyze of Full CTE
                                                                                 QUERY PLAN                                                                                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=39420.04..39425.18 rows=294 width=48) (actual time=154302.816..154860.213 rows=52196 loops=1)
   Group Key: m.id
   CTE segset
     ->  Nested Loop  (cost=282.86..23235.89 rows=1795 width=76) (actual time=0.568..82.964 rows=30392 loops=1)
           ->  Hash Join  (cost=11.11..31.27 rows=6 width=32) (actual time=0.035..0.147 rows=56 loops=1)
                 Hash Cond: (i.id = pgm.member_id)
                 ->  Seq Scan on probandset i  (cost=0.00..19.89 rows=57 width=16) (actual time=0.005..0.092 rows=57 loops=1)
                       Filter: (people_id = '10c8d51d-c413-4943-90f7-9e0d2dd8c8c4'::uuid)
                       Rows Removed by Filter: 494
                 ->  Hash  (cost=10.41..10.41 rows=56 width=16) (actual time=0.027..0.027 rows=56 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 11kB
                       ->  Bitmap Heap Scan on probandset_group_member pgm  (cost=4.71..10.41 rows=56 width=16) (actual time=0.013..0.021 rows=56 loops=1)
                             Recheck Cond: (group_id = 'e7c01f89-8301-4221-9980-343b119711c5'::uuid)
                             Heap Blocks: exact=1
                             ->  Bitmap Index Scan on psgm_pkey  (cost=0.00..4.70 rows=56 width=0) (actual time=0.008..0.008 rows=56 loops=1)
                                   Index Cond: (group_id = 'e7c01f89-8301-4221-9980-343b119711c5'::uuid)
           ->  Bitmap Heap Scan on segment s  (cost=271.75..3859.10 rows=310 width=84) (actual time=0.446..1.196 rows=543 loops=56)
                 Recheck Cond: (probandset_id = i.id)
                 Filter: (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
                 Rows Removed by Filter: 5974
                 Heap Blocks: exact=8223
                 ->  Bitmap Index Scan on useg  (cost=0.00..271.68 rows=5235 width=0) (actual time=0.434..0.434 rows=6517 loops=56)
                       Index Cond: (probandset_id = i.id)
   ->  Sort  (cost=16184.15..16184.88 rows=294 width=48) (actual time=154302.795..154633.935 rows=1767743 loops=1)
         Sort Key: m.id
         Sort Method: external merge  Disk: 63712kB
         ->  Nested Loop  (cost=40.92..16172.09 rows=294 width=48) (actual time=96.084..153723.675 rows=1767743 loops=1)
               ->  Hash Join  (cost=40.50..14499.61 rows=3177 width=64) (actual time=96.074..151048.722 rows=1783466 loops=1)
                     Hash Cond: (m.chrom = ss.chrom)
                     Join Filter: ((m.basepos >= ss.startbase) AND (m.basepos <= ss.endbase))
                     Rows Removed by Join Filter: 1598476902
                     ->  Seq Scan on marker m  (cost=0.00..11647.63 rows=635363 width=24) (actual time=0.007..39.685 rows=635412 loops=1)
                     ->  Hash  (cost=40.39..40.39 rows=9 width=60) (actual time=96.027..96.027 rows=30392 loops=1)
                           Buckets: 32768 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 2432kB
                           ->  CTE Scan on segset ss  (cost=0.00..40.39 rows=9 width=60) (actual time=0.571..92.319 rows=30392 loops=1)
                                 Filter: (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
               ->  Index Scan using markerset_member_member_id_idx on markerset_member mm  (cost=0.42..0.52 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1783466)
                     Index Cond: (member_id = m.id)
                     Filter: (markerset_id = 'ed3b4817-1739-4727-9fac-35d1d63071ea'::uuid)
                     Rows Removed by Filter: 0
 Planning time: 0.867 ms
 Execution time: 154868.420 ms
(42 rows)




В списке pgsql-general по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.
Следующее
От: ""
Дата:
Сообщение: [GENERAL] Perl Dev 2017 survey results