Re: [PERFORM] Very poor read performance, query independent

Поиск
Список
Период
Сортировка
От Charles Nadeau
Тема Re: [PERFORM] Very poor read performance, query independent
Дата
Msg-id CADFyZw5pOiXpo_4zt2LeGneNuB91XHeOhL=9SzOQ36gWTX9rxA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PERFORM] Very poor read performance, query independent  (Igor Neyman <ineyman@perceptron.com>)
Ответы Re: [PERFORM] Very poor read performance, query independent  (Igor Neyman <ineyman@perceptron.com>)
Re: [PERFORM] Very poor read performance, query independent  (Claudio Freire <klaussfreire@gmail.com>)
Список pgsql-performance
Igor,

Initially temp_buffer was left to its default value (8MB). Watching the content of the directory that stores the temporary files, I found that I need at most 21GB of temporary files space. Should I set temp_buffer to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
      SELECT
         dstaddr
      FROM
         dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;
                                                                                           QUERY PLAN                                                                                           
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual time=2227678.194..2227678.217 rows=50 loops=1)
         Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) (actual time=2227678.192..2227678.202 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 654395kB
               Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
               ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 3049216kB
                           Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                           ->  Gather  (cost=30060688.07..48003007.07 rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 loops=1)
                                 Workers Planned: 12
                                 Workers Launched: 12
                                 Buffers: shared hit=728798037 read=82974833
                                 ->  Hash Semi Join  (cost=30059688.07..47951761.31 rows=220376 width=20) (actual time=1268845.181..2007864.725 rows=7057357 loops=13)
                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)
                                       Buffers: shared hit=728795193 read=82974833
                                       ->  Nested Loop  (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
                                             Buffers: shared hit=590692229 read=14991777
                                             ->  Parallel Seq Scan on flows  (cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 rows=45371630 loops=13)
                                                   Buffers: shared hit=860990 read=14991777
                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=589831190)
                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=589831203
                                       ->  Hash  (cost=30059641.47..30059641.47 rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
                                             Buffers: shared hit=138102964 read=67983056
                                             ->  HashAggregate  (cost=30059561.64..30059601.56 rows=13305 width=4) (actual time=1265248.165..1267432.083 rows=3803508 loops=13)
                                                   Group Key: flows_1.dstaddr
                                                   Buffers: shared hit=138102964 read=67983056
                                                   ->  Nested Loop Anti Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual time=0.389..1201072.707 rows=125838232 loops=13)
                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join Filter: 503353617
                                                         Buffers: shared hit=138102964 read=67983056
                                                         ->  Seq Scan on flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual time=0.322..343152.274 rows=589831190 loops=13)
                                                               Buffers: shared hit=138102915 read=67983056
                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=7667805470)
                                                               Buffers: shared hit=13
                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.006..0.007 rows=4 loops=13)
                                                                     Buffers: shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)

With a work_mem at 6GB, I noticed that for the first 20 minutes the query was running, the i/o wait was much lower, hovering aroun 3% then it jumped 45% until almost the end of the query. 

flowscompact and dstexterne are actually views. I use views to simplify query writing and to "abstract" queries that are use often in other queries. flowscompact is a view built on table flows (having about 590 million rows), it only keeps the most often used fields.
flows=# \d+ flowscompact;
                        View "public.flowscompact"
  Column   |           Type           | Modifiers | Storage | Description 
-----------+--------------------------+-----------+---------+-------------
 flow_id   | bigint                   |           | plain   | 
 sysuptime | bigint                   |           | plain   | 
 exaddr    | ip4                      |           | plain   | 
 dpkts     | integer                  |           | plain   | 
 doctets   | bigint                   |           | plain   | 
 first     | bigint                   |           | plain   | 
 last      | bigint                   |           | plain   | 
 srcaddr   | ip4                      |           | plain   | 
 dstaddr   | ip4                      |           | plain   | 
 srcport   | integer                  |           | plain   | 
 dstport   | integer                  |           | plain   | 
 prot      | smallint                 |           | plain   | 
 tos       | smallint                 |           | plain   | 
 tcp_flags | smallint                 |           | plain   | 
 timestamp | timestamp with time zone |           | plain   | 
View definition:
 SELECT flowstimestamp.flow_id,
    flowstimestamp.sysuptime,
    flowstimestamp.exaddr,
    flowstimestamp.dpkts,
    flowstimestamp.doctets,
    flowstimestamp.first,
    flowstimestamp.last,
    flowstimestamp.srcaddr,
    flowstimestamp.dstaddr,
    flowstimestamp.srcport,
    flowstimestamp.dstport,
    flowstimestamp.prot,
    flowstimestamp.tos,
    flowstimestamp.tcp_flags,
    flowstimestamp."timestamp"
   FROM flowstimestamp;
mynetworks is a table having one column and 4 rows; it contains a list of our network networks:
flows=# select * from mynetworks;
     ipaddr     
----------------
(4 row)
flows=# \d+ mynetworks;
                    Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description 
--------+------+-----------+---------+--------------+-------------
 ipaddr | ip4r |           | plain   |              | 
Indexes:
    "mynetworks_ipaddr_idx" gist (ipaddr)
dstexterne is a view listing all the destination IPv4 addresses not inside our network; it has one column and 3.8 million rows.
flows=# \d+ dstexterne;
              View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description 
---------+------+-----------+---------+-------------
 dstaddr | ip4  |           | plain   | 
View definition:
 SELECT DISTINCT flowscompact.dstaddr
   FROM flowscompact
     LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
  WHERE mynetworks.ipaddr IS NULL;
Thanks!

Charles

On Wed, Jul 12, 2017 at 6:39 PM, Igor Neyman <ineyman@perceptron.com> wrote:

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Charles Nadeau
Sent: Wednesday, July 12, 2017 6:05 AM
To: Jeff Janes <jeff.janes@gmail.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

 

 

flows=# explain (analyze, buffers) SELECT DISTINCT

flows-#    srcaddr,

flows-#    dstaddr,

flows-#    dstport,

flows-#    COUNT(*) AS conversation,

flows-#    SUM(doctets) / 1024 / 1024 AS mbytes 

flows-# FROM

flows-#    flowscompact,

flows-#    mynetworks 

flows-# WHERE

flows-#    mynetworks.ipaddr >>= flowscompact.srcaddr 

flows-#    AND dstaddr IN 

flows-#    (

flows(#       SELECT

flows(#          dstaddr 

flows(#       FROM

flows(#          dstexterne

flows(#    )

flows-# GROUP BY

flows-#    srcaddr,

flows-#    dstaddr,

flows-#    dstport 

flows-# ORDER BY

flows-#    mbytes DESC LIMIT 50;

LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.6", size 1073741824

LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.7", size 1073741824

LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.8", size 639696896

LOG:  duration: 2765020.327 ms  statement: explain (analyze, buffers) SELECT DISTINCT

   srcaddr,

   dstaddr,

   dstport,

   COUNT(*) AS conversation,

   SUM(doctets) / 1024 / 1024 AS mbytes 

FROM

   flowscompact,

   mynetworks 

WHERE

   mynetworks.ipaddr >>= flowscompact.srcaddr 

   AND dstaddr IN 

   (

      SELECT

         dstaddr 

      FROM

         dstexterne

   )

GROUP BY

   srcaddr,

   dstaddr,

   dstport 

ORDER BY

   mbytes DESC LIMIT 50;

                                                                                                              QUERY PLAN                                                                                                              

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual time=2764548.863..2764548.891 rows=50 loops=1)

   Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244

   I/O Timings: read=5323746.860

   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52) (actual time=2764548.861..2764548.882 rows=50 loops=1)

         Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244

         I/O Timings: read=5323746.860

         ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52) (actual time=2764548.859..2764548.872 rows=50 loops=1)

               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))

               Sort Method: quicksort  Memory: 563150kB

               Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244

               I/O Timings: read=5323746.860

               ->  GroupAggregate  (cost=37698151.34..37714980.68 rows=2243913 width=52) (actual time=2696721.610..2752109.551 rows=4691734 loops=1)

                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport

                     Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244

                     I/O Timings: read=5323746.860

                     ->  Sort  (cost=37698151.34..37699273.29 rows=2243913 width=20) (actual time=2696711.428..2732781.705 rows=81896988 loops=1)

                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport

                           Sort Method: external merge  Disk: 2721856kB

                           Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244

                           I/O Timings: read=5323746.860

                           ->  Gather  (cost=19463936.00..37650810.19 rows=2243913 width=20) (actual time=1777219.713..2590530.887 rows=81896988 loops=1)

                                 Workers Planned: 9

                                 Workers Launched: 9

                                 Buffers: shared hit=1116590559 read=15851133

                                 I/O Timings: read=5323746.860

                                 ->  Hash Semi Join  (cost=19462936.00..37622883.23 rows=249324 width=20) (actual time=1847579.360..2602039.780 rows=8189699 loops=10)

                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)

                                       Buffers: shared hit=1116588309 read=15851133

                                       I/O Timings: read=5323746.860

                                       ->  Nested Loop  (cost=0.03..18159012.30 rows=249324 width=20) (actual time=1.562..736556.583 rows=45499045 loops=10)

                                             Buffers: shared hit=996551813 read=15851133

                                             I/O Timings: read=5323746.860

                                             ->  Parallel Seq Scan on flows  (cost=0.00..16039759.79 rows=62330930 width=20) (actual time=1.506..547485.066 rows=54155970 loops=10)

                                                   Buffers: shared hit=1634 read=15851133

                                                   I/O Timings: read=5323746.860

                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=541559704)

                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)

                                                   Heap Fetches: 59971474

                                                   Buffers: shared hit=996550152

                                       ->  Hash  (cost=19462896.74..19462896.74 rows=11210 width=4) (actual time=1847228.894..1847228.894 rows=3099798 loops=10)

                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 141746kB

                                             Buffers: shared hit=120036496

                                             ->  HashAggregate  (cost=19462829.48..19462863.11 rows=11210 width=4) (actual time=1230049.015..1845955.764 rows=3099798 loops=10)

                                                   Group Key: flows_1.dstaddr

                                                   Buffers: shared hit=120036496

                                                   ->  Nested Loop Anti Join  (cost=0.12..19182620.78 rows=560417390 width=4) (actual time=0.084..831832.333 rows=113420172 loops=10)

                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)

                                                         Rows Removed by Join Filter: 453681377

                                                         Buffers: shared hit=120036496

                                                         ->  Index Only Scan using flows_srcaddr_dstaddr_idx on flows flows_1  (cost=0.12..9091067.70 rows=560978368 width=4) (actual time=0.027..113052.437 rows=541559704 loops=10)

                                                               Heap Fetches: 91

                                                               Buffers: shared hit=120036459

                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=5415597040)

                                                               Buffers: shared hit=10

                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.007..0.008 rows=4 loops=10)

                                                                     Buffers: shared hit=10

 Planning time: 6.689 ms

 Execution time: 2764860.853 ms

(58 rows)

 

Regarding "Also using dstat I can see that iowait time is at about 25%", I don't think the server was doing anything else. If it is important, I can repeat the benchmarks.

Thanks!

 

Charles

 

Charles,

 

In your original posting I couldn’t find what value you set for temp_buffers.

Considering you have plenty of RAM, try setting temp_buffers=’6GB’ and then run ‘explain (analyze, buffers) select…’ in the same session. This should alleviate “disk sort’ problem.

 

Also, could you post the structure of flowscompact, mynetworks, and dstextern tables with all the indexes and number of rows. Actually, are they all – tables, or some of them – views?

 

Igor

 

 

Sorry, I misstated the parameter to change.

It is work_mem (not temp_buffers) you should try to increase to 6GB.

 

Igor

 

 




--

В списке pgsql-performance по дате отправления:

Предыдущее
От: Charles Nadeau
Дата:
Сообщение: Re: [PERFORM] Very poor read performance, query independent
Следующее
От: Igor Neyman
Дата:
Сообщение: Re: [PERFORM] Very poor read performance, query independent