Re: AWS RDS PostgreSQL CPU Spiking to 100%

Поиск
Список
Период
Сортировка
От Prince Pathria
Тема Re: AWS RDS PostgreSQL CPU Spiking to 100%
Дата
Msg-id CAON0DawJsSz=ZTSoH8X-4g8ou6NDJ0Y92V8pC2cdaFQjAsU2Qg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: AWS RDS PostgreSQL CPU Spiking to 100%  (aditya desai <admad123@gmail.com>)
Ответы Re: AWS RDS PostgreSQL CPU Spiking to 100%  (aditya desai <admad123@gmail.com>)
Список pgsql-performance
We faced a similar issue, adding RDS proxy in front of RDS Postgres can help.
In our situation, there were a lot of connects/disconnects from Lambda functions although concurrency of Lambda was 100 only.
And adding connection pooler(RDS proxy) helped us to reduce the CPU load from 100% to 30%

Happy to help :)
Prince Pathria Systems Engineer | Certified Kubernetes Administrator | AWS Certified Solutions Architect Evive +91 9478670472 goevive.com


On Mon, Sep 28, 2020 at 9:21 PM aditya desai <admad123@gmail.com> wrote:

Hi,
We have an application where one of the APIs calling queries(attached) is spiking the CPU to 100% during load testing.
However, queries are making use of indexes(Bitmap Index and Bitmap Heap scan though). When run separately on DB queries hardly take less than 200 ms. Is CPU spiking due to Bitmap Heap Scan?
These queries are being called thousands of times. Application team says they have handled connection pooling from the Application side. So there is no connection pooling here from DB side. Current db instance size is "db.m4.4xlarge" 64 GB RAM 16 vCPU".  
The Application dev team has primary keys and foreign keys on tables so they are unable to partition the tables as well due to limitations of postgres partitioning. Columns in WHERE clauses are not constant in all queries to decide partition keys.

1. Does DB need more CPU considering this kind of load? 
2. Can the query be tuned further? It is already using indexes(Bitmap though).
3. Will connection pooling resolve the CPU Spike issues?

Also pasting Query and plans below.

----------------------exampleCount 1. Without internalexamplecode-----------------------

lmp_examples=> explain analyze with exampleCount as ( select examplestatuscode from example j where 1=1 and j.countrycode = 'AD'   and j.facilitycode in ('ABCD') and j.internalexamplecode in ('005','006','007','005') and ((j.examplestartdatetime  between '2020-05-18 00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )  group by j.examplestatuscode)
lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0) stat_count from exampleCount jc right outer join examplestatus js on jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;
                                                                                                                                                                                 QUERY PLAN                                                                                                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=79353.80..79353.89 rows=9 width=12) (actual time=88.847..88.850 rows=9 loops=1)
   Group Key: js.examplestatuscode
   CTE examplecount
     ->  HashAggregate  (cost=79352.42..79352.46 rows=4 width=4) (actual time=88.803..88.805 rows=5 loops=1)
           Group Key: j.examplestatuscode
           ->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08 rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
                 Recheck Cond: ((((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone)) OR (examplestartdatetime IS NULL))
                 Filter: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])))
                 Rows Removed by Filter: 3
                 Heap Blocks: exact=18307
                 ->  BitmapOr  (cost=1547.81..1547.81 rows=40538 width=0) (actual time=15.707..15.707 rows=0 loops=1)
                       ->  Bitmap Index Scan on example_list9_idx  (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702 rows=62851 loops=1)
                             Index Cond: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone))
                       ->  Bitmap Index Scan on example_list10_idx  (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
                             Index Cond: (examplestartdatetime IS NULL)
   ->  Hash Left Join  (cost=0.13..1.29 rows=9 width=4) (actual time=88.831..88.840 rows=9 loops=1)
         Hash Cond: ((js.examplestatuscode)::text = (jc.examplestatuscode)::text)
         ->  Seq Scan on examplestatus js  (cost=0.00..1.09 rows=9 width=4) (actual time=0.004..0.007 rows=9 loops=1)
         ->  Hash  (cost=0.08..0.08 rows=4 width=16) (actual time=88.817..88.817 rows=5 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  CTE Scan on examplecount jc  (cost=0.00..0.08 rows=4 width=16) (actual time=88.807..88.812 rows=5 loops=1)
 Planning Time: 0.979 ms
 Execution Time: 89.036 ms
(23 rows)


----------------exampleCount 2. With internalexamplecode---------------------------------


lmp_examples=> explain analyze with exampleCount as ( select examplestatuscode,count(1) stat_count from example j where 1=1 and j.countrycode = 'AD'   and j.facilitycode in ('ABCD') and j.internalexamplecode in ('005','006','007','005') and ((j.examplestartdatetime  between '2020-05-18 00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )  group by j.examplestatuscode)
lmp_examples-> select js.examplestatuscode,COALESCE(stat_count,0) stat_count from exampleCount jc right outer join examplestatus js on jc.examplestatuscode=js.examplestatuscode;
                                                                                                                                                                                 QUERY PLAN                                                                                                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=79453.94..79455.10 rows=9 width=12) (actual time=89.660..89.669 rows=9 loops=1)
   Hash Cond: ((js.examplestatuscode)::text = (jc.examplestatuscode)::text)
   CTE examplecount
     ->  HashAggregate  (cost=79453.77..79453.81 rows=4 width=12) (actual time=89.638..89.640 rows=5 loops=1)
           Group Key: j.examplestatuscode
           ->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08 rows=40538 width=4) (actual time=18.193..69.710 rows=62851 loops=1)
                 Recheck Cond: ((((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone)) OR (examplestartdatetime IS NULL))
                 Filter: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])))
                 Rows Removed by Filter: 3
                 Heap Blocks: exact=18307
                 ->  BitmapOr  (cost=1547.81..1547.81 rows=40538 width=0) (actual time=15.483..15.483 rows=0 loops=1)
                       ->  Bitmap Index Scan on example_list9_idx  (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.477..15.478 rows=62851 loops=1)
                             Index Cond: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone))
                       ->  Bitmap Index Scan on example_list10_idx  (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
                             Index Cond: (examplestartdatetime IS NULL)
   ->  Seq Scan on examplestatus js  (cost=0.00..1.09 rows=9 width=4) (actual time=0.003..0.005 rows=9 loops=1)
   ->  Hash  (cost=0.08..0.08 rows=4 width=24) (actual time=89.650..89.651 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  CTE Scan on examplecount jc  (cost=0.00..0.08 rows=4 width=24) (actual time=89.641..89.647 rows=5 loops=1)
 Planning Time: 0.470 ms
 Execution Time: 89.737 ms

------------------------exampleSelect-----------------------------------


lmp_examples=> explain analyze select j.id from example j where 1=1  and j.countrycode = 'AD'  and j.facilitycode in ('ABCD') and j.examplestatuscode in ('101') and j.internalexamplecode in ('005','006','007','005')  and ((j.examplestartdatetime between '2020-05-18 00:00:00' and '2020-08-19 00:00:00') or j.examplestartdatetime IS NULL)  ORDER BY createddate DESC limit 10;
                                                                                                          QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=71286.65..71286.68 rows=10 width=12) (actual time=47.351..47.359 rows=10 loops=1)
   ->  Sort  (cost=71286.65..71335.31 rows=19462 width=12) (actual time=47.349..47.352 rows=10 loops=1)
         Sort Key: createddate DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on example j  (cost=1176.77..70866.09 rows=19462 width=12) (actual time=15.133..46.555 rows=2530 loops=1)
               Recheck Cond: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((examplestatuscode)::text = '101'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])))
               Filter: (((examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone)) OR (examplestartdatetime IS NULL))
               Rows Removed by Filter: 38724
               Heap Blocks: exact=20923
               ->  Bitmap Index Scan on example_list1_idx  (cost=0.00..1171.90 rows=33211 width=0) (actual time=9.938..9.939 rows=41254 loops=1)
                     Index Cond: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((examplestatuscode)::text = '101'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])))
 Planning Time: 0.398 ms
 Execution Time: 47.416 ms

Regards,
Aditya.

В списке pgsql-performance по дате отправления:

Предыдущее
От: aditya desai
Дата:
Сообщение: Re: AWS RDS PostgreSQL CPU Spiking to 100%
Следующее
От: Timothy Garnett
Дата:
Сообщение: Is it possible to specify minimum number of rows planner should consider?