postgres performance

Поиск
Список
Период
Сортировка
От chidamparam muthusamy
Тема postgres performance
Дата
Msg-id CABUk_4jr9z3CU-5b8tMvgEhAsKJ8+JKRn+TivPGAis9FOZo1jQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: postgres performance
Re: postgres performance
Re: postgres performance
Список pgsql-performance
hi,
Registered with PostgreSQL Help Forum to identify and resolve the Postgres DB performance issue, received suggestions but could not improve the speed/response time. Please help.

Details:
Postgres Version 9.3.1
Server configuration:
Processor: 2 x Intel Quad core E5620 @ 2.40GHz
RAM: 16 GB

Postgres configuration:
Effective cache size = 10 GB
shared Buffer = 1250 MB
random page cost = 4

Table size = 60 GB
Number of records = 44 million 
Carried out Vacuum Analyze after inserting new records and also after creating Index,
6 months data, every month around 10 GB will get added. Expecting good performance with 3 years data.
DB Will be used for Reporting/Read, will not be used for transaction. Daily records will be inserted through bulk insertion every day.

Table schema:
                    Table "public.detailed_report"
            Column             |            Type            | Modifiers 
-------------------------------+----------------------------+-----------
 group_id                      | character varying(50)      | not null
 client                        | character varying(50)      | 
 gateway                       | character varying(50)      | 
 call_id                       | character varying(120)     | not null
 parent_call_id                | character varying(120)     | 
 start_time                    | timestamp with time zone   | not null
 connect_time                  | timestamp with time zone   | 
 end_time                      | timestamp with time zone   | 
 duration                      | integer                    | 
 source                        | character varying(50)      | 
 source_alias                  | character varying(50)      | 
 dest_in_number                | character varying(50)      | 
 dest_out_number               | character varying(50)      | 
 bp_code_pay                   | character varying[]        | 
 billed_duration_pay           | integer[]                  | 
 rate_pay                      | character varying[]        | 
 rate_effective_date_pay       | timestamp with time zone[] | 
 type_value_pay                | character varying[]        | 
 slab_time_pay                 | character varying[]        | 
 pin_pay                       | bigint[]                   | 
 amount_pay                    | double precision[]         | 
 adjusted_pin_pay              | bigint[]                   | 
 adjusted_amount_pay           | double precision[]         | 
 call_amount_pay               | double precision           | 
 country_code_pay              | character varying[]        | 
 country_desc_pay              | character varying[]        | 
 master_country_code           | character varying(15)      | 
 master_country_desc           | character varying(100)     | 
 bp_code_recv                  | character varying[]        | 
 billed_duration_recv          | integer[]                  | 
 rate_recv                     | character varying[]        | 
 rate_effective_date_recv      | timestamp with time zone[] | 
 type_value_recv               | character varying[]        | 
 slab_time_recv                | character varying[]        | 
 pin_recv                      | bigint[]                   | 
 amount_recv                   | double precision[]         | 
 adjusted_pin_recv             | bigint[]                   | 
 adjusted_amount_recv          | double precision[]         | 
 call_amount_recv              | double precision           | 
 country_code_recv             | character varying[]        | 
 country_desc_recv             | character varying[]        | 
 subscriber_type               | character varying(50)      | 
 pdd                           | smallint                   | 
 disconnect_reason             | character varying(200)     | 
 source_ip                     | character varying(20)      | 
 dest_ip                       | character varying(20)      | 
 caller_hop                    | character varying(20)      | 
 callee_hop                    | character varying(20)      | 
 caller_received_from_hop      | character varying(20)      | 
 callee_sent_to_hop            | character varying(20)      | 
 caller_media_ip_port          | character varying(25)      | 
 callee_media_ip_port          | character varying(25)      | 
 caller_original_media_ip_port | character varying(25)      | 
 callee_original_media_ip_port | character varying(25)      | 
 switch_ip                     | character varying(20)      | 
 call_shop_amount_paid         | boolean                    | 
 version                       | character varying          | 
 call_duration_pay             | integer                    | 
 call_duration_recv            | integer                    | 
 audio_codec                   | character varying(5)       | 
 video_codec                   | character varying(5)       | 
 shadow_amount_recv            | double precision           | 
 shadow_amount_pay             | double precision           | 
 pulse_applied_recv            | character varying(50)      | 
 pulse_applied_pay             | character varying(50)      | 

Index, multi column, 3 columns, matches exactly with query where condition
"endtime_groupid_client_tsidx_detail_report" btree (end_time DESC, group_id, client), tablespace "indexspace" which exactly matches with 'where' condition,
" WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway;" 
Index on a separate tablespace on another hard disk.

Query:
EXPLAIN (analyze, buffers) SELECT text(client) as client, text(gateway) as gateway,count(*)::bigint as total_calls, (avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2) as pdd, sum(call_duration_recv)/1000.0 as duration_recv, sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as call_amount_recv, sum(call_amount_pay) as call_amount_
pay FROM detailed_report WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway ORDER BY call_amount_recv DESC; 
QUERY PLAN 
------------------------------------------------------
Sort (cost=3422863.06..3422868.69 rows=2254 width=44) (actual time=137852.474..137852.474 rows=5 loops=1)
Sort Key: (sum(call_amount_recv))
Sort Method: quicksort Memory: 25kB
Buffers: shared read=2491664
-> HashAggregate (cost=3422664.28..3422737.53 rows=2254 width=44) (actual time=137852.402..137852.454 rows=5 loops=1)
Buffers: shared read=2491664
-> Bitmap Heap Scan on detailed_report (cost=644828.11..3399506.87 rows=1029218 width=44) (actual time=4499.558..125443.122 rows=5248227 loops=1)
Recheck Cond: ((end_time >= '2013-05-01 00:00:00+00'::timestamp with time zone) AND (end_time < '2013-07-01 00:00:00+00'::timestamp with time zone) AND ((group_id)::text = 'adm
in'::text) AND ((client)::text = 'CHOICE'::text))
Buffers: shared read=2491664
-> Bitmap Index Scan on endtime_groupid_client_tsidx_detail_report (cost=0.00..644570.81 rows=1029218 width=0) (actual time=3418.754..3418.754 rows=5248227 loops=1)
Index Cond: ((end_time >= '2013-05-01 00:00:00+00'::timestamp with time zone) AND (end_time < '2013-07-01 00:00:00+00'::timestamp with time zone) AND ((group_id)::text = 
'admin'::text) AND ((client)::text = 'CHOICE'::text))
Buffers: shared read=95055
Total runtime: 137868.946 ms
(13 rows)

Checked by removing ORDER BY, but no improvement.

By increasing random_page_cost to 25, the query gets executed sequentially,Seq Scan on detailed_report, time taken is comparatively better than Indexed scan. But I am not preferring because when the data grows the sequential scan performance will come down.

carried out 3 more set of tests:
1. Index on columns
2. multiple column index, with 2 columns
3. multiple column index, with three columns

Test Case 1:
************
indexes :
1)client
2)group_id
3)gateway
4)end_time

"det_rep_pkey" PRIMARY KEY, btree (group_id, call_id, start_time)
"client_detailed_report_idx" btree (client), tablespace "indexspace"
"end_time_detailed_report_idx" btree (end_time), tablespace "indexspace"
"gateway_detailed_report_idx" btree (gateway), tablespace "indexspace"
"group_id_detailed_report_idx" btree (group_id), tablespace "indexspace"

testdb=# EXPLAIN (analyze,buffers,verbose)SELECT text(client) as client, text(gateway) as gateway,count(*)::bigint as total_calls, (avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2) as pdd, sum(call_duration_recv)/1000.0 as duration_recv, sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as call_amount_recv, sum(call_amount_pay) as call_amount_pay FROM detailed_report WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway ORDER BY call_amount_recv DESC; 
QUERY PLAN 

Sort (cost=3510106.93..3510112.25 rows=2127 width=44) (actual time=148557.599..148557.599 rows=5 loops=1)
Output: ((client)::text), ((gateway)::text), (count(*)), (((avg(duration) / 1000.0))::numeric(10,2)), ((avg(pdd))::numeric(10,2)), (((sum(call_duration_recv))::numeric / 1000.0)), (((sum(c
all_duration_pay))::numeric / 1000.0)), (sum(call_amount_recv)), (sum(call_amount_pay)), client, gateway
Sort Key: (sum(detailed_report.call_amount_recv))
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=69 read=2505035
-> HashAggregate (cost=3509920.24..3509989.37 rows=2127 width=44) (actual time=148557.556..148557.581 rows=5 loops=1)
Output: (client)::text, (gateway)::text, count(*), ((avg(duration) / 1000.0))::numeric(10,2), (avg(pdd))::numeric(10,2), ((sum(call_duration_recv))::numeric / 1000.0), ((sum(call_dur
ation_pay))::numeric / 1000.0), sum(call_amount_recv), sum(call_amount_pay), client, gateway
Buffers: shared hit=69 read=2505035
-> Bitmap Heap Scan on public.detailed_report (cost=832774.93..3487872.62 rows=979894 width=44) (actual time=14257.148..135355.676 rows=5248227 loops=1)
Output: group_id, client, gateway, call_id, parent_call_id, start_time, connect_time, end_time, duration, source, source_alias, dest_in_number, dest_out_number, bp_code_pay, bi
lled_duration_pay, rate_pay, rate_effective_date_pay, type_value_pay, slab_time_pay, pin_pay, amount_pay, adjusted_pin_pay, adjusted_amount_pay, call_amount_pay, country_code_pay, country_des
c_pay, master_country_code, master_country_desc, bp_code_recv, billed_duration_recv, rate_recv, rate_effective_date_recv, type_value_recv, slab_time_recv, pin_recv, amount_recv, adjusted_pin_
recv, adjusted_amount_recv, call_amount_recv, country_code_recv, country_desc_recv, subscriber_type, pdd, disconnect_reason, source_ip, dest_ip, caller_hop, callee_hop, caller_received_from_h
op, callee_sent_to_hop, caller_media_ip_port, callee_media_ip_port, caller_original_media_ip_port, callee_original_media_ip_port, switch_ip, call_shop_amount_paid, version, call_duration_pay,
call_duration_recv, audio_codec, video_codec, shadow_amount_recv, shadow_amount_pay, pulse_applied_recv, pulse_applied_pay
Recheck Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND ((detailed_report.group_id)::text = 'admin'::text) AND (detailed_report.end_time >= '2013-05-01 00:00:00+00
'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01 00:00:00+00'::timestamp with time zone))
Buffers: shared hit=69 read=2505035
-> BitmapAnd (cost=832774.93..832774.93 rows=979894 width=0) (actual time=13007.643..13007.643 rows=0 loops=1)
Buffers: shared read=108495
-> Bitmap Index Scan on client_detailed_report_idx (cost=0.00..172876.66 rows=7862413 width=0) (actual time=2546.204..2546.204 rows=7840766 loops=1)
Index Cond: ((detailed_report.client)::text = 'CHOICE'::text)
Buffers: shared read=21427
-> Bitmap Index Scan on group_id_detailed_report_idx (cost=0.00..307105.20 rows=14971818 width=0) (actual time=4265.728..4265.728 rows=14945965 loops=1)
Index Cond: ((detailed_report.group_id)::text = 'admin'::text)
Buffers: shared read=40840
-> Bitmap Index Scan on end_time_detailed_report_idx (cost=0.00..352057.65 rows=16790108 width=0) (actual time=3489.106..3489.106 rows=16917795 loops=1)
Index Cond: ((detailed_report.end_time >= '2013-05-01 00:00:00+00'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01 00:00:00+00'::timestamp wi
th time zone))
Buffers: shared read=46228
Total runtime: 148558.070 ms
(24 rows)



Test Case 2:
************
Indexes :
1)client
2)group_id
3)gateway
4)end_time
5)client,group_id

"det_rep_pkey" PRIMARY KEY, btree (group_id, call_id, start_time)
"client_detailed_report_idx" btree (client), tablespace "indexspace"
"clientgroupid_detailed_report_idx" btree (client, group_id), tablespace "indexspace"
"end_time_detailed_report_idx" btree (end_time), tablespace "indexspace"
"gateway_detailed_report_idx" btree (gateway), tablespace "indexspace"
"group_id_detailed_report_idx" btree (group_id), tablespace "indexspace"


testdb=# EXPLAIN (analyze,buffers,verbose)SELECT text(client) as client, text(gateway) as gateway,count(*)::bigint as total_calls, (avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2) as pdd, sum(call_duration_recv)/1000.0 as duration_recv, sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as call_amount_recv, sum(call_amount_pay) as call_amount_pay FROM detailed_report WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway ORDER BY call_amount_recv DESC;

QUERY PLAN 
Sort (cost=3172381.37..3172387.11 rows=2297 width=44) (actual time=132725.901..132725.901 rows=5 loops=1)
Output: ((client)::text), ((gateway)::text), (count(*)), (((avg(duration) / 1000.0))::numeric(10,2)), ((avg(pdd))::numeric(10,2)), (((sum(call_duration_recv))::numeric / 1000.0)), (((sum(c
all_duration_pay))::numeric / 1000.0)), (sum(call_amount_recv)), (sum(call_amount_pay)), client, gateway
Sort Key: (sum(detailed_report.call_amount_recv))
Sort Method: quicksort Memory: 25kB
Buffers: shared read=2472883
-> HashAggregate (cost=3172178.48..3172253.13 rows=2297 width=44) (actual time=132725.861..132725.881 rows=5 loops=1)
Output: (client)::text, (gateway)::text, count(*), ((avg(duration) / 1000.0))::numeric(10,2), (avg(pdd))::numeric(10,2), ((sum(call_duration_recv))::numeric / 1000.0), ((sum(call_dur
ation_pay))::numeric / 1000.0), sum(call_amount_recv), sum(call_amount_pay), client, gateway
Buffers: shared read=2472883
-> Bitmap Heap Scan on public.detailed_report (cost=434121.21..3149462.57 rows=1009596 width=44) (actual time=8257.581..120311.450 rows=5248227 loops=1)
Output: group_id, client, gateway, call_id, parent_call_id, start_time, connect_time, end_time, duration, source, source_alias, dest_in_number, dest_out_number, bp_code_pay, bi
lled_duration_pay, rate_pay, rate_effective_date_pay, type_value_pay, slab_time_pay, pin_pay, amount_pay, adjusted_pin_pay, adjusted_amount_pay, call_amount_pay, country_code_pay, country_des
c_pay, master_country_code, master_country_desc, bp_code_recv, billed_duration_recv, rate_recv, rate_effective_date_recv, type_value_recv, slab_time_recv, pin_recv, amount_recv, adjusted_pin_
recv, adjusted_amount_recv, call_amount_recv, country_code_recv, country_desc_recv, subscriber_type, pdd, disconnect_reason, source_ip, dest_ip, caller_hop, callee_hop, caller_received_from_h
op, callee_sent_to_hop, caller_media_ip_port, callee_media_ip_port, caller_original_media_ip_port, callee_original_media_ip_port, switch_ip, call_shop_amount_paid, version, call_duration_pay,
call_duration_recv, audio_codec, video_codec, shadow_amount_recv, shadow_amount_pay, pulse_applied_recv, pulse_applied_pay
Recheck Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND ((detailed_report.group_id)::text = 'admin'::text) AND (detailed_report.end_time >= '2013-05-01 00:00:00+00
'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01 00:00:00+00'::timestamp with time zone))
Buffers: shared read=2472883
-> BitmapAnd (cost=434121.21..434121.21 rows=1009596 width=0) (actual time=7101.419..7101.419 rows=0 loops=1)
Buffers: shared read=76274
-> Bitmap Index Scan on clientgroupid_detailed_report_idx (cost=0.00..74766.52 rows=2649396 width=0) (actual time=3066.346..3066.346 rows=7840766 loops=1)
Index Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND ((detailed_report.group_id)::text = 'admin'::text))
Buffers: shared read=30046
-> Bitmap Index Scan on end_time_detailed_report_idx (cost=0.00..358849.64 rows=17114107 width=0) (actual time=2969.577..2969.577 rows=16917795 loops=1)
Index Cond: ((detailed_report.end_time >= '2013-05-01 00:00:00+00'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01 00:00:00+00'::timestamp wi
th time zone))
Buffers: shared read=46228
Total runtime: 132726.073 ms
(21 rows)



Test Case 3:
************
Indexes:
Index :
1)client
2)group_id
3)gateway
4)end_time
5)client,group_id
6)client,group_id,end_time

"det_rep_pkey" PRIMARY KEY, btree (group_id, call_id, start_time)
"client_detailed_report_idx" btree (client), tablespace "indexspace"
"clientgroupid_detailed_report_idx" btree (client, group_id), tablespace "indexspace"
"clientgroupidendtime_detailed_report_idx" btree (client, group_id, end_time), tablespace "indexspace"
"end_time_detailed_report_idx" btree (end_time), tablespace "indexspace"
"gateway_detailed_report_idx" btree (gateway), tablespace "indexspace"
"group_id_detailed_report_idx" btree (group_id), tablespace "indexspace"


testdb=# EXPLAIN (analyze, verbose) SELECT text(client) as client, text(gateway) as gateway,count(*)::bigint as total_calls, (avg(duration)/1000.0) ::numeric(10,2) as acd, (avg(pdd)) ::numeric(10,2) as pdd, sum(call_duration_recv)/1000.0 as duration_recv, sum(call_duration_pay)/1000.0 as duration_pay, sum(call_amount_recv) as call_amount_recv, sum(call_amount_pay) as call_amount_
pay FROM detailed_report WHERE end_time>='2013-05-01 00:00' and end_time<'2013-07-01 00:00' and group_id='admin' and client ='CHOICE' GROUP by client, gateway ORDER BY call_amount_recv DESC; 

QUERY PLAN 

Sort (cost=2725603.99..2725609.46 rows=2188 width=44) (actual time=137713.264..137713.265 rows=5 loops=1)
Output: ((client)::text), ((gateway)::text), (count(*)), (((avg(duration) / 1000.0))::numeric(10,2)), ((avg(pdd))::numeric(10,2)), (((sum(call_duration_recv))::numeric / 1000.0)), (((sum(c
all_duration_pay))::numeric / 1000.0)), (sum(call_amount_recv)), (sum(call_amount_pay)), client, gateway
Sort Key: (sum(detailed_report.call_amount_recv))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=2725411.50..2725482.61 rows=2188 width=44) (actual time=137713.192..137713.215 rows=5 loops=1)
Output: (client)::text, (gateway)::text, count(*), ((avg(duration) / 1000.0))::numeric(10,2), (avg(pdd))::numeric(10,2), ((sum(call_duration_recv))::numeric / 1000.0), ((sum(call_dur
ation_pay))::numeric / 1000.0), sum(call_amount_recv), sum(call_amount_pay), client, gateway
-> Bitmap Heap Scan on public.detailed_report (cost=37356.61..2703244.88 rows=985183 width=44) (actual time=3925.850..124647.660 rows=5248227 loops=1)
Output: group_id, client, gateway, call_id, parent_call_id, start_time, connect_time, end_time, duration, source, source_alias, dest_in_number, dest_out_number, bp_code_pay, bi
lled_duration_pay, rate_pay, rate_effective_date_pay, type_value_pay, slab_time_pay, pin_pay, amount_pay, adjusted_pin_pay, adjusted_amount_pay, call_amount_pay, country_code_pay, country_des
c_pay, master_country_code, master_country_desc, bp_code_recv, billed_duration_recv, rate_recv, rate_effective_date_recv, type_value_recv, slab_time_recv, pin_recv, amount_recv, adjusted_pin_
recv, adjusted_amount_recv, call_amount_recv, country_code_recv, country_desc_recv, subscriber_type, pdd, disconnect_reason, source_ip, dest_ip, caller_hop, callee_hop, caller_received_from_h
op, callee_sent_to_hop, caller_media_ip_port, callee_media_ip_port, caller_original_media_ip_port, callee_original_media_ip_port, switch_ip, call_shop_amount_paid, version, call_duration_pay,
call_duration_recv, audio_codec, video_codec, shadow_amount_recv, shadow_amount_pay, pulse_applied_recv, pulse_applied_pay
Recheck Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND ((detailed_report.group_id)::text = 'admin'::text) AND (detailed_report.end_time >= '2013-05-01 00:00:00+00
'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01 00:00:00+00'::timestamp with time zone))
-> Bitmap Index Scan on clientgroupidendtime_detailed_report_idx (cost=0.00..37110.31 rows=985183 width=0) (actual time=2820.150..2820.150 rows=5248227 loops=1)
Index Cond: (((detailed_report.client)::text = 'CHOICE'::text) AND ((detailed_report.group_id)::text = 'admin'::text) AND (detailed_report.end_time >= '2013-05-01 00:00:0
0+00'::timestamp with time zone) AND (detailed_report.end_time < '2013-07-01 00:00:00+00'::timestamp with time zone))
Total runtime: 137728.029 ms
(12 rows)

Tried by creating partial Index on group_id column for the value 'admin' and also end_time column for one month range.

With all the above experiment, could not reduce the response time, please suggest.



В списке pgsql-performance по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: Similarity search with the tsearch2 extension
Следующее
От: Alan Hodgson
Дата:
Сообщение: Re: postgres performance