Problems with query in highly concurrent environment
От | Aleksei Valikov |
---|---|
Тема | Problems with query in highly concurrent environment |
Дата | |
Msg-id | 421E5751.5070804@gmx.net обсуждение исходный текст |
Список | pgsql-general |
Hi. We have a very strange problem and no idea how to solve it. We have a PROGRESSION table that stores information about various telephony events. Rows are never updated, only inserted, about 20-30 thousand rows a day. Based on this table, we need to produce certain statistics, which are created by a complex SQL query. This query runs quite fast on our test systems, but is EXTREMELY slow on the production system. The slow down factor is 100-500 times. A query that runs 2 seconds on the test system takes sometimes up to 12 minutes in the production system. Test systems run on the dumps of the production system, so the difference between databases is not big. The only difference is that production system is concurrently accessed by a number of users (and PROGRESSION table gets plenty of inserts). During the off-peak times (at night, for instance), statistics on the production system are as fast as on the test systems (even faster because of the server hardware). Moreover, what I find very strange is that when I log in with psql (or any other SQL client) onto the production database and execute the same query, it runs fast - even in peak hours! I would be extremely grateful for any hints or ideas on how could we resolve this. My personal theory is that concurrent reads/writes create some kind of blocks (for instance, on indexes). Simply "heavy load" of the production system would not explain the slowdown factor of 400. Under which conditions could writes block reading query? What could we do to diagnose those blocks? I'm afraid I can't drop indexes on the production system. Becoming a bit desperate... Bye. /lexi ps. The table description: Table "public.progression" Column | Type | Modifiers ---------------+------------------------+------------------------------------------------------ id | bigint | not null default nextval('progression_id_seq'::text) timestamp | bigint | not null entry_type | integer | not null call_id | bigint | not null hotline_id | bigint | not null hotline_name | character varying(64) | not null team_id | bigint | not null team_name | character varying(64) | not null expert_id | bigint | not null expert_name | character varying(64) | not null expert_role | bigint | not null remote_number | character varying(32) | comment | character varying(255) | not null default ''::character varying Indexes: "progression_call_id_idx" btree (call_id) "progression_entry_type_timestamp_idx" btree (entry_type, "timestamp") "progression_expert_id" btree (expert_id) select count(*) from progression returns 1870961 The statistics query (please be gentle): SELECT CALL.HOTLINE_ID || '.' || CALL.PERIOD_NUMBER as {record.id}, CALL.PERIOD_NUMBER as {record.periodNumber}, (:startTimestamp + CALL.PERIOD_NUMBER * :intervalSize) AS {record.timestamp}, CALL.HOTLINE_ID AS {record.hotlineId}, CALL.HOTLINE_NAME AS {record.hotlineName}, count(*) AS {record.totalCallCount}, sum(CALL.IS_PROCESSED) as {record.processedCallCount}, sum(CALL.TOTAL_TIME) AS {record.totalTime}, sum(CALL.SPOKEN_TIME) as {record.spokenTime} FROM (SELECT draft._CALL_ID as CALL_ID, draft._TIMESTAMP as TIMESTAMP, draft._HOTLINE_ID as HOTLINE_ID, draft._HOTLINE_NAME as HOTLINE_NAME, case when draft._CUSTOMER_CALL_GETS_CONNECTED > 0 then 1 else 0 end as IS_CONNECTED, case when draft._CUSTOMER_CALL_GETS_CONNECTED > 0 and (draft._CUSTOMER_CALL_GETS_DISCONNECTED > 0 OR draft._CUSTOMER_CALL_DROPPED > 0) then 1 else 0 end as IS_PROCESSED, case when draft._AGENT_CALL_GETS_CONNECTED > 0 and draft._AGENT_CALL_GETS_DISCONNECTED > 0 THEN draft._AGENT_CALL_GETS_DISCONNECTED - draft._AGENT_CALL_GETS_CONNECTED when draft._AGENT_CALL_GETS_CONNECTED > 0 and draft._AGENT_CALL_DROPPED > 0 THEN draft._AGENT_CALL_DROPPED - draft._AGENT_CALL_GETS_CONNECTED else 0 END as TOTAL_TIME, case when draft._AGENT_CALL_STARTS_TALKING > 0 and draft._AGENT_CALL_GETS_DISCONNECTED > 0 THEN draft._AGENT_CALL_GETS_DISCONNECTED - draft._AGENT_CALL_STARTS_TALKING when draft._AGENT_CALL_STARTS_TALKING > 0 and draft._AGENT_CALL_DROPPED > 0 THEN draft._AGENT_CALL_DROPPED - draft._AGENT_CALL_STARTS_TALKING else 0 END as SPOKEN_TIME, floor((draft._TIMESTAMP - :startTimestamp) / :intervalSize) as PERIOD_NUMBER FROM (SELECT p.call_id as _CALL_ID, sum(case when p.entry_type = 1 then p.timestamp else 0 end) as _TIMESTAMP , p.hotline_id as _HOTLINE_ID, p.hotline_name as _HOTLINE_NAME, sum(case when p.entry_type = 2 then p.timestamp else 0 end) as _CUSTOMER_CALL_GETS_CONNECTED , sum(case when p.entry_type = 3 then p.timestamp else 0 end) as _CUSTOMER_CALL_GETS_DISCONNECTED , sum(case when p.entry_type = 4 then p.timestamp else 0 end) as _CUSTOMER_CALL_DROPPED , sum(case when p.entry_type = 8 then p.timestamp else 0 end) as _AGENT_CALL_GETS_CONNECTED , sum(case when p.entry_type = 9 then p.timestamp else 0 end) as _AGENT_CALL_GETS_DISCONNECTED , sum(case when p.entry_type = 10 then p.timestamp else 0 end) as _AGENT_CALL_DROPPED , sum(case when p.entry_type = 16 then p.timestamp else 0 end) as _AGENT_CALL_STARTS_TALKING FROM progression p WHERE p.hotline_id in (:hotlineIds) and p.entry_type in (1,2,3,4,8,9,10,16) and p.timestamp > :startTimestamp and p.timestamp < (:endTimestamp + :maximalDuration) and (p.entry_type != 1 or p.timestamp < :endTimestamp) GROUP BY p.call_id, p.hotline_id, p.hotline_name) as draft) AS CALL WHERE CALL.TIMESTAMP > 0 GROUP BY CALL.HOTLINE_ID, CALL.HOTLINE_NAME, CALL.PERIOD_NUMBER Query execution plan for this query (for the time period of January this year, all hotlines, :maximalDuration = 36000000, :intervalSize = 3600000) is as follows: HashAggregate (cost=143395.98..159630.37 rows=209476 width=354) -> Subquery Scan draft (cost=125590.52..139730.15 rows=209476 width=354) -> HashAggregate (cost=125590.52..136588.01 rows=209476 width=55) Filter: (((min(CASE WHEN (entry_type = 1) THEN "timestamp" ELSE 0::bigint END) - 1104534000000::bigint) / 3600000) > 0) -> Seq Scan on progression p (cost=0.00..119306.24 rows=209476 width=55) Filter: (((entry_type = 1) OR (entry_type = 2) OR (entry_type = 3) OR (entry_type = 4) OR (entry_type = 8) OR (entry_type = 9) OR (entry_type = 10) OR (entry_type = 16)) AND ("timestamp" > 1104534000000::bigint) AND ("timestamp" < 1107248400000::bigint) AND ((entry_type <> 1) OR ("timestamp" < 1107212400000::bigint)))
В списке pgsql-general по дате отправления:
Предыдущее
От: Ron MayerДата:
Сообщение: Re: [PATCHES] A way to let Vacuum warn if FSM settings are low.
Следующее
От: Tom LaneДата:
Сообщение: Re: [PATCHES] A way to let Vacuum warn if FSM settings are low.