Problems with query in highly concurrent environment

Поиск
Список
Период
Сортировка
От	Aleksei Valikov
Тема	Problems with query in highly concurrent environment
Дата	24 февраля 2005 г. 22:40:40
Msg-id	421E5751.5070804@gmx.net обсуждение исходный текст
Список	pgsql-general
Дерево обсуждения
Hi.

We have a very strange problem and no idea how to solve it.

We have a PROGRESSION table that stores information about various
telephony events. Rows are never updated, only inserted, about 20-30
thousand rows a day.

Based on this table, we need to produce certain statistics, which are
created by a complex SQL query. This query runs quite fast on our test
systems, but is EXTREMELY slow on the production system. The slow down
factor is 100-500 times. A query that runs 2 seconds on the test system
takes sometimes up to 12 minutes in the production system.

Test systems run on the dumps of the production system, so the
difference between databases is not big. The only difference is that
production system is concurrently accessed by a number of users (and
PROGRESSION table gets plenty of inserts).

During the off-peak times (at night, for instance), statistics on the
production system are as fast as on the test systems (even faster
because of the server hardware).

Moreover, what I find very strange is that when I log in with psql (or
any other SQL client) onto the production database and execute the same
query, it runs fast - even in peak hours!

I would be extremely grateful for any hints or ideas on how could we
resolve this.

My personal theory is that concurrent reads/writes create some kind of
blocks (for instance, on indexes). Simply "heavy load" of the production
system would not explain the slowdown factor of 400. Under which
conditions could writes block reading query? What could we do to
diagnose those blocks?

I'm afraid I can't drop indexes on the production system.

Becoming a bit desperate...

Bye.
/lexi

ps.

The table description:

                                   Table "public.progression"
     Column     |          Type          |                      Modifiers
---------------+------------------------+------------------------------------------------------
  id            | bigint                 | not null default
nextval('progression_id_seq'::text)
  timestamp     | bigint                 | not null
  entry_type    | integer                | not null
  call_id       | bigint                 | not null
  hotline_id    | bigint                 | not null
  hotline_name  | character varying(64)  | not null
  team_id       | bigint                 | not null
  team_name     | character varying(64)  | not null
  expert_id     | bigint                 | not null
  expert_name   | character varying(64)  | not null
  expert_role   | bigint                 | not null
  remote_number | character varying(32)  |
  comment       | character varying(255) | not null default
''::character varying

Indexes:
     "progression_call_id_idx" btree (call_id)
     "progression_entry_type_timestamp_idx" btree (entry_type, "timestamp")
     "progression_expert_id" btree (expert_id)

select count(*) from progression returns 1870961

The statistics query (please be gentle):

SELECT CALL.HOTLINE_ID || '.' || CALL.PERIOD_NUMBER as {record.id},
CALL.PERIOD_NUMBER as {record.periodNumber}, (:startTimestamp +
CALL.PERIOD_NUMBER * :intervalSize) AS {record.timestamp},
CALL.HOTLINE_ID AS {record.hotlineId}, CALL.HOTLINE_NAME AS
{record.hotlineName}, count(*) AS {record.totalCallCount},
sum(CALL.IS_PROCESSED) as {record.processedCallCount},
sum(CALL.TOTAL_TIME) AS {record.totalTime}, sum(CALL.SPOKEN_TIME) as
{record.spokenTime}
     FROM
      (SELECT
         draft._CALL_ID as CALL_ID,
         draft._TIMESTAMP as TIMESTAMP,
         draft._HOTLINE_ID as HOTLINE_ID,
         draft._HOTLINE_NAME as HOTLINE_NAME,
         case when draft._CUSTOMER_CALL_GETS_CONNECTED > 0 then 1 else 0
end as IS_CONNECTED,
         case when draft._CUSTOMER_CALL_GETS_CONNECTED > 0 and
(draft._CUSTOMER_CALL_GETS_DISCONNECTED > 0 OR
draft._CUSTOMER_CALL_DROPPED > 0) then 1 else 0 end as IS_PROCESSED,
         case when draft._AGENT_CALL_GETS_CONNECTED > 0 and
draft._AGENT_CALL_GETS_DISCONNECTED > 0 THEN
draft._AGENT_CALL_GETS_DISCONNECTED - draft._AGENT_CALL_GETS_CONNECTED
              when draft._AGENT_CALL_GETS_CONNECTED > 0 and
draft._AGENT_CALL_DROPPED > 0 THEN draft._AGENT_CALL_DROPPED -
draft._AGENT_CALL_GETS_CONNECTED else 0 END as TOTAL_TIME,
         case when draft._AGENT_CALL_STARTS_TALKING > 0 and
draft._AGENT_CALL_GETS_DISCONNECTED > 0 THEN
draft._AGENT_CALL_GETS_DISCONNECTED - draft._AGENT_CALL_STARTS_TALKING
              when draft._AGENT_CALL_STARTS_TALKING > 0 and
draft._AGENT_CALL_DROPPED > 0 THEN draft._AGENT_CALL_DROPPED -
draft._AGENT_CALL_STARTS_TALKING else 0 END as SPOKEN_TIME,
         floor((draft._TIMESTAMP - :startTimestamp) / :intervalSize) as
PERIOD_NUMBER
       FROM
        (SELECT
          p.call_id
as _CALL_ID,
          sum(case when p.entry_type =  1 then p.timestamp else 0 end)
as _TIMESTAMP         ,
          p.hotline_id
as _HOTLINE_ID,
          p.hotline_name
as _HOTLINE_NAME,
          sum(case when p.entry_type =  2 then p.timestamp else 0 end)
as _CUSTOMER_CALL_GETS_CONNECTED         ,
          sum(case when p.entry_type =  3 then p.timestamp else 0 end)
as _CUSTOMER_CALL_GETS_DISCONNECTED      ,
          sum(case when p.entry_type =  4 then p.timestamp else 0 end)
as _CUSTOMER_CALL_DROPPED                ,
          sum(case when p.entry_type =  8 then p.timestamp else 0 end)
as _AGENT_CALL_GETS_CONNECTED            ,
          sum(case when p.entry_type =  9 then p.timestamp else 0 end)
as _AGENT_CALL_GETS_DISCONNECTED         ,
          sum(case when p.entry_type = 10 then p.timestamp else 0 end)
as _AGENT_CALL_DROPPED                   ,
          sum(case when p.entry_type = 16 then p.timestamp else 0 end)
as _AGENT_CALL_STARTS_TALKING
        FROM
          progression p
        WHERE
          p.hotline_id in (:hotlineIds) and
          p.entry_type in (1,2,3,4,8,9,10,16) and
          p.timestamp > :startTimestamp and
          p.timestamp < (:endTimestamp + :maximalDuration) and
          (p.entry_type != 1 or p.timestamp < :endTimestamp)
       GROUP BY p.call_id, p.hotline_id, p.hotline_name) as draft) AS CALL
     WHERE CALL.TIMESTAMP > 0
     GROUP BY CALL.HOTLINE_ID, CALL.HOTLINE_NAME, CALL.PERIOD_NUMBER

Query execution plan for this query (for the time period of January this
year, all hotlines, :maximalDuration = 36000000, :intervalSize =
3600000) is as follows:

HashAggregate
(cost=143395.98..159630.37 rows=209476 width=354)
  ->  Subquery Scan draft
  (cost=125590.52..139730.15 rows=209476 width=354)
   ->  HashAggregate
   (cost=125590.52..136588.01 rows=209476 width=55)
    Filter: (((min(CASE WHEN (entry_type = 1) THEN "timestamp" ELSE
0::bigint END) - 1104534000000::bigint) / 3600000) > 0)
    ->  Seq Scan on progression p
    (cost=0.00..119306.24 rows=209476 width=55)
     Filter: (((entry_type = 1) OR (entry_type = 2) OR (entry_type = 3)
OR (entry_type = 4) OR (entry_type = 8) OR (entry_type = 9) OR
(entry_type = 10) OR (entry_type = 16)) AND ("timestamp" >
1104534000000::bigint) AND ("timestamp" < 1107248400000::bigint) AND
((entry_type <> 1) OR ("timestamp" < 1107212400000::bigint)))
В списке pgsql-general по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Problems with query in highly concurrent environment