Re: Database transaction with intermittent slow responses

От: Gerardo Herzig
Тема: Re: Database transaction with intermittent slow responses
Дата: ,
Msg-id: 853320287.1573975.1463173501777.JavaMail.root@fmed.uba.ar
(см: обсуждение, исходный текст)
Ответ на: Database transaction with intermittent slow responses  (John Gorman)
Ответы: Re: Database transaction with intermittent slow responses  (John Gorman)
Список: pgsql-performance

Скрыть дерево обсуждения

Database transaction with intermittent slow responses  (John Gorman, )
 Re: Database transaction with intermittent slow responses  (Gerardo Herzig, )
  Re: Database transaction with intermittent slow responses  (John Gorman, )
   Re: Database transaction with intermittent slow responses  (Gerardo Herzig, )
    Re: Database transaction with intermittent slow responses  (Robert Klemme, )
    Re: Database transaction with intermittent slow responses  (John Gorman, )

After quick reading, im thinking about a couples of chances:

1) You are hitting a connection_limit
2) You are hitting a lock contention (perhaps some other backend is locking the table and not releasing it)

Who throws the timeout? It is Postgres or your JDBC connector?

My initial blind guess is that your "timed out queries" never gets postgres at all, and are blocked prior to that for
someother issue. If im wrong, well, you should at least have the timeout recorded in your logs. 

You should also track #of_connectinos and #of_locks over that tables.

See http://www.postgresql.org/docs/9.1/static/view-pg-locks.html for pg_lock information

That should be my starting point for viewing whats going on.

HTH
Gerardo

----- Mensaje original -----
> De: "John Gorman" <>
> Para: 
> CC: "John Gorman" <>
> Enviados: Viernes, 13 de Mayo 2016 16:59:51
> Asunto: [PERFORM] Database transaction with intermittent slow responses
>
>
> Transactions to table, ChangeHistory, have recently become
> intermittently slow and is increasing becoming slower.
>
> * No database configuration changes have been made recently
> * We have run vacuum analyze
> * We have tried backing up and reloading the table (data, indexes,
> etc)
>
> Some transactions respond quickly (200 ms) and others take over 55
> seconds (we cancel the query after 55 seconds – our timeout SLA).
> The problem has recently become very bad. It is the same query being
> issued but with different parameters.
>
> If the transaction takes over 55 seconds and I run the query manually
> (with or without EXPLAIN ANALYZE) it returns quickly (a few hundred
> ms). In case I am looking at cache, I have a list of other queries
> (just different parameters) that have timed out and when I run them
> (without the limit even) the response is very timely.
>
> Any help or insight would be great.
>
> NOTE: our application is connecting to the database via JDBC and we
> are using PreparedStatements. I have provided full details so all
> information is available, but please let me know if any other
> information is needed – thx in advance.
>
> p306=> EXPLAIN ANALYZE SELECT * FROM ChangeHistory WHERE Category
> BETWEEN 'Employee' AND 'Employeezz' AND PrimaryKeyOfChange BETWEEN
> '312313' AND '312313!zz' ORDER BY ChgTS DESC, ChgUser DESC, Category
> DESC, PrimaryKeyOfChange DESC LIMIT 11;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------
> Limit (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297
> rows=11 loops=1)
> -> Sort (cost=33.66..33.67 rows=1 width=136) (actual
> time=0.297..0.297 rows=11 loops=1)
> Sort Key: chgts, chguser, category, primarykeyofchange
> Sort Method: top-N heapsort Memory: 27kB
> -> Index Scan using changehistory_idx4 on changehistory
> (cost=0.00..33.65 rows=1 width=136) (actual time=0.046..
> 0.239 rows=85 loops=1)
> Index Cond: (((primarykeyofchange)::text >= '312313'::text) AND
> ((primarykeyofchange)::text <= '312313!zz'::
> text))
> Filter: (((category)::text >= 'Employee'::text) AND ((category)::text
> <= 'Employeezz'::text))
> Total runtime: 0.328 ms
> (8 rows)
>
> >>>
> History this week of counts with good response times vs timeouts.
>
> | Date | Success # | Time Out # | Avg. Success Secs |
> |------------+-----------+------------+-------------------|
> | 2016-05-09 | 18 | 31 | 7.9 |
> | 2016-05-10 | 17 | 25 | 10.5 |
> | 2016-05-11 | 27 | 33 | 10.1 |
> | 2016-05-12 | 68 | 24 | 9.9 |
>
>
> >>> Sample transaction response times
>
> | Timestamp | Tran ID | Resp MS | Resp CD
> --------------------+----------------+---------+--------
> 2016-05-10 06:20:19 | ListChangeHist | 55,023 | TIMEOUT
> 2016-05-10 07:47:34 | ListChangeHist | 55,017 | TIMEOUT
> 2016-05-10 07:48:00 | ListChangeHist | 9,866 | OK
> 2016-05-10 07:48:10 | ListChangeHist | 2,327 | OK
> 2016-05-10 07:59:23 | ListChangeHist | 55,020 | TIMEOUT
> 2016-05-10 08:11:20 | ListChangeHist | 55,030 | TIMEOUT
> 2016-05-10 08:31:45 | ListChangeHist | 4,216 | OK
> 2016-05-10 08:35:09 | ListChangeHist | 7,898 | OK
> 2016-05-10 08:36:18 | ListChangeHist | 9,810 | OK
> 2016-05-10 08:36:56 | ListChangeHist | 55,027 | TIMEOUT
> 2016-05-10 08:37:33 | ListChangeHist | 46,433 | OK
> 2016-05-10 08:38:09 | ListChangeHist | 55,019 | TIMEOUT
> 2016-05-10 08:53:43 | ListChangeHist | 55,019 | TIMEOUT
> 2016-05-10 09:45:09 | ListChangeHist | 55,022 | TIMEOUT
> 2016-05-10 09:46:13 | ListChangeHist | 55,017 | TIMEOUT
> 2016-05-10 09:49:27 | ListChangeHist | 55,011 | TIMEOUT
> 2016-05-10 09:52:12 | ListChangeHist | 55,018 | TIMEOUT
> 2016-05-10 09:57:42 | ListChangeHist | 9,462 | OK
> 2016-05-10 10:05:21 | ListChangeHist | 55,016 | TIMEOUT
> 2016-05-10 10:05:29 | ListChangeHist | 136 | OK
> 2016-05-10 10:05:38 | ListChangeHist | 1,517 | OK
>
> Artifacts
> ======================
>
> $ >uname -a
> SunOS ***** 5.10 Generic_150400-30 sun4v sparc sun4v
>
> Memory : 254G phys mem, 207G free mem.
> Processors: 32 - CPU is mostly 80% free
>
> >>>
> p306=> select version();
> version
> ---------------------------------------------------------------------------------------------------
> PostgreSQL 9.1.14 on sparc-sun-solaris2.10, compiled by gcc (GCC)
> 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit
>
> >>>
> p306=> \dt+ changehistory
> List of relations
> Schema | Name | Type | Owner | Size | Description
> --------+---------------+-------+-------+-------+-------------
> public | changehistory | table | p306 | 17 GB |
>
> >>>
> p306=> \di+ changehistory*
> List of relations
> Schema | Name | Type | Owner | Table | Size | Description
> --------+-----------------------+-------+-------+---------------+---------+-------------
> public | changehistory_idx1 | index | p306 | changehistory | 9597 MB
> |
> public | changehistory_idx3 | index | p306 | changehistory | 11 GB |
> public | changehistory_idx4 | index | p306 | changehistory | 4973 MB
> |
> public | changehistory_pkey | index | p306 | changehistory | 2791 MB
> |
> public | changehistory_search2 | index | p306 | changehistory | 9888
> MB |
> public | changehistory_search3 | index | p306 | changehistory | 10 GB
> |
> public | changehistory_search4 | index | p306 | changehistory | 9240
> MB |
> public | changehistory_search5 | index | p306 | changehistory | 8373
> MB |
> (8 rows)
>
>
> >>>
> p306=> select count(*) from changehistory ;
> count
> ------------
> 129,185,024
>
> >>>
> Show all (filtered)
> ======================================================
>
> name | setting
> ---------------------------------+--------------------
> autovacuum | on
> autovacuum_analyze_scale_factor | 0.001
> autovacuum_analyze_threshold | 500
> autovacuum_freeze_max_age | 200000000
> autovacuum_max_workers | 5
> autovacuum_naptime | 1min
> autovacuum_vacuum_cost_delay | 0
> autovacuum_vacuum_cost_limit | -1
> autovacuum_vacuum_scale_factor | 0.001
> autovacuum_vacuum_threshold | 500
> bgwriter_delay | 200ms
> block_size | 8192
> check_function_bodies | on
> checkpoint_completion_target | 0.9
> checkpoint_segments | 256
> checkpoint_timeout | 1h
> checkpoint_warning | 30s
> client_encoding | UTF8
> commit_delay | 0
> commit_siblings | 5
> cpu_index_tuple_cost | 0.005
> cpu_operator_cost | 0.0025
> cpu_tuple_cost | 0.01
> cursor_tuple_fraction | 0.1
> deadlock_timeout | 1s
> default_statistics_target | 100
> default_transaction_deferrable | off
> default_transaction_isolation | read committed
> default_transaction_read_only | off
> default_with_oids | off
> effective_cache_size | 8GB
> from_collapse_limit | 8
> fsync | on
> full_page_writes | on
> ignore_system_indexes | off
> join_collapse_limit | 8
> krb_caseins_users | off
> lo_compat_privileges | off
> maintenance_work_mem | 1GB
> max_connections | 350
> max_files_per_process | 1000
> max_function_args | 100
> max_identifier_length | 63
> max_index_keys | 32
> max_locks_per_transaction | 64
> max_pred_locks_per_transaction | 64
> max_prepared_transactions | 0
> max_stack_depth | 2MB
> max_wal_senders | 5
> random_page_cost | 4
> segment_size | 1GB
> seq_page_cost | 1
> server_encoding | UTF8
> server_version | 9.1.14
> shared_buffers | 2GB
> sql_inheritance | on
> statement_timeout | 0
> synchronize_seqscans | on
> synchronous_commit | on
> synchronous_standby_names |
> tcp_keepalives_count | 0
> tcp_keepalives_idle | -1
> tcp_keepalives_interval | 0
> track_activities | on
> track_activity_query_size | 1024
> track_counts | on
> track_functions | none
> transaction_deferrable | off
> transaction_isolation | read committed
> transaction_read_only | off
> transform_null_equals | off
> update_process_title | on
> vacuum_cost_delay | 0
> vacuum_cost_limit | 200
> vacuum_cost_page_dirty | 20
> vacuum_cost_page_hit | 1
> vacuum_cost_page_miss | 10
> vacuum_defer_cleanup_age | 0
> vacuum_freeze_min_age | 50000000
> vacuum_freeze_table_age | 150000000
>
> John Gorman | Manager of Production Support, Architecture, Release
> Engineering | Eldorado | a Division of M PHASI S | www.eldoinc.com/
> |
> 5353 North 16 th Street, Suite 400, Phoenix, Arizona 85016-3228 | Tel
> 602.604.3100 | Fax: 602.604.3115
>
>
>



В списке pgsql-performance по дате сообщения:

От: Gerardo Herzig
Дата:
Сообщение: Re: Database transaction with intermittent slow responses
От: Robert Klemme
Дата:
Сообщение: Re: Database transaction with intermittent slow responses