Slow create temp table
От | Clinton Adams |
---|---|
Тема | Slow create temp table |
Дата | |
Msg-id | CAEuopLY3cptvK=xWCnEfzExXtV75d0YSYN3kwZi_xeYubgei7A@mail.gmail.com обсуждение исходный текст |
Список | pgsql-performance |
Greetings, Any help regarding a sporadic and mysterious issue would be much appreciated. A pgsql function for loading in data is occasionally taking 12+ hours to complete versus its normal 1-2 hours, due to a slow down at the CREATE TEMP TABLE step. During slow runs of the function, the temp table data file is being written to at 8192 bytes/second. This rate was consistent at the 5 hour mark up until I canceled the query at 6 hrs in. An immediate rerunning of the function finished in an hour. Temp table file size was 226 MB and was created in ~15 mins. PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit Linux 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux ProLiant DL380p Gen8, 2 x E5-2620 (hyperthreading on) 96 GB pgsql data dir mounted on 25 x ssd storage array, connected via fibre channel, pg_xlog on a RAID 10 hdd array deadline scheduler 8192 readahead name | current_setting | source ------------------------------+---------------------------+-------------------- application_name | psql | client archive_command | ********* | configuration file archive_mode | on | configuration file autovacuum | on | configuration file autovacuum_max_workers | 6 | configuration file bgwriter_delay | 40ms | configuration file bgwriter_lru_maxpages | 1000 | configuration file bgwriter_lru_multiplier | 3 | configuration file checkpoint_completion_target | 0.9 | configuration file checkpoint_segments | 1024 | configuration file checkpoint_timeout | 30min | configuration file client_encoding | UTF8 | client cpu_operator_cost | 0.5 | configuration file cpu_tuple_cost | 0.5 | configuration file DateStyle | ISO, MDY | configuration file deadlock_timeout | 10s | configuration file default_text_search_config | pg_catalog.english | configuration file effective_cache_size | 70GB | configuration file effective_io_concurrency | 6 | configuration file full_page_writes | on | configuration file hot_standby | on | configuration file hot_standby_feedback | on | configuration file lc_messages | en_US.UTF-8 | configuration file lc_monetary | en_US.UTF-8 | configuration file lc_numeric | en_US.UTF-8 | configuration file lc_time | en_US.UTF-8 | configuration file listen_addresses | * | configuration file log_autovacuum_min_duration | 1s | configuration file log_checkpoints | on | configuration file log_destination | csvlog | configuration file log_file_mode | 0600 | configuration file log_filename | postgresql-%a.log | configuration file log_lock_waits | on | configuration file log_min_duration_statement | 250ms | configuration file log_rotation_age | 1d | configuration file log_rotation_size | 0 | configuration file log_statement | ddl | configuration file log_timezone | America/New_York | configuration file log_truncate_on_rotation | on | configuration file logging_collector | on | configuration file maintenance_work_mem | 2400MB | configuration file max_connections | 1000 | configuration file max_stack_depth | 5MB | configuration file max_wal_senders | 5 | configuration file port | 5432 | command line random_page_cost | 4 | session seq_page_cost | 1 | configuration file shared_buffers | 8GB | configuration file shared_preload_libraries | auto_explain | configuration file stats_temp_directory | /var/lib/pgsql_stat_tmpfs | configuration file TimeZone | America/New_York | configuration file track_activities | on | configuration file track_counts | on | configuration file track_functions | all | configuration file track_io_timing | on | configuration file update_process_title | on | configuration file wal_buffers | 64MB | configuration file wal_keep_segments | 2000 | configuration file wal_level | hot_standby | configuration file work_mem | 32MB | configuration file Number of connections at any one time on the database is 300-400, with the majority idle - there are legacy reasons for that and the high max_connections. kernel.shmmax = 50701037568 kernel.shmall = 12378183 vm.swappiness = 0 vm.overcommit_memory = 2 vm.dirty_background_ratio = 2 vm.dirty_background_bytes = 0 vm.dirty_ratio = 5 vm.dirty_bytes = 0 vm.dirty_writeback_centisecs = 500 vm.dirty_expire_centisecs = 3000 EXPLAIN ANALYZE CREATE TEMPORARY TABLE temp AS SELECT rxfill.*, betapat.betapatientid, patrx.rxnum, patrx.patientrxid as patientrxid, betapat.pharmacypatientid AS originalpharmacypatientid, store.storeid as betastoreid FROM rxfilldata_parent rxfill JOIN (select MAX(id) id, storeid, rxnbr FROM rxfilldata_parent tmp WHERE clientid = 118 AND tmp.pkgfileid = 417995 GROUP BY storeid, rxnbr) rxfillmax ON (rxfillmax.id = rxfill.id) JOIN client.client client ON (rxfill.clientid = client.clientid) JOIN client.chain chain ON (client.clientid = chain.clientid) JOIN client.store store ON (chain.chainid = store.chainid AND rxfill.storeid = store.clientstoreid) LEFT OUTER JOIN patient.patientrx118 patrx ON (rxfill.clientid = patrx.clientid AND rxfill.rxnbr = patrx.rxnum AND patrx.storeid = store.storeid) LEFT OUTER JOIN patient.betapatient118 betapat ON (rxfill.clientid = betapat.clientid AND betapat.storeid = store.storeid AND rxfill.pharmacypatientid = betapat.pharmacypatientid) where rxfill.clientid = 118 and rxfill.pkgfileid = 417995; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=520598.91..572134.79 rows=7 width=911) (actual time=2862.795..90880.990 rows=273566 loops=1) Join Filter: (betapat.storeid = store.storeid) Rows Removed by Join Filter: 121214 -> Nested Loop Left Join (cost=520511.91..571468.59 rows=7 width=899) (actual time=2862.725..82299.527 rows=273566 loops=1) -> Hash Join (cost=520398.91..570601.16 rows=7 width=887) (actual time=2859.693..5998.178 rows=273566 loops=1) Hash Cond: ((max(tmp.id)) = rxfill.id) -> HashAggregate (cost=510782.11..525124.61 rows=28685 width=15) (actual time=728.322..1014.029 rows=273566 loops=1) -> Bitmap Heap Scan on rxfilldata_parent tmp (cost=100172.71..370349.11 rows=93622 width=15) (actual time=44.358..148.145 rows=274808 loops=1) Recheck Cond: ((clientid = 118) AND (pkgfileid = 417995)) -> Bitmap Index Scan on ix_neededimport (cost=0.00..95491.61 rows=93622 width=0) (actual time=38.742..38.742 rows=274808 loops=1) Index Cond: ((clientid = 118) AND (pkgfileid = 417995)) -> Hash (cost=9206.80..9206.80 rows=410 width=887) (actual time=2130.473..2130.473 rows=274808 loops=1) Buckets: 1024 Batches: 8 (originally 1) Memory Usage: 32769kB -> Nested Loop (cost=264.27..9206.80 rows=410 width=887) (actual time=1.239..1216.518 rows=274808 loops=1) -> Nested Loop (cost=178.77..261.15 rows=13 width=13) (actual time=0.343..2.309 rows=966 loops=1) -> Nested Loop (cost=110.00..124.51 rows=1 width=8) (actual time=0.093..0.098 rows=1 loops=1) -> Index Only Scan using pk_client on client (cost=55.00..60.00 rows=1 width=4) (actual time=0.076..0.078 rows=1 loops=1) Index Cond: (clientid = 118) Heap Fetches: 0 -> Index Scan using ix_chain on chain (cost=55.00..64.00 rows=1 width=8) (actual time=0.014..0.016 rows=1 loops=1) Index Cond: (clientid = 118) -> Bitmap Heap Scan on store (cost=68.77..129.64 rows=14 width=13) (actual time=0.246..1.572 rows=966 loops=1) Recheck Cond: (chainid = chain.chainid) -> Bitmap Index Scan on ix_store (cost=0.00..68.07 rows=14 width=0) (actual time=0.216..0.216 rows=966 loops=1) Index Cond: (chainid = chain.chainid) -> Index Scan using ix_merge8_daily on rxfilldata_parent rxfill (cost=85.50..670.63 rows=35 width=883) (actual time=0.120..1.081 rows=284 loops=966) Index Cond: (((storeid)::text = (store.clientstoreid)::text) AND (clientid = 118)) Filter: (pkgfileid = 417995) Rows Removed by Filter: 292 -> Index Scan using ux2_patientrx118 on patientrx118 patrx (cost=113.00..123.42 rows=1 width=20) (actual time=0.275..0.277 rows=1 loops=273566) Index Cond: (((rxfill.rxnbr)::text = (rxnum)::text) AND (rxfill.clientid = clientid) AND (clientid = 118) AND (storeid = store.storeid)) -> Index Scan using ix3_betapatient118 on betapatient118 betapat (cost=87.00..94.17 rows=1 width=20) (actual time=0.025..0.029 rows=1 loops=273566) Index Cond: ((rxfill.pharmacypatientid)::text = (pharmacypatientid)::text) Filter: ((clientid = 118) AND (rxfill.clientid = clientid)) Total runtime: 90945.841 ms pg_bgwriter_snapshots: now | checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc | stats_reset -------------------------------+-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+------------------------------- 2014-07-29 10:00:01.65379-04 | 4 | 0 | 2108808 | 1586 | 151291 | 2207939 | 83 | 67605 | 0 | 28932559 | 2014-07-29 08:01:01.623033-04 2014-07-29 11:00:01.578856-04 | 2 | 0 | 409840 | 618 | 17033 | 734718 | 53 | 52256 | 0 | 30524848 | 2014-07-29 10:01:01.877218-04 2014-07-29 12:00:01.420009-04 | 4 | 0 | 1939095 | 1120 | 44134 | 1515493 | 124 | 114122 | 0 | 43833671 | 2014-07-29 10:01:01.877218-04 2014-07-29 13:00:01.234634-04 | 2 | 0 | 1364481 | 169 | 16329 | 427183 | 11 | 30784 | 0 | 21273967 | 2014-07-29 12:01:01.542161-04 2014-07-29 14:00:02.007022-04 | 4 | 0 | 2493810 | 607 | 43316 | 1233492 | 115 | 93203 | 0 | 42564936 | 2014-07-29 12:01:01.542161-04 2014-07-29 15:00:01.713446-04 | 2 | 0 | 854284 | 93 | 15033 | 215280 | 28 | 14880 | 0 | 25766378 | 2014-07-29 14:01:01.119926-04 2014-07-29 16:00:01.542989-04 | 4 | 0 | 2704023 | 322 | 28730 | 330864 | 28 | 22278 | 0 | 43323265 | 2014-07-29 14:01:01.119926-04 2014-07-29 17:00:01.39066-04 | 2 | 0 | 809083 | 139 | 8264 | 167504 | 0 | 15206 | 0 | 33998495 | 2014-07-29 16:01:01.671427-04 2014-07-29 18:00:01.375621-04 | 4 | 0 | 1767296 | 335 | 17826 | 252109 | 0 | 29399 | 0 | 58038221 | 2014-07-29 16:01:01.671427-04 2014-07-29 19:00:01.252766-04 | 2 | 0 | 851746 | 278 | 30729 | 1737370 | 149 | 36806 | 0 | 41487870 | 2014-07-29 18:01:01.474614-04 2014-07-29 20:00:02.07362-04 | 4 | 0 | 1026422 | 364 | 44182 | 2209503 | 219 | 68020 | 0 | 63828691 | 2014-07-29 18:01:01.474614-04 The sar output for the timeframe the problem occurred is available as well, if that would be helpful, but does not show any issues of unusual load. Thanks for any help, Clinton
В списке pgsql-performance по дате отправления: