Обсуждение: 100% CPU Utilization when we run queries.
We have a postgresql 8.3.8 DB which consumes 100% of the CPU whenever we run any query. We got vmstat output Machine details are below: # /usr/bin/lscpu Architecture: x86_64 CPU(s): 2 Thread(s) per core: 1 Core(s) per socket: 1 CPU socket(s): 2 NUM node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 26 Stepping: 4 CPU MHz: 2666.761 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 12288K Memory: Total Used Free Shared Buffers Cached Mem: 3928240 3889552 38688 0 26992 2517012 Swap: 2097144 312208 1784936 # /bin/uname -a Linux wdhsen01 2.6.32.12-0.7-default #1 SMP 2010-05-20 11:14:20 +0200 x86_64 x86_64 x86_64 GNU/Linux . Postgresql Memory parameters: shared_buffers = 1GB work_mem = 128MB maintenance_work_mem = 64MB I have attached the o/p of vmstat command alsp. Can you please help us in tuning any other parameters. http://postgresql.1045698.n5.nabble.com/file/n4465765/untitled.bmp -- View this message in context: http://postgresql.1045698.n5.nabble.com/100-CPU-Utilization-when-we-run-queries-tp4465765p4465765.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On 8/06/2011 12:19 PM, bakkiya wrote: > We have a postgresql 8.3.8 DB which consumes 100% of the CPU whenever we run > any query. Yep, that's what it's supposed to do if it's not I/O limited. What's the problem? Is the query taking longer than you think it should to execute? Do you expect it to be waiting for disk rather than CPU? Please show your problem query/queries, and EXPLAIN ANALYZE output. See: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems > http://postgresql.1045698.n5.nabble.com/file/n4465765/untitled.bmp Heh. That's been resized to a 250x250 bitmap, which isn't exactly useful or readable. Why not paste the text rather than a screenshot? -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
http://postgresql.1045698.n5.nabble.com/file/n4475458/untitled.bmp -- View this message in context: http://postgresql.1045698.n5.nabble.com/100-CPU-Utilization-when-we-run-queries-tp4465765p4475458.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Wed, Jun 8, 2011 at 07:19, bakkiya <bakkiya@gmail.com> wrote: > We have a postgresql 8.3.8 DB which consumes 100% of the CPU whenever we run > any query. We got vmstat output Machine details are below: Any query? Does even "SELECT 1" not work? Or "SELECT * FROM sometable LIMIT 1" Or are you having problems with only certain kinds of queries? If so, please follow this for how to report it: https://wiki.postgresql.org/wiki/SlowQueryQuestions Regards, Marti
On 06/10/2011 03:39 PM, bakkiya wrote: > http://postgresql.1045698.n5.nabble.com/file/n4475458/untitled.bmp 404 file not found. That's ... not overly useful. Again, *PLEASE* read http://wiki.postgresql.org/wiki/Guide_to_reporting_problems and try posting again with enough information that someone might be able to actually help you. As Marti mentioned and as recommended by the link above, you should particularly focus on the questions here: https://wiki.postgresql.org/wiki/SlowQueryQuestions If you want help from people here, you will need to make enough effort to collect the information required to help you. -- Craig Ringer
Hi, Sorry. I am posting the query details below. Query: SELECT DISTINCT events_rpt_v3.init_service_comp FROM public.events_rpt_v3 events_rpt_v3; events_rpt_v3 is a view based on partition tables. Number of rows in events_rpt_v3: 57878 vmstat o/p: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 632108 377948 132928 1392996 1 1 10 33 2 2 1 0 99 0 0 1 0 632108 371476 132928 1399016 0 0 3072 36 834 788 49 1 50 0 0 1 0 632108 364408 132928 1405112 0 0 3072 0 795 707 49 1 50 0 0 1 0 632108 357596 132928 1411944 0 0 3584 0 797 712 50 1 50 0 0 1 0 632108 351148 132928 1418540 0 0 3072 0 827 768 49 1 50 0 0 1 0 632108 344700 132928 1425028 0 0 3072 0 811 725 66 1 34 0 0 1 0 632108 337688 132928 1431976 0 0 3584 16 829 802 49 1 50 0 0 1 0 632108 330860 132928 1438808 0 0 3072 0 804 755 50 1 50 0 0 1 0 632108 323916 132928 1446032 0 0 3584 0 810 738 49 1 50 0 0 1 0 632108 317344 132928 1452544 0 0 3072 0 806 736 50 1 50 0 0 1 0 632108 310648 132928 1459120 0 0 3584 0 793 703 49 1 50 0 0 1 0 632108 304464 132928 1465488 0 0 3072 0 811 745 49 2 50 0 0 1 0 632108 297396 132936 1472068 0 0 3072 12 808 715 49 1 50 0 0 1 0 632108 290468 132936 1478876 0 0 3584 0 797 714 50 1 50 0 0 1 0 632108 284764 132944 1484132 0 0 3072 47284 996 776 50 2 43 6 0 2 0 632108 278564 132944 1490484 0 0 3072 0 813 720 58 1 41 0 0 1 0 632108 272480 132944 1496684 0 0 3072 32 822 742 56 2 43 0 0 1 0 632108 265800 132944 1503420 0 0 3584 0 826 743 50 1 50 0 0 1 0 632108 259592 132944 1509836 0 0 3072 0 798 742 49 1 50 0 0 1 0 632108 252772 132944 1516204 0 0 3072 0 771 716 58 2 40 0 0 1 0 632108 245952 132944 1523052 0 0 3584 4 785 699 50 1 50 0 0 procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 632108 239380 132944 1529576 0 0 3072 21796 832 649 49 2 49 0 0 1 0 632108 233072 132944 1535908 0 0 3072 0 773 718 50 0 50 0 0 1 0 632108 226128 132944 1542780 0 0 3584 0 834 769 49 1 49 0 0 2 0 632108 219556 132944 1549556 0 0 3072 0 817 757 57 1 43 0 0 1 0 632108 213248 132944 1555864 0 0 3072 0 798 710 57 2 40 0 0 1 0 632108 206480 132944 1562492 0 0 3584 0 841 836 50 1 49 0 0 1 0 632108 200000 132944 1569012 0 0 3072 0 842 809 50 1 50 0 0 1 0 632108 193684 132944 1575320 0 0 3072 0 841 749 49 1 50 0 0 1 0 632108 188104 132956 1580916 0 0 3584 25540 923 772 49 3 46 3 0 novell@wdhsen01:~> vmstat 1 30 procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 632108 45496 132808 1721868 1 1 10 33 2 2 1 0 99 0 0 2 0 632108 39420 132808 1728308 0 0 3072 0 817 747 58 1 42 0 0 1 0 632108 33344 132808 1734528 0 0 3072 0 760 670 56 1 43 0 0 1 0 632108 82200 132280 1685904 0 0 3584 12 751 679 49 2 49 1 0 1 0 632108 76496 132280 1692268 0 0 3072 0 739 672 48 2 50 0 0 1 0 632108 70172 132280 1698524 0 0 3072 28 767 691 50 1 50 0 0 1 0 632108 63152 132280 1705572 0 0 3584 0 785 727 49 1 50 0 0 1 0 632108 57208 132280 1711568 0 0 3072 0 784 722 49 2 49 0 0 2 0 632108 50884 132280 1717956 0 0 3072 0 804 711 50 1 50 0 0 1 0 632108 43956 132280 1724708 0 0 3584 0 815 779 49 1 50 0 0 2 0 632108 37772 132280 1731088 0 0 3072 32 814 724 56 1 43 0 0 1 0 632108 30456 132280 1738260 0 0 3584 0 790 700 60 1 40 0 0 1 0 632108 80056 131640 1689212 0 0 3072 0 795 716 49 2 50 0 0 1 0 632108 73740 131640 1695520 0 0 3072 0 832 803 50 1 50 0 0 1 0 632108 68664 131648 1700552 0 0 3072 47356 975 713 49 2 43 6 0 1 0 632108 61968 131652 1707292 0 0 3584 28 817 763 49 2 50 0 0 1 0 632108 56016 131652 1713424 0 0 3072 156 795 707 49 2 50 0 0 1 0 632108 49692 131652 1719764 0 0 3072 0 814 755 49 1 50 0 0 1 0 632108 42864 131652 1726720 0 0 3584 0 794 691 49 1 50 0 0 2 0 632108 36912 131652 1732716 0 0 3072 0 794 732 58 1 41 0 0 1 0 632108 91224 130992 1678484 0 0 3584 0 822 730 57 2 42 0 0 procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 632108 84900 130992 1685028 0 0 3072 22412 924 761 49 1 50 0 0 1 0 632108 78576 130992 1691260 0 0 3072 0 843 776 49 1 50 0 0 1 0 632108 71756 130992 1698284 0 0 3584 0 865 792 50 0 50 0 0 1 0 632108 65308 130992 1704828 0 0 3072 0 861 770 49 2 50 0 0 1 0 632108 58324 130992 1711600 0 0 3584 0 866 905 49 1 50 0 0 1 0 632108 51852 130992 1718008 0 0 3072 0 813 754 49 1 50 0 0 1 0 632108 45496 130992 1724524 0 0 3072 0 812 737 49 1 50 0 0 4 0 632108 38560 130992 1731312 0 0 3584 0 816 737 49 2 50 0 0 2 0 632108 33228 131004 1736472 0 0 3072 26356 904 727 61 3 34 3 0 For small tables, it is not using 100% of CPU, but the same query with limit 1 is also taking 100% of CPU. Explain Analyze o/p is very big. I am pasting here: Unique (cost=133906741.63..133916648.38 rows=200 width=516) (actual time=486765.451..487236.949 rows=35 loops=1) -> Sort (cost=133906741.63..133911695.00 rows=1981350 width=516) (actual time=486765.450..487053.139 rows=1979735 loops=1) Sort Key: events_rpt_v3.init_service_comp Sort Method: external merge Disk: 46416kB -> Subquery Scan events_rpt_v3 (cost=131752986.71..133238999.21 rows=1981350 width=516) (actual time=452529.136..472555.577 rows=1979735 loops=1) -> Unique (cost=131752986.71..133219185.71 rows=1981350 width=73436) (actual time=452529.128..471365.258 rows=1979735 loops=1) -> Sort (cost=131752986.71..131757940.09 rows=1981350 width=73436) (actual time=452529.126..460179.820 rows=1979735 loops=1) Sort Key: public.events.evt_id, public.events.res, public.events.sres, public.events.sev, public.events.evt_time, public.events.evt_time, public.events.device_evt_time, public.events.sentinel_process_time, public.events.begin_time, public.events.end_time, public.events.repeat_cnt, public.events.dp_int, public.events.sp_int, public.events.msg, public.events.evt, public.events.et, public.events.cust_id, public.events.src_asset_id, public.events.dest_asset_id, public.events.agent_id, public.events.prtcl_id, public.events.arch_id, public.events.sip, (to_ip_char(public.events.sip)), public.events.shn, public.events.sp, public.events.dip, (to_ip_char(public.events.dip)), public.events.dhn, public.events.dp, public.events.sun, public.events.dun, public.events.fn, public.events.ei, public.events.init_usr_sys_id, public.events.init_usr_identity_guid, public.events.trgt_usr_sys_id, public.events.trgt_usr_identity_guid, Sort Method: external merge Disk: 1504224kB -> Append (cost=0.00..7167670.80 rows=1981350 width=73436) (actual time=188.427..432870.506 rows=1979735 loops=1) -> Result (cost=0.00..7117927.97 rows=1972965 width=73436) (actual time=188.427..431513.274 rows=1971352 loops=1) -> Append (cost=0.00..212550.47 rows=1972965 width=73436) (actual time=4.445..19200.613 rows=1971352 loops=1) -> Seq Scan on events (cost=0.00..10.00 rows=1 width=73436) (actual time=0.000..0.000 rows=0 loops=1) -> Seq Scan on events_p_max events (cost=0.00..10.00 rows=1 width=73436) (actual time=0.000..0.000 rows=0 loops=1) -> Seq Scan on events_p_20110617110000 events (cost=0.00..10.00 rows=1 width=73436) (actual time=0.000..0.000 rows=0 loops=1) -> Result (cost=0.00..29929.33 rows=8385 width=73436) (actual time=11.691..928.277 rows=8383 loops=1) -> Append (cost=0.00..581.83 rows=8385 width=73436) (actual time=11.194..94.103 rows=8383 loops=1) -> Seq Scan on hist_events (cost=0.00..10.00 rows=1 width=73436) (actual time=0.001..0.001 rows=0 loops=1) -> Seq Scan on hist_events_p_max hist_events (cost=0.00..10.00 rows=1 width=73436) (actual time=0.000..0.000 rows=0 loops=1) -> Seq Scan on hist_events_p_20101105112519 hist_events (cost=0.00..212.18 rows=2918 width=62004) (actual time=11.192..31.355 rows=2918 loops=1) -> Seq Scan on hist_events_p_20101107112519 hist_events (cost=0.00..169.64 rows=2764 width=65510) (actual time=7.605..24.603 rows=2764 loops=1) -> Seq Scan on hist_events_p_20101104112519 hist_events (cost=0.00..180.01 rows=2701 width=61827) (actual time=19.064..37.047 rows=2701 loops=1) Total runtime: 487423.797 ms I have started vaccuum analyzing all the partitions, I will run the query and post the results once it is done. -- View this message in context: http://postgresql.1045698.n5.nabble.com/100-CPU-Utilization-when-we-run-queries-tp4465765p4493567.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Any help, please? -- View this message in context: http://postgresql.1045698.n5.nabble.com/100-CPU-Utilization-when-we-run-queries-tp4465765p4556775.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
bakkiya <bakkiya@gmail.com> wrote: > Any help, please? You haven't provided enough information for anyone to be able to help. http://wiki.postgresql.org/wiki/Guide_to_reporting_problems -Kevin
Dne 6.7.2011 15:30, bakkiya napsal(a): > Any help, please? According to the EXPLAIN ANALYZE output (please, don't post it to the mailing list directly - use something like explain.depesz.com, I've done that for you this time: http://explain.depesz.com/s/HMN), you're doing a UNIQUE over a lot of data (2 million rows, 1.5GB). That is done by sorting the data, and sorting is very CPU intensive task usually. So the fact that the CPU is 100% utilized is kind of expected in this case. So that's a feature, not a bug. In general each process is hitting some bottleneck. It might be an I/O, it might be a CPU, it might be something less visible (memory bandwidth or something like that). But I've noticed one thing in your query - you're doing a UNIQUE in the view (probably, we don't know the definition) and then once again in the query (but using just one column from the view). The problem is the inner sort does not remove any rows (1979735 rows in/out). Why do you do the UNIQUE in the view? Do you really need it there? I guess removing it might significantly improve the plan. Try to do the query without the view - it seems it's just an union of current tables and a history (both partitioned, so do something like this) SELECT DISTINCT init_service_comp FROM ( SELECT init_service_comp FROM events UNION SELECT init_service_comp FROM hist_events ) or maybe even SELECT DISTINCT init_service_comp FROM ( SELECT DISTINCT init_service_comp FROM events UNION SELECT DISTINCT init_service_comp FROM hist_events ) Let's see how that works - post EXPLAIN ANALYZE using explain.depesz.com Tomas
On Wed, Jul 6, 2011 at 1:04 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > Dne 6.7.2011 15:30, bakkiya napsal(a): >> Any help, please? > > According to the EXPLAIN ANALYZE output (please, don't post it to the > mailing list directly - use something like explain.depesz.com, I've done > that for you this time: http://explain.depesz.com/s/HMN), you're doing a > UNIQUE over a lot of data (2 million rows, 1.5GB). It might have been optimized lately, but generally unique is slower than group by will be.
On 7/07/2011 3:04 AM, Tomas Vondra wrote: > That is done by sorting the data, and sorting is very CPU intensive task > usually. So the fact that the CPU is 100% utilized is kind of expected > in this case. So that's a feature, not a bug. > > In general each process is hitting some bottleneck. It might be an I/O, > it might be a CPU, it might be something less visible (memory bandwidth > or something like that). > This is worth stressing. PostgreSQL will always use as much CPU time and disk I/O as it can to get a job done as quickly as possible. Because most queries need more CPU and less disk, or more disk and less CPU, you'll usually find that PostgreSQL maxes out one or the other but not both. People monitor CPU use more than disk use, so they tend to notice when CPU use is maxed out, but Pg maxes out your disk a lot too. This is normal, and a good thing. If Pg didn't max out your CPU or disk, queries would be slower. If you want to make things other than PostgreSQL happen faster at the expense of slowing down queries, you can use your operating system's process priority mechanisms to give PostgreSQL a lower priority for access to CPU and/or disk. That will still allow PostgreSQL to use all your CPU and disk when nothing else wants it, but will let other programs use it in preference to PostgreSQL if they need it. The same thing applies to memory use. People notice that their operating system reports very little "free" memory and get worried about it. The truth is that your OS should never have much free memory, because that memory is not being used for anything useful. It usually keeps disk cache in memory when it's not needed for anything else, and trying to make more "free" memory clears out the disk cache, making the computer slower. CPU use is a bit like that - it's not doing any good idle, so if nothing else needs it more you might as well use it. If you're on linux, you can use the "nice", "renice" and "ionice" programs to control CPU and disk access priority. -- Craig Ringer POST Newspapers 276 Onslow Rd, Shenton Park Ph: 08 9381 3088 Fax: 08 9388 2258 ABN: 50 008 917 717 http://www.postnewspapers.com.au/
On Wed, Jul 6, 2011 at 9:04 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > Dne 6.7.2011 15:30, bakkiya napsal(a): >> Any help, please? > > According to the EXPLAIN ANALYZE output (please, don't post it to the > mailing list directly - use something like explain.depesz.com, I've done > that for you this time: http://explain.depesz.com/s/HMN), you're doing a > UNIQUE over a lot of data (2 million rows, 1.5GB). > > That is done by sorting the data, and sorting is very CPU intensive task > usually. So the fact that the CPU is 100% utilized is kind of expected > in this case. So that's a feature, not a bug. > > In general each process is hitting some bottleneck. It might be an I/O, > it might be a CPU, it might be something less visible (memory bandwidth > or something like that). > > But I've noticed one thing in your query - you're doing a UNIQUE in the > view (probably, we don't know the definition) and then once again in the > query (but using just one column from the view). > > The problem is the inner sort does not remove any rows (1979735 rows > in/out). Why do you do the UNIQUE in the view? Do you really need it > there? I guess removing it might significantly improve the plan. > > Try to do the query without the view - it seems it's just an union of > current tables and a history (both partitioned, so do something like this) > > SELECT DISTINCT init_service_comp FROM ( > SELECT init_service_comp FROM events > UNION > SELECT init_service_comp FROM hist_events > ) > > or maybe even > > SELECT DISTINCT init_service_comp FROM ( > SELECT DISTINCT init_service_comp FROM events > UNION > SELECT DISTINCT init_service_comp FROM hist_events > ) > > Let's see how that works - post EXPLAIN ANALYZE using explain.depesz.com In this case UNION ALL is probably more appropriate than UNION - and may have different performance characteristics (saving the UNIQUE?). Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
Thanks all for your help. It is really useful, I will modify the query and post the result. -- View this message in context: http://postgresql.1045698.n5.nabble.com/100-CPU-Utilization-when-we-run-queries-tp4465765p4560941.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.