Discussion: Database performance problem
Hi All,

I have recently migrated a Postgres database from 7.4 running on Gentoo to 8.1 running on SLES 10. I migrated the data using pg_dump and then running the resulting SQL in psql. The old server was a dual AMD Opteron 2.6 GHz machine with a RAID 5 array and 4GB memory. The new machine is a dual dual-core AMD Opteron 2.6 GHz with RAID 1 and 16GB memory. Several reports are taking far too long to run. I snagged a SELECT from one of the reports. It is a fairly complex query with 4 joins, which unfortunately I can't share. I can say that the plan for the query on both machines looks nearly identical - that is, there are no sequential scans happening on the old DB server that aren't also happening on the new server.

The total database size is less than 1GB.

On my new DB server, it takes about 27 seconds for this query to run. On the old server, it takes 2 seconds.

I've changed shared_buffers, checkpoint_segments, effective_cache_size and random_page_cost in an attempt to improve performance. That has helped a little, but 27 seconds is still almost 14 times what it used to take. I feel I'm missing something obvious, and I don't have much experience with Postgres.

Any ideas?

Thanks!
Chris
Porell, Chris wrote:
> I have recently migrated a Postgres database from 7.4 running on Gentoo to
> 8.1 running on SLES 10. I migrated the data using pg_dump and then running
> the SQL in psql. [...]
>
> On my new DB server, it takes about 27 seconds for this query to run. On
> the old server, it takes 2 seconds.
>
> Any ideas?

I assume you have analyzed after you reloaded the data? If so, can you get an EXPLAIN ANALYZE of the query in question from both the old server and the new server and post that to the list?
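A minimal sketch of those two steps, with made-up table names since the real query and schema were not shared:

    -- Rebuild planner statistics after a dump/reload; pg_dump restores
    -- the data but not the statistics the planner relies on:
    ANALYZE;

    -- Then run the slow query itself to see estimated vs. actual row
    -- counts and per-node timings ("orders"/"order_items" are placeholders):
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id;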
Porell, Chris wrote:
> ... I snagged a SELECT from one of the reports. It is a
> fairly complex query with 4 joins, which unfortunately I can't share. I can
> say that the plan for the query on both machines looks nearly identical -
> that is, there are no sequential scans happening on the old DB server that
> aren't also happening on the new server.

Did you use "EXPLAIN" or "EXPLAIN ANALYZE"? EXPLAIN alone just shows the plan - it doesn't run the query. EXPLAIN ANALYZE runs the query and reports the time actually spent at each step. This should tell you if there is a particular part of the query that is causing the problems.

NOTE! EXPLAIN ANALYZE executes the query, discards the result, and shows the timing information. If the query modifies the database, those modifications will still happen. In this case the manual recommends wrapping the statement:

    BEGIN;
    EXPLAIN ANALYZE ...;
    ROLLBACK;

Cheers,
Steve
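A concrete version of that wrapper might look like the following; the table and column are hypothetical, and any data-modifying statement behaves the same way:

    -- EXPLAIN ANALYZE really executes the UPDATE, but the ROLLBACK
    -- undoes its effects, so the timing run leaves the data untouched:
    BEGIN;
    EXPLAIN ANALYZE
    UPDATE accounts SET balance = balance * 1.01 WHERE region = 'west';
    ROLLBACK;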
Porell, Chris wrote:
> Hi All,
> I've changed shared_buffers, checkpoint_segments, effective_cache_size and
> random_page_cost in an attempt to improve performance. That has helped a
> little...

Another thought. Have you looked at "work_mem"? This is probably a far more important setting. The units are kilobytes, and you can set it on a session-by-session basis to test, without needing to update postgresql.conf or reload the server, by issuing "set work_mem to xxx;".

I have individual nightly batch-runs where I greatly improved the processing time by setting work_mem to values in the range of 100000-500000 as needed.

Note that if you are comparing your old configuration to your new one, this parameter used to be called "sort_mem".

Cheers,
Steve
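A session-level experiment along those lines could look like this; the 200000 figure is purely illustrative, not a recommendation for this particular box:

    -- In 8.1 the setting is in kilobytes, so 200000 is roughly 200 MB of
    -- per-sort/per-hash working memory, for this session only:
    SET work_mem TO 200000;
    SHOW work_mem;    -- confirm what the session is using
    -- ... re-run the report query here ...
    RESET work_mem;   -- fall back to the server default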
On 6/13/07, Porell, Chris <Chris.Porell@ceridian.com> wrote:
> Hi All,
[...]
> the SQL in psql. The old server was a dual AMD Opteron 2.6 GHz machine with
> a RAID 5 array and 4GB memory. The new machine is a dual dual-core AMD
> Opteron 2.6 GHz with RAID 1 and 16GB memory.
[...]
> On my new DB server, it takes about 27 seconds for this query to run. On
> the old server, it takes 2 seconds.
[...]
> Any ideas?

How many spindles were there in the RAID-5 on the old box, and how many in the RAID-1 on the new one? What kind of controller?

> Thanks!
> Chris

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
Thanks all for the suggestions and for your willingness to help!

First, I have done a vacuumdb --all --analyze. I have this scheduled nightly, and I've also run it on demand to be sure the statistics are fresh.

My work_mem parameter is currently 32000. When I upped it from 2000, I saw no visible performance difference, though I'm sure the default was too small; 100000 or more may be better.

The old RAID-5 controller had 4 spindles, I believe. I don't know the type, but the machine was "homebuilt". Not implying anything by that ;-). The new server is an HP DL385 with 2 spindles on a SmartArray controller.

Another piece of info: when this query is running, one CPU is pegged at 100%, and top does not show that it is waiting on I/O.

Lastly, the EXPLAIN ANALYZE output. Thank you all again!!!

 Aggregate  (cost=9817.55..9817.56 rows=1 width=4) (actual time=160199.861..160199.862 rows=1 loops=1)
   ->  Nested Loop  (cost=4387.04..9817.54 rows=1 width=4) (actual time=1134.020..160195.837 rows=1842 loops=1)
         Join Filter: (("inner".recordnumber = "outer".recordnumber) AND ("outer".aaaa < ("inner".aaaa - 1::numeric)))
         ->  Hash Join  (cost=4387.04..9796.71 rows=1 width=56) (actual time=684.721..1057.800 rows=4816 loops=1)
               Hash Cond: ("outer".recordnumber = "inner".recordnumber)
               ->  Hash Left Join  (cost=727.18..6118.18 rows=3734 width=8) (actual time=95.695..426.861 rows=12573 loops=1)
                     Hash Cond: ("outer".recordnumber = "inner".recordnumber)
                     ->  Hash Left Join  (cost=717.68..6080.00 rows=3734 width=8) (actual time=94.746..387.821 rows=12573 loops=1)
                           Hash Cond: ("outer".recordnumber = "inner".recordnumber)
                           ->  Hash Join  (cost=697.90..6021.76 rows=3734 width=8) (actual time=93.148..347.491 rows=12573 loops=1)
                                 Hash Cond: ("outer".recordnumber = "inner".recordnumber)
                                 ->  Seq Scan on rrrr  (cost=0.00..4626.79 rows=52779 width=4) (actual time=0.009..98.233 rows=52779 loops=1)
                                 ->  Hash  (cost=688.56..688.56 rows=3734 width=4) (actual time=92.854..92.854 rows=12573 loops=1)
                                       ->  Hash Left Join  (cost=156.46..688.56 rows=3734 width=4) (actual time=10.702..73.651 rows=12573 loops=1)
                                             Hash Cond: ("outer".gggg = "inner".gggg)
                                             Filter: ((NOT "inner".bbbb) OR ("inner".bbbb IS NULL))
                                             ->  Seq Scan on cccc e  (cost=0.00..459.32 rows=3734 width=8) (actual time=0.012..28.119 rows=12573 loops=1)
                                                   Filter: (dddd AND (gggg <> 0) AND ((eeee)::date <= '20006-01-31 00:00:00'::timestamp without time zone))
                                             ->  Hash  (cost=149.17..149.17 rows=2917 width=5) (actual time=10.668..10.668 rows=2917 loops=1)
                                                   ->  Seq Scan on ffff p  (cost=0.00..149.17 rows=2917 width=5) (actual time=0.005..6.110 rows=2917 loops=1)
                           ->  Hash  (cost=18.63..18.63 rows=463 width=4) (actual time=1.574..1.574 rows=463 loops=1)
                                 ->  Seq Scan on jjjj  (cost=0.00..18.63 rows=463 width=4) (actual time=0.007..0.866 rows=463 loops=1)
                     ->  Hash  (cost=8.80..8.80 rows=280 width=4) (actual time=0.937..0.937 rows=280 loops=1)
                           ->  Seq Scan on kkkk  (cost=0.00..8.80 rows=280 width=4) (actual time=0.005..0.501 rows=280 loops=1)
               ->  Hash  (cost=3659.85..3659.85 rows=2 width=48) (actual time=588.916..588.916 rows=4816 loops=1)
                     ->  Hash Join  (cost=3642.33..3659.85 rows=2 width=48) (actual time=559.069..581.084 rows=4816 loops=1)
                           Hash Cond: ("outer".recordnumber = "inner".recordnumber)
                           ->  Function Scan on aaaaresults  (cost=0.00..12.50 rows=1000 width=36) (actual time=271.933..277.842 rows=4817 loops=1)
                           ->  Hash  (cost=3642.05..3642.05 rows=114 width=12) (actual time=287.113..287.113 rows=4918 loops=1)
                                 ->  Hash Join  (cost=734.16..3642.05 rows=114 width=12) (actual time=80.629..278.870 rows=4918 loops=1)
                                       Hash Cond: ("outer".llll = "inner".llll)
                                       ->  Nested Loop  (cost=727.57..3633.75 rows=114 width=16) (actual time=80.222..264.871 rows=4918 loops=1)
                                             ->  Hash Join  (cost=727.57..2174.52 rows=388 width=12) (actual time=80.201..213.412 rows=4918 loops=1)
                                                   Hash Cond: ("outer".recordnumber = "inner".recordnumber)
                                                   Join Filter: (("outer".iiii >= (("inner".eeee)::date + '1 year'::interval)) AND ("outer".iiii < (("inner".eeee)::date + '1 year 3 mons'::interval)))
                                                   ->  Nested Loop  (cost=50.32..1332.01 rows=7806 width=16) (actual time=3.112..84.800 rows=19024 loops=1)
                                                         ->  Seq Scan on mmmm s  (cost=0.00..1.05 rows=1 width=4) (actual time=0.008..0.017 rows=1 loops=1)
                                                               Filter: ((((status)::text = 'something'::text) OR ((status)::text = 'nnnn'::text)) AND ((status)::text = 'something'::text))
                                                         ->  Bitmap Heap Scan on oooo f  (cost=50.32..1233.38 rows=7806 width=20) (actual time=3.091..35.010 rows=19024 loops=1)
                                                               Recheck Cond: ("outer".id = f.status)
                                                               ->  Bitmap Index Scan on fw_st  (cost=0.00..50.32 rows=7806 width=0) (actual time=2.911..2.911 rows=19024 loops=1)
                                                                     Index Cond: ("outer".id = f.status)
                                                   ->  Hash  (cost=659.87..659.87 rows=6954 width=12) (actual time=77.028..77.028 rows=10045 loops=1)
                                                         ->  Hash Join  (cost=153.42..659.87 rows=6954 width=12) (actual time=6.543..61.899 rows=10045 loops=1)
                                                               Hash Cond: ("outer".gggg = "inner".gggg)
                                                               ->  Seq Scan on cccc e  (cost=0.00..342.61 rows=12573 width=16) (actual time=0.005..24.873 rows=12585 loops=1)
                                                                     Filter: dddd
                                                               ->  Hash  (cost=149.17..149.17 rows=1700 width=4) (actual time=6.526..6.526 rows=1702 loops=1)
                                                                     ->  Seq Scan on ffff p  (cost=0.00..149.17 rows=1700 width=4) (actual time=0.006..4.103 rows=1702 loops=1)
                                                                           Filter: (pppp OR qqqq)
                                             ->  Index Scan using rrrr_pkey on rrrr c  (cost=0.00..3.75 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=4918)
                                                   Index Cond: (c.recordnumber = "outer".recordnumber)
                                       ->  Hash  (cost=6.27..6.27 rows=127 width=4) (actual time=0.393..0.393 rows=127 loops=1)
                                             ->  Seq Scan on ssss r  (cost=0.00..6.27 rows=127 width=4) (actual time=0.008..0.221 rows=127 loops=1)
         ->  Function Scan on aaaaresults  (cost=0.00..15.00 rows=333 width=36) (actual time=0.087..18.696 rows=11306 loops=4816)
               Filter: (aaaa >= 25::numeric)
 Total runtime: 160202.265 ms
(57 rows)
"Porell, Chris" <Chris.Porell@ceridian.com> writes: > Lastly, the EXPLAIN ANALYZE output. Do you have the equivalent for the old installation? > -> Nested Loop (cost=4387.04..9817.54 rows=1 width=4) (actual time=1134.020..160195.837 rows=1842 loops=1) > Join Filter: (("inner".recordnumber = "outer".recordnumber) AND ("outer".aaaa < ("inner".aaaa - 1::numeric))) > -> Hash Join (cost=4387.04..9796.71 rows=1 width=56) (actual time=684.721..1057.800 rows=4816 loops=1) > ... > -> Function Scan on aaaaresults (cost=0.00..15.00 rows=333 width=36) (actual time=0.087..18.696 rows=11306 loops=4816) > Filter: (aaaa >= 25::numeric) > Total runtime: 160202.265 ms This join is what's killing you, and even more specifically the factor of 4800 misestimate of the size of the hashjoin result. It wouldn't have tried a nestloop if the rowcount estimate had been even a little bit closer to reality. The misestimate seems to be mostly due to this lower join: > -> Hash Join (cost=3642.33..3659.85 rows=2 width=48) (actual time=559.069..581.084 rows=4816 loops=1) > Hash Cond: ("outer".recordnumber = "inner".recordnumber) > -> Function Scan on aaaaresults (cost=0.00..12.50 rows=1000 width=36) (actual time=271.933..277.842rows=4817 loops=1) > -> Hash (cost=3642.05..3642.05 rows=114 width=12) (actual time=287.113..287.113 rows=4918loops=1) I suppose this is not actually the same function that you are obscuring in the other case? Anyway this seems a bit strange, because with no stats on the functionscan result, I'd have expected a more conservative (larger) estimate for the size of the join result. Can you show us the pg_stats row for the column you've labeled inner.recordnumber here? regards, tom lane
I will get that and post it.

NEW NEWS... turning off "enable_seqscan" made the query run in about .25 seconds!!! Now we're re-evaluating effective_cache_size.
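For anyone reproducing the experiment, the toggle is per-session, roughly as below; the query shown is a placeholder, not Chris's report:

    -- Planner toggle for diagnosis only; not something to leave off
    -- in production, since it penalizes every sequential scan:
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id;
    SET enable_seqscan = on;    -- restore the default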
"Porell, Chris" <Chris.Porell@ceridian.com> writes: > NEW NEWS... turning off "enable_seqscan" made the query run in about .25 > seconds!!! [ squint... ] It was not the seqscans that were killing you, and changing just that setting wouldn't have moved the rowcount estimates one millimeter. I suppose this made it switch to some differently shaped plan that happened not to evaluate the functionscan so many times, but you should realize that what you got there is luck, not a trustworthy fix. BTW, on reflection the functionscan in itself shouldn't have been real expensive, because it would have materialized the function result in a tuplestore and then just rescanned that N times. I think the actual expense came from evaluating the (aaaa >= 25::numeric) filter condition over and over --- not only is numeric arithmetic pretty slow, but it might have done it vastly more than 11306 * 4816 times. We can't tell from this output how selective the filter was, but there could have been a lot more than 11306 rows in the raw function output. You might want to think about adjusting the function definition so that the min-value filtering happens inside the function instead of outside. For that matter, do you really need the value to be numeric rather than plain integer? regards, tom lane