Discussion: Query performance over a large proportion of data
Hi,

I'd be grateful for any advice we can get... we recently switched from MySQL to PostgreSQL on the basis of some trials we carried out with datasets of varying sizes and varying rates of contention. For the most part we've been pleased with performance, but one particular application runs queries that pull back a lot of results across what is for us a large dataset. We've noticed enormous speed improvements over MySQL with complex queries, but some of these simpler queries are causing us problems. We were hoping that the machine would be able to cache the majority of the database in memory and run these kinds of queries very quickly. The box doesn't seem to be doing much I/O during these queries, and top usually reports the processor usage slowly increasing to about 75% but no higher (and then dropping once the query finishes). We adjusted settings in postgresql.conf as recommended in various places on the web. In particular, experimentation led us to turn off enable_seqscan, because doing so consistently led to faster query times, but we're not sure why that's the case or whether it's a good idea generally.

This example has been anonymised slightly, although I've checked it for typos. Our 'fact_table' has 6 million rows, each of which joins to one of 1.7 million rows in 'record_dimension' and one of 15,000 rows in 'date_dimension'. We have other tables that also join to 'fact_table', but for this example these three tables suffice. The total size (as reported on the file system, presumably including indexes) is 7.5GB. The query below pulls 12 months' worth of data (accounting for 20% of the rows in 'fact_table') with restrictions that account for 15% of the rows in 'record_dimension'. It's a read-only database (we dump it fresh nightly).

The server itself is a dual-core 3.7GHz Xeon Dell (each core reporting 2 logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres 8.3.5 built from source. It's got 400GB storage in RAID-5 (on 5 disks).
It has 8GB of physical RAM. I'm able to use about 6GB of that for my own purposes; the server doesn't do much else but replicate a very low-usage mysql database. While it's running, postgres only seems to use about 1.2GB of RAM. Postgres configuration is below the query and EXPLAIN. Any help would be much appreciated.

=============

SELECT "record_dimension"."Id 1" AS "Id 1",
       "record_dimension"."Id 2" AS "Id 2",
       "fact_table"."Id 3" AS "Id 3"
  FROM "fact_table"
 INNER JOIN "record_dimension"
    ON "fact_table"."record_key" = "record_dimension"."record_key"
 INNER JOIN "date_dimension"
    ON "fact_table"."date_key" = "date_dimension"."date_key"
 WHERE "record_dimension"."Region" = 'Big Region'
   AND "date_dimension"."Month" BETWEEN '110' AND '121'
   AND "record_dimension"."A Common Property"
   AND "record_dimension"."Country" = 'USA';

ENABLE_SEQSCAN ON

Nested Loop  (cost=466.34..192962.24 rows=15329 width=12) (actual time=13653.238..31332.113 rows=131466 loops=1)
  ->  Hash Join  (cost=466.34..115767.54 rows=141718 width=8) (actual time=13649.952..19548.019 rows=1098344 loops=1)
        Hash Cond: (fact_table.date_key = date_dimension.date_key)
        ->  Seq Scan on fact_table  (cost=0.00..91589.38 rows=5945238 width=12) (actual time=0.014..8761.184 rows=5945238 loops=1)
        ->  Hash  (cost=461.99..461.99 rows=348 width=4) (actual time=4.651..4.651 rows=378 loops=1)
              ->  Seq Scan on date_dimension  (cost=0.00..461.99 rows=348 width=4) (actual time=0.044..4.007 rows=378 loops=1)
                    Filter: (("Month" >= 110::smallint) AND ("Month" <= 121::smallint))
  ->  Index Scan using record_dimension_pkey on record_dimension  (cost=0.00..0.53 rows=1 width=12) (actual time=0.007..0.007 rows=0 loops=1098344)
        Index Cond: (record_dimension.record_key = fact_table.record_key)
        Filter: (record_dimension."A Common Property" AND ((record_dimension."Region")::text = 'Big Region'::text) AND ((record_dimension."Country")::text = 'USA'::text))
Total runtime: 31522.166 ms

(131466 rows)
(Actual query time: 8606.963 ms)

I/O during the query:
+------------------+---------------------------------------+----------------------------------------+
|                  |            SEQUENTIAL I/O             |              INDEXED I/O               |
|                  | scans | tuples  | heap_blks | cached  | scans   | tuples  | idx_blks | cached  |
|------------------+-------+---------+-----------+---------+---------+---------+----------+---------|
| date_dimension   |     1 |   14599 |         0 |     243 |       0 |       0 |        0 |       0 |
| fact_table       |     1 | 5945238 |         0 |   32137 |       0 |       0 |        0 |       0 |
| record_dimension |     0 |       0 |         0 | 1098344 | 1098344 | 1098344 |        0 | 3300506 |
+------------------+-------+---------+-----------+---------+---------+---------+----------+---------+

ENABLE_SEQSCAN OFF

Nested Loop  (cost=0.00..355177.96 rows=15329 width=12) (actual time=14763.749..32483.625 rows=131466 loops=1)
  ->  Merge Join  (cost=0.00..277983.26 rows=141718 width=8) (actual time=14760.467..20623.975 rows=1098344 loops=1)
        Merge Cond: (date_dimension.date_key = fact_table.date_key)
        ->  Index Scan using date_dimension_pkey on date_dimension  (cost=0.00..706.23 rows=348 width=4) (actual time=0.074..1.635 rows=13 loops=1)
              Filter: (("Month" >= 110::smallint) AND ("Month" <= 121::smallint))
        ->  Index Scan using date_key on fact_table  (cost=0.00..261696.89 rows=5945238 width=12) (actual time=0.016..9903.593 rows=5945238 loops=1)
  ->  Index Scan using record_dimension_pkey on record_dimension  (cost=0.00..0.53 rows=1 width=12) (actual time=0.007..0.007 rows=0 loops=1098344)
        Index Cond: (record_dimension.record_key = fact_table.record_key)
        Filter: (record_dimension."A Common Property" AND ((record_dimension."Region")::text = 'Big Region'::text) AND ((record_dimension."Country")::text = 'USA'::text))
Total runtime: 32672.995 ms
(10 rows)

(131466 rows)
(Actual query time: 9049.854 ms)

postgresql.conf
=============
shared_buffers = 1200MB
work_mem = 100MB
maintenance_work_mem = 200MB
max_fsm_pages = 179200
fsync = off
synchronous_commit = off
full_page_writes = off
enable_seqscan = off
effective_cache_size = 2000MB
default_statistics_target = 100
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
On Tue, Mar 10, 2009 at 3:12 PM, Steve McLellan <smclellan@mintel.com> wrote:
> We adjusted settings in postgresql.conf as recommended in various places
> on the web. In particular, experimentation led us to turn off
> enable_seqscan, because it consistently led to faster query times, but
> we're not sure why that's the case or whether it's a good idea generally.

No, it's not. The query planner in postgresql is quite good, and unless you're sure it's making a pathologically bad decision, turning things like seqscan on or off is kind of like a bullet in the brain to cure a headache.

> It has 8GB of physical RAM. I'm able to use about 6GB of that for my own
> purposes; the server doesn't do much else but replicate a very low-usage
> mysql database. While it's running, postgres only seems to use about
> 1.2GB of RAM.

What do you mean you're able to use about 6GB for your own purposes? Note that postgresql relies on the OS to do the majority of its caching, so if you're doing something that chews up 6GB of RAM on the same machine you are affecting pgsql performance on it.

I'll let someone else look through the explain analyze and all, but as regards your sequential scan being turned off, you're far better off adjusting seq_page_cost and random_page_cost in postgresql.conf to push the planner towards random access. Also, increasing effective_cache_size will favor index scans over sequential scans. Then, use enable_seqscan = off / on to test whether you have the best query plan or not. Don't just leave enable_seqscan = off.
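A concrete way to run that test without touching postgresql.conf globally, sketched below; the specific values are illustrative starting points for this mostly-cached box, not recommendations:

```sql
-- In a psql session: these SETs affect this session only.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;   -- the fact_table query from the original post

SET enable_seqscan = on;
SET random_page_cost = 2.0;        -- below the 4.0 default, for mostly-cached data
SET effective_cache_size = '6GB';  -- reflect the RAM actually free for caching
EXPLAIN ANALYZE SELECT ...;   -- does the planner now pick the index plan itself?
```

If the second plan matches the first, those cost settings can be promoted to postgresql.conf and enable_seqscan left on.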
>>> "Steve McLellan" <smclellan@mintel.com> wrote:
> The server itself is a dual-core 3.7GHz Xeon Dell (each core
> reporting 2 logical CPUs) running an amd64 build of FreeBSD 6.2, and
> postgres 8.3.5 built from source. It's got 400GB storage in RAID-5
> (on 5 disks). It has 8GB of physical RAM. I'm able to use about 6GB
> of that for my own purposes; the server doesn't do much else but
> replicates a very low-usage mysql database.

> shared_buffers=1200MB

You might want to play with this -- that's not a bad starting point, but your best performance with your load could be on either side of that value.

> work_mem = 100MB

Probably kinda high, especially if you expect a lot of connections. This much memory can be concurrently used, possibly more than once, by each active connection.

> fsync = off

Don't use this setting unless you can afford to lose your entire database cluster. We use it for initial (repeatable) loads, but not much else.

> enable_seqscan = off

Not a good idea; some queries will optimize better with seqscans. You can probably get the behavior you want using other adjustments.

> effective_cache_size = 2000MB

From what you said above, I'd bump this up to 5GB or more.

You probably need to reduce random_page_cost. If your caching is complete enough, you might want to set it equal to seq_page_cost (never set it lower than seq_page_cost!) and possibly reduce both of these to 0.1.

Some people have had good luck with boosting cpu_tuple_cost and cpu_index_tuple_cost. (I've seen 0.5 for both recommended.) I've never had to do that, but if the earlier suggestions don't get good plans, you might try that.

I hope that helps.

-Kevin
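Pulling those suggestions into one place, a sketch of the relevant postgresql.conf lines for this read-only, nightly-rebuilt box; every number here is an assumption that needs validating against the actual workload:

```
shared_buffers = 1200MB        # reasonable start; benchmark either side of this
work_mem = 20MB                # 100MB is risky once several sorts run concurrently
fsync = off                    # defensible ONLY because the cluster is reloaded nightly
enable_seqscan = on            # let the planner decide; steer it with costs instead
effective_cache_size = 5GB     # roughly the RAM the OS can devote to caching
seq_page_cost = 0.1            # if the data really is fully cached...
random_page_cost = 0.1         # ...and never set this below seq_page_cost
```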
On Tue, Mar 10, 2009 at 3:12 PM, Steve McLellan <smclellan@mintel.com> wrote:
>
> Nested Loop  (cost=466.34..192962.24 rows=15329 width=12) (actual
> time=13653.238..31332.113 rows=131466 loops=1)

Both your query plans end with this nested loop join, which is taking up about half the time in your query. Notice that the estimate of the result set is off by a factor of about 10 here, which means a nested loop might not be such a good choice. Try increasing the default statistics target and re-analyzing to see if that helps. 1000 is the max; you can give that a shot right off to see if it helps. If it does, drop it until the numbers start to go off again, and stop.

For a quicker test, you can set enable_nestloop = off on the psql command line and then run the query by hand and see if that helps.
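That test sequence might look like the following in psql; the elided SELECT stands in for the original query, and 1000 is the 8.3 maximum for the statistics target:

```sql
-- Raise the statistics target for this session, re-gather stats, then
-- re-check whether the row estimate on the nested loop improves.
SET default_statistics_target = 1000;
ANALYZE fact_table;
ANALYZE record_dimension;
EXPLAIN ANALYZE SELECT ...;   -- the problem query

-- Separately: is the nested loop itself the slow part?
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT ...;
```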
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> You probably need to reduce random_page_cost. If your caching is
> complete enough, you might want to set it equal to seq_page_cost
> (never set it lower than seq_page_cost!) and possibly reduce both of
> these to 0.1.

> Some people have had good luck with boosting cpu_tuple_cost and
> cpu_index_tuple_cost. (I've seen 0.5 for both recommended.) I've
> never had to do that, but if the earlier suggestions don't get good
> plans, you might try that.

It might be worth pointing out here that all that matters are the relative values of the various xxx_cost parameters. If your DB is mostly or entirely cached, you probably want to lower the estimated cost of I/O relative to CPU work. You can do that *either* by dropping the seq_/random_page_costs, *or* by raising the cpu_xxx_costs (there are more than two of those, BTW). Doing both, as Kevin's comments might be read to suggest, is not useful ... and in particular I bet that having seq_page_cost actually less than cpu_tuple_cost would lead to some pretty wacko decision-making by the planner.

See http://www.postgresql.org/docs/8.3/static/runtime-config-query.html for some more info about what you're twiddling here.

			regards, tom lane
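To make the point about relative values concrete: scaling every cost parameter by the same factor describes an identical cost model to the planner, so only one of the two (hypothetical) adjustments below is needed, never both. The defaults shown are the 8.3 values:

```
# Option A: make I/O cheaper relative to CPU (divide both page costs by 10)
seq_page_cost = 0.1        # default 1.0
random_page_cost = 0.4     # default 4.0

# Option B, roughly equivalent: make CPU dearer (multiply the cpu costs by 10)
cpu_tuple_cost = 0.1           # default 0.01
cpu_index_tuple_cost = 0.05    # default 0.005
cpu_operator_cost = 0.025      # default 0.0025
```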
"Steve McLellan" <smclellan@mintel.com> writes:
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'

BTW, aside from the points already made: the above indicates that you initialized your database in the en_US.utf8 locale. This is not necessarily a good decision from a performance standpoint --- you might be much better off with C locale, and might even prefer it if you favor ASCII-order sorting over "dictionary" sorting. utf8 encoding might create some penalties you don't need, too. This all depends on a lot of factors you didn't mention; maybe you actually need utf8 data, or maybe your application doesn't do many string comparisons and so isn't sensitive to the speed of strcoll() anyway. But I've seen it be a gotcha for people moving from MySQL, which AFAIK doesn't worry about honoring locale-specific sort order.

			regards, tom lane
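Since locale is fixed at initdb time in 8.3, switching to C locale means re-initializing the cluster and restoring the nightly dump into it; a sketch, with a hypothetical data directory path:

```
# C locale gives byte-wise (ASCII-order) sorting; UTF8 encoding is still allowed.
initdb -D /usr/local/pgsql/data-c --locale=C --encoding=UTF8
# ...then restore the nightly dump into the new cluster and compare timings.
```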
Tom Lane <tgl@sss.pgh.pa.us> wrote on 03/10/2009 08:16 PM AST:
> BTW, aside from the points already made: the above indicates that you
> initialized your database in en_US.utf8 locale. This is not necessarily
> a good decision from a performance standpoint --- you might be much
> better off with C locale [...]

Thanks, Steve
Scott Marlowe <scott.marlowe@gmail.com> wrote on 03/10/2009 05:19 PM:
> Both your query plans end with this nested loop join which is taking
> up about half your time in your query. Notice the estimation of the
> result set is off by a factor of about 10 here [...]
> For a quicker test, you can set enable_nestloop = off in the psql
> command line and then run the query by hand and see if that helps.

Thanks - the nested loop is indeed causing problems - reducing seq_page_cost had the same effect of removing the nested loop for this query. We'd noticed the poor row count estimation. Increasing the statistics target doesn't seem to have much effect, but we'll have more of a go with it.
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote on 03/10/2009 05:06 PM EST:
> > enable_seqscan = off
>
> Not a good idea; some queries will optimize better with seqscans.
> You can probably get the behavior you want using other adjustments.
>
> You probably need to reduce random_page_cost. If your caching is
> complete enough, you might want to set it equal to seq_page_cost
> (never set it lower than seq_page_cost!) and possibly reduce both of
> these to 0.1.

Thanks again, Steve
On Tue, Mar 10, 2009 at 9:15 PM, Steve McLellan <smclellan@mintel.com> wrote:
> Thanks - the nested loop is indeed causing problems - reducing
> seq_page_cost had the same effect of removing the nested loop for this
> query. We'd noticed the poor row count estimation. Increasing the
> statistics doesn't seem to have much effect, but we'll have more of a
> go with it.

More than likely it's the sequential page cost versus the cpu_xxx cost settings that's really making the difference. I.e. if you raised the cpu_xxx settings you'd get the same result. But I'm not sure, just a guess.
On Mar 10, 2009, at 4:12 PM, Steve McLellan wrote:
> The server itself is a dual-core 3.7GHz Xeon Dell (each core reporting 2
> logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres 8.3.5
> built from source.

Uh, you're running an amd64 build on top of an Intel CPU? I didn't think FBSD would allow that, but if it does it wouldn't surprise me if kernel/OS performance stunk. If Postgres then used the same settings it would make matters even worse (IIRC there is some code that's different in an AMD vs Intel build).
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
decibel wrote:
> On Mar 10, 2009, at 4:12 PM, Steve McLellan wrote:
>> The server itself is a dual-core 3.7GHz Xeon Dell (each core reporting 2
>> logical CPUs) running an amd64 build of FreeBSD 6.2, and postgres 8.3.5
>> built from source.
>
> Uh, you're running an amd64 build on top of an Intel CPU? I didn't think
> FBSD would allow that, but if it does it wouldn't surprise me if
> kernel/OS performance stunk.

Uh? amd64 is just the name of the FreeBSD port for AMD/Intel 64-bit CPUs.

See: http://www.freebsd.org/platforms/amd64.html
and: http://en.wikipedia.org/wiki/X86-64

Cheers
-- 
Matteo Beccati
OpenX - http://www.openx.org