Discussion: Allow sorts to use more available memory
The recent "data warehouse" thread made me think about how I use work_mem for some of my big queries.  So I tried SET work_mem = '4GB' for a session and got:

ERROR:  4194304 is outside the valid range for parameter "work_mem" (64 .. 2097151)

A bit of searching turned up the "Allow sorts to use more available memory" section of the to-do list.  Am I correct in reading that the max_val is 2GB, and that regardless of how much RAM I have in the box I'm stuck with only using 2GB?  Am I missing something?

I'm using: PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 64-bit Windows 2008 Server Enterprise

Thanks,
Bob
On 9/12/2011 12:33 PM, Robert Schnabel wrote:
> The recent "data warehouse" thread made me think about how I use
> work_mem for some of my big queries. So I tried SET work_mem = '4GB' for
> a session and got
>
> ERROR: 4194304 is outside the valid range for parameter "work_mem" (64
> .. 2097151)
>
> A bit of searching turned up the "Allow sorts to use more available
> memory" section of the to-do list. Am I correct in reading that the
> max_val is 2GB and regardless of how much RAM I have in the box I'm
> stuck with only using 2GB? Am I missing something?
>
> I'm using: PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 64-bit
> Windows 2008 Server Enterprise
>
> Thanks,
> Bob

work_mem is not the total a query can use.  I believe each step can use that much, and each backend can use it for multiple steps.  So if you had two backends, each doing 2 sorts, you'd have 2 * 2 = 4 sorts, and 4 * 2GB = 8GB.

-Andy
On 09/12/2011 12:47 PM, Andy Colson wrote:
> work_mem is not the total a query can use. I believe each step can
> use that much, and each backend can use it for multiple bits. So if
> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
> 8GB.

Exactly.  Find a big query somewhere in your system.  Use EXPLAIN to examine it.  Chances are, that one query has one or more sorts.  Each one of those gets its own work_mem.  Each sort.  Does the query have four sorts?  It may use 4 * work_mem.

On a whim a while back, I doubled our 8MB setting to 16MB on a test system.  During a load test, the machine ran out of memory, swapped, and finally crashed after the OOM killer went nuts.

Set this value *at your own risk* and only after *significant* testing.  Having it too high can have rather unexpected consequences.  Setting it to 1 or 2GB, unless you have VERY few connections or a TON of memory, is a very, very bad idea.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to this email
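Shaun's advice above can be sketched with a minimal example (the table and column names below are hypothetical, not from this thread):

```sql
-- Each Sort (or Hash) node in a plan can claim up to work_mem by itself,
-- so a plan with four such nodes may use up to 4 * work_mem.
SET work_mem = '64MB';

EXPLAIN
SELECT a.id, b.val
FROM big_a a
JOIN big_b b ON b.id = a.id
ORDER BY b.val;
-- Count the "Sort" and "Hash" nodes in the output; each one can use up
-- to work_mem before spilling to disk.
```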
On 9/12/2011 12:57 PM, Shaun Thomas wrote:
> On 09/12/2011 12:47 PM, Andy Colson wrote:
>
>> work_mem is not the total a query can use. I believe each step can
>> use that much, and each backend can use it for multiple bits. So if
>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>> 8GB.
>
> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
> examine it.

Yeah, and even better: on PG 9, EXPLAIN ANALYZE will show you just how much memory is actually being used.

-Andy
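The memory line Andy mentions looks roughly like this (a sketch with an assumed table name; the numbers are illustrative only):

```sql
EXPLAIN ANALYZE SELECT * FROM big_a ORDER BY val;
-- Each sort node reports how it was performed, e.g.:
--   Sort Method:  quicksort  Memory: 25kB         (fit within work_mem)
--   Sort Method:  external merge  Disk: 102400kB  (spilled to disk)
```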
On 9/12/2011 12:57 PM, Shaun Thomas wrote:
> On 09/12/2011 12:47 PM, Andy Colson wrote:
>
>> work_mem is not the total a query can use. I believe each step can
>> use that much, and each backend can use it for multiple bits. So if
>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>> 8GB.
>
> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
> examine it. Chances are, that one query has one or more sorts. Each one
> of those gets its own work_mem. Each sort. The query have four sorts? It
> may use 4*work_mem. On a whim a while back, I doubled our 8MB setting to
> 16MB on a test system. During a load test, the machine ran out of
> memory, swapped out, and finally crashed after the OOM killer went nuts.
>
> Set this value *at your own risk* and only after *significant* testing.
> Having it too high can have rather unexpected consequences. Setting it
> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
> very, very bad idea.

Yep, I know.  But in the context of the data warehouse, where *I'm the only user*, I have a query that does, say, 4 large sorts like http://explain.depesz.com/s/BrAO, and I have 32GB RAM, I'm not worried about using 8GB or 16GB with work_mem = 4GB.  I realize the query above only used 1.9GB for the largest sort, but I know I have other queries with 1 or 2 sorts that I've watched go to disk.

Bob
On 9/12/2011 1:22 PM, Robert Schnabel wrote:
> On 9/12/2011 12:57 PM, Shaun Thomas wrote:
>> On 09/12/2011 12:47 PM, Andy Colson wrote:
>>
>>> work_mem is not the total a query can use. I believe each step can
>>> use that much, and each backend can use it for multiple bits. So if
>>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>>> 8GB.
>> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
>> examine it. Chances are, that one query has one or more sorts. Each one
>> of those gets its own work_mem. Each sort. The query have four sorts? It
>> may use 4*work_mem. On a whim a while back, I doubled our 8MB setting to
>> 16MB on a test system. During a load test, the machine ran out of
>> memory, swapped out, and finally crashed after the OOM killer went nuts.
>>
>> Set this value *at your own risk* and only after *significant* testing.
>> Having it too high can have rather unexpected consequences. Setting it
>> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
>> very, very bad idea.
>>
> Yep, I know. But in the context of the data warehouse where *I'm the
> only user* and I have a query that does, say 4 large sorts like
> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
> query above only used 1.9GB for the largest sort but I know I have other
> queries with 1 or 2 sorts that I've watched go to disk.
>
> Bob

Wow, you are getting close to the limits there.  Another thing you can do is mount a tmpfs in RAM and then just let the sorts spill to it.

-Andy
On 9/12/2011 1:22 PM, Robert Schnabel wrote:
> On 9/12/2011 12:57 PM, Shaun Thomas wrote:
>> On 09/12/2011 12:47 PM, Andy Colson wrote:
>>
>>> work_mem is not the total a query can use. I believe each step can
>>> use that much, and each backend can use it for multiple bits. So if
>>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>>> 8GB.
>> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
>> examine it. Chances are, that one query has one or more sorts. Each one
>> of those gets its own work_mem. Each sort. The query have four sorts? It
>> may use 4*work_mem. On a whim a while back, I doubled our 8MB setting to
>> 16MB on a test system. During a load test, the machine ran out of
>> memory, swapped out, and finally crashed after the OOM killer went nuts.
>>
>> Set this value *at your own risk* and only after *significant* testing.
>> Having it too high can have rather unexpected consequences. Setting it
>> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
>> very, very bad idea.
>>
> Yep, I know. But in the context of the data warehouse where *I'm the
> only user* and I have a query that does, say 4 large sorts like
> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
> query above only used 1.9GB for the largest sort but I know I have other
> queries with 1 or 2 sorts that I've watched go to disk.
>
> Bob

Huge guess here (I can't see the select or the DDL), but it looks like all the tables are sequential scans.  It might help to add an index or two; then the table joins could be done much more efficiently, with a lot less memory.

-Andy
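Andy's index suggestion, as a hedged sketch (the table and column names are guesses, since the actual DDL isn't shown in the thread):

```sql
-- If the filter and join columns were indexed, e.g.:
CREATE INDEX table1_filter_idx ON table1 (filter_col);
CREATE INDEX table2_id_idx ON table2 (id);
-- the planner could choose index scans with nested-loop or merge joins
-- instead of sequential scans feeding large in-memory sorts.
```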
I think you may add a ramdisk as a tablespace for temporary tables.  This should work similarly to a bigger work_mem.

2011/9/12, Robert Schnabel <schnabelr@missouri.edu>:
> On 9/12/2011 12:57 PM, Shaun Thomas wrote:
>> On 09/12/2011 12:47 PM, Andy Colson wrote:
>>
>>> work_mem is not the total a query can use. I believe each step can
>>> use that much, and each backend can use it for multiple bits. So if
>>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>>> 8GB.
>> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
>> examine it. Chances are, that one query has one or more sorts. Each one
>> of those gets its own work_mem. Each sort. The query have four sorts? It
>> may use 4*work_mem. On a whim a while back, I doubled our 8MB setting to
>> 16MB on a test system. During a load test, the machine ran out of
>> memory, swapped out, and finally crashed after the OOM killer went nuts.
>>
>> Set this value *at your own risk* and only after *significant* testing.
>> Having it too high can have rather unexpected consequences. Setting it
>> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
>> very, very bad idea.
>>
> Yep, I know. But in the context of the data warehouse where *I'm the
> only user* and I have a query that does, say 4 large sorts like
> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
> query above only used 1.9GB for the largest sort but I know I have other
> queries with 1 or 2 sorts that I've watched go to disk.
>
> Bob
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
------------
pasman
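The ramdisk idea can be sketched like this (the mount point, size, and tablespace name are assumptions, not from the thread; the mount itself is done by root at the OS level):

```sql
-- Assumes a RAM-backed filesystem is already mounted and writable by
-- the postgres user, e.g. on Linux:
--   mount -t tmpfs -o size=16G tmpfs /mnt/pg_temp
CREATE TABLESPACE ram_temp LOCATION '/mnt/pg_temp';

-- Route temporary files (sort spills, temp tables) there for this session:
SET temp_tablespaces = 'ram_temp';
```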
On Mon, Sep 12, 2011 at 11:33 AM, Robert Schnabel <schnabelr@missouri.edu> wrote:
> The recent "data warehouse" thread made me think about how I use work_mem
> for some of my big queries.  So I tried SET work_mem = '4GB' for a session
> and got
>
> ERROR:  4194304 is outside the valid range for parameter "work_mem" (64 ..
> 2097151)

Ubuntu 10.10, pgsql 8.4.8:

smarlowe=# set work_mem='1000GB';
SET
On 9/12/2011 3:58 PM, Scott Marlowe wrote:
> On Mon, Sep 12, 2011 at 11:33 AM, Robert Schnabel
> <schnabelr@missouri.edu> wrote:
>> The recent "data warehouse" thread made me think about how I use work_mem
>> for some of my big queries.  So I tried SET work_mem = '4GB' for a session
>> and got
>>
>> ERROR:  4194304 is outside the valid range for parameter "work_mem" (64 ..
>> 2097151)
>
> Ubuntu 10.10, pgsql 8.4.8:
>
> smarlowe=# set work_mem='1000GB';
> SET

Ok, so is this a limitation related to the Windows implementation?

And getting back to the to-do list entry and reading the related posts, it appears that even if you could set work_mem that high, it would only use 2GB anyway.  I guess that was the second part of my question.  Is that true?
On 9/12/2011 1:57 PM, Andy Colson wrote:
> On 9/12/2011 1:22 PM, Robert Schnabel wrote:
>> On 9/12/2011 12:57 PM, Shaun Thomas wrote:
>>> On 09/12/2011 12:47 PM, Andy Colson wrote:
>>>
>>>> work_mem is not the total a query can use. I believe each step can
>>>> use that much, and each backend can use it for multiple bits. So if
>>>> you had two backends, each doing 2 sorts, you'd use 2*2 = 4 * 2GB =
>>>> 8GB.
>>> Exactly. Find a big query somewhere in your system. Use EXPLAIN to
>>> examine it. Chances are, that one query has one or more sorts. Each one
>>> of those gets its own work_mem. Each sort. The query have four sorts? It
>>> may use 4*work_mem. On a whim a while back, I doubled our 8MB setting to
>>> 16MB on a test system. During a load test, the machine ran out of
>>> memory, swapped out, and finally crashed after the OOM killer went nuts.
>>>
>>> Set this value *at your own risk* and only after *significant* testing.
>>> Having it too high can have rather unexpected consequences. Setting it
>>> to 1 or 2GB, unless you have VERY few threads, or a TON of memory, is a
>>> very, very bad idea.
>>>
>> Yep, I know. But in the context of the data warehouse where *I'm the
>> only user* and I have a query that does, say 4 large sorts like
>> http://explain.depesz.com/s/BrAO and I have 32GB RAM I'm not worried
>> about using 8GB or 16GB in the case of work_mem = 4GB. I realize the
>> query above only used 1.9GB for the largest sort but I know I have other
>> queries with 1 or 2 sorts that I've watched go to disk.
>>
>> Bob
>>
> Huge guess here, cant see select or ddl, but looks like all the tables
> are sequential scans. It might help to add an index or two, then the
> table joins could be done much more efficiently with a lot less
> memory.
>
> -Andy

In this case I doubt it.  Basically what these queries are doing is taking table1 (~30M rows) and finding all the rows with a certain condition; this produces ~15M rows.  Then I have to find all of those 15M rows that are present in table2.  In the case of the query above, this results in 1.79M rows.  Basically, the 15M rows that meet the condition for table1 have matching rows spread out over 10 different tables (table2's).

Actually, you just gave me an idea.  When I generate "table1" I can probably add a field that tells me which "table2" each qualifying row came from.  Sometimes just having someone else make you think is very productive. :-)

Thanks
Bob
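Bob's idea could be sketched roughly like this (all table and column names here are hypothetical, since the real schema isn't shown):

```sql
-- When generating table1, carry along a tag identifying which table2_N
-- holds the matching rows:
ALTER TABLE table1 ADD COLUMN source_table text;

-- ...populate source_table while loading/filtering table1...

-- Then each follow-up join only has to touch one table2_N, e.g.:
SELECT t1.id, t2.val
FROM table1 t1
JOIN table2_4 t2 ON t2.id = t1.id
WHERE t1.some_condition
  AND t1.source_table = 'table2_4';
```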
Robert Schnabel <schnabelr@missouri.edu> writes:
> On 9/12/2011 3:58 PM, Scott Marlowe wrote:
>> On Mon, Sep 12, 2011 at 11:33 AM, Robert Schnabel
>> <schnabelr@missouri.edu> wrote:
>>> The recent "data warehouse" thread made me think about how I use work_mem
>>> for some of my big queries.  So I tried SET work_mem = '4GB' for a session
>>> and got
>>> ERROR: 4194304 is outside the valid range for parameter "work_mem" (64 ..
>>> 2097151)
>> Ubuntu 10.10, pgsql 8.4.8:
>> smarlowe=# set work_mem='1000GB';
>> SET
> Ok, so is this a limitation related to the Windows implementation?

Yeah.  If you look into guc.c you'll find this:

/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a "long" variable */
#if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
#define MAX_KILOBYTES	INT_MAX
#else
#define MAX_KILOBYTES	(INT_MAX / 1024)
#endif

Since Windows, more or less alone among known 64-bit operating systems, chose not to make "long" the same width as pointers, these values get restricted just as if you were on a 32-bit machine.

Few Postgres developers use Windows enough to get excited about doing all the tedious (and bug-prone) gruntwork that would be required to fix this.

			regards, tom lane
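The bounds in the original error message fall straight out of that macro; with a 4-byte "long", INT_MAX is 2147483647 and work_mem is measured in kilobytes:

```sql
SELECT 2147483647 / 1024;  -- 2097151: the upper bound from the error
SELECT 4 * 1024 * 1024;    -- 4194304: '4GB' in kB, the rejected value
```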
* Robert Schnabel (schnabelr@missouri.edu) wrote:
> And getting back to the to-do list entry and reading the related
> posts, it appears that even if you could set work_mem that high it
> would only use 2GB anyway. I guess that was the second part of my
> question. Is that true?

Yes and no.  work_mem is used by the planner to figure out what kind of plan to use.  The planner plans things based off of statistics, but it's not perfect, especially on large tables with lots of data that have dependencies between columns.

Where the 2GB limit comes into play is when you end up with a plan that does, say, a large sort.  PG will use memory for the sort up to work_mem or 2GB, whichever is lower, and spill to disk after that.  I don't believe it has such a limit for a hash table, due to how the data structures for the hash table are allocated (I recall seeing single PG queries whose hash tables got into the 30+GB range; of course, I had work_mem set upwards of 100GB on a 32GB box... :).

So, if you're doing data warehousing and you're pretty much the only user (or there's only one at a time), setting it pretty high is acceptable, but you do need to watch the box and make sure you don't run it out of memory.  Also, make sure you have things configured correctly, if you're using Linux, to prevent the OOM killer from kicking in.  And, as I suggested before, set it to a reasonable level for the default and just up it for specific queries that may benefit from it.

Thanks,
Stephen
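Stephen's closing suggestion (keep a modest default, raise it only for the queries that benefit) can be done per-transaction with SET LOCAL; the sizes here are illustrative:

```sql
-- postgresql.conf keeps a conservative default, e.g. work_mem = '16MB'
BEGIN;
SET LOCAL work_mem = '1GB';  -- in effect only until COMMIT/ROLLBACK
-- ... run the one big warehouse query here ...
COMMIT;                      -- work_mem reverts to the default
```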
* Robert Schnabel (schnabelr@missouri.edu) wrote:
> And getting back to the to-do list entry and reading the related
> posts, it appears that even if you could set work_mem that high it
> would only use 2GB anyway. I guess that was the second part of my
> question. Is that true?

Errr, and to get back to the to-do (which I've been considering doing something about...): it's to allow the *actual* memory usage for things like sorts to exceed 2GB.  But, as others have pointed out, you can get much the same effect by putting pgsql_tmp on a memory filesystem and letting the sorts spill to the memory-based FS.

Thanks,
Stephen
Stephen Frost wrote:
-- Start of PGP signed section.
> * Robert Schnabel (schnabelr@missouri.edu) wrote:
> > And getting back to the to-do list entry and reading the related
> > posts, it appears that even if you could set work_mem that high it
> > would only use 2GB anyway. I guess that was the second part of my
> > question. Is that true?
>
> Errr, and to get back to the to-do (which I've been considering doing
> something about...), it's to allow the *actual* memory usage for things
> like sorts to use more than 2GB, but as others have pointed out, you can
> do that by putting pgsql_tmp on a memory filesystem and letting the
> sorts spill to the memory-based FS.

It would be nice if the tmpfs approach allowed us to control total temp memory usage, except that it causes a failure rather than spilling to real disk when it fills up.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +