Discussion: Help tuning a large table off disk and into RAM
I'm stuck trying to tune a big-ish postgres db and wondering if anyone has any pointers.

I cannot get Postgres to make good use of plenty of available RAM and stop thrashing the disks.

One main table. ~30 million rows, 20 columns, all integer, smallint or char(2). Most have an index. It's a table for holding webserver logs. The main table is all foreign key ids. Row size is ~100 bytes.

The typical query is an aggregate over a large number of rows (~25%, say):

  SELECT COUNT(*), COUNT(DISTINCT user_id)
  FROM table
  WHERE epoch > ...
    AND epoch < ...
    AND country = ...

The box has 4 x Opterons, 4 GB RAM & five 15k rpm disks, RAID 5. We wanted fast query/lookup. We know we can get fast disk IO.

Running a typical query like the above seems to:

 * hardly tax a single CPU
 * leave plenty of RAM free
 * thrash the disks about

The last is based mostly on the observation that another tiddly, unrelated mysql db which normally runs fast grinds to a halt when we're querying the postgres db (while cpu and memory appear to have spare capacity).

We've currently got these settings, and have tried doubling/halving them, restarting and benchmarking a test query. They don't appear to materially alter our query time.

  shared_buffers  = 128MB
  temp_buffers    = 160MB
  work_mem        = 200MB
  max_stack_depth = 7MB

We're less concerned about insert speed. Typically 1 or 2 users, but we want fast queries.

Perhaps a little extreme, but I'm trying to find a way to express this in a way that Postgres understands:

 * Load this table, and one or two indexes (epoch, user_id), into RAM.
 * All of the table, all of those indexes.
 * Keep them there, but keep a disk-based backup for integrity.
 * Run all selects against the in-RAM copy. Always.

In other words: I know we're hitting this table (and a couple of columns) lots and lots, so just get it into RAM and stop thrashing the disks.

Pointers welcome.
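A concrete, runnable sketch of the query shape described above, wrapped in EXPLAIN ANALYZE so the chosen plan (index scan vs. sequential scan) and actual row counts are visible. The table name "hits" and the literal values are placeholders only; the column names come from the post, with epoch assumed to be a Unix-epoch integer and country a char(2) code.

  -- Placeholder table name and values; column names as described in the post.
  EXPLAIN ANALYZE
  SELECT COUNT(*), COUNT(DISTINCT user_id)
  FROM hits
  WHERE epoch > 1190000000
    AND epoch < 1190600000
    AND country = 'GB';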
In response to "James Williams" <james.wlms@googlemail.com>: > I'm stuck trying to tune a big-ish postgres db and wondering if anyone > has any pointers. > > I cannot get Postgres to make good use of plenty of available RAM and > stop thrashing the disks. > > One main table. ~30 million rows, 20 columns all integer, smallint or > char(2). Most have an index. It's a table for holding webserver > logs. The main table is all foreign key ids. Row size is ~100bytes. > > The typical query is an aggregate over a large number of rows (~25% say). > > SELECT COUNT(*), COUNT(DISTINCT user_id) > FROM table > WHERE epoch > ... > AND epoch < ... > AND country = ... > > The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5. We > wanted fast query/lookup. We know we can get fast disk IO. > > Running a typical query like above seems to: > > * hardly tax a single CPU > * plenty of RAM free > * disks thrash about > > The last is based mostly on the observation that another tiddly > unrelated mysql db which normally runs fast, grinds to a halt when > we're querying the postgres db (and cpu, memory appear to have spare > capacity). > > We've currently got these settings, and have tried doubling/halving > them, restarted and benchmarked a test query. They don't appear to > materially alter our query time. > > shared_buffers = 128MB shared_buffers = 1.5GB Unless you've got a lot of stuff other than PostgreSQL on this machine. > temp_buffers = 160MB > work_mem = 200MB > max_stack_depth = 7MB These look reasonable, although I can't be sure without more details. > > We're less concerned about insert speed. Typically 1 or 2 users, but > want fast queries. > > Perhaps a little extreme, but I'm trying to find a way to express this > in a way that Postgres understands: > > * Load this table, and one or two indexes (epoch, user_id) into RAM. Give it enough shared_buffers and it will do that. You're estimating the size of your table @ 3G (try a pg_relation_size() on it to get an actual size) If you really want to get _all_ of it in all the time, you're probably going to need to add RAM to the machine. With 8G, you could allocate about 3G to shared_buffers, but that would be ignoring the size of indexes. However, I think you'll be surprised how much performance improves with 1.5G of shared_buffers. You may not need any more. 128M is really forcing PG to work within limited space. > * All of the table, all of those indexes. > * Keep them there, but keep a disk based backup for integrity. > * Run all selects against the in RAM copy. Always. This is what PG does if you allocate enough shared_buffers. -- Bill Moran http://www.potentialtech.com
James Williams wrote:

> The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5. We
> wanted fast query/lookup. We know we can get fast disk IO.

RAID 5 is usually advised against here. It's not particularly fast or safe, IIRC. Try searching the ML archives for RAID 5 ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //
Have you tried clustering tables based on the most-frequently used indexes to improve locality?

http://www.postgresql.org/docs/8.2/static/sql-cluster.html
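A sketch of what that might look like here, assuming the same hypothetical names as above and clustering on the epoch index so a date-range scan touches adjacent pages. Note that CLUSTER takes an exclusive lock and rewrites the table, and in 8.2 the ordering is not maintained as new rows arrive, so it would have to be re-run periodically.

  -- 8.2 syntax is CLUSTER indexname ON tablename; names here are placeholders.
  CLUSTER hits_epoch_idx ON hits;
  -- Re-analyze so the planner sees the new physical ordering (correlation).
  ANALYZE hits;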
Bill Moran <wmoran@potentialtech.com> writes:

> Give it enough shared_buffers and it will do that. You're estimating
> the size of your table at 3G (try pg_relation_size() on it to get an
> actual size). If you really want to keep _all_ of it in there, all the
> time, you're probably going to need to add RAM to the machine.

The table alone will barely fit in RAM, and he says he's got a boatload of indexes too; and apparently Postgres isn't the only thing running on the machine. He *definitely* has to buy more RAM if he wants it all to fit.

I wouldn't necessarily advise going to gigs of shared buffers; you'll be putting a lot of temptation on the kernel to swap parts of that out, and it does not sound at all like the workload will keep all of the buffers "hot" enough to prevent that.

            regards, tom lane
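A back-of-envelope check of the numbers behind that, using the row count and the ~100-byte row size estimate from the original post; per-row and per-page overhead, plus all the indexes, come on top of this figure.

  -- ~30 million rows x ~100 bytes/row, ignoring tuple/page overhead and indexes
  SELECT pg_size_pretty(30000000::bigint * 100);  -- about 2.8 GB, on a 4 GB machine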
On Wed, 26 Sep 2007, James Williams wrote:

> The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5. We
> wanted fast query/lookup. We know we can get fast disk IO.

You might want to benchmark to prove that, if you haven't already. You would not be the first person to presume you have fast disk I/O on RAID 5 only to discover that's not actually true when tested. http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm gives some details here.

> shared_buffers = 128MB
> temp_buffers = 160MB
> work_mem = 200MB
> max_stack_depth = 7MB

The one you're missing is effective_cache_size, and I'd expect you'd need to more than double shared_buffers to have that impact things, given what you've described of your tasks. Take a look at http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm to get a better idea of the right range for those two; 128MB for shared_buffers is way low for your system, something >1GB is probably right, and effective_cache_size should probably be in the multiple-GB range.

If you actually want to see what's inside the shared_buffers memory, take a look at the contrib/pg_buffercache module. Installing that for your database will let you see how the memory is being used, to get a better idea how much of your indexes are staying in that part of memory. The hint you already got from Bill Moran about using pg_relation_size() will give you some basis for figuring out what % of the index is being held there.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
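A sketch of that kind of inspection once contrib/pg_buffercache has been installed into the database (by running the pg_buffercache.sql script that ships with contrib); this is essentially the example query from the module's documentation, showing how many 8 kB buffers each relation in the current database occupies.

  -- Top consumers of shared_buffers in the current database, counted in 8 kB pages.
  SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
  JOIN pg_database d ON b.reldatabase = d.oid AND d.datname = current_database()
  GROUP BY c.relname
  ORDER BY buffers DESC
  LIMIT 10;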
On 9/26/07, James Williams <james.wlms@googlemail.com> wrote:

> The last is based mostly on the observation that another tiddly,
> unrelated mysql db which normally runs fast grinds to a halt when
> we're querying the postgres db (while cpu and memory appear to have
> spare capacity).

Just a quick observation here. When the drives are thrashing, is it straight db-storage thrashing, or is the system swapping out a lot? Run "vmstat 10" while running this query when this happens and pay attention to bi/bo (blocks in/out) and si/so (swap in/out).