Discussion: Help tuning a large table off disk and into RAM

Help tuning a large table off disk and into RAM

From: "James Williams"
Date:
I'm stuck trying to tune a big-ish postgres db and wondering if anyone
has any pointers.

I cannot get Postgres to make good use of plenty of available RAM and
stop thrashing the disks.

One main table. ~30 million rows, 20 columns, all integer, smallint or
char(2).  Most have an index.  It's a table for holding webserver
logs.  The main table is all foreign key ids.  Row size is ~100 bytes.

The typical query is an aggregate over a large number of rows (~25% say).

 SELECT COUNT(*), COUNT(DISTINCT user_id)
 FROM table
 WHERE epoch > ...
 AND epoch < ...
 AND country = ...
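
A plan check for a query of this shape would look something like the
sketch below (the table name and the literal values are placeholders
for illustration, not our real ones):

 -- EXPLAIN ANALYZE runs the query and reports actual row counts,
 -- timings, and whether the planner picked an index scan or a
 -- sequential scan over the whole table.
 EXPLAIN ANALYZE
 SELECT COUNT(*), COUNT(DISTINCT user_id)
 FROM weblog                    -- placeholder name for the main table
 WHERE epoch > 1190000000       -- placeholder range
   AND epoch < 1190600000
   AND country = 'UK';          -- placeholder value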

The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5.  We
wanted fast query/lookup.  We know we can get fast disk IO.

Running a typical query like the one above seems to:

* hardly tax a single CPU
* leave plenty of RAM free
* thrash the disks about

The last point is based mostly on the observation that another tiddly,
unrelated MySQL db, which normally runs fast, grinds to a halt when
we're querying the Postgres db (while CPU and memory appear to have
spare capacity).

We've currently got these settings, and have tried doubling/halving
them, restarted and benchmarked a test query.  They don't appear to
materially alter our query time.

 shared_buffers  = 128MB
 temp_buffers    = 160MB
 work_mem        = 200MB
 max_stack_depth = 7MB
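
A quick sanity check from psql that the server actually picked the
values up after each restart:

 SHOW shared_buffers;
 SHOW temp_buffers;
 SHOW work_mem;
 SHOW max_stack_depth;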

We're less concerned about insert speed.  Typically 1 or 2 users, but
want fast queries.

Perhaps a little extreme, but I'm trying to find a way to express this
in a way that Postgres understands:

* Load this table, and one or two indexes (epoch, user_id) into RAM.
* All of the table, all of those indexes.
* Keep them there, but keep a disk based backup for integrity.
* Run all selects against the in RAM copy.  Always.

In other words: I know we're hitting this table (and a couple of its
columns) lots and lots, so just get it into RAM and stop thrashing the
disks.

Pointers welcome.

Re: Help tuning a large table off disk and into RAM

From: Bill Moran
Date:
In response to "James Williams" <james.wlms@googlemail.com>:

> I'm stuck trying to tune a big-ish postgres db and wondering if anyone
> has any pointers.
>
> I cannot get Postgres to make good use of plenty of available RAM and
> stop thrashing the disks.
>
> One main table. ~30 million rows, 20 columns all integer, smallint or
> char(2).  Most have an index.  It's a table for holding webserver
> logs.  The main table is all foreign key ids.  Row size is ~100bytes.
>
> The typical query is an aggregate over a large number of rows (~25% say).
>
>  SELECT COUNT(*), COUNT(DISTINCT user_id)
>  FROM table
>  WHERE epoch > ...
>  AND epoch < ...
>  AND country = ...
>
> The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5.  We
> wanted fast query/lookup.  We know we can get fast disk IO.
>
> Running a typical query like above seems to:
>
> * hardly tax a single CPU
> * plenty of RAM free
> * disks thrash about
>
> The last is based mostly on the observation that another tiddly
> unrelated mysql db which normally runs fast, grinds to a halt when
> we're querying the postgres db (and cpu, memory appear to have spare
> capacity).
>
> We've currently got these settings, and have tried doubling/halving
> them, restarted and benchmarked a test query.  They don't appear to
> materially alter our query time.
>
>  shared_buffers  = 128MB

shared_buffers = 1.5GB

Unless you've got a lot of stuff other than PostgreSQL on this machine.

>  temp_buffers    = 160MB
>  work_mem        = 200MB
>  max_stack_depth = 7MB

These look reasonable, although I can't be sure without more details.

>
> We're less concerned about insert speed.  Typically 1 or 2 users, but
> want fast queries.
>
> Perhaps a little extreme, but I'm trying to find a way to express this
> in a way that Postgres understands:
>
> * Load this table, and one or two indexes (epoch, user_id) into RAM.

Give it enough shared_buffers and it will do that.  You're estimating
the size of your table @ 3G (try a pg_relation_size() on it to get an
actual size).  If you really want to get _all_ of it in RAM all the time,
you're probably going to need to add RAM to the machine.  With 8G, you
could allocate about 3G to shared_buffers, but that would be ignoring
the size of indexes.
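
For example (substituting the real table name for the "weblog"
placeholder below), the size functions report the actual on-disk sizes:

 -- heap only
 SELECT pg_size_pretty(pg_relation_size('weblog'));
 -- heap plus all of its indexes and toast data
 SELECT pg_size_pretty(pg_total_relation_size('weblog'));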

However, I think you'll be surprised how much performance improves
with 1.5G of shared_buffers.  You may not need any more.  128M is
really forcing PG to work within limited space.

> * All of the table, all of those indexes.
> * Keep them there, but keep a disk based backup for integrity.
> * Run all selects against the in RAM copy.  Always.

This is what PG does if you allocate enough shared_buffers.

--
Bill Moran
http://www.potentialtech.com

Re: Help tuning a large table off disk and into RAM

From: Alban Hertroys
Date:
James Williams wrote:
> The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5.  We
> wanted fast query/lookup.  We know we can get fast disk IO.

RAID 5 is usually advised against here. It's not particularly fast or
safe, IIRC. Try searching the ML archives for RAID 5 ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Help tuning a large table off disk and into RAM

From: "Jimmy Choi"
Date:
Have you tried clustering tables based on the most-frequently used
indexes to improve locality?

http://www.postgresql.org/docs/8.2/static/sql-cluster.html
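
For example, with the 8.2 syntax (the index and table names below are
placeholders):

 -- CLUSTER rewrites the table in index order, so range scans on that
 -- index touch far fewer pages.  It takes an exclusive lock and has
 -- to be re-run periodically as new rows arrive.
 CLUSTER weblog_epoch_idx ON weblog;
 ANALYZE weblog;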

Re: Help tuning a large table off disk and into RAM

From: Tom Lane
Date:
Bill Moran <wmoran@potentialtech.com> writes:
> Give it enough shared_buffers and it will do that.  You're estimating
> the size of your table @ 3G (try a pg_relation_size() on it to get an
> actual size)  If you really want to get _all_ of it in all the time,
> you're probably going to need to add RAM to the machine.

The table alone will barely fit in RAM, and he says he's got a boatload
of indexes too; and apparently Postgres isn't the only thing running on
the machine.  He *definitely* has to buy more RAM if he wants it all
to fit.  I wouldn't necessarily advise going to gigs of shared buffers;
you'll be putting a lot of temptation on the kernel to swap parts of
that out, and it does not sound at all like the workload will keep all
of the buffers "hot" enough to prevent that.

            regards, tom lane

Re: Help tuning a large table off disk and into RAM

From: Greg Smith
Date:
On Wed, 26 Sep 2007, James Williams wrote:

> The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5.  We
> wanted fast query/lookup.  We know we can get fast disk IO.

You might want to benchmark to prove that if you haven't already.  You
would not be the first person to presume you have fast disk I/O on RAID 5
only to discover that's not actually true when tested.
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm gives
some details here.

> shared_buffers  = 128MB
> temp_buffers    = 160MB
> work_mem        = 200MB
> max_stack_depth = 7MB

The one you're missing is effective_cache_size, and I'd expect you'd need
to more than double shared_buffers for the change to have an impact, given
what you've described of your tasks.  Take a look at
http://www.westnet.com/~gsmith/content/postgresql/pg-5minute.htm to get a
better idea of the right range for those two settings; 128MB for
shared_buffers is way low for your system, something >1GB is probably
right, and effective_cache_size should probably be in the multiple-GB
range.
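
As a rough illustration of that range (numbers to benchmark, not
gospel), the relevant postgresql.conf lines on a 4GB box mostly
dedicated to Postgres might look like:

 shared_buffers       = 1536MB  # needs a restart, and possibly a larger kernel SHMMAX
 effective_cache_size = 3GB     # planner hint only; no memory is actually allocated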

If you actually want to see what's inside the shared_buffers memory, take
a look at the contrib/pg_buffercache module.  Installing that for your
database will let you see how the memory is being used, to get a better
idea how much of your indexes are staying in that part of memory.  The
hint you already got from Bill Moran about using pg_relation_size() will
give you some basis for figuring out what % of the index is being held
there.
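
A sketch of the sort of query that module supports (essentially the
example from the contrib docs; run it in the database being examined):

 -- Which relations currently occupy the most shared_buffers pages?
 -- Multiply the buffer count by 8kB for an approximate size.
 SELECT c.relname, count(*) AS buffers
 FROM pg_buffercache b
      JOIN pg_class c ON b.relfilenode = c.relfilenode
          AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                                    WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY 2 DESC
 LIMIT 10;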

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Help tuning a large table off disk and into RAM

From: "Scott Marlowe"
Date:
On 9/26/07, James Williams <james.wlms@googlemail.com> wrote:

> The last is based mostly on the observation that another tiddly
> unrelated mysql db which normally runs fast, grinds to a halt when
> we're querying the postgres db (and cpu, memory appear to have spare
> capacity).

Just a quick observation here.

When the drives are thrashing, is it straight db-storage thrashing, or
is the system swapping out a lot?

Run vmstat 10 while running this query when this happens, and pay
attention to the bi/bo (blocks in/out) and si/so (swap in/out) columns.