Re: Low Budget Performance

From: Josh Berkus
Subject: Re: Low Budget Performance
Date:
Msg-id: web-1803601@davinci.ethosmedia.com
In reply to: Re: Low Budget Performance  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
Responses: Re: Low Budget Performance
List: pgsql-performance
Eric,

> > Currently, I've got postgres running on the same system as the app
> > server, accessing a single fast IDE drive. The database is on the
> > order of 1 gig, with two main tables accounting for 98% of the data.
> > Between the app servers and the database, I'm pretty sure that
> > neither of the main tables is cached in memory for any significant
> > time. I'm guessing that this is suboptimal. (The data size is 1 gig
> > now, but I will be adding more 1 gig databases to this system in the
> > near future.) I'm planning to split this into an app server and a
> > database server.

One gig is a large database for a single IDE drive -- especially with
multiple client connections.

> > In an ideal world, I'd throw a lot of 15k SCSI/RAID 0+1 at this.
> > But I don't have an ideal-world budget. I've got more of an IDE
> > world budget, if that. (~1k)

Well, no matter how many performance tricks you add in, the speed will
be limited by the hardware.   Make sure that your client/employer knows
that *before* they complain about the speed.

> > I know the first order of business is to ignore the hardware and
> > make sure that I've got all of the table scans found and turned into
> > indexes. I'm still working on that. Are there any tools that save
> > queries and the plans, then report on the ones that are the biggest
> > performance drags?

Not exactly.  If you enable the Postgres 7.2 statistics collector, you
can get a lot of information about which indexes are being used, which
are not, and which tables are getting a lot of sequential scans.  A
tool like you describe would be really, really useful -- in fact, if
anyone wrote one, I'm sure you could sell it for $$$$.
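
For example, assuming the statistics collector is switched on in
postgresql.conf (stats_start_collector = true, stats_row_level =
true), a couple of queries against the statistics views will flag the
problem tables.  A minimal sketch -- adapt it to your own schema:

    -- Tables ranked by sequential scans: the top entries are the
    -- first candidates for new indexes.
    SELECT relname, seq_scan, seq_tup_read, idx_scan
    FROM pg_stat_user_tables
    ORDER BY seq_scan DESC;

    -- Indexes that are never used: candidates for dropping.
    SELECT relname, indexrelname, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0;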

> > But since I do software, it's obviously a hardware problem. ;>

<grin>

Actually, your best option for the hardware is to test which part of
the hardware is bottlenecking your performance, and address that.  But
first:

> Well, IDE RAID looks like a nice option to me, but before finalising
> the RAID config, I would advise testing performance and scalability
> with a separate database server and a couple of gigs of RAM.

I'm not convinced that current IDE RAID actually improves database disk
throughput -- there's a lot of overhead in the one controller I tried
(Promise).  Does anyone have some statistics they can throw at me?

A cheaper and easier method, involving 3-4 disks:

Channel 1, Disk 1:  Operating System, Swap, and PostgreSQL log
Channel 1, Disk 2:  WAL Files
Channel 2, Disk 1:  Database
Channel 2, Disk 2 (optional):  2nd database data

*However*, if you have multiple databases being simultaneously
accessed, you will want to experiment with shuffling the databases and
WAL files around to put them on different disks.  The principle is to
divide the simultaneous disk tasks among as many disks as possible;
thus the WAL files always do better on a different disk and channel
than the database.  One way to relocate the WAL is sketched below.
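
As a sketch of that relocation, assuming a stock layout where $PGDATA
is your data directory and /disk2 is a mount point on the second
channel, you can move the WAL directory and leave a symlink behind:

    # with the postmaster stopped:
    mv $PGDATA/pg_xlog /disk2/pg_xlog
    ln -s /disk2/pg_xlog $PGDATA/pg_xlog
    # restart the postmaster; it follows the symlink transparently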

> > Does the write-ahead logging of PG mean that no matter what indexes
> > and data are changed, there will be one sync to disk?  Does this
> > reduce the penalty of indexes?

In a word: no.  Depending on the size of the update, there may be
multiple syncs.  And indexes do carry a significant penalty on large
updates; just try running 10,000 updates to an indexed column as one
transaction, and the penalty will be obvious.  In fact, for my data
load procedures, I tend to drop and re-create indexes.
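
A minimal sketch of that drop-and-recreate pattern -- the table,
index, and column names here are hypothetical:

    -- Drop the index, do the bulk work, then rebuild and re-analyze.
    DROP INDEX orders_status_idx;
    UPDATE orders SET status = 'archived'
        WHERE order_date < '2002-01-01';
    CREATE INDEX orders_status_idx ON orders (status);
    VACUUM ANALYZE orders;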

> WAL seems to mean that to get performance out of a drive array, I'd
> want to use the fastest (latency/throughput) logical single image I
> could get, not a collection of mirrored drives.

Mirrored drives are different from RAID.  However, you are correct
that the redundancy/fail-over factor in some RAID and mirroring setups
comes at a performance penalty.

But you need to determine where you are actually losing time.
Assuming that your tables are correctly indexed, your files are
distributed, your database is VACUUM FULL ANALYZEd, and your
postgresql.conf is configured for optimum use of your existing memory,
here's what you do (assuming that you use Linux):

1. From a workstation, open 2 terminal windows on the server.  In #1,
run "vmstat 3"; in the other, "top".

2. Have your users pound on the application, trying all of the most
complicated (and slow) operations in the app.  More users is better
for this test.

3. Watch vmstat and top.  What you're looking for is:

a) Is the processor at 75% or above?  If so, you either need a faster
processor or more efficient queries.

b) Is the system using 80-100% of the RAM you allocated to it?  If so,
add more RAM and increase the postgresql.conf memory variables
(sketched after this list).

c) Is the system using swap?  If so, either add more RAM, or
*decrease* the postgresql.conf memory variables.

d) Are RAM and processor at less than 50%, but disk I/O reaches a
maximum number and stays there for minutes?  Then your disk channel is
flooded, and you cannot improve performance except by improving your
queries so they pull fewer rows, or by adding more/faster disk
capacity.
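
For reference, in vmstat output the CPU numbers are the us/sy/id
columns, swap activity shows up under si/so, and disk blocks in/out
under bi/bo.  The "memory variables" above are mainly the following
postgresql.conf settings; the values here are illustrative only, not
recommendations -- tune them against what vmstat and top tell you:

    shared_buffers = 8192          # shared cache, in 8KB pages (8192 = 64MB)
    sort_mem = 8192                # per-sort memory, in KB, per backend
    vacuum_mem = 16384             # memory used by VACUUM, in KB
    effective_cache_size = 65536   # planner's guess at the OS cache, in 8KB pages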

The above process, while drawn-out, will help you avoid spending a lot
of money on, for example, RAM that won't make a difference.

-Josh Berkus




