Обсуждение: where is my bottleneck?

Поиск
Список
Период
Сортировка

where is my bottleneck?

От
Ben
Дата:
Hello everybody. I'm trying to build a postgres server to act as the
repository for a bunch of data-crunching clients. Hardware-wise, the
machine is a dual p3 700 with 768MB of RAM and 3 striped 10k RPM u160
drives sitting on a DPT SmartRAID V.  Software-wise, I've installed linux
2.4.18 and postgres 7.2.

The database is structured such that there is a control table that every
client will perform indexed lookups from and the very occasional update
on, and this table is small enough that it easily fits in memory. All the
data is written to a second table and never accessed again.

It seems to me that such a machine should be able to handle an awful lot
of simple database work, but I'm disappointed in how little seems to be
getting through. Each client connection does ~1500 lookups in the control
table and another ~1500 inserts into the data table. After I add about 10
of these clients simultaneously, the cpu load (according to vmstat) goes
to about 30% idle. Interestingly, despite the fact that the disks are not
very busy, adding more clients does not make the processors work
harder.... they stay around 20-30% loaded even if I triple the load to 30
simultaneous clients.

I've checked all the sql commands I use, and all the lookups and updates
are using indices. The system isn't swapping, and posgres is using its
400MB of shared buffers to good effect. I have noticed that context
switches seem rediculously high - about 30,000/sec, give or take 10k - but
having nothing to compare it to I don't know if that's unreasonable for a
busy database machine or even what I could do about it if it wasn't.

I am using a much-derided 8139-based ethernet card, but seeing as how I'm
only transfering about 30KB/sec each way (and I know for a fact that even
*that* chipset can transmit well over 2MB/sec), I don't think that is the
blocking issue.

Does anybody have any suggestions as to how I can make this server work
closer to its limits?  It takes 10 clients about 3 minutes to run, meaning
that postgres is roughly maxing out at (5,000 indexed lookups + 5,000
inserts)/sec. It seems to me that my hardware should be able to support
more than that. Am I being unreasonable? I'm particularly interested as to
why adding more clients doesn't make postgres work harder, when the disks
are not being used very much and vmstat claims the cpus are not fully
utilized.




Re: where is my bottleneck?

От
Jeff Fitzmyers
Дата:
> getting through. Each client connection does ~1500 lookups in the
> control
> table and another ~1500 inserts into the data table.

Have you tried a timing 100,000 simple INSERTs / SELECTs / UPDATEs
locally? Then you would have an approximate number for raw capability.
Then add a client connection. Then multiple connections. Then build up
to the clients queries.

Interesting question, Jeff


Re: where is my bottleneck?

От
Ben
Дата:
The strace for one of the clients is a lot of data. However, it certainly
doesn't seem right that, out of the 33038 commands I captured in the
strace, 31037 of them were semop().

BTW, after watching stats a bit closer, the number of context switches is
"only" about 1000/sec with just one client attached; attaching more than
one client makes the number jump up to the 30000/sec mark.

On Sat, 2 Mar 2002, Helge Bahmann wrote:

> On Sat, 2 Mar 2002, Ben wrote:
> > I've checked all the sql commands I use, and all the lookups and updates
> > are using indices. The system isn't swapping, and posgres is using its
> > 400MB of shared buffers to good effect. I have noticed that context
> > switches seem rediculously high - about 30,000/sec, give or take 10k - but
>
> This is ridiculously high; can you attach "strace" to one of the postgres
> processes while stressing the system and send a snippet of the output?
> (strace -p <pid>)
>
> I suspect Postgres locking code may be responsible for this huge amount of
> ctx switches; if you see huge amount of pointless select() statements
> in the trace, this would prove my theory and it would be time to
> rethink the current locking strategy.
>
> Regards
> --
> Helge Bahmann <bahmann@math.tu-freiberg.de>             /| \__
> Network admin, systems programmer                      /_|____\
>                                                      _/\ |   __)
> $ ./configure                                        \\ \|__/__|
> checking whether build environment is sane... yes     \\/___/ |
> checking for AIX... no (we already did this)            |
>
>
>