Discussion: Which hardware?

From:
"Lionel"
Date:

Hi,

I need to install an 8.3 database and was wondering which hardware would be
sufficient for good performance (less than 30s for the slowest select).

Database size: 25 GB/year, 5 years of history.
One main table containing 40 million rows per year.
Batch inserts of 100,000 rows. Very, very few deletes, few updates.

30 other tables, 4 levels of hierarchy, containing from 10 rows up to 20,000
rows.
5 of them have foreign keys on the main table.

I will use table partitioning on the year column.
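For illustration, in 8.3 partitioning is done with table inheritance plus constraint exclusion; a minimal sketch (table and column names are invented):

```sql
-- Parent table; the yearly child tables hold the actual rows.
CREATE TABLE sales (
    sale_id    bigint,
    year       integer NOT NULL,
    product_id integer,
    amount     numeric
);

-- One child per year, with a CHECK constraint the planner can use.
CREATE TABLE sales_2007 (CHECK (year = 2007)) INHERITS (sales);
CREATE TABLE sales_2008 (CHECK (year = 2008)) INHERITS (sales);

-- Let the planner skip partitions whose CHECK excludes the WHERE clause.
SET constraint_exclusion = on;

-- Batch loads can INSERT directly into the right child table.
```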

Statements will mainly do sums on the main table, grouped by whatever column
of the database (3-5 joined tables, or join on join), with criteria that may
vary, and lots of "joined varchar in ('a','b',...,'z')".
It's almost impossible to predict what users will do via the web application
that queries this database: almost all select, join, group by, where...
possibilities are available.
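For illustration, a typical statement of that shape might look like this (schema names invented):

```sql
-- Sum over the main table, grouped by a column from a joined dimension,
-- with an IN-list filter on a joined varchar.
SELECT c.category_name, SUM(s.amount)
FROM sales s
JOIN products p   ON p.product_id = s.product_id
JOIN categories c ON c.category_id = p.category_id
WHERE c.category_name IN ('a', 'b', 'z')
  AND s.year = 2008
GROUP BY c.category_name;
```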

Up to 4 simultaneous users.

I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2, and a dual
(RAID-1) SATA2 750 GB HD.
Perhaps with another HD for indexes.

Do you think it will be enough?
Is another RAID a minimum requirement for better performance?
Will a secondary HD for indexes help?

Which OS would you use? (knowing that there will be a JDK 1.6 installed
too)

With 5 million rows, the same application runs quite fast on Windows
2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly
when concurrent statements are made). Each statement consumes 100% of the
CPU.


Thanks for any advice.



From:
"Scott Marlowe"
Date:

On Tue, Jun 17, 2008 at 7:38 AM, Lionel <> wrote:
> Hi,
>
> I need to install an 8.3 database and was wondering which hardware would be
> sufficient for good performance (less than 30s for the slowest select).
>
> Database size: 25 GB/year, 5 years of history.
> One main table containing 40 million rows per year.
> Batch inserts of 100,000 rows. Very, very few deletes, few updates.
>
> 30 other tables, 4 levels of hierarchy, containing from 10 rows up to 20,000
> rows.
> 5 of them have foreign keys on the main table.
>
> I will use table partitioning on the year column.
>
> Statements will mainly do sums on the main table, grouped by whatever column
> of the database (3-5 joined tables, or join on join), with criteria that may
> vary, and lots of "joined varchar in ('a','b',...,'z')".
> It's almost impossible to predict what users will do via the web application
> that queries this database: almost all select, join, group by, where...
> possibilities are available.
>
> Up to 4 simultaneous users.
>
> I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2, and a dual
> (RAID-1) SATA2 750 GB HD.
> Perhaps with another HD for indexes.
>
> Do you think it will be enough?
> Is another RAID a minimum requirement for better performance?
> Will a secondary HD for indexes help?

More drives, all in the same RAID-10 setup.  For reporting like this,
write speed often isn't that critical, so you are often better off
with software RAID-10 than with a mediocre hardware RAID controller
(most Adaptecs, low-end LSIs, etc.).

You'd be surprised what going from a 2-disk RAID-1 to a 4-disk RAID-10
can do in these circumstances.  Going up to 6, 8, 10 or more disks
really makes a difference.

> Which OS would you use? (knowing that there will be a JDK 1.6 installed
> too)

I'd use RHEL5 because it's what I'm familiar with.  Any stable flavor
of Linux or FreeBSD 7 is a good performance choice if you know how to
drive it.

> With 5 million rows, the same application runs quite fast on Windows
> 2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly
> when concurrent statements are made). Each statement consumes 100% of the
> CPU.

That statement about concurrent statements REALLY sells me on the idea
of a many disk RAID10 here.  I'd take that over quad cores for what
you're doing any day.  Not that I'd turn down quad cores here either.
:)

From:
Andrew Sullivan
Date:

On Tue, Jun 17, 2008 at 03:38:59PM +0200, Lionel wrote:
> Hi,
>
> I need to install an 8.3 database and was wondering which hardware would be
> sufficient for good performance (less than 30s for the slowest select).

> Statements will mainly do sums on the main table, grouped by whatever column
> of the database (3-5 joined tables, or join on join), with criteria that may
> vary, and lots of "joined varchar in ('a','b',...,'z')".
> It's almost impossible to predict what users will do via the web application
> that queries this database: almost all select, join, group by, where...
> possibilities are available.

I'm not sure that I have any specific recommendation to make in the
face of such sweeping requirements.  But I'd say you need to make I/O
cheap, which means piles of memory and extremely fast disk
subsystems.

Also, there's another important question (which never gets asked in
these discussions), which is, "How much is the performance worth to
you?"  If the last 10% of users get something longer than 30s, but
less than 40s, and they will pay no more to get the extra 10s response
time, then it's worth nothing to you, and you shouldn't fix it.

> Up to 4 simultaneous users.

You won't need lots of processor, then.

> I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2, and a dual
> (RAID-1) SATA2 750 GB HD.
> Perhaps with another HD for indexes.

How big's the database?  If you can have enough memory to hold the
whole thing, including all indexes, in memory, that's what you want.
Apart from that, "dual SATA2" is probably underpowered.  But. . .

> Which OS would you use? (knowing that there will be a JDK 1.6 installed
> too)

. . .I think this is the real mistake.  Get a separate database box.
It's approximately impossible to tune a box correctly for both your
application and your database, in my experience.

A

--
Andrew Sullivan

+1 503 667 4564 x104
http://www.commandprompt.com/

From:
"Scott Marlowe"
Date:

On Tue, Jun 17, 2008 at 8:25 AM, Andrew Sullivan <> wrote:
> On Tue, Jun 17, 2008 at 03:38:59PM +0200, Lionel wrote:

>> Which OS would you use? (knowing that there will be a JDK 1.6 installed
>> too)
>
> . . .I think this is the real mistake.  Get a separate database box.
> It's approximately impossible to tune a box correctly for both your
> application and your database, in my experience.

Having had to install JVM 1.6 for a few weird admin bits we were using
in the past, I hadn't even thought of the same thing as you on this.  Good
catch.

From:
Matthew Wakeling
Date:

On Tue, 17 Jun 2008, Lionel wrote:
> I need to install an 8.3 database and was wondering which hardware would be
> sufficient for good performance (less than 30s for the slowest select).

> It's almost impossible to predict what users will do via the web application
> that queries this database: almost all select, join, group by, where...
> possibilities are available.

Well, Scott has given you some good pointers on how to make a fast system.
However, your original question ("is this fast enough") is impossible to
answer, especially if the users are allowed to run absolutely anything
they want. I bet you I could craft a query that takes more than 30 seconds
regardless of how fast you make your system.

Having said that, I'll add the suggestion that you should put as much RAM
in the box as possible. It can only help. As others have said, if you only
have four users, then CPU power isn't going to be such a problem, and
given that, I'd disagree with Andrew and say as long as you have plenty of
RAM, Java can play well with a database on the same box. Depends what it
is doing, of course.

Matthew

--
To be or not to be           -- Shakespeare
To do is to be               -- Nietzsche
To be is to do               -- Sartre
Do be do be do               -- Sinatra

From:
"Lionel"
Date:

Andrew Sullivan wrote:
> You won't need lots of processor, then.

I can't find anything less than a quad core in this price range...

> How big's the database?

With 20 million rows, the main table is 3.5 GB on Win XP,
with 8 GB of indexes.

I estimate the whole database at around 30 GB/year.

>  If you can have enough memory to hold the
> whole thing, including all indexes, in memory, that's what you want.
> Apart from that, "dual SATA2" is probably underpowered.  But. . .

RAID is twice as expensive.
(600 euros/month for a 5x750 GB SATA2 with 12 GB of RAM and an unnecessary
2x quad core)

I didn't find any affordable RAID-10 dedicated server.

If this setup is twice as fast, I can afford it. But if it's 30s vs
40s... I'm not sure my customer will pay.

>> Which OS would you use ? (knowing that there will be a JDK 1.6
>> installed too)
>
> . . .I think this is the real mistake.  Get a separate database box.
> It's approximately impossible to tune a box correctly for both your
> application and your database, in my experience.

My Tomcat webapp is well coded and consumes nearly nothing.
On such powerful hardware, I prefer to run both on the same server.
I could possibly run it on a different, much less powerful server, but
it's not on the same network; I guess this would be an issue.



From:
Greg Smith
Date:

On Tue, 17 Jun 2008, Andrew Sullivan wrote:

>> Which OS would you use? (knowing that there will be a JDK 1.6 installed
>> too)
>
> . . .I think this is the real mistake.  Get a separate database box.
> It's approximately impossible to tune a box correctly for both your
> application and your database, in my experience.

I can't remember the last time I deployed a PG box that didn't have a Java
app or three on it, too.  You've got even odds that putting it on a separate
system will even be an improvement.  Yes, if the Java app is a pig and the
machine doesn't have enough resources, separating it out to another system
will help.  But there are plenty of these buggers that will end up so much
slower from the additional network latency that it's a net loss (depends
on how the app groups its requests for rows).

If you know enough about Java to watch things like how much memory the
JVMs are taking up, I wouldn't hesitate to put them all on the same
machine.  Considering that Lionel's system seems pretty overpowered for
what he's doing--runs plenty fast on a much slower system, enough RAM to
hold a large portion of the primary tables and database, all batch updates
that don't really need a good RAID setup--I'd say "looks good" here and
recommend he just follow the plan he outlined.  Just watch the system with
top for a bit under load to make sure the Java processes are staying under
control.

As for the OS, RHEL5 or a clone like CentOS should work fine here; which is
more appropriate depends on your support requirements.  I would recommend
against using FreeBSD as it's not the friendliest Java platform, and the
additional complexity of Solaris seems like overkill for your app.
Basically, even though it's available for more of them, I only consider
deploying a Java app on one of the mainstream platforms listed at
http://www.java.com/en/download/manual.jsp right now because those are the
mature releases.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
"Scott Marlowe"
Date:

On Tue, Jun 17, 2008 at 9:32 AM, Greg Smith <> wrote:

>  Considering that Lionel's system seems pretty overpowered for what he's
> doing--runs plenty fast on a much slower system, enough RAM to hold a large
> portion of the primary tables and database, all batch updates that don't
> really need a good RAID setup--I'd say "looks good" here and recommend he
> just follow the plan he outlined.  Just watch the system with top for a bit
> under load to make sure the Java processes are staying under control.

In the original post he mentioned that he had 5 years of data at about
25G / year.

With 125G of data, it's likely that if most queries are on recent data
it'll be in RAM, but anything that hits older data will NOT have that
luxury.  Which is why I recommended RAID-10.  It doesn't have to be on
a $1200 card with 44 disks or something, but even 4 disks in a sw
RAID-10 will be noticeably faster (about 2x) than a simple RAID-1 at
hitting that old data.

We had a reporting server with about 80G of data on a machine with 4G of
RAM at the last place I worked, and it could take a few extra seconds to
hit the old data, but the SW RAID-10 on it made it much faster at
reporting than it would have been with a single disk.

From:
Andrew Sullivan
Date:

On Tue, Jun 17, 2008 at 04:49:17PM +0200, Lionel wrote:
> My Tomcat webapp is well coded and consumes nearly nothing.

If I were ever inclined to say, "Nonsense," about code I've never
seen, this is probably the occasion on which I'd do it.  A running JVM
is necessarily going to use some memory, and that is memory use that
you won't be able to factor out properly when developing models of
your database system performance.

> I could possibly run it on a different, much less powerful server, but
> it's not on the same network; I guess this would be an issue.

The power of the system is hard to know about in this context (with
only 8 GB of memory, I don't consider this a powerful box at all,
note).  But why wouldn't it be on the same network?  You're using the
network stack anyway, note: JVMs can't go over domain sockets.

A

--
Andrew Sullivan

+1 503 667 4564 x104
http://www.commandprompt.com/

From:
"Scott Marlowe"
Date:

On Tue, Jun 17, 2008 at 9:42 AM, Andrew Sullivan <> wrote:
> On Tue, Jun 17, 2008 at 04:49:17PM +0200, Lionel wrote:
>> My Tomcat webapp is well coded and consumes nearly nothing.
>
> If I were ever inclined to say, "Nonsense," about code I've never
> seen, this is probably the occasion on which I'd do it.  A running JVM
> is necessarily going to use some memory, and that is memory use that
> you won't be able to factor out properly when developing models of
> your database system performance.

But if that amount of memory is 256 megs and it only ever acts as a
control panel or data access point, it's probably not a huge issue.
If it's 2 gigs, it's another issue.  It's all about scale.  The real
performance hog for me on all-in-one boxes has been Perl/FastCGI
setups.

> The power of the system is hard to know about in this context (with
> only 8 GB of memory, I don't consider this a powerful box at all,
> note).

I always think of main memory in terms of how high a cache hit rate it
can get me.  If 8G gets you a 50% hit rate, and 16G gets you a 95% hit
rate, then 16G is the way to go.  But if 8G gets you to 75% and 32G
gets you to 79% because of your usage patterns (the world isn't always
bell curve shaped) then 8G is plenty and it's time to work on faster
disk subsystems if you need more performance.
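For what it's worth, a rough hit rate can be read from the statistics collector on a running instance (this underestimates the true rate, since reads satisfied by the OS cache still count as blks_read):

```sql
-- Buffer-cache hit ratio per database; blks_hit counts reads satisfied
-- from shared_buffers, blks_read counts those passed to the kernel/disk.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 1)
           AS hit_pct
FROM pg_stat_database
ORDER BY hit_pct;
```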

From:
Greg Smith
Date:

On Tue, 17 Jun 2008, Andrew Sullivan wrote:

> A running JVM is necessarily going to use some memory, and that is
> memory use that you won't be able to factor out properly when developing
> models of your database system performance.

Now you've wandered into pure FUD.  Tuning maximum memory usage on a Java
app so you can model it is straightforward (albeit a little confusing at
first), and in most cases you can just sample it periodically to get a
good enough estimate for database tuning purposes.  JVMs let you adjust
maximum memory use with -Xmx, and if anything the bigger problem I run
into is that using too much memory hits that limit and crashes Java long
before it becomes a hazard to the database.

This is a system with 8GB of RAM here; having some Tomcat instances
co-existing with the database when there's that much room to work is not
that hard.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Greg Smith
Date:

On Tue, 17 Jun 2008, Scott Marlowe wrote:

> We had a reporting server with about 80G of data on a machine with 4G of
> RAM at the last place I worked, and it could take a few extra seconds to
> hit the old data, but the SW RAID-10 on it made it much faster at
> reporting than it would have been with a single disk.

I agree with your statement above, that query time could likely be dropped
a few seconds with a better disk setup.  I just question whether that's
necessary given the performance target here.

Right now the app is running on an underpowered Windows box and is
returning results in around 10s, on a sample data set that sounds like 1/8
of a year's worth of data (1/40 of the total).  It is seemingly CPU bound,
with not enough processor to handle concurrent queries being the source of
the worst-case behavior.  The target is keeping that <30s on more powerful
hardware, with at least 6X as much processor power and a more efficient
OS, while using yearly partitions to keep the amount of data to juggle at
once under control.  That seems reasonable to me, and while better disks
would be nice I don't see any evidence they're really needed here.  This
application sounds like a batch processing/reporting one where plus or
minus a few seconds doesn't have a lot of business value.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
"Scott Marlowe"
Date:

On Tue, Jun 17, 2008 at 10:56 AM, Greg Smith <> wrote:
> On Tue, 17 Jun 2008, Scott Marlowe wrote:
>
>> We had a reporting server with about 80G of data on a machine with 4G of
>> RAM at the last place I worked, and it could take a few extra seconds to
>> hit the old data, but the SW RAID-10 on it made it much faster at
>> reporting than it would have been with a single disk.
>
> I agree with your statement above, that query time could likely be dropped a
> few seconds with a better disk setup.  I just question whether that's
> necessary given the performance target here.
>
> Right now the app is running on an underpowered Windows box and is returning
> results in around 10s, on a sample data set that sounds like 1/8 of a year's
> worth of data (1/40 of the total).  It is seemingly CPU bound, with not
> enough processor to handle concurrent queries being the source of the
> worst-case behavior.  The target is keeping that <30s on more powerful
> hardware, with at least 6X as much processor power and a more efficient OS,
> while using yearly partitions to keep the amount of data to juggle at once
> under control.  That seems reasonable to me, and while better disks would be
> nice I don't see any evidence they're really needed here.  This application
> sounds like a batch processing/reporting one where plus or minus a few
> seconds doesn't have a lot of business value.

I think you're making a big assumption that this is CPU bound.  And it
may be that when all the queries are operating on current data that it
is.  But as soon as a few ugly queries fire that need to read tens of
gigs of data off the drives, then you'll start to switch to I/O bound
and the system will slow a lot.

We had a single drive box doing work on an 80G set that was just fine
with the most recent bits.  Until I ran a report that ran across the
last year instead of the last two days, and took 2 hours to run.

All the queries that had run really quickly on all the recent data
suddenly were going from 1 or 2 seconds to 2 or 3 minutes.  And I'd
have to kill my reporting query.

Moved it to the same exact hardware but with a 4-disk RAID-10, and the
little queries stayed 1-2 seconds while the reporting queries were cut
down by factors of about 4 to 10.  RAID-1 will be somewhere between
them, I'd imagine.  RAID-10 has an amazing ability to handle parallel
accesses without falling over performance-wise.

You're absolutely right though, we really need to know the value of
fast performance here.

If you're monitoring industrial systems you need fast enough response
to spot problems before they escalate to disasters.

If you're running aggregations of numbers used for filling out
quarterly reports, not so much.

From:
"Lionel"
Date:

"Scott Marlowe" wrote:
> You're absolutely right though, we really need to know the value of
> fast performance here.

The main problem is that my customers are used to having their reporting
after a few seconds.
They want to have 10 times more data but still have the same speed, which
is, I think, quite impossible.

> If you're running aggregations of numbers used for filling out
> quarterly reports, not so much.

The application is used to analyse product sales behaviour, display charts,
perform comparisons, study progression...
10-40 seconds seems to be quite good performance.
More than 1 minute will be too slow (meaning they won't pay for that).

I did some tests with a 20 million row database on a single-disk dual-core
2GB Win XP system (default postgresql config); most of the time is spent in
I/O: 50-100 secs for statements that scan 6 million rows, which will
happen. Almost no CPU activity.

So here is the next question: a 4-disk RAID-10 (did not find a French web
host yet) or a 5-disk RAID-5 (found at 600 euros/month)?
I don't want to have any RAID issues...
I have not had any problems with my basic RAID-1 in many years, and I don't
want that to change.



From:
"Scott Marlowe"
Date:

On Tue, Jun 17, 2008 at 11:59 AM, Lionel <> wrote:
> "Scott Marlowe" wrote:
>> You're absolutely right though, we really need to know the value of
>> fast performance here.
>
> The main problem is that my customers are used to having their reporting
> after a few seconds.
> They want to have 10 times more data but still have the same speed, which
> is, I think, quite impossible.
>
>> If you're running aggregations of numbers used for filling out
>> quarterly reports, not so much.
>
> The application is used to analyse product sales behaviour, display charts,
> perform comparisons, study progression...
> 10-40 seconds seems to be quite good performance.
> More than 1 minute will be too slow (meaning they won't pay for that).
>
> I did some tests with a 20 million row database on a single-disk dual-core
> 2GB Win XP system (default postgresql config); most of the time is spent in
> I/O: 50-100 secs for statements that scan 6 million rows, which will
> happen. Almost no CPU activity.
>
> So here is the next question: a 4-disk RAID-10 (did not find a French web
> host yet) or a 5-disk RAID-5 (found at 600 euros/month)?
> I don't want to have any RAID issues...
> I have not had any problems with my basic RAID-1 in many years, and I don't
> want that to change.

Do you have root access on your servers?  Then just ask for 5 disks
with one holding the OS/apps, and you'll do the rest.  Software RAID
is probably a good fit for cheap right now.

If you can set it up yourself, you might be best off with a >2-disk
RAID-1.  5 750 GB disks in a RAID-1 yield 750 GB of storage (duh) but
allow five different readers to operate without the heads having to
seek.  Large amounts of data can be read at a medium speed from a
RAID-1 like this.  But most RAID implementations don't aggregate
bandwidth for RAID-1.

They do for RAID-0.  So, having a huge RAID-0 array allows for
reading a large chunk of data really fast from all disks at once.

RAID-1+0 gives you the ability to tune this in either direction.  But
the standard config of a 4-disk setup (striping two mirrors, each made
from two disks) is a good compromise to start with.  The average read
speed of the array is doubled, and the ability to have two reads not
conflict helps too.

RAID-5 is a compromise that provides the most storage while having
mediocre performance or, when degraded, horrifically poor performance.

Hard drives are cheap, hosting not as much.

Also, always look at optimizing the queries.  A lot of analysis is
done by brute-force queries that, rewritten intelligently, suddenly run
in minutes not hours, or seconds not minutes.

From:
Greg Smith
Date:

On Tue, 17 Jun 2008, Lionel wrote:

> I did some tests with a 20 million row database on a single-disk dual-core
> 2GB Win XP system (default postgresql config); most of the time is spent in
> I/O: 50-100 secs for statements that scan 6 million rows, which will
> happen. Almost no CPU activity.

I hope you're aware that the default config is awful, and there are all
sorts of possible causes for heavy I/O churn that might improve if you
set up the postgresql.conf file to use the server's resources more
aggressively (the default is set up for machines with a very small amount
of RAM).  There are lots of links to articles that cover the various areas
you might improve at
http://wiki.postgresql.org/wiki/Performance_Optimization
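As a rough illustration only, the sort of settings involved might look like this (the values are rules of thumb for a dedicated 8 GB box, not tested recommendations):

```
# postgresql.conf -- illustrative values for an 8 GB reporting server
shared_buffers = 2GB            # the 8.3 default is only 32MB
effective_cache_size = 6GB      # tells the planner how much the OS caches
work_mem = 64MB                 # per sort/hash; generous for few users
maintenance_work_mem = 512MB    # faster index builds and vacuums
checkpoint_segments = 32        # smoother bulk loads
```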

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Jon D
Date:


----------------------------------------
> From: 
> Subject: [PERFORM] Which hardware ?
> Date: Tue, 17 Jun 2008 15:38:59 +0200
> To: 
>
> Hi,
>
> I need to install an 8.3 database and was wondering which hardware would be
> sufficient for good performance (less than 30s for the slowest select).
>
> Database size: 25 GB/year, 5 years of history.
> One main table containing 40 million rows per year.
> Batch inserts of 100,000 rows. Very, very few deletes, few updates.
>
> 30 other tables, 4 levels of hierarchy, containing from 10 rows up to 20,000
> rows.
> 5 of them have foreign keys on the main table.
>
> I will use table partitioning on the year column.
>
> Statements will mainly do sums on the main table, grouped by whatever column
> of the database (3-5 joined tables, or join on join), with criteria that may
> vary, and lots of "joined varchar in ('a','b',...,'z')".
> It's almost impossible to predict what users will do via the web application
> that queries this database: almost all select, join, group by, where...
> possibilities are available.
>
> Up to 4 simultaneous users.
>
> I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2, and a dual
> (RAID-1) SATA2 750 GB HD.
> Perhaps with another HD for indexes.
>
> Do you think it will be enough?
> Is another RAID a minimum requirement for better performance?
> Will a secondary HD for indexes help?
>
> Which OS would you use? (knowing that there will be a JDK 1.6 installed
> too)
>
> With 5 million rows, the same application runs quite fast on Windows
> 2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly
> when concurrent statements are made). Each statement consumes 100% of the
> CPU.
>
> Thanks for any advice.
>
>
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


I think hardware isn't going to solve your problem, especially the CPU.  You
only have four users, and Postgres can only use 1 core per query.  If you
have sequential scans that span this table and say it has 60-80 million
rows, it could take longer than 30 seconds, even if you have a lot of RAM.
Just imagine what Postgres is doing: if its target search is going to end up
searching 40 million rows and it has to aggregate on two or three columns,
it's going to be slow.  No amount of hardware is going to fix this.  Sure,
you can gain some speed by having entire tables in RAM, but there is no
magic bullet here.  Disk is definitely not a magic bullet either: even a
bunch of fast disks is still much slower than RAM at performing reads, so if
you're read-heavy then adding more disk isn't going to just solve all your
problems.  RAM is nice; the more pages you can keep in RAM, the less reading
from the disk.

Even with all that said and done... aggregating lots of rows takes time.  I
suggest you come up with a system for pre-aggregating your data if possible.
Identify all of your target dimensions.  If you're lucky, you only have a
few key dimensions, which can reduce the size of the table by a lot and
reduce queries to 1-2 seconds.  There are a number of ways to tackle this,
but Postgres is a nice DB to do it with, since writers do not block readers.

I think you should focus on getting this system to work well with minimal
hardware first.  Then you can upgrade.  Over the next few years the DB is
only going to get larger.  You have 4 users now... but who's to say what it
will evolve into.

From:
"Lionel"
Date:

"Scott Marlowe" wrote:
> We had a reporting server with about 80G of data on a machine with 4G of
> RAM at the last place I worked, and it could take a few extra seconds to
> hit the old data, but the SW RAID-10 on it made it much faster at
> reporting than it would have been with a single disk.

Would this be a nice choice?

HP ProLiant DL320 G5p, Xeon DC 3 GHz, 8 GB RAM DDR2 ECC
- 4 x 146 GB SAS 15k rpm, RAID-10 HP Smart Array (128 MB cache)

I finally chose to have 2 data tables:
- one with pre-aggregated data (dividing size by 10), unpartitioned (= the
database they currently use)
- one with the original data, partitioned yearly
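A rough sketch of what such a pre-aggregated table could look like (all names hypothetical):

```sql
-- Summary table: one row per (year, product), roughly 1/10 the size.
CREATE TABLE sales_summary AS
SELECT year, product_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS nb_rows
FROM sales
GROUP BY year, product_id;

-- Queries touching only these dimensions hit the small table instead.
SELECT product_id, SUM(total_amount)
FROM sales_summary
WHERE year = 2008
GROUP BY product_id;
```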

I will choose before each statement which table to use, depending on
which select/joins/where/group by the user chose.
The pre-aggregated data will allow me to maintain current performance (and
even improve on it, the new hardware being twice as powerful).

I think row aggregation will be handled by the Java application (Excel/CSV
file loaded in memory),
which will be much faster than using a trigger on insertion into the full
table.

Thanks.