Re: Hardware requirements for a PostGIS server

From: Bill Moran
Subject: Re: Hardware requirements for a PostGIS server
Msg-id: 20150210201550.e6042a8c5ba84d8215d7656b@potentialtech.com
In reply to: Hardware requirements for a PostGIS server (Mathieu Basille <basille.web@ase-research.org>)
List: pgsql-general
Responses in-line:

On Tue, 10 Feb 2015 19:52:41 -0500
Mathieu Basille <basille.web@ase-research.org> wrote:
>
> I am posting here a question that I initially asked on the PostGIS list
> [1], where I was advised to try here too (I will keep both lists updated
> about the developments on this issue).
>
> I am currently planning to set up a PostgreSQL + PostGIS instance for my
> lab. Turns out I believe this would be useful for the whole center, so that
> I'm now considering setting up the server for everyone, if interest is
> shared of course. At the moment, I am however struggling with what would be
> required in terms of hardware, and of course, the cost will depend on
> that; at the end of the day, it's really a matter of money well spent. I
> have then a series of questions/remarks, and I would welcome any feedback
> from people with existing experience on setting up a multi-user PostGIS
> server. I'm insisting on the PostGIS aspect, since the most heavy requests
> will be GIS requests (intersections, spatial queries, etc.). However,
> people with similar PostgreSQL setup may have very relevant comments about
> their own configuration.
>
> * My own experience about servers is rather limited: I used PostGIS quite a
> bit, but only on a desktop, with only 2 users. The desktop was quite good
> (quad-core Xeon, 12 GB RAM, 500 GB hd), running Debian, and we never had
> any performance issue (although some queries were rather long, but still
> acceptable).
>
> * The use case I'm envisioning would be (at least in the foreseeable future):
> - About 10 faculty users (which means potentially a little bit more
> students using it); I would have hard time considering more than 4
> concurrent users;
> - Data would primarily involve a lot (hundreds/thousands) of high
> resolution (spatial and temporal) raster and vector maps, possibly over
> large areas (Florida / USA / continental), as well as potentially millions
> of GPS records (animals individually monitored);
> - Queries will primarily involve retrieving points/maps over given
> areas/time, as well as intersecting points over environmental layers [from
> what I understand, a lot of I/O, with many intermediary tables involved];
> other use cases will involve working with steps, i.e. the straight line
> segment connecting two successive locations, and intersecting them with
> environmental layers;
>
> * I couldn't find comprehensive or detailed guidelines on-line about
> hardware, but from what I could see, it seems that memory wouldn't be the
> main issue, but the number of cores would be (one core per database
> connection if I'm not mistaken). At the same time, we want to make sure
> that the experience is smooth for everyone... I was advised on the PostGIS
> list to give a look at pgpool (however, UNIX only).

The number of cores helps with parallel processing, but 4 simultaneous users
doesn't necessarily mean 4 simultaneous queries. How much time do your users
spend running queries vs. idling? If you don't expect more than 4
concurrent users, I would think you'll be fine with a single quad-core
CPU. I would get the fastest CPU available, though, as it will make
number crunching go faster.
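
To put a rough number on the "running queries vs. idling" question, here is a
minimal sketch. It assumes each user is independently busy for some fraction
of the time (a simple binomial model); the function name `prob_at_least` and
the busy fractions are hypothetical placeholders, not measurements from any
real server.

```python
from math import comb


def prob_at_least(n_users: int, busy_fraction: float, k: int) -> float:
    """Probability that at least k of n_users run a query at the same moment.

    Assumes every user is independently busy for busy_fraction of the time.
    This is a back-of-envelope model, not a measurement.
    """
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return sum(
        comb(n_users, i) * busy_fraction**i * (1.0 - busy_fraction) ** (n_users - i)
        for i in range(k, n_users + 1)
    )


if __name__ == "__main__":
    # Hypothetical duty cycles; replace with figures observed on your own system.
    for busy in (0.1, 0.25, 0.5):
        p_two = prob_at_least(4, busy, 2)
        p_all = prob_at_least(4, busy, 4)
        print(f"busy {busy:.0%}: P(>=2 at once)={p_two:.3f}, P(all 4 at once)={p_all:.4f}")
```

If the chance of all 4 users querying at once comes out small under your own
numbers, that supports sizing for clock speed rather than core count.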

I can't see any reason why you'd want/need pgpool. pgpool is generally
useful when you have a LOT of simultaneous connections, and you're only
estimating 4. Additionally, pgpool is fairly easy to add on later if you
need it ... so my recommendation would be not to worry about it just yet.

> * Has anyone worked with a server running the DB engine, while the DB
> itself was stored on another box/server? That would likely be the case here
> since we already have a dedicated box for file storage. Along these lines,
> does the system of the file storage box matter (Linux vs. MS)?

Yes. If you have a lot of data that will need to be crunched, I would consider
getting SSDs directly attached to the computer running Postgres. Anything
you put between RAM and your disks that slows down transfers is going to
hurt performance. However, since you haven't made an estimate of the physical
size of the data, I can't comment on whether sufficient SSD storage is cost
effective or not.
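
A first-pass size estimate can be very simple. The sketch below multiplies
out raw, uncompressed sizes for rasters and GPS rows. Every number in it
(raster dimensions, band count, bytes per pixel, bytes per row, counts) is a
hypothetical placeholder, and it ignores indexes, compression, and per-row
storage overhead, so treat the result as an order of magnitude only.

```python
def raster_bytes(width_px: int, height_px: int, bands: int, bytes_per_pixel: int) -> int:
    """Raw, uncompressed size of one raster in bytes."""
    return width_px * height_px * bands * bytes_per_pixel


def table_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw size of a table, given an assumed average row width."""
    return rows * bytes_per_row


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    # Hypothetical inventory; substitute the real counts and dimensions.
    n_rasters = 1000
    one_raster = raster_bytes(width_px=10_000, height_px=10_000, bands=1, bytes_per_pixel=2)
    gps = table_bytes(rows=5_000_000, bytes_per_row=100)
    total = n_rasters * one_raster + gps
    print(f"rasters: {to_gib(n_rasters * one_raster):.1f} GiB")
    print(f"GPS rows: {to_gib(gps):.2f} GiB")
    print(f"total (raw): {to_gib(total):.1f} GiB")
```

Comparing that total against current SSD prices is enough to tell whether
all-SSD storage is in budget.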

If you can't get direct-attached storage (DAS), you can make up for some of the
performance hit by getting lots of RAM. Part of the effectiveness of the RAM is
dependent on the OS and its storage drivers, though, and I have no experience with
how well Windows does that ... and since you didn't mention which file
storage technology you're using, I can't comment on that either. SAN and
NAS storage vary wildly from brand to brand on their performance
characteristics, so it's difficult to say unless you can find someone who
has tried the exact hardware you're liable to be using. If performance is
important, I highly recommend DAS, and furthermore SSDs if you can afford
them.

> * We may also use the server as a workstation to streamline PostGIS
> processing with further R analyses/modeling (or even use R from within the
> database using PL/R). Again, does anyone have experience doing it? Is a
> single workstation the recommended way to work with such workflow? Or would
> it be better (but more costly) to have one server dedicated to PostGIS and
> another one, with different specs, dedicated to analyses (R)?

I know nothing about R. But the question isn't really dependent on R. Whether
it works will depend on how memory and CPU intensive the code you're running
in R is, and whether that's enough CPU/memory usage to interfere with what
Postgres needs to do its portion of the work. Usually, you'll get better
performance by running your non-Postgres processes on another machine, thus
increasing the total number of cores and amount of RAM available to the process,
but sometimes, when the transfer of data from the database to the other
code is the bottleneck, the opposite is true.
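
The trade-off can be sketched with a toy model. It assumes that sharing one
box slows both workloads by a single `contention_factor` (a hypothetical
parameter, 1.0 meaning no interference), and that a second box adds only a
best-case network transfer time. Both function names and all inputs are
illustrative assumptions; real behaviour depends on the actual workload.

```python
def transfer_seconds(payload_bytes: float, link_bits_per_second: float) -> float:
    """Best-case time to move payload_bytes over a link.

    Ignores latency and protocol overhead, so real transfers take longer.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    return payload_bytes * 8 / link_bits_per_second


def separate_machine_wins(
    db_seconds: float,
    analysis_seconds: float,
    contention_factor: float,
    payload_bytes: float,
    link_bits_per_second: float,
) -> bool:
    """Return True if running the analysis on a second machine is faster overall.

    Same box:   (db + analysis) * contention_factor
    Two boxes:  db + transfer + analysis
    contention_factor is an assumed slowdown (>= 1.0) from sharing CPU and RAM.
    """
    same_box = (db_seconds + analysis_seconds) * contention_factor
    two_boxes = (
        db_seconds
        + transfer_seconds(payload_bytes, link_bits_per_second)
        + analysis_seconds
    )
    return two_boxes < same_box
```

With heavy contention and a small result set the second machine wins; with a
very large result set and a slow link, the transfer dominates and the single
box wins.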

Sorry that I'm saying "it depends" so many times, but hopefully the details
on how it depends will help you make decisions, or at least tell you what
to investigate to decide.

--
Bill Moran

