Working on huge RAM based datasets
From | Andy Ballingall
---|---
Subject | Working on huge RAM based datasets
Date | |
Msg-id | 00ac01c46516$3ec859e0$0300a8c0@lappy
In reply to | Odd sorting behaviour ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Responses | Re: Working on huge RAM based datasets
List | pgsql-performance
Hi, I'm really stuck and I wonder if any of you could help.

I have an application which will be sitting on quite a large database (roughly 8-16GB). The nature of the application is such that, on a second-by-second basis, the working set of the database is likely to be a substantial portion (e.g. between 50 and 70%) of the data - just imagine an almost stochastic sampling of the data in each table, and you'll get the idea. Potentially quite smelly.

To start with, I thought: no problem. Just configure a DB server with an obscene amount of RAM (e.g. 64GB), and configure PG with a shared buffer cache big enough to hold every page of data in the database, plus 10% or whatever to allow a bit of room, ensuring that there is enough RAM in the box for all the backend processes to do their thing, for all the other services to do their thing, and for the swap system on the host to remain idle.

Apparently not :(

I've read in a number of places now that the PG cache has an optimal size which isn't "as big as you can make it without affecting other stuff on the machine". The suggestion is to let Linux take the strain for the lion's share of the caching (using its buffer cache), and just make the PG cache big enough to hold the data it needs for individual queries.

Ignoring for a moment the problem of answering the question 'so how big shall I make the PG cache?', and ignoring the possibility that as the database content changes over the months this answer will need updating from time to time for optimal performance, does anyone have any actual experience with trying to maintain a large, mainly RAM-resident database?

What is it about the buffer cache that makes it so unhappy holding everything? I don't want to be seen as a cache-hit fascist, but isn't it just better if the data is simply *there*, available in the postmaster's address space ready for each backend process to access, rather than expecting the Linux cache mechanism, optimised as it may be, to do the caching? Is it that the PG cache entries are accessed through a strategy that isn't particularly optimal for large numbers of tuples (optimal though it might be for more modest numbers)?

And on a more general note, with the advent of 64-bit addressing and rising RAM sizes, won't there, with time, be more and more DB applications that would want to capitalise on the potential speed improvements that come with not having to work hard to get the right bits into the right bit of memory all the time?

And finally, am I worrying too much, and is this problem actually common to all databases?

Thanks for reading,
Andy
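For concreteness, here is a minimal postgresql.conf sketch of the "modest shared_buffers, let the OS page cache carry the bulk of the data" approach described in the message above. All values are illustrative assumptions for a 64GB host with an 8-16GB database, not tuning recommendations, and the memory-unit syntax assumes PostgreSQL 8.2 or later (earlier releases take these settings as counts of 8kB pages).

    # Illustrative assumptions only: 64GB of RAM, an 8-16GB database, and a
    # PostgreSQL version that accepts memory units in postgresql.conf.

    # Keep PG's own buffer cache comparatively small and let the Linux page
    # cache hold the rest of the hot data.
    shared_buffers = 2GB

    # Planner hint only (no memory is allocated): roughly how much of the
    # database the OS page cache can be expected to keep resident.
    effective_cache_size = 48GB

    # Per-operation sort/hash memory for each backend; it is multiplied by
    # the number of concurrent operations, so keep it modest.
    work_mem = 64MB

    # The alternative described first in the message -- a shared buffer
    # cache larger than the entire database -- would instead look like:
    #   shared_buffers = 20GB

Note that effective_cache_size is only a hint to the planner, which is why the "let Linux take the strain" advice pairs a small shared_buffers with a large effective_cache_size rather than trying to pin the whole database in PG's own cache.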