Re: Working on huge RAM based datasets
From: Andy Ballingall
Subject: Re: Working on huge RAM based datasets
Date:
Msg-id: 011301c46597$15d145c0$0300a8c0@lappy
In reply to: Odd sorting behaviour ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
List: pgsql-performance
Thanks, Chris.

> > What is it about the buffer cache that makes it so unhappy being able to
> > hold everything? I don't want to be seen as a cache hit fascist, but isn't
> > it just better if the data is just *there*, available in the postmaster's
> > address space ready for each backend process to access it, rather than
> > expecting the Linux cache mechanism, optimised as it may be, to have to do
> > the caching?
>
> Because the PostgreSQL buffer management algorithms are pitiful compared
> to Linux's. In 7.5, it's improved with the new ARC algorithm, but still -
> the Linux disk buffer cache will be very fast.

I've had that reply elsewhere too. Initially, I was afraid that there was a
memory copy involved whenever the OS buffer cache supplied a block of data to
PG, but I've since learned a lot more about the Linux buffer cache, so it now
makes more sense to me why it's not a terrible thing to let the OS manage the
lion's share of the caching on a high-RAM system.

On another thread (not on this mailing list), someone mentioned that there is
a class of databases which, rather than caching bits of the database file (be
it in the OS buffer cache or the postmaster's workspace), construct a
well-indexed in-memory representation of the entire dataset in the postmaster's
workspace (or its equivalent). Because that representation stays resident, the
DB can service backend queries far more quickly than if the postmaster were
working on the assumption that most of the data was on disk (even if, in
practice, large amounts of it, or perhaps all of it, reside in the OS cache).

Though I'm no stranger to data management, I'm still on a steep learning curve
with databases in general and PG in particular, so I just wondered how big a
topic this is within the PG development group at the moment.

After all, we're now seeing the first wave of 'reasonably priced' 64-bit
servers supported by a proper 64-bit OS (e.g. Linux). HP are selling a
4-Opteron server which can take 256GB of RAM, starting at $10,000 (OK - they
don't give you that much RAM for that price - not yet, anyway!).

This is the future, isn't it? Each year, a higher percentage of DB
applications will be able to fit entirely in RAM, and that percentage is going
to be quite significant in just a few years. The disk system gets relegated to
preloading the data on startup and servicing the writes as the server does its
stuff.

Regards,
Andy
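
P.S. For anyone wondering what 'letting the OS do the lion's share of the
caching' might look like in practice, here's a minimal postgresql.conf sketch.
The parameter names are the ordinary 7.4-era ones, but the figures are purely
illustrative assumptions for a hypothetical server with 16GB of RAM, not
tested recommendations:

    # Keep PG's own buffer pool modest (units of 8KB pages): ~160MB here,
    # leaving the bulk of RAM to the Linux page cache.
    shared_buffers = 20000

    # Hint to the planner about how much data the OS page cache is likely
    # to hold (also in 8KB pages): ~12GB here.
    effective_cache_size = 1500000

    # Per-sort memory in KB; with RAM to spare, 64MB per sort is affordable.
    sort_mem = 65536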