Thread: Big Memory Boxes and pgtune
I've recently been blessed to move one of my databases onto a huge IBM P8 computer. It's a PowerPC architecture with 20 8-way cores (so Postgres SHOULD believe there are 160 cores available) and 1 TB of RAM.

I've always done my Postgres tuning with a copy of "pgtune", which says in its output:

  # WARNING
  # this tool not being optimal
  # for very high memory systems

So . . . what would I want to do differently based on the fact that I have a "very high memory system"? Suggestions welcome!

(There are several different databases, mostly related to our work in social media and malware analytics. The databases are smaller than available RAM. Around 80 million social media profiles with 700M or so links, growing by 10% a week or so. The malware database has extracted statistics and data about 17 million malware samples, growing by about 5% per week. The social media side has just shy of 100 'fetchers' that insert/update (but don't delete). A few human analysts interact with the data, hitting some pretty heavy queries as they do link analysis and natural language processing, but mostly the database use is the "fetchers".)

------------------------------------------------------------------------------
Gary Warner, Director
CETIFS
gar@uab.edu
Center for Emerging Technology Investigations Forensics & Security
The University of Alabama at Birmingham (UAB)
205.422.2113
------------------------------------------------------------------------------
On Fri, Oct 28, 2016 at 10:44 AM, Warner, Gary, Jr <gar@uab.edu> wrote:

> I've recently been blessed to move one of my databases onto a
> huge IBM P8 computer. It's a PowerPC architecture with 20 8-way
> cores (so postgres SHOULD believe there are 160 cores available)
> and 1 TB of RAM.

> So . . . what would I want to do differently based on the fact
> that I have a "very high memory system"?

What OS are you looking at?

The first advice I would give is to use a very recent version of both the OS and PostgreSQL. Such large machines are a recent enough phenomenon that older software is not likely to be optimized to perform well on them. For similar reasons, be sure to stay up to date with minor releases of both.

If the OS has support for them, you probably want to become familiar with these commands:

  numactl --hardware
  lscpu

You may want to benchmark different options, but I suspect that you will see better performance by putting each database on a separate cluster and using cpusets (or the equivalent) so that each cluster uses a subset of the 160 cores and the RAM directly attached to that subset.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
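[Editor's sketch, not from the thread: one way to act on the per-NUMA-node clustering idea above. The data directory paths, node numbers, and the two-cluster split are purely hypothetical assumptions for illustration.]

  # Inspect the NUMA topology first
  numactl --hardware
  lscpu

  # Hypothetical: one PostgreSQL cluster per NUMA node, binding both
  # CPU scheduling and memory allocation to that node
  numactl --cpunodebind=0 --membind=0 pg_ctl -D /srv/pg/social start
  numactl --cpunodebind=1 --membind=1 pg_ctl -D /srv/pg/malware start

Each cluster would then listen on its own port; whether this actually beats one large cluster is exactly the kind of thing the benchmarking advice above is about.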
On 10/28/2016 08:44 AM, Warner, Gary, Jr wrote:

> I've recently been blessed to move one of my databases onto a huge IBM P8 computer. It's a PowerPC architecture with 20 8-way cores (so postgres SHOULD believe there are 160 cores available) and 1 TB of RAM.
>
> I've always done my postgres tuning with a copy of "pgtune" which says in the output:
>
> # WARNING
> # this tool not being optimal
> # for very high memory systems
>
> So . . . what would I want to do differently based on the fact that I have a "very high memory system"?

The most obvious is that you are going to want to have (depending on PostgreSQL version):

* A very high shared_buffers (in newer releases it is not uncommon to have many, many GB).

* Use that work_mem, baby. You have 1 TB available? Take your average data set return and make work_mem at least that.

* IIRC (and this may be old advice), maintenance_work_mem up to 4GB. As I recall it won't effectively use more than that, but I could be wrong.

Lastly, but most importantly: test, test, test.

JD

--
Command Prompt, Inc.  http://the.postgres.company/  +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.
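[Editor's sketch, not from the thread: the parameters discussed above live in postgresql.conf. The numbers below are assumed starting values for a 1 TB machine, not recommendations made by anyone in the thread; note that work_mem applies per sort/hash node per backend, so it multiplies across the ~100 fetcher connections.]

  # postgresql.conf -- hypothetical starting values; benchmark before adopting
  shared_buffers = 64GB          # "many, many GB", per the first point above
  effective_cache_size = 768GB   # tell the planner how big the OS cache is
  work_mem = 256MB               # per sort/hash, per backend; sized so that
                                 # ~100 fetchers cannot exhaust RAM
  maintenance_work_mem = 4GB     # used by VACUUM, CREATE INDEX, etc.
  max_connections = 150          # ~100 fetchers plus analysts and headroom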
On 10/28/16 2:33 PM, Joshua D. Drake wrote:

> * A very high shared_buffers (in newer releases, it is not uncommon to
> have many, many GB of)

Keep in mind that you might get very poor results if shared_buffers is large, but not large enough to fit the entire database. In that case buffer replacement will be *extremely* expensive. Some operations will use a different buffer replacement strategy, so you might be OK if some of the database doesn't fit in shared buffers; that will depend a lot on your access patterns.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
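[Editor's sketch, not from the thread: one way to see how much of the working set actually sits in shared_buffers is the pg_buffercache contrib extension. The query below is a standard-style example, assuming the extension is available on this installation; each buffer is one 8 kB page with the default block size.]

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  -- Which relations occupy the most shared_buffers right now?
  SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
  GROUP BY c.relname
  ORDER BY buffers DESC
  LIMIT 10;

If the hot tables and indexes churn through buffers faster than they stay resident, that is the expensive-replacement situation described above.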
On Wed, Nov 2, 2016 at 5:46 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 10/28/16 2:33 PM, Joshua D. Drake wrote:
>>
>> * A very high shared_buffers (in newer releases, it is not uncommon to
>> have many, many GB of)
>
> Keep in mind that you might get very poor results if shared_buffers is
> large, but not large enough to fit the entire database. In that case buffer
> replacement will be *extremely* expensive. Some operations will use a
> different buffer replacement strategy, so you might be OK if some of the
> database doesn't fit in shared buffers; that will depend a lot on your
> access patterns.

This. Especially on machines with fast CPUs / memory and SSDs underneath, lots of buffers can sometimes just get in the way. The Linux kernel (and most other kernels) has hundreds, even thousands, of man-hours put into its file caching code, and it's often faster to let the kernel do that job with the extra memory.

Only a benchmark of a production-type load can tell you what to expect, and only production itself will reveal the absolute truth.

Where I used to work we had 5TB databases on machines with anywhere from 128GB to 512GB, and honestly the extra memory didn't make a huge difference. They had 8-disk RAID-5 arrays with controller caching turned off, because it got in the way, and cranking up shared_buffers didn't make them faster. I think we settled on something under 10GB on most of them.
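[Editor's sketch, not from the thread: a minimal way to run the "benchmark it" advice with pgbench. The database name, scale factor, and client counts are placeholders; a custom script replaying the actual fetcher workload would be far more representative than the built-in TPC-B-like test.]

  # Hypothetical pgbench run; "socialdb" and the numbers are placeholders
  pgbench -i -s 1000 socialdb            # initialize test tables (~15 GB)
  pgbench -c 100 -j 16 -T 600 socialdb   # 100 clients, 16 threads, 10 minutes
  # Repeat with different shared_buffers settings and compare TPS and latency.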