Discussion: MIT benchmarks pgsql multicore (up to 48) performance
Hi,
for whom it may concern:
http://pdos.csail.mit.edu/mosbench/
They tested with 8.3.9, I wonder what results 9.0 would give.
Best regards and keep up the good work
Hakan
On Mon, Oct 4, 2010 at 10:44 AM, Hakan Kocaman <hkocam@googlemail.com> wrote:
> for whom it may concern:
> http://pdos.csail.mit.edu/mosbench/
> They tested with 8.3.9, i wonder what results 9.0 would give.
> Best regards and keep up the good work
> Hakan

Here's the most relevant bit to us:

--
The “Stock” line in Figures 7 and 8 shows that PostgreSQL has poor scalability on the stock kernel. The first bottleneck we encountered, which caused the read/write workload’s total throughput to peak at only 28 cores, was due to PostgreSQL’s design. PostgreSQL implements row- and table-level locks atop user-level mutexes; as a result, even a non-conflicting row- or table-level lock acquisition requires exclusively locking one of only 16 global mutexes. This leads to unnecessary contention for non-conflicting acquisitions of the same lock—as seen in the read/write workload—and to false contention between unrelated locks that hash to the same exclusive mutex. We address this problem by rewriting PostgreSQL’s row- and table-level lock manager and its mutexes to be lock-free in the uncontended case, and by increasing the number of mutexes from 16 to 1024.
--

I believe the "one of only 16 global mutexes" comment is referring to NUM_LOCK_PARTITIONS (there's also NUM_BUFFER_PARTITIONS, but that wouldn't be relevant for row and table-level locks). Increasing that from 16 to 1024 wouldn't be free and it's not clear to me that they've done anything to work around the downsides of such a change. Perhaps it's worthwhile anyway on a 48-core machine!

The use of lock-free techniques seems quite interesting; unfortunately, I know next to nothing about the topic and this paper doesn't provide much of an introduction. Anyone have a reference to a good introductory paper on the topic?

The other sort of interesting thing that they mention is that apparently I/O between shared buffers and the underlying data files causes a lot of kernel contention due to inode locks induced by lseek(). There's nothing much we can do about that within PG but surely it would be nice if it got fixed upstream.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
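To make the false-contention point concrete, here is a minimal C sketch of mapping lock hash codes onto a fixed number of partition mutexes. It is not PostgreSQL's code: `mix32`, `partition_for`, `colliding_pairs`, and `MAX_PARTITIONS` are hypothetical names, the hash is an arbitrary integer mixer, and the modulo mapping is an assumption about how a partition might be chosen. It only counts how many pairs of unrelated lock ids end up behind the same mutex.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARTITIONS 1024   /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for a lock tag's hash code; PostgreSQL computes
 * its own hash from the lock tag, which this sketch does not reproduce. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bU;
    x ^= x >> 13;
    x *= 0xc2b2ae35U;
    x ^= x >> 16;
    return x;
}

/* Assumed mapping: hash code modulo the number of partition mutexes. */
static uint32_t partition_for(uint32_t hashcode, uint32_t num_partitions)
{
    assert(num_partitions > 0);
    return hashcode % num_partitions;
}

/* Count pairs of distinct lock ids 0..n_locks-1 that share a partition.
 * Every such pair would serialize on the same mutex even though the two
 * locks are unrelated. */
static unsigned long colliding_pairs(uint32_t n_locks, uint32_t num_partitions)
{
    unsigned long per_partition[MAX_PARTITIONS] = {0};
    unsigned long pairs = 0;

    assert(num_partitions > 0 && num_partitions <= MAX_PARTITIONS);
    for (uint32_t id = 0; id < n_locks; id++)
        per_partition[partition_for(mix32(id), num_partitions)]++;
    for (uint32_t p = 0; p < num_partitions; p++)
        pairs += per_partition[p] * (per_partition[p] - 1) / 2;
    return pairs;
}
```

With 48 distinct ids and 16 partitions, the pigeonhole principle forces at least 48 colliding pairs. With 1024 partitions the count can only be equal or lower, because any two hash codes that agree modulo 1024 also agree modulo 16. The sketch says nothing about the cost of carrying more partitions, which is the downside raised above.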
On Oct 4, 2010, at 13:13 , Robert Haas wrote:

> On Mon, Oct 4, 2010 at 10:44 AM, Hakan Kocaman <hkocam@googlemail.com> wrote:
>> for whom it may concern:
>> http://pdos.csail.mit.edu/mosbench/
>> They tested with 8.3.9, i wonder what results 9.0 would give.
>> Best regards and keep up the good work
>> Hakan
>
> Here's the most relevant bit to us:

<snip/>

> The use of lock-free
> techniques seems quite interesting; unfortunately, I know next to
> nothing about the topic and this paper doesn't provide much of an
> introduction. Anyone have a reference to a good introductory paper on
> the topic?

The README in the postgres section of the git repo leads me to think the code that includes the fixes is there, if someone wants to look into it (wrt the Postgres lock manager changes). Didn't check the licensing.

Michael Glaesemann
grzm seespotcode net
On Mon, Oct 4, 2010 at 1:38 PM, Michael Glaesemann <grzm@seespotcode.net> wrote:
>
> On Oct 4, 2010, at 13:13 , Robert Haas wrote:
>
>> On Mon, Oct 4, 2010 at 10:44 AM, Hakan Kocaman <hkocam@googlemail.com> wrote:
>>> for whom it may concern:
>>> http://pdos.csail.mit.edu/mosbench/
>>> They tested with 8.3.9, i wonder what results 9.0 would give.
>>> Best regards and keep up the good work
>>> Hakan
>>
>> Here's the most relevant bit to us:
>
> <snip/>
>
>> The use of lock-free
>> techniques seems quite interesting; unfortunately, I know next to
>> nothing about the topic and this paper doesn't provide much of an
>> introduction. Anyone have a reference to a good introductory paper on
>> the topic?
>
> The README in the postgres section of the git repo leads me to think the code that includes the fixes is there, if someone wants to look into it (wrt the Postgres lock manager changes). Didn't check the licensing.

It does, but it's a bunch of x86-specific hacks that break various important features and include comments like "use usual technique for lock-free thingamabob". So even if the licensing is/were suitable, the code's not usable. I think the paper is neat from the point of view of providing us with some information about where the scalability bottlenecks might be on hardware to which most of us don't have easy access, but as far as the implementation goes I think we're on our own.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
I wasn't involved in this work but I do know a bit about it. Sadly, the work on Postgres performance was cut down to under a page, complete with the amazing offhand mention of "rewriting PostgreSQL's lock manager". Here are a few more details...

The benchmarks in this paper are all about stressing the kernel. The database is entirely in memory -- it's stored on tmpfs rather than on disk, and it fits within shared_buffers. The workload consists of index lookups and inserts on a single table. You can fill in all the caveats about what conclusions can and cannot be drawn from this workload.

The big takeaway for -hackers, I think, is that lock manager performance is going to be an issue for large multicore systems, and the uncontended cases need to be lock-free. That includes cases where multiple threads are trying to acquire the same lock in compatible modes. Currently even acquiring a shared heavyweight lock requires taking out an exclusive LWLock on the partition, and acquiring shared LWLocks requires acquiring a spinlock. All of this gets more expensive on multicores, where even acquiring spinlocks can take longer than the work being done in the critical section.

Their modifications to Postgres should be available in the code that was published last night. As I understand it, the approach is to implement LWLocks with atomic operations on a counter that contains both the exclusive and shared lock count. Heavyweight locks do something similar but with counters for each lock mode packed into a word.

Note that their implementation of the lock manager omits some features for simplicity, like deadlock detection, 2PC, and probably any semblance of portability. (These are the sort of things we're allowed to do in the research world! :-)

The other major bottleneck they ran into was a kernel one: reading from the heap file requires a couple lseek operations, and Linux acquires a mutex on the inode to do that. The proper place to fix this is certainly in the kernel but it may be possible to work around in Postgres.

Dan

--
Dan R. K. Ports
MIT CSAIL
http://drkp.net/
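One possible shape for such a workaround, offered here only as an assumption and not something proposed in this thread, is to replace the lseek-then-read pair with a single positional read via POSIX `pread()`. The sketch below contrasts the two; `read_block_seek`, `read_block_pread`, and `demo_roundtrip` are hypothetical names, and the scratch file path is illustrative. Whether this avoids the contention measured in the paper depends on the kernel and is not established by anything above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Two-step read: move the descriptor's file offset, then read from it. */
static ssize_t read_block_seek(int fd, void *buf, size_t len, off_t offset)
{
    assert(offset >= 0);
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;
    return read(fd, buf, len);
}

/* Positional read: the offset travels with the request and the
 * descriptor's file offset is left untouched. */
static ssize_t read_block_pread(int fd, void *buf, size_t len, off_t offset)
{
    assert(offset >= 0);
    return pread(fd, buf, len, offset);
}

/* Write a small scratch file, fetch the same range both ways, and check
 * that pread() did not move the file offset. Returns 0 on success. */
static int demo_roundtrip(void)
{
    char path[] = "/tmp/pread_demo_XXXXXX";   /* hypothetical scratch file */
    const char data[] = "0123456789abcdef";
    char a[4], b[4];
    int ok;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                             /* removed once fd is closed */

    ok = write(fd, data, sizeof data) == (ssize_t) sizeof data
        && read_block_seek(fd, a, sizeof a, 10) == 4    /* offset is now 14 */
        && read_block_pread(fd, b, sizeof b, 10) == 4
        && lseek(fd, 0, SEEK_CUR) == 14                 /* still 14 */
        && memcmp(a, "abcd", 4) == 0
        && memcmp(b, "abcd", 4) == 0;

    close(fd);
    return ok ? 0 : -1;
}
```

Both helpers return the same bytes; the difference is that the positional variant issues one system call and never touches the shared file offset.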
Here's a video on lock-free hashing for example:

http://video.google.com/videoplay?docid=2139967204534450862#

I guess by "lock-free in the uncontended case" they mean the buffer cache manager is lock-free unless you're actually contending on the same buffer?
On Mon, Oct 04, 2010 at 01:13:36PM -0400, Robert Haas wrote:
> I believe the "one of only 16 global mutexes" comment is referring to
> NUM_LOCK_PARTITIONS (there's also NUM_BUFFER_PARTITIONS, but that
> wouldn't be relevant for row and table-level locks).

Yes -- my understanding is that they hit two lock-related problems:
 1) LWLock contention caused by acquiring the same lock in compatible modes (e.g. multiple shared locks)
 2) false contention caused by acquiring two locks that hashed to the same partition
and the first was the worse problem. The lock-free structures helped with both, so the impact of changing NUM_LOCK_PARTITIONS was less interesting.

Dan

--
Dan R. K. Ports
MIT CSAIL
http://drkp.net/
On Oct 4, 2010, at 11:06, Greg Stark <gsstark@mit.edu> wrote:
> I guess by "lock-free in the uncontended case" they mean the buffer
> cache manager is lock-free unless you're actually contending on the
> same buffer?

That refers to being able to acquire non-conflicting row/table locks without needing an exclusive LWLock, and acquiring shared LWLocks without spinlocks if possible.

I think the buffer cache manager is the next bottleneck after the row/table lock manager. Seems like it would also be a good candidate for similar techniques, but that's totally uninformed speculation on my part.

Dan
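As a rough illustration of acquiring a shared lock without a spinlock, here is a minimal C11 sketch of a lock word that holds both the exclusive flag and the shared-holder count, updated with compare-and-swap. It is not the MOSBENCH patch and not PostgreSQL's LWLock code: the type `sketch_lwlock`, the bit layout, and every function name are assumptions made for this example. There is no wait queue, so a failed attempt simply returns false where a real lock would fall back to a slow path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: top bit = exclusive holder, low 31 bits = shared count. */
#define EXCLUSIVE_BIT 0x80000000u
#define SHARED_ONE    1u

typedef struct { _Atomic uint32_t state; } sketch_lwlock;

static void sketch_init(sketch_lwlock *l)
{
    atomic_init(&l->state, 0);
}

/* Shared acquire: succeeds whenever no exclusive holder exists, even if
 * other shared holders are present. No spinlock is taken. */
static bool sketch_try_shared(sketch_lwlock *l)
{
    uint32_t old = atomic_load_explicit(&l->state, memory_order_relaxed);

    while ((old & EXCLUSIVE_BIT) == 0) {
        if (atomic_compare_exchange_weak_explicit(&l->state, &old,
                                                  old + SHARED_ONE,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* A failed CAS refreshed 'old'; the loop re-checks the flag. */
    }
    return false;   /* conflicting holder: a real lock would queue here */
}

/* Exclusive acquire: only succeeds when the word shows no holders. */
static bool sketch_try_exclusive(sketch_lwlock *l)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong_explicit(&l->state, &expected,
                                                   EXCLUSIVE_BIT,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void sketch_release_shared(sketch_lwlock *l)
{
    uint32_t prev = atomic_fetch_sub_explicit(&l->state, SHARED_ONE,
                                              memory_order_release);
    assert((prev & ~EXCLUSIVE_BIT) > 0);   /* must have held a shared lock */
    (void) prev;
}

static void sketch_release_exclusive(sketch_lwlock *l)
{
    uint32_t prev = atomic_fetch_and_explicit(&l->state, ~EXCLUSIVE_BIT,
                                              memory_order_release);
    assert((prev & EXCLUSIVE_BIT) != 0);   /* must have held it exclusively */
    (void) prev;
}
```

Two shared acquisitions of the same lock each complete with one atomic update and never block each other, which is the compatible-mode case discussed in this thread. The sketch ignores overflow of the shared count and omits everything a production lock needs for waiting and wakeup.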
On Mon, Oct 4, 2010 at 8:44 AM, Hakan Kocaman <hkocam@googlemail.com> wrote:
> Hi,
> for whom it may concern:
> http://pdos.csail.mit.edu/mosbench/
> They tested with 8.3.9, i wonder what results 9.0 would give.
> Best regards and keep up the good work

They mention that these tests were run on the older 8xxx series Opterons, which have much slower memory speed and HT speed as well. I wonder how much better the newer 6xxx series Magny-Cours would have done on it...

When I tested some simple benchmarks like pgbench, I got scalability right to 48 processes on our 48-core Magny-Cours machines.

Still, lots of room for improvement in kernel and pgsql.

--
To understand recursion, one must first understand recursion.