Discussion: getting to beta
A quick review of the open items list suggests that we have three main areas that need attention before we can declare ourselves ready for beta.

In no particular order:

1. There are a bunch of small, outstanding SSI patches.
2. Bugs - plural - related to pg_upgrade & typed tables.
3. Assorted collation issues.

There are a couple of smaller items, too, but those are the big ones.

Per previous discussion, the viable dates for code freeze for beta1 appear to be April 14th and April 28th. If we want to hit the earlier of those dates, which in my opinion would be a great goal to have, then we need to get all of the above issues resolved in the next 8 days, and I think we're going to need to kick it up a notch if we want that to happen.

Most urgently, I believe we need a bit more committer bandwidth. I believe that I could tackle either the SSI patches or the pg_upgrade & typed tables issue, or I could try to make a dent in the collation stuff, but I don't think I can cover two of those areas and I definitely can't cover all three. Especially in the area of SSI, and to some extent as regards typed tables, the patches are written, but we have to get them reviewed and committed. Is anyone available to help with this?

There are also a few issues where we need a patch and don't have one. In those cases the patches could be written by either a committer or a non-committer, but we need to make sure we know who is doing it so that everything gets covered. In particular:

- SSI needs patch for the issue "SSI: three different HTABs contend for shared memory in a free-for-all"
- typed tables needs a patch to allow an existing table to be made into a typed table, and pg_dump --binary-upgrade needs to be made to use that feature
- the open collation issues all lack any associated code (but maybe Tom is planning to do this himself?)

The other minor issues are:

- do latches have memory ordering problems? I think the consensus is that they work OK the way we're using them right now, so maybe we can just drop this item, unless someone wants to pontificate further on it.
- sync rep & smart shutdown - someone needs to review & apply Fujii Masao's proposed patch
- generate_series boundary issue - I think this isn't a new regression so it's probably not a blocker for beta1, but we might still want to try to fix it. I seem to remember thinking that the prototype patch looked like it needed pretty significant cleanup, but I haven't looked at it in a while so I might be all wet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > ... Most urgently, I believe we need a bit more committer bandwidth. I > believe that I could tackle either the SSI patches or the pg_upgrade & > typed tables issue, or I could try to make a dent in the collation > stuff, but I don't think I can cover two of those areas and I > definitely can't cover all three. I intend to return to the collations issues as soon as I've knocked off the GUC assign-hooks patch. That's taking longer than I thought (there are a *lot* of assign hooks) but I think I'll be able to finish it today or tomorrow. I have yet to read any of the SSI code, so I can't offer much help in that area. > The other minor issues are: > - do latches have memory ordering problems? I think the consensus is > that they work OK the way we're using them right now, so maybe we can > just drop this item, unless someone wants to pontificate further on > it. I think this can be left as an open issue for now, to remind us that some harder stress-testing on affected platforms would be a good thing. > - generate_series boundary issue - I think this isn't a new regression > so it's probably not a blocker for beta1, but we might still want to > try to fix it. Again, there's no reason that can't stay on the open items list past beta1. We may or may not choose to fix it for 9.1, but it's not a beta blocker. regards, tom lane
On Wed, Apr 6, 2011 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> ... Most urgently, I believe we need a bit more committer bandwidth. I >> believe that I could tackle either the SSI patches or the pg_upgrade & >> typed tables issue, or I could try to make a dent in the collation >> stuff, but I don't think I can cover two of those areas and I >> definitely can't cover all three. > > I intend to return to the collations issues as soon as I've knocked off > the GUC assign-hooks patch. That's taking longer than I thought (there > are a *lot* of assign hooks) but I think I'll be able to finish it today > or tomorrow. I have yet to read any of the SSI code, so I can't offer > much help in that area. > >> The other minor issues are: > >> - do latches have memory ordering problems? I think the consensus is >> that they work OK the way we're using them right now, so maybe we can >> just drop this item, unless someone wants to pontificate further on >> it. > > I think this can be left as an open issue for now, to remind us that > some harder stress-testing on affected platforms would be a good thing. OK, fair enough. >> - generate_series boundary issue - I think this isn't a new regression >> so it's probably not a blocker for beta1, but we might still want to >> try to fix it. > > Again, there's no reason that can't stay on the open items list past > beta1. We may or may not choose to fix it for 9.1, but it's not a beta > blocker. I agree. But again, that's not really what I'm focusing on - the collations stuff, the typed tables patch, and SSI all need serious looking at, and I'm not sure who is going to pick all that up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I agree. But again, that's not really what I'm focusing on - the > collations stuff, the typed tables patch, and SSI all need serious > looking at, and I'm not sure who is going to pick all that up. Well, I'll take responsibility for collations. If I get done with that before the 14th, I can see what's up with typed tables. I'm not willing to do anything with SSI at this stage. regards, tom lane
On 06.04.2011 18:02, Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> I agree. But again, that's not really what I'm focusing on - the >> collations stuff, the typed tables patch, and SSI all need serious >> looking at, and I'm not sure who is going to pick all that up. > > Well, I'll take responsibility for collations. If I get done with that > before the 14th, I can see what's up with typed tables. I'm not willing > to do anything with SSI at this stage. I can look at the SSI patches, but not until next week, I'm afraid. Robert, would you like to pick that up before then? Kevin & Dan have done all the heavy lifting, but it's nevertheless pretty complicated code to review. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Apr 6, 2011 at 12:06 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 06.04.2011 18:02, Tom Lane wrote: >>> I agree. But again, that's not really what I'm focusing on - the >>> collations stuff, the typed tables patch, and SSI all need serious >>> looking at, and I'm not sure who is going to pick all that up. >> >> Well, I'll take responsibility for collations. If I get done with that >> before the 14th, I can see what's up with typed tables. I'm not willing >> to do anything with SSI at this stage. > > I can look at the SSI patches, but not until next week, I'm afraid. Robert, > would you like to pick that up before then? Kevin & Dan have done all the > heavy lifting, but it's nevertheless pretty complicated code to review. I'll try, and see how far I get with it. If you can pick up whatever I don't get to by early next week, that would be a big help. I am going to be in Santa Clara next week for the MySQL conference (don't worry, I'll be talking about PostgreSQL!) and that's going to cut into my time quite a bit. The one I'm most worried about is "SSI: three different HTABs contend for shared memory in a free-for-all" - because there's no patch for that yet, and I am wary of breaking something mucking around with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> I can look at the SSI patches, but not until next week, I'm >> afraid. Robert, would you like to pick that up before then? Kevin >> & Dan have done all the heavy lifting, but it's nevertheless >> pretty complicated code to review. > > I'll try, and see how far I get with it. If you can pick up > whatever I don't get to by early next week, that would be a big > help. I am going to be in Santa Clara next week for the MySQL > conference (don't worry, I'll be talking about PostgreSQL!) and > that's going to cut into my time quite a bit. The one I'm most > worried about is "SSI: three different HTABs contend for shared > memory in a free-for-all" - because there's no patch for that yet, > and I am wary of breaking something mucking around with it. I haven't seen any objection to Heikki's suggestion for how to handle the shared memory free-for-all: http://archives.postgresql.org/message-id/4D94C889.3050607@enterprisedb.com Either Dan or I will put something together along those lines before next week. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Robert Haas <robertmhaas@gmail.com> wrote: >> ... The one I'm most >> worried about is "SSI: three different HTABs contend for shared >> memory in a free-for-all" - because there's no patch for that yet, >> and I am wary of breaking something mucking around with it. > I haven't seen any objection to Heikki's suggestion for how to > handle the shared memory free-for-all: I confess to not having been reading the discussions about SSI very much, but ... do we actually care whether there's a free-for-all? What's the downside to letting the remaining shmem get claimed by whichever table uses it first? regards, tom lane
On 06.04.2011 17:46, Tom Lane wrote:
> "Kevin Grittner"<Kevin.Grittner@wicourts.gov> writes:
>> Robert Haas<robertmhaas@gmail.com> wrote:
>>> ... The one I'm most
>>> worried about is "SSI: three different HTABs contend for shared
>>> memory in a free-for-all" - because there's no patch for that yet,
>>> and I am wary of breaking something mucking around with it.
>
>> I haven't seen any objection to Heikki's suggestion for how to
>> handle the shared memory free-for-all:
>
> I confess to not having been reading the discussions about SSI very
> much, but ... do we actually care whether there's a free-for-all?
> What's the downside to letting the remaining shmem get claimed by
> whichever table uses it first?

It leads to odd behavior. You start the database, and your application runs fine. Then you restart the database, and now you get "out of shared memory" errors from transactions that used to work.

It's not the end of the world, but I'd prefer stable, repeatable behavior, even though having the slack shared memory be grabbed by whoever needs it first might in theory lead to better utilization of resources.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
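The behavior Heikki describes can be illustrated with a toy model. This is only a sketch in Python, not PostgreSQL code: the class names, the table names `locks` and `predicate_locks`, and all sizes are hypothetical, and the model assumes only what the thread states, namely that tables grow into a common slack area on demand and never hand that memory back.

```python
class SharedPool:
    """Toy stand-in for the slack shared memory (hypothetical, sizes in entries)."""

    def __init__(self, slack_entries):
        self.slack_entries = slack_entries


class HashTable:
    """Toy hash table that grows into the shared slack and never returns it."""

    def __init__(self, name, pool, preallocated):
        self.name = name
        self.pool = pool
        self.capacity = preallocated  # entries this table owns for good
        self.used = 0

    def insert(self):
        if self.used == self.capacity:
            if self.pool.slack_entries == 0:
                raise MemoryError(f"out of shared memory (raised by {self.name})")
            # First come, first served: claim one entry of slack permanently.
            self.pool.slack_entries -= 1
            self.capacity += 1
        self.used += 1

    def release_all(self):
        # Entries go back to the table's own free list, not to the pool.
        self.used = 0


def run(order):
    """Give each table a burst of 4 inserts, in the given order.

    Returns the names of the tables that hit "out of shared memory".
    """
    pool = SharedPool(slack_entries=2)
    tables = {name: HashTable(name, pool, preallocated=2)
              for name in ("locks", "predicate_locks")}
    failures = []
    for name in order:
        table = tables[name]
        try:
            for _ in range(4):
                table.insert()
        except MemoryError:
            failures.append(name)
        table.release_all()
    return failures


# Same workload, different arrival order (for example before and after a restart).
first_boot = run(["locks", "predicate_locks"])
after_restart = run(["predicate_locks", "locks"])
print(first_boot, after_restart)
```

In this model the workload is identical in both runs; only the order in which the tables first grow differs, and that alone decides which one later sees the error.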
On 6 April 2011 17:57, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 06.04.2011 17:46, Tom Lane wrote:
>>
>> "Kevin Grittner"<Kevin.Grittner@wicourts.gov> writes:
>>>
>>> Robert Haas<robertmhaas@gmail.com> wrote:
>>>>
>>>> ... The one I'm most
>>>> worried about is "SSI: three different HTABs contend for shared
>>>> memory in a free-for-all" - because there's no patch for that yet,
>>>> and I am wary of breaking something mucking around with it.
>>
>>> I haven't seen any objection to Heikki's suggestion for how to
>>> handle the shared memory free-for-all:
>>
>> I confess to not having been reading the discussions about SSI very
>> much, but ... do we actually care whether there's a free-for-all?
>> What's the downside to letting the remaining shmem get claimed by
>> whichever table uses it first?
>
> It leads to odd behavior. You start the database, and your application
> runs fine. Then you restart the database, and now you get "out of shared
> memory" errors from transactions that used to work.
>
> It's not the end of the world, but I'd prefer stable, repeatable behavior,
> even though having the slack shared memory be grabbed by whoever needs it
> first might in theory lead to better utilization of resources.

It sounds a bit apocalyptic to me, if that really is happening.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 06.04.2011 17:46, Tom Lane wrote:
>> I confess to not having been reading the discussions about SSI very
>> much, but ... do we actually care whether there's a free-for-all?
>> What's the downside to letting the remaining shmem get claimed by
>> whichever table uses it first?

> It leads to odd behavior. You start the database, and your application
> runs fine. Then you restart the database, and now you get "out of shared
> memory" errors from transactions that used to work.

If you get "out of shared memory" at all due to SSI, I'd say that that's the problem, not exactly when it happens. I thought that the patch included provisions for falling back to coarser-grained locks whenever it was short of resources.

regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > If you get "out of shared memory" at all due to SSI, I'd say that > that's the problem, not exactly when it happens. I thought that > the patch included provisions for falling back to coarser-grained > locks whenever it was short of resources. When one of the tests was getting out of memory errors we were initially having trouble telling where the memory was actually consumed, because it wasn't necessarily due to the type of object being allocated at the point of failure. That was the motivation for my attempt to log when an HTAB grew past its "maximum". The problem turned out to be a field which wasn't properly initialized in certain corner cases, making the cleanup phase fail to clear them when appropriate. There is a patch to fix that bug, but the issue raised in the early phase of investigation is what, if anything we should do about the free-for-all allocation. If we want to call that a feature and take it off the 9.1 list, that's OK with me. It's a new issue with 9.1 in the sense that there used to be only one HTAB which could grab the slack space, and only generate its out of memory error once that slack space was exhausted. Now that there are three, things are a bit less predictable. By the way, the problem with SSI potentially running out of shared memory is rather parallel to how heavyweight locks can run out of shared memory. The SLRU prevents the number of transactions from being limited in that way, and multiple locks per table escalate granularity, but with a strange enough workload (for example, accessing hundreds of tables per transaction) one might need to boost max_pred_locks_per_transaction above the default to avoid shared memory exhaustion. -Kevin
On Wed, Apr 06, 2011 at 12:25:26PM -0500, Kevin Grittner wrote: > By the way, the problem with SSI potentially running out of shared > memory is rather parallel to how heavyweight locks can run out of > shared memory. The SLRU prevents the number of transactions from > being limited in that way, and multiple locks per table escalate > granularity, but with a strange enough workload (for example, > accessing hundreds of tables per transaction) one might need to > boost max_pred_locks_per_transaction above the default to avoid > shared memory exhaustion. In fact, it's exactly the same: if a backend wants to acquire many heavyweight locks, it doesn't stop at max_locks_per_xact, it just keeps allocating them until shmem is exhausted. So it's possible, if less likely, to have the same problem with regular locks causing the system to run out of shared memory. Which sounds to me like a good reason to address both problems in one place. Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/
On Wed, Apr 6, 2011 at 3:27 PM, Dan Ports <drkp@csail.mit.edu> wrote: > On Wed, Apr 06, 2011 at 12:25:26PM -0500, Kevin Grittner wrote: >> By the way, the problem with SSI potentially running out of shared >> memory is rather parallel to how heavyweight locks can run out of >> shared memory. The SLRU prevents the number of transactions from >> being limited in that way, and multiple locks per table escalate >> granularity, but with a strange enough workload (for example, >> accessing hundreds of tables per transaction) one might need to >> boost max_pred_locks_per_transaction above the default to avoid >> shared memory exhaustion. > > In fact, it's exactly the same: if a backend wants to acquire many > heavyweight locks, it doesn't stop at max_locks_per_xact, it just > keeps allocating them until shmem is exhausted. > > So it's possible, if less likely, to have the same problem with regular > locks causing the system to run out of shared memory. Which sounds to > me like a good reason to address both problems in one place. The real fix for this problem is probably to have the ability to actually return memory to the shared pool, rather than having everyone grab as they need it until there's no more and never give back. But that's not going to happen in 9.1, so the question is whether this is a sufficiently serious problem that we ought to impose the proposed stopgap fix between now and whenever we do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
> The real fix for this problem is probably to have the ability to
> actually return memory to the shared pool, rather than having
> everyone grab as they need it until there's no more and never give
> back. But that's not going to happen in 9.1, so the question is
> whether this is a sufficiently serious problem that we ought to
> impose the proposed stopgap fix between now and whenever we do
> that.

There is a middle course between leaving the current approach of preallocating half the maximum size and leaving the other half up for grabs and the course Heikki proposes of making the maximum a hard limit. I submitted a patch to preallocate the maximum, so a request for a particular HTAB object will never get "out of shared memory" unless it is past its maximum:

http://archives.postgresql.org/message-id/4D948066020000250003C00B@gw.wicourts.gov

That would leave some extra which is factored into the calculations up for grabs, but each table would be guaranteed at least its maximum number of entries. This seems pretty safe to me, and not very invasive. We could always revisit this in 9.2 if that's not good enough.

-Kevin
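To make the difference between the two sizing policies concrete, here is a small Python sketch. It is a toy model, not PostgreSQL code: the table names, the maximum of 4 entries, and the single extra entry of slack are all assumptions chosen for illustration. Total memory is the same under both policies; only the amount handed out up front differs.

```python
def run(order, demand, preallocated, slack):
    """Serve each table's demand in the given order.

    Memory beyond a table's preallocation comes from the common slack and is
    never returned. Returns the names of tables that ran out of shared memory.
    """
    capacity = dict(preallocated)
    failures = []
    for name in order:
        shortfall = max(0, demand[name] - capacity[name])
        if shortfall > slack:
            # The table takes whatever slack is left and then fails.
            capacity[name] += slack
            slack = 0
            failures.append(name)
        else:
            capacity[name] += shortfall
            slack -= shortfall
    return failures


# Hypothetical sizes: each table has a maximum of 4 entries, plus 1 extra entry.
MAXIMUM = {"locks": 4, "predicate_locks": 4}
EXTRA = 1
TOTAL = sum(MAXIMUM.values()) + EXTRA

# "locks" overshoots its maximum; "predicate_locks" stays within its own.
demand = {"locks": 7, "predicate_locks": 4}
orders = (["locks", "predicate_locks"], ["predicate_locks", "locks"])

# Current approach: preallocate half the maximum, the rest is up for grabs.
half = {name: size // 2 for name, size in MAXIMUM.items()}
current = [run(o, demand, half, TOTAL - sum(half.values())) for o in orders]

# Kevin's proposal: preallocate the full maximum, only the extra is shared.
proposed = [run(o, demand, MAXIMUM, EXTRA) for o in orders]

print("current: ", current)
print("proposed:", proposed)
```

Under the half-preallocation policy the table that fails depends on arrival order, and it can be the one that stayed within its maximum. With the maximum preallocated, the error lands on the table that exceeded its maximum in both orders, which is the guarantee described above.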
On Wed, Apr 6, 2011 at 6:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> The real fix for this problem is probably to have the ability to
>> actually return memory to the shared pool, rather than having
>> everyone grab as they need it until there's no more and never give
>> back. But that's not going to happen in 9.1, so the question is
>> whether this is a sufficiently serious problem that we ought to
>> impose the proposed stopgap fix between now and whenever we do
>> that.
>
> There is a middle course between leaving the current approach of
> preallocating half the maximum size and leaving the other half up
> for grabs and the course Heikki proposes of making the maximum a
> hard limit. I submitted a patch to preallocate the maximum, so a
> request for a particular HTAB object will never get "out of shared
> memory" unless it is past its maximum:
>
> http://archives.postgresql.org/message-id/4D948066020000250003C00B@gw.wicourts.gov
>
> That would leave some extra which is factored into the calculations
> up for grabs, but each table would be guaranteed at least its
> maximum number of entries. This seems pretty safe to me, and not
> very invasive. We could always revisit this in 9.2 if that's not
> good enough.

OK, I agree. We certainly can't have a temporary demand for predicate locks starve out heavyweight locks for the rest of the postmaster lifetime, or vice versa. So we need to do at least that much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 6, 2011 at 12:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Apr 6, 2011 at 12:06 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 06.04.2011 18:02, Tom Lane wrote: >>>> I agree. But again, that's not really what I'm focusing on - the >>>> collations stuff, the typed tables patch, and SSI all need serious >>>> looking at, and I'm not sure who is going to pick all that up. >>> >>> Well, I'll take responsibility for collations. If I get done with that >>> before the 14th, I can see what's up with typed tables. I'm not willing >>> to do anything with SSI at this stage. >> >> I can look at the SSI patches, but not until next week, I'm afraid. Robert, >> would you like to pick that up before then? Kevin & Dan have done all the >> heavy lifting, but it's nevertheless pretty complicated code to review. > > I'll try, and see how far I get with it. If you can pick up whatever > I don't get to by early next week, that would be a big help. I am > going to be in Santa Clara next week for the MySQL conference (don't > worry, I'll be talking about PostgreSQL!) and that's going to cut into > my time quite a bit. I think I've cleared out most of the small stuff. The two SSI related issues still on the open items list are: * SSI: failure to clean up some SLRU-summarized locks * SSI: three different HTABs contend for shared memory in a free-for-all If you can pick those two up, that would be very helpful; I suspect you can work your way through them faster and with fewer mistakes than I would be able to manage. The other two items are: * Typed-tables patch broke pg_upgrade * assorted collation issues Tom said he'd take care of the collation issues. Peter Eisentraut, Noah Misch, and I have been exchanging emails on the typed tables problems, of which there appear to be several, but it's not real clear to me that we're converging on a comprehensive solution. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > I think I've cleared out most of the small stuff. Thanks! > The two SSI related issues still on the open items list are: > > * SSI: failure to clean up some SLRU-summarized locks This one is very important. Not only could it lead to unnecessary false positive serialization failures, but (more importantly) it leaks shared memory by not clearing some locks, leading to potential "out of shared memory" errors. While this isn't as small as most of the SSI patches, I'm going to point out (to reassure those who haven't been reading the patches) that this one modifies two lines, adds six Assert statements which Dan found useful in debugging the issue, and adds (if you ignore white space and braces) four lines of code. "Big" is a relative term here. The problem is that the code in which these tweaks fall is hard to get one's head around. > * SSI: three different HTABs contend for shared memory in a > free-for-all I think we're pretty much agreed that something should be done about this, but the main issue here is that if either heavyweight locks or SIRead predicate locks exhaust memory, the other might be unlucky enough to get the error, making it harder to identify the cause. Without the above bug or an unusual workload, it would tend not to make a difference. If things come down to the wire and this is the only thing holding up the beta release, I'd suggest going ahead and cutting the beta. -Kevin
On Wed, Apr 6, 2011 at 9:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > A quick review of the open items list suggests that we have three main > areas that need attention before we can declare ourselves ready for > beta. > > In no particular order: > > 1. There are a bunch of small, outstanding SSI patches. > 2. Bugs - plural - related to pg_upgrade & typed tables. > 3. Assorted collation issues. Since we're targeting code freeze for beta1 for approximately now + 1 week, it's probably about time to take stock of where we are. 1. All of the SSI patches have been dealt with. 2. The typed tables stuff vs. pg_upgrade still needs work. I would be just as happy if Tom or Peter wanted to fix this, mostly for fear of getting flak over the details of the fixes, but if not I will do it. 3. The collation issues that have been discussed on-list have, I *think*, mostly been dealt with. But maybe there are some broken things that haven't been discussed yet? New things: - There is an outstanding bug-fix patch for PL/python tracebacks, proof that no patch is too small to require multiple rounds of bug fixing. - There are some minor infelicities in the handling of permissions for foreign tables. Since I committed a chunk of that stuff, I think it probably falls to me to clean this up, unless someone else wants to volunteer. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Since we're targeting code freeze for beta1 for approximately now + 1 > week, it's probably about time to take stock of where we are. > 3. The collation issues that have been discussed on-list have, I > *think*, mostly been dealt with. But maybe there are some broken > things that haven't been discussed yet? I have no open items for collations right now, but feel a need to re-read the original patch in toto before signing off on it. I'll try to get that done in the next day or two. BTW, I'm not sure if this was mentioned on-list previously, but we are thinking of wrapping the beta the evening of Wednesday 27th, not Thursday 28th as the traditional release scheduling would have it. (It seems our British contingent is planning to take the Friday off for some wedding or other, so there's no hope of getting Windows installers built on-time otherwise.) So that's one less day than you might have been thinking. I see no reason we can't make it though. It's past time to get this puppy out the door. regards, tom lane
On Tue, Apr 19, 2011 at 7:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Since we're targeting code freeze for beta1 for approximately now + 1 >> week, it's probably about time to take stock of where we are. > >> 3. The collation issues that have been discussed on-list have, I >> *think*, mostly been dealt with. But maybe there are some broken >> things that haven't been discussed yet? > > I have no open items for collations right now, but feel a need to > re-read the original patch in toto before signing off on it. > I'll try to get that done in the next day or two. > > > BTW, I'm not sure if this was mentioned on-list previously, but > we are thinking of wrapping the beta the evening of Wednesday 27th, > not Thursday 28th as the traditional release scheduling would have it. > (It seems our British contingent is planning to take the Friday off > for some wedding or other, so there's no hope of getting Windows > installers built on-time otherwise.) So that's one less day than > you might have been thinking. I see no reason we can't make it > though. It's past time to get this puppy out the door. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, 2011-04-19 at 18:18 -0400, Robert Haas wrote: > 2. The typed tables stuff vs. pg_upgrade still needs work. I would be > just as happy if Tom or Peter wanted to fix this, mostly for fear of > getting flak over the details of the fixes, but if not I will do it. Noah Misch is hot on the trail of that one. > - There is an outstanding bug-fix patch for PL/python tracebacks, That has been addressed.
On Thu, Apr 21, 2011 at 11:38 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On Tue, 2011-04-19 at 18:18 -0400, Robert Haas wrote: >> 2. The typed tables stuff vs. pg_upgrade still needs work. I would be >> just as happy if Tom or Peter wanted to fix this, mostly for fear of >> getting flak over the details of the fixes, but if not I will do it. > > Noah Misch is hot on the trail of that one. Yes, but inasmuch as he is not a committer, someone who is will need to be involved. I dealt with the prerequisite ALTER TABLE .. OF/NOT OF patch last night, but the related pg_dump patch that actually fixes the problem still needs to be looked at, and the earliest (and probably only) time that I can potentially do that is Monday. So it would be great if you or someone else could pick it up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:

> 1. All of the SSI patches have been dealt with.

I'll add the non-serializable UPDATE performance issue. Dan has been benchmarking to try to find a worst case; I don't want to speak for him too much, but as he was headed off to lecture a class he sent me results so far, and with beta so close I figure I should pass along a rough outline. The worst case he has been able to construct so far was running 32 active processes on a 16 processor machine in an update-mostly mix against a database on tmpfs (so no disk writes) on a dataset which fits inside shared_buffers. This was able to generate enough contention on an exclusive LW lock to cause a 0.7% slowdown.

Speaking for myself, I believe we'll be able to provide a very small patch to eliminate this. Probably today or tomorrow. While in a less extreme runtime environment it would probably be hard to pick out a performance impact in the normal noise, I expect the fix to be small and safe enough to be worth doing.

I do feel that it would be good to apply the one-line fix Heikki posted, which is orthogonal and needed in any event. That would give a little time for others to easily test it before beta.

-Kevin
On Thu, Apr 21, 2011 at 12:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> 1. All of the SSI patches have been dealt with. > > I'll add the non-serializable UPDATE performance issue. Dan has > been benchmarking to try to find a worst case; I don't want to speak > for him too much, but as he was headed off to lecture a class he > sent me results so far, and with beta so close I figure I should > pass along a rough outline. The worst case he has been able to > construct so far was running 32 active processes on a 16 processor > machine in an update-mostly mix against a database on tmpfs (so no > disk writes) on a dataset which fits inside shared_memory. This was > able to generate enough contention on an exclusive LW lock to cause > a 0.7% slowdown. > > Speaking for myself, I believe we'll be able to provide a very small > patch to eliminate this. Probably today or tomorrow. While in a > less extreme runtime environment it would probably be hard to pick > out a performance impact in the normal noise, I expect the fix to be > small and safe enough to be worth doing. > > I do feel that it would be good to apply the one-line fix Heikki > posted, which is orthogonal and needed in any event. That would > give a little time for others to easily test it before beta. Please add that patch to the open items list if it is not there already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
For background, the issue here is that there are three SSI calls that get invoked even on non-serializable transactions:

- PredicateLockPageSplit/Combine, during B-tree page splits/combines
- PredicateLockTupleRowVersionLink, from heap_update

These have to update any matching SIREAD locks to match the new lock target. If there aren't any serializable transactions, there won't be any, but it still has to check and this requires taking a LWLock. Every other SSI function checks XactIsoLevel and bails out immediately if non-serializable.

Like Kevin said, I tested this by removing these three calls and comparing under what I see as worst-case conditions. I used pgbench, an update-mostly workload, in read committed mode. The database (scale factor 100) fit in shared_buffers and was backed by tmpfs so disk accesses didn't enter the picture anywhere. I ran it on a 16-core machine to stress lock contention.

Even under these conditions I couldn't reliably see a slowdown. My latest batch of results (16 backends, median of three 10 minute runs) shows a difference well below 1%. In a couple of cases I saw the code with the SSI checks running faster than with them removed, so this difference seems in the noise.

Given that result, and considering it's a pretty extreme condition, it probably isn't worth worrying about this too much, but...

There's a quick fix: we might as well bail out of these functions early if there are no serializable transactions running. Kevin points out we can do this by checking if PredXact->SxactGlobalXmin is invalid.

I would add that we can do this safely without taking any locks, even on weak-memory-ordering machines. Even if a new serializable transaction starts concurrently, we have the appropriate buffer page locked, so it's not able to take any relevant SIREAD locks.

Dan

-- 
Dan R. K. Ports
MIT CSAIL
http://drkp.net/
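The shape of the quick fix can be sketched as a toy model. This is Python for illustration only, not the PostgreSQL C code: `PredXactModel`, `predicate_lock_page_split`, the page numbers, and the use of 0 as the invalid transaction id are all assumptions, and the model does not attempt to reproduce the memory-ordering argument above. It only shows the lock being skipped when no serializable transaction is running.

```python
import threading

INVALID_XID = 0  # hypothetical stand-in for an invalid transaction id


class PredXactModel:
    """Toy stand-in for the shared SSI state (not the real PredXact struct)."""

    def __init__(self):
        self.sxact_global_xmin = INVALID_XID  # invalid: no serializable xacts
        self.lock = threading.Lock()          # models the contended LWLock
        self.lock_acquisitions = 0
        self.siread_locks = {}                # page number -> set of holder xids


def predicate_lock_page_split(state, old_page, new_page):
    """Copy SIREAD locks from old_page to new_page after a page split.

    Returns False if the call bailed out early without touching the lock.
    """
    # Quick fix: with no serializable transactions there can be no SIREAD
    # locks to transfer, so skip taking the lock altogether.
    if state.sxact_global_xmin == INVALID_XID:
        return False
    with state.lock:
        state.lock_acquisitions += 1
        holders = state.siread_locks.get(old_page, set())
        if holders:
            state.siread_locks.setdefault(new_page, set()).update(holders)
    return True


state = PredXactModel()

# Non-serializable workload: many page splits, the lock is never taken.
for _ in range(1000):
    predicate_lock_page_split(state, old_page=1, new_page=2)
idle_acquisitions = state.lock_acquisitions

# A serializable transaction (xid 100) holds an SIREAD lock on page 1.
state.sxact_global_xmin = 100
state.siread_locks[1] = {100}
predicate_lock_page_split(state, old_page=1, new_page=2)
print(idle_acquisitions, state.lock_acquisitions, state.siread_locks)
```

The early return keeps the non-serializable path free of lock traffic, while the serializable path still transfers the lock to the new page as before.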
On Thu, Apr 21, 2011 at 11:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Apr 21, 2011 at 11:38 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> On Tue, 2011-04-19 at 18:18 -0400, Robert Haas wrote: >>> 2. The typed tables stuff vs. pg_upgrade still needs work. I would be >>> just as happy if Tom or Peter wanted to fix this, mostly for fear of >>> getting flak over the details of the fixes, but if not I will do it. >> >> Noah Misch is hot on the trail of that one. > > Yes, but inasmuch as he is not a committer, someone who is will need > to be involved. I dealt with the prerequisite ALTER TABLE .. OF/NOT > OF patch last night, but the related pg_dump patch that actually fixes > the problem still needs to be looked at, and the earliest (and > probably only) time that I can potentially do that is Monday. So it > would be great if you or someone else could pick it up. Well, I addressed most of the remaining open items today, but not this one. Hopefully, someone else can pick it up, because I'm leaving end of day tomorrow for a week's vacation in Germany. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company