Обсуждение: Updated posix fadvise patch v19
Here is an updated posix_fadvise patch against CVS HEAD. It's my first patch generated using git, yay. I remembered to strip out configure (WHY is that STILL in CVS!?) and convert it to context diff, but if there's anything else I've missed just shout. Changes from before: 1) Based on discussions changed the parameter to effective_io_concurrency and modified the documentation discussion to include comments about SSDs and experimentation being necessary. 2) standardized all references to "prefetch" from the mixture of same and "preread" and "advise". This means the main entry point is now PrefetchBuffer(). 3) Added run-time check for buggy glibc versions with bad posix_fadvise() since it was pointed out that someone could run on a different version of glibc than it was compiled on (such as with binary installers). I kept the autoconf test though arguably it's redundant now and could be removed. Also, the previous patch included support for Zoltan's situation using POSIX_FADV_SEQUENTIAL for bulk sequential scans. I experimented with this and was not able to find any situation it improved. I've left it in so others can test and show it's helpful. This bit of code is a bit less polished -- it's missing documentation for the guc variable and there are some #defines which are probably in the wrong .h file (and probably should be an enum). This code is entirely separate from the prefetching code. They happen to both use posix_fadvise but I don't think they belong in the same abstraction for Postgres. PrefetchBuffer does just one thing -- prefetches a buffer. Sequential i/o needs to be set and tracked per-file. I have not changed the query costing side of things. I'm a bit leery about making those changes as I fear it will lead me to propose a major reworking of these parameters and I don't want to go there... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Вложения
> - StrategyFileStrategy doesn't handle the recently added BAS_BULKWRITE > strategy. I'm not sure whether it needs to, but it seems to me that > this a trap for the unwary: we should probably add a comment where the > BAS_* constants are defined warning that any changes here may/will > also necessitate changes there. I think a detailed comment on the > function itself explaining why it does what it does and how to decide > what to do for a new type of BufferAccessStrategy would be a good > idea. In fact, now that I look at this a little further, I see that in general you've not added comments at the beginnings of functions - for example, the other functions in the files that contain smgrprefetch, mdprefetch, PrefetchBuffer seem to have a description of the purpose of those functions; the ones you've added do not. Good luck, I'd like to see this one get in - the performance results you've reported sound very impressive. ...Robert
I was pretty leery about reviewing this one due to the feeling that I might well be in over my head, but they talked me into it, so here goes nothin'. Apologies in advance for any deficiencies in this review. - Overall, this looks pretty clean. The style appears to be consistent with the surrounding code, the patch applies with only minor fuzz, there is not much cruft in the diff (I found a few very minor unnecessary hunks, see department of nitpicking below) and the whole thing compiles and passes regression tests. Also, while I'm not totally qualified to judge the success of your efforts, it appears that you've attempted to make the changes in a way that preserves the existing abstractions, which is good. - However, having said that, it looks as if there is still a bit of experimentation going on in terms of what you actually want the patch to do. There are a couple of things that say FIXME or XXX, and at least one diff hunk that adds code surrounded by #if 0 (in FileRead). Maybe committing FIXME is OK, but certainly #if 0 isn't, so there's at least a bit of work needed here to put this in its final form, though I think perhaps not that much. As you mentioned when posting this version of the patch, it does two unrelated things only one of which you are confident is beneficial. I suggest splitting this into two patches and trying to get the prefetch part committed first. I think that would make for easier review, too. - I dislike cluttering up EXPLAIN ANALYZE with even more crap. On the flip side, I am loathe to take away any sort of instrumentation that might be helpful. I think we need to revamp the syntax of EXPLAIN [ANALYZE] to support some kind of option list, so that users can request the information they care about and not be overwhelmed by what they don't care about. (ISTR that a similar proposal was made with regard to VACUUM some time ago... perhaps the same ideas could be applied to both.) That would need to be done as a separate patch, so maybe we shouldn't worry about it here. - It's not clear to me whether there is a reason why we are only adding instrumentation for bitmap heap scan; would not other scan types benefit from something similar? If we're not going to add instrumentation for all the cases that can benefit from it, then we should just rip it out. - StrategyFileStrategy doesn't handle the recently added BAS_BULKWRITE strategy. I'm not sure whether it needs to, but it seems to me that this a trap for the unwary: we should probably add a comment where the BAS_* constants are defined warning that any changes here may/will also necessitate changes there. I think a detailed comment on the function itself explaining why it does what it does and how to decide what to do for a new type of BufferAccessStrategy would be a good idea. - I am mildly concerned that we are overusing the word Strategy. The purpose of StrategyFileStrategy is, of course, to take a BufferAccessStrategy (which is an object) and return a FILE_STRATEGY_* constant. But someone might not find that entirely evident from the name. If we're going to have multiple things floating around that are different from each other but all called strategies, ISTM we should name functions etc. to make clear which one we're talking about. Or else pick a different word. Maybe for now it's sufficient to rename this function to AccessStrategyGetFileStrategy, or something. - The WARNING at the top of PrefetchBuffer() seems strange. Is that really only a WARNING? We just continue anyway? Department of nitpicking: - The very first comment change in src/backend/commands/explain.c is apparently extraneous. - guc.c misspells the work "concurrent" as "concurrenct". - The first diff hunk in each of fd.h and smgr.h is an unnecessary whitespace change. - nodeBitmapHeapscan.c adds an ELOG at DEBUG2 - do we really want this, or is it leftover debugging code? ...Robert
"Robert Haas" <robertmhaas@gmail.com> writes: > I was pretty leery about reviewing this one due to the feeling that I > might well be in over my head, but they talked me into it, so here > goes nothin'. Apologies in advance for any deficiencies in this > review. > > - Overall, this looks pretty clean. > > - However, having said that, it looks as if there is still a bit of > experimentation going on in terms of what you actually want the patch > to do. There are a couple of things that say FIXME or XXX, and at > least one diff hunk that adds code surrounded by #if 0 (in FileRead). > Maybe committing FIXME is OK, but certainly #if 0 isn't, so there's at > least a bit of work needed here to put this in its final form, though > I think perhaps not that much. As you mentioned when posting this > version of the patch, it does two unrelated things only one of which > you are confident is beneficial. I suggest splitting this into two > patches and trying to get the prefetch part committed first. I think > that would make for easier review, too. When I posted the plan was that people would run experiments with other systems. In particular I was hoping Zoltan who original reported the sequential file strategy stuff being helpful might be able to see if this works for him. As this hasn't happened and I haven't been able to demonstrate it being useful myself I guess it makes more sense to separate the two now and set the sequential scan stuff aside until someone can demonstrate it being useful. > - I dislike cluttering up EXPLAIN ANALYZE with even more crap. On the > flip side, I am loathe to take away any sort of instrumentation that > might be helpful. I put it in for debugging since it was hard to know if it was actually doing anything otherwise. I figured whoever was reviewing the patch would run into the same problem. It would be trivial to strip out though. > - It's not clear to me whether there is a reason why we are only > adding instrumentation for bitmap heap scan; would not other scan > types benefit from something similar? If we're not going to add > instrumentation for all the cases that can benefit from it, then we > should just rip it out. Well there is a difference because bitmap heap scans do that slow-start thing. They ramp up the prefetching as you fetch more tuples from the bitmap up to the user configurable GUC so the point was to instrument what level they get to. Regular index scans prefetch whatever they find in the index leaf pages which will vary from page to page and isn't really something the user can influence anyways. But as I said, it was mostly so I could tell what the slow start algorithm was doing and make sure it was doing anything at all in testing. > - The WARNING at the top of PrefetchBuffer() seems strange. Is that > really only a WARNING? We just continue anyway? Er, uh, yeah that's bogus. The RelationIsValid is pointless as RelationOpenSmgr will just crash if that's not true. But I'm not nearly as convinced that BlockNumberIsValid() isn't worth asserting for. I don't see anything in the buffer tag lookup code which will fail in that case. What confuses me is that there's no corresponding check in the ReadBuffer code. Tom? Heikki? Should we have an assert in ReadBuffer asserting BlockNumberIsValid -- especially for the zero-page case where we're not actually going to do I/O? Or am I missing where that will seg fault? > Department of nitpicking: Will clean these up, they all look valid. I thought I did clean things up already -- I guess you should be happy you're looking at it *after* that cleanup :/ -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
> As this hasn't happened and I haven't been able to demonstrate it being useful > myself I guess it makes more sense to separate the two now and set the > sequential scan stuff aside until someone can demonstrate it being useful. Sounds good. How soon do you think you can post updated patches? > But as I said, it was mostly so I could tell what the slow start algorithm was > doing and make sure it was doing anything at all in testing. In that case, I think you should just rip it all out for now. >> Department of nitpicking: > > Will clean these up, they all look valid. I thought I did clean things up > already -- I guess you should be happy you're looking at it *after* that > cleanup :/ Maybe it's you who should be glad... :-) ...Robert
On Fri, 14 Nov 2008, Gregory Stark wrote: > In particular I was hoping Zoltan who original reported the sequential > file strategy stuff being helpful might be able to see if this works for > him. The original message there suggested it was particularly valuable when working with a somewhat broken kernel that was handling multiple simultaenous reads badly. I think the idea here is that if you've got an OS that handles that situation well, then the fadvise calls don't help much. But having them in there does improve the odds that the kernel will do the right thing with the sequential reads. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
"Robert Haas" <robertmhaas@gmail.com> writes: > - However, having said that, it looks as if there is still a bit of > experimentation going on in terms of what you actually want the patch > to do. There are a couple of things that say FIXME or XXX, and at > least one diff hunk that adds code surrounded by #if 0 (in FileRead). Well, yes, we have lots of XXX and FIXMEs throughout the source code. We need to make progress even if we don't know for certain that we have the perfect crystalline code structure in all its glory. I have the same reaction and fear when I see reviewers' questions about whether a patch is "finished". If we wait until someone is "finished" working on a feature we'll never get the feature -- there's always more that can be done. We want to commit things when they are ready for use, not wait until there's nothing more to do. The XXX is something that probably needs to be fixed but it's just a question of what header file to put a declaration in. I couldn't find a good choice but perhaps someone else has an idea? For the FIXMEs I don't have any problem leaving them in place. They're warnings to future coders working in the same area of what they may have to do to make the code more general. In particular both FIXMEs are related to memory management of the iterator structures. I think just allocating them in the bitmap memory context is fine for existing callers. I would rather leave them there because I would like a reviewer to double check that we don't have a memory leak there. The #if 0 is part of the FileStrategy code to use POSIX_FADV_SEQUENTIAL which I'm removing now so it's irrelevant. > As you mentioned when posting this version of the patch, it does two > unrelated things only one of which you are confident is beneficial. I > suggest splitting this into two patches and trying to get the prefetch part > committed first. I think that would make for easier review, too. Maintaining two interdependent patches is a bit of a pain. This is one reason it's annoying when a patch goes uncommitted for a long time. It's hard to work on further improvements which are dependent on it. Nobody seems to care about the sequential scan case any more so I doubt I'll bother maintaining it after I remove it from this patch. > - I dislike cluttering up EXPLAIN ANALYZE with even more crap. On the > flip side, I am loathe to take away any sort of instrumentation that > might be helpful. My feeling is that it was useful but only marginally and mostly only for testing that things were working properly. The main benefit is giving people a real feeling for how much data a given effective_io_concurrency actually translates to. I'm removing this now. If anyone else thinks it was nice to have scream now. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
> For the FIXMEs I don't have any problem leaving them in place. They're > warnings to future coders working in the same area of what they may have to do > to make the code more general. That's fine with me. I think it's fine to document possibilities for future development, but sometimes it's hard to tell between which such things the patch author intends to be committed and which he or she intends to fix before commit. Maybe FIXME is just a noise word, though: there are other places in the code where we document things that might be related or need to be changed without using that word. I don't care very much, though. Looking forward to v20. ...Robert
Gregory Stark wrote: > The XXX is something that probably needs to be fixed but it's just a question > of what header file to put a declaration in. I couldn't find a good choice but > perhaps someone else has an idea? > > For the FIXMEs I don't have any problem leaving them in place. They're > warnings to future coders working in the same area of what they may have to do > to make the code more general. In particular both FIXMEs are related to memory > management of the iterator structures. I think just allocating them in the > bitmap memory context is fine for existing callers. I would rather leave them > there because I would like a reviewer to double check that we don't have a > memory leak there. There are probably no rigid rules on this, but my interpretation of these tags is usually this: XXX -- not sure if this is the best way to do this, needs ideas TODO -- specific ideas for improvement FIXME -- broken, must be fixed to be usable So committed code should probably not contain any FIXMEs, but possibly some of the others. I usually label stubs in work-in-progress code with // FIXME and then check if I removed them all before proposing a patch for inclusion. But those are just my ideas ...
"Robert Haas" <robertmhaas@gmail.com> writes: > Looking forward to v20. Here you go! I addressed all the nitpicks and added comments. I also stripped out the sequential i/o posix_fadvises. I'm kind of sad to see them go since it did seem like a nice way to give more info to the OS even if no OSes today make good use of it. But one thing at a time and this is clearly a lot more important. Today we're effectively not using raid arrays properly at all, we're using them like they're single drives. One thing which is bothering me is that the guc assign hook is throwing an error if you set effective_io_concurrency when your system's posix_fadvise is deemed inadequate (either unavailable or from an old version of glibc). I'm starting to think it shouldn't throw an error, just not set the internal variable and possible output a warning. We do have some GUC variables which throw errors if you use them and support isn't compiled in, but I'm not sure it's such a hot idea even for those. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Вложения
Peter Eisentraut <peter_e@gmx.net> writes: > There are probably no rigid rules on this, but my interpretation of these tags > is usually this: > > XXX -- not sure if this is the best way to do this, needs ideas > TODO -- specific ideas for improvement > FIXME -- broken, must be fixed to be usable I don't have strong feelings on this. I appear to use them more or less interchangeably. Or perhaps what I did is consistent with your rules. Except that the "FIXME" isn't unusable it's just that I put the prototype in clearly the wrong file. So it definitely has to be corrected but I don't know where to move it. The XXX is for something I think is correct now but might need to be fixed if new callers need tighter memory management. And which could use a close look by a reviewer to be sure I'm right about the memory management being ok for now. FWIW we don't seem to have any such strict rules about them: $ find . -name \*.[ch] -print0 | xargs -0 grep FIXME | wc -l 22 $ find . -name \*.[ch] -print0 | xargs -0 grep XXX | wc -l 485 $ find . -name \*.[ch] -print0 | xargs -0 grep TODO | wc -l 33 -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
> One thing which is bothering me is that the guc assign hook is throwing an > error if you set effective_io_concurrency when your system's posix_fadvise is > deemed inadequate (either unavailable or from an old version of glibc). I'm > starting to think it shouldn't throw an error, just not set the internal > variable and possible output a warning. We do have some GUC variables which > throw errors if you use them and support isn't compiled in, but I'm not sure > it's such a hot idea even for those. I can't see why this would be a good idea. Warnings are easy to overlook, and then you have completely different behavior without knowing it. If I wanted a database that silently did something completely different from what I asked it to do, I'd use... well, let's just say products like that are available. ...Robert
> I addressed all the nitpicks and added comments. Woot, yeah for comments. There's a trivial conflict with CVS HEAD due to unrelated changes to AC_CHECK_FUNCS(...kitchen sink...) but that led me to notice something else: can't all this stuff about posix_fallocate be ripped out of configure.in at this point? In config.sgml, the documentation is good, but suffers from a slightly informal style. There are a lot of places where commas seem appropriate but are not present. Suggested changes by paragraph: 1. Replace last sentence: "Raising this value will cause PostgreSQL to initiate ...Robert
Obviously that went too soon. > In config.sgml, the documentation is good, but suffers from a slightly > informal style. There are a lot of places where commas seem > appropriate but are not present. Suggested changes by paragraph: > > 1. Replace last sentence: "Raising this value will cause PostgreSQL to initiate ...more I/O operations in parallel." 2. "Typically, this parameter should be set to the number of separate drives comprising a RAID 0 stripe or RAID 1 mirror or any combination.For RAID 5, it should be set to the number of drivesexcluding the parity drive. However, the optimal value varies based on the effectiveness of the RAID controller or software implementation, and may require some experimentation." 3. "For more exotic systems, such as memory-based storage or a RAID array which is limited by bus bandwidth, the correct value might be the number of paths available instead; again, experimentation is recommended." 4. Insert comma after "However" in third sentence. Replace last sentence with: "On these operating systems, setting effective_io_concurrency to a value greater than 1 will incur some CPU overhead without improving perforamance." ...Robert
this doesn't apply cleanly to head anymore, can you please post v21 ? I would love to test it here :)
On 2008-11-20, at 11:08, Grzegorz Jaskiewicz wrote: > this doesn't apply cleanly to head anymore, can you please post v21 ? > I would love to test it here :) bollocks, it's already in cvs head - isn't it ? ... :D