Re: [CORE] postpone next week's release
From | Andres Freund |
---|---|
Subject | Re: [CORE] postpone next week's release |
Date | |
Msg-id | 20150530204727.GJ13944@alap3.anarazel.de |
In reply to | Re: [CORE] postpone next week's release (Bruce Momjian <bruce@momjian.us>) |
Replies | Re: [CORE] postpone next week's release (Bruce Momjian <bruce@momjian.us>); Re: [CORE] postpone next week's release (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
Hi Bruce, Everyone,

On 2015-05-30 11:45:59 -0400, Bruce Momjian wrote:
> Let me share something that people have told me privately but don't want
> to state publicly (at least with attribution), and that is that we have
> seen great increases in feature development (often funded), without a
> corresponding increase in development efforts focused on stability.

Yes, I have seen and heard that too. What I think is also important is that, in turn, our adoption has outpaced feature development (and thus, transitively, stability work).

> The bottom line is that we just can't keep going on like this. The fact
> we put out a release two weeks ago, then need to put out a fix release
> for that, but we have more multi-xact bugs to fix and can't decide if we
> should do one or two minor releases, and are pushing out an alpha of 9.5
> because we know we aren't ready for a beta, just confirms my analysis.

I don't think that alone confirms very much.

> I hate to be the bearer of bad news, but I think bad news is what we
> must face.

Well, the question is what we do with that observation. Personally I don't think it's a new one. This point has been made repeatedly, including at most, if not all, of the developer meetings I attended. I have definitely had conversations about it in person, on IM and on list.

I don't think it's primarily a problem of lack of review, although that is a large problem. I think the biggest systematic problem is that the compound complexity of postgres has increased dramatically over the years. Features have added complexity little by little, each increment not looking that bad on its own. But very little has been done to manage that complexity: since 8.0 the code size has roughly doubled, yet few new abstractions have been introduced and the structure of the code is largely the same.

As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it was ~500 LOC; in master it's ~1500.
The interactions in 8.0 were already complex, and they have gotten much more complex since. It fulfills lots of different roles, all in one function (roughly in the order things happen, but simplified):

* Read the control file/determine whether we crashed
* recovery.conf handling
* backup label handling
* tablespace map handling (huh, I missed that this was added directly to StartupXLOG. What a bad idea)
* Determine whether we're doing archive recovery, read the relevant checkpoint if so
* relcache init file removal
* timeline switch handling
* Loading the checkpoint we're starting from
* Initialization of a lot of subsystems
* crash recovery/replay
* Including pgstat, unlogged table, exported snapshot handling
* iff hot standby, some more subsystems are initialized here
* hot standby state handling
* replay process initialization
* crash replay itself, including
* progress tracking
* recovery pause handling
* nextxid tracking
* timeline increase handling
* hot standby state handling
* unlogged relations handling
* archive recovery handling
* creation/initialization of the end of recovery checkpoint
* timeline increment if failover
* subsystem initialization iff !hot_standby
* end of recovery actions

Yes, that's one routine. And, to make things even funnier, half of that routine isn't exercised by our tests.

You can argue that this is an outlier, but I don't think so. Heapam, the planner, etc. have similar cases. And I think this, to some degree, explains a lot of the multixact problems. While there were a few "simple bugs", most of them were rather intricate interactions between the various subsystems.

So, I think we have built up a lot of technical debt. Very little effort has been made to pay it down, and in the cases where people have made that effort, the reception has often been cool, because refactoring obviously destabilizes things in the short term, even if it fixes problems in the long term. I don't think that's sustainable.
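To make the "manage complexity" point concrete, here is a purely hypothetical sketch in C of one way a multi-role routine like StartupXLOG() could be split into named phases that each do one job and can be tested in isolation. None of the names below (StartupState, run_startup, the phase functions) exist in PostgreSQL, and real startup involves far more state and ordering constraints than this shows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical state passed between startup phases. None of these names
 * exist in PostgreSQL; they only illustrate the structure.
 */
typedef struct StartupState
{
    bool crashed;            /* input: did the last shutdown fail? */
    bool recovery_requested; /* input: stand-in for recovery.conf */
    bool archive_recovery;   /* decided by a phase below */
    int  records_replayed;   /* output of the replay phase */
    int  phases_run;         /* bookkeeping, handy in tests */
} StartupState;

/* Each phase does one job and reports success or failure. */
typedef bool (*StartupPhase) (StartupState *st);

static bool
determine_recovery_mode(StartupState *st)
{
    st->archive_recovery = st->recovery_requested;
    st->phases_run++;
    return true;
}

static bool
replay_wal(StartupState *st)
{
    /* Only replay if we crashed or were asked to recover from archive. */
    if (st->crashed || st->archive_recovery)
        st->records_replayed = 3;   /* placeholder for real replay work */
    st->phases_run++;
    return true;
}

static bool
end_of_recovery(StartupState *st)
{
    st->crashed = false;    /* from here on the state counts as clean */
    st->phases_run++;
    return true;
}

typedef struct PhaseEntry
{
    const char  *name;
    StartupPhase fn;
} PhaseEntry;

/* The ordering of startup is visible in one place. */
static const PhaseEntry startup_phases[] = {
    {"determine recovery mode", determine_recovery_mode},
    {"replay WAL", replay_wal},
    {"end-of-recovery actions", end_of_recovery},
};

/* Runs all phases in order; returns 0 on success, -1 on the first failure. */
int
run_startup(StartupState *st)
{
    size_t n = sizeof(startup_phases) / sizeof(startup_phases[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (!startup_phases[i].fn(st))
        {
            fprintf(stderr, "startup phase failed: %s\n",
                    startup_phases[i].name);
            return -1;
        }
    }
    return 0;
}
```

With this shape each phase can be called directly with a hand-built StartupState, so tests can reach paths that are hard to hit through one monolithic function. The cost is that state shared between phases has to be made explicit, which is exactly the kind of refactoring that destabilizes things in the short term.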
We can't improve the situation by just delaying the 9.5 release or something like that. We need to actively work on making the codebase easier to understand and better tested. But that is actual development work, and shouldn't happen at the tail end of a release.

Regards,

Andres