Re: Large C files
От | Peter Geoghegan |
---|---|
Тема | Re: Large C files |
Дата | |
Msg-id | CAEYLb_XQf6JmppcNnEtN+T5mEuYw7YZxEe=LV_hyLWndGPfnWQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Large C files (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Large C files
Re: Large C files |
Список | pgsql-hackers |
On 6 September 2011 21:07, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Sep 6, 2011 at 3:56 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The only way I would entertain thoughts of major refactoring would be >> if comprehensive tests were contributed, so we could verify everything >> still works afterwards. > > That sounds like a really good idea. Mind you, I don't have a very > clear idea of how to design such tests and probably no immediate > availability to do the work either, but I like the concept very much. More or less off the top of my head, I don't think that it's much trouble to write an automated test for this. Of course, it would be as flawed as any test in that it can only prove the presence of errors by the criteria of the test, not the absence of all errors. Here's what could go wrong that we can test for when splitting a monster translation unit into more manageable modules that I can immediately think of, that is not already handled by compiler diagnostics or the build farm (I'm thinking of problems arising when some headers are excluded in new .c files because they appear to be superfluous, but turn out to not be on some platform): * Across TUs, we somehow fail to provide consistent linkage between the old object file and the sum of the new object files. * Within TUs, we unshadow a previously shadowed variable, so we link to a global variable rather than one local to the original/other new file. Unlikely to be a problem. Here's what I get when I compile xlog.c in the usual way with the addition of the -Wshadow flag: [peter@localhost transam]$ gcc -O0 -g -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -Wshadow -g -I../../../../src/include -D_GNU_SOURCE -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po xlog.c: In function ‘XLogCheckBuffer’: xlog.c:1279:12: warning: declaration of ‘lower’ shadows a global declaration ../../../../src/include/utils/builtins.h:793:14: warning: shadowed declaration is here xlog.c:1280:12: warning: declaration of ‘upper’ shadows a global declaration ../../../../src/include/utils/builtins.h:794:14: warning: shadowed declaration is here xlog.c: In function ‘XLogArchiveNotifySeg’: xlog.c:1354:29: warning: declaration of ‘log’ shadows a global declaration xlog.c: In function ‘XLogFileInit’: xlog.c:2329:21: warning: declaration of ‘log’ shadows a global declaration xlog.c: In function ‘XLogFileCopy’: xlog.c:2480:21: warning: declaration of ‘log’ shadows a global declaration xlog.c: In function ‘InstallXLogFileSegment’: xlog.c:2598:32: warning: declaration of ‘log’ shadows a global declaration xlog.c: In function ‘XLogFileOpen’: xlog.c:2676:21: warning: declaration of ‘log’ shadows a global declaration xlog.c: In function ‘XLogFileRead’: *** SNIP, CUT OUT A BIT MORE OF SAME*** Having looked at the man page for nm, it should be possible to hack together a shellscript for src/tools/ that: 1. Takes one filename as its first argument, and a set of 2 or more equivalent split file names (no file extension - there is a requirement that both $FILENAME.c and $FILENAME.o exist). 2. Looks at the symbol table for the original C file's corresponding object file, say xlog.o, as output from nm, and sorts it. 3. Intelligently diffs that against the concatenated output of the symbol table for each new object file. This output would be sorted just as the the single object file nm output was, but only after concatenation. Here, by intelligently I mean that we exclude undefined symbols. That way, it shouldn't matter if symbols go missing when, for example, Text section symbols from one file but show up in multiple other new files as undefined symbols, nor should it matter that we call functions like memcpy() from each new file that only appeared once in the old object file's symbol table. 4. Do a similar kind of intelligent diff with the -Wshadow ouput above, stripping out line numbers and file names as the first step of a pipline, using sed, sorting the orignal file's compiler output, and stripping, concatenating then sorting the new set of outputs. I think that after that, the output from the original file should be the same as the combined output of the new files, unless we messed up. If you have to give a previously static variable global linkage, then prima facie "you're doing it wrong", so we don't have to worry about that case - maybe you can argue that it's okay in this one case, but that's considered controversial. We detect this case because the symbol type goes from 'd' to 'D'. Of course, we expect that some functions will lose their internal linkage as part of this process, and that is output by the shellscript - no attempt is made to hide that. This test doesn't reduce the failure to a simple pass or fail - it has to be interpreted by a human. It does take the drudgery out of verifying that this mechanical process has been performed correctly though. I agree that C files shouldn't be split because they're big, but because modularising them is a maintainability win. I also think that 10,000 line C files are a real problem that we should think about addressing. These two views may seem to be in tension, but they're not - if you're not able to usefully modularise such a large file, perhaps it's time to reconsider your design (this is not an allusion to any problems that I might believe any particular C file has) . Anyone interested in this idea? I think that an actual implementation will probably reveal a few more problems that haven't occurred to me yet, but it's too late to produce one right now. On 6 September 2011 08:29, Peter Eisentraut <peter_e@gmx.net> wrote: > I was thinking about splitting up plpython.c, but it's not even on that > list. ;-) IIRC the obesity of that file is something that Jan Urbański intends to fix, or is at least concerned about. Perhaps you should compare notes. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Robert HaasДата:
Сообщение: Re: [PATCH] Log crashed backend's query (activity string)