Discussion: I want to search my project source code
I have a lot of code -- millions of lines at this point, written over the last 5 years. Everything is in a bunch of nested folders. At least once a week, I want to find some code that uses a few modules, so I have to launch a find + grep at the top of the tree and then wait for it to finish. I wonder if I could store our source code in a postgresql table and then use full text searching to index. Then I hope I could run a query where I ask for all files that use modules X, Y, and Z. I'm looking for something sort of like the locate utility, except that instead of building a quickly-searchable list of file names, I want to be able to search file contents also. Matt
Matthew Wilson <matt@tplus1.com> writes: > At least once a week, I want to find some code that uses a few modules, > so I have to launch a find + grep at the top of the tree and then wait > for it to finish. Personally I use glimpse for this. It's a bit old and creaky but it performs wonders. There might be something better out there by now. I wouldn't recommend trying to use a standard FTS to index code: code is not a natural language and the kinds of searches you usually want to perform are a lot different. As an example, I glimpse for "foo" when looking for references to a function foo, but "^foo" when seeking its definition (this relies on the coding conventions about function layout, of course). An FTS doesn't think start-of-line is significant so it can't do that. regards, tom lane
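A minimal sketch of the distinction Tom describes, written in Python rather than glimpse. It assumes the C layout convention where a function's name starts the line at its definition. The sample source text and the helper name `matching_lines` are made up for illustration and do not come from the thread.

```python
import re

def matching_lines(source: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose text matches the regex."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if regex.search(line)
    ]

# Hypothetical source file that follows the "name at start of line" convention.
SAMPLE = """\
static int
foo(int x)
{
    return x + 1;
}

int
main(void)
{
    return foo(41);
}
"""

# "foo" finds every mention; "^foo" finds only lines that begin with foo.
references = matching_lines(SAMPLE, r"foo")
definitions = matching_lines(SAMPLE, r"^foo")
```

A word-based full text index normally keeps the tokens but not their position relative to the start of a line, which is why the second query has no direct equivalent there.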
openfts.sf.net is the tool for you. It even has example scripts for indexing and searching a file system. Oleg On Sat, 27 Oct 2007, Matthew Wilson wrote: > I wonder if I could store our source code in a postgresql table and > then use full text searching to index. Then I hope I could run a query > where I ask for all files that use modules X, Y, and Z. Regards, Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
Matthew Wilson wrote: > I have a lot of code -- millions of lines at this point, written > over the last 5 years. Everything is in a bunch of nested folders. > > At least once a week, I want to find some code that uses a few modules, > so I have to launch a find + grep at the top of the tree and then wait > for it to finish. > > I wonder if I could store our source code in a postgresql table and > then use full text searching to index. Then I hope I could run a query > where I ask for all files that use modules X, Y, and Z. DBMSs are great tools for the right job, but IMO this is not the right job. I can't see how a database engine, with all its transactional overhead and many other layers, will ever beat a simple grep performance-wise. I've used Eclipse for refactoring, but having done it once, I'm sticking with grep. -- Guy Rouillier
On Oct 28, 2007, at 12:59 AM, Guy Rouillier wrote: > DBMSs are great tools for the right job, but IMO this is not the > right job. I can't see how a database engine, with all its > transactional overhead and many other layers, will ever beat a > simple grep performance-wise. I've used Eclipse for refactoring, > but having done it once, I'm sticking with grep. This is exactly what cscope is good for. http://cscope.sourceforge.net/ I've used it since the early 90's. I do level 3 support for really big companies. If you are an emacs fan, it's hooked into it as well. You want to use the -q option. If it is a million lines of code, it's going to take a while. It pseudo-parses the code (some tricky constructs will confuse it) and builds a very simple database file. I think it uses Berkeley's DB file. After that, finding all the occurrences of foo takes a few seconds. If you want to find just definitions (like where is foo defined), then use ctags or etags. There is exuberant ctags here: http://ctags.sourceforge.net/ Perry Smith ( pedz@easesoftware.com ) Ease Software, Inc. ( http://www.easesoftware.com )
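The build-once, query-many idea can be sketched in a few lines of Python. This is not cscope's actual database format or algorithm; it is only an illustrative inverted index from identifier to the files that mention it, which is enough to answer the original "all files that use modules X, Y, and Z" question. The names `build_index` and `files_using_all` and the sample file contents are hypothetical.

```python
import re
from collections import defaultdict

# Rough notion of an identifier; a real tool would tokenize per language.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each identifier to the set of file paths that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        for token in set(IDENTIFIER.findall(text)):
            index[token].add(path)
    return index

def files_using_all(index: dict[str, set[str]], names: list[str]) -> set[str]:
    """Return the files that mention every one of the given identifiers."""
    if not names:
        return set()
    # .get() avoids creating empty entries for names that were never seen.
    per_name = [index.get(name, set()) for name in names]
    return set.intersection(*per_name)

# Made-up in-memory "tree"; a real run would read the files from disk once.
FILES = {
    "app/report.py": "import os\nimport csv\nimport json\n",
    "app/export.py": "import csv\nimport json\n",
    "lib/paths.py": "import os\n",
}

index = build_index(FILES)
all_three = files_using_all(index, ["os", "csv", "json"])
csv_and_json = files_using_all(index, ["csv", "json"])
```

The slow part (reading every file) happens once in `build_index`; each later query is a handful of set intersections, which is the trade the thread describes for locate-style tools.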
Perry- Does cscope support PHP? Thanks for the link M-- ----- Original Message ----- From: "Perry Smith" <pedz@easesoftware.com> To: "Guy Rouillier" <guyr-ml1@burntmail.com> Cc: <pgsql-general@postgresql.org> Sent: Sunday, October 28, 2007 10:25 AM Subject: Re: [GENERAL] I want to search my project source code > This is exactly what cscope is good for. > > http://cscope.sourceforge.net/
On Oct 28, 2007, at 9:41 AM, Martin Gainty wrote: > Perry- > > Does cscope support PHP? I don't think so. Exuberant ctags supports a lot of languages, but it does not do references (I think) -- just definitions.
Tom Lane wrote: > I wouldn't recommend trying to use a standard FTS to index code: > code is not a natural language and the kinds of searches you usually > want to perform are a lot different. As an example, I glimpse for > "foo" when looking for references to a function foo, but "^foo" > when seeking its definition (this relies on the coding conventions > about function layout, of course). An FTS doesn't think start-of-line > is significant so it can't do that. +1. The nice thing about a tool that understands code is that you can query it in ways that make sense for code. For example, I can search for "all files that include foo.h" or "all callers of function bar" or "all occurrences of the symbol baz". I use cscope for this, which integrates nicely into my text editor (vim), and others have told me they use kscope, which puts it inside a nice GUI window, if you care about such things. -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)
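As a rough illustration of the first of those queries, "all files that include foo.h", here is a regex approximation in Python. It is not how cscope answers the query (cscope works from its own parsed database); the function name `files_including` and the sample files are assumptions made up for this sketch.

```python
import re

def files_including(files: dict[str, str], header: str) -> list[str]:
    """Return paths whose text has an #include line naming the header."""
    pattern = re.compile(
        r'^\s*#\s*include\s*[<"]' + re.escape(header) + r'[>"]',
        re.MULTILINE,
    )
    return sorted(path for path, text in files.items() if pattern.search(text))

# Hypothetical file contents keyed by path.
SOURCES = {
    "a.c": '#include "foo.h"\nint a;\n',
    "b.c": '#include <stdio.h>\n/* mentions foo.h in a comment */\n',
    "c.c": '  # include <foo.h>\n',
}

includers = files_including(SOURCES, "foo.h")
```

A plain word search for foo.h would also return b.c because of the comment; anchoring on the directive at the start of a line is the kind of code-aware condition a natural-language index does not express.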