New Ways to Search for Code

The way we get a few bits of code for a project we’re working on has changed a lot over the past 5-10 years.

In ancient times (circa 1995), code was handed down from one generation to the next. If you were a Programmer and you needed help with some code, you went to a Programmer II. If you were a Programmer II, you went to a Senior Programmer or Programmer III or Systems Architect or whatever self-aggrandizing title your organization uses to make people feel good about themselves, until eventually you would hit management, that impenetrable meme-barrier where practical knowledge ceases to exist.*

Now, aside from rare occasions when you’re dealing with an in-house code practice, that chain of knowledge isn’t really that important. When you need some code, you Google for it. Someone, somewhere out there, has done exactly what you need to do and has posted it to the web, and you can get to it without having to exit the confines of your office or pay a fealty fine to the next-highest ranked programmer.** When people come to my office looking for code, a lot of times they get to watch me Google for it.

Though I’m a bit of a Google ninja and I can write regexp queries to get Google to begrudgingly pull gold from its vast array of rubbish, there are a couple of sites out there that make searching for code a bit easier.

The first, naturally, is Google Code Search.

Google Code Search crawls through public SVN and CVS repositories, as well as compressed files (zip, tar, 7z, bz2, etc.), so you can search through all of that code in one convenient place. You can use POSIX extended regular expression syntax along with some other keywords to narrow your search. For example, if you want to see some code by frustrated developers, try this:

insert a swear word here
or
"in case some idiot"

Voilà - you’re not the only one. Here are some handy operators:

  • The lang: operator, which restricts by programming language (e.g., lang:"c++", -lang:java, or lang:^(c|c#|c\+\+)$)
  • The license: operator, which restricts by software license (e.g., license:apache, -license:gpl, or license:bsd|mit)
  • The package: operator, which restricts by package URL (e.g., package:"www.kernel.org" or package:\.tgz$)
  • The file: operator, which restricts by filename (e.g., file:include/linux/$ or -file:\.cc$)

A minus (-) before an operator excludes anything that matches its argument.
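
If you find yourself typing these by hand a lot, the operators compose mechanically. Here’s a minimal sketch in Python; build_query and its parameters are names I made up for illustration, not part of any Google tool, and all it does is glue a query string together for you to paste into the search box.

```python
def build_query(terms, include=None, exclude=None):
    """Join search terms with operator restrictions into one query string.

    include and exclude map operator names (lang, license, package, file)
    to their arguments; excluded ones get a leading minus.
    """
    parts = []
    for op, arg in (include or {}).items():
        parts.append(f"{op}:{arg}")
    for op, arg in (exclude or {}).items():
        parts.append(f"-{op}:{arg}")
    parts.append(terms)
    return " ".join(parts)


# Hypothetical example: C code that calls strcpy, skipping GPL-licensed packages.
print(build_query("strcpy", include={"lang": "c"}, exclude={"license": "gpl"}))
```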

Because Google Code Search also indexes code inside archive files, be careful - don’t zip your application directory as a backup and leave it in a web share with directory browsing allowed unless you want it showing up in Google Code Search. To see if you have any exposed code, try package:"your.url.here".
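
You can also check from the inside before a crawler does it for you. The following is a rough sketch, assuming you have shell access to the machine serving the share; find_archives, the extension list, and the /var/www path are all placeholders of mine, so adjust them to your own setup.

```python
import os

# Extensions from the list above, plus a couple of common variants (my addition).
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".7z", ".bz2", ".tgz", ".gz")


def find_archives(web_root):
    """Walk web_root and return the paths of files that look like archives."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            # Compare case-insensitively so BACKUP.ZIP is caught too.
            if name.lower().endswith(ARCHIVE_EXTENSIONS):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # "/var/www" is a placeholder; point it at your own web share.
    for path in find_archives("/var/www"):
        print(path)
```

Anything it prints is worth a second look: either move it out of the share or make sure it’s something you meant to publish.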

You can also use Google Code Search to look for code vulnerabilities, so it’s become a bit of a hacker tool. For example, if you wanted to look for SQL injection opportunities in PHP code, you could do something like so:

lang:php mysql_query\(.*\$_(GET|POST).*\)
or
lang:php "WHERE username='$_"

Yikes.
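
If you’d rather point that first pattern at your own code than at everyone else’s, you can try it locally. This is just a sketch using Python’s re module: I’m assuming the pattern is written with the literal parentheses and dollar sign backslash-escaped, the PHP lines are invented for the example, and flag_lines is a made-up helper name.

```python
import re

# Assumption: the first query above, with the literal parentheses and the
# dollar sign backslash-escaped so the regex engine reads them literally.
PATTERN = re.compile(r"mysql_query\(.*\$_(GET|POST).*\)")

# Made-up PHP lines for illustration only.
samples = [
    'mysql_query("SELECT * FROM users WHERE id=" . $_GET["id"]);',
    'mysql_query("SELECT * FROM users WHERE id=" . intval($id));',
    '$name = $_POST["name"];',
]


def flag_lines(lines):
    """Return the lines where request data appears inside a mysql_query() call."""
    return [line for line in lines if PATTERN.search(line)]


for line in flag_lines(samples):
    print(line)
```

Only the first sample gets flagged. A hit is a hint, not a verdict: the pattern can’t tell whether the value was sanitized on an earlier line, and it misses anything that copies $_GET into a variable first.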

Google Code Search is a great tool. Don’t let the regexp syntax scare you off - if you haven’t done it before, it’s really a lot easier than it looks. Google also offers a REST API for Google Code Search, though I’m having a hard time imagining why you would need to use it.

Another site to check out is Krugle. It’s also a code searching tool, but it focuses exclusively on open source repositories and code (SourceForge, Apache, JavaDocs, etc.). It has a very nice Web 2.0 interface and has tools to add search capabilities to Firefox and IE. It even has a beta plug-in for Eclipse. Imagine having a window in Eclipse where you could search repositories of code and drop code right into your project, whether it be Java or PHP or whatever. I’m starting to tear up at the thought of it.

It also gives you a bit more than just the code, with links to Projects, Tech Pages, and snippets from Safari online books.

Between the two, Krugle has the nicer interface, and Google Code Search has more material to search through. I’ve added both to my Firefox search engines, and I generally start with Krugle and go to Google Code Search if I don’t have any luck (unless it’s for a language like VB that isn’t used a whole lot in open source projects).


* A notable exception being any managers I work for who happen to be reading this.

** With the advent of Google, my fealty fines have really headed south. I used to be able to charge anywhere from a Coke to a slice of pizza for my coding wisdom. Now I can’t even get a decent Eye-Roll or Mildly Condescending Sigh out before they whiz off to get the answer on their own. Forget about my Sardonic Sneer. My Sardonic Sneer, which I practiced in front of a mirror and could make an inexperienced programmer curl into a fetal position, is now only useful when confronted with the overzealous sales people in those portable junkware booths at the mall. Sigh. Sometimes technology sucks.