David Robinson

Data Scientist at Stack Overflow, works in R and Python.


A Million Lines of Bad Code

This is the story of some bad code I wrote.

Early in my undergraduate days, I was working on a Java program that had to read a 6 MB file (a bacterial genome in FASTA format) into a string. I did this with a loop that, on each iteration, concatenated the next line onto a string. It looked something like:

BufferedReader reader = new BufferedReader( new FileReader (file));
String line = null;
String text = "";

while ((line = reader.readLine()) != null) {
    text = text + line;
}


Building a string with a series of concatenations like this is extremely inefficient: Java strings are immutable, so every concatenation copies everything read so far into a brand-new string, and the total work grows roughly with the square of the file size. It took (no exaggeration) 40 minutes to read the file (here are a few better ways to do it). And this was a program that, once it had read the file, took maybe 10 seconds to run. For two days I actually worked that way: I’d make a change to the code, start the program running, and watch an entire episode of LOST while I waited. “Argh, made a mistake on line 12! Start again.”
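For the curious, one of those better ways is to append each line to a StringBuilder, which grows a single buffer instead of copying the whole string on every iteration. What follows is only a minimal sketch, not the program I actually wrote: the class name GenomeReader and the method readAllLines are hypothetical, and it reads from a small in-memory sample rather than the real genome file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class name, used only for this illustration.
public class GenomeReader {

    // Joins every line from the source into one string, dropping line breaks,
    // just like the original loop did (header lines included).
    static String readAllLines(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line); // appends in place; no full copy per line
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample standing in for the 6 MB FASTA file.
        String sample = ">seq1\nACGT\nTTGA\n";
        System.out.println(readAllLines(new StringReader(sample)));
        // prints ">seq1ACGTTTGA"
    }
}
```

To read an actual file, you would pass a FileReader for that file in place of the StringReader; the loop itself stays the same.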

After many repetitions, I thought “there must be a better way.” I discovered that I could write a loop in Perl that would read the genome in less than one second. (I wasn’t any better at Perl than I was at Java; I just got lucky.) So I wrote a Perl script that read in the file, combined it, and printed it out as a single line. Then I had my Java program call the Perl script through the command line, capture the printed output, and save that as the string.

If I still had that program, I would post it on my wall to ensure I never shame someone about their “bad code.”

Code Shaming

I was inspired to share this confession by today’s XKCD comic about “Code Quality”:

The whole comic is rather mean-spirited: the experienced programmer doesn’t provide any helpful advice, just hyperbolic analogies. But what strikes me most is how unrealistic that last line is: “Okay, I’ll read a style guide.” Would you react that way if someone had been so rude to you? Isn’t a more likely response “Well, that’s the last time I show you my code,” or even “Well, I guess I shouldn’t be programming at all”?

This is exactly how you end up with crises of reproducible research in science. There are many reasons scientists publish papers without sharing their code (none of them defensible), but high on the list is embarrassment: “my code is too ugly to share.” Code shamers aren’t helping!

A Million Lines of Bad Code

There’s a popular piece of writing advice that has many variations, but usually goes something like: “Everyone has a million bad words in them. Only once they’ve written those million words can they start writing well. So get writing!”

I was reminded of this by some insightful conversations on Twitter about the above XKCD:

If you’re an experienced programmer and you’re tempted to code-shame someone, try thinking back to your own million lines of bad code. Imagine someone had mocked you then, the way I’d like to mock myself for my Perl/Java “solution”. Would you have continued asking for help? Would you have ever gotten through your million lines?