### David Robinson

Data Scientist at Stack Overflow, works in R and Python.

# A Million Lines of Bad Code

This is the story of some bad code I wrote.

Early in my undergraduate days, I was working on a Java program that had to read a 6 MB file (a bacterial genome in FASTA format) into a string. I did this with a loop that concatenated each line onto a string, one iteration at a time. It looked something like:

BufferedReader reader = new BufferedReader(new FileReader(file));
String line = null;
String text = "";

while ((line = reader.readLine()) != null) {
    text = text + line;
}


Building a string with a series of concatenations like this is extremely inefficient. Java strings are immutable, so every concatenation copies the entire string built so far, and the total work grows roughly with the square of the file size. It took (no exaggeration) 40 minutes to read the file (there are a few better ways to do it). And this was a program that, once it had read the file, took maybe 10 seconds to run. For two days I actually worked that way: I’d make a change to the code, start the program running, and watch an entire episode of LOST while I waited. “Argh, made a mistake on line 12! Start again.”
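For comparison, here is a minimal sketch of one of those better ways: appending to a `StringBuilder` instead of concatenating strings. It is not the code I ended up with. The class name `GenomeReader` and the method `readAll` are made up for illustration, and the sketch assumes the whole file fits comfortably in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

class GenomeReader {
    // Hypothetical helper: joins every line from the reader into one string.
    // Like the original loop, it drops the line breaks between lines.
    static String readAll(BufferedReader reader) {
        StringBuilder text = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // append() writes into the builder's internal buffer instead of
                // creating a new copy of everything read so far on each pass.
                text.append(line);
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Calling `GenomeReader.readAll(new BufferedReader(new FileReader(file)))` would slot in where the loop above was. The builder's buffer still grows occasionally, but the total copying stays roughly proportional to the file size rather than to its square.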

After many repetitions, I thought “there must be a better way.” I discovered that I could write a loop in Perl that would read the genome in less than one second. (I wasn’t any better at Perl than I was at Java; I just got lucky.) So I wrote a Perl script that read in the file, combined it, and printed it out as a single line. Then I had my Java program call the Perl script through the command line, capture the printed output, and save that as the string.

If I still had that program, I would post it on my wall to ensure I never shame someone about their “bad code.”