Dealing with (Computer) Bugs in the Forest


Thursday, July 28, 2016, by Moe Pwint Phyu

Imagine you are a scientist with amazing data sets, trying to make a groundbreaking discovery. But first, you need to replicate the way an earlier scientist analyzed their data sets so that you can put your own experiment in context. You painstakingly replicate every step of the analysis, but then you run into bug after bug in your code. You finally figure out that you missed a crucial step in the data manipulations leading up to the statistical tests, because the earlier scientist forgot to mention it in their methods.

My mentors, Barbara Lerner and Emery Boose, are tackling this problem by helping scientists record how they transformed their raw data. That record is known as data provenance. They are developing software tools that capture data provenance mainly in R, a statistical programming language that is popular among scientists. So far they have produced an R package called RDataTracker and a Java program called DDG Explorer. RDataTracker captures how data is derived as an R script runs (a script is the code used to automate a task like data analysis), and DDG Explorer uses that captured information to draw a visualization.

The first two weeks of the summer program were spent familiarizing myself with the software tools and thinking from a user’s perspective about what would make the software easier to use. From the user's standpoint, I realized that it took multiple inconvenient steps to see the final visualization. First, I needed to execute an RDataTracker command in the R environment. Then, I needed to run the Java program and select the correct path to the stored data derivation file. It was hard to remember the path, and the process became tedious after doing it a few times.

So as a developer, I decided to integrate DDG Explorer into RDataTracker. Since the two programs are written in different languages, I explored several ways to connect them. I ultimately found a way to launch the Java program from R by calling the operating system.

[Figure: the workflow before and after the integration]

The next project I worked on was to help scientists with debugging. To debug, scientists usually execute their script line by line so that they can see where the bug lies. But DDG Explorer didn’t let you update the visualization step by step, which made debugging even more tedious than it had to be. So I implemented incremental drawing, the process of adding on to the existing drawing rather than redrawing from the beginning. To do this, I used socket connections, a way to relay information from R to Java. Along the way, I learned other valuable skills, such as reading other people’s code and fishing for the code I needed to modify among 10,000+ lines. Eventually, after much Googling, Stack Overflowing, and troubleshooting with my mentors, I was able to implement the feature.
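For readers curious what incremental drawing over a socket might look like, here is a minimal sketch of the receiving (Java) side. It is not the actual DDG Explorer code: the class name `IncrementalViewerSketch`, its methods, and the one-element-per-line message format are all made up for illustration. The idea is just that each message coming over the socket is appended to what has already been drawn, instead of the whole picture being rebuilt. In the real setup the sender would be R (base R's `socketConnection` and `writeLines` can play that role); here a small Java thread stands in for it so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the real DDG Explorer implementation.
public class IncrementalViewerSketch {
    // Elements drawn so far; new messages are appended, never replaced.
    private final List<String> drawn = new ArrayList<>();

    /** Adds one element to the existing drawing and returns the new total. */
    public synchronized int addElement(String element) {
        drawn.add(element);
        return drawn.size();
    }

    /** Returns a copy of everything drawn so far. */
    public synchronized List<String> snapshot() {
        return new ArrayList<>(drawn);
    }

    /** Accepts one client and appends each non-empty line it sends. */
    public void serveOnce(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    addElement(trimmed);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        IncrementalViewerSketch viewer = new IncrementalViewerSketch();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            int port = server.getLocalPort();
            // Stand-in for the R side: sends one made-up step per line.
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket(loopback, port);
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println("step 1: read data");
                    out.println("step 2: fit model");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            viewer.serveOnce(server);
            sender.join();
        }
        System.out.println(viewer.snapshot());
    }
}
```

Running `main` prints the list of steps in the order they arrived. Because the viewer only ever appends, stepping through a script line by line would add one element at a time to the picture rather than starting over.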

In addition to finding software bugs in my program and staring at the computer all day long, my partner Alex and I occasionally went into the forest to check whether the water sensors were working correctly at six different sites in the woods. On our quest, (real) bugs followed us like bodyguards. But with the help of bug spray, I mostly avoided getting too many bug bites. Triumphing over the bugs, I found plenty of ways to have fun in this summer program: biking with friends, taking nature walks, touring the Cabot Cheese and Ben and Jerry’s factories, playing intense badminton tournaments, and sometimes even doing balletminton (ballet+badminton). Overall, this has been a great summer filled with coding and more in the forest.

Bugs: 0   Me: 1