One code to rule them all — A story about code quality

Lauro Bravar
5 min read · Apr 3, 2019

Let’s start from the very beginning…

I studied Computer Engineering in college, and there I was taught throughout many courses how to code properly. We went through code design, testing, program readability, and software quality in general. The Shire in its full splendor.

I worked with my classmates on several projects where we were “forced” to apply the techniques we had learned, such as unit testing, diagramming the software architecture, commenting as if we were writing a novel and so on. But for the most part we learned by getting marked down for not commenting enough, or by being called into meetings with professors because the code we delivered was unreadable.

From the third year of college onwards, once I had internalized the most important aspects of good coding, my coding life was okay. But here comes Data Science…

The glooming appeal of Data Science

I took a job in Data Science (which is the major I did in college) at the suggestion of a teacher. The code from this teacher was to me, and still is, like the Holy Grail to Indiana Jones. Fully documented, commented just enough (not so little that it couldn’t be understood, not so much that it was overwhelming), perfectly ordered in the most logical way, and what struck me the most: it was so elegant!

He usually coded in Emacs, which, in case you are unfamiliar with it, is probably the most difficult yet most powerful code editor out there. He also said (and I’m pretty sure he was exaggerating here) that he learned a new programming language every year, and I don’t mean understanding the language’s sample scripts, I mean in-depth knowledge and development skills in that language.

So this teacher called and said to me:

“Hey man, I’ve finished a contract for a project in this company and they’ve asked me for someone to continue my work. I think you would love it! Are you interested?”

He explained to me what the project was about, and I just heard the words “branch and bound search”, “planning” and “optimization”, and immediately said “I’m in!”

The funny part is that during this conversation he mentioned one thing, which I really didn’t pay much attention to:

“They have a couple of guys working in their algorithms team, but their code is a mess because they don’t even know how to program, they are not computer engineers!”

After a year and a half of working with the most wonderful and incredibly talented team of Data Scientists, I know what he meant!

I should also mention, this same professor has this theory that when we leave college after earning the computer engineering degree, we still don’t know how to code, which I also agree with to some extent.

My first big shock when reading another person’s code came from a piece of Python forecasting code that a teammate (and now friend) had written for this same project. At first sight the code seemed normal, just a couple of functions declared. They seemed too long and barely commented (that means one comment per function), but not too scary. But then I started scrolling down…

Code hunting me down!

Oh boy, here came the Loch Ness monster, the Kraken and the Bogeyman all rolled into one terrifying creature. An endless stream of code, barely commented (I think there was about one comment every 50 to 70 lines), no functions (yes, no functions at all after those first two), ten levels of indentation (that means a for-loop inside a while-loop inside an if-else inside a for-loop inside an if-else inside a while-loop inside a for-loop inside a for-loop inside an if-else inside an if-else, and I’m not exaggerating) and variables named “a”, “b”, “x”, “var”, “this” (with the same name reused for a different variable 100 lines later). Elegance-wise, let’s just say that there were some while True loops that were better off not written at all…
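To make the complaint concrete, here is a small made-up sketch (not my teammate’s actual code; every name and number in it is hypothetical). The first function shows the style in miniature: one-letter names and one more level of indentation per condition. The second does the same job with descriptive names, a tiny helper and a guard clause.

```python
# Hypothetical example: none of these names come from the real project.

def total_valid_sales_nested(a):
    # "Before": cryptic names and a new indentation level for every check.
    x = 0
    for b in a:
        if b is not None:
            if len(b) > 0:
                for var in b:
                    if var is not None:
                        if var >= 0:
                            x = x + var
    return x


def is_valid_sale(amount):
    """A sale counts only if it is present and non-negative."""
    return amount is not None and amount >= 0


def total_valid_sales(weekly_sales):
    """'After': same result, descriptive names, a guard clause, flat loops."""
    total = 0
    for week in weekly_sales:
        if not week:  # skips both missing (None) and empty weeks
            continue
        total += sum(amount for amount in week if is_valid_sale(amount))
    return total


# Both versions agree on a small made-up data set.
sample = [[10, None, 5], None, [], [-3, 7]]
assert total_valid_sales_nested(sample) == total_valid_sales(sample)
```

Nothing clever happens in the second version. It simply lets a reader understand each piece without keeping six open conditions in their head.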

One code to rule them all

The code was like the text in the ring above, only understandable if you know the Dark Tongue of Mordor…

The best part is that the code worked perfectly, and that what it represented, the functionality of the code, was brilliant (just like the ring)! I need to mention that the fellow who programmed it is a total genius; he used to lose me quite frequently when we spoke about technical matters, especially Neural Networks!

We had many conversations about his “style of coding”, and he put his heart into making it better, which he absolutely did! And I hope he learned from my constant, irritating code remarks at least 1% of what I learned from him.

Lastly, before moving on, I need to mention that I work in a product development department, as opposed to a project-driven department. This means that code will be reused and will have to be understood by other people, no matter what, which is why good quality code is so critical.

The point here is that when you work on a team, or when your code is going to be used, read or understood by any other human being, it needs to have a minimum degree of quality. And for the most part, data scientists who don’t come from a software-centric (or programming-centric) environment don’t know how to write good quality, reusable code.

Some of the things I find obvious (200-line functions should be split, a variable shouldn’t be named “var”, declaring 27 global variables at the top of an unparameterized script is probably not a good idea, and so forth) are neither obvious nor intuitive to many people without the aforementioned software quality background.
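As a rough illustration of those points, here is a minimal sketch of what “parameterized” can look like in Python. Everything in it is an assumption made up for the example: the moving-average “model”, the ForecastConfig class and the function names have nothing to do with the real project. The idea is only that settings travel in one small object instead of a wall of globals, and that each function does one thing.

```python
# Hypothetical sketch: names and the moving-average "model" are invented for illustration.
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ForecastConfig:
    """Settings that would otherwise live as global variables."""
    window: int = 3   # number of past observations to average
    horizon: int = 2  # number of future steps to predict


def clean_history(history):
    """Drop missing observations."""
    return [value for value in history if value is not None]


def moving_average_forecast(history, config):
    """Predict config.horizon steps, each the mean of the last config.window values."""
    values = clean_history(history)
    if len(values) < config.window:
        raise ValueError("not enough observations for the chosen window")
    predictions = []
    for _ in range(config.horizon):
        next_value = mean(values[-config.window:])
        predictions.append(next_value)
        values.append(next_value)  # feed the prediction back in for the next step
    return predictions


forecast = moving_average_forecast(
    [10, None, 20, 30], ForecastConfig(window=3, horizon=2)
)
```

Running the same function with a different window is now a one-argument change rather than a hunt through global state, and each small function can be tested on its own.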

Summing up

I think that a lot of the bad quality code in the field of Data Science comes from all those notebook tutorials, which are very good for learning the techniques being taught but really do a number on the coding standards of people who are new to code. In fact, a lot of the things I listed here as “please don’t do this” show up in the best Data Science tutorials and online courses. And I understand why: that code is not meant for production, it’s meant for teaching.

Also, the field of Data Science is full of people with very different backgrounds: mathematicians, physicists, all sorts of engineers and even economists. This makes it even harder to maintain a good coding standard, since they all learn from different places. I bet this happens in some other fields too.

I truly wish there was…

One code to rule them all, One code to find them, One code to bring them all and in the darkness bind them.

And by the darkness I mean good quality of course :)

If you want to know how to govern good quality code like the one ring governed the rest of the rings, check this out!

Lauro Bravar

Curious homo sapiens sapiens who reads, codes and writes. I work as a data scientist but I strive to acquire knowledge from as many fields as possible.