7 Reasons Why Scientific Software Is Not Well Designed

Let's discuss some of the main reasons why software developed in academia and research centers is so poorly designed and find…

Photo by Markus Spiske on Unsplash

Due to changes in Medium.com policy concerning non-members reads, implemented in Sep 2023, this post is now freely available on geocorner.net: https://www.geocorner.net/post/7-reasons-why-scientific-software-are-not-well-designed

Introduction

I have wanted to write about this topic for some time, but I must confess I’ve been procrastinating a little. Although I don’t have a formal Computer Science background myself (I have a civil engineering degree), I’ve been working in the IT industry for more than 15 years. During this time, I’ve seen some of the major trends and best practices used in software development to improve software quality.

Some years ago I took a leave from my former position as IT manager at a Brazilian government agency to start my doctorate in data science applied to remote sensing in France. Upon arrival, I was asked to study a specific question related to water detection from satellite images. To help with my initial analysis and to avoid starting from scratch, they shared with me a previous student’s code that gathered some of the main ideas but needed, as they said, some "improvements". That piece of code was part of a bigger processing chain meant to produce water quality maps at a European space agency.

Considering I was a complete newbie in the remote sensing field, I first thought, GREAT! I will not be starting from scratch, so it’s maybe half of the work done… but I was wrong.

Well, the code they shared with me was written in Python, like most code written in academia nowadays, and had no documentation, as "expected". It took me a long time just to understand what the input parameters should be. When I finally got it to work, the processing time for a really small sample area was… 10 minutes. I thought to myself: Wow! That’s rocket science! I imagined the amount and complexity of all the mathematical operations being done "under the hood" to solve that problem.

Expectation vs Reality - Left: Photo by WikiImages on Pixabay; Right: Photo by Hello I'm Nik on Unsplash.

After this first glimpse, I started my reverse engineering (thanks again to the lack of documentation) to understand all of the details. I took a paper notebook and started studying it… line by line, loop by loop, operator by operator. When I got to the main function (why use classes if we can do it procedurally?), the one I expected would unveil the answers to all my questions, I got a big surprise! It had exactly 2148 lines of code. A single giant function. In places the same code block was repeated again and again, which is prone to bugs. No rules, no guidelines, just that… a single giant function.

I am not giving this example (a very real one) to say that the code written in academia is always crap. No, it is not. But it needs to improve, and fast, otherwise tech companies will take the lead on research over traditional universities in the near future. Research needs data. And the amount of data that has to be analyzed nowadays cannot be handled manually in spreadsheets, nor by software that struggles to produce reliable results. But what surprised me the most was that code like that was installed and running on a computing cluster at a European space agency.

And that brings us back to the main topic of this post. Why is it so difficult to have scientific software that is well designed, well documented and well coded? I did some research, and here I share the main points.

1- Researchers are not software engineers

Photo by RF._.studio on Pexels

The first and most obvious reason is that researchers are very skilled in their own fields, but they are not software engineers. And most of the software developed in academia is not written by programmers, but by the researchers themselves, research groups or students. So, important concepts such as object orientation, testing, design patterns and many others are left behind because they are not taught in science courses.
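To make one of these concepts concrete, here is a minimal sketch of what "testing" can look like for a small piece of scientific code. It is purely illustrative and is not the code described in the introduction: the function names, the normalized-difference formula and the threshold are all assumptions chosen for the example.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total


def water_mask(green, nir, threshold=0.0):
    """Flag pixels whose normalized difference exceeds the threshold.

    green, nir: equal-length sequences of reflectance values
    (hypothetical inputs, assumed for this example).
    """
    if len(green) != len(nir):
        raise ValueError("green and nir must have the same length")
    return [normalized_difference(g, n) > threshold for g, n in zip(green, nir)]


def test_water_mask():
    # First pixel is brighter in green than in NIR, the second is the opposite.
    assert water_mask([0.3, 0.1], [0.1, 0.4]) == [True, False]
    # An all-zero pixel must not raise ZeroDivisionError.
    assert water_mask([0.0], [0.0]) == [False]


test_water_mask()
```

Two short, documented functions and one test are already enough to catch a division by zero before it reaches a processing chain, which is much harder to guarantee inside a single function of a couple of thousand lines.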

2- Money

Photo by Senad Palic on Unsplash

Lack of money is always an important constraint, everywhere. In this case, researchers don’t have a specific budget set aside for software development. That’s why most scientific software is developed by students receiving scholarships or grants. And even when there are specific positions for IT professionals, salaries in academia are normally lower than those in the private sector, especially compared to the growing IT market, so it is difficult to keep talented professionals.

Still on the subject of money, contracting tech companies to outsource the job would also be financially prohibitive. And there are other issues, as discussed in reason #6.

3- Focus on papers, not code

Photo by Annie Spratt on Unsplash

Software is not the end goal. Scientists develop software for their own use and to get the results for their articles, period. Usability and design are not even considered in the project. In most cases, there is not even a "software", but just a prototype that did the job… once. The problem arises when this prototype is published as software, without proper testing, documentation, etc.

Normally, the need to reproduce the results in another area or with different inputs using the same prototype is not foreseen "a priori". Even when the software is intended to be used by other students or research groups, these fundamental topics are left behind and the prototype is used "as is".

4- Limited duration contracts

Photo by Nathan Dumlao on Unsplash

As software development is not their final goal, there are no permanent IT teams to take care of the software lifecycle. Software is usually funded from grants for specific research projects and maintained on a collaborative basis. But technology evolves continuously. I’ve come across several scientific software packages that don’t work anymore because they received no updates and the environment they need can no longer be reproduced.
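One inexpensive habit that helps with the environment problem is to record, alongside the results, which package versions were actually used. The sketch below shows one possible way to do that with Python's standard library. The function names and the example package list are hypothetical, and this is only a starting point, not a complete reproducibility solution.

```python
import sys
from importlib import metadata


def snapshot_environment(packages):
    """Return the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": sys.version.split()[0], "packages": versions}


def to_requirements(snapshot):
    """Format the installed packages as pinned 'name==version' lines."""
    return [
        f"{name}=={version}"
        for name, version in sorted(snapshot["packages"].items())
        if version is not None
    ]


if __name__ == "__main__":
    # The package names here are only an example; list the ones your code imports.
    snapshot = snapshot_environment(["numpy", "scipy"])
    print("Python", snapshot["python"])
    print("\n".join(to_requirements(snapshot)))
```

Saving this output next to a paper's results gives whoever inherits the code a concrete set of versions to try to rebuild.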

5- Lack of proper project management

Photo by Jo Szczepanska on Unsplash

Sometimes, the idea of developing new scientific software is more ambitious and not confined to a single study, so a bigger project is prepared. That would solve some of the aforementioned issues. The problem is that, in such situations, team leaders are usually the researchers themselves, not experienced project managers, so they are not trained in managing the time, quality or scope of the project. Parts of the software are then written by different students, with different backgrounds, without any software development process in place. Without guidelines, it is difficult to integrate the pieces afterwards.

6- No commercial interest

Photo by NeONBRAND on Unsplash

The "money problem" would be solved if there were some commercial interest in the software being developed. That’s the case in very specific situations where an idea can impact an industry. However, as a rule of thumb, scientific software is ultra-specific to certain academic areas, and the user base is really limited. That leaves little commercial incentive for tech companies to explore the market and compete to increase quality.

7- Difficulty for the programmers to understand the problem

Photo by KOBU Agency on Unsplash

Most of the previous reasons focus on the lack of computer science specialists involved in the development of scientific software. However, there is another point that I would like to highlight. Scientific software generally solves new problems in specific fields of knowledge, and it is not that easy to find programmers who are able to understand the science behind it and, at the same time, master all the necessary IT concepts. I have already seen scientists disappointed with IT professionals who were hired to bridge science and programming, because they were not able to keep up with the research side.

Conclusion

Well. As a conclusion, I would like to say that this is indeed a big topic and I don’t mean to answer all the philosophical questions in this short article. I want it to serve as a starting point for the discussion and the pursuit of solutions. Will we ever have a category of professionals able to cope with the challenge of developing good scientific software? Should data scientists fill this gap?

Let me know your thoughts on that, and let’s start a good and productive discussion on the subject.

See you in the next post.

Stay Connected

If you liked this article and want to continue reading/learning these and other stories without limits, consider becoming a Medium member. I’ll receive a portion of your membership fee if you use the following link, for no extra cost.

Join Medium with my referral link – Maurício Cordeiro

