Opinion

In a recent group discussion, I watched data scientists argue over which machine learning framework is better: PyTorch or TensorFlow. What I found funny is that I've heard versions of this debate countless times before. Python or R? MATLAB or Mathematica? Windows or Linux? As I've learned more and more programming languages over the years, I've come to believe the question shouldn't be which language or framework is the best, but rather which is best for the task at hand.
So what’s the best language to use for creating a machine learning prototype? If you ask five data scientists this question, you might get five different answers. To answer this question, you should think about what language you are most comfortable in, as well as what libraries and models are available in that language so that you can create a machine learning prototype as quickly as possible.
Programming Languages as a Notation vs. Technology
We often view programming languages as a form of technology. All general-purpose languages accomplish the same high-level task of instructing a machine, and in that sense they may be considered equivalent as forms of technology. Yet even though they accomplish the same task, one language might let you perform it faster, or express complex solutions that would be painful in another. In other words: viewed as a form of technology, languages are logically equivalent; viewed as a notation, one language may be much better than another for the task at hand.
An analogy in mathematics is the concept of calculating the derivative of a function. To do this task, a mathematician needs to use a form of notation (Figure 1).

[Figure 1: Leibniz's notation (left) and Newton's dot notation (right) for the derivative of a function]
On the right side of Figure 1, we have the dot notation that Sir Isaac Newton developed. Gottfried Wilhelm Leibniz developed the notation on the left at roughly the same time. If we think about these notations as technology, they are logically equal, because both allow us to calculate the derivative of a function.
However, when we think about them as notations, the two are not necessarily equal. One notation may be better for a certain task. For example, Leibniz's notation makes solving differential equations by separation of variables, integration by parts, and other approaches far more intuitive than Newton's notation does. As such, we might consider Leibniz's notation more powerful than Newton's for solving differential equations (Figure 2).

[Figure 2: Solving a differential equation by separation of variables, written in Leibniz's notation]
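To make that concrete, here is a quick sketch (my own illustration, not reproduced from the figure) of separation of variables for a generic equation written in Leibniz's notation:

\[
\frac{dy}{dx} = g(x)\,h(y)
\quad\Longrightarrow\quad
\frac{dy}{h(y)} = g(x)\,dx
\quad\Longrightarrow\quad
\int \frac{dy}{h(y)} = \int g(x)\,dx
\]

The notation invites you to treat \(dy\) and \(dx\) as quantities you can move to opposite sides of the equation and integrate. In Newton's notation, the same equation reads \(\dot{y} = g(x)h(y)\), and that mechanical "split the derivative" step has no obvious counterpart.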
If there are various levels of power associated with the usefulness of two different forms of math notation, are there levels of power associated with programming languages? Absolutely!
Optimize the Time it Takes to Create an Initial Prototype
In every debate over which programming language is the best, the computational performance of the language always seems to come up. Take, for example, this article from Peter Xie showing that Python can be 45,000 times slower than C. Does C being faster than Python mean it's a better language? Not necessarily. While I can pull out a few tricks to speed up my Python code, I think people are missing a really important point: developer and Data Scientist time is far more expensive than machine time.
People often get hung up on trying to make their code perfectly optimized on their first or second iteration. Instead of optimizing for machine time, I believe that people should be optimizing the amount of time it takes them to write the code for their proof of concept and initially deliver value to their customers.

While Python may be computationally slower than C, it takes me fewer lines of code, and ultimately less time, to create a machine learning model in Python than in C. Little things like dynamic typing over static typing make a real difference in speed when you are prototyping. But this doesn't necessarily mean Python is the right language for the job, because many of the standard languages that Data Scientists use have similar capabilities and abstractions. It actually seems to come down to the libraries more than the core languages.
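To illustrate the point about brevity, here is a minimal sketch of a first-pass model in Python with scikit-learn (the bundled iris dataset is used purely as a stand-in for real data):

```python
# A minimal prototype: load data, split, fit, score.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)  # start with the simplest model
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```

The equivalent in C would mean hand-rolling (or binding to) the data structures, the solver, and the evaluation loop before you ever saw a result.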
If you’re like me, you might be doing a lot of Natural Language Processing (NLP). My experience is that Python has the most powerful libraries for NLP. Libraries like scikit-learn, NLTK, spaCy, gensim, textacy, and many others allow Data Scientists to quickly clean, pre-process, extract features, and model on text data. But if I’m trying to do data profiling or time series analysis, I find that the packages available in R are more powerful. However, if I need to use more sophisticated time series methods like recurrent neural networks, I often find myself back in Python because its libraries are great for deep learning.
When you begin to use complex machine learning architectures like deep learning, the time it takes to train your model usually increases rapidly. But training time is becoming less of an issue as compute power continues to increase. In fact, I would argue we are creating more complicated machine learning models largely because so much compute power has become available and organizations are racing to use every bit of it.
Ultimately, when selecting a language or library for your project, I recommend choosing one that will optimize the time you spend creating your initial prototype. When someone learns to code in a language, they actually start to think in that language. I've found this true myself. I primarily code in Python today, and when I decide how to solve a problem, I think in terms of how to solve it with Python. Looking back on my career as I've learned new languages, I've realized that I thought about problems completely differently depending on the language I was working in at the time.

Factors Affecting Prototyping Speed
One main factor that affects prototyping speed is the time it takes to build model architectures. I've generally gravitated towards libraries such as scikit-learn because they require very little work to put a model architecture together, which lets me build an initial model quickly. I generally start with the simplest model and work my way up in model complexity. Only after I've worked through simple models will I begin building more complicated deep learning models. Even at that point, I'm still looking for abstractions that make assembling a model architecture as quick as possible, using tools like Keras and TensorFlow.
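As a sketch of what those abstractions buy you (the 20-feature input and layer sizes here are placeholders, not a recommendation), a small network in Keras takes only a handful of lines:

```python
# A small binary classifier in Keras: the high-level API keeps
# architecture code short, even for deep learning.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                     # 20 input features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()  # training is then just model.fit(X_train, y_train, ...)
```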
Another factor that affects prototyping speed is the amount of data needed to train an initial model. Oftentimes we can start with a simple model that doesn't require much data. But sometimes the simple model doesn't meet our accuracy requirements, so we need a more complicated model, which usually requires more data. When acquiring or creating that data might take a month, prototyping slows down considerably.
To speed up the process of acquiring/creating more data, you might try to use transfer learning techniques or pre-trained models. While you might be able to prototype more quickly with these techniques or models, you might also be introducing some serious ethical concerns into your prototype.
Ethical Concerns When Quickly Prototyping
As Data Scientists, we are responsible for ensuring that our models do not produce biased results against a protected class. Amazon learned this lesson the hard way. If you are using a pre-trained model, you likely don't have access to the dataset it was trained on. Even if you have the data it was trained on, you probably don't know how the data was cleaned, pre-processed, or sampled for training and testing. Each one of these steps can fail to remove inherent bias in the dataset. Or worse, it could introduce or even amplify bias. I can only speak for myself, but I'm not confident that I can remove bias from a pre-trained model during the transfer learning process. As such, I'm quite cautious about the use of this method, but when I'm confident the risks associated with a business use case are limited, I've let websites like Hugging Face dictate which library I use.
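For illustration only (assuming the transformers package is installed), this is how little code it takes to stand up a pre-trained model through Hugging Face's pipeline API, and why it's so tempting despite the concerns above:

```python
# A pre-trained sentiment model in two lines. Whatever bias exists in
# the underlying model and its training data comes along for the ride.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pre-trained model
print(classifier("Prototyping in Python is remarkably fast."))
```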
In this case, it is the power of the machine learning models, rather than the power of the library or the language, that dictates what I use. As such, I routinely find myself doing more cutting-edge Data Science work in PyTorch, whether through transfer learning or by directly using a model built by another organization without any transfer learning.
Conclusion
At the end of the day, the regularly occurring battles over languages, libraries, and models are misguided. There isn't a single programming language that is the best for data science. The best programming language is the one that allows you to prototype the fastest and deliver value to your customers and/or end users.
While prototyping, don't get caught up optimizing your code immediately. Instead, optimize for Data Scientist speed. Once you have successfully proven your model will solve the problem at hand, you can use a profiler to help identify where the bottlenecks are in your code. This lets you optimize for machine time, but you can't profile your code until it is written. Don't put the cart before the horse! Identify the type of problem you are trying to solve and choose the set of tools that will help you build the fastest.
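For instance, Python's built-in cProfile module can show where the time actually goes once a prototype exists (the slow function below is just a stand-in for real feature engineering):

```python
# Profile after the prototype works, then optimize the real hotspots.
import cProfile
import pstats

def slow_feature_engineering(n: int) -> list:
    # Deliberately naive: quadratic work to give the profiler something to find.
    return [sum(range(i)) for i in range(n)]

cProfile.run("slow_feature_engineering(2000)", "prototype.prof")
stats = pstats.Stats("prototype.prof")
stats.sort_stats("cumulative").print_stats(5)  # show the top 5 hotspots
```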
References
- https://peter-jp-xie.medium.com/how-slow-is-python-compared-to-c-3795071ce82a
- https://skl.sh/3dq3Iz0
- https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G