After 14 years and 24 beta releases, Scikit-Learn has finally reached its 1.0 release. This may sound a bit strange given that Scikit-Learn has been used by thousands of companies, data scientists, and researchers for a long time, and is widely considered the most widespread framework for general-purpose Machine Learning.
In this article, I do not want to analyze the new features, as many other articles do, but rather to understand the aim of Scikit-Learn with this release and its strategy for future development.
1. Some history
Scikit-Learn was born in 2007, first as a Google Summer of Code project, and continued to be developed in a research environment. Its objective was to serve as a tool for data analysis without forcing users to focus on any particular technology or code. For this reason, it is built on Python: an open-source, easy-to-use, general-purpose language that can embed C code.

Another big problem when working with data is computational resources, in terms of both memory and processing, so Scikit-Learn has always made a big effort to improve algorithmic efficiency so that even users with limited computational resources can work with data. It does this by using statistical approximations and low-level code (Cython).
Moreover, the key point of Scikit-Learn, beyond efficiency and simplicity, is its documentation. Many data scientists (myself included) have learned Machine Learning by reading the Scikit-Learn documentation. It is not meant to be just code documentation but a learning path for Data Science.
2. Release 1.0
If we look at the highlights of the release, we can see that there are changes in the API and even some cool new features.

However, by looking at the changelog page and plotting the tags extracted by processing the HTML page, we can see that the majority of the tags are fixes and API changes.

In the next sections, we will dive into the main topics that sum up the changes of this release.
3. Release highlights
3.1. API Standardization
One important pattern in the library's interface is that all the modules tend to be interchangeable. This means, for instance, that if you are building a supervised model, the functions and methods to fit, test, predict, and measure the accuracy of that model are independent of the flavor of supervised model you are building (linear regression, decision trees, gradient boosting…).
Standard interface and signature between objects with the same functionality.
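As a minimal sketch of this interchangeability (assuming scikit-learn is installed, and using a toy dataset purely for illustration), two very different estimators can be swapped without changing any of the surrounding code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy dataset; any estimator below can consume it unchanged.
X, y = make_classification(n_samples=200, random_state=0)

scores = {}
for model in (LogisticRegression(max_iter=1000),
              DecisionTreeClassifier(random_state=0)):
    model.fit(X, y)                                   # same fit signature everywhere
    scores[type(model).__name__] = model.score(X, y)  # same scoring entry point
print(scores)
```

The loop body never needs to know which model it is handling; that is exactly the standard interface the library is committing to.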
However, while the signatures of the modules were the same, the values they expected were not consistent across modules and releases (e.g. "should X be an np.matrix or an np.array?", "loss='ls' or loss='mse'?"). Some highlights addressing this are:
- Signature: keyword-only arguments are now enforced for most parameters.
- Data types: new features work with Pandas (for instance, estimators store the feature names of the pd.DataFrame used for training). Meanwhile, the np.matrix type is deprecated.
- Argument values: some functions and modules share the same arguments (loss, scaler, criterion, …) but expected different values; these have now been unified. Some encoders can also accept missing and unknown values.

3.2. Computation Performance
When you create a library for working with data that is meant to serve everyone, you also have to consider that most people do not have powerful computational resources, just humble home laptops. That is why Scikit-Learn has always used low-level embedded languages (Cython) to increase performance (as in the SVM and GBDT algorithms).
Scikit-Learn makes it possible to use Machine Learning on humble laptops with low resources.
In this release, more efficiency enhancements have been made to many functions and modules, for instance:
- Preprocessors: (StandardScaler, KBinsDiscretizer, PolynomialFeatures)
- Estimators: (Logistic regression, Neighbors, Cluster, … algorithms)
- Dimensionality Reduction algorithms
And also new features that are not performance enhancements of existing code but new models that take advantage of statistical properties to perform faster than the original models:
- Online One-Class SVM: uses One-Class SVM with stochastic gradient descent and kernel approximations to reduce the complexity from quadratic to linear in the number of samples.
- HistGradientBoostingClassifier: a gradient boosting implementation where the feature values are binned into histograms. The trees split on the bin indices instead of the raw values, reducing cardinality.

3.3. Measurement
It is a common mistake to think of Scikit-Learn as just a Machine Learning framework, but the truth is that it goes further. It provides a set of tools not only to develop models but also to measure and understand the predictions of a trained model.
This point is especially important when you do not just want to create a predictive model but also to extract insights from what it has learned.
This release brings some new features that ease the metric computation and also some new metrics:
- Added methods: from_estimator and from_predictions for metric and plot displays.
- New metrics: mean_pinball_loss and d2_tweedie_score.
- New plots for displaying results, such as CalibrationDisplay.

3.4. Community
Last but not least are changes that are not driven by any technical reason but by concerns and needs of the community. Scikit-Learn has always tried to be more than just a library: a complete environment to bring machine learning knowledge to people and to define standards. A proof of that is the following fact:

Out of over 2100 merged pull requests, about 800 of them are improvements to our documentation.
These documentation updates mainly aim to increase the quality of the resources for users and to cover more of their needs. These needs can also be ethical, such as the removal of the Boston housing dataset.
4. Conclusions
In this article, we have covered the main features of release 1.0 of Scikit-Learn and tried to explain their importance from the historical context of the library.
It may sound strange that the first major release consists mainly of fixes and API changes instead of new features; however, by understanding the aim of Scikit-Learn, we have seen that its purpose has always been to
Define a standard for Machine Learning that brings tools to the majority of people, independently of whether they are researchers, employees, students, or hobbyists.
and this is exactly what they have achieved with this release.