A practical example of applying useful Scikit-learn features like pipelines and grid search in deep learning.

After spending a good part of 2019 learning the basics of machine learning, I was keen to start experimenting with some rudimentary deep learning.
While there’s no shortage of great tutorials for getting started with user-friendly libraries like Keras, it was far harder to find examples that connect the dots for beginners by showing how useful Scikit-learn features like pipelines and grid search remain relevant in deep learning.
So I decided to pull together the materials I found on this subject, and rustled up a series of notebooks that would hopefully help fellow newcomers who are looking to take the plunge into deep learning.
In these notebooks, I used a mix of machine learning and deep learning techniques to try to predict the rain pattern in Singapore in December 2019. The models were trained on 37 years of weather data in Singapore, from Jan 01 1983 to the end of November in 2019.
NOTEBOOKS, DATA AND ASSUMPTIONS
Here’s the Github repo for my ongoing series of Data Science projects using historic weather data from Singapore. The specific notebooks for this post can be found [here](https://github.com/chuachinhon/weather_singapore_cch/blob/master/notebooks/5.1_ml_LR_XGB_cch.ipynb), [here](https://github.com/chuachinhon/weather_singapore_cch/blob/master/notebooks/5.2_dl_keras_gridsearch_cch.ipynb) and [here](https://github.com/chuachinhon/weather_singapore_cch/blob/master/notebooks/5.3_dl_keras_tuner_cch.ipynb).
For brevity’s sake, I won’t paste any code on this post. The notebooks are simple enough to follow, and are easier to read on Github or via direct download.
Data for this post was taken from the Singapore Met Service’s website. To keep the project simple, I framed it as a binary classification problem: predicting whether a given day would be rainy (1) or dry (0). I also chose not to drop the outliers, so as to expose the models to as much data as possible.
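The labeling step is a one-liner in pandas. Here’s a minimal sketch with made-up rainfall values; the column name is an assumption, since the real Met Service records may use a different header:

```python
import pandas as pd

# Hypothetical slice of the daily records; "Daily Rainfall Total (mm)"
# is an assumed column name for illustration.
df = pd.DataFrame({
    "Daily Rainfall Total (mm)": [0.0, 12.4, 0.2, 0.0, 35.1],
})

# Label a day rainy (1) if any rain was recorded, dry (0) otherwise
df["rain"] = (df["Daily Rainfall Total (mm)"] > 0).astype(int)

print(df["rain"].tolist())  # [0, 1, 1, 0, 1]
```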
The dataset I’m using spans 37 years, but it contains just under 13,500 rows of data. It is fair to ask whether you need deep learning for a dataset like this, and whether it necessarily produces better results.
These are valid concerns, especially considering the additional time and resources needed. But I decided to set them aside since the goal here is to experiment and learn.
RESULTS
I hate having to scroll through a lengthy blog post just to see the results, so let’s have a quick look at how an XGB model fared against a Keras Classifier in predicting rain/no-rain pattern in Singapore in December 2019:


The Keras model did slightly better than the XGB version, correctly classifying 26 rainy/no-rain days out of 31. The XGB model managed 25 out of 31.
Both the Keras and XGB models shared the same weakness: recall scores that were lower than their precision and f1 scores, i.e., the ability to correctly classify rainy days as such.
Still, not too shabby overall. With a richer set of weather data, such as humidity and atmospheric pressure, we could possibly get even more accurate predictions.
Now, let’s dive into each approach separately.
‘Classic’ Machine Learning Approach Using Logistic Regression and XGB Classifier
Predictive modeling involves trying a number of different models to see which one works best, as well as fine-tuning each model’s hyperparameters to find the best combination.
Scikit-learn’s pipeline and grid search features allow you to organise both tasks efficiently and run them in one go. In notebook 5.1, I chose to pit a Logistic Regression model against an XGB Classifier.
You can include more models in the same pipeline, or increase the number of XGB hyperparameters to tune. But the trade-off in time and resources is something you’ll have to weigh carefully. I picked the LogReg and XGB models to illustrate the extremes of the trade-offs one could encounter in these tasks.
The LogReg model took just seconds for a decent grid search, while the XGB Classifier took 7 hours (on a 6-core 2018 Mac Mini) for a grid search over 5 hyperparameters.
The XGB model gave slightly better scores, and could presumably be improved with a more exhaustive grid search. But is the 0.02 bump in scores worth the extra effort? Highly debatable in this instance.
I went ahead with the XGB model in any case, even though its performance was only slightly better than the LogReg model. Let’s have a look at the confusion matrix for the XGB’s predictions for December 2019:

The XGB model correctly classified 25 rainy/dry days out of 31, giving it an accuracy score of 0.8.
The model wrongly predicted that it would rain on 2 days, when they were in fact sunny (false positives). It also wrongly predicted 4 sunny days when it in fact rained on those days (false negatives).
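The recall figure follows directly from these counts. A quick sketch reconstructing the December outcomes from the numbers above (15 true negatives, 2 false positives, 4 false negatives, 10 true positives):

```python
from sklearn.metrics import confusion_matrix, recall_score

# 17 actual dry days, then 14 actual rainy days (31 in total)
y_true = [0] * 17 + [1] * 14
# 15 correct dry calls, 2 false positives, 4 false negatives, 10 correct rainy calls
y_pred = [0] * 15 + [1] * 2 + [0] * 4 + [1] * 10

print(confusion_matrix(y_true, y_pred))  # [[15  2], [ 4 10]]
rec = recall_score(y_true, y_pred)
print(round(rec, 3))  # 0.714, i.e. 10 of 14 rainy days caught
```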
The XGB model is weakest in terms of its recall score, in this case meaning its ability to correctly identify rainy days as such (10 out of 14). Can a deep learning model do better?
Keras Classifier With Grid Search
There’s a bewildering number of ways one could start experimenting with deep learning models. I wanted to start small, and see if I could integrate what I had learnt in Scikit-learn with the new techniques.
Keras popped up quickly as a good option, given the availability of two wrappers for the Scikit-learn API (for classification and regression). I also relied on two excellent online posts ([here](https://www.curiousily.com/posts/hackers-guide-to-hyperparameter-tuning/) and here) to guide my code for notebook 5.2.
The workflow is essentially similar to routine Scikit-learn approaches, though one new step required a bit of trial-and-error: Defining the function that creates and returns a Keras sequential model.
Depending on how many hyperparameters you want to tune, the structure of the function would have to be adjusted accordingly. For this post, I opted to tune the number of hidden layers, the number of neurons, the optimizer, the dropout rate, the batch size and the number of epochs.
The pipeline/gridsearch construction is essentially the same, aside from the need to pass the Keras function you’ve defined to the Keras Classifier’s "build_fn" argument.
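Here’s a minimal sketch of what such a model-building function can look like. The input width of 8 features and all the grid values are assumptions for illustration, not the notebook’s actual settings:

```python
import numpy as np
from tensorflow import keras

def create_model(n_hidden=1, n_neurons=30, dropout_rate=0.2, optimizer="adam"):
    """Build and compile a sequential binary classifier.

    The keyword arguments double as the hyperparameters exposed
    to grid search once the function is handed to the wrapper.
    """
    model = keras.Sequential()
    model.add(keras.Input(shape=(8,)))  # 8 input features assumed
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
        model.add(keras.layers.Dropout(dropout_rate))
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer=optimizer,
                  metrics=["accuracy"])
    return model

# Wrapping it for scikit-learn (the wrapper lived at
# keras.wrappers.scikit_learn.KerasClassifier in the Keras version I
# used; newer setups get an equivalent from the scikeras package):
#
# clf = KerasClassifier(build_fn=create_model, verbose=0)
# param_grid = {
#     "n_hidden": [1, 2],
#     "n_neurons": [30, 60],
#     "dropout_rate": [0.1, 0.2],
#     "optimizer": ["adam", "sgd"],
#     "batch_size": [16, 32],
#     "epochs": [20, 50],
# }
# search = GridSearchCV(clf, param_grid, cv=3).fit(X_train, y_train)
```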
A grid search with these settings took over 18 hours on my machine (CPU-only). Adjust accordingly if you want a faster trial. Let’s have a look at the confusion matrix for the predictions by the optimised Keras Classifier:

The Keras model performed better than the XGB model by correctly predicting one more day of rainy weather, ie, it correctly classified 26 rainy/dry days out of 31, compared to 25 for the XGB model.
The model wrongly predicted that it would rain on 2 days, when they were in fact sunny (false positives). It also wrongly predicted 3 sunny days when it in fact rained on those days (false negatives).
Like the XGB model, the Keras model is also weakest in terms of its recall score, in this case meaning its ability to correctly identify rainy days as such (11 out of 14). Let’s compare the metrics for the XGB and Keras models’ predictions on the validation set (December 2019 weather data):

The Keras model clearly outperformed the XGB version, but took more than twice the time for grid search. In real-world terms, the improved performance translated to one additional day of correct weather prediction, out of 31 days that month.
I recently found out about the Keras Tuner, and was keen to see if it could deliver better results than the Scikit-learn/grid search approach.
Keras Tuner
There are at least 4 tuners to choose from, but I opted to try out just the Hyperband and RandomSearch tuners. I also opted to use the HyperModel subclass method, which made the testing of the two tuners more efficient.
I adapted my code from two helpful online posts [here](https://www.curiousily.com/posts/hackers-guide-to-hyperparameter-tuning/) and here. I picked the following parameters for tuning: number of hidden layers, the dropout rate, the learning rate and momentum. Both tuners turned up similar levels of performance:


Keras Tuner was blazingly fast, compared to the grid search process via Scikit-learn. Unfortunately, I couldn’t get accuracy levels to go beyond 0.7 despite several rounds of trial-and-error.
There’s probably a lot more to the art of using the Keras Tuner that I’m unaware of, but I’ll have to leave that to a future post with a more suitable dataset perhaps.
End Note
As many online articles have made clear, it is the nature and amount of data you are dealing with that largely determines whether you adopt the ‘classic’ Machine Learning models or use a deep learning approach.
But it is hard to get your hands on a massive real-world dataset that lends itself readily to deep learning. The time and computing resources needed for massive datasets may not be practical for beginners either.
In short, don’t let the perfect get in the way of the good. The learning process is more important than the results at this point. I picked up several good lessons while working on this project and hope that the notebooks would help anyone looking to dip their toes into deep learning.
As always, if you spot any errors in the code, ping me @
Twitter: @chinhon
LinkedIn: www.linkedin.com/in/chuachinhon
If you are interested in working on the Singapore weather data set, here are my earlier projects using the same records: