Creating, using and deploying a flexible custom estimator through PyCaret

Let’s agree, PyCaret is great. It does a lot for you in a very short time. But sometimes what it offers out of the box is not enough, and you want to bring in a homegrown solution that is more suitable to your problem, or a solution that is available elsewhere, and have PyCaret pick it up. This is what we will try to implement today, and we will do it in two parts.
In part 1, we will learn:
1. How to make a simple estimator (model) using Python’s class object
2. How to use SciPy’s curve_fit to our advantage
3. How to make curve_fit more flexible & agile for further integration
In part 2, we will learn:
4. How to make a custom estimator sklearn compatible
5. How to integrate it with PyCaret
For this exercise we will create a regression estimator. The idea remains the same for other tasks, so you can create any type of estimator you want. Let’s start building.
A Simple Estimator:
We will make the estimator using Python’s class object. Classes are like blueprints. You can create many independent objects from a class, just like you can build many houses from a blueprint. For example, "LogisticRegression" imported from "sklearn.linear_model" is a class, and when we ‘instantiate’ this class by setting cls = LogisticRegression(), we build an object of LogisticRegression. One more thing: the functions inside a class are called ‘methods’. For example, fit() and predict() are methods of the LogisticRegression class, and we can use them by simply calling cls.fit() etc. In our example, we will have a class with fit and predict methods. Later on, we will add another method, called score. You can learn more about the classes here
The first estimator we are going to build is a simple mean estimator, and then we will jump to a more practical one. This estimator will simply predict the mean. We will ‘estimate’ that ‘mean’ from the training data. Here is how we do it (pay attention to the comments, I try to explain a lot while coding):
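A minimal sketch of such a mean estimator could look like this. The class name MeanEstimator, the attribute mean_ and the toy arrays are assumptions made for illustration, not the original listing:

```python
import numpy as np


class MeanEstimator:
    """Toy regressor that always predicts the mean of the training labels."""

    def __init__(self):
        # Filled in by fit(); None means "not fitted yet".
        self.mean_ = None

    def fit(self, X, y):
        # X is ignored here: the only thing we 'estimate' is the mean of y.
        self.mean_ = float(np.mean(y))
        # Returning self allows chaining: MeanEstimator().fit(X, y).predict(X)
        return self

    def predict(self, X):
        if self.mean_ is None:
            raise RuntimeError("Call fit() before predict().")
        # One prediction per row of X, all equal to the learned mean.
        return np.full(len(X), self.mean_)


# Hypothetical toy data, only to show the call pattern.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([10.0, 20.0, 30.0, 40.0])

est = MeanEstimator().fit(X_train, y_train)
preds = est.predict(np.array([[5.0], [6.0]]))
print(preds)  # [25. 25.]
```

Whatever rows we pass to predict(), the answer is the training mean, which is exactly why this estimator is only a warm-up.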
Now that we have made a simple estimator, we can move on to a more specific case that needs a slightly more complicated algorithm.
SciPy’s curve_fit:
In my line of business, we are often required to predict the quantity of a product by using prices. When we visualize the price and quantity information, we generally observe an elastic behaviour, i.e. as the price of a product goes up, the quantity goes down. However, this behaviour is exhibited at certain price points. At extreme price points, the quantity is inelastic / not responsive to further change in the price. On the graph, this looks like an exponential curve.

In this scenario, we often need to generalize this behaviour by fitting a curve with an exponential shape. With a single feature x and coefficients a and b, the exponential equation looks like this:

y = a · exp(b · x)

We can use this equation with the curve_fit function from SciPy’s optimize module and solve for the coefficients a & b. The process is pretty simple. We define our custom function, and then we pass it (along with some other parameters) as an argument to curve_fit. For continuity, I will keep using the toy data we used earlier. The price and quantity illustration was just to make the case for using a custom equation. We will pick one feature and try to fit it against the label. Later on, we will repeat the same exercise for multiple features. This is how we do a curve fit using an exponential function with only one feature (again, please read the comments in the code!):
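Here is a small sketch of a single-feature fit. The synthetic data, the true values 3.0 and -1.5, and the starting guess p0 are assumptions chosen for illustration; they stand in for the toy data, which is not shown here:

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_curve(x, a, b):
    # The custom equation: y = a * exp(b * x)
    return a * np.exp(b * x)


# Synthetic stand-in data: a known exponential plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 3.0 * np.exp(-1.5 * x) + rng.normal(scale=0.01, size=x.size)

# curve_fit returns the fitted coefficients (popt) and their covariance (pcov).
# p0 is the starting guess for (a, b); a sensible guess helps convergence.
popt, pcov = curve_fit(exp_curve, x, y, p0=[1.0, -1.0])
a_hat, b_hat = popt

print(a_hat, b_hat)  # should land close to 3.0 and -1.5
```

Because exp_curve names its coefficients explicitly (a, b), curve_fit can work out by itself that there are two parameters to fit.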
A more practical estimator:
So far so good, but did you notice something? If you guessed that it is not practical, you are right. In real life, there are many more features available, and while experimenting, you will add or remove different features/columns. If we write the code the way we have written it above, it will be a nightmare to add/remove coefficients every time we experiment with a new feature. We want to be flexible enough to provide any number of coefficients (features) without having to change the function. Let’s try to solve this problem. We want a programmable function that can fit an exponential curve with n features.

Packing/unpacking with positional arguments (*args) is the answer. It is a very useful tool to handle such situations. You can read more about positional arguments here. In our exp_curve function (after the argument x) we will allow the user to pass any number of coefficients through *args (in fact, the name after the * doesn’t really matter, it’s the * that does the magic). Inside the function, *args results in a tuple (named args) of coefficients, and we will use a loop to construct our formula. After all, much of our formula is nothing more than a sum of products of coefficients and features. We will use the enumerate function, which returns each coefficient along with its position/index in the tuple. Check the code below to get an understanding of what happens when we use positional argument unpacking & enumerate together.
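A tiny sketch of the two ideas together. The function name show_coefficients and the sample values are made up for this illustration:

```python
def show_coefficients(x, *args):
    # Everything passed after x is packed into a tuple called args.
    pairs = []
    for i, coef in enumerate(args):
        # enumerate yields (index, value) for each item in the tuple.
        pairs.append((i, coef))
    return pairs


# Packing: three loose positional values end up in args.
result = show_coefficients("x-values", 0.5, 1.2, -3.0)
print(result)  # [(0, 0.5), (1, 1.2), (2, -3.0)]

# Unpacking: the * at the call site spreads a list into positional arguments.
coefs = [0.5, 1.2, -3.0]
same_result = show_coefficients("x-values", *coefs)
print(same_result == result)  # True
```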
There is another subtle but important point. In our equation, the number of coefficients will be equal to the number of features +1. The extra one is the intercept ‘a’, and it will always be at index/position 0. Now let’s see a new function once again:
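A minimal sketch of such a function, assuming X is a 2-D NumPy array with one column per feature (the names coef and X_demo are illustrative):

```python
import numpy as np


def exp_curve(X, *args):
    # args[0] is the intercept a; args[1:] holds one coefficient per column of X.
    y_ = 0
    for i, coef in enumerate(args[1:]):
        # Accumulate coefficient * feature column, i.e. the exponent's sum.
        y_ += coef * X[:, i]
    # Multiply the intercept with the exponential of the accumulated sum.
    y = args[0] * np.exp(y_)
    return y


# Two rows, two features; intercept 2.0 and coefficients 0.5 and 0.5.
X_demo = np.array([[0.0, 0.0],
                   [1.0, 1.0]])
y_demo = exp_curve(X_demo, 2.0, 0.5, 0.5)
print(y_demo)  # first value is 2.0 (exp(0) = 1), second is 2 * e
```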
In plain words, we start by setting y_ = 0, pick up the first coefficient from args[1:] (everything after the intercept), multiply it with the first column X[:,i] and then update y_. We move on to the next iteration of the loop, get the product of the next coefficient and feature column, and add it to y_ (keep in mind the exponential equation). Once we have gone through all the coefficients/features, we simply multiply the very first coefficient (the intercept) args[0] with the exponential of y_ and save it as y.
We can further improve this function by using matrix multiplication. That can drastically enhance the speed. For this part, I will keep using the iterative approach, and in the next part will switch to matrix multiplication.
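As a rough preview, and only as a sketch under the assumption that X is a 2-D NumPy array, the loop and the matrix-multiplication form can be compared side by side (the function names and random data are hypothetical):

```python
import numpy as np


def exp_curve_loop(X, *args):
    # Iterative version: one column at a time.
    y_ = 0
    for i, coef in enumerate(args[1:]):
        y_ += coef * X[:, i]
    return args[0] * np.exp(y_)


def exp_curve_matmul(X, *args):
    # Same formula, but the sum of products is a single matrix-vector product.
    coefs = np.asarray(args[1:], dtype=float)
    return args[0] * np.exp(X @ coefs)


rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(100, 3))
params = (2.0, 0.3, -0.7, 1.1)  # intercept followed by three coefficients

loop_out = exp_curve_loop(X_demo, *params)
matmul_out = exp_curve_matmul(X_demo, *params)
print(np.allclose(loop_out, matmul_out))  # True
```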
If you have come this far and understand what has been done, give yourself and me a pat on the back. The rest of the stuff is easy. All we need is to use our custom function & the curve_fit function in the class we made earlier.
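Putting the pieces together might look like the sketch below. The class name ExponentialRegressor, the coef_ attribute, the maxfev default and the synthetic data are all assumptions for illustration. One detail worth knowing: because the function takes *args, curve_fit cannot infer how many coefficients to fit, so we pass an explicit starting guess p0 with one entry for the intercept plus one per feature.

```python
import numpy as np
from scipy.optimize import curve_fit


class ExponentialRegressor:
    """Fits y = a * exp(b1*x1 + ... + bn*xn) for any number of features."""

    def __init__(self, maxfev=5000):
        self.maxfev = maxfev  # cap on function evaluations inside curve_fit
        self.coef_ = None     # [a, b1, ..., bn] after fit()

    @staticmethod
    def _exp_curve(X, *args):
        y_ = 0
        for i, coef in enumerate(args[1:]):
            y_ += coef * X[:, i]
        return args[0] * np.exp(y_)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Starting guess: intercept 1, all slopes 0 (an arbitrary assumption).
        p0 = np.concatenate(([1.0], np.zeros(X.shape[1])))
        self.coef_, _ = curve_fit(self._exp_curve, X, y, p0=p0, maxfev=self.maxfev)
        return self

    def predict(self, X):
        if self.coef_ is None:
            raise RuntimeError("Call fit() before predict().")
        return self._exp_curve(np.asarray(X, dtype=float), *self.coef_)


# Synthetic stand-in data with two features and known coefficients.
rng = np.random.default_rng(42)
X_train = rng.uniform(0.0, 1.0, size=(200, 2))
y_train = 2.0 * np.exp(0.8 * X_train[:, 0] - 1.2 * X_train[:, 1])
y_train = y_train + rng.normal(scale=0.01, size=200)

model = ExponentialRegressor().fit(X_train, y_train)
print(model.coef_)  # should be close to [2.0, 0.8, -1.2]
preds = model.predict(X_train[:5])
```

Adding or removing a feature now only changes the number of columns in X; neither the function nor the class has to be touched.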
That’s enough for this episode. In this part, we learnt about estimators, Python’s class object, the exponential function, the curve_fit function, positional argument packing/unpacking and the enumerate function, and finally built a more customized & flexible regression estimator.
In the next episode, we will make this estimator sklearn compatible, and then integrate it with PyCaret. Stay tuned!
You can follow me on Medium, connect with me on LinkedIn and visit my GitHub.