Polymorphism: Understand the Concept Behind Creating Custom Classes like Custom Transformers in Scikit-Learn

Ever wondered why some Python classes call methods out of nowhere? Or implement some methods just to pass?

Arafath Hossain
Towards Data Science


If you have ever encountered a Scikit-Learn custom transformer, you are very likely familiar with the phenomena above. If so, this article is for you. We will dive into polymorphism, the concept that enables such behavior, and build some custom classes to get hands-on experience and a closer understanding.

Photo by Meagan Carsience on Unsplash

Scikit-learn transformers are a great set of tools for setting up a data preparation pipeline in production. Though the list of built-in transformers is pretty exhaustive, building your own transformer is a great way to automate custom feature transformation and experimentation. If you have ever worked with scikit-learn transformers, you have very likely encountered this common pattern:

# defining a custom transformer
from sklearn.base import BaseEstimator, TransformerMixin

class CustomTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        ...  # transformation logic elided in the original

# calling fit_transform()
customTransformer = CustomTransformer()
data = customTransformer.fit_transform(data)

But if you come from a non-programming background, it may seem puzzling that fit_transform() was never defined in the CustomTransformer class, yet it was callable on an instance of that class. And even if you have figured out that the method came from one of the parent classes, it may still be puzzling how that method can use methods that are not defined in the class where it lives. For example, check out the source of the TransformerMixin class in the official GitHub repository: you won't find fit() or transform() defined inside TransformerMixin.
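To make the puzzle concrete, here is a minimal pure-Python sketch of the same situation. MiniTransformerMixin and Doubler are hypothetical names invented for illustration; the real TransformerMixin handles more cases, so treat this as a simplified picture rather than the library's implementation.

```python
class MiniTransformerMixin:
    """Hypothetical stand-in for a mixin such as TransformerMixin."""

    def fit_transform(self, X, y=None):
        # Neither fit() nor transform() is defined in this class. They are
        # looked up on `self` at call time, so any subclass that provides
        # them makes this method work.
        return self.fit(X, y).transform(X)


class Doubler(MiniTransformerMixin):
    """Hypothetical transformer that doubles every value."""

    def fit(self, X, y=None):
        return self  # nothing to learn; returning self allows chaining

    def transform(self, X):
        return [2 * x for x in X]


doubled = Doubler().fit_transform([1, 2, 3])
print(doubled)  # [2, 4, 6]
```

Calling fit_transform() on a bare MiniTransformerMixin instance would raise an AttributeError, because nothing would supply fit().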

In this article we will try to understand the concepts that enable these behaviors: polymorphism and duck typing. We will also do some hands-on exercises to deepen our understanding.

What is Polymorphism?

In general terms, polymorphism means the ability of an object to take different forms. A concrete example is when objects of different classes have methods with the same name that behave differently.

For example, in Python we can run operations like 1 + 2 or 'a' + 'b' and get 3 and 'ab' respectively. Behind the scenes, Python calls a magic method called __add__() that is already implemented in the integer and string classes. For details on how Python converts such core syntax into special methods, check out my last post on Python Core Syntax.

This magic method, __add__(), is an example of polymorphism: the method name is the same, but depending on the class of the object it is called on, it either sums numbers or concatenates strings.
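A short sketch of that idea follows. The Basket class is a hypothetical example, added only to show that a user-defined class can give + its own meaning in the same way.

```python
# The + operator delegates to __add__() on the left operand.
int_sum = (1).__add__(2)        # same as 1 + 2
str_sum = "a".__add__("b")      # same as 'a' + 'b'
print(int_sum, str_sum)


class Basket:
    """Hypothetical class that defines its own meaning for +."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # Adding two baskets merges their contents into a new basket.
        return Basket(self.items + other.items)


combined = Basket(["apple"]) + Basket(["pear"])
print(combined.items)  # ['apple', 'pear']
```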

In Python class context we can achieve polymorphism in two ways: Inheritance, and Duck Typing.

Polymorphism Through Inheritance

In object-oriented programming, inheritance means that one class receives the properties of another class. The class we inherit from is called the superclass, and the class that inherits the properties is called the subclass. Since the focus of this write-up is not inheritance, we will jump into an example and hopefully the concept will make sense as we go.

But if it doesn't, or you need a quick refresher, feel free to read my previous post on understanding Inheritance and Subclass.

For our example, we will create a superclass called InheritList and three subclasses (DefaultList, EvenList and OddList) to run examples of inheritance and polymorphism.

Example Superclass01
Example Subclasses
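The embedded code for these classes is not reproduced here, so the following is a minimal sketch reconstructed from the descriptions in this article. The method names come from the text; the internal attribute _values and the exact method bodies are assumptions.

```python
class InheritList:
    """Superclass sketch: holds a list and the shared list operations."""

    def __init__(self):
        self._values = []  # assumed internal storage

    def add_value(self, value):
        self._values.append(value)

    def remove_value(self):
        # Default behavior: remove the last item, if there is one.
        if self._values:
            self._values.pop()

    def get_list(self):
        return list(self._values)

    def do_all(self, value):
        # Add a value, remove the unwanted values, return the final list.
        self.add_value(value)
        self.remove_value()
        return self.get_list()


class DefaultList(InheritList):
    """Implements nothing itself; everything is inherited."""
    pass


class EvenList(InheritList):
    def remove_value(self):
        # Override: drop the odd values, keep the even ones.
        self._values = [v for v in self._values if v % 2 == 0]


class OddList(InheritList):
    def remove_value(self):
        # Override: drop the even values, keep the odd ones.
        self._values = [v for v in self._values if v % 2 != 0]
```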

Inheritance

In the above code block, notice that we didn't implement any methods inside the DefaultList class. And yet, as the following code block shows, we can call methods such as add_value() and get_list() on an instance of that class, because the DefaultList subclass inherited these methods from its superclass, InheritList. This is inheritance at play.

nums = [1, 2, 3, 4, 5]

defaultNumList = DefaultList()
for i in nums:
    defaultNumList.add_value(i)
print(f"List with all added values: {defaultNumList.get_list()}")

# removes the last item from the list
defaultNumList.remove_value()
print(f"List after removing the last item: {defaultNumList.get_list()}")

>>List with all added values: [1, 2, 3, 4, 5]
>>List after removing the last item: [1, 2, 3, 4]

The above example shows basic inheritance: we get all the properties from the superclass and use them as they are. But we can also change the inherited methods inside the subclass, as we did in the other two subclasses, EvenList and OddList.

Method Overriding

In the EvenList and OddList classes we modified the remove_value() method so that EvenList removes all the odd values and OddList removes all the even values from the built list. By doing so we introduce polymorphism: remove_value() behaves differently in the two cases.

Demo: Method Overriding
>>evenNumList with all the values: [1, 2, 3, 4, 5]
>>evenNumList after applying remove_value(): [2, 4]

>>oddNumList with all the values: [1, 2, 3, 4, 5]
>>oddNumList after applying remove_value(): [1, 3, 5]
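The demo code itself is not reproduced above. A sketch that prints output of that shape is shown below, using compact, assumed stand-ins for the classes (the _values attribute and method bodies are guesses based on the descriptions in the text).

```python
class InheritList:
    """Compact, assumed stand-in for the superclass described above."""

    def __init__(self):
        self._values = []

    def add_value(self, value):
        self._values.append(value)

    def remove_value(self):
        if self._values:
            self._values.pop()  # default: remove the last item

    def get_list(self):
        return list(self._values)


class EvenList(InheritList):
    def remove_value(self):
        self._values = [v for v in self._values if v % 2 == 0]


class OddList(InheritList):
    def remove_value(self):
        self._values = [v for v in self._values if v % 2 != 0]


nums = [1, 2, 3, 4, 5]
results = {}
for name, num_list in (("evenNumList", EvenList()), ("oddNumList", OddList())):
    for i in nums:
        num_list.add_value(i)
    print(f"{name} with all the values: {num_list.get_list()}")
    num_list.remove_value()  # same call, class-specific behavior
    results[name] = num_list.get_list()
    print(f"{name} after applying remove_value(): {results[name]}")
```

The loop never checks which class it is dealing with; it simply calls remove_value() and each object responds in its own way.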

Polymorphism Through Duck-Typing

Before going into the details of duck typing, let's talk about another method, do_all(), that was implemented in the superclass InheritList. It takes a value as input, adds it to the list, removes the unwanted values from the list, and returns the final list. To accomplish these tasks, it depends on other internal methods: add_value(), remove_value(), and get_list(). Check out the demo below.

print(f"evenNumList after calling do_all(58): {evenNumList.do_all(58)}")
print(f"oddNumList after calling do_all(55): {oddNumList.do_all(55)}")

>>evenNumList after calling do_all(58): [2, 4, 58, 58]
>>oddNumList after calling do_all(55): [1, 3, 5, 55, 55]

But Python allows us to implement this more flexibly. For example, we could have removed the remove_value() method entirely from the superclass, created a separate class with only the combine_all() method, and still been able to use it without any problem. All thanks to duck typing!

Basically, we don't care whether the methods we depend on come from the same class or not. We are good as long as they are available on the object. This reflects the quote widely used to describe duck typing:

“If it walks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”

To demonstrate, let's create a new class called ComboFunc with only one method, combine_all(), which performs the same function as the do_all() method. Let's also create a new subclass, GenDuckList, that has one of the previously created subclasses, EvenList, and this new class as its superclasses.

Notice that we didn't define any of the dependency methods (add_value(), remove_value() and get_list()) inside either of the two new classes. And yet we can successfully call the combine_all() method on an instance of the GenDuckList class, because the dependency methods are inherited from the EvenList class, and combine_all() doesn't care where they come from as long as they exist.

>>Initial list: [1, 2, 3, 4, 5]
>>Final list: [2, 4, 40]
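The code for these two classes is not reproduced above, so here is a sketch of how they might look. To keep it short, EvenList is written as a standalone, assumed stand-in rather than as a subclass of InheritList, and the value 40 is chosen to match the output shown.

```python
class EvenList:
    """Compact, assumed stand-in: keeps only even values on removal."""

    def __init__(self):
        self._values = []

    def add_value(self, value):
        self._values.append(value)

    def remove_value(self):
        self._values = [v for v in self._values if v % 2 == 0]

    def get_list(self):
        return list(self._values)


class ComboFunc:
    def combine_all(self, value):
        # None of these three methods is defined in ComboFunc. They only
        # need to exist on whatever object this method is called on.
        self.add_value(value)
        self.remove_value()
        return self.get_list()


class GenDuckList(EvenList, ComboFunc):
    """Defines nothing itself; combines the two superclasses."""
    pass


genDuckList = GenDuckList()
for i in [1, 2, 3, 4, 5]:
    genDuckList.add_value(i)
initial = genDuckList.get_list()
print(f"Initial list: {initial}")

final = genDuckList.combine_all(40)
print(f"Final list: {final}")
```

ComboFunc on its own is not usable: calling combine_all() on a plain ComboFunc instance would fail with an AttributeError, since add_value() would not be found.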

Notice that we could accomplish the above task in other ways too:

  1. We could avoid inheriting anything from the EvenList class and implement the dependency methods inside the class itself if we needed something customized. Or,
  2. We could keep it as a superclass and still override specific dependency methods to make it more customized. Or,
  3. We could remove remove_value() from the superclass and implement it inside our GenDuckList class, and still be able to perform the same tasks.

Overall, polymorphism lets us be more flexible and re-use already implemented methods easily.

So, to complete the circle: when we build a custom transformer in scikit-learn using BaseEstimator and TransformerMixin as superclasses, we basically apply duck typing to implement polymorphism. To relate, you can think of GenDuckList as a dummy custom transformer class, ComboFunc as a dummy TransformerMixin class, and EvenList as a dummy BaseEstimator class. The implementation-level difference between the duck typing example above and the transformer example at the beginning is that here we inherited the remove_value() method from a superclass, whereas in a custom transformer we define the dependency methods (fit() and transform()) inside the custom class itself, which is the third alternative noted above.

Thanks for reading the article. Hopefully it helped you understand the concepts of polymorphism in a Python class context. If you liked the article please consider following my profile to get notifications about my future articles.
