Dataclass – Easiest Ever Object-Oriented Programming In Python

A Python built-in decorator reduces the complexity and length of your code

Image by wal_172619 from Pixabay

Object-oriented Programming (OOP) in Python is always one of the hot topics. This is because Python is famous for its flexibility and out-of-the-box features, which can reduce the development effort to a large extent. This is also true for OOP.

I have written several articles about using Python in an object-oriented scenario. The approaches in those articles rely on 3rd party libraries, which are still great solutions.

In this article, I’ll introduce a Python built-in module, dataclasses. It was introduced in Python 3.7 and enables developers to code in an object-oriented manner without any 3rd party libraries.

1. Why Dataclass

Image by ElasticComputeFarm from Pixabay

The first question that I need to answer is probably why we need Dataclass. What’s wrong with normal Python classes? Let’s consider the following scenario.

We are going to implement an application using Python. Suppose we need to write a "Person" class to hold some attributes of a person. We could write a class as follows.
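The original listing is not reproduced here, so the following is only a minimal sketch of what such a hand-written class might look like. It assumes the three attributes used later in the article (firstname, lastname and age); the exact format of the __repr__() string is my own choice.

```python
class Person:
    """A hand-written class with the usual boilerplate methods."""

    def __init__(self, firstname, lastname, age):
        self.firstname = firstname
        self.lastname = lastname
        self.age = age

    def __repr__(self):
        # A readable representation for printing and debugging
        return (f'Person(firstname={self.firstname!r}, '
                f'lastname={self.lastname!r}, age={self.age!r})')

    def __eq__(self, other):
        # Two persons are equal when all of their attributes are equal
        if not isinstance(other, Person):
            return NotImplemented
        return ((self.firstname, self.lastname, self.age)
                == (other.firstname, other.lastname, other.age))

    def greeting(self):
        print(f'Hello, {self.firstname} {self.lastname}!')
```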

This is already simplified with only 3 attributes. So, we can instantiate a person as follows.

p1 = Person('Christopher', 'Tao', 34)

We have to implement the __repr__() method because we want to print the object easily for debugging purposes.

We also want to implement the __eq__() method because we want to compare the objects to determine whether they are the same "person".

p2 = Person('Christopher', 'Tao', 34)
p1 == p2 # compare the two persons

Also, we need some customised features in the Person class such as the greeting() method.

This is good enough compared to most other programming languages, and it is already concise. However, some of the things we were doing are still quite routine boilerplate, which we should be able to skip in the spirit of the "Zen" of Python.

Now, let’s have a look at how Dataclass can improve this. We just need to import dataclass from the dataclasses module, which is built into Python 3.7 and above.

from dataclasses import dataclass

Then, we can use Dataclass as a decorator when we define a class.

@dataclass
class Person:
    firstname: str
    lastname: str
    age: int
    def greeting(self):
        print(f'Hello, {self.firstname} {self.lastname}!')

That’s it. All done. The data class that we’ve just defined using the decorator has all the features that we defined in the previous normal class. Except for the greeting() method, which is a truly customised instance method, all the others are automatically generated behind the scenes. We can test with the same code (I’ll not attach the testing demo code twice) and get the same result.

So, the immediate answer is that Python Dataclass will automatically implement the __init__(), __repr__() and __eq__() methods for us.

2. Out-of-the-box Utilities

Image by free stock photos from www.picjumbo.com from Pixabay

Apart from the basic benefits mentioned above, Dataclass also provides some very convenient utilities. I will not go through each of them, but some examples will be demonstrated here.

Once we have defined a data class, we can leverage some tools from the dataclasses module. So, we will need to import it and probably give it an alias for our convenience.

import dataclasses as dc

Then, we can retrieve the fields of a defined data class using the fields() function. It works not only with the class definition but also with an instance.

dc.fields(Person)
dc.fields(p1)
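As a small sketch of what these calls give back: fields() returns a tuple of Field objects, each carrying (among other things) the name and type of one field. The snippet below redefines the Person data class so that it runs on its own.

```python
import dataclasses as dc


@dc.dataclass
class Person:
    firstname: str
    lastname: str
    age: int


p1 = Person('Christopher', 'Tao', 34)

# Collect just the names and types from the Field objects
class_fields = [(f.name, f.type.__name__) for f in dc.fields(Person)]
instance_fields = [(f.name, f.type.__name__) for f in dc.fields(p1)]

print(class_fields)
# [('firstname', 'str'), ('lastname', 'str'), ('age', 'int')]
print(class_fields == instance_fields)
# True
```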

Since these are "data classes", it is quite common to serialise them to JSON. This usually needs 3rd party libraries in other programming languages such as Java. With a Python data class, the first step is as easy as calling a built-in function: asdict() converts a data class object into a Python dictionary, which can then be passed to json.dumps().

dc.asdict(p1)
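To make the JSON step explicit, here is a minimal sketch that combines asdict() with the standard json module. It assumes all field values are JSON-serialisable types such as strings and numbers; other types would need a custom encoder.

```python
import json
import dataclasses as dc


@dc.dataclass
class Person:
    firstname: str
    lastname: str
    age: int


p1 = Person('Christopher', 'Tao', 34)

person_dict = dc.asdict(p1)            # a plain Python dictionary
person_json = json.dumps(person_dict)  # a JSON string

print(person_dict)
# {'firstname': 'Christopher', 'lastname': 'Tao', 'age': 34}
print(person_json)
# {"firstname": "Christopher", "lastname": "Tao", "age": 34}
```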

If we are only interested in the values of the fields, we can also get a tuple with all of them. This will also allow us to convert it to a list easily.

dc.astuple(p1)

Sometimes, we may want to define many classes, and some fields or methods might be parameterised. This is usually done with complex syntax in other programming languages, such as reflection in Java. However, in Python, we can use the make_dataclass() function to generate as many classes as we want.

Here is an example that generates a "Student" class using this function.
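The original listing is not reproduced here, so this is a sketch under assumptions: the field names (in particular student_number) are hypothetical, chosen to match the three positional arguments and the greeting() call used when the class is exercised.

```python
import dataclasses as dc


def greeting(self):
    # Becomes an instance method of the generated class
    print(f'Hello, {self.firstname} {self.lastname}!')


Student = dc.make_dataclass(
    'Student',                                # name of the new class
    [('firstname', str),                      # (field name, type) pairs
     ('lastname', str),
     ('student_number', str)],                # hypothetical field name
    namespace={'greeting': greeting},         # extra attributes and methods
)
```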

Then, we can use this class just like the other data classes.

s = Student('Christopher', 'Tao', '10001')
print(s)
s.greeting()

3. Customised Class Annotation

Image by cromaconceptovisual from Pixabay

Usually, these kinds of features only satisfy very common use cases. When we have some special requirements, they may force us to go back to the normal solution. However, that’s not always the case in Python, and Dataclass is no exception.

Dataclass allows us to pass parameters to the class decorator to customise the behaviours.

Enabling the Comparison

It is great that Dataclass automatically implements the __eq__() method for us, but how about other comparing methods? In other words, we also need the __lt__(), __gt__(), __le__() and __ge__() methods.

We can easily have all of them automatically implemented too, by simply adding the flag order=True to the decorator.

@dataclass(order=True)
class Person:
    name: str
    age: int
p1 = Person('Alice', 30)
p2 = Person('Bob', 56)

The generated methods compare objects as if they were tuples of their fields, in the order the fields are defined, so the first field is the primary criterion. With the class above, that means persons are ordered by name. To compare them by age instead, we can put the "age" field first.
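A minimal sketch of the age-first variant follows. The variable names alice and bob are only for illustration.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Person:
    age: int    # first field, so it is the primary comparison criterion
    name: str


alice = Person(30, 'Alice')
bob = Person(56, 'Bob')

print(alice < bob)
# True
print(alice >= bob)
# False
print(sorted([bob, alice]))
# [Person(age=30, name='Alice'), Person(age=56, name='Bob')]
```

If two persons have the same age, the comparison falls through to the next field, which is the name here.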

Immutable Fields

Sometimes we may want the attributes of a data object to be unchangeable. In this case, we can "freeze" the fields by adding the flag frozen=True to the decorator.

@dataclass(frozen=True)
class Person:
    name: str
    age: int
p1 = Person('Chris', 34)
print(p1)

Then, if we try to modify an attribute, a FrozenInstanceError will be raised.

p1.name = 'Christopher'
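The sketch below shows how the error can be caught. It also uses dataclasses.replace(), which is not covered in this article but is part of the same module, to build a modified copy instead of changing the frozen object.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Person:
    name: str
    age: int


p1 = Person('Chris', 34)

try:
    p1.name = 'Christopher'
except FrozenInstanceError as exc:
    print(f'Cannot modify a frozen object: {exc}')

# Create a new object with one field changed; p1 stays untouched
p2 = replace(p1, name='Christopher')
print(p1)
# Person(name='Chris', age=34)
print(p2)
# Person(name='Christopher', age=34)
```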

4. Customised Field Annotation

Image by NOST from Pixabay

Customisation is not limited to the class level. The fields in a data class can also be configured with the field() function, so we can add some customised behaviours for them.

Default Value and Default Factory

We can give an attribute a default value. If it is not given during the initialisation, the attribute will be assigned with the default value.

Also, the default is not limited to a fixed value. With default_factory, we can provide a function that is called to produce the default for each new object.

@dataclass
class Employee:
    firstname: str
    lastname: str
    skills: list = dc.field(default_factory=list)
    employee_no: str = dc.field(default='00000')

In the above Employee class, the employee number will be "00000" if it is not given. The skills list will also be initialised as an empty list if it is not given during the initialisation.

e1 = Employee('Christopher', 'Tao')
print(e1)

If we want to add some skills to this employee later, we can append to the skills list rather than having to check whether it has been initialised or not.

e1.skills += ['Python', 'Writing']
print(e1)
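One reason to use default_factory rather than a plain default for a list is that every object gets its own fresh list. Writing skills: list = [] directly is rejected by the decorator with a ValueError. A small sketch, where the second employee is a made-up example:

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    firstname: str
    lastname: str
    skills: list = field(default_factory=list)   # list() is called per object
    employee_no: str = field(default='00000')


e1 = Employee('Christopher', 'Tao')
e2 = Employee('Jane', 'Doe')   # hypothetical second employee

e1.skills += ['Python', 'Writing']

print(e1.skills)
# ['Python', 'Writing']
print(e2.skills)
# []
```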

Excluding Fields

Sometimes, we may not want all the fields to go to the __init__() method. In a normal class, we just don’t add them to the method. In a data class, we need to mark a field with init=False if we don’t want to include it.

@dataclass
class Employee:
    firstname: str
    lastname: str
    test_field: str = dc.field(init=False)

Then, we can create an object without providing the value of the 3rd field as follows.

e1 = Employee('Christopher', 'Tao')

However, there will be an issue. That is, the test_field attribute is still included in the generated __repr__() method, and because it was never assigned a value, printing the object fails.
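The following sketch reproduces the problem. Since test_field has no default and is not set in __init__(), the attribute does not exist on the object, and the generated __repr__() raises an AttributeError when it tries to read it.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    firstname: str
    lastname: str
    test_field: str = field(init=False)   # no default, not set in __init__()


e1 = Employee('Christopher', 'Tao')

try:
    print(e1)
except AttributeError as exc:
    print(f'Printing failed: {exc}')
```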

Therefore, we need to add another flag to exclude it from there as well.

@dataclass
class Employee:
    firstname: str
    lastname: str
    test_field: str = dc.field(init=False, repr=False)
e2 = Employee('Christopher', 'Tao')
print(e2)

In some cases, we may still want a field in the __init__() method, but just want to exclude it when we print the object. To achieve this, we just need the repr flag only.

@dataclass
class Employee:
    firstname: str
    lastname: str
    test_field: str = dc.field(repr=False)
e3 = Employee('Christopher', 'Tao', 'test value')
print(e3)

5. Post-Initialisation

Image by Photo Mix from Pixabay

The last feature that I want to introduce allows us to customise the behaviour of a data class after the initialisation has been done.

Suppose that we want to define a class for rectangles. So, it needs to have height and width. We also want to have the area of a rectangle, but obviously, this can be derived from the other two attributes. Also, we want to compare the rectangles by their areas.

To achieve these, we can have a __post_init__() method implemented in the data class as follows.

@dataclass(order=True)
class Rectangle:
    area: float = dc.field(init=False)
    height: float
    width: float
    def __post_init__(self):
        self.area = self.height * self.width

The __post_init__() method is called automatically at the end of the generated __init__() method. We can test if it works.

r1 = Rectangle(2,4)
print(r1)

The reason why I put the area field in the first position is to let it become the primary comparison criterion. So, the rectangle objects can be compared by their areas.

Rectangle(1,8) > Rectangle(2,3)
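Because the ordering methods are generated, the same class also works with sorted(). Here is a small sketch with three made-up rectangles; the class is repeated so that the snippet runs on its own.

```python
import dataclasses as dc


@dc.dataclass(order=True)
class Rectangle:
    area: float = dc.field(init=False)   # derived, so not an __init__() argument
    height: float
    width: float

    def __post_init__(self):
        self.area = self.height * self.width


print(Rectangle(1, 8) > Rectangle(2, 3))
# True, because 8 > 6

rectangles = [Rectangle(1, 8), Rectangle(2, 3), Rectangle(5, 1)]
sorted_areas = [r.area for r in sorted(rectangles)]
print(sorted_areas)
# [5, 6, 8]
```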

Summary

Image by Julius Silver from Pixabay

In this article, I have introduced the dataclasses module in Python. It has been built in since version 3.7, and it can reduce the complexity of our code to a large extent and expedite our development a lot.

Dataclass tries to generalise the common requirements of data classes and provide them out of the box, but it also provides class-level and field-level options which allow us to customise the behaviours. Apart from that, the __post_init__() method gives us more flexibility.

Unless otherwise noted all images are by the author

