
Demystifying Modules and Packages in Python

Modules and packages are the core of any large project. I will be showing some tips involving them such as how to organize packages and…

Photo by Alvaro Reyes from Unsplash

When I check complex projects on GitHub, I usually find myself lost in a seemingly infinite number of folders and source files. The authors of these repositories find their layout perfectly obvious, which is quite understandable; sadly, I often struggle to build the same mental picture of how the folders and source files are structured.

How about demystifying some common habits for handling packages and modules?

In this quick tutorial, I took the time to simulate a project that has the following global structure:

Image By Author

And in detail, the tree-like structure of our project can be schematized as follows:

We will mostly be interested in the content of superlibrary:

Dataloader handles all sorts of data loaders.

Learners contains models divided by learning type.

Preprocessor holds all sorts of preprocessor modules with Pandas, NumPy and Scikit-Learn.

Tests is where you centralize all your tests.

Utils usually contains a variety of functions that perform optimizations or act as cost functions.

Every function is just a simple print statement, standing in for a full implementation.
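Putting the pieces above together, the project tree might look like the sketch below. This is a reconstruction from the packages discussed in this article; the exact file names inside each package are assumptions:

```
superlibrary/
├── dataloader/
│   └── __init__.py
├── learner/
│   ├── __main__.py
│   ├── clustering/
│   │   └── __init__.py
│   ├── reinforcement_learning/
│   │   └── __init__.py
│   └── supervised_learning/
│       └── __init__.py
├── preprocessor/
│   └── __init__.py
├── tests/
│   └── __init__.py
└── utils/
    ├── backend_connectors/
    └── custom_functions/
```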

We will carry out quick experiments to highlight some concepts, giving examples that answer questions like:

How are packages made?

How to control an imported package?

How to execute a package as the main entry function?

How to make use of namespace packages?

Experimentation

How are packages made?

If you check the structure of our simulated project, you will notice __init__.py files disseminated at different levels of the folders.

Each folder containing such a file is considered a package. The purpose of the __init__.py files is to hold optional initialization code that runs as different levels of a package are encountered.

For example, let us position ourselves in the preprocessor folder. We will write a few lines of code that make some imports.

Let us write down the following lines in the preprocessor/__init__.py file:

Let us go back one level above the preprocessor directory (so we can see the preprocessor, learner, dataloader, tests and utils directories), open the Python interpreter and do the following:

>>> from preprocessor import numpy_prep, pd_prep, sk_prep
>>> numpy_prep()
This is the numpy implementation for the preprocessor.
>>> pd_prep()
This is the Pandas implementation for the preprocessor.
>>> sk_prep()
This is the sklearn implementation for the preprocessor.
>>> ...

It is important to get a sense of what was done here; the __init__.py file in the preprocessor directory has glued together all the functions that we need to call at a higher level.

By doing so, we gave the preprocessor package additional logical capabilities that save you time and spare you more complex import lines.

How to control imported packages?

Say, with the same hierarchical configuration as before, you want to control what gets imported when someone uses the well-known asterisk.

You would write:

from module import *

Let us modify the preprocessor/__init__.py file by adding a new __all__ property:

The __all__ property receives a restricted list of names to export.

Watch what happens if we run the interpreter a level above the preprocessor directory:

>>> from preprocessor import *
>>> numpy_prep()
This is the numpy implementation for the preprocessor.
>>> pd_prep()
This is the Pandas implementation for the preprocessor.
>>> sk_prep()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sk_prep' is not defined
>>> ...

The sk_prep function was not included in the __all__ property, so it was not exported. This shields the package against unintended use of names it does not explicitly expose.

Despite the fact that using from module import * is generally discouraged, it is nonetheless often used in modules that declare a large number of names.

By default, this import will export all names that do not begin with an underscore. If __all__ is specified, however, only the names that are explicitly stated will be exported.

Nothing is exported if __all__ is defined as an empty list. If __all__ includes undefined names, an AttributeError is thrown on import.

How to execute a package as the main entry function?

You are undoubtedly familiar with writing something like this :

if __name__ == "__main__":
    ...

How about taking it to another level? Would it be possible to run the learner package as a main module?

Let us move to the learner folder:

What we would like to do next is to ensure that when importing each of the three packages, we simultaneously import the functions associated with each learning type.

For the clustering package, we write the following lines of code in the clustering/__init__.py file:

Same for the supervised_learning package:

And for the reinforcement_learning package:

We are now able to directly load all the necessary functions into a new implementation within the learner directory.

We create a new file which we name __main__.py:

We go back to the main superlibrary directory, so we are one level above the learner directory, and run the following:

Username@Host current_directory % python learner 
Methods for clustering :
This is the implementation for model1.
This is the implementation for model2.
Methods for reinforcement learning :
This is the implementation for model1.
This is the implementation for model2.
Methods for supervised learning :
This is the implementation for model1.
This is the implementation for model2.

Hurray! The __main__.py file seems to do a lot more than what you’re used to. It can go beyond a simple script and take control of a whole package, so you can either import that package or run it.

How to make use of namespace packages?

Coming to our last section, suppose you slightly modify the utils folder’s content to be:

So in each subpackage you now have a new directory named modules that has no __init__.py initialization file.
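The modified layout might look like the sketch below, reconstructed from the interpreter session that follows (the file names cloud_connectors.py and optimizers.py appear in that session; everything else is an assumption):

```
utils/
├── backend_connectors/
│   └── modules/                 # no __init__.py -> namespace package
│       └── cloud_connectors.py
└── custom_functions/
    └── modules/                 # no __init__.py -> namespace package
        └── optimizers.py
```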

Perhaps we should have addressed this type of scenario from the very beginning, since its behaviour can be puzzling: a folder that has no __init__.py file acts almost like a package, but is not considered one.

More technically, it is considered a namespace package. It turns out that we can achieve interesting things with that.

You can create a common namespace contained in each of the backend_connectors and custom_functions packages, so that the two modules directories act as a single package.

Not convinced? Let us write something in the interpreter a level above the two previously mentioned packages:

>>> import sys
>>> sys.path.extend(['backend_connectors','custom_functions'])
>>> from modules import cloud_connectors, optimizers
>>> cloud_connectors
<module 'modules.cloud_connectors' from 'path_to_current_directory/cloud_connectors.py'>
>>> optimizers
<module 'modules.optimizers' from 'path_to_current_directory/optimizers.py'>
>>> ...

By adding the two directories to the module search path, we take advantage of a special kind of package designed for merging different directories of code under a common namespace, as shown. For large frameworks, this can be very useful.

Here’s a look at how Python perceives the modules package when it is imported:

>>> import modules
>>> modules.__path__
_NamespacePath(['backend_connectors/modules','custom_functions/modules'])
>>> modules.__file__
>>> modules
<module 'modules' (namespace)>
>>> ...

You observe a joint namespace path, an absent __file__ attribute (which would have pointed to its __init__.py file, had it had one), and a clear indication of its type: a namespace.

To make the best of namespace packages, and particularly in this case, you must not include any __init__.py file in either of the modules directories. Suppose you actually add them:

Watch closely what happens when you try to merge the modules directories:

>>> import sys
>>> sys.path.extend(['backend_connectors','custom_functions'])
>>> from modules import cloud_connectors, optimizers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'optimizers' from 'modules' (path_to_project_directory/superlibrary/utils/backend_connectors/modules/__init__.py)
...

As expected, Python is unable to load your functions when you treat ordinary packages as namespace packages.

Conclusion

I hope this content provides you with a clearer understanding of how a moderately complex project can be structured. The subject matters, even though the community rarely shows much enthusiasm for writing about it. There is so much to learn about how Python handles packages and modules.

The present material aims to make readers aware of how crucial it is to organize a project into a readable hierarchy, so that every other user feels confident about using it.

Thank you for your time.

