
When I check complex projects on GitHub, I usually find myself lost in an endless number of folders and source files. The authors of these repositories find their layout perfectly obvious, and that is quite understandable; sadly, I often struggle, to no avail, to share that perception of how the folders and source files are structured.
How about demystifying some common reflexes for handling packages and modules?
In this quick tutorial, I took the time to simulate a project with a moderately complex structure. In detail, the tree-like structure of our project can be schematized as so:
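(The sketch below is a reconstruction based on the descriptions in this article; the individual .py file names and the exact placement of some __init__.py files are assumptions made for illustration.)
superlibrary/
├── dataloader/
│   └── __init__.py
├── learner/
│   ├── __init__.py
│   ├── clustering/
│   │   ├── __init__.py
│   │   ├── model1.py
│   │   └── model2.py
│   ├── reinforcement_learning/
│   │   ├── __init__.py
│   │   ├── model1.py
│   │   └── model2.py
│   └── supervised_learning/
│       ├── __init__.py
│       ├── model1.py
│       └── model2.py
├── preprocessor/
│   ├── __init__.py
│   ├── preprocessing_numpy.py
│   ├── preprocessing_pandas.py
│   └── preprocessing_sklearn.py
├── tests/
│   └── __init__.py
└── utils/
    ├── backend_connectors/
    └── custom_functions/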
We will mostly be interested in the content of superlibrary:
dataloader handles all sorts of data loaders.
learner contains models divided by learning type.
preprocessor holds all sorts of preprocessing modules built with Pandas, NumPy and Scikit-Learn.
tests is where you centralize all your tests.
utils usually contains a variety of functions that perform optimizations or act as cost functions.
Every function is just a simple print statement that stands in for what would otherwise be a full implementation.
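For instance, one of the preprocessor stubs could look like this (the file name is an assumption; the printed message is the one that appears in the interpreter sessions below):
# preprocessor/preprocessing_numpy.py
def numpy_prep():
    # Placeholder standing in for a real NumPy-based preprocessing routine
    print("This is the numpy implementation for the preprocessor.")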
We will be carrying out nice and quick experiments to highlight some concepts and will be giving examples to answer questions like:
How are packages made?
How to control an imported package?
How to execute a package as the main entry function?
How to make use of namespace packages?
Experimentation
How are packages made?
If you checked the structure of our simulated project, you would notice __init__.py files disseminated at different levels of our folders.
Each folder containing such a file is considered a package. The purpose of the __init__.py files is to include optional initialization code that runs as different levels of a package are encountered.
For example, let us position ourselves in the preprocessor folder. We will write a few lines of code that make some imports.
Let us write down the following lines in the preprocessor/__init__.py file:
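A minimal sketch of what those lines could look like, assuming the three stub functions live in preprocessing_numpy.py, preprocessing_pandas.py and preprocessing_sklearn.py (the module names are assumptions):
# preprocessor/__init__.py
# Re-export the stub functions so they can be called directly from the package
from .preprocessing_numpy import numpy_prep
from .preprocessing_pandas import pd_prep
from .preprocessing_sklearn import sk_prep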
Let us go back a level above the preprocessor directory (so we would view the preprocessor, learner, dataloader, tests and utils directories), open the Python interpreter and do the following:
>>> from preprocessor import numpy_prep, pd_prep, sk_prep
>>> numpy_prep()
This is the numpy implementation for the preprocessor.
>>> pd_prep()
This is the Pandas implementation for the preprocessor.
>>> sk_prep()
This is the sklearn implementation for the preprocessor.
>>> ...
It is important to get a sense of what was done here; the __init__.py file in the preprocessor directory has glued together all the pieces of functions that we would need to call at a higher level.
By doing so, we provided the preprocessor package with additional logical capabilities that save you time and spare you more complex import lines.
How to control imported packages?
Say, with the same hierarchical configuration as before, you want to control a certain behaviour: what gets imported when everything is pulled in with the well-known asterisk. You would write:
from module import *
Let us modify the preprocessor/__init__.py file by adding a new __all__ property:
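A minimal sketch of the modified file, keeping the same imports and restricting the wildcard exports (the restricted list is chosen to match the session below):
# preprocessor/__init__.py
from .preprocessing_numpy import numpy_prep
from .preprocessing_pandas import pd_prep
from .preprocessing_sklearn import sk_prep

# Only the names listed here are exported by "from preprocessor import *"
__all__ = ["numpy_prep", "pd_prep"]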
The __all__ property receives a restricted list of attribute names.
Watch what happens if we run the interpreter a level above the preprocessor directory:
>>> from preprocessor import *
>>> numpy_prep()
This is the numpy implementation for the preprocessor.
>>> pd_prep()
This is the Pandas implementation for the preprocessor.
>>> sk_prep()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sk_prep' is not defined
>>> ...
The sk_prep function was not included in the __all__ property and was therefore omitted from the wildcard import. This shields the package from which sk_prep was loaded from any unintended behaviour.
Despite the fact that using from module import * is generally discouraged, it is nonetheless often used in modules that declare a large number of names.
By default, this import will export all names that do not begin with an underscore. If __all__ is specified, however, only the names that are explicitly stated will be exported. Nothing is exported if __all__ is defined as an empty list. If __all__ includes undefined names, an AttributeError is thrown on import.
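A tiny hypothetical module summing up these rules (demo.py and every name in it are made up for illustration):
# demo.py
def public_fn():
    print("Exported by a wildcard import by default.")

def _private_fn():
    print("Skipped by a wildcard import: the name starts with an underscore.")

__all__ = ["public_fn"]        # only the listed names are exported by "import *"
# __all__ = []                 # an empty list would export nothing
# __all__ = ["missing_name"]   # an undefined name would raise AttributeError on "import *"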
How to execute a package as the main entry function?
You are undoubtedly familiar with writing something like this:
if __name__ == "__main__":
    ...
How about taking it to another level? Would it be possible to run the learner package as a main module?
Let us move to the learner folder, which contains the clustering, supervised_learning and reinforcement_learning packages.
What we would like to do next is to ensure that when importing each of the three packages, we simultaneously import the functions associated with each learning type.
For the clustering package, we write the next lines of code in the clustering/__init__.py file:
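A minimal sketch, assuming the clustering models live in model1.py and model2.py inside the package (the module names are assumptions; the printed messages come from the output shown further down):
# learner/clustering/__init__.py
# Expose the clustering models at the package level
from .model1 import model1
from .model2 import model2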
Same for the supervised_learning package:
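A similar sketch, under the same assumptions:
# learner/supervised_learning/__init__.py
from .model1 import model1
from .model2 import model2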
And for the reinforcement_learning package:
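And likewise:
# learner/reinforcement_learning/__init__.py
from .model1 import model1
from .model2 import model2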
We are now able to directly load all the necessary functions into a new implementation within the learner directory.
We create a new file which we name __main__.py:
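A minimal sketch of that file, consistent with the output below; the absolute imports work because running python learner puts the learner directory itself at the front of the module search path:
# learner/__main__.py
# Entry point executed by "python learner"
import clustering
import reinforcement_learning
import supervised_learning

print("Methods for clustering :")
clustering.model1()
clustering.model2()

print("Methods for reinforcement learning :")
reinforcement_learning.model1()
reinforcement_learning.model2()

print("Methods for supervised learning :")
supervised_learning.model1()
supervised_learning.model2()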
We go back to the main superlibrary directory, so we would be a level above learner's directory, and run the following:
Username@Host current_directory % python learner
Methods for clustering :
This is the implementation for model1.
This is the implementation for model2.
Methods for reinforcement learning :
This is the implementation for model1.
This is the implementation for model2.
Methods for supervised learning :
This is the implementation for model1.
This is the implementation for model2.
Hurray! The __main__ module seems to do a lot more than what you're used to. It can go beyond a simple file and take control of a whole package, so you can either import the package or run it directly.
How to make use of namespace packages?
Coming up to our last section, suppose you slightly modify the utils folder’s content to be:
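(A sketch of the assumed layout, reconstructed from the paths that appear in the interpreter sessions below.)
utils/
├── backend_connectors/
│   ├── __init__.py
│   └── modules/
│       └── cloud_connectors.py
└── custom_functions/
    ├── __init__.py
    └── modules/
        └── optimizers.py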
So you have, in each subpackage, a new directory named modules that has no __init__.py initialization file.
Maybe we should have talked about this type of scenario from the very beginning, since you may well encounter it and its behaviour can seem highly questionable. A folder that has no __init__.py file acts almost like a package but is not considered as such.
More technically, it is considered a namespace package.
It turns out that we can achieve interesting things with that.
You can create a common namespace out of the modules directories contained in each of the backend_connectors and custom_functions packages, so that modules acts as a single module.
Not convinced enough? Let us write something in the interpreter a level above the two previously mentioned packages:
>>> import sys
>>> sys.path.extend(['backend_connectors','custom_functions'])
>>> from modules import cloud_connectors, optimizers
>>> cloud_connectors
<module 'modules.cloud_connectors' from 'path_to_current_directory/cloud_connectors.py'>
>>> optimizers
<module 'modules.optimizers' from 'path_to_current_directory/optimizers.py'>
>>> ...
By adding the two packages to the module search path, we take advantage of a special kind of package designed for merging different directories of code together under a common namespace, as shown. For large frameworks, this can be very useful.
Here is a glimpse of how Python perceives modules once it is imported:
>>> import modules
>>> modules.__path__
_NamespacePath(['backend_connectors/modules','custom_functions/modules'])
>>> modules.__file__
>>> modules
<module 'modules' (namespace)>
>>> ...
You observe a joint namespace path, an absent __file__ attribute (which would have been associated with the path of its __init__.py file had it had one), and a clear indication of its type: a namespace.
To fully make the best out of namespace packages, and particularly in this case, you must not include any __init__.py file in either of the modules directories. Suppose you actually add them.
Watch closely what happens when you try to merge the modules directories:
>>> import sys
>>> sys.path.extend(['backend_connectors','custom_functions'])
>>> from modules import cloud_connectors, optimizers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'optimizers' from 'modules' (path_to_project_directory/superlibrary/utils/backend_connectors/modules/__init__.py)
...
As expected, Python is unable to load your functions when you treat ordinary packages as namespace packages.
Conclusion
I hope this content provides you with a clearer understanding of how a moderately complex project can be structured. The subject is topical, even though the community rarely shows much enthusiasm for sharing and writing about it. There is so much to learn about how Python handles packages and modules.
This material aims to make readers aware of how crucial it is to organize a project in a readable hierarchy, so that every other user feels confident about using it.
Thank you for your time.