Python’s Dependency Conflicts are Depriving Data Scientists of Sleep

A guide to mastering dependencies in your personal Data Science project and handling the conflicts that emerge.

Manuel Treffer
Towards Data Science


Photo by AltumCode on Unsplash

Introduction

Sharing knowledge is the core of the blogpost project of the Master Data Science & Intelligent Analytics at the FH Kufstein Tirol where I study. Check out the program at https://fh-kufstein.ac.at/Studieren/Master/Data-Science-Intelligent-Analytics-BB.

Let us be honest: Libraries are the solution to everything! You have a problem within a project or need a special functionality which you would like to avoid coding on your own? There will probably be somebody who has had the same issue before and may have already developed a library for that! However, as so often in life, all that glitters is not gold.

Libraries are very useful, of course, but they can just as quickly lead you into a situation where you have to deal with some “Dependency Conflicts”. Sooner or later, almost every software developer will find themselves at that point. Avoiding it is a little bit easier if the project remains small, but you definitely need to be more careful as soon as it gets bigger.

It does not really matter which programming language you write your code in; unfortunately, dependency conflicts can occur in nearly every one of them. Python, for example, is a well-known programming language that comes to many people’s minds when they hear “Dependency Conflicts”.

What are Dependency Conflicts?

First, let us clarify the term ‘dependency’ in the context of software development. A dependency exists when a piece of software, for example a package or library, relies on another piece of software, usually written by others (third-party code). In general, every time you use a new package or library in your project, you are also adding its dependencies. [1]

Here is a simple example of what dependencies may look like.

You start working on a new software project; let us simply call it APP in its current version 1. In this application you need certain functionalities, which you include by adding new libraries, LIB A (version 1) and LIB B (version 1). Both LIB A and LIB B use another library within their code, LIB C (version 1).

Application using libraries (Image by author)

One day, LIB B gets an update to version 2, which now depends on the latest version 2 of LIB C. So far, this does not affect your APP, since it is still using version 1 of LIB A and LIB B.

New versions for specific libraries (Image by author)

Dependency Conflicts emerge as soon as you want to update your APP to version 2, which includes the latest version of LIB B. This leads to a problem because LIB A still uses version 1 of LIB C, whereas LIB B already depends on LIB C (version 2), and only one version of LIB C can be used at a time.

Application updates one library, which can cause problems due to dependency issues (Image by author)

To get rid of this dependency problem, you must now include the latest version of LIB A, which hopefully depends on the same version of LIB C as LIB B does. If LIB A does not (yet) support the latest LIB C version: WELCOME TO A WORLD FULL OF DEPENDENCY CONFLICTS!
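The scenario above can be reproduced as a small toy model. This is only a sketch under simplifying assumptions: the REQUIREMENTS table and the find_conflicts helper are made up for illustration, and every library pins an exact version of what it needs, which real package managers handle with far more flexible version ranges.

```python
# Toy model of the APP / LIB A / LIB B / LIB C example (all names are illustrative).
# Each (library, version) pair maps to the exact versions of the libraries it requires.
REQUIREMENTS = {
    ("LIB A", 1): {"LIB C": 1},
    ("LIB B", 1): {"LIB C": 1},
    ("LIB B", 2): {"LIB C": 2},
}


def find_conflicts(direct_dependencies):
    """Return the transitive dependencies that are requested in more than one version."""
    requested = {}  # library name -> {version: [who asked for it]}
    for name, version in direct_dependencies.items():
        for dep_name, dep_version in REQUIREMENTS[(name, version)].items():
            requested.setdefault(dep_name, {}).setdefault(dep_version, []).append(
                f"{name} v{version}"
            )
    return {dep: versions for dep, versions in requested.items() if len(versions) > 1}


app_v1 = {"LIB A": 1, "LIB B": 1}  # APP version 1
app_v2 = {"LIB A": 1, "LIB B": 2}  # APP version 2 with the updated LIB B

print(find_conflicts(app_v1))  # {}
print(find_conflicts(app_v2))  # {'LIB C': {1: ['LIB A v1'], 2: ['LIB B v2']}}
```

APP version 1 resolves cleanly, while APP version 2 asks for two different versions of LIB C at once, which is exactly the conflict described above.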

Why can a Dependency Conflict occur?

As I tried to show in simplified form in the section above, one reason might be that different libraries or packages are incompatible with each other. Usually, if many different libraries and packages are in use, they were developed by a lot of different people at different points in time, and in many cases, those libraries are not maintained anymore.

You hardly have control over all those libraries. If the developer of one single library you are using decides to update theirs and add functionality or remove parts of the code (for whatever reason), this could be the start of new Dependency Conflicts.

A conflict can also occur because the third-party library may not be well written, or because it lacks good documentation. In both cases, it can be very annoying when a bug suddenly appears and you have little or no chance to get rid of it.

But can I avoid it?

In theory, sure, you can avoid it. But in practice, especially if you are not using a package manager such as ‘Anaconda’, it is almost impossible to REALLY avoid Dependency Conflicts. Nevertheless, there are a few tips you can keep in mind which help you avoid them as far as possible. So, here we go:

Before using a library:

  • Is it somehow possible to use a package manager like ‘Anaconda’? If the answer is yes, it is highly recommended to do that.
  • Think about the problem you want to solve. How much effort would it be to code the solution on your own? Would it be possible (now or at a later stage in development) for you to get rid of that specific library at some point?
  • If possible, check the download counter and the rating of the library. Is it a popular one that a lot of people are already using, or is it not that well-known?
  • Does the library have a lot of dependencies of its own? If that is the case, it might lead to conflicts even sooner.
  • Check the documentation: is it well written and detailed enough for you to work with this library?
  • How many versions / updates / bugfixes have been released for this library so far? Is it well-maintained and updated regularly, or is it still at version 0.0.1, which was released ten years ago and has never been touched since?
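For a library that is already installed in your environment, the question about its own dependencies can be checked from Python. The following is a minimal sketch using the standard library module importlib.metadata; the helper name declared_dependencies is made up for this example, and it only lists the direct requirements a package declares (including optional extras), not the full transitive tree.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings an installed package declares.

    Returns None if the package is not installed in the current environment.
    """
    try:
        # requires() returns None when a package declares no requirements at all.
        return metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return None


# The package names below are placeholders; replace them with the ones you care about.
for name in ("pip", "a-package-that-is-not-installed"):
    deps = declared_dependencies(name)
    if deps is None:
        print(f"{name}: not installed in this environment")
    else:
        print(f"{name}: {len(deps)} declared dependencies")
```

A long list of declared requirements is a hint that the library may pull in conflicts of its own.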

While using a library:

  • Check the manifest after you have added / removed a lot of packages and libraries. Does the manifest match the packages you are really using?
  • Are there other libraries available which offer the same functionality? (Switching may require a few code adaptations.) If yes, you can replace the libraries.
  • If possible, you can try to fork and fix open source libraries, push those changes and hope that the maintainer accepts the patch.

Two more tips:

  • Monitor how the library affects the performance of your application. Does it improve its performance? Or is it affecting it in a bad way? Maybe there is a more lightweight one available.
  • You can version lock your project, which definitely helps you avoid Dependency Conflicts. However, this step means you can no longer pick up new features or improvements of the libraries in use, since you need to stick with the chosen versions. [2]
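Version locking starts with knowing exactly which versions are installed right now. As a minimal sketch, the snippet below uses the standard library module importlib.metadata to print one name==version line per installed package, similar in spirit to what pip freeze produces. The helper name pinned_requirements is made up for this example, and a real project would usually rely on its package manager for this step.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for the distributions in this environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs without readable metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


for line in pinned_requirements():
    print(line)
```

Saving this output alongside your project records the exact versions your code was developed against.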

How to manage your dependencies

Now you have heard a few tips you might consider during your next Data Science project. As a next step, let me show you some ways to manage your dependencies properly.

First, it is necessary to set up guidelines within the team on how you want to deal with dependencies. Do you want to stick with a specific version, knowing it might become vulnerable at some point? Or do you accept the risk that comes with updating a dependency? [3] Set up rules every team member must follow; it will help you during the project and surely prevent frustration!

Prioritizing your dependencies is very important as well. Not every dependency is equally important. If possible, update your highest-priority dependencies first. This has several advantages: you keep a better overview of which dependencies have already been updated and which still need to be, and these updates may fix potential security issues.

Managing dependencies with Anaconda can be a little bit easier since it keeps an eye on the environment you are using. Nevertheless, if you do not take care of your environment, it can grow quite big within a short period of time.

Recap

A Dependency Conflict can occur in almost every software development project. Using a lot of third-party libraries (especially ones that are not properly maintained) over a longer period of time can lead to version problems caused by transitive dependencies. It is really difficult to avoid this situation completely, but maybe some of the tips from this article will help:

  • Use package managers if possible
  • Consider whether you could replace a library with your own code
  • Use popular libraries
  • Try avoiding libraries which already have a lot of dependencies
  • Check how good the documentation is
  • Use well-maintained libraries
  • Keep an eye on your manifest
  • Use other libraries with the same functionality
  • Fix open source libraries on your own
  • Use lightweight libraries if possible
  • Version lock your project

Conclusion

Libraries, packages, dependencies in general: using them is and probably always will be part of being a Developer, a Data Scientist or a Software Engineer. Usually, they save us oodles of time compared to coding an important functionality from scratch. Dependency Conflicts might look scary at first, and it is all too easy to find yourself dealing with them, but if you keep a few simple tips in mind, you will be able to beat the conflicts and master the dependencies!

I hope you enjoyed reading this article, and maybe it will help you a little bit in your next project. Thank you!

References

[1] Prana G.A.A., Sharma A., Shar L.K., et al. 2021. Out of sight, out of mind? How vulnerable dependencies affect open-source projects. Empirical Software Engineering 26, 59. DOI: https://doi.org/10.1007/s10664-021-09959-3

[2] Tanabe Y., Aotani T., and Masuhara H. 2018. A Context-Oriented Programming Approach to Dependency Hell. In Proceedings of the 10th International Workshop on Context-Oriented Programming: Advanced Modularity for Run-time Composition (COP ’18). Association for Computing Machinery, New York, NY, USA, 8–14. DOI: https://doi.org/10.1145/3242921.3242923

[3] Pashchenko I., Vu D., and Massacci F. 2020. A Qualitative Study of Dependency Management and Its Security Implications. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS ’20). Association for Computing Machinery, New York, NY, USA, 1513–1531. DOI: https://doi.org/10.1145/3372297.3417232
