Control AI Costs Through Agile Data Science Project Management

A blueprint for running an agile data science organization

Nikolay Manchev
Towards Data Science


Introduction

The world of data science is complex, with hidden costs that go beyond budgetary limits. Data scientists are a significant investment for any organization. Unfortunately, inefficiencies like idle, over-provisioned infrastructure can squander a large share of the investment that supports them. Agile methodologies offer a solution, improving workflow and cutting down on wasted time. With agile, the traditional data science process becomes optimized and adaptable, delivering value more efficiently. This article explores these hidden costs and demonstrates how agile practices can make your data science initiatives more cost-effective.

Section 1: The Hidden Costs of Data Science

Data scientists, with their intricate knowledge and expertise in handling data, are a valuable resource, and their productivity is paramount. The more time they spend on tedious tasks rather than innovation, the greater the expense without the payoff. In addition, the tendency of data scientists to work on their own machines so as not to be restricted by central IT, or to stand up parallel “shadow IT” capacity, makes knowledge discovery burdensome and often leads to teams reinventing the wheel.

Waste can come in many forms. The Boston Consulting Group found that only 44% of models make it to production, and a significant portion of a data scientist’s day can be wasted on menial tasks like IT setup. Infrastructure costs also add up quickly: while data scientists are bogged down in such tasks and distracted from innovation, data infrastructure investments sit idle, always-on, and over-provisioned. Finally, moving data into and out of the cloud gets expensive at AI data scales. As a result, cloud costs become difficult to manage across multiple stacks, silos, and environments.

Machine learning — generative AI in particular — requires tremendous volumes of cloud compute and expensive GPUs. In 2023, prominent models like ChatGPT cost organizations like OpenAI around $700,000 per day in computing costs (SemiAnalysis via The Washington Post [1]). By one estimate, ChatGPT required thousands of GPUs and months of training before it was ever deployed [2].

The struggle persists. About 56% of data science leaders struggle to scale their data science projects properly (BCG). For example, data spread across multiple cloud platforms not only inflates storage costs but also makes it difficult to access and share data across teams. This fractured approach can further strain budgets and undermine the collaboration and efficiency that are essential to the data science lifecycle. How can we transform these stumbling blocks into stepping stones? The answer may lie in embracing agile methodologies and a structured process design.

Section 2: Process Design and Agile Methodology in Data Science

Today, when efficiency and adaptability are key, agile methodologies are an increasingly relevant part of data science projects. Agile processes embrace adaptability, collaboration, and iterative development, all of which can significantly impact the cost efficiency of a project across the entire data science lifecycle. A typical Data Science project is a good fit for agile practices as it innately exhibits key traits of the agile management approach:

  • Incremental and iterative development — data science products are built incrementally. Most commonly adopted frameworks for managing data science projects have strictly defined phases; CRISP-DM, for example, defines Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment.
  • Focus on value — predictive modelling, and data science in general, is intrinsically value-focused, as model recommendations and insights directly drive business decisions.
  • Empowered team — the data science team achieves peak productivity when it is allowed to prioritise and organise work within the team. This includes selection of specific models, tools, frameworks, computational resources, programming languages, and so on.
  • Continuous learning — this is another important principle of agile. When we start working on a model we have a certain vision, and we start building a product (model, report, etc.) based on this vision. After the first iteration, or after one of the project phases (e.g. exploratory data analysis), we have gained additional knowledge about the problem, which enables us to adjust the vision accordingly.

Data science projects often mandate interplay between phases. For example, poor model results may prompt revisiting data collection to amass data with better predictive power. The agile methodology embraces this cyclic nature, allowing teams to adapt and refine processes.

Image by the author

Here’s a brief overview of how an agile process could look for a typical data science project:

  • Business Case: Define the problem and potential impact.
  • Data Collection and Initial Analysis: Collect, analyze, and validate data.
  • Exploratory Data Analysis / Modeling: Explore the data, then develop and test models.
  • Operationalization: Deploy the models into production.
  • Monitoring and Analysis: Continuously monitor, analyze, and refine the models.

Project management tools like Jira allow agile methodologies to take different concrete forms. If your data science platform uses projects to organise units of work, and your workflow uses Epics with child issues like Tasks, Stories, and Bugs, linking the Epic issue to your project can streamline both the development process and the tracking of progress and workload.

For complex projects where different teams handle different stages, it might be more efficient to create projects that link to Task tickets. Each ticket represents a single stage or a group of stages, ensuring better alignment with intricate workflows.
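
To make this concrete, here is a minimal sketch of scripting that structure with the Atlassian jira Python client. The server URL, credentials, project key, and issue names are placeholders, and your own Jira instance may expose Epic parentage through a dedicated Epic Link or parent field rather than a generic issue link.

```python
# A minimal sketch using the Atlassian "jira" Python client (pip install jira).
# Server URL, credentials, project key ("DS"), and summaries are placeholders.
from jira import JIRA

jira = JIRA(server="https://your-company.atlassian.net",
            basic_auth=("user@example.com", "api-token"))

# Create an Epic representing one data science project / unit of work.
epic = jira.create_issue(fields={
    "project": {"key": "DS"},
    "summary": "Churn model v2",
    "issuetype": {"name": "Epic"},
})

# Create a child issue per lifecycle stage and link each one to the Epic.
stages = ["Business case", "Data collection", "EDA / modelling",
          "Operationalization", "Monitoring"]
for stage in stages:
    task = jira.create_issue(fields={
        "project": {"key": "DS"},
        "summary": f"{epic.key}: {stage}",
        "issuetype": {"name": "Task"},
    })
    # "Relates" is a default link type; adjust if your instance uses
    # an Epic Link custom field or the newer parent field instead.
    jira.create_issue_link(type="Relates",
                           inwardIssue=task.key,
                           outwardIssue=epic.key)
```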

Section 3: Infrastructure Costs and Control

Infrastructure management is pivotal but often underemphasized in data science. The complexities involved in setting up and managing data science environments can lead to substantial hidden costs, particularly when resources are underutilized. When investments sit idle, always-on, and over-provisioned, these expenses quickly accumulate, and reduce opportunities to direct valuable resources towards more productive avenues.

Machine learning models, particularly deep learning models, require an immense amount of computational resources — high-end GPUs and cloud compute instances — and the cost can be staggering. Additionally, commercial platforms might have markups that drive the price even higher. A strategic approach to infrastructure planning and investment is therefore needed, balancing the need for cutting-edge technology with the imperative of cost control.

This problem not only consumes financial resources but also leads to a loss in potential productivity and an efficiency bottleneck as resources are poorly allocated for use by multiple teams. Sadly, this form of waste isn’t always apparent and often requires meticulous tracking and management to detect and mitigate. Leveraging agile strategies can unlock more significant value from data science investments, turning potential waste into productivity and innovation. It also creates a paper trail for monitoring costs, resource utilization, and ultimately facilitates the calculation of ROI for individual data science projects.
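
Once utilization is tracked, the arithmetic behind that paper trail is simple. The sketch below uses made-up figures and a hypothetical per-hour GPU rate to estimate idle spend and a rough ROI for a single project; in practice the inputs would come from your platform's usage reports or a cloud billing export.

```python
# Hypothetical utilization figures for one project over one month.
provisioned_gpu_hours = 1440       # 2 GPUs x 24 h x 30 days, always-on
used_gpu_hours = 310               # hours with actual training/inference jobs
gpu_hourly_rate = 3.0              # USD per GPU-hour (placeholder)

idle_cost = (provisioned_gpu_hours - used_gpu_hours) * gpu_hourly_rate
total_cost = provisioned_gpu_hours * gpu_hourly_rate

business_value = 18000.0           # estimated monthly value delivered by the model
roi = (business_value - total_cost) / total_cost

print(f"Idle compute spend: ${idle_cost:,.0f} "
      f"({idle_cost / total_cost:.0%} of the compute bill)")
print(f"Simple ROI: {roi:.1f}x")
```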

Section 4: Scaling, Data Management, and Agile Workflow

Scaling data science projects is a monumental and often underestimated task. As noted earlier, a majority of data science leaders (around 56%, per BCG) report difficulty taking their projects beyond the experimental stage to deliver business value. One significant factor is the ballooning cost associated with data storage and management, along with costs from a variety of hardware and software solutions. However, adopting agile practices can serve as a lifeboat in this rising tide of expenses.

An agile workflow, characterized by iterative development and feedback loops, allows data science teams to pinpoint storage inefficiencies. For example, redundant data sets can often be avoided through iterative sprints that focus on data consolidation. By incrementally building on previous work and reusing data and code, an agile workflow minimizes the need for additional storage resources; a simple starting point is sketched below.
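
As a starting point, a consolidation sprint might simply inventory byte-identical copies of data files. The sketch below assumes datasets live under a local or mounted directory; for object storage you would iterate over keys instead.

```python
# A minimal sketch: find byte-identical data files under a directory tree.
# The root path and file pattern are placeholders.
import hashlib
from collections import defaultdict
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 of a file, read in chunks to handle large datasets."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

duplicates = defaultdict(list)
for path in Path("/data/projects").rglob("*.parquet"):
    duplicates[file_digest(path)].append(path)

for digest, paths in duplicates.items():
    if len(paths) > 1:   # the same content is stored more than once
        print(f"{digest[:12]}: {len(paths)} copies -> {[str(p) for p in paths]}")
```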

Moreover, agile practices like version control and feature branching enable efficient data management. Proper versioning makes it easier to roll back to previous states of the project, negating the need for multiple redundant copies and adding to storage savings.

Agility also means better resource allocation. Through Scrum meetings and Kanban boards, teams gain a transparent view of who is doing what, leading to more informed resource distribution, optimal utilization of both human and machine resources, less idle time, and consequently lower idle costs.

The agile mindset also extends to automation. Iterative development of automated pipelines for data extraction, transformation, and loading (ETL) can remove manual chokepoints one sprint at a time — accelerating the scaling process and significantly lowering costs related to manual labor and error rectification.
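
What one such increment might look like is sketched below: a small extract-transform-load step built with pandas, where the source file, cleaning rules, and output location are stand-ins for whatever your first sprint automates.

```python
# A minimal ETL increment: extract a CSV, apply one cleaning pass, load to Parquet.
# Source path, column names, and destination are placeholders.
import pandas as pd

def extract(source: str) -> pd.DataFrame:
    """Pull raw data; in later sprints this could become an API or warehouse query."""
    return pd.read_csv(source)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning rules automated in this sprint."""
    df = df.drop_duplicates()
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    return df.dropna(subset=["customer_id"])

def load(df: pd.DataFrame, destination: str) -> None:
    """Write the cleaned data where downstream modelling tasks expect it."""
    df.to_parquet(destination, index=False)

if __name__ == "__main__":
    load(transform(extract("raw/customers.csv")), "clean/customers.parquet")
```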

However, it’s crucial to note that agile is not a one-size-fits-all solution. Teams must be adaptive, willing to incorporate feedback, and make necessary pivots. Data science projects are multifaceted and complex; therefore, rigid adherence to any one methodology may introduce operational blind spots and unexpected costs.

Adopting agile methods to scale is not just about doing things faster; it’s about doing things smarter. By focusing on iterative improvements, transparency, and automation, you stand a far better chance of scaling your projects successfully while keeping costs in check.

Section 5: Efficiency, Automation, and the Role of IT

Efficiency is the linchpin holding the complex machinery of data science together. Without it, not only do costs spiral, but the time-to-value also increases, negating the competitive advantage of adopting data science in the first place. One often overlooked factor that plays a crucial role in enhancing efficiency is the role of IT.

While IT departments traditionally focus on maintaining system integrity and infrastructure, the rise of data science expands their role. They are now instrumental in establishing automated workflows and driving the adoption of agile practices, which has a direct impact on cost efficiency.

One actionable way to drive efficiency is by mapping Epics, or large chunks of work, to Projects (or the equivalent unit of work supported by your data science platform) and, for more granular workflows, mapping Tasks/Stories to Projects, a practice well supported by agile tooling. This integration serves as a lighthouse, guiding teams through the complexities of data science projects. Each Epic can be broken down into multiple smaller tasks or stories, helping with project scoping and role allocation. This fosters not just transparency but also accountability, thus driving efficiency.

Automated pipelines and CI/CD (Continuous Integration/Continuous Deployment) mechanisms, often overseen by IT, further enhance this efficiency. Automation expedites routine tasks, freeing up hours of data scientists’ time for more complex tasks and innovation. This is where IT’s role is indispensable. IT can set up these pipelines and maintain them, ensuring that the data science teams have all they need to work efficiently.
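
One simple pattern IT can put in place is a validation gate that runs inside the CI pipeline and blocks deployment when a candidate model misses its targets. The sketch below assumes the training job has written its evaluation metrics to a JSON file; the file name and thresholds are placeholders.

```python
# ci_model_gate.py -- fail the CI job if the candidate model misses its targets.
# The metrics file and thresholds are placeholders for your own pipeline outputs.
import json
import sys

THRESHOLDS = {"auc": 0.80, "recall": 0.60}   # minimum acceptable values

def main(metrics_path: str = "metrics.json") -> int:
    with open(metrics_path) as f:
        metrics = json.load(f)

    failures = [name for name, minimum in THRESHOLDS.items()
                if metrics.get(name, 0.0) < minimum]

    if failures:
        print(f"Model gate failed on: {', '.join(failures)}")
        return 1                              # non-zero exit blocks the CD stage
    print("Model gate passed; deployment can proceed.")
    return 0

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```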

Another facet of this is managing cloud resources and computing power. Machine learning models require intense computation, which is both time-consuming and costly. Here, IT can allocate resources judiciously, based on the agile plan and current sprint tasks. This avoids the waste of computational power, ensuring that only the required amount of resources are utilized, thereby cutting costs.

In a nutshell, the role of IT is evolving to become an enabler in implementing agile practices in data science, which in turn is crucial for controlling costs and enhancing efficiency. By enabling agile practices and automation across data science teams, IT stands as a pillar supporting the agile framework in data science.

Section 6: The Broader Implications for Business Strategy and Competitive Advantage

As data science continues to mature, it becomes a more valuable core component of business strategy, offering avenues for significant competitive advantage. With agile methodologies, data science teams can amplify this impact, promoting data science from an operational tool to a strategic asset.

In the landscape of business strategy, agility equates to adaptability and responsiveness to market changes. Organizations with agile processes ingrained in their data science projects find it easier to pivot or scale, ensuring they stay ahead of competitors. For instance, breaking complex projects down into manageable Epics or Task tickets makes it easier for executive-level decision-makers to grasp the trajectory of complex data science projects and allocate resources more judiciously.

Moreover, agile practices foster a culture of continuous improvement and innovation. As each sprint ends, teams review their progress and adapt future sprints accordingly. This iterative process nurtures an environment where failure is not penalized but seen as a learning opportunity. In a field like data science, which is often fraught with uncertainty and complexity, this culture is a strong competitive advantage.

Furthermore, agile processes help manage risk — a critical priority for organizations looking to dominate their market space using data science. The iterative nature of agile, coupled with its emphasis on constant feedback, ensures that any risks are identified early in the process. This allows for timely mitigation strategies, ensuring that projects are not just completed on time but also meet the expected quality standards.

By focusing on these principles, businesses can unlock new dimensions of value, significantly impacting their bottom line and positioning themselves as leaders in their respective fields.

Section 7: Brief Tutorial on Building a Model Development Process Using Agile Processes

Navigating the complexities of data science projects can be daunting, especially when it involves building machine learning models. Follow this step-by-step guide to build a model development process using agile methodologies, akin to the Jira integration discussed earlier. The goal is to demystify the process, making it accessible for data science teams and enabling them to operate more efficiently and effectively.

Step 1: Define Project Scope and Objectives

Before you get started with any project, answer the following questions to form the baseline for your agile effort (a minimal sketch of capturing the answers in a testable form follows the list):

  1. What is the problem you’re trying to solve?
  2. What are the success metrics?
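
One lightweight way to make these answers actionable is to record them as a small, versionable structure that later sprints can test against. The sketch below uses a hypothetical churn-prediction example; the metric names and targets are placeholders.

```python
# A hypothetical project charter captured as code so success criteria stay testable.
from dataclasses import dataclass, field

@dataclass
class ProjectCharter:
    problem: str
    success_metrics: dict[str, float] = field(default_factory=dict)  # metric -> target

    def is_successful(self, results: dict[str, float]) -> bool:
        """Check measured results against every agreed target."""
        return all(results.get(metric, 0.0) >= target
                   for metric, target in self.success_metrics.items())

charter = ProjectCharter(
    problem="Reduce customer churn by flagging at-risk accounts each week",
    success_metrics={"recall": 0.6, "precision": 0.4},
)

print(charter.is_successful({"recall": 0.72, "precision": 0.45}))  # True
```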

Step 2: Break Down into Iterative Cycles or Sprints

Divide the project into smaller, manageable pieces, also known as sprints. These could last from two to four weeks, depending on the project’s complexity and the team’s familiarity with the tasks involved.

Step 3: Link to Broader Business Objectives (Using Epics or Task Tickets)

Ensure that your data science project, broken down into sprints, has clear linkages to broader business objectives. Utilize Epics or Task Tickets to maintain this alignment, making it easier for everyone involved, especially decision-makers, to see the bigger picture.

Step 4: Assign Roles and Create Cross-Functional Teams

In agile methodologies, cross-functional teams comprising data scientists, data engineers, and business analysts are critical. Assign roles and responsibilities early on to facilitate smooth collaboration.

Step 5: Utilize Agile Project Management Tools

Tools similar to Jira can be highly beneficial for tracking progress. These platforms allow for the efficient allocation of tasks and for monitoring the progression of sprints.
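
For example, a quick progress check can be scripted against Jira's search API. The sketch below uses the same jira client as earlier, with a placeholder server and project key, to count the issues still open in the active sprint.

```python
# A minimal sprint progress check using JQL; server and project key are placeholders.
from jira import JIRA

jira = JIRA(server="https://your-company.atlassian.net",
            basic_auth=("user@example.com", "api-token"))

open_issues = jira.search_issues(
    'project = DS AND sprint in openSprints() AND statusCategory != Done')

print(f"{len(open_issues)} issues still open in the active sprint")
for issue in open_issues:
    print(f"{issue.key}: {issue.fields.summary} ({issue.fields.status.name})")
```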

Step 6: Foster Collaboration and Constant Feedback

A culture of open communication and constant feedback is key. Encourage team members to voice their opinions and concerns, enabling the project to adapt as needed.

Step 7: Monitor Progress, Adapt as Needed

Agile project management tools help you easily monitor a project’s progress. Leverage them, and if things aren’t going according to plan, the agile methodology allows you to adapt quickly. Make necessary adjustments either in the current sprint or plan for them in the next sprint.

Step 8: Conclude with a Retrospective and Lessons Learned

After each sprint — and at the end of the project — hold a retrospective meeting where the team discusses what went well, what didn’t, and how to improve in future sprints or projects.

Conclusion

In a world where data science and machine learning are increasingly vital for driving business strategy and achieving competitive advantage, the importance of managing costs and enhancing efficiency can’t be overstated. Adopting agile methodologies offers a robust framework for tackling these challenges head-on.

As you seek to scale your data science capabilities, consider the significant cost benefits that a well-implemented agile methodology can bring to your organization.

We encourage you to delve deeper into agile methodologies, and perhaps even engage in some further reading or practical training, as you continue your journey in data science. With the right practices in place, your data science projects won’t just be a cost center but a valuable asset contributing to your broader business objectives.

References

[1] Will Oremus, AI chatbots lose money every time you use them. That is a problem., The Washington Post, June 2023, last accessed 30 August 2023, https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/

[2] Andrej Karpathy, State of GPT, Microsoft BUILD, May 23, 2023, https://www.youtube.com/watch?v=bZQun8Y4L2A
