
Applying Design and Agile Thinking to your Data Science project – Part 2

Learn how to apply Agile Thinking to any Data Science case

Photo by Hans-Peter Gauster on Unsplash

Think of a data-driven solution as a puzzle piece. This piece needs the right shape and colour to fit into a bigger puzzle. Our world is this puzzle, and it is not complete yet. We want to make a suitable puzzle piece and place it into the puzzle to make it more complete. In other words, we want to bring a suitable solution to the world. Designing and delivering an appropriate solution can be complex. Design Thinking gives the solution its shape, while Agile Thinking takes the outcome of Design Thinking and ensures that it is implemented.

This is Part 2 of the article, which focuses on how to apply Agile Thinking to a Data Science project. If you are more interested in applying Design Thinking to a Data Science case, or in the differences and similarities between the two methods, read Part 1 of this article; otherwise, keep reading!


Agile Thinking: Data science with data

Given the design of the solution, which was the outcome of Design Thinking, the Agile Thinking method facilitates the implementation of that design and ensures that the final data product is delivered on time. While Design Thinking did not require any data, the Agile Thinking phase does, although not necessarily from the beginning. Our desired outcome at this point is to launch the data product within a set timeframe.

Agile Thinking is an iterative approach consisting of phases. Each phase gradually builds on top of the previous one, with the product launch as the eventual goal. We start with a very basic outline of our product, the interactive wireframe (phase 1), and continuously add functionality to enrich the latest version. From this basic first version, we move swiftly to release a minimum viable product (phase 2), and from that, the beta product version (phase 3). Finally, we deliver our final data product at the product launch (phase 4).

Each phase consists of the same five components: design, preprocessing, modeling, visualization, and deployment. The diagram below displays the phases, together with their basic components, as pyramids. Every phase considers the same aspects, but each time builds on top of them, improving them and making them more complete.

Agile Thinking life-cycle, sketch by Elena Stamatelou

At the beginning of each phase, there is some room to redefine our solution approach by considering the learnings from the previous phase. This combines flexibility for change with attachment to our initial direction, which was the outcome of the Design Thinking method. At the end of each phase, we need to plan feedback sessions with the end-users of the data product (the feedback team). Their feedback tells us about their needs and their level of satisfaction with the current version of the product, which helps us define the modifications to apply in the next phase.


Phase 1: Interactive wireframe

This phase covers the project kick-off, when there is no data yet. The main deliverables are the product outline and an interactive wireframe: a very basic user interface that gives the users a first impression of our direction.

Agile Thinking: Interactive wireframe (phase 1), sketch by Elena Stamatelou

Design

We use the outcome of the Design Thinking method to define the initial architecture of the solution, the tasks, and the direction of this phase. We also create an initial sketch of the user stories, which we will use as a guide during the phase.

Preprocessing

Data requirements: Since there is no data yet, the primary aim here is to define the data requirements. We approach this from two opposite directions. On one side, we imagine that there are no restrictions on data availability and define the data needed to solve the challenge. On the other side, we consider the existing data and its restrictions and think of ways to still achieve the final project goal. We end up with a list of all the required data.
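One way to keep this two-sided exercise traceable is to record the unrestricted wish list next to the data the organization already holds and flag the gaps. The sketch below assumes a hypothetical demand-forecasting case; the `DataRequirement` class, the dataset names, and their purposes are illustrative placeholders, not part of the original method.

```python
from dataclasses import dataclass

@dataclass
class DataRequirement:
    name: str          # what the data describes
    needed_for: str    # which part of the solution relies on it
    available: bool    # does the organization already hold it?

def requirement_gaps(ideal: list[str], existing: set[str],
                     purpose: dict[str, str]) -> list[DataRequirement]:
    """Combine the 'no restrictions' wish list with the data that actually exists."""
    return [
        DataRequirement(name=item,
                        needed_for=purpose.get(item, "unspecified"),
                        available=item in existing)
        for item in ideal
    ]

# Hypothetical example: ideal data for a demand-forecasting case
ideal = ["sales_history", "store_locations", "weather", "promotions"]
existing = {"sales_history", "store_locations"}
purpose = {"sales_history": "target variable", "weather": "external driver"}

requirements = requirement_gaps(ideal, existing, purpose)
missing = [r.name for r in requirements if not r.available]
print(missing)  # ['weather', 'promotions']
```

The missing items are the ones to discuss with the organization: either a collection process is needed, or the approach has to work around their absence.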

Data Protection Impact Assessment (DPIA): Before getting the data, we need to prepare and sign an agreement with the organization to ensure data protection. We work on the DPIA process to identify and minimize possible data protection risks.

Modeling

Literature study: Before starting with any modeling approach, we conduct a literature review. Our aim is to create a list of all the possible modeling methodologies, from best practices to the state of the art.

Visualization

Wireframe: Keeping in mind the user perspective and needs described in the user stories, we define the format and general structure of the visualizations. We also make some high-level sketches and combine them into an interactive wireframe, which we will use in the feedback session at the end of the phase.

Deployment

Data infrastructure understanding: We gather information about the organization’s infrastructure, such as its data storage and BI tools. Our goal is to figure out which tools we will use to deploy our data product. Using the tools that employees are already familiar with is more cost-efficient and easier than trying to introduce new ones.

Feedback

At the end of the phase, we organize a feedback session in which we present the interactive wireframe to the users, let them use it, and ask them questions to get their feedback.


Phase 2: Minimum viable product

Our goal here is to provide the stakeholders and end-users with a foundation of the solution: the minimum viable product. On our side, we want to collect as much feedback and as many learnings as possible, so that we can ensure our final working prototype will satisfy the users’ and stakeholders’ expectations and needs.

Agile Thinking: Minimum viable product (phase 2), sketch by Elena Stamatelou

Design

We use the feedback collected in the previous phase to refine the user stories, reshape our design, and plan the current phase. We also update the initial architecture by adding more details.

Preprocessing

Data collection design: Using the list of data requirements, we design a way to obtain the data. If the data already exists in the organization’s systems, we can simply request it. If it does not exist yet, we need to define a data collection process.

Data generation: Since data collection takes time, we generate synthetic data based on the data requirements in the meantime, and we use this generated data throughout this phase.
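A minimal sketch of rule-based synthetic data generation is shown below, using only the Python standard library. The schema (date, temperature, promotion, sales) and the relationship between the columns are hypothetical assumptions; in a real project they would follow from the data requirements list. Fixing the random seed keeps the synthetic set reproducible, so the models built on it in this phase can be rerun.

```python
import random
from datetime import date, timedelta

def generate_synthetic_sales(n_days: int, seed: int = 42) -> list[dict]:
    """Generate fake daily records that follow the (hypothetical) data requirements."""
    rng = random.Random(seed)  # fixed seed so the synthetic set is reproducible
    start = date(2024, 1, 1)
    rows = []
    for i in range(n_days):
        temperature = rng.gauss(15, 5)
        promotion = rng.random() < 0.2
        # Assumed relationship: sales grow with temperature and promotions, plus noise
        sales = 100 + 3 * temperature + (40 if promotion else 0) + rng.gauss(0, 10)
        rows.append({
            "date": start + timedelta(days=i),
            "temperature": round(temperature, 1),
            "promotion": promotion,
            "sales": round(sales, 1),
        })
    return rows

synthetic = generate_synthetic_sales(n_days=90)
print(len(synthetic), sorted(synthetic[0]))
```

Because the relationship between columns is chosen by us, the synthetic data can only validate the pipeline and the user experience, not the predictive value of the features.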

Modeling

Basic feature engineering: Before having the real data, we use the synthetic data to create the basic features that we think are the most influential. We also clearly define the target variables.

Basic modeling: Using the basic features, we apply simple models to predict our target variables.

Visualization

Mockup: Using the wireframe of the previous phase as a starting point, we add data visualizations based on the updated user stories. Our focus is on a user-friendly, interactive mockup that gives the users a sense of our direction and invites constructive feedback.

Deployment

Deployment requirements: Based on the knowledge of the organization’s infrastructure, we specify the deployment requirements and check the feasibility and the costs involved.

Local deployment: For this phase, we run the model on our local server.

Feedback

Before the phase ends, we arrange a meeting with the users to share our concept and the data visualizations of the predictions by showing them our minimum viable product.


Phase 3: Beta product version

This phase focuses on delivering a beta product version, which includes the real model predictions and displays them in a user-friendly, interactive dashboard. At the end of this phase, users and stakeholders test the product to identify bugs and other deficiencies before we move on to product delivery.

Agile Thinking: Beta product version (phase 3), sketch by Elena Stamatelou

Design

Before starting the phase, we reconsider all the learnings and update the architecture with details ranging from data storage to model deployment. This phase brings us one step closer to the actual product, since we now work with real data.

Preprocessing

Data collection: Based on the data collection design and data requirements, we gather the data from the organization.

Data exploration: Having received the real data, we check its completeness (missing values) and timeliness (up-to-date data) to ensure that its quality is good enough for modeling. To better understand unexpected behaviors in the data, we look for outliers (anomalies), which may be erroneous values or genuinely important ones that reveal unusual behavior in the data.
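The three checks above can be sketched with the standard library: completeness as the share of non-missing values, timeliness as the age of the most recent record, and outliers flagged with Tukey's interquartile-range fences. The column names, example values, and the common `k = 1.5` multiplier are assumptions for illustration; deciding whether a flagged value is an error or an important signal still requires human judgment.

```python
import statistics
from datetime import date, timedelta

def completeness(rows: list[dict], column: str) -> float:
    """Share of rows in which `column` has a value (not missing or None)."""
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)

def days_since_last_record(rows: list[dict], today: date) -> int:
    """Timeliness: number of days between `today` and the newest record."""
    return (today - max(row["date"] for row in rows)).days

def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical real data: daily sales with one missing value and one extreme value
sales_values = [120.0, 118.0, None, 125.0, 122.0, 121.0, 123.0, 124.0, 990.0]
rows = [{"date": date(2024, 5, 1) + timedelta(days=i), "sales": s}
        for i, s in enumerate(sales_values)]

observed = [row["sales"] for row in rows if row["sales"] is not None]
print(f"completeness: {completeness(rows, 'sales'):.2f}")                      # 0.89
print(f"days since last record: {days_since_last_record(rows, date(2024, 5, 10))}")  # 1
print(f"outliers: {iqr_outliers(observed)}")                                     # [990.0]
```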

Modeling

Feature engineering: To keep using the same or a similar modeling approach, we examine the extent to which the real data can replace the synthetic data. We then match the features of the synthetic data with those of the real data, and add more features from the real data if we think they will improve the prediction results.
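A lightweight way to do this matching is an explicit mapping from synthetic feature names to real column names. The mapping also reveals which synthetic features have no real counterpart and which real columns are candidates for new features. The mapping and all column names below are hypothetical.

```python
# Hypothetical mapping: synthetic feature name -> column name in the real data
FEATURE_MAP = {
    "temperature": "avg_temp_c",
    "promotion": "promo_flag",
    "sales": "units_sold",
}

def match_features(synthetic_cols: list[str], real_cols: list[str],
                   mapping: dict[str, str]) -> tuple[dict[str, str], list[str], list[str]]:
    """Split features into matched, unmatched (synthetic only) and extra (real only)."""
    real = set(real_cols)
    matched = {s: mapping[s] for s in synthetic_cols if mapping.get(s) in real}
    unmatched = [s for s in synthetic_cols if s not in matched]
    extra = sorted(real - set(matched.values()))
    return matched, unmatched, extra

matched, unmatched, extra = match_features(
    ["temperature", "promotion", "sales"],
    ["avg_temp_c", "units_sold", "store_id", "holiday"],
    FEATURE_MAP,
)
print(matched)    # {'temperature': 'avg_temp_c', 'sales': 'units_sold'}
print(unmatched)  # ['promotion']
print(extra)      # ['holiday', 'store_id']
```

An unmatched feature such as `promotion` signals that the simple model may need to change, while the extra columns are the candidates to test as additional features.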

Advanced modeling: With the features from the real data, we first apply the simple model from the previous phase to confirm that the real-data features can replace the synthetic ones. If the simple model is no longer suitable, we replace it with a more suitable or more complex one. Here, we can also experiment with different modeling approaches or with hyperparameter tuning, and compare the performance of different models or of variations of the same model.

Visualization

Prototype: Moving from the interactive mockup of the previous phase to a prototype requires further improving the data visualizations and developing the front end of the dashboard. The focus here is on a functional and interactive dashboard that serves the end-user.

Deployment

Cloud/in-house deployment: Based on the deployment requirements defined in the previous phase, we move the solution from running locally on our devices to running on the organization’s cloud or in-house servers. At this point, we deploy only the prediction model, not the training model. If everything runs smoothly, we will also deploy the training model in the next phase.

Feedback

This phase results in a beta product version that resembles the actual product, since we worked on both the front-end (dashboard) and the back-end (deployment) development. This is the last chance to get feedback from the users before the product launch.


Phase 4: Product Launch

In this phase, we apply the final improvements. Once no further improvements are needed, we deploy both the training and the prediction model, making continuous validation possible, and then launch the data product. Our goal here is to deliver a functional, reliable, and usable product to the stakeholders and users.

Agile Thinking: Product launch (phase 4), sketch by Elena Stamatelou

Design

In this last design cycle, we gather the input from the last feedback session and add the final refinements to the product architecture before the release.

Preprocessing

Preprocessing validation: We collect new data from the organization and compare it with the data used for the beta product version to detect differences.
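A minimal sketch of this comparison, assuming the data arrives as lists of records. It reports columns that disappeared or appeared and flags numeric columns whose mean moved by more than a chosen number of reference standard deviations. The column names, example batches, and the 0.5 threshold are hypothetical choices; dedicated drift tests (for example a two-sample Kolmogorov-Smirnov test) are a more rigorous alternative.

```python
import statistics

def compare_batches(reference: list[dict], new: list[dict], max_shift: float = 0.5) -> dict:
    """Compare a new batch of records with the reference batch used for the beta version."""
    ref_cols, new_cols = set(reference[0]), set(new[0])
    report = {
        "missing_columns": sorted(ref_cols - new_cols),
        "new_columns": sorted(new_cols - ref_cols),
        "shifted_columns": [],
    }
    for col in sorted(ref_cols & new_cols):
        ref_vals = [r[col] for r in reference if isinstance(r.get(col), (int, float))]
        new_vals = [r[col] for r in new if isinstance(r.get(col), (int, float))]
        if len(ref_vals) < 2 or not new_vals:
            continue  # non-numeric column or too little data to compare
        spread = statistics.stdev(ref_vals)
        if spread == 0:
            continue  # constant reference column: a shift in standard deviations is undefined
        shift = abs(statistics.fmean(new_vals) - statistics.fmean(ref_vals)) / spread
        if shift > max_shift:
            report["shifted_columns"].append(col)
    return report

# Hypothetical batches: sales levels jumped and a new column appeared
reference = [{"avg_temp_c": 10.0 + i % 5, "units_sold": 100.0 + i % 7} for i in range(50)]
new = [{"avg_temp_c": 10.0 + i % 5, "units_sold": 150.0 + i % 7, "channel": "web"}
       for i in range(20)]
print(compare_batches(reference, new))
# {'missing_columns': [], 'new_columns': ['channel'], 'shifted_columns': ['units_sold']}
```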

Sanity checks: Based on our observations from validating the preprocessing with the new data, we create sanity checks to ensure that the input data has the expected format.
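Such checks can be expressed as a small validation function that runs on every input record before prediction. The expected types and value ranges below are hypothetical; in practice they would come from the observations made during preprocessing validation, and a schema-validation library could take over once the rules grow.

```python
from datetime import date

# Hypothetical expectations, derived from the preprocessing validation
EXPECTED_TYPES = {"date": date, "avg_temp_c": float, "units_sold": float}
VALID_RANGES = {"avg_temp_c": (-30.0, 50.0), "units_sold": (0.0, float("inf"))}

def sanity_check(record: dict) -> list[str]:
    """Return the problems found in one input record; an empty list means it passes."""
    problems = []
    for column, expected_type in EXPECTED_TYPES.items():
        value = record.get(column)
        if value is None:
            problems.append(f"missing value for '{column}'")
        elif not isinstance(value, expected_type):
            problems.append(
                f"'{column}' should be {expected_type.__name__}, got {type(value).__name__}"
            )
    for column, (low, high) in VALID_RANGES.items():
        value = record.get(column)
        if isinstance(value, float) and not low <= value <= high:
            problems.append(f"'{column}'={value} is outside [{low}, {high}]")
    return problems

good = {"date": date(2024, 6, 1), "avg_temp_c": 21.5, "units_sold": 130.0}
bad = {"date": "2024-06-01", "avg_temp_c": 85.0}
print(sanity_check(good))  # []
for problem in sanity_check(bad):
    print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to log every issue in a rejected batch at once.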

Modeling

Modeling validation: With the new data, we validate the beta product version and apply the last changes if needed.
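A minimal sketch of this validation step: score the beta model on the newly collected data and compare its error with the error measured during the beta phase. The 20% tolerance, the example model, and the data are assumptions; the acceptable degradation should be agreed with the stakeholders.

```python
import statistics
from typing import Callable

def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    return statistics.fmean(abs(t - p) for t, p in zip(y_true, y_pred))

def validate_on_new_data(predict: Callable[[float], float], new_x: list[float],
                         new_y: list[float], beta_mae: float,
                         tolerance: float = 0.2) -> tuple[bool, float]:
    """Pass if the error on new data is at most (1 + tolerance) times the beta-phase error."""
    new_mae = mae(new_y, [predict(x) for x in new_x])
    return new_mae <= beta_mae * (1 + tolerance), new_mae

def beta_model(x: float) -> float:
    """Hypothetical model from the beta product version."""
    return 2 * x + 1

new_x, new_y = [1.0, 2.0, 3.0, 4.0], [3.5, 5.0, 6.5, 9.0]
print(validate_on_new_data(beta_model, new_x, new_y, beta_mae=0.3))  # (True, 0.25)
print(validate_on_new_data(beta_model, new_x, new_y, beta_mae=0.1))  # (False, 0.25)
```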

Visualization

Dashboard: From the prototype of the previous phase, we build a more stable dashboard, in which we apply the last refinements.

Deployment

Continuous deployment: If the prediction model of the beta product version runs smoothly on the cloud or in-house infrastructure, we can take the deployment of our model one step further. We now also deploy the training model, so that it reruns whenever the code is updated or new data arrives. Our aim here is to achieve continuous integration (CI), continuous delivery (CD), and continuous training (CT) for our prediction model. This means that with any new data or code update, the model is retrained and delivers updated predictions.
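The retraining rule can be sketched as a trigger that fingerprints the code version together with the training data and retrains only when that fingerprint changes. In a real setup, a CI/CD service or pipeline orchestrator would call this on each commit or data delivery. The `ContinuousTrainer` class, the version strings, and the toy "model" (a mean) are hypothetical placeholders (Python 3.10+).

```python
import hashlib
import json
import statistics
from typing import Any, Callable

def fingerprint(code_version: str, rows: list[dict]) -> str:
    """SHA-256 over the code version and the training data; changes if either changes."""
    payload = json.dumps({"code": code_version, "data": rows}, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class ContinuousTrainer:
    """Retrain only when the code or the data differs from the last training run."""

    def __init__(self, train_fn: Callable[[list[dict]], Any]):
        self.train_fn = train_fn
        self.last_fingerprint: str | None = None
        self.model: Any = None
        self.runs = 0

    def on_update(self, code_version: str, rows: list[dict]) -> Any:
        current = fingerprint(code_version, rows)
        if current != self.last_fingerprint:
            self.model = self.train_fn(rows)  # retrain and replace the served model
            self.last_fingerprint = current
            self.runs += 1
        return self.model

def train_mean(rows: list[dict]) -> float:
    """Toy 'model': the average units sold."""
    return statistics.fmean(r["units_sold"] for r in rows)

trainer = ContinuousTrainer(train_mean)
data = [{"units_sold": 100.0}, {"units_sold": 110.0}]
trainer.on_update("v1.0", data)                            # first run: trains
trainer.on_update("v1.0", data)                            # nothing changed: skipped
trainer.on_update("v1.0", data + [{"units_sold": 120.0}])  # new data: retrains
trainer.on_update("v1.1", data + [{"units_sold": 120.0}])  # code update: retrains
print(trainer.runs, trainer.model)  # 3 110.0
```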


Wrapping up

Finally, our product is ready to be launched!

With continuous integration, delivery, and training, our prediction model improves with new input. This flexibility makes our data product adaptable to change: exposing it to new and more data improves its predictions.

Agile Thinking provides a guideline for creating a data product, taking as input the design concept from Design Thinking.

If you are interested in how to apply Design Thinking in a Data Science case, read Part 1 of this article.

