Fail your way to success: The 12 steps we’ve taken in building our Data Science team

Over the past year, my team, my managers, and I have focused our efforts on building a Data Science team at our company’s headquarters. We already had data scientists on several teams, including one on my own data analytics team. However, most of them worked separately, on different projects, and rarely had an opportunity to share their knowledge, experience, and thoughts.
When we started building our data science team, some of the decisions we made were not good enough or well thought out at the time. Also, none of us came from an engineering background, which drove us to learn (the hard way) some things that might have been obvious to data scientists with a computer science background. However, today we feel that we’ve uncovered some of the important building blocks that every data science team should make sure to include in their projects and practices.
In short, this is our list of 12 steps:
- Defining Our Customers
- Building a Pipeline
- Prioritizing the Pipeline
- Managing Tasks (via system)
- Weekly Meetings (for sharing ideas and tackling problems together)
- Code and Model Reviews
- Version Management
- Coding Tests
- Model Monitoring (in production)
- Automated Training Processes
- Documentation
- Cooperation with Other Teams
So without further ado, I present The Hitchhiker’s Guide to Building a Data Science Team. You will need a spark in your eyes, a few months of your time, and – if you’re a Hitchhiker fan – a good towel! Here we go.
Defining Our Customers
The first thing we did was to determine every potential data science customer in our company. We talked to various managers and peers to obtain their input. For each department or unit, we asked whether we could build a model that would substantially improve their work or value. For any unit for which we gathered good ideas, we wrote them down and defined the unit as a potential customer. When this process was finished, our list contained several departments and units, including top management and even human resources.
Building a Pipeline
Gathering all the ideas from the first step, our team built a list of suggested models, creating a data science framing for each idea or problem raised by our prospective customers. Then we uploaded the list into our task-management system (in our case, Jira) to make all the ideas available to our team and our peers.
Prioritizing the Pipeline
After we had produced a list full of ideas and possible models, we had to admit that some of the ideas were probably not so brilliant. We decided to take the long list of models and organize it into something that we all know and love – a graph. We defined the X-axis as effort of development (taking into account data availability and the model’s complexity) and the Y-axis as value to the company. The models requiring the least effort and adding the greatest value were assigned the highest priority, followed by the models requiring greater effort but still adding high value. The models with the lowest priority required significant effort and yielded little value; those were left for happier days to come.

This list has changed a hundred times since we created it. It keeps growing, and the order of the models continues to change as the team and I become more acquainted with our abilities and limitations.
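To make the idea concrete, here is a minimal sketch of how such a ranking could be expressed in code. The idea names, the 1–5 scores, and the tie-breaking rule are hypothetical illustrations, not our actual backlog or formula.

```python
from dataclasses import dataclass

@dataclass
class ModelIdea:
    name: str
    value: int   # estimated value to the company, 1 (low) to 5 (high)
    effort: int  # estimated development effort, 1 (low) to 5 (high)

# Made-up entries standing in for the real pipeline.
ideas = [
    ModelIdea("churn prediction", value=5, effort=3),
    ModelIdea("ticket auto-tagging", value=3, effort=2),
    ModelIdea("demand forecasting", value=4, effort=5),
]

# Highest value first, ties broken by lowest effort: the
# "high value, low effort" corner of the graph comes out on top.
for idea in sorted(ideas, key=lambda i: (-i.value, i.effort)):
    print(f"{idea.name}: value={idea.value}, effort={idea.effort}")
```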
Managing Tasks
Now that we had a pipeline and its items were prioritized, we needed to figure out where we would coordinate with our teammates and plan how to develop the selected projects. This meant dividing each project into high-level stages, defining the tasks that had to be completed at each stage, and, most important, arriving at a definition of done (DoD) – that is, determining the outcome expected from the model and the accuracy it is required to reach.
In our organization, the R&D team uses Jira. We chose it as our task-management platform mainly because we might find ourselves collaborating with the R&D team on future tasks. We opened a board for our team and started to upload the high-level stages of each project.
Getting my team members to use Jira was a slightly harder task, given that most of them were accustomed to working alone and had never used a task-management platform before. When they gave me their weekly status reports on ongoing tasks, I would remind them to update Jira as well (and sometimes I would add tasks myself by putting in headlines and asking them to fill in the rest). Eventually, through managerial nagging, updating Jira became a habit. Today we all understand the value of Jira and its great importance in task coordination, time estimates, and documentation.
Weekly Meetings
Working as a team can be very helpful in many ways. One of those ways is through weekly team meetings, where everyone talks about the tasks they have been working on, the problems they have encountered, and the way in which they plan to solve those problems.
For example, one of us needed to create a mechanism that updates a prediction whenever it drifts far enough from the actual result within the first 30 days after the prediction was made. We had to plan a mechanism that would compare ongoing results with the prediction value, mark the predictions that needed updating, update the relevant data required for making a new prediction, and eventually send the output. Thinking about this process together saved us a lot of time, and the process we ended up with was light and efficient, instead of the very sophisticated one we initially thought was necessary.
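As a rough illustration, a check like the one we built could look something like the sketch below. The column names and the 20% threshold are assumptions made for the example, not our production values; only the 30-day window comes from the story above.

```python
import pandas as pd

def flag_for_update(predictions: pd.DataFrame, threshold: float = 0.2) -> pd.DataFrame:
    """Mark predictions that drifted too far from the actual result
    within the first 30 days after they were made."""
    age_days = (pd.Timestamp.now() - predictions["predicted_at"]).dt.days
    relative_error = (
        (predictions["actual"] - predictions["predicted"]).abs()
        / predictions["predicted"].abs()
    )
    predictions["needs_update"] = (age_days <= 30) & (relative_error > threshold)
    return predictions

# Rows flagged with needs_update == True would then have their input data
# refreshed, be re-scored by the model, and have the new output sent on.
```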
These sharing sessions can evolve into serious debates, in which we review data and methods and attack the problem as a team. We have solved many problems this way, including ones where we hit a "hard wall" – a problem that felt unsolvable, such as accuracy metrics that remained too low after we had tried "everything," or various performance issues. Sharing our knowledge, looking at a problem with a fresh set of eyes, and bringing together people with different kinds of expertise have often led to a new perspective and eventually a clear-cut solution.
Code and Model Review
Like weekly team meetings, peer review can improve the quality of our work. Once a model is completed, we ask one of the team members to conduct a detailed review of it, looking at the code, its design, the accuracy metrics, hyperparameters, thresholds, and the technical efficiency of the process. The reviewer provides feedback, such as "you might try this function to make the process more elegant and efficient" or "adding this feature might improve accuracy."
It usually takes a bit of time to implement the reviewer’s comments, occasionally forcing us to postpone the model’s deployment, but we never skip the review. Such input is key to maintaining quality results, and it keeps us from making mistakes or missing bugs that will come back to bite us in the future.
Version Management Platform
During the long, hard process of developing a data science project, we make many changes. These might be the addition or removal of a feature, the adjustment of a threshold, or an overhaul of our design after we have finished the model (for example, if we realize that the model should be calculated in a different order from what we had originally thought). Sometimes we later regret such changes; for some types of models, we don’t understand the consequences of a change until it’s too late. Because some models work as a "black box," a change cannot always be undone – a fact that we learned the hard way.
For these reasons and more, it is extremely important to manage versions carefully, always preserving the option of returning to an earlier version. When making drastic changes, we used to just save the model in separate notebooks. Then, following R&D’s advice (and two team tutorials), we moved to the more elegant solution of managing model versions in Git. This practice also enabled our team to collaborate better. Don’t skip version management – it can save you a lot of time and sweat.
Coding Tests
Data scientists use a wide variety of tools to conduct their work, from the principles of linear algebra to statistical methodologies. To turn all these techniques and data into a learning model, we need to write code. While we might not have started as coders, and coding might not be the main focus of our work, writing code (and doing it right) is one of our most important tools.
One of the things we learned to do was to add tests to our code: for example, checking the validity of the data we are using (garbage in, garbage out) and confirming the code’s functionality (via unit tests). Furthermore, before we deploy a new version, we make sure that all of our processes and packages work together properly (integration tests); we verify that the whole model functions correctly (end-to-end test); and we perform any other usability test that we consider necessary.
We learned how to do these things the hard way – from bad deployments (that had us working weekends to find and correct the problems) and from bugs that we accidentally inserted into our code without realizing the mistake at the time.
Once again, we asked for the help of our colleagues on the R&D team. After a few sessions with them, we slowly started to implement tests in our code. This is definitely still a work in progress for us, and we need to learn how to plan and implement tests better.
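To give a flavor of what we mean, here is a minimal pytest-style sketch of a data-validity check and a unit test. The build_features function, the column names, and the toy values are hypothetical stand-ins for whatever your own pipeline exposes.

```python
import pandas as pd
import pytest

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Toy feature-engineering step used by the tests below."""
    out = df.copy()
    out["spend_per_visit"] = out["total_spend"] / out["visits"].clip(lower=1)
    return out

def test_data_validity():
    """Garbage in, garbage out: fail fast on missing or impossible values."""
    df = pd.DataFrame({"total_spend": [10.0, 25.0], "visits": [2, 5]})
    assert df.notna().all().all()
    assert df["total_spend"].ge(0).all()
    assert df["visits"].ge(0).all()

def test_build_features():
    """Unit test: the feature is computed exactly as specified."""
    df = pd.DataFrame({"total_spend": [10.0], "visits": [2]})
    result = build_features(df)
    assert result.loc[0, "spend_per_visit"] == pytest.approx(5.0)
```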
Model Monitoring
Tests are not enough. Once our team is handling more than one or two ongoing processes that run, say, every day or every week, we must build a monitoring system to make sure that they are running correctly. If not, the system should send us an immediate alert.
Our friends in R&D (who already knew whom they were dealing with!) helped us out here, as well. They use Grafana dashboards to monitor their processes and kindly offered to build us a dashboard. We connected the dashboard with the Slack communications platform, starting a group chat for all the relevant technical people, and programmed it to send us a push notification whenever a process fails.
Most of our processes run during the night, so we put Slack on mute when we go to sleep. However, when we wake up, we can see whether something has happened. If so, we can look into the problem immediately, often saving the process and putting it back into working order before the output is scheduled to be sent to the client.
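For readers who want a starting point, here is a bare-bones sketch of a failure alert sent through a Slack incoming webhook. The webhook URL and job names are placeholders, and our own alerts actually go through the Grafana dashboard, so treat this only as an illustration of the idea.

```python
import requests

# Placeholder URL for a Slack incoming webhook; the real one is kept secret.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def notify_failure(job_name: str, error: Exception) -> None:
    """Push a message to the team channel when a scheduled job fails."""
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": f":rotating_light: {job_name} failed: {error}"},
        timeout=10,
    )

def run_nightly_job(job_name: str, job_fn) -> None:
    """Run the job and alert the channel (then re-raise) on any failure."""
    try:
        job_fn()
    except Exception as exc:
        notify_failure(job_name, exc)
        raise
```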
Automated Training Processes
After we have selected a project, planned it, built it, reviewed it, deployed it, and started monitoring it, our work is still not done. To enable machine learning to carry out its "learning" process, we must feed new data into our model, which then corrects and adjusts its predictions according to trends in the new data. Sometimes the trends are continuous and strong enough to justify a recalibration of the model.
This process can be done manually, but then it requires manual monitoring of the model’s quality to determine when it has to be recalibrated. The alternative is to automate the process. One of the data scientists on my team decided to build an automated training process for one of his models, and it was so successful that we decided to adopt it for all models.
The training process goes like this: we load a new batch of training data into the model and retrain it. Then we compare the most important accuracy measurements and check whether the retrained model generates better results on fresh, unseen data. If it does, we keep the retrained model; if not, we discard it and move on. This process can be scheduled on a regular basis (say, once a month) or run whenever we deploy a new product version.
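Schematically, the retrain-and-compare step could look like the sketch below, written with scikit-learn. The classifier interface and the single AUC metric are assumptions made for the example; in practice we compare whichever accuracy measurements matter for the specific model.

```python
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def maybe_replace_model(current_model, X_new, y_new, X_holdout, y_holdout):
    """Retrain on the new batch and keep whichever model scores better
    on fresh, unseen holdout data."""
    # clone() copies the hyperparameters but not the fitted state,
    # so the candidate is trained from scratch on the new batch.
    candidate = clone(current_model).fit(X_new, y_new)

    current_score = roc_auc_score(y_holdout, current_model.predict_proba(X_holdout)[:, 1])
    candidate_score = roc_auc_score(y_holdout, candidate.predict_proba(X_holdout)[:, 1])

    # Keep the retrained model only if it actually improves.
    return candidate if candidate_score > current_score else current_model
```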
Documentation
When planning a project, we document a high-level design indicating how the model should work, what the various parts of the calculation and measurement processes are, and how the output should look at the end. What does this documentation include, and what purpose does it serve?
Usually, a project’s documentation includes several parts:
- The business objective that gave rise to the project and the rationale for achieving the objective via a data science model
- An explanation of the model’s structure and the data being used
- A review of the accuracy goals of the project and the actual accuracy of the current model version (which might change over time)
- Details about the process (runtime, output form, and related processes)
- A link to the code itself (the notebook and/or the Git repository)
The purpose of this documentation is to serve as a reference for anyone handling the project’s processes (such as people on our team, the R&D group, or the client) and to preserve necessary information for future use.
Preparing a design document at the start of a project, including a description of the final output and the accuracy goals of the model, is an essential step. However, the actual outcome at the end of the project, which can sometimes take up to a year, is often very different from the original design. While the business problem usually stays the same, the ways in which we solve it or deliver the final output are likely to vary considerably. Discovering that we can do something better or more efficiently than we initially thought is part of a very positive process.
So why not skip the documentation stage, or leave it for the very end of the project? Because other teams depend on it early. Say we’re collaborating with another team and the model is going to be incorporated into their product: we would be expected to lay out the general structure of the model during the planning period and to identify which parts will require a development effort on the product side, so that the other development teams (usually working in agile methods) can plan their sprints and estimate the required effort accordingly.
Cooperation with Other Teams
Over and over, I have mentioned the excellent assistance we received from our peers in the R&D department, from building and maintaining our infrastructure to helping out with new tools or with expertise we lacked (on tests, for example). While there have also been some problems along the way (such as technical issues and occasional miscommunications), the good relationship with the R&D team has proven critical for us. Here is one example of why that collaboration matters so much.
Over the course of the past year, we discovered that a good data science model is not always enough. It doesn’t matter how amazing and accurate the model is if no client is willing to take its output and work with it. When we predict that a user is about to churn, we need to work out a treatment that can prevent the churn (or else it doesn’t matter that we flagged it in advance). Running some kind of analysis or conducting a survey may help us understand why users churn. Then we have to come up with appropriate solutions, one of which might change the user’s mind. Not all treatments work; testing can help show which treatment is most effective. This whole process can take as much effort and time as developing the data science model itself.
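As one small illustration of what "testing which treatment is most effective" can look like in practice, here is a hedged sketch of a two-proportion z-test comparing retention between a treatment group and a control group, using statsmodels. The numbers are invented, and this is not necessarily the test our product and research colleagues run.

```python
# Compare retention rates between a churn-prevention treatment and a control group.
# The counts below are made-up numbers used purely for illustration.
from statsmodels.stats.proportion import proportions_ztest

retained = [312, 274]       # users retained: treatment group vs. control group
group_sizes = [1000, 1000]  # total users in each group

z_stat, p_value = proportions_ztest(count=retained, nobs=group_sizes)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# A small p-value suggests the treatment genuinely moved retention rather
# than looking better by chance; otherwise, keep iterating on the treatment.
```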
Luckily, most of this work is usually carried out by the company’s product, analytics, and research teams. For this reason, it is imperative that we share our projects with other teams and collaborate throughout the process. Usually, all sides end up enjoying the partnership, which also gives other teams an exciting opportunity to learn about what the data science team does.
Parting Words
Of course, none of the above would have been possible without my team’s hard work and creativity throughout the year. Much of what you have read here is the result of their initiative, their work, and the lessons learned from what we didn’t do very well. So thank you, guys, for your part in making all of this happen.
Lastly, if you made it to the end of this long post, I thank you for your time and hope that this was a helpful read for you. Please reach out and let me know what you think, what you are doing differently, and mainly, what unknowns are still unknown to me!