Range over depth – the value of a generalist in your data team

Don't be caught with a team full of specialists and be surprised when things don't turn out as expected.

Photo by Pascal Swier on Unsplash

One cannot simply proclaim this without some qualification. Let’s see how I got to this conclusion…

Googling the terms generalist and specialist returns contradictory opinions, so it is prudent to offer my own definitions for the sake of the argument.

Data generalists are data professionals who have range across the different aspects of a data pipeline, from acquisition to AI. They focus on breadth. Typical skills include ad-hoc analysis of data sets (across industries) to answer business questions. They do reactive reporting: "we have this problem, what happened?". They also have some degree of competency in predictive modeling and identifying trends. They can hack a visual together with a front-end tool, but they can also spot issues throughout the data life cycle, for example in data quality, data warehousing techniques or performance degradation, to name some recurring issues I deal with on a daily basis. Sounds like a unicorn? Not quite. They might not have significant depth in any of these fields, but they do have the range. These individuals could go by many names or titles, but the key is that they are fast learners who are not confined to a particular silo. They can quickly interpret data and tell the story. They are big-picture people.

Specialized data professionals, by contrast, have in-depth knowledge of one or a few of these areas. They typically perform best when a problem is already well defined, but struggle when the problems still need to be found within the bigger picture.

Of course, I am highlighting the skill or persona rather than the title. You could find what I deem to be a good generalist holding the title of data engineer, data analyst, business analyst, data modeler or data scientist. Basically any data professional.

Now, let me back up my claim.

1 – Accept that we work in wicked learning environments

In the book Range¹, David Epstein elaborates on the concept of wicked vs. kind learning environments, but I am going to have to give you the crash course version for the sake of not losing you after this sentence.

The table below gives context as to what these two opposing learning environments mean.

When I first read this definition I confused it with easy vs. hard problems and made the reactive statement: "How dare you say chess is an easy problem!" (I hear all you recent chess converts after watching The Queen’s Gambit). However, it enlightened me on something that had bothered me for a while: why, with all the geniuses in the world, can we not solve the most pressing problems out there?

My conclusion now, simply put, is that our world is not a kind learning environment. Life is more complicated than a closed loop that we can control. Our first challenge is defining the problem, let alone solving it. Collectively we can solve hard, well-defined problems, but it is much harder to solve a problem that isn’t well defined in an environment where black swans tend to pop up at random. We can really only pick better over worse: incremental changes that make tomorrow a little better than yesterday.

Now picture your work environment. The technologies you use, the existing platforms, new platforms, ambitious new directions your fearless leaders want to go into. I write this from a data professional perspective, but the concept most certainly will be applicable to a wider professional community.

Creating Machine Learning algorithms on a defined data set might be hard, but kind.

Getting the data out of the source might be easy and kind.

Building a data pipeline, combining different sets of data and conforming on grain and semantic meaning starts getting hard, and potentially not so kind.

See how I threw in a couple of big words to try and convince you of my closing statement?

Now add time, people’s personalities, ego, career paths, organizational complexities and you are most definitely looking at a wicked learning environment.

2 – With information so readily available the need for hyper-specialization reduces.

Note that the need reduces; it is not removed. With information being readily available, the generalist is empowered to hack something together for the sake of the analysis. With sites like Stack Overflow, anything someone has tried before can typically be found in less than 10 minutes if you are skilled in the art of Googling.

3 – Coordination effort is the real killer

Coordination effort is the effort to coordinate the delivery of any work product. Some might know it by its street name – "herding cats".

Jeff Bezos famously coined the "2 pizza team" rule for his ideal team size.² As group size increases, the number of relationships that need to be maintained becomes hefty. There is good literature on teams and team performance by Richard Hackman, which goes into detail about team dynamics and why large teams should be avoided.³ ⁴

Effort is related to the number of relationships that need to be maintained to get anything done. For n people, the number of pairwise relationships is n(n-1)/2, so it grows roughly with the square of the team size.

The simple math is that the more nodes you have (i.e. people you need to consult) to make a decision, the more your coordination effort goes up, as in this graph:

graph by author
  • A small team of 4 has only 6 relationships to maintain
  • A larger organizational program with, say, 40 people has 780 links to manage
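
The two figures above can be checked with a few lines of Python. This is only a minimal sketch of the n(n-1)/2 formula; the function name `relationship_count` is my own for illustration, and the team size of 8 is an extra data point I added as an assumption, not one taken from the graph.

```python
def relationship_count(n: int) -> int:
    """Return the number of distinct pairwise relationships among n people.

    Implements n(n-1)/2, i.e. every person paired once with every other person.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # n * (n - 1) is always even, so integer division is exact.
    return n * (n - 1) // 2


if __name__ == "__main__":
    # 4 and 40 come from the examples in the text; 8 is an illustrative extra.
    for team_size in (4, 8, 40):
        links = relationship_count(team_size)
        print(f"{team_size:>3} people -> {links:>4} relationships")
```

Going from 4 to 40 people is a tenfold increase in head count, but the number of links to manage grows by a factor of 130.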

Now sure, not every individual needs to interact with every other individual, but the rate of increase remains the same. I would also argue that n represents not only the number of people but also abstract nodes like forums, i.e. groups of people that have the mandate to make decisions. Some people also react differently when they need to make a decision in a group vs. one on one, so the curve is probably steeper than the graph depicts. If you have large teams on top of various large decision-making forums, your coordination effort will follow suit.

Keeping only the strictly necessary number of specialists within teams, with generalists spanning across teams, will reduce this complexity significantly.

4 – Business does not want fancy algorithms, they want their problems to be solved

We might be in the age of increased AI capability, but it is important to note that the problem statements have not changed all that much. How do I grow my revenue? How do I upsell to my existing clients? How do I …? The methods used to solve these problems have become more sophisticated, but the core problem statements remain.

An algorithm based on bad data will not solve any of those problems. I have seen a major increase in the need for data scientists and AI specialists, but I fear this buzz will fall flat on its face if the business problems do not get resolved. Soon execs will start seeing through the facade.

So why do I advocate for data generalists?

  • In wicked learning environments they are the people who will define the scope of issues (aka figure out what the real problem is) and hand it over to the specialists in a kind environment, so they can solve it faster and more efficiently. Too many work hours are spent on solving the wrong problem. Generalists find the right problem more quickly, and then specialists solve it if it is hard. Generalists tend to have more access to analogies from similar situations and approaches to help solve issues faster, which is key in wicked learning environments.¹
  • With mass availability of information it has become easier for the layperson to figure out enough context about a new problem. It has enabled generalists to quickly broaden their range without spending too much time on depth.
  • The generalist reduces the coordination effort by essentially eliminating unnecessary relationships, because they range across them. They need to be given the mandate to make decisions and thus cut out the management of added relationships.

You are on a boat and nearly back at shore when your engine fails. You think, "I wish I had a boat mechanic nearby." But really you don’t need the mechanic. You just need someone to tell you that the lake is 3 feet deep, so you can climb out and push the last stretch.

If we continue to not solve, or to over-solve, the business problems, we will be replaced by more efficient problem solvers who deliver simple solutions rather than behemoths veering away from the end goal. In an environment that changes fast, we need to be faster in defining the problem and faster in deciding on a way forward. We need the big-picture generalist.

1. Epstein D. Range: Why Generalists Triumph in a Specialized World. 1st ed. Macmillan; 2020.
2. Two-Pizza Teams: The Science Behind Jeff Bezos' Rule | Inside Nuclino. Blog.nuclino.com. https://blog.nuclino.com/two-pizza-teams-the-science-behind-jeff-bezos-rule. Published 2019.
3. Hackman JR. Leading Teams: Setting the Stage for Great Performances. Harvard Business Review; 2002.
4. Coutu D. Why Teams Don't Work. Harvard Business Review. https://hbr.org/2009/05/why-teams-dont-work. Published 2009.