The world of Data is a lucrative one, no doubt. British mathematician Clive Humby dubbed data "the new oil" all the way back in 2006. Fifteen years on, people are really beginning to witness the impact Data can have in commerce, and now everyone wants a piece of the pie.
"The amount of data we produce every day is truly mind-boggling. There are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT)." – How Much Data Do We Create Every Day? The Mind Blowing Stats Everyone Should Know, Forbes Magazine, 2018
Unlike oil, data cannot be "used up", since it is infinitely renewable. The two are similar, however, in the sense that raw data, like crude oil, isn’t valuable in and of itself. To generate any value from data, it has to be rigorously refined in a process better known as "preprocessing".
With what looks like a monster of an industry on the horizon come many new roles embedded in the ecosystem that makes the market tick. So far, the big data boom has created 4 main roles, but because the industry is so young, many of these roles are ill-defined and, depending on the company, could all fall under the one umbrella term "Data Science".
Note: I’ve used TargetJobs to get a general view of the duties involved in each job (the bullet points in each job section are taken from their job descriptions).
Data Scientist
The Data Scientist transforms disparate data into clean, actionable insights. By extrapolating from those insights and sharing them, Data Scientists hold the power to solve some of the most challenging problems on the planet.
In essence, the Data Scientist combines computer science skills with statistics and probability, math, analytics, modeling, and business acumen to help uncover answers to the important questions, which in turn helps the company make objective decisions.
"As a Data Scientist you will be responsible for identifying key areas of improvement within an organisation, scoping out problems through a data science lens, and through the use of advanced techniques it will be your job to deliver multiple key initiatives to drive business performance and revenue." – Data Scientist responsibilities from a job board.
Alongside these responsibilities, Data Scientists are expected to communicate with technical and non-technical audiences, make recommendations to adapt existing business strategies, and extract data from multiple sources.
This role is huge and covers a broad range of specialisms such as time series analysis, Natural Language Processing, and Computer Vision.
Data Analyst
The close cousin of the Data Scientist is the Data Analyst. The Data Analyst scrutinizes information with the assistance of various analytical tools in order to identify facts and trends, again to help employees, clients, or both make much more informed decisions.
Many people without training in a technical background who wish for a career in Data Science tend to leverage Data Analyst roles to kickstart their careers in the world of data and later make the full transition into Data Scientist.
Duties of a Data Analyst include performing analysis to determine what the data presented means, preparing reports based on that analysis, presenting reports to senior staff, analyzing the quality of data, and removing corrupted data.
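To make the last two duties a little more concrete, here is a minimal sketch of a data quality check followed by a clean-up step. It only illustrates the idea: the records, the field names (`id`, `age`, `country`), and the rule that a plausible age lies between 0 and 120 are all assumptions made up for this example, and a real analyst would more likely reach for a dedicated analytics tool.

```python
from collections import Counter

# Hypothetical raw records; None and a negative age stand in for corrupted values.
records = [
    {"id": 1, "age": 34, "country": "UK"},
    {"id": 2, "age": None, "country": "US"},
    {"id": 3, "age": -5, "country": "UK"},
    {"id": 4, "age": 51, "country": "DE"},
    {"id": 4, "age": 51, "country": "DE"},  # duplicate of the previous row
]


def is_valid(row):
    """A row is valid when no field is missing and the age is plausible (assumed 0-120)."""
    if any(value is None for value in row.values()):
        return False
    return 0 <= row["age"] <= 120


def quality_report(rows):
    """Count rows with missing values, implausible ages, and repeated ids."""
    id_counts = Counter(row["id"] for row in rows)
    return {
        "rows": len(rows),
        "missing": sum(1 for r in rows if None in r.values()),
        "invalid_age": sum(
            1 for r in rows if r["age"] is not None and not 0 <= r["age"] <= 120
        ),
        "duplicate_ids": sum(count - 1 for count in id_counts.values() if count > 1),
    }


def clean(rows):
    """Drop invalid rows and keep only the first occurrence of each id."""
    seen, kept = set(), []
    for row in rows:
        if is_valid(row) and row["id"] not in seen:
            seen.add(row["id"])
            kept.append(row)
    return kept


report = quality_report(records)
cleaned = clean(records)
print(report)
print([row["id"] for row in cleaned])
```

The report is the kind of summary that might go into a write-up for senior staff, while `clean` produces the dataset the rest of the analysis would run on.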
The lines between a Data Scientist and a Data Analyst can be quite blurred. Some say the distinguishing factor is that a Data Scientist uses models to make predictions, though a Data Analyst may well do this too. I much prefer the description provided to me by Harpreet Sahota:
"Data Scientist discovers and Data analyst analyze".
Data Engineer
Data Engineers are responsible for building data pipelines that transform raw, unstructured data into clean formats, which enables the Data Scientists to go on and perform their magic. Essentially, their role includes creating and maintaining the analytics infrastructure that unlocks almost all other data functions, i.e. databases, servers, and large-scale processing systems.
This role is quite intensive and requires a significant set of technical skills, such as a deep knowledge of SQL database design and multiple programming languages. However, to work effectively, Data Engineers also need extremely good soft skills, as they’ll often be required to work across an array of different departments in order to understand what senior management seeks to achieve from the company’s data.
As part of fulfilling business and/or client objectives with data functions, the Data Engineer would also need to build algorithms that provide easier access to raw data.
All in all, a Data Engineer gets the right data to the right places.
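As a rough illustration of what "getting the right data to the right places" can look like, below is a minimal sketch of a pipeline with extract, transform, and load steps. Everything in it is an assumption for the sake of the example: the raw CSV content, the `orders` table, and the choice of an in-memory SQLite database standing in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace, a blank amount.
RAW_CSV = """customer,amount,currency
 alice ,10.50,gbp
BOB,,GBP
carol,7.25,Gbp
"""


def extract(text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalise the text fields and skip rows that have no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append(
            (
                row["customer"].strip().lower(),
                float(row["amount"]),
                row["currency"].strip().upper(),
            )
        )
    return cleaned


def load(rows, connection):
    """Write the cleaned rows to a table that others can query."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


# In-memory database used here as a stand-in for a real warehouse.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

row_count = connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = connection.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
customers = [r[0] for r in connection.execute("SELECT customer FROM orders ORDER BY customer")]
print(row_count, total, customers)
```

Keeping the three steps as separate functions is a design choice rather than a requirement; it simply makes each stage easy to test and swap out on its own.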
Machine Learning Engineer
At the intersection of Software Engineering and Data Science lies the Machine Learning Engineer. According to Springboard’s definition, the Machine Learning Engineer "leverages big data tools and programming frameworks to ensure that the raw data gathered from data pipelines are redefined as Data Science models that are ready to scale as needed". In layman’s terms, they deploy the machine learning models built by the Data Scientist into production.
This role involves merging knowledge of software engineering best practices with various Data Science techniques such as Machine Learning, Deep Learning, and statistical modeling in order to feed data into the models.
The roles and responsibilities of a Machine Learning Engineer are typically much better defined than those of a Data Scientist. This is due to the simple fact that to hire a Machine Learning Engineer, you’d often have to have a very good sense of how and why you’d want to leverage Machine Learning in the first place.
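To give a flavour of what deploying a model into production involves, here is a minimal sketch that separates a "training" side from a "serving" side. It is only an illustration under stated assumptions: the one-feature linear model, the toy training data, the JSON artifact format, and the `handle_request` function are all hypothetical, and a real deployment would sit behind a web framework or a model-serving platform.

```python
import json


def fit(xs, ys):
    """Ordinary least squares for a single feature; returns the learned parameters."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# "Training" side: fit the model on toy data and serialise it as an artifact.
artifact = json.dumps(fit([1, 2, 3, 4], [3, 5, 7, 9]))

# "Serving" side: load the artifact once, then reuse it for every request.
params = json.loads(artifact)


def handle_request(payload):
    """Validate the input, apply the model, and return a JSON-friendly response."""
    value = payload.get("x")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return {"status": "error", "detail": "field 'x' must be a number"}
    prediction = params["slope"] * value + params["intercept"]
    return {"status": "ok", "prediction": prediction}


response = handle_request({"x": 5})
bad_response = handle_request({"x": "five"})
print(response)
print(bad_response)
```

Most of the code on the serving side is input validation and error handling rather than modeling, which is roughly where the software engineering half of the role shows up.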
Wrap Up
There are various ways to break into the world of Data. Though I’ve listed only 4, there are some other titles that one may come across, such as Research Scientist and Decision Scientist. I typically find that the role of a Research Scientist is related to tasks that are much more research-orientated, like developing new algorithms (thereby making it more of an academic role than a practical one), whereas Decision Scientists are more practical and focused on framing data analysis in terms of decision making. I decided not to include these two roles in this post for 2 reasons:
- I mainly see them in larger companies
- They are mainly applied to research (in the case of Research Scientists)
which made them ineligible for my list, though that does not make them any less relevant as Data-related roles.
Thanks for reading! Connect with me on LinkedIn and/or Twitter