No one needs your data

Reduce the risk of not adding value with a data product

Aaron Berdanier
Towards Data Science

This article is the first part of a series. Check out the introduction here.

My mobile phone follows me around and shares my location with Google. They collect about 220,000 observations of my location every year, which I give to them in exchange for map services and advertising (they also give me access to my data at timeline.google.com; check it out if you have not yet, because it is fascinating).

Certainly, these data are valuable. Location data are rich with information. Google uses them to advertise. But how valuable are they in the raw form? TechCrunch reports that a location datum on its own is worth about $0.0007 through data brokers and exchanges, which means that I could sell my data for about $150 per year. That is not nothing but, for me, selling the data is probably not worth the effort.

Let’s scale this up a little bit. Lyft, a rideshare company, completed 620 million rides in 2018. Assuming the pricing estimate above is valid, Lyft’s trip destination data would have been worth about $434,000 in 2018. That is also valuable, but tiny compared to their $2.2 billion in 2018 revenue.
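The arithmetic behind those two estimates fits in a few lines. This is only a back-of-the-envelope sketch: it assumes one sellable location datum per observation (and one destination per ride), and the function and variable names are mine, not from any pricing source.

```python
# Back-of-the-envelope value of raw location data, using the figures quoted above.
PRICE_PER_DATUM_USD = 0.0007  # per-datum price attributed to TechCrunch in the text


def raw_data_value(observations: int, price_per_datum: float = PRICE_PER_DATUM_USD) -> float:
    """Value of a pile of raw observations if each one sells at a flat price."""
    return observations * price_per_datum


personal_value = raw_data_value(220_000)    # my location observations per year
lyft_value = raw_data_value(620_000_000)    # assumes one destination datum per ride
lyft_revenue_2018 = 2.2e9                   # revenue figure quoted in the text

# Raw data value as a fraction of revenue
share_of_revenue = lyft_value / lyft_revenue_2018

print(f"Personal data: about ${personal_value:,.0f} per year")
print(f"Lyft destination data: about ${lyft_value:,.0f}")
print(f"Share of Lyft revenue: {share_of_revenue:.4%}")
```

Under these assumptions the raw data come to roughly two hundredths of one percent of revenue, which is the point: the flat per-datum price barely registers next to what the refined product earns.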

Despite the potential additional revenue, rideshare companies are notoriously reluctant to offer up their data for others. Why? On top of the immediate monetary value of their raw data, Lyft also has bigger plans in mind for the data that they are collecting. Lyft is refining their raw location data to derive outcomes that will support their own platform, from improving arrival estimates to testing self-driving vehicles.

Data do not add value on their own

Where my mobile phone went in North America in 2019. Image by author.

Raw location data are valuable, but they do not add value on their own. This is because the end goal is never to get data. The end goal is to get a job done. As a result, the value that people get is not inherent in the data, but depends on how well the data enable outcomes compared to the alternatives.

The value of my personal location data goes up if it comes with additional derived information about who I am as a person and what I like. What that extra data adds is more actionable information. For an advertising company, the fact that I have only visited my local coffee shop seven times in the last year (despite being within 100 meters of it 114 times) is less informative than knowing that I also identify as a man between the ages of 24 and 39 (a demographic that spends less but consumes more coffee than others, and is more likely to drink coffee at home than at a coffee shop).

True to my demographic, I buy a lot of locally-roasted coffee at the grocery store for brewing at home. A coffee company would find more value in knowing the probability that I am a coffee drinker than just knowing how often I visit other coffee shops, let alone knowing my age and gender or getting a bunch of data on places that I have visited.

Similarly, even though both crude oil companies and mobility companies use petroleum as an input, they offer totally different output to totally different customers. Most people do not refine their own crude oil to fuel their cars, or even care how or where it was refined. When I fill up my car with gasoline I am buying a derived product so that I can move around. If I used a rideshare service or bought an electric car, I wouldn’t even need that refined product to fulfill my main need: moving around.

Different types of value from data

The value of data changes dramatically depending on how it is productized. As noted above, raw data are usually less valuable than a more processed form because doing a job with them is more difficult. Higher-level outputs (for example, building off of algorithms to support or even automate decisions) involve more complexity but enable different results.

This is visualized nicely in a pyramid of data product types:

Data Product Pyramid. Image by author, based on content in Designing Data Products.

I like this concept because it clarifies the outcomes. At the base, your data product is just raw information. In the middle, your data product is calculations based on some inputs. And, at the top, your data product is decisions.

This pyramid fits neatly with increasing requirements and needs from the product. Raw data are at the base because they are the least refined, and each successive level builds off of the previous in a hierarchy of needs for your data scientists and engineers (higher levels usually require more technical development). It also maps closely to what you will require from your users: lower levels on the pyramid generally require more technical users, because they have to do the refining themselves.

The data product pyramid is a tool for considering what outcomes your data enable and for whom. The implication is that data products and the companies that produce them are not all alike. They are quite diverse. Cassie Kozyrkov, Chief Decision Scientist at Google, uses bread to explain the consequences for a business:

If you’re opening a bakery, it’s a great idea to hire an experienced baker well-versed in the nuances of making delicious bread and pastry. You’d also want an oven. While it’s a critical tool, I bet you wouldn’t charge your top pastry chef with the task of knowing how to build that oven; so why is your company focused on the equivalent for machine learning?

Are you in the business of making bread? Or making ovens?

Sometimes, the alternatives are good enough

Imagine I want to sell my location data to an autonomous vehicle manufacturer (and get my $150!). Why would they want my data? A market problem that autonomous vehicle manufacturers face is getting consumers safely from point A to point B. Acquiring new data could help those companies do that job better by training their vehicles on where I drive fast or slow.

The value of my data goes up as it is refined for a specific outcome. For example, do I offer my vehicle positions (raw data) or the derivative of where I’ve sped up and slowed down (derived data)? Using the data product pyramid, there is value added (and effort expended) by moving from raw vehicle position data, to derived velocity data, to an algorithm that suggests speed based on traffic, to an automated decision-making system that warns a passenger that the vehicle is going to slow down to let a pedestrian cross the street.
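To make the climb up the pyramid concrete, here is a minimal sketch of three of those steps: raw positions, derived speeds, and a toy decision rule. The observations, the threshold, and the function names are all hypothetical and chosen only for illustration; a real vehicle system would be far more involved.

```python
# Hypothetical raw data: (seconds elapsed, metres travelled along a road).
raw_positions = [(0, 0.0), (1, 12.0), (2, 25.0), (3, 33.0), (4, 36.0)]


def derive_speeds(positions):
    """Derived data: average speed (m/s) over each interval between observations."""
    return [
        (x1 - x0) / (t1 - t0)
        for (t0, x0), (t1, x1) in zip(positions, positions[1:])
    ]


def should_warn_passenger(speeds, drop_threshold=4.0):
    """Toy decision rule: warn if speed drops by at least drop_threshold m/s
    from one interval to the next. The threshold is an arbitrary assumption."""
    return any(prev - curr >= drop_threshold for prev, curr in zip(speeds, speeds[1:]))


speeds = derive_speeds(raw_positions)   # derived layer
warn = should_warn_passenger(speeds)    # decision layer

print("Speeds (m/s):", speeds)
print("Warn passenger:", warn)
```

Each layer is only a few lines here, but each one also bakes in assumptions (sampling interval, threshold) that the buyer would otherwise have to make for themselves. That is the effort, and the value, that refinement adds.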

But how an autonomous vehicle is trained doesn’t matter to the consumer, as long as it is trained. (Training is the manufacturer’s ultimate problem.) For the first step above, a manufacturer might be able to make the leap with my data, but they could also just run some live tests on a track. For the second or third, they could buy aggregate traffic data from another supplier. And if the last step is their main goal, they might not even need real location data. These are all alternatives (competitors) to my data and, at the end of the day, my data might not really meet their needs without some additional work.

I think that is probably why, instead of just buying a bunch of location data, companies like Ford* have decided to buy simulation companies to support autonomous vehicle development. They are buying the ability to get higher-order outcomes sooner than would be possible by collecting real-world data.

*Disclosure: I work for a company that is a wholly owned subsidiary of Ford.

How to ensure a data product adds value

Ensuring value from a data product works the same way as for any other product: it requires considering the customer problem, the product solution, and the alternatives. But data products have some unique constraints at each step.

Identify the customer needs.

The problems that data products solve are not necessarily novel, even if the method or output is new. Also, your customers are likely not technical. In fact, it is sometimes easier to identify problems that data can solve by discovering existing inefficiencies with customers who have well-framed problems around other solutions. For example, sabermetrics evolved as a quantitative approach for baseball teams to gain a competitive advantage in strategy. Does your problem need to be validated? Maybe not if you are early on in the process but, remember, a product without a problem is a hypothesis. (Sabermetrics took about 20 years to catch on because the customers did not recognize the value.)

Identifying customer needs is much like any other product discovery. But if you’re reading this, you probably assume that a data product is a good way to solve some problem. Given that, a major challenge with data products is removing the solution from the conversation to avoid confirmation bias. Data products, machine learning, and artificial intelligence are loaded with buzz, which can give you false positives if you are not careful. [In a future version I’ll expound with some cautionary tales!]

Clarify the outcomes your data offer.

By starting with the problem before turning to the data product pyramid, you can safeguard yourself against a common critique of data products: that they do not address desired outcomes. How is knowing the customer problem first helpful? Because the way you position your solution (and what type of solution you offer) will depend on the user and their needs. For example, are you helping other software companies understand their product usage? If your users are product managers (e.g., Pendo), then you may need to provide some derived output in pretty graphs. If your users are software engineers and data scientists (e.g., Segment), then you may need a raw data API.

Understanding where you stand on the data product pyramid will help you clarify what outcomes you can deliver to the customer. Generally, outcomes that are higher on the pyramid are more complex and can attract more value, but that is not always the case, as we’ll see in the coming weeks when I talk about complex versus complicated and humans-in-the-loop. So, right-sizing your solution to the problem is important for maximizing value. If you cannot identify how your data product contributes to solving the problem, then you might need to do some more validation first to make sure your problem needs solving.

Explain the differences from alternatives.

All products have competitors. For data products, those could also include alternatives that do not rely on your type of data. You can demonstrate value by differentiating your product from the alternatives. How could people do the same thing without your product? Maybe someone is recording information by hand or needs to do a lot of manual rearranging of information. Maybe they need to make calculations on their own. Maybe someone is making decisions that rely on intuition… maybe someone needs to make a decision! Focusing on these differentiators will also help you clarify what outcomes you can offer.

A common pitch for data products is that they are “faster, more accurate, more consistent, and more transparent.” They are sometimes more expensive too, either in money or in effort. Because of that, it is beneficial to stay skeptical and critically ask why your offering is truly more valuable than an alternative. For example, why would a company spend more money for a new system that is 90% accurate when they could have a human do it for less money with 80% accuracy? Can someone get a similar outcome in a spreadsheet? Do the extra 10 percentage points offer enough additional value? If you cannot differentiate your outcomes, then you might need to go back to the data product pyramid and reevaluate what you are offering.
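One way to pressure-test the 90% versus 80% question is a simple break-even calculation: how costly does a single mistake have to be before the more accurate, more expensive option pays for itself? The sketch below is a rough model under assumptions of my own. Only the two accuracy figures come from the example above; the running costs and decision volume are made-up numbers.

```python
def expected_annual_cost(running_cost, accuracy, decisions_per_year, cost_per_error):
    """Running cost plus the expected cost of the mistakes that slip through."""
    expected_errors = (1 - accuracy) * decisions_per_year
    return running_cost + expected_errors * cost_per_error


def break_even_cost_per_error(cost_a, acc_a, cost_b, acc_b, decisions_per_year):
    """Cost per error at which option A (pricier, more accurate) costs the same as B."""
    return (cost_a - cost_b) / ((acc_a - acc_b) * decisions_per_year)


# Hypothetical inputs, for illustration only.
human_cost = 50_000     # annual cost of the manual process (assumed)
system_cost = 120_000   # annual cost of the new data product (assumed)
decisions = 10_000      # decisions made per year (assumed)

threshold = break_even_cost_per_error(system_cost, 0.90, human_cost, 0.80, decisions)
print(f"The new system pays off only if an error costs more than about ${threshold:,.0f}")
```

With these made-up inputs, the new system only wins when each mistake costs more than about $70. If errors are cheaper than that for your customer, the human or the spreadsheet is the better product.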

Summary

No one needs your data, but it might help them do their jobs better. How much value your data adds depends on what types of outcomes you can support.

Ensuring that you are adding value for customers depends on discovering a need, clarifying how you could solve it, and explaining why your approach is better than the alternatives. For data products, remaining neutral about the hype, focusing on tangible outcomes, and staying skeptical about why your product is different can reduce the risk of not adding value.

Following the series? Check out part two, which is about strategies to increase the viability of data products for your business. Reach out on LinkedIn or Twitter to talk more. I’d love to hear from you.
