
Data Collection in Machine Learning Products

With examples

Photo by Brett Jordan on Unsplash

When I first started my path in data science, everything was about accurate modeling for me. But I quickly realized that to provide real value, models can't exist in a vacuum. I was often missing important aspects of the data needed to get reasonable performance, and it wasn't clear how users would react to model outcomes. So I started collecting examples from products that I thought or knew were ML-powered, to understand how different companies collect their data to address these questions. In this post, I want to share some of the cases I've gathered, mostly from consumer-facing products, and the problems they solve for data scientists and product managers working on data-powered products.

Although many of the patterns I describe below are not purely specific to ML products and can apply to any digital product, they become critical when it comes to ML. Why? ML models can operate on examples they've never seen before or in highly personalized environments, so they're nearly impossible to test for every output. Allowing for user feedback thus helps to identify experiences that didn't work well. Also, some models are capable of dynamically incorporating feedback and almost immediately adjusting the user experience. Most importantly, ML models are built on data, so quality data collection at scale is the basis for quality models.

Disclaimer: I don't actually know how the elements described in this post function within the real products – this review is based on my understanding and on information companies have shared in publicly available articles and presentations. What I describe here is my opinion – "what I would likely do".

I’ll write about several categories of data collection:

  1. Pre-experience – used to tune the product functionality before usage.
  2. Feedback – used to measure the user's reaction to the product experience.
  3. Crowdsourcing – used to collect additional data that is not linked to the product experience of the specific user.

Pre-experience patterns – ready to start?

Pre-experience data collection can be used to quickly personalize the product. The goal here is to collect relevant data fast. It helps to make sure users are exposed to the data-powered product functionality as soon as possible. In some cases, it's the only way to make a product work. So data collection needs to happen during onboarding to a product or feature. It can come in the form of filling in a user profile, setting up personal goals, or calibrating the product.

There is a range of possible ML models that can be involved in personalization – recommenders, classifiers, regression models, to name a few. They can use the collected data either as model features or as labels.

It’s also important to remember that when you ask users to share information, you need to provide them with value in return.

Goals and preferences – I want to, I like, I am

For recommender systems, collecting preference data helps to tackle the user "cold start" problem. We can build an initial feature vector before we get enough implicit data from the user's interaction with the system. In the example below, Spotify users are asked to select at least 3 favorite artists during onboarding. Collaborative filtering can then be applied to the user's choices to provide the initial playlist recommendations.

(Left) Image by the author, screenshot taken from author’s experience at Coursera content recommendations setup | (Middle) Image by the author, screenshot taken from author’s experience of Youper onboarding, mental health goals | (Right) Image by the author, screenshot taken from author’s experience at Spotify favorite artists onboarding screen.
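To make the idea concrete, here is a minimal sketch (not Spotify's actual implementation) of how onboarding selections could seed recommendations: a user vector is built by averaging pre-trained artist embeddings for the chosen artists, and remaining artists are ranked by similarity to it. The artist names and embeddings are made up for illustration.

```python
import numpy as np

# Hypothetical pre-trained artist embeddings (e.g., from matrix factorization
# on existing users' listening data). In reality these would be learned, not hardcoded.
artist_embeddings = {
    "artist_a": np.array([0.9, 0.1, 0.0]),
    "artist_b": np.array([0.8, 0.2, 0.1]),
    "artist_c": np.array([0.0, 0.1, 0.9]),
    "artist_d": np.array([0.1, 0.0, 0.8]),
}

def cold_start_user_vector(selected_artists):
    """Average the embeddings of the artists picked during onboarding."""
    vectors = [artist_embeddings[a] for a in selected_artists]
    return np.mean(vectors, axis=0)

def recommend(user_vector, top_k=2):
    """Rank all artists by cosine similarity to the user vector."""
    scores = {}
    for name, vec in artist_embeddings.items():
        scores[name] = float(
            np.dot(user_vector, vec)
            / (np.linalg.norm(user_vector) * np.linalg.norm(vec))
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

user_vec = cold_start_user_vector(["artist_a", "artist_b"])
print(recommend(user_vec))  # artists closest to the user's stated taste
```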

Since users generally expect to answer some questions during onboarding, it can also be a window of opportunity to capture behavioral aspects that are otherwise extremely hard to measure, such as motivations or interests.

It's important to be selective about the number and type of questions you ask – to avoid breaching users' privacy or making the onboarding too long to complete. So you need to understand what kind of data can potentially be the most discriminative for your models. Before asking users an additional question, do user or domain research, talk to experts, and analyze available data to identify the most promising directions. Then come up with hypotheses and collect a bit of data to validate that the question brings in a useful signal.
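As a rough illustration of that validation step, the sketch below scores hypothetical candidate onboarding questions by how much signal their answers carry about an outcome we care about, using mutual information on a small pilot sample. The data is synthetic, and the cutoff for keeping a question is a product decision rather than a fixed rule.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical pilot data: answers to three candidate onboarding questions
# (encoded as integers) and the outcome label we ultimately care about,
# e.g. whether the user completed a course in the first month.
rng = np.random.default_rng(0)
answers = rng.integers(0, 4, size=(500, 3))   # 500 pilot users, 3 questions
label = (answers[:, 0] + rng.integers(0, 2, 500) > 2).astype(int)

# Estimate how much signal each question carries about the outcome.
scores = mutual_info_classif(answers, label, discrete_features=True, random_state=0)
for i, score in enumerate(scores):
    print(f"question_{i}: mutual information = {score:.3f}")
# Keep only questions with meaningful scores; drop the rest to keep onboarding short.
```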

Calibration – let’s adjust it

Calibration is usually used for sensor data, to make sure that the tool works as expected on every specific device. An example could be Face ID, where the last layer of the face recognition model should be trained on the specific user's data. Calibration can be used to obtain positive class examples – such as the user's face from different angles.

Apple Face ID calibration interface | Image by Vladimir Yakimov, posted with permission, screenshot taken by Vladimir while setting up Face ID.
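Here is a generic, hedged sketch of the calibration idea (not how Face ID actually works): a frozen, pre-trained encoder is shared across all users, and only a small per-user head is trained on-device on the handful of examples captured during calibration. The encoder architecture, image size, and negatives are placeholder assumptions.

```python
import torch
import torch.nn as nn

# A toy setup: a frozen "pre-trained" face encoder plus a small per-user head
# trained on the few examples captured during calibration.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
encoder.requires_grad_(False)           # shared backbone stays fixed on device

user_head = nn.Linear(128, 1)           # only this layer is trained per user

# Calibration frames: a few positives (the user's face from different angles)
# and negatives (e.g., generic non-matching examples shipped with the model).
positives = torch.rand(8, 1, 64, 64)
negatives = torch.rand(8, 1, 64, 64)
x = torch.cat([positives, negatives])
y = torch.cat([torch.ones(8), torch.zeros(8)]).unsqueeze(1)

optimizer = torch.optim.Adam(user_head.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(50):                     # a short on-device fine-tuning loop
    optimizer.zero_grad()
    logits = user_head(encoder(x))
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()
```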

Feedback patterns – how did you like it?

Feedback refers to outputs that are fed back into the system as inputs. In the product context of ML, feedback can be defined as user reactions to the outputs of the model, which can be used to improve user experiences.

Implicit vs explicit – observe or ask

Implicit feedback is collected by observing users’ reactions to product components (clicks, mouse hovers, conversions, engagement duration).

The benefit of implicit feedback is that it doesn't require additional actions from the user, so it's collected for everyone. At the same time, relying only on implicit feedback makes it harder to understand the nuanced reasoning behind user behavior. Without additional information, it's impossible to tell whether the user got bored or got busy when they stopped a YouTube video from playing. However, understanding the reason can be crucial in deciding whether to show this or similar content when the user comes back to the app. Also, in many cases it takes time to collect enough discriminative implicit data to make accurate decisions for every user.

Not all user activity data can be viewed as feedback. For example, my Spotify streaming history needs to be connected to the recommendations I got in order to become feedback. Did I select a playlist from the suggestions or search for a specific track instead? Did I skip any tracks while streaming a playlist? The feedback question is always: what was the reaction to the experience?
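A minimal sketch of that linking step, assuming we have an impression log of what the recommender showed and a play log of what the user actually listened to: only plays that can be joined back to an impression become feedback, while recommendations that were shown but never played can serve as weaker negative signals.

```python
import pandas as pd

# Hypothetical logs: what the recommender showed vs. what the user actually played.
impressions = pd.DataFrame({
    "user_id": [1, 1, 1],
    "track_id": ["t1", "t2", "t3"],
    "source": ["recommended", "recommended", "recommended"],
})
plays = pd.DataFrame({
    "user_id": [1, 1],
    "track_id": ["t1", "t4"],
    "completed": [True, True],   # t4 was found via search, not a recommendation
})

# Only activity that can be linked back to a recommendation counts as feedback.
feedback = impressions.merge(plays, on=["user_id", "track_id"], how="left")
feedback["label"] = feedback["completed"].fillna(False).astype(int)
print(feedback[["user_id", "track_id", "label"]])
# t1 -> positive feedback, t2/t3 -> shown but ignored, t4 never becomes feedback here
```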

Tools like Facebook reactions or saving to bookmarks can also be attributed to implicit feedback. Both help users achieve their goals – letting their friends know that they care about a post, or ensuring easy access to important items in the future. These actions signal to the models that specific content is important for this user. At the same time, they have enough diversity to hint at motivations – reacting to a post with Care is different from reacting with Angry.

Explicit feedback is obtained by asking an explicit question. If users can opt out of answering (which is good practice when it comes to user experience), this type of feedback comes with self-selection bias: only engaged users or those with strong opinions will be willing to provide it.

(Left) Image by the author, screenshot taken from author’s experience at Facebook Messenger Voice Call feedback screen | (Right) Image by the author, screenshot taken from author’s ads experience at Gmail inbox.

At the same time, it can help you understand the problem more deeply by asking more specific questions about the experience.

Positive vs negative feedback – like it or hate it

Positive feedback confirms that the interaction with the product was successful: I like it, it's useful, this works for me. Negative feedback highlights issues with the system or allows users to change unwanted outcomes: I don't like it, it's incorrect, I don't want it this way.

"Conversation is not important" | Image by the author, screenshot taken from author's experience at Gmail desktop inbox.
"Conversation is not important" | Image by the author, screenshot taken from author’s experience at Gmail desktop inbox.

For example, Gmail's "conversation is not important" functionality allows collecting corrected labels for the model that predicts message importance. They can be used to improve the model overall or to personalize predictions for a specific user.
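One hedged way to use such corrections, sketched below on synthetic data: treat the explicitly re-labeled messages as training examples with a higher sample weight, since they are rare but carry a strong signal that the previous prediction was wrong. The feature set, model, and weight value are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical message features and labels. Explicit "not important" corrections
# are rarer than implicit signals, so they can be given a higher sample weight.
X = np.random.default_rng(1).random((200, 5))
y = np.random.default_rng(2).integers(0, 2, 200)

is_explicit_correction = np.zeros(200, dtype=bool)
is_explicit_correction[:10] = True            # 10 messages the user re-labeled

weights = np.where(is_explicit_correction, 5.0, 1.0)

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)        # corrections pull the model harder
```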

Implicit feedback tends to be positive-only in most configurations, while explicit feedback can be both positive and negative within one interface element.

Asking vs allowing for feedback

We can choose to proactively ask users for feedback by sending push notifications or by making feedback part of the product flow.

This can be used as part of an active learning algorithm, where it's important to get labels for specific cases. Reaching out to the most relevant users for labels increases the chances that they'll react. For example, services like Airbnb or Booking.com can proactively reach out to users to solve an item "cold start" problem by asking about a new property, or build a user profile by comparing the reactions of different users.

(Upper left) Image by the author, screenshot taken from property rating push-notification sent to the author by Booking.com after the stay (hotel name modified) | (Lower left) Image by the author, screenshot taken from place rating push-notification sent to the author by Google Maps after a visit | (Right) Image by the author, screenshot taken from property rating in-app flow sent to the author by Airbnb after the stay.
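A minimal sketch of the selection step in such an active learning loop, with made-up numbers: the properties whose predicted score is closest to the decision boundary are the ones we would proactively ask recent guests about.

```python
import numpy as np

# Hypothetical predicted probabilities for newly listed properties ("item cold start").
item_ids = np.array(["p1", "p2", "p3", "p4", "p5"])
predicted_quality = np.array([0.93, 0.51, 0.48, 0.10, 0.55])

# Uncertainty sampling: ask recent guests about the items the model is least sure of.
uncertainty = np.abs(predicted_quality - 0.5)
ask_about = item_ids[np.argsort(uncertainty)[:2]]
print(ask_about)   # the two properties closest to the decision boundary
```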

Passive feedback interferes with the user experience less but is still accessible. Users can use it to correct or report their experience. As I mentioned before, sometimes we can't test for all the edge cases that the model will encounter in real life, so allowing for such feedback gives us visibility into negative or confusing user experiences.

Feedback in the unit converter of Google search | Image by the author, screenshots taken by the author by searching "inch to cm", and opening Feedback survey.

Confirmation feedback – did we get it right?

Confirmation feedback is similar to proactive feedback. Here the product change is made proactively, and the user has the option to confirm or dismiss it. In the example below, Grammarly autocorrects a word but allows users to provide positive feedback by confirming the change, or negative feedback by rolling back to the originally typed word. This feedback can then be used as labels for the classification model. Rollbacks are especially important labels because they mark cases where the model made a mistake that was corrected by the user.

Grammarly autocorrection | Image by the author, screenshots taken by the author by typing "implicitely" while using Grammarly autocorrection.
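Below is a hypothetical event schema for capturing this kind of confirmation feedback (the field names and example values are mine, not Grammarly's): each accepted or rolled-back correction becomes a labeled example, with rollbacks acting as hard negatives.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical event schema for confirmation feedback on autocorrections.
@dataclass
class AutocorrectEvent:
    user_id: str
    typed: str            # what the user originally typed
    suggested: str        # what the model changed it to
    accepted: bool        # True = kept the correction, False = rolled back
    timestamp: datetime

events = [
    AutocorrectEvent("u1", "implicitely", "implicitly", True, datetime.now()),
    AutocorrectEvent("u1", "Dijkstra", "Dijkstra's", False, datetime.now()),
]

# Rollbacks are hard negatives: the model acted and the user undid it.
training_examples = [
    {"typed": e.typed, "suggested": e.suggested, "label": int(e.accepted)}
    for e in events
]
print(training_examples)
```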

Crowdsourcing patterns – help us to help others

In the above examples, users usually expect something to happen right after sharing the data – to improve their own experience, or to express their opinion about an interaction that just happened. With crowdsourcing, users can be motivated to provide data to benefit other users or to get better product quality over time.

This might be your way to get more information on items you don't have enough data about by asking several users, to obtain labels for predictions you're not sure about, or to collect data for a completely new product that doesn't exist yet. In the example below, LinkedIn asks users to fill in information about their connections, which is not going to impact their own immediate experience.

(Left) "Help us identify relevant opportunities" - Image by the author, screenshot taken from author's Linkedin feed and modified to hide personal information | (Middle) "Help us understand what your video is about" - Image by the author, screenshot taken from author's "Help improve Google Photos" flow featuring author's video | (Right) "What's it like onboard at the bus?" - Image by the author, screenshot taken from author's Google Maps search.
(Left) "Help us identify relevant opportunities" – Image by the author, screenshot taken from author’s Linkedin feed and modified to hide personal information | (Middle) "Help us understand what your video is about" – Image by the author, screenshot taken from author’s "Help improve Google Photos" flow featuring author’s video | (Right) "What’s it like onboard at the bus?" – Image by the author, screenshot taken from author’s Google Maps search.

Conclusion

Collecting relevant data is an important tool in building effective ML models and the great products they power. Working closely with product and UX teams on diversifying data collection mechanisms can help data scientists get relevant data points on time, ensuring better model performance.

If you want to learn more about the product and UX aspects of Machine Learning, check out:

  1. Apple Human Interface Guidelines – Inputs
  2. Human-Centered Machine Learning – Plan for co-learning and adaptation
  3. Jawaheer, Gawesh, Szomszor, Martin, & Kostkova, Patty (2010). Comparison of implicit and explicit feedback from an online music recommendation service. doi:10.1145/1869446.1869453
