Notes from Industry

A good data scientist asks users lots of questions about the (business) purpose of the product they might build. I’ve saved weeks of my time on some projects by talking to end users before doing any technical work.
Before the data preparation, the data engineering, or the model building, ask your users, "What will you do with the final outputs?" Go even further by showing them a low-fidelity prototype of what the output might look like, built with fake data or visualizations.

I’ll outline an example project I worked on. The business had a vague idea: reach out to customers who had an unresolved angry phone call. The business wanted customers to be happy, and customer satisfaction was a key performance metric with potential downstream revenue and bonus impacts. In addition, if a customer filed an official complaint, the required regulatory documentation automatically cost the company $100, plus the cost of additional staff time.
Some colleagues immediately began sophisticated data engineering to pull the massive volume of call center transcriptions and connect the calls to customer information in our data warehouse. We started debating how well the natural language processing (NLP) steps were working and manually checking call transcriptions to see whether the NLP summaries were appropriately labeling calls as angry or not; some calls started angry, but the customer seemed fine by the end. Before we went further, though, we met with the colleagues who would be reaching out to these angry customers, a task they had not done before. When presented with the low-fidelity prototype, they had numerous questions and needs:
- How do I know why the customer was angry?
- Can I see their latest call transcription?
- Can I see the customer’s order history?
- I need the customer’s phone number, email address, contact address, and billing address.
- How do I know whether someone else is already set to call them? Would this data coordinate with the call center operations team?
- Can you prioritize people with higher dollar amounts and anger scores?
- Would this report come daily? How would we receive it? In an internal application? In Excel?
- How many people would show up on this list daily? We don’t think we will have time to contact more than 10 people a day until we get used to this approach.
Ultimately, we made a number of additions to the output for our colleagues doing the outreach and focused much less on the performance of the natural language processing.
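The prioritization request from the outreach team can be sketched with fake data. Everything here is an illustrative assumption on my part — the field names (`dollar_amount`, `anger_score`), the blending formula, and the numbers themselves are invented for the prototype, not the real schema or scoring method:

```python
# Hypothetical sketch: rank outreach candidates by dollar amount and anger score.
# Field names, values, and the scoring formula are illustrative assumptions.
customers = [
    {"name": "A", "dollar_amount": 1200.0, "anger_score": 0.90},
    {"name": "B", "dollar_amount": 5000.0, "anger_score": 0.40},
    {"name": "C", "dollar_amount": 300.0,  "anger_score": 0.95},
]

def priority(c, weight_dollars=0.5):
    # Normalize dollars to 0-1 against the max in the batch, then blend with anger.
    max_dollars = max(x["dollar_amount"] for x in customers)
    return (weight_dollars * (c["dollar_amount"] / max_dollars)
            + (1 - weight_dollars) * c["anger_score"])

# Daily list capped at 10, per the outreach team's stated capacity.
daily_list = sorted(customers, key=priority, reverse=True)[:10]
```

Even a toy ranking like this surfaces the design questions users actually care about: how dollars and anger should be weighted against each other, and how many names fit in a day.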

For a different project, the business wanted us to evaluate some vendor financial risk models on a sample of our own historical data. We already had the infrastructure and contract for one of the models; the business wanted to judge whether it was worth the effort to potentially switch models. We showed them a table of fake data illustrating what the output would look like (a.k.a. the low-fidelity prototype). This prompted questions and comments:
- What is the time period for this analysis?
- What are the population inclusion and exclusion criteria?
- What are our criteria for making a change in vendors? Can we multiply the difference in mean absolute error by our population to get a potential savings dollar amount?
- Is mean absolute error the most appropriate metric? Aren’t we sensitive to outliers, and would mean squared error be more appropriate? Or should we include multiple metrics in our evaluation, such as R-squared and the percent of people within ±5% of actual?
- Should we talk to IT ahead of time to ask them how much work and time it might be to implement a new financial risk model?
- How are we using the financial risk model today? How much does its performance matter? Do we just need a model that is "within the ballpark"?
- What does the future improvement road map look like for these vendors? Do all the vendors seem poised to succeed in the next few years?
The prototype helped us data scientists solidify our methods and helped managers think more about what they were asking.
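The metrics debate above can be made concrete on fake data. This is a minimal sketch — the actuals and predictions are made up, and the ±5% band is just the threshold the stakeholders happened to suggest — but it shows why the choice of metric matters: one large miss dominates MSE while barely moving MAE.

```python
# Illustrative metric comparison on fake data -- all values are made up.
actual    = [100.0, 200.0, 150.0, 400.0, 250.0]
predicted = [110.0, 190.0, 145.0, 500.0, 255.0]  # one large outlier miss (400 -> 500)

n = len(actual)
errors = [p - a for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / n        # mean absolute error
mse = sum(e * e for e in errors) / n         # mean squared error: the outlier dominates

mean_actual = sum(actual) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r_squared = 1 - ss_res / ss_tot              # R-squared

# Percent of people predicted within +/-5% of actual.
within_5pct = sum(abs(p - a) / a <= 0.05 for a, p in zip(actual, predicted))/ n
```

Putting several metrics side by side in the prototype table let the managers debate the decision criteria before we ran anything on real data.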
Try creating a low-fidelity prototype for your next project. Low fidelity means you should be able to do it quickly; it may not look polished, but users should get a good sense of the final product. I like sketching visualizations on paper or using fake data to create tables and graphs in Excel or R.
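As one way to see how little a fake-data prototype takes, here is a sketch in Python (the column names and values are invented purely for illustration):

```python
# Minimal fake-data prototype table: enough to show users the shape of the
# output, not a real pipeline. Columns and values are invented.
rows = [
    {"customer": "Fake Name 1", "anger_score": 0.92, "dollar_amount": 1850, "last_call": "2021-03-01"},
    {"customer": "Fake Name 2", "anger_score": 0.71, "dollar_amount": 430,  "last_call": "2021-03-02"},
]

headers = list(rows[0])
# Width of each column: the longer of the header and its widest value.
widths = {h: max(len(h), *(len(str(r[h])) for r in rows)) for h in headers}

print(" | ".join(h.ljust(widths[h]) for h in headers))
print("-+-".join("-" * widths[h] for h in headers))
for r in rows:
    print(" | ".join(str(r[h]).ljust(widths[h]) for h in headers))
```

Ten minutes with something like this (or the equivalent in Excel or R) is usually enough to start the conversation with users.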