Data Science in the real world

3 Lessons from a Data Journalism Intern at a Fin-Tech Startup

Find out about the lessons that I learned from interning at a Fin-Tech startup that is using machine learning to revolutionize the credit underwriting industry.

James Le
Towards Data Science
7 min read · Oct 3, 2019

The Data Science Team at ZestFinance (I’m in the upper left corner!)

As we move into the autumn season, I thought I’d take the time to reflect on my 14-week summer internship as a Data Journalist at ZestFinance in Los Angeles, CA.

If you’re not familiar with my background, I’m currently a Master’s student in Computer Science entering my final year of graduate school. How did I end up at ZestFinance? I applied online, the recruiter found my experience interesting, and after going through the interview process I accepted the offer to come out to their office for the summer as one of 12 interns.

My goal is to share some of the key lessons that I learned from this memorable experience.

What is ZestFinance?

The company’s mission is to make fair and transparent credit available to everyone. According to the company website:

The multi-trillion-dollar lending industry still relies on aging scoring techniques that oversimplify the view of a borrower’s finances. As a result, millions of deserving borrowers get rejected all the time. Some 46 million Americans are either “credit invisible,” with no file at one of the three major credit bureaus, or “unscorable,” with insufficient information to generate a credit score. This is a failure of the modern credit system.

Machine learning-based credit models can generate more profitable underwriting by drawing deeper insights from more data, especially unused data banks already have. ZestFinance was one of the first companies to deploy machine learning models for lending, and we know it works.

Zest’s Automated Machine Learning (ZAML)

More specifically, I spent my summer working on ZAML, Zest’s proprietary automated machine learning platform. ZAML enables lenders to analyze nontraditional data, including data they already have in-house, such as customer support data, payment histories, and purchase transactions. The platform can also take into account both traditional credit information and nontraditional credit variables, such as how a customer fills out a form, how they navigate a lender’s site, and more. While the black box problem has slowed the adoption of machine learning in consumer finance, ZAML is able to fully explain modeling results, measure business impact, and comply with regulatory requirements.

As a data journalist, I was a proud member of the Data Science team. My responsibilities included: (1) unearthing trends and insights from large and small datasets to tell data stories, (2) leveraging internal and external data sources for analysis, modeling, and visualizations, and translating them into topical narratives, reports, and white papers, and (3) evangelizing data science research initiatives while working cross-functionally with the Marketing team to quantify their growing impact and relevance.

Now that you know a little more about the team and my role, let’s dive into a handful of the main insights that I took away from this internship.

Minimum Viable Analysis

This concept from an excellent article by is one that I kept coming back to throughout the summer (check out my Datacast interview with him as well!). The post introduces the idea of a Minimum Viable Analysis: data scientists should make incremental progress rather than assume that stakeholders need the most complex solution available.

The process of producing an MVA is quite straightforward: (1) Understand the business problem at hand clearly, (2) Produce fast and superficial insights to address the problem, (3) Communicate the results back to the stakeholders and get their thoughts, and (4) Wrap up the analysis or dig deeper.

By keeping this framework in mind, I was able to churn out many iterations of the data stories, technical reports, and blog posts I was working on at the time. The stakeholders in my case were other data scientists on the team as well as several executives from different functional departments.

Cognitive Diversity

I was very fortunate to be in a position where I could interface with colleagues from multiple functions, including data science, marketing/design, legal, business analysis, product management, and software engineering. This allowed me to tap into what I call cognitive diversity: differences in perspective or information-processing styles. In other words, it’s how individuals think about and engage with new, uncertain, and complex situations.

Take a hypothetical example: implementing a machine learning model that determines whether a loan or credit applicant is approved or rejected. Here is how each function might frame the problem:

  • A data scientist cares about how to design the perfect experiment to achieve the best performing models.
  • A software engineer cares about how to set up the right infrastructure to put the models into production.
  • A marketer cares about how to communicate to the public the unique features that the models use.
  • A legal counsel cares about how to address the potential risk and compliance associated with the model’s results.
  • A business analyst cares about how to calculate the business impact that the model can have for the clients.
  • A product manager cares about pretty much everything mentioned above.

If there’s one takeaway I am truly grateful for, it is the sheer number of mental models I was able to develop. By learning to listen to different opinions and to speak and write in my colleagues’ language, in a way they understand, I strengthened my ability to think multilaterally: to solve problems using indirect, creative approaches and reasoning that is not immediately obvious.

Domain Knowledge

I had pretty much no knowledge of credit underwriting before my internship with ZestFinance. After the summer, though, I know plenty about this age-old industry:

  • Some 26 million Americans are considered “credit-invisible” by the federal Consumer Financial Protection Bureau because they have no history on file with one of the three credit bureaus. Another 19 million people don’t have enough data on file to be considered scorable by the lending system. Millions more have significant errors on their credit files that are at some stage of being corrected. These 45 million-plus Americans face real consequences: higher rejection rates, higher loan expenses, and inferior financial products, even though many of them may, in fact, be highly creditworthy.
  • The credit score, one of the great economic catalysts of the 20th century, has not kept up with today’s consumers. Legacy scoring techniques cannot easily or readily consume the growing range of data sources now available to score people more accurately and fairly.
  • Lenders are starting to switch to AI and machine learning underwriting, which processes more data through sophisticated algorithms and can handle messy or flawed data. Machine learning credit models draw conclusions from millions of interactions and use 10 to 100 times more variables than traditional techniques. Banks and lenders that have used machine learning report higher approval rates or lower default rates, sometimes both, by finding good borrowers missed by traditional techniques (and rejecting bad borrowers who might previously have been approved).
  • Makers of newer machine learning scoring algorithms have to ensure that their models do not carry over the bias that exists in the current lending system. In fact, ZestFinance’s ZAML tool can “de-bias” models in credit and other regulated industries.

Over the course of the internship, I gained substantial domain knowledge about how banks and financial institutions make decisions on credit applicants, how they are adopting machine learning, how to build explainability and fairness into model design, and how to tie a model’s predictive performance to its ROI for business clients.

Most importantly, I truly enjoyed ZestFinance’s mission-driven culture and its goal of building a world that gives more people the opportunities that come with credit. I have nothing but warm feelings toward the company and highly recommend checking out their blog for new content and their openings for job opportunities.

Wrapping Up

The internship definitely reaffirmed my passion for Artificial Intelligence/Machine Learning, and I am grateful that my work gained some traction that future projects can build on. The rapid analysis and iterative experimentation, the communication skills required to talk to different stakeholders, and the massive potential to solve problems across a wide variety of business domains have all contributed to my interest in this field.

Me and fellow interns at the Wisdom Tree on top of Hollywood!

I couldn’t be more grateful for the opportunity to spend the summer using data journalism to create value for an incredible product in an emerging industry. I accomplished a lot, made plenty of mistakes, and most importantly, learned more than I could’ve ever anticipated thanks to an awesome team and a diverse group of mentors. I hope that I was successful in communicating at least a few of those lessons to you. Thank you for reading.

If you enjoyed this piece, you can find more of my writing and projects at https://jameskle.com/. You can also follow me on Twitter, check out my code on GitHub, email me directly, or find me on LinkedIn. Sign up for my newsletter to receive my latest thoughts on data science, machine learning, and artificial intelligence right in your inbox!
