The Challenges of Solving Problems with Data
It’s an axiom shared widely among data science researchers and industry practitioners: given enough data, no problem is impossible to solve. In reality, data can be unruly, project management entails many factors that go beyond tweaking a model, and turning results into actionable insights is often a complex and uncertain process.
This week, we’re zooming in on data science problem-solving, with a selection of excellent posts that approach it from multiple angles. (Scroll down for some great reads on other topics.)
- How graph theory can help tackle a routing problem. Taking inspiration from the popular video game Stardew Valley, Lily Wu dives deep into route optimization and patiently walks us through her approach to solving shortest-path and minimum-spanning-tree problems with an algorithm she wrote in Python.
- Making data visible is a key step towards solid analysis (and solutions). Aine Fairbrother-Browne’s debut TDS post revolves around airline efficiency and the environmental impacts of the aviation industry—a massive, global challenge if ever there was one. But it also highlights the value of translating data (in this case, public aviation data) into clear visuals, which in turn allow us to detect trends, construct arguments, and (hopefully) point at potential solutions.
- The risks of pushing AI beyond its current abilities. Through the example of chatbots, Iulia Turc demonstrates the limits of technology in solving complex issues. Focusing on conversational AI in the context of mental health, her post is a good reminder that badly designed solutions can exacerbate problems rather than solve them, so we need to be very cautious about deploying them in the real world. That’s especially the case when people’s lives and well-being are at stake. (Content warning: this post contains references to self-harm and suicide.)
- How to improve a project’s success through better design. Sometimes, the key to tackling a thorny problem isn’t in the data or the technical approach we use. As Khuyen Tran stresses in her latest post, setting up the right project structure is at least as crucial as choosing the right ML model. The template she shares will help data scientists produce work that is transparent, readable, and well documented.
Ready for more? We hope so—TDS authors have been sharing incredible work recently, on topics that span a wide spectrum: from the highly theoretical to the extremely hands-on.
- If you’re building a reinforcement learning project, don’t miss Felix Hofstätter’s new post, where he explains how to stop your AI agents from hacking the reward function.
- Are AI-generated Wiki articles on the horizon? Jeremie Harris and AI researcher Angela Fan discussed this question in depth in a recent episode of the TDS Podcast.
- Interested in data mining of physical materials? Joyita Bhattacharya continued exploring this topic, with a new article that explains how to measure materials’ characteristics from images.
- Anyone new to the field of ranking algorithms will appreciate Samuel Flender’s new, accessible primer, which touches on recommender systems, social media feeds, and more.
- Here’s a hands-on tutorial for the tinkerers out there: Bildea Ana walks us through the process of building custom Vertex AI pipelines.
- To end on a lofty note, carve out some time to read Ajit Rajasekharan’s deep dive on vectors and image representation in the context of generative tasks.
Do you enjoy spending time on TDS? Consider supporting our authors’ work by becoming a Medium member. We’ll be eternally grateful; you’ll get unlimited access to our entire archive.
Until the next Variable,
TDS Editors