The day-to-day work of a data scientist can involve very complex concepts, but at its core we find a simple premise: if we look at enough (reliable) information from the past, we might be able to say what’s likely to happen in the future.
The journey from observation to prediction takes time, skill, and intuition; there’s no one-size-fits-all magic trick to get us there. But acquiring a deep toolkit and experimenting with a wide range of use cases certainly helps. To support you along the way, this week we’re highlighting some of our recent favorite posts on the subtle art of making better predictions. Let’s dive in.
- Build a solid foundation around forecasting basics. If you’re new to the world of time series analysis, Jason Chong’s beginner’s guide is the perfect place to start: it’s thorough but accessible, and covers essential concepts like stationarity, time series decomposition, and ARIMA modelling.
- It’s always a good time to think about seasons. Staying in the general area of time series, Alvin T. Tan, Ph.D. focuses on the crucial role of seasonality in forecasting. He emphasizes the concept’s particular importance in the context of model calibration, and shows why data scientists ignore seasonal cycles at their peril.
- Explore the intersection of prediction and natural language processing. Predictions matter in many other subfields of data science and machine learning. Case in point: a fascinating new project by Yu Huang, M.D., M.S. in CS. The challenge at hand is to predict the medical discipline of clinical texts; this patient walkthrough demonstrates how much work goes into processing and preparing textual data to generate robust predictions.
- Why not give collaborative filtering a try? Recommender systems are all around us, and they invariably rely on algorithms making predictions. A good one will tell you which product or dish to choose—or in the case of Khuyen Tran’s latest tutorial, which movie to watch. Her easy-to-follow post focuses on the power of collaborative filtering, and explains how to leverage this approach to make solid recommendations.
- Use the power of data to inform climate-related policies. To make smart and effective decisions around climate change, governments and other stakeholders need a good idea of our current trajectory. Giannis Tolios recently shared a handy resource for creating atmospheric CO2 time series forecasts using the Darts library in Python.
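If you’d like a hands-on taste of the forecasting ideas above before diving into the full posts, here’s a minimal sketch of the autoregressive intuition at the heart of ARIMA: fit a single lag-1 coefficient by least squares, then roll the recurrence forward. (The toy series and helper names are our own illustrations, not code from any of the posts.)

```python
# Minimal AR(1) sketch: predict each value from the previous one.
# The coefficient is fit by ordinary least squares on lagged pairs.

def fit_ar1(series):
    """Return (intercept, slope) fitted on (x[t-1], x[t]) pairs."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

def forecast(series, steps, intercept, slope):
    """Roll the AR(1) recurrence forward from the last observed value."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = intercept + slope * last
        preds.append(last)
    return preds

# Toy upward-trending series (illustrative data only).
data = [10.0, 10.8, 11.5, 12.1, 12.9, 13.4, 14.2]
a, b = fit_ar1(data)
print(forecast(data, 3, a, b))
```

Real-world workflows add the differencing and moving-average pieces (plus stationarity checks), which is exactly what the guides above walk through.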
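And for a flavor of how user-based collaborative filtering works under the hood, a similarity-weighted average over other users’ ratings already gets you surprisingly far. (The ratings matrix, user names, and movie titles below are illustrative assumptions, not taken from the tutorial.)

```python
from math import sqrt

# Tiny user -> {movie: rating} matrix (illustrative data only).
ratings = {
    "ann":  {"Alien": 5, "Heat": 3, "Up": 1},
    "ben":  {"Alien": 4, "Heat": 3, "Up": 2},
    "cara": {"Alien": 1, "Heat": 2, "Up": 5},
    "dan":  {"Alien": 5, "Heat": 4},  # hasn't seen "Up" yet
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None

print(predict("dan", "Up"))
```

Production recommenders refine this with mean-centering, neighborhood selection, and matrix factorization, but the prediction step is recognizably the same idea.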
Our human intuition-based prediction algorithm (aka "gut feeling") tells us you might want to read about some other topics this week. We hope that’s the case, because the following links will take you to some great posts that you absolutely shouldn’t miss.
- On the TDS podcast, Jeremie Harris and guest Katya Sedova discussed potential malicious uses of AI (disinformation campaigns, fake-news generation, and others)—and what researchers need to do now to reduce future risks.
- If you’re applying to a data science master’s degree—or thinking about it—Alison Yuhan Yao just published a comprehensive guide based on her own recent experiences.
- How can we quantify information? Casey Cheng’s deep dive explores Claude Shannon’s work on information theory and the fundamental particles of communication.
- It happens to the best of us: a wrong click, and a whole dataframe or dataset goes missing. Marie Sharapa comes to the rescue with a detailed guide on salvaging your lost data in Google BigQuery.
- For a foray into software engineering, check out Lily Chen’s insights on the correlation between Redux size and app performance, based on her past work at Slack.
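As a quick companion to the information-theory deep dive above: Shannon’s entropy, H = −Σ p·log₂(p), takes only a few lines to compute. (The example distributions are our own, not drawn from the post.)

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over nonzero p."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # fair coin -> 1.0 bit per flip
print(entropy([1.0]))               # certain outcome -> 0.0 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))  # four equal outcomes -> 2.0 bits
```

The fair coin carrying exactly one bit is the canonical example: each flip resolves precisely one yes/no question’s worth of uncertainty.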
Thank you for learning and exploring with us this week—and a special shoutout goes to all of you who support our authors’ work by becoming Medium members.
Until the next Variable,
TDS Editors