We often focus on how to manipulate and extract value from the data we have at our disposal, and devote less brainpower to the infrastructure and workflows that keep data orgs running smoothly. Well, not this week! Here are four standout articles on the different facets of data strategy. (For posts on other topics, scroll down!)
- Learn how to avoid costly mistakes in your annotation process. As Maria Mestre shows, faulty or ineffective annotation can lead to the failure of entire ML projects. On the other hand, getting your labelling schema right can have a direct positive impact on your business’s bottom line.
- Putting together the building blocks of a data strategy. How does a company go from "we collect heaps of data" to "we leverage data for better business outcomes"? Ivy Liu shares key high-level insights based on her enterprise experience.
- When do you need your own data platform? With the proliferation of data-architecture solutions and frameworks, it’s sometimes tough for leaders to decide on the right approach. Barr Moses compiled a list of symptoms that suggest a company should think more seriously about its data-management strategy.
- Unit tests, and how to make the most out of them. Exploring a more hands-on aspect of data strategy, Karen Bajador Valencia zooms in on data-quality unit tests, which can help ensure data integrity and prevent data-fueled disasters. This useful post focuses on an integration of PySpark and Great Expectations to achieve a robust big-data processing workflow.
Our other favorite reads this week cover a rich and wide range of topics—if you’ve had your fill of data-strategy talk, give these a try:
- Our latest Monthly Edition looked at the intersection of data science and food.
- Dan Baker shared his new (and massive) labor of love: a public world atlas where visitors can explore 2500 datasets.
- How do you go about tackling your first NLP project? Santiago Víquez wrote a handy resource for those who are new to text data and its challenges.
- Johanna Appel published an excellent deep dive on neural networks and the way they echo natural evolution and neuronal growth processes.
- Still on neural networks, Daniel Holmberg’s recent post is an accessible introduction to GNNs in Python (with a step-by-step implementation included).
- For a thorough guide to transfer learning in the context of greyscale images, look no further than Chris Hughes’s new article.
To all of you who were inspired to become Medium members after reading a TDS article: thank you for your support of our authors’ work. We really appreciate it.
Until the next Variable,
TDS Editors